# Stats

Individuals: People, animals, plants, or any other objects of interest.

Variable: Any characteristic of an individual. A variable varies among individuals.

Distribution: Tells us what values the variable takes and how often it takes these values.

Quantitative variable: A variable that takes numerical values for which arithmetic operations, such as adding and averaging, make sense.

Categorical variable: A variable that falls into one of several categories. What can be counted is the count or proportion of individuals in each category.

Ways to chart categorical data: Bar graphs and pie charts.

Bar graphs: Each category is represented by one bar. The bar's height shows the count (or sometimes the percentage) for that particular category.

Pie charts: Each slice represents a piece of one whole. The size of a slice depends on what percent of the whole this category represents.

Ways to chart quantitative data: Histograms, stemplots, and line graphs (time plots).

Line graphs (time plots): Use when there is a meaningful sequence, like time. The line connecting the points helps emphasize any change over time.

Histograms and stemplots: Summary graphs for a single variable. They are very useful for understanding the pattern of variability in the data.

Histograms:
- The range of values that a variable can take is divided into equal-size intervals.
- The histogram shows the number of individual data points that fall in each interval.

Stemplots:
- To compare two related distributions, a back-to-back stemplot with common stems is useful.
- Stemplots do not work well for large datasets.
- When the observed values have too many digits, trim the numbers before making a stemplot.
- When plotting a moderate number of observations, you can split each stem.

Interpreting histograms: We can describe the overall pattern of a histogram by its shape, center, and spread.

Symmetric distribution: The right and left sides of the histogram are approximately mirror images of each other.

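Returning to how a histogram is built: the binning step described above (divide the range into equal-size intervals, then count the points in each) might be sketched as follows. The function name `histogram_counts`, the choice of four bins, and the sample data are illustrative assumptions, not part of the card set.

```python
def histogram_counts(values, num_bins):
    """Count how many values fall into each of num_bins equal-width intervals.

    Hypothetical helper for illustration; the intervals span min(values)
    to max(values), and the largest value is placed in the last interval.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # No spread at all: every value lands in the first interval.
        return [len(values)] + [0] * (num_bins - 1)
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for v in values:
        # Clamp so the maximum value does not fall past the last interval.
        index = min(int((v - lo) / width), num_bins - 1)
        counts[index] += 1
    return counts


data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]  # made-up sample
counts = histogram_counts(data, 4)

# Print a crude text histogram: one '*' per observation in each interval.
for i, c in enumerate(counts):
    print(f"interval {i}: {'*' * c}")
```

With this sample, the single large value sits alone in the last interval, which is the kind of right-hand tail the shape descriptions refer to.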
Skewed to the right: The right side of the histogram (the side with larger values) extends much farther out than the left side.

Skewed to the left: The left side of the histogram extends much farther out than the right side.

Outliers: An important kind of deviation. Outliers are observations that lie outside the overall pattern of a distribution.

Trend: A rise or fall that persists over time, despite small irregularities.

Seasonal variation: A pattern that repeats itself at regular intervals of time.

Mean: Add all values, then divide by the number of individuals. It is the "center of mass."

Median: The midpoint of a distribution, the number such that half of the observations are smaller and half are larger.

Comparing the mean and the median: The mean and the median are the same when the distribution is exactly symmetric. The median is a measure of center that is resistant to skew and outliers. The mean is not.

First quartile, Q1: The value in the sample that has 25% of the data at or below it.

Third quartile, Q3: The value in the sample that has 75% of the data at or below it.

1.5 × IQR rule for outliers: An observation is a suspected outlier if it falls more than 1.5 times the interquartile range (IQR = Q3 − Q1) above the third quartile or below the first quartile.

Variance: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$, where $\bar{x}$ is the mean of the $n$ observations.

Standard deviation: $s = \sqrt{s^2}$, the square root of the variance.
- s measures spread about the mean and should be used only when the mean is the measure of center.
- s = 0 only when all observations have the same value and there is no spread. Otherwise, s > 0.
- s is not resistant to outliers.
- s has the same units of measurement as the original observations.

Linear transformations: Do not change the basic shape of a distribution (skew, symmetry, multimodality), but they do change the measures of center and spread. For $x_{new} = a + bx$, the mean and median become $a + b$ times their old values, and measures of spread such as the standard deviation and IQR are multiplied by $|b|$.

Density curve: The total area under the curve is, by definition, equal to 1, or 100%.

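The numerical summaries described above (mean, median, quartiles, the 1.5 × IQR rule, and the standard deviation s) can be tried out with Python's standard `statistics` module. This is a minimal sketch: the `summarize` helper and the sample data are made up for illustration, and the quartiles use the module's "inclusive" method, which is only one of several conventions and may differ slightly from the one a given textbook uses.

```python
import statistics


def summarize(values):
    """Return common numerical summaries for a list of numbers.

    Hypothetical helper; quartile conventions vary, and this one relies on
    statistics.quantiles(..., method="inclusive") as an assumption.
    """
    data = sorted(values)
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr   # below this: suspected outlier
    high_fence = q3 + 1.5 * iqr  # above this: suspected outlier
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "q1": q1,
        "q3": q3,
        "iqr": iqr,
        "s": statistics.stdev(data),  # sample standard deviation (n - 1)
        "outliers": [x for x in data if x < low_fence or x > high_fence],
    }


sample = [2, 4, 4, 5, 6, 7, 8, 9, 30]  # made-up data with one large value
with_outlier = summarize(sample)
without_outlier = summarize(sample[:-1])  # same data, large value dropped

print(with_outlier["outliers"])
print(with_outlier["mean"], without_outlier["mean"])
print(with_outlier["median"], without_outlier["median"])
```

Dropping the single large value moves the mean far more than the median, which is what "resistant" means for the median.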
Area under a density curve: The area under the curve for a range of values is the proportion of all observations that fall in that range.

Median of a density curve: The equal-areas point, the point that divides the area under the curve in half.

Mean of a density curve: The balance point, at which the curve would balance if it were made of solid material.

Normal (or Gaussian) distributions: A family of symmetric, bell-shaped density curves, each defined by a mean μ (mu) and a standard deviation σ (sigma): N(μ, σ).

z-score: Measures the number of standard deviations that a data value x is from the mean μ: z = (x − μ) / σ.

Author: Anonymous. ID: 6102. Card set: Stats. Description: Stats Chapter 1. Updated: 2010-02-05T07:21:20Z