1. individuals
    • Individuals can be people, animals, plants, or any object of interest.
  2. variable
    • Any characteristic of an individual. A variable varies among individuals.
  3. distribution
    • Tells us what values the variable takes and how often it takes these values.
  4. quantitative variable
    • A variable that takes numerical values for which arithmetic operations, such as adding and averaging, make sense.
  5. categorical variable
    • A variable that places each individual into one of several categories. What can be summarized is the count or proportion of individuals in each category.
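The count and proportion per category can be tabulated directly. A minimal sketch, assuming made-up eye-color data; the function name `category_distribution` is hypothetical:

```python
from collections import Counter

def category_distribution(values):
    """Return {category: (count, proportion)} for a categorical variable."""
    counts = Counter(values)
    total = len(values)
    return {cat: (n, n / total) for cat, n in counts.items()}

# Hypothetical data: eye color recorded for five individuals.
eye_colors = ["brown", "blue", "brown", "green", "brown"]
dist = category_distribution(eye_colors)
for cat, (n, p) in dist.items():
    print(f"{cat}: count={n}, proportion={p:.2f}")
```

The counts are what a bar graph would plot; the proportions are what a pie chart would plot.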
  6. Ways to chart categorical data
    Bar graphs and pie charts
  7. Bar graphs
    • Each category is represented by one bar. The bar’s height shows the count (or sometimes the percentage) for that particular category.
  8. Pie charts
    • Each slice represents a piece of one whole. The size of a slice depends on what percent of the whole this category represents.
  9. Ways to chart quantitative data
    Histograms, stemplots, and line graphs (time plots)
  10. Line graphs: time plots
    • Use when there is a meaningful sequence, like time. The line connecting the points helps emphasize any change over time.
  11. Histograms and stemplots
    • These are summary graphs for a single variable. They are very useful for understanding the pattern of variability in the data.
  12. Histograms
    • The range of values that a variable can take is divided into equal-size intervals.
    • The histogram shows the number of individual data points that fall in each interval.
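The binning step can be sketched as follows. This is an illustration only, assuming made-up test scores and a hypothetical helper `histogram_counts`; the choice to include the right endpoint in the last interval is one common convention, not the only one.

```python
def histogram_counts(data, low, width, n_bins):
    """Count observations in n_bins equal-width intervals starting at low.

    Interval i covers [low + i*width, low + (i+1)*width); the last
    interval also includes its right endpoint.
    """
    counts = [0] * n_bins
    high = low + width * n_bins
    for x in data:
        if x < low or x > high:
            continue  # value lies outside the plotted range
        i = min(int((x - low) // width), n_bins - 1)
        counts[i] += 1
    return counts

# Hypothetical data: ten test scores, binned into 50-60, 60-70, ..., 90-100.
scores = [52, 58, 61, 67, 70, 73, 74, 78, 85, 100]
counts = histogram_counts(scores, low=50, width=10, n_bins=5)
for i, c in enumerate(counts):
    print(f"{50 + 10 * i:>3}-{60 + 10 * i:<3} {'*' * c}")
```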
  13. stemplots
    • To compare two related distributions, a back-to-back stemplot with common stems is useful.
    • Stemplots do not work well for large datasets.
    • When the observed values have too many digits, trim the numbers before making a stemplot.
    • When plotting a moderate number of observations, you can split each stem.
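A basic stemplot (no trimming, no split stems) can be sketched like this. It assumes non-negative integers, with the tens as the stem and the units digit as the leaf; the function name `stemplot` is hypothetical.

```python
from collections import defaultdict

def stemplot(values):
    """Return stemplot lines for non-negative integers.

    Stem = value // 10, leaf = units digit. Empty stems are kept so
    that gaps in the distribution stay visible.
    """
    leaves = defaultdict(list)
    for v in sorted(values):
        leaves[v // 10].append(v % 10)
    lines = []
    for stem in range(min(leaves), max(leaves) + 1):
        lines.append(f"{stem} | {''.join(str(d) for d in leaves[stem])}")
    return lines

# Hypothetical data.
for line in stemplot([12, 15, 21, 40, 47, 47]):
    print(line)
```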
  14. Interpreting histograms
    We can describe the overall pattern of a histogram by its shape, center, and spread.
  15. A distribution is symmetric if ...
    • the right and left sides of the histogram are approximately mirror images of each other.
  16. A distribution is skewed to the right
    • if the right side of the histogram (side with larger values) extends much farther out than the left side.
  17. skewed to the left
    • if the left side of the histogram (side with smaller values) extends much farther out than the right side.
  18. An important kind of deviation is an outlier
    • Outliers are observations that lie outside the overall pattern of a distribution.
  19. A trend is
    a rise or fall that persists over time, despite small irregularities.
  20. seasonal variation
    A pattern that repeats itself at regular intervals of time
  21. mean
    • Add all values, then divide by the number of individuals. It is the “center of mass.”
  22. median
    • The midpoint of a distribution: the number such that half of the observations are smaller and half are larger.
  23. Comparing the mean and the median
    • The mean and the median are equal when the distribution is exactly symmetric; in a skewed distribution, the mean is pulled toward the long tail. The median is a measure of center that is resistant to skew and outliers. The mean is not.
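A small numerical check of this resistance, using made-up incomes (in thousands) and a single extreme value:

```python
from statistics import mean, median

# Hypothetical data: five incomes, then the same five plus one outlier.
incomes = [30, 32, 35, 38, 40]
with_outlier = incomes + [400]

print(mean(incomes), median(incomes))            # both equal 35
print(mean(with_outlier), median(with_outlier))  # mean jumps, median barely moves
```

Adding the outlier moves the median from 35 to 36.5, while the mean rises above 95.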
  24. first quartile, Q1
    • The value in the sample that has 25% of the data at or below it.
  25. third quartile, Q3
    • The value in the sample that has 75% of the data at or below it.
  26. 1.5 * IQR rule for outliers
    • An observation is a suspected outlier if it falls more than 1.5 times the size of the interquartile range (IQR = Q3 − Q1) above the third quartile or below the first quartile.
  27. variance s²
  28. standard deviation s
    • s measures spread about the mean and should be used only when the mean is the measure of center.
    • s = 0 only when all observations have the same value and there is no spread. Otherwise, s > 0.
    • s is not resistant to outliers.
    • s has the same units of measurement as the original observations.
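For reference, a minimal sketch of how the sample variance and standard deviation are usually computed, assuming the common n − 1 divisor. The function names are hypothetical, and the data are made up.

```python
from math import sqrt

def sample_variance(data):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(data)
    xbar = sum(data) / n
    return sum((x - xbar) ** 2 for x in data) / (n - 1)

def sample_std(data):
    """Square root of the sample variance; same units as the data."""
    return sqrt(sample_variance(data))

# Hypothetical data with mean 5 and squared deviations summing to 32.
values = [2, 4, 4, 4, 5, 5, 7, 9]
print(sample_variance(values), sample_std(values))
print(sample_std([3, 3, 3]))  # no spread, so s is 0
```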
  29. linear transformation
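    • Linear transformations (x_new = a + b·x) do not change the basic shape of a distribution (skew, symmetry, multimodality). But they do change the measures of center and spread: multiplying by b multiplies the mean and median by b and the standard deviation and IQR by |b|; adding a shifts the mean and median by a and leaves the measures of spread unchanged.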
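A quick check of how center and spread respond to a linear transformation x_new = a + b·x, using a Celsius-to-Fahrenheit conversion (a = 32, b = 1.8) on made-up temperatures. The helper `linear_transform` is hypothetical.

```python
from statistics import mean, median, stdev

def linear_transform(data, a, b):
    """Apply x_new = a + b * x to every observation."""
    return [a + b * x for x in data]

# Hypothetical temperatures in degrees Celsius.
celsius = [10.0, 15.0, 20.0, 25.0, 40.0]
fahrenheit = linear_transform(celsius, a=32.0, b=1.8)

# Center is shifted and scaled; spread is only scaled.
print(mean(fahrenheit), 32.0 + 1.8 * mean(celsius))
print(median(fahrenheit), 32.0 + 1.8 * median(celsius))
print(stdev(fahrenheit), 1.8 * stdev(celsius))
```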
  30. density curve
    • The total area under the curve, by definition, is equal to 1, or 100%.
    • The area under the curve for a range of values is the proportion of all observations for that range.
  31. median of a density curve is
    the equal-areas point
    • the point that divides the area under the curve in half.
  32. mean of a density curve is
    the balance point, at which the curve would balance if it were made of solid material.
  33. Normal – or Gaussian – distributions
    • A family of symmetrical, bell-shaped density curves defined by a mean μ (mu) and a standard deviation σ (sigma): N(μ, σ).
  34. z-score
    • Measures the number of standard deviations that a data value x is from the mean μ: z = (x − μ) / σ.
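A short sketch of the z-score calculation, with an assumed example of N(100, 15) and x = 130. `NormalDist` from Python's standard `statistics` module (Python 3.8+) is used to turn the z-score into an area under the standard Normal curve, that is, the proportion of observations below x.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean mu."""
    return (x - mu) / sigma

# Hypothetical example: a value of 130 from a N(100, 15) distribution.
z = z_score(130, mu=100, sigma=15)

# Area under the standard Normal curve to the left of z.
proportion_below = NormalDist().cdf(z)
print(z, round(proportion_below, 4))
```

A value two standard deviations above the mean has roughly 97.7% of the distribution below it.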
Card Set
Stats Chapter 1