Elementary Statistics

  1. Individuals
    • objects described by a set of data
    • people, animals, things
  2. Variable
    • any characteristic of an individual
    • can take different values for different individuals
  3. Categorical Variable
    places an individual into one of several groups or categories
  4. Quantitative Variable
    • takes numerical values for which arithmetic operations like adding and averaging make sense
    • usually recorded in a unit of measurement
  5. Distribution of a Variable
    what values it takes and how often it takes these values
  6. Distribution of a Categorical Variable
    lists categories and gives count or % of individuals who fall in each category
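A minimal sketch of tabulating the distribution of a categorical variable as counts and percents. The function name `categorical_distribution` and the color data are hypothetical, chosen only for illustration.

```python
from collections import Counter

def categorical_distribution(values):
    """Return {category: (count, percent)} for a list of category labels."""
    counts = Counter(values)
    total = len(values)
    return {cat: (n, 100 * n / total) for cat, n in counts.items()}

# Hypothetical data: favorite color of 8 individuals.
colors = ["red", "blue", "red", "green", "blue", "red", "blue", "red"]
dist = categorical_distribution(colors)

for cat, (n, pct) in dist.items():
    print(f"{cat}: count={n}, percent={pct:.1f}%")
```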
  7. Roundoff error
    small discrepancies (e.g., percents that do not add to exactly 100) that point not to mistakes but to the effect of rounding off results
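A small illustration of roundoff error, assuming three hypothetical categories with equal counts. Each percent is rounded to one decimal place, so the rounded percents no longer add to exactly 100 even though nothing was miscounted.

```python
# Hypothetical counts: three equally sized categories.
counts = {"A": 1, "B": 1, "C": 1}
total = sum(counts.values())

# Round each percent to one decimal place, as a table would report it.
rounded = {cat: round(100 * n / total, 1) for cat, n in counts.items()}

# Add up the reported (rounded) percents.
rounded_sum = round(sum(rounded.values()), 1)

print(rounded)      # each category is reported as 33.3
print(rounded_sum)  # 99.9, not 100: roundoff error, not a mistake
```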
  8. Pie Charts
    • show distribution of a categorical variable as pie slices
    • must include all the categories that make up the whole
    • use a pie chart only when you want to emphasize each category's relation to the whole
  9. Bar graphs
    • represent each category as a bar
    • bar height shows category counts or %'s
    • can compare any set of quantities measured in the same units
  10. Histogram
    Most common graph of the distribution of one quantitative variable
  11. Shape of a distribution
    • symmetric: right and left sides are roughly mirror images
    • skewed to the right: right side of the histogram extends much farther out than the left
    • skewed to the left: left side of the histogram extends much farther out than the right
  12. Center of a distribution
    midpoint of the distribution
  13. Spread of a distribution
    the range of the distribution, from the smallest value to the largest
  14. Stemplot
    • like a histogram turned on its side, built from the numbers themselves
    • cannot choose classes; they are given to you
    • preserves the actual value of each observation
    • does not work well with large data sets
  15. stems and leaves
    • Parts of a stemplot
    • stem: all but the final (rightmost) digit
    • leaf: the final digit
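A minimal sketch of building a stemplot from stems and leaves, assuming non-negative whole-number observations. The function name `stemplot` and the data are hypothetical. Stems with no leaves are still printed so the shape of the distribution stays visible.

```python
from collections import defaultdict

def stemplot(values):
    """Return stemplot lines ("stem | leaves") for non-negative integers."""
    leaves_by_stem = defaultdict(list)
    for v in sorted(values):
        # stem: all but the final digit; leaf: the final digit
        stem, leaf = divmod(v, 10)
        leaves_by_stem[stem].append(leaf)

    lines = []
    # Walk every stem from smallest to largest, including empty ones.
    for stem in range(min(leaves_by_stem), max(leaves_by_stem) + 1):
        leaves = "".join(str(d) for d in leaves_by_stem.get(stem, []))
        lines.append(f"{stem:>2} | {leaves}")
    return lines

# Hypothetical observations.
plot = stemplot([12, 15, 21, 23, 23, 38, 40, 47])
print("\n".join(plot))
```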
  16. Split stems
    • double the # of stems when all the leaves would fall on just a few stems
    • each stem appears twice
    • leaves 0 to 4 go on the upper stem, leaves 5 to 9 go on the lower stem
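The split-stem rule can be sketched as follows, again assuming non-negative whole numbers. The name `split_stemplot` and the data are hypothetical. Each stem is printed twice: first with leaves 0 to 4, then with leaves 5 to 9.

```python
def split_stemplot(values):
    """Return stemplot lines with every stem split into two rows."""
    leaves = {}
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        half = 0 if leaf <= 4 else 1  # 0: upper row (0-4), 1: lower row (5-9)
        leaves.setdefault((stem, half), []).append(str(leaf))

    stems = [stem for stem, _ in leaves]
    lines = []
    for stem in range(min(stems), max(stems) + 1):
        for half in (0, 1):
            row = "".join(leaves.get((stem, half), []))
            lines.append(f"{stem:>2} | {row}")
    return lines

# Hypothetical observations that would crowd onto just two stems.
data = [20, 21, 23, 24, 25, 27, 28, 29, 30, 31, 33, 36, 38, 39]
split_plot = split_stemplot(data)
print("\n".join(split_plot))
```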
  17. Timeplot
    • plots each observation of a variable against the time at which it was measured
    • always put time on the horizontal axis
    • always put the variable you are measuring on the vertical axis
  18. Cycles
    regular up and down movements in a timeplot
  19. Trend
    long term up or down movement over time
  20. Time series data
    • change in one variable at a specific location over time
    • typically displayed as a timeplot
  21. Cross sectional data
    displays a variable at many locations at the same time
  22. Exploratory data analysis
    using graphs and numerical summaries to describe the variables in a data set and the relations among them
  23. mean is not a resistant measure of center
    • mean is sensitive to a few extreme observations
    • in a skewed distribution, even without outliers, the mean is pulled toward the long tail
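A short illustration of why the mean is not resistant, using two hypothetical data sets that differ in a single observation. Changing one value from 6 to 60 more than triples the mean.

```python
from statistics import mean

# Hypothetical observations; the second list replaces 6 with an extreme 60.
typical = [2, 3, 4, 5, 6]
with_outlier = [2, 3, 4, 5, 60]

mean_typical = mean(typical)
mean_outlier = mean(with_outlier)

print(mean_typical)  # 4
print(mean_outlier)  # 14.8, pulled far toward the single extreme value
```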
  24. Median M
    • midpoint of a distribution
    • such that half the observations are smaller and half larger
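A minimal sketch of finding the median M by sorting the observations and taking the midpoint. The function name and data are hypothetical. With an even number of observations, this sketch averages the two center values.

```python
def median(values):
    """Return the midpoint of the sorted observations."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # odd n: the center observation
    return (ordered[mid - 1] + ordered[mid]) / 2   # even n: average the two center values

# Hypothetical observations; one extreme value does not move the median.
m_typical = median([2, 3, 4, 5, 6])
m_outlier = median([2, 3, 4, 5, 60])
m_even = median([1, 3, 5, 7])

print(m_typical, m_outlier, m_even)  # 4 4 4.0
```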
  25. five number summary
    • gives the smallest and largest observations as well as the median and the first and third quartiles
    • Minimum Q1 M Q3 Maximum
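A sketch of computing the five-number summary. The cards do not say how to find the quartiles, so this assumes one common textbook convention: Q1 is the median of the observations below the overall median's position and Q3 is the median of those above it. Other software may use slightly different rules. Function names and data are hypothetical, and at least two observations are assumed.

```python
def median(values):
    """Return the midpoint of the sorted observations."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2

def five_number_summary(values):
    """Return (Minimum, Q1, M, Q3, Maximum); needs at least two observations."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]                                   # below the median's position
    upper = ordered[half + 1:] if n % 2 else ordered[half:]  # above the median's position
    return (ordered[0], median(lower), median(ordered), median(upper), ordered[-1])

# Hypothetical observations.
summary = five_number_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])
print(summary)  # (1, 3, 6, 9, 11)
```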
  26. Boxplot
    • central box spans Q1 and Q3
    • line in box marks median M
    • lines extend from box out to smallest and largest observations
    • best used for side by side comparison of more than one distribution
  27. IQR
    • Interquartile range
    • distance between Q1 and Q3
    • IQR = Q3 - Q1
  28. 1.5 x IQR rule for outliers
    call an observation an outlier if it is more than 1.5 x IQR above Q3 or below Q1
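The IQR and the 1.5 x IQR rule can be sketched together. The quartile convention here is an assumption (Q1 and Q3 are the medians of the lower and upper halves, excluding the overall median when the count is odd), and the function names and data are hypothetical.

```python
def median(values):
    """Return the midpoint of the sorted observations."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2

def quartiles(values):
    """Return (Q1, Q3) as medians of the lower and upper halves."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(lower), median(upper)

def iqr_outliers(values):
    """Return (IQR, outliers) using the 1.5 x IQR rule."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1                  # IQR = Q3 - Q1
    low_fence = q1 - 1.5 * iqr     # more than 1.5 x IQR below Q1
    high_fence = q3 + 1.5 * iqr    # more than 1.5 x IQR above Q3
    flagged = [v for v in values if v < low_fence or v > high_fence]
    return iqr, flagged

# Hypothetical observations with one extreme value.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 30]
iqr, outliers = iqr_outliers(data)
print(iqr, outliers)  # 6 [30]
```

Here Q1 = 3 and Q3 = 9, so the cutoffs are 3 - 9 = -6 and 9 + 9 = 18, and only 30 falls outside them.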