AP Stats: Chapter 1

  1. statistics
    • getting information out of numerical data obtained from an experiment or a sample
    • creating the experiment or sampling procedure, collecting and analyzing data, and making inferences (statements) about the population
  2. descriptive statistics
    methods for organizing, displaying, and describing data by using tables, graphs, and summary measures
  3. inferential statistics
    methods that use sample results to help make inferences (decisions or predictions) about a population
  4. data analysis
    process of describing data using graphs and numerical summaries
  5. individuals
    objects described by a set of data; may be people, animals, or things
  6. variables
    any characteristic of an individual
  7. categorical variable
    places an individual into one of several groups or categories; can be numerical in some cases (zip codes, classes of age)
  8. quantitative variable
    takes numerical values for which it makes sense to find an average, should always specify the unit
  9. distribution
    tells what values a variable takes and how often it takes these values
  10. inference
    drawing conclusions that go beyond the data at hand
  11. frequency table
    displays the count (frequency) of observations in each category or class
  12. relative frequency table
    shows the percents (relative frequencies) of observations in each category or class
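A frequency table and a relative frequency table can be built from the same raw data. The sketch below is only an illustration: the `colors` list is made-up data, not something from the card set.

```python
from collections import Counter

# Hypothetical categorical data: favorite color for 8 individuals.
colors = ["red", "blue", "red", "green", "blue", "red", "blue", "red"]

# Frequency table: count of observations in each category.
counts = Counter(colors)

# Relative frequency table: percent of observations in each category.
n = len(colors)
percents = {category: 100 * count / n for category, count in counts.items()}

# Print both tables side by side: category, count, percent.
for category in counts:
    print(f"{category:<6} {counts[category]:>3} {percents[category]:>6.1f}%")
```

The percents add to 100 here; with rounded percents the total can be slightly off, which is the roundoff error described in the next card.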
  13. roundoff error
    the difference between the calculated approximation of a number and its exact mathematical value
  14. pie chart
    • shows the distribution of a categorical variable as a "pie" whose slices are sized by the counts or percents for the categories
    • must include all of the categories that make up the whole
  15. when can you not use pie charts
    • if you don't have all the categories that make up the whole
    • if each value describes a separate group (e.g. ages 10-12 yrs), since those are different groups, not parts of one whole
  16. bar graph
    used to display the distribution of a categorical variable or to compare the sizes of different quantities. The categories or quantities being compared are on the horizontal axis. Has blank spaces between the bars.
  17. how can graphs be misleading
    • bars with different widths
    • uneven or truncated x-axis and y-axis intervals
  18. two-way table
    table of counts that organizes data about two categorical variables
  19. marginal distribution
    • distribution of values in one of the categorical variables in a two-way table among all of the individuals described in the table
    • in a two-way table, calculating percentages of the distribution of one variable
    • says nothing about the relationship between the two variables
  20. conditional distribution
    • describes the values of one variable among individuals who have a specific value of another variable
    • in a two-way table, percentages calculated within a single row or column
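The difference between a marginal and a conditional distribution is easiest to see with numbers. This is a minimal sketch using a made-up two-way table; the variable names and counts are assumptions for illustration only.

```python
# Hypothetical two-way table of counts: rows are gender, columns are opinion.
table = {
    "female": {"yes": 30, "no": 20},
    "male": {"yes": 10, "no": 40},
}
opinions = ["yes", "no"]

# Total number of individuals described by the table.
total = sum(sum(row.values()) for row in table.values())

# Marginal distribution of opinion: column totals as a percent of everyone.
marginal_opinion = {
    o: 100 * sum(row[o] for row in table.values()) / total for o in opinions
}

# Conditional distribution of opinion among females: percents within one row.
female_total = sum(table["female"].values())
conditional_female = {
    o: 100 * table["female"][o] / female_total for o in opinions
}

print(marginal_opinion)
print(conditional_female)
```

The marginal distribution uses everyone in the table, while the conditional distribution only uses the individuals in one row.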
  21. segmented bar graph
    • compares the distribution of a categorical variable in each of several groups. There is a bar for each group with segments that correspond to the different values of the categorical variable.
    • height of each segment is determined by the percent of individuals in the group with that value, each bar has a total height of 100%
  22. four steps to answer a statistics problem
    • STATE the question you want to answer
    • PLAN how you will answer the question and which statistical techniques the problem requires
    • DO: make graphs and carry out the calculations
    • CONCLUDE: give a practical conclusion in the setting of the real-world problem
  23. side by side bar graph
    • used to compare the distribution of a categorical variable in each of several groups. There is a bar for each group at each value of the categorical variable.
    • height of each bar is determined by the count or percent of individuals in the group with that value
  24. association
    occurs between two variables if specific values of one variable tend to occur in common with specific values of the other
  25. qualitative data
    values of categorical data
  26. dotplot
    a simple graph that shows each data value as a dot above its location on a number line
  27. overall pattern
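    • in any graph of data, look for the overall pattern; for a relationship between two variables it can be described by direction, form, and strength
    • for the distribution of one quantitative variable, use SOCS: shape, outliers, center, and spread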
  28. center
    the midpoint/median represents the typical value, and the calculated mean is the average
  29. spread
    indicates the variability of the data, includes the maximum and minimum values and the range
  30. range
    maximum value minus minimum value
  31. outlier
    an observation that lies outside the overall pattern of other observations
  32. residuals
    the difference between an observed value and the value predicted for it; points that are outliers in the y direction but not the x direction have large residuals
  33. shape
    • number of peaks (modes)
    • skewed results or symmetry
    • number of clusters + gaps
  34. mode
    the value or class in a statistical distribution having the greatest frequency
  35. unimodal
    describes a graph of quantitative data with a single peak
  36. bimodal
    describes a graph of quantitative data with two clear peaks
  37. multimodal
    describes a graph of quantitative data with more than two clear peaks
  38. symmetry
    left and right sides of the graph are approximately mirror images of each other
  39. skewed to the right
    right side of the graph is much longer than the left side, tail is extended to the right
  40. skewed to the left
    left side of the graph is much longer than the right side, tail is on the left
  41. stemplot
    observations are separated into stems (all digits but the final one) and leaves (the final digit); stems are arranged in a vertical column, increasing downward, and leaves are written in increasing order out from the stem
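The stem and leaf split can be sketched in a few lines. This is an illustrative helper, not a standard library function: the name `stemplot` is made up, and it assumes non-negative whole numbers with the final digit as the leaf.

```python
from collections import defaultdict


def stemplot(values):
    """Return stemplot rows such as '2 | 1 5 5' for non-negative integers."""
    leaves_by_stem = defaultdict(list)
    # Sorting first means each stem's leaves end up in increasing order.
    for value in sorted(values):
        stem, leaf = divmod(value, 10)  # stem = all but the final digit
        leaves_by_stem[stem].append(leaf)

    rows = []
    # Include every stem between the smallest and largest, even empty ones,
    # so that gaps in the distribution stay visible.
    for stem in range(min(leaves_by_stem), max(leaves_by_stem) + 1):
        leaves = " ".join(str(leaf) for leaf in leaves_by_stem.get(stem, []))
        rows.append(f"{stem} | {leaves}".rstrip())
    return rows


for row in stemplot([21, 25, 33, 47, 41, 25]):
    print(row)
```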
  42. splitting stems
    • a method for spreading out a stemplot that has too few stems
    • should use asterisks (e.g. 5* and 5**)
  43. back-to-back stemplot
    used to compare the distribution of a quantitative variable for two groups; the leaves for one group are on one side of a shared stem and the leaves for the other group are on the other side
  44. truncate
    removing one or more digits from a value if it has too many digits, like in creating stemplots
  45. histogram
    type of bar graph without spaces between the bars that displays the frequency or relative frequency of a quantitative variable; the horizontal axis shows the classes of the variable and the vertical axis shows the scale of counts or percents; does not preserve the raw data because the values have been grouped into classes
  46. time plots
    used to show bivariate quantitative data (two variables) where the independent variable (x) represents time
  47. independent/dependent variable on graph axes
    • dependent=y-axis
    • independent=x-axis
  48. mean formula
    $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$
  49. mean
    arithmetic average, non-resistant measure, represents the value each observation would have if the total were split equally among all observations
  50. resistant measure
    statistic that is not affected very much by extreme observations
  51. median
    midpoint M of a distribution, half the observations are smaller than this and half are larger, represents typical value, resistant measure
  52. median position formula
    (n+1)/2
    • n=# observations in data set
    • after arranging data in increasing order, move this number inward to find median
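The sketch below walks through the "count inward" idea, assuming the median position is the usual (n+1)/2. The function name and the data are made up for illustration. It also shows why the median is called resistant while the mean is not.

```python
def median(values):
    """Median found by counting in to position (n + 1) / 2 in the sorted data."""
    ordered = sorted(values)
    n = len(ordered)
    position = (n + 1) / 2  # 1-based position; ends in .5 when n is even
    if position.is_integer():
        return ordered[int(position) - 1]
    # Position falls between two observations: average the pair around it.
    below = ordered[int(position) - 1]
    above = ordered[int(position)]
    return (below + above) / 2


def mean(values):
    """Arithmetic average: the sum of the observations divided by n."""
    return sum(values) / len(values)


data = [30, 32, 35, 38, 40]   # hypothetical observations
with_outlier = data + [400]   # same data plus one extreme value

print(mean(data), median(data))
print(mean(with_outlier), median(with_outlier))
```

Adding the extreme value pulls the mean far to the right while the median barely moves.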
  53. mean > median
    right skewed
  54. mean = median
    roughly symmetric
  55. mean < median
    left skewed
  56. mode
    value that occurs the most
  57. 68-95-99.7 Rule aka Empirical Rule
    in a bell-shaped distribution, 68% of the data lies within one standard deviation of the mean, 95% lies within two standard deviations of the mean, and 99.7% lies within three standard deviations of the mean
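As a quick illustration of the rule, the intervals can be computed directly from a mean and standard deviation. The helper name and the example numbers (mean 100, standard deviation 15) are assumptions chosen only to make the arithmetic easy to follow.

```python
def empirical_rule_intervals(mean, sd):
    """Map each approximate percent to its (low, high) interval around the mean."""
    return {
        percent: (mean - k * sd, mean + k * sd)
        for k, percent in ((1, 68), (2, 95), (3, 99.7))
    }


# Hypothetical bell-shaped distribution with mean 100 and standard deviation 15.
for percent, (low, high) in empirical_rule_intervals(100, 15).items():
    print(f"about {percent}% of values fall between {low} and {high}")
```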
  58. interquartile range (IQR)
    • measures the range of the middle 50% of the data, resistant measure
    • IQR= Q3-Q1
  59. first quartile
    median of observations to the left of the median
  60. third quartile
    median of observations to the right of the median
  61. percentile implication
    95th percentile means that 95% of the population got that score or lower
  62. IQR rule for calculating outliers
    an observation is an outlier if it falls more than 1.5 x IQR above the third quartile or below the first quartile
  63. how to use IQR to calculate bottom cutoff value
    Q1-1.5 x IQR
  64. how to use IQR to calculate top cutoff value
    Q3+1.5 x IQR
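The quartile and cutoff cards above can be combined into one short check for outliers. This is a sketch under one assumption: quartiles are found as the medians of the lower and upper halves of the ordered data, leaving out the overall median when n is odd. Other software may use slightly different quartile rules. The function names and data are made up.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # When n is odd, skip the middle observation (the median itself).
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(lower), median(upper)


def iqr_outliers(values):
    """Return the observations flagged by the 1.5 x IQR rule."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    low_cutoff = q1 - 1.5 * iqr
    high_cutoff = q3 + 1.5 * iqr
    return [v for v in values if v < low_cutoff or v > high_cutoff]


data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 40]  # hypothetical observations
print(quartiles(data))
print(iqr_outliers(data))
```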
  65. standard deviation
    • measure of spread that looks at how far observations are from the mean; typical values lie within about one standard deviation above or below the mean; non-resistant measure
    • standard deviation of 0 indicates no variability, greater when observations are more spread out
  66. degrees of freedom
    n-1, where n is the number of observations
  67. variance
    s_x^2: the average squared distance of the observations in a data set from their mean
  68. standard deviation formula
    $s_x = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$
  69. variance formula
    $s_x^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$
  70. how to calculate variance and standard deviation
    • find the mean of the data, find the deviation of each observation from the mean, square the deviations and add them up, then divide by the degrees of freedom (n-1) to find the variance
    • to find standard deviation, take the square root of variance
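The steps above can be followed one at a time in code. This is a minimal sketch with made-up data; the function names are illustrative, and the divisor is n-1 as the card describes.

```python
import math


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = [(x - mean) ** 2 for x in values]
    return sum(squared_deviations) / (n - 1)


def sample_std(values):
    """Standard deviation: the square root of the variance."""
    return math.sqrt(sample_variance(values))


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations, mean is 5
print(sample_variance(data))
print(sample_std(data))
```

For this data the squared deviations add up to 32, so the variance is 32/7.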
  71. five-number summary
    • minimum, first quartile, median, third quartile, maximum
    • gives a summary of both center and spread, roughly divides the distribution into quarters
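A five-number summary can be sketched as below. It assumes quartiles are the medians of the lower and upper halves of the ordered data (skipping the middle observation when n is odd); the function name and data are made up for illustration.

```python
from statistics import median


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # Leave the middle observation out of both halves when n is odd.
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return (ordered[0], median(lower), median(ordered), median(upper), ordered[-1])


print(five_number_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 40]))
```

The range is the last value minus the first, and the IQR is the fourth value minus the second.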
  72. boxplot
    graphs the five-number summary, box spans the quartiles and whiskers extend to the min/max values, center line represents median
  73. modified boxplots
    boxplots that always show the outliers as dots
  74. side-by-side boxplots
    show the boxplots next to each other using the same scale, used to compare distributions of two data sets
  75. detecting skewness in boxplots
    the distribution is skewed toward the side with the longer whisker; a larger difference in whisker lengths means a more strongly skewed distribution
  76. detecting range and IQR in boxplots
    range is represented by full length of boxplot, IQR is represented by length of box
  77. options for measuring center and spread, resistant or non-resistant
    • median and IQR are resistant, use when analyzing skewed data and/or outliers
    • average and standard deviation are non-resistant and sensitive to skewed results and outliers
  78. sigma
    Σ represents a summation, "add them up"
  79. index
    variable i
  80. lower limit and upper limit
    the numbers below and above a sigma; they give the range of values that you plug into i and add up
  81. summand
    in sigma notation, what you're adding up (e.g. i^2)
  82. solution
    in sigma notation, the answer that you solve for (your sum after you add everything up)
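The pieces of sigma notation map directly onto a loop. The sketch below assumes the summand is i squared and picks limits of 1 and 4 purely as an example.

```python
lower_limit = 1  # number below the sigma
upper_limit = 4  # number above the sigma

# The index i takes each whole number from the lower to the upper limit.
# The summand i ** 2 is evaluated for each i and the results are added up.
terms = [i ** 2 for i in range(lower_limit, upper_limit + 1)]
total = sum(terms)

print(terms)  # the individual summands
print(total)  # the solution
```

Here the terms are 1, 4, 9, and 16, so the sum is 30.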
  83. Image Upload 5
    bar graph
  84. Image Upload 6
    two-way table
  85. Image Upload 7
    marginal distribution
  86. Image Upload 8
    conditional distribution
  87. Image Upload 9
    segmented bar graph
  89. Image Upload 11
    back-to-back stemplot
  91. frequency table categories
    class and count
  92. relative frequency table categories
    class and percent
  93. Image Upload 13
    frequency histogram; relative frequency histogram
  95. Image Upload 15
    side-by-side boxplot
  96. Image Upload 16
    side-by-side bar graph