Statistics Fall 2010

  1. What is statistics?
    Statistics is the science of conducting studies to collect, organize, summarize, and draw conclusions from data. *(this is not the same as a statistic)*
  2. What is a variable?
    A variable is a characteristic or attribute that can take on different values. If its values are obtained by chance, then it's a "random" variable. The different values are called DATA. Coin-flip example: the variable is the imprint on the coin (heads or tails), the random variable is the process of recording the result of each flip, and the data are the individual recorded results.
  3. What are the two main areas of STATS?
    • 1) Descriptive - strictly observational
    • 2) Inferential - drawing conclusions from data; hypothesis testing; making predictions about the future
  4. What is a sample?
    A sample consists of some of the subjects being studied. It's any subset of the entire group being studied. A sample obtained by chance (or random #'s) is a random sample.
  5. What is a population?
    A population consists of all subjects (or objects) being studied.

    EX: A recent survey found that, on average, about 33% of Americans vote for president every 4 years. POPULATION: Americans SAMPLE: Americans over 50
  6. What is a discrete variable?
    This is a variable that takes on a countable number of values.
  7. What is a continuous variable?
    A continuous variable takes on an uncountable number of values.
  8. What is a countable set?
    A set is countable when its values can be paired one-to-one with positive integers (1, 2, 3, 4, 5, ...)
  9. What is an uncountable set?
    A set with too many values to pair one-to-one with the positive integers, such as all the real numbers in an interval
  10. What are the types of studies?
    Observational Study and Experimental Study
  11. What is an independent variable?
    The variable being manipulated.
  12. What is the dependent variable?
    This is the resultant variable (Y)
  13. What is a frequency distribution?
    This is the organization of raw data using classes (categories) and frequencies (#of things in a class).
  14. What is the method for constructing a frequency distribution?
    • 1. Use between 5 and 20 classes (categories) and, if possible, make the class width odd (so the class midpoint will be a whole number)
    • 2. make the classes mutually exclusive
    • 3. make the classes exhaustive
    • 4. use continuous classes
    • 5. make the width of the classes equal
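    The five rules above can be sketched in code. This is only a minimal illustration for whole-number data; the function name `grouped_frequency_distribution`, the choice of 5 classes, and the scores are made up for the example and are not from the course.

```python
import math

def grouped_frequency_distribution(data, num_classes):
    """Return a list of (lower_limit, upper_limit, frequency) for whole-number data."""
    low, high = min(data), max(data)
    # Round the width up so the classes cover every value (exhaustive).
    width = math.ceil((high - low + 1) / num_classes)
    classes = []
    for i in range(num_classes):
        lower = low + i * width
        upper = lower + width - 1  # limits do not overlap (mutually exclusive)
        freq = sum(1 for x in data if lower <= x <= upper)
        classes.append((lower, upper, freq))
    return classes

# Hypothetical test scores, used only for illustration.
scores = [52, 55, 61, 64, 67, 70, 71, 73, 78, 80, 84, 85, 88, 91, 96]
for lower, upper, freq in grouped_frequency_distribution(scores, 5):
    print(f"{lower}-{upper}: {freq}")
```

    With these scores the class width works out to 9 (an odd number), every class has the same width, and each score falls in exactly one class.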
  15. What is a frequency?
    the number of things in a class
  16. What is a class?
    a category
  17. What are the steps to constructing a frequency distribution?
    • 1) get points
    • 2) create chart
    • 3) determine width of each class
    • 4)
  18. bimodal 111
    two values occur with the same greatest frequency
  19. boxplot 162
    A graph of a data set obtained by drawing a horizontal line from the minimum data value to Q1, drawing a horizontal line from Q3 to the maximum data value, and drawing a box whose vertical sides pass through Q1 and Q3, with a vertical line inside the box passing through the median (Q2).
  20. Chebyshev's Theorem 134
    The proportion of values from a data set that will fall within k standard deviations of the mean will be at least 1 -1/k^2, where k is a number greater than 1 (k is not necessarily an integer).
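    As a quick check of the formula 1 - 1/k^2, the sketch below evaluates the bound for a few values of k. The function name `chebyshev_lower_bound` is just an illustrative choice.

```python
def chebyshev_lower_bound(k):
    """Minimum proportion of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k**2

for k in (1.5, 2, 3):
    print(f"k = {k}: at least {chebyshev_lower_bound(k):.1%} of the values")
```

    For k = 2 the bound is 75%, and for k = 3 it is about 88.9%. Unlike the empirical rule, this bound holds for any shape of distribution.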
  21. coefficient of variation 132
    denoted by CVar, this is the standard deviation divided by the mean, with the result expressed as a percentage: CVar = (standard deviation / mean) × 100%
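    A minimal sketch of the calculation, assuming the data are a sample (so the sample standard deviation from Python's `statistics.stdev` is used). The function name and the numbers are made up for illustration.

```python
import statistics

def coefficient_of_variation(data):
    """Sample standard deviation divided by the mean, as a percentage."""
    return statistics.stdev(data) / statistics.mean(data) * 100

values = [10, 20, 30]  # hypothetical sample
print(f"CVar = {coefficient_of_variation(values):.1f}%")
```

    Because the result is a percentage with no units, it can be used to compare the variation of data sets measured in different units.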
  22. data array 109
    Data set in order.
  23. decile 151
    deciles divide the data distribution into 10 groups
  24. empirical rule 136
    • in a normal (bell-shaped) distribution, approximately:
    • 1) 68% of the data will fall within 1 standard deviation of the mean
    • 2) 95% of the data will fall within 2 standard deviations of the mean
    • 3) 99.7% of the data will fall within 3 standard deviations of the mean
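    These percentages can be checked against the standard normal curve. The sketch below assumes Python 3.8 or later (for `statistics.NormalDist`); the helper name `proportion_within` is an illustrative choice.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

def proportion_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {proportion_within(k):.1%}")
```

    The exact values are slightly different from the rounded 68%, 95%, and 99.7% used in the rule.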
  25. exploratory data analysis (EDA) 162
    the act of analyzing data to determine what information can be obtained by using stem and leaf plots, medians, interquartile ranges, and boxplots
  26. five-number summary 162
    five specific values for a data set that consist of the lowest and highest values, Q1, Q3, and the median.
  27. interquartile range (IQR) 151
    the difference Q3 - Q1; it is the range of the middle 50% of the data
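    The five-number summary and the IQR can be computed together. Note that software and textbooks use several slightly different quartile conventions; this sketch assumes the "median of each half" method (leaving out the overall median when the number of values is odd). The function name and the data are illustrative only.

```python
import statistics

def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum) using the median-of-halves method."""
    values = sorted(data)
    n = len(values)
    half = n // 2
    lower_half = values[:half]
    # Skip the middle value itself when n is odd.
    upper_half = values[half + 1:] if n % 2 else values[half:]
    return (
        values[0],
        statistics.median(lower_half),
        statistics.median(values),
        statistics.median(upper_half),
        values[-1],
    )

data = [5, 6, 12, 13, 15, 18, 22, 50]  # hypothetical data set
low, q1, med, q3, high = five_number_summary(data)
print("five-number summary:", low, q1, med, q3, high)
print("IQR:", q3 - q1)
```

    These five values are exactly what a boxplot draws.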
  28. mean 106
    the mean is the sum of the values divided by the number of values: ΣX / n. The symbol for the sample mean is X with a line over the top of it (X-bar)
  29. median 109
    the mid point of the data array. the symbol is MD
  30. midrange 114
    the sum of the lowest and the highest values in the data set, divided by 2. The symbol is MR. MR = (lowest value + highest value) / 2
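    A short sketch comparing the mean, median, and midrange on one small data set. The data and the helper name `midrange` are made up for illustration.

```python
import statistics

def midrange(data):
    """(lowest value + highest value) / 2"""
    return (min(data) + max(data)) / 2

data = [2, 3, 5, 7, 8, 11]  # hypothetical data array (already in order)

print("mean:", statistics.mean(data))      # sum of values / number of values
print("median:", statistics.median(data))  # midpoint of the data array
print("midrange:", midrange(data))
```

    Note that the parentheses matter: the two values are added first and the sum is then divided by 2.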
  31. modal class 112
    class with the largest frequency
  32. mode 111
    the value that occurs the most
  33. multimodal 111
    more than two values that occur with the same greatest frequency
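    The mode-related terms (unimodal, bimodal, multimodal) can be told apart by counting how many values share the greatest frequency. In this sketch the function name `classify_modes` is hypothetical, and treating a data set in which no value repeats as having "no mode" is an assumed convention.

```python
from collections import Counter

def classify_modes(data):
    """Return (list of modes, label) for a data set."""
    counts = Counter(data)
    top = max(counts.values())
    if top == 1:
        # Assumed convention: if no value repeats, there is no mode.
        return [], "no mode"
    modes = sorted(value for value, count in counts.items() if count == top)
    label = {1: "unimodal", 2: "bimodal"}.get(len(modes), "multimodal")
    return modes, label

print(classify_modes([1, 2, 2, 3]))
print(classify_modes([1, 2, 2, 3, 3]))
print(classify_modes([1, 1, 2, 2, 3, 3, 4]))
```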
  34. negatively skewed or left skewed distribution 117
    the majority of the data values fall to the right of the mean, with the tail to the left
  35. Outlier 151
    an extremely high or an extremely low data value when compared with the rest of the data values
  36. parameter 106
    characteristic or measure obtained by using all the data values from a specific population
  37. percentile 143
    divide the data set into 100 equal groups
  38. positively skewed or right skewed distribution 117
    the majority of the data values fall to the left of the mean, with the tail to the right
  39. quartile 149
    divide the distribution into 4 groups separated by: Q1, Q2, Q3
  40. range 124
    the highest value minus the lowest value. The symbol R is used.
  41. range rule of thumb 133
    a rough estimate of the standard deviation: s ≈ range / 4
  42. resistant statistic 165
    relatively less affected by outliers.
  43. standard deviation 127
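    the square root of the variance. The symbol for the population standard deviation is the lowercase Greek letter sigma (σ); the sample standard deviation is written s.

    population standard deviation equation: σ = √( Σ(X − μ)² / N ), where X is a data value, μ is the population mean, and N is the population size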
  44. statistic 106
    characteristic or measure obtained by using the data values from a sample.
  45. symmetric distribution 117
    data values are evenly distributed on both sides of the mean.
  46. unimodal 111
    one value that occurs with the greatest frequency
  47. variance 127
    the average of the squares of the distance each value is from the mean. The symbol for the population variance is lowercase sigma squared.

    population variance equation: σ² = Σ(X − μ)² / N, where X is a data value, μ is the population mean, and N is the population size
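    A minimal sketch of the population variance (the average of the squared distances from the mean) and the population standard deviation (its square root). The function names and the data are illustrative only.

```python
import math

def population_variance(data):
    """Average of the squared distances of each value from the mean."""
    mu = sum(data) / len(data)
    return sum((x - mu) ** 2 for x in data) / len(data)

def population_std_dev(data):
    """Square root of the population variance."""
    return math.sqrt(population_variance(data))

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical population, mean = 5
print("variance:", population_variance(data))
print("standard deviation:", population_std_dev(data))
```

    Each distance is squared before averaging; without the squares the distances above and below the mean would cancel out to zero.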
  48. weighted mean 115
    used when the values are not all equally represented.

    equation: weighted mean = sum of (weight × value) / sum of the weights = Σ(w·X) / Σw
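    A grade point average is a common use of the weighted mean: each grade is weighted by its credit hours. The sketch below uses made-up grades and credits, and the function name `weighted_mean` is an illustrative choice.

```python
def weighted_mean(values, weights):
    """Sum of (weight * value) divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

grade_points = [4, 3, 2]  # hypothetical grades: A, B, C
credits = [3, 4, 2]       # hypothetical credit hours for each course
print(f"GPA = {weighted_mean(grade_points, credits):.2f}")
```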
  49. z score or standard score 142
    obtained by subtracting the mean from the value and dividing the result by the standard deviation: z = (value − mean) / standard deviation. It is used to compare the relative positions of data values, including values that come from different data sets.
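    For example, z scores let you compare results from two tests with different means and standard deviations. The test numbers below are made up for illustration.

```python
def z_score(value, mean, std_dev):
    """Number of standard deviations a value lies above (+) or below (-) the mean."""
    return (value - mean) / std_dev

# Hypothetical results: which score is relatively better?
z_test_a = z_score(65, mean=50, std_dev=10)
z_test_b = z_score(30, mean=25, std_dev=5)
print("test A z score:", z_test_a)
print("test B z score:", z_test_b)
```

    The higher z score marks the better relative position, even though the raw scores are on different scales.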
  50. categorical frequency distribution 38
    used when the data are categorical (nominal)
  51. class 37
    a quantitative or qualitative category into which each raw data value is placed
  52. class boundaries 39
    the upper and lower values of a class for a grouped frequency distribution whose values have one additional decimal place more than the data and end in the digit 5
  53. class midpoint 40
    found by adding the upper limit and lower limit and dividing by 2.
  54. class width 39
    found by subtracting the lower (or upper) class limit of one class from the lower (or upper) class limit of the next class
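    The class boundaries, midpoint, and width can be illustrated with one class. This sketch assumes whole-number data (so the boundaries sit half a unit beyond the limits) and takes the class width as the difference between the lower limits of two consecutive classes. The limits 24-30 and 31-37 and the function names are made up for the example.

```python
def class_boundaries(lower_limit, upper_limit, unit=1):
    """Boundaries lie half a unit below the lower limit and above the upper limit."""
    half = unit / 2
    return lower_limit - half, upper_limit + half

def class_midpoint(lower_limit, upper_limit):
    """(lower limit + upper limit) / 2"""
    return (lower_limit + upper_limit) / 2

def class_width(lower_limit, next_lower_limit):
    """Difference between the lower limits of two consecutive classes."""
    return next_lower_limit - lower_limit

# Hypothetical consecutive classes 24-30 and 31-37.
print("boundaries:", class_boundaries(24, 30))
print("midpoint:", class_midpoint(24, 30))
print("width:", class_width(24, 31))
```

    Note that the width here is 7, not 30 - 24 = 6, because the class 24-30 contains seven whole-number values.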
  55. cumulative frequency 54
    the sum of the frequencies accumulated up to the upper boundary of a class in the distribution
  56. cumulative frequency distribution 42
    a distribution that shows the number of data values less than or equal to a specific value
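    Cumulative frequencies are running totals of the class frequencies. A minimal sketch, using made-up frequencies and a hypothetical helper name:

```python
from itertools import accumulate

def cumulative_frequencies(frequencies):
    """Running total of the class frequencies, in class order."""
    return list(accumulate(frequencies))

frequencies = [2, 3, 4, 3, 3]  # hypothetical class frequencies
print(cumulative_frequencies(frequencies))
```

    The last cumulative frequency always equals the total number of data values.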
  57. frequency 37
    the number of data values contained in a specific class
  58. frequency distribution 37
    the organization of raw data in table form, using classes and frequencies
  59. frequency polygon 53
    a graph that displays the data by using lines that connect points plotted for the frequencies at the midpoints of classes. The frequencies are represented by the heights of the points
  60. grouped frequency distribution 39
    when the range of the data is large, the data must be grouped into classes that are more than one unit in width.
  61. histogram 51
    a graph that displays the data by using contiguous vertical bars (unless the frequency of the class is 0) of various heights to represent the frequencies of the classes
  62. lower class limit 39
    smallest data value that can be included in the class
  63. ogive 54
    a graph that represents the cumulative frequencies for the classes in a frequency distribution
  64. open ended distribution 41
    no specific end or no specific beginning
  65. Pareto Chart 70
    used to represent a frequency distribution for a categorical variable, and the frequencies are displayed by the heights of vertical bars, which are arranged in order from highest to lowest
  66. pie graph 73
    a circle divided into sections or wedges according to the percentage of frequencies in each category of the distribution
  67. raw data 37
    data in their original form
  68. relative frequency graph 56
    a graph that uses proportions instead of raw frequencies
  69. stem and leaf plot 80
    a data plot that uses part of the data value as the stem and part of the data value as the leaf to form groups or classes
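    A minimal sketch of a stem and leaf plot, assuming non-negative whole numbers where the last digit is the leaf and the remaining digits are the stem. The function name and data are illustrative, and stems with no data are simply left out here.

```python
from collections import defaultdict

def stem_and_leaf(data):
    """Group values by stem (all digits but the last) with sorted leaves (last digit)."""
    plot = defaultdict(list)
    for value in sorted(data):
        stem, leaf = divmod(value, 10)
        plot[stem].append(leaf)
    return dict(plot)

data = [25, 31, 20, 32, 13, 14, 44, 23, 45, 32]  # hypothetical data
for stem, leaves in stem_and_leaf(data).items():
    print(f"{stem} | {' '.join(map(str, leaves))}")
```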
  70. Time Series Graph 72
    represents data that occur over a specific period of time
  71. ungrouped frequency distribution 43
    when the range of the data is relatively small, a frequency distribution can be constructed using single data values for each class
  72. upper class limit 39
    largest data value included in the class
Saturday class going over statistics