E1 statistics flashcards.txt

  1. Statistics
    is a collection of methods for planning experiments, obtaining data, and then organizing, summarizing, presenting, analyzing, interpreting, and drawing conclusions based on the data. It is the science of data.
  2. Statistical thinking
    involves applying rational thought and the science of statistics to critically assess data and inferences. In this course we will divide our study of statistics into two categories:
  3. Descriptive statistics
    is where we will organize and summarize the data
  4. Inferential statistics
    is where we use data to make predictions and decisions about a population based on information from a sample.
  5. Descriptive statistics
    utilizes numerical and graphical methods to look for patterns in a data set, to summarize the information revealed in a data set, and to present that information in a convenient form.

    • Inferential statistics
    • utilizes sample data to make estimates, decisions, predictions, or other generalizations about a larger set of data.
  6. population
    is the set of all measurements of interest to the investigator. Typically, there are too many experimental units in a population to consider every one. However, if we can examine every single one, we conduct what is called a census.
  7. sample
    is a subset of measurements selected from the population of interest.
  8. parameter
    is a numerical measurement describing some characteristic of a population and computed from all of the population measurements. For example, a population average (mean), the average obtained from every item in the population, is a parameter.
  9. statistic
    is a numerical measurement describing some characteristic of a sample drawn from the population.
  10. variable
    a characteristic that changes or varies over time or varies across different individual subjects.

    • experimental unit
    • individual or object on which a variable is measured, or about which we collect data.
    •  Person
    •  Place
    •  Thing
    •  Event
  11. measure of reliability
    is a statement about the degree of uncertainty of a statistical inference.
  12. Continuous numerical data
    result from infinitely many possible values that correspond to some continuous scale that covers a range of values without gaps, interruptions, or jumps. Example: The finishing times of a marathon
  13. Discrete numerical data
    result when the number of possible values is either a finite number or a countable number. (That is, the number of possible values is 0 or 1 or 2 and so on.) Example: The numbers of fatal automobile accidents last month in the 10 largest US cities
  14. Quantitative variables
    are numerical observations
  15. Continuous Variables
    can assume all of the infinitely many values corresponding to a line interval. Example: Y = The amount of milk that a cow produces; e.g. 2.343115 gallons per day
  16. Discrete Variables
    can assume only a countable number of values. (i.e. the number of possible values is 0, 1, 2, 3, . . .)
  17. Qualitative variables
    are non-numerical or categorical observations.
  18. representative sample
    exhibits characteristics typical of the target population. In order to ensure that we get a good sample that is representative, we often employ a random sampling approach.
  19. random sample
    is selected in such a way that every different sample of size n has an equal chance of selection.
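
The relationship between population, sample, parameter, and statistic can be sketched in Python. This is only a minimal illustration: the population values and the sample size n below are made up, and random.sample is used because it draws without replacement, so every subset of size n is equally likely.

```python
import random
from statistics import mean

# Hypothetical population: every measurement of interest (made-up values).
population = [12, 15, 9, 20, 17, 11, 14, 18, 10, 16]

# Parameter: computed from ALL of the population measurements.
population_mean = mean(population)

# Random sample of size n: random.sample draws without replacement,
# giving every possible subset of size n the same chance of selection.
n = 4
sample = random.sample(population, n)

# Statistic: computed from the sample only; it changes from sample to sample.
sample_mean = mean(sample)

print("parameter (population mean):", population_mean)
print("statistic (sample mean):", sample_mean)
```

Running the sketch several times gives the same parameter each time but a different statistic, which is why an inference from a sample carries uncertainty.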
  20. class
    is one of the categories into which data can be classified.
  21. Class frequency
    is the number of observations belonging to the class.
  22. Pareto Diagram
    is a bar graph that arranges the categories by height from tallest (left) to smallest (right).
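
Class frequencies and the Pareto ordering can be sketched as follows. The observations are hypothetical, and the printed text bars run top to bottom rather than left to right, so this is a rough stand-in for the actual diagram.

```python
from collections import Counter

# Hypothetical qualitative data: one class label per observation.
observations = ["B", "A", "C", "A", "B", "A", "D", "A", "B", "C"]

# Class frequency: number of observations belonging to each class.
class_frequency = Counter(observations)

# Pareto ordering: classes sorted from highest frequency to lowest.
pareto_order = sorted(class_frequency.items(), key=lambda kv: kv[1], reverse=True)

# Text version of the diagram: tallest bar first.
for label, count in pareto_order:
    print(f"{label} | {'#' * count} ({count})")
```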
  23. Stem-and-Leaf Display
    shows the number of observations that share a common value (the stem) and the precise value of each observation (the leaf)
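
A stem-and-leaf display can be built with a few lines of Python. This sketch assumes hypothetical two-digit measurements, with the tens digit as the stem and the units digit as the leaf; other data would need a different split.

```python
from collections import defaultdict

# Hypothetical two-digit measurements.
data = [23, 25, 31, 38, 38, 42, 47, 21, 35]

# Stem = tens digit, leaf = units digit.
stems = defaultdict(list)
for value in sorted(data):
    stem, leaf = divmod(value, 10)
    stems[stem].append(leaf)

# One row per stem, with its leaves in increasing order.
for stem in sorted(stems):
    leaves = " ".join(str(leaf) for leaf in stems[stem])
    print(f"{stem} | {leaves}")
```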
  24. Histogram
    is like a bar chart for numerical data, but it never has gaps between the bars (unless the frequency for a class is zero).
  25. frequency distribution (or frequency table)
    lists data values (usually in groups), along with their corresponding frequencies (counts).
  26. Lower class limits
    are the smallest numbers that can belong to the different classes.
  27. Upper class limits
    are the largest numbers that can belong to the different classes.
  28. Class boundaries
    are the numbers used to separate classes, but without the gaps created by class limits.
  29. Class midpoints
    are the midpoints of the classes. Each class midpoint can be found by adding the lower class limit to the upper class limit and dividing the sum by 2.
  30. class width
    is the difference between two consecutive lower class limits or two consecutive lower class boundaries.
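
The class definitions above can be checked with a short calculation. The class limits below are hypothetical (integer classes 60-69, 70-79, and so on), and the sketch assumes equal-width classes with the same gap between every pair of neighboring classes.

```python
# Hypothetical classes given by (lower class limit, upper class limit).
class_limits = [(60, 69), (70, 79), (80, 89), (90, 99)]

# Class width: difference between two consecutive lower class limits.
class_width = class_limits[1][0] - class_limits[0][0]

# Class midpoint: (lower class limit + upper class limit) / 2.
midpoints = [(lower + upper) / 2 for lower, upper in class_limits]

# Class boundaries: split the gap between one class's upper limit and the
# next class's lower limit; the outermost boundaries use the same half-gap.
half_gap = (class_limits[1][0] - class_limits[0][1]) / 2
boundaries = [class_limits[0][0] - half_gap] + [upper + half_gap for _, upper in class_limits]

print("width:", class_width)
print("midpoints:", midpoints)
print("boundaries:", boundaries)
```

With these limits, the difference between consecutive boundaries equals the class width, which matches the second form of the class width definition.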
  31. relative frequency histogram
    is a bar graph in which the heights of the bars represent the proportion of occurrence for particular classes.
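
The bar heights of a relative frequency histogram can be computed as in the sketch below. The scores and class limits are hypothetical; each relative frequency is the class frequency divided by the total number of observations.

```python
# Hypothetical scores and classes given by (lower limit, upper limit).
scores = [62, 75, 71, 88, 93, 67, 79, 84, 77, 90]
class_limits = [(60, 69), (70, 79), (80, 89), (90, 99)]

# Class frequency: count of observations falling in each class.
frequencies = [sum(lower <= s <= upper for s in scores) for lower, upper in class_limits]

# Relative frequency: class frequency divided by the total number of observations.
total = len(scores)
relative_frequencies = [f / total for f in frequencies]

for (lower, upper), f, rf in zip(class_limits, frequencies, relative_frequencies):
    print(f"{lower}-{upper}: frequency={f}, relative frequency={rf:.2f}")
```

The relative frequencies always sum to 1, which is a quick check that no observation was missed or counted twice.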
  32. Central tendency
    the tendency of the data to cluster, or center, about certain numerical values.
  33. Variability
    is the spread of the data. It shows how strongly the data cluster around the central value(s).
  34. Mean
    found by summing up all the measurements and then dividing by the number of measurements.
  35. Median
    middle number when the measurements are arranged in numerical order (for an even number of measurements, the mean of the two middle numbers). It is also called the 50th percentile since about 50% of the data is below the median and about 50% is above.
  36. Mode
    data value that occurs most frequently.
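
The three measures of central tendency can be computed with Python's standard statistics module. The data values are hypothetical and chosen only for illustration.

```python
from statistics import mean, median, mode

# Hypothetical measurements.
data = [3, 7, 7, 2, 9, 7, 4]

data_mean = mean(data)      # sum of the measurements / number of measurements
data_median = median(data)  # middle value after sorting: 2 3 4 [7] 7 7 9
data_mode = mode(data)      # value that occurs most frequently

print("mean:", data_mean)
print("median:", data_median)
print("mode:", data_mode)
```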
  37. Skewed
    • when one side of the distribution has more extreme values than the other. If the population mean is greater than or less than the population median, the distribution is skewed.
  38. Range
    • largest measurement minus the smallest measurement.
    • Range = Max - Min

    • Variance
    • for a sample of n measurements is equal to the sum of the squared distances from the mean divided by n - 1.
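
The range and sample variance definitions can be worked through by hand in Python. The sample values are hypothetical; the final line compares the hand calculation with statistics.variance, which also divides by n - 1.

```python
from statistics import mean, variance

# Hypothetical sample of n measurements.
sample = [4, 8, 6, 5, 7]

# Range: largest measurement minus the smallest measurement.
data_range = max(sample) - min(sample)

# Sample variance: sum of squared distances from the mean, divided by n - 1.
n = len(sample)
x_bar = mean(sample)
sample_variance = sum((x - x_bar) ** 2 for x in sample) / (n - 1)

print("range:", data_range)
print("variance:", sample_variance)
print("matches statistics.variance:", variance(sample) == sample_variance)
```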
Author: Anonymous
ID: 104596
Card Set: E1 statistics flashcards.txt
Description: E1 statistics mcguckian