-
Statistics
- is a collection of methods for planning experiments, obtaining data, and then organizing, summarizing, presenting, analyzing, interpreting and drawing conclusions based on the data. It is the science of data.
-
Statistical thinking
involves applying rational thought and the science of statistics to critically assess data and inferences. In this course we will divide our study of statistics into two categories:
-
Descriptive statistics
is where we organize and summarize the data.
-
Inferential statistics
is where we use data to make predictions and decisions about a population based on information from a sample.
-
Descriptive statistics
- utilizes numerical and graphical methods to look for patterns in a data set, to summarize the information revealed in a data set and to present that information in a convenient form.
- Inferential statistics
- utilizes sample data to make estimates, decisions, predictions or other generalizations about a larger set of data.
-
population
is the set of all measurements of interest to the investigator. Typically, there are too many experimental units in a population to consider every one. However, if we can examine every single one, we conduct what is called a census.
-
sample
is a subset of measurements selected from the population of interest.
-
parameter
is a numerical measurement describing some characteristic of a population and computed from all of the population measurements. For example, a population average (mean), the average obtained from every item in the population, is a parameter.
-
statistic
is a numerical measurement describing some characteristic of a sample drawn from the population.
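
To make the parameter/statistic distinction concrete, here is a minimal sketch. The population values and the sample size of 4 are made up for illustration; the only point is that the parameter uses every measurement, while the statistic uses only the sampled ones.

```python
import random
from statistics import mean

# Hypothetical population of 10 measurements (illustrative values only).
population = [2, 4, 4, 5, 7, 8, 9, 10, 12, 19]

# Parameter: computed from ALL population measurements.
parameter = mean(population)

# Statistic: computed from a sample drawn from the population.
rng = random.Random(0)  # fixed seed so the draw is reproducible
sample = rng.sample(population, 4)
statistic = mean(sample)

print("population mean (parameter):", parameter)
print("sample mean (statistic):", statistic)
```

The sample mean will usually differ from the population mean, and it changes from sample to sample; the parameter does not.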
-
variable
a characteristic that changes or varies over time or varies across different individual subjects.
- experimental unit
- individual or object on which a variable is measured, or about which we collect data.
- Person
- Place
- Thing
- Event
-
measure of reliability
is a statement about the degree of uncertainty of a statistical inference.
-
Continuous numerical data
result from infinitely many possible values that correspond to some continuous scale that covers a range of values without gaps, interruptions, or jumps. Example: The finishing times of a marathon
-
Discrete numerical data
result when the number of possible values is either a finite number or a countable number. (That is, the number of possible values is 0 or 1 or 2 and so on.) Example: The numbers of fatal automobile accidents last month in the 10 largest US cities
-
Quantitative variables
are numerical observations
-
Continuous Variables
can assume all of the infinitely many values corresponding to a line interval. Example: Y = The amount of milk that a cow produces; e.g. 2.343115 gallons per day
-
Discrete Variables
can assume only a countable number of values. (i.e. the number of possible values is 0, 1, 2, 3, . . .)
-
Qualitative variables
are non-numerical or categorical observations.
-
representative sample
exhibits characteristics typical of the target population. In order to ensure that we get a good sample that is representative, we often employ a random sampling approach.
-
random sample
is selected in such a way that every different sample of size n has an equal chance of selection.
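
One way to draw such a sample is sketched below with Python's standard `random` module. The population of ID numbers 1 to 100 and the sample size n = 5 are assumptions chosen for illustration.

```python
import random

# Hypothetical population: experimental units labeled 1..100.
population = list(range(1, 101))

n = 5
rng = random.Random(42)  # fixed seed only so the example is reproducible

# sample() draws n distinct units without replacement, so every
# subset of size n has the same chance of being selected.
sample = rng.sample(population, n)
print(sample)
```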
-
class
is one of the categories into which data can be classified.
-
Class frequency
is the number of observations belonging to the class.
-
Pareto Diagram
is a bar graph that arranges the categories by height from tallest (left) to smallest (right).
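
The ordering step behind a Pareto diagram can be sketched as follows. The complaint categories are hypothetical data, and the text bars stand in for a real plot.

```python
from collections import Counter

# Hypothetical qualitative data: one category per observation.
complaints = ["late", "damaged", "late", "wrong item", "late", "damaged"]

# Class frequency = number of observations in each category.
frequencies = Counter(complaints)

# most_common() sorts categories from highest to lowest frequency,
# which is the left-to-right order of the bars in a Pareto diagram.
pareto_order = frequencies.most_common()

for category, count in pareto_order:
    print(f"{category:<12}{'#' * count} ({count})")
```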
-
Stem-and-Leaf Display
shows the number of observations that share a common value (the stem) and the precise value of each observation (the leaf).
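
A minimal sketch of building such a display, assuming two-digit data where the tens digit is the stem and the units digit is the leaf (the values are made up):

```python
from collections import defaultdict

# Hypothetical two-digit measurements.
data = [12, 15, 21, 23, 23, 30, 34]

display = defaultdict(list)
for value in sorted(data):
    stem, leaf = divmod(value, 10)  # tens digit, units digit
    display[stem].append(leaf)

# Print one row per stem, with its leaves in increasing order.
for stem in sorted(display):
    leaves = " ".join(str(leaf) for leaf in display[stem])
    print(f"{stem} | {leaves}")
```

The number of leaves in a row shows how many observations share that stem, and each leaf recovers one exact observation.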
-
Histogram
is like a bar chart for numerical data, but it never has gaps between the bars (unless the frequency for a class is zero).
-
frequency distribution (or frequency table)
lists data values (usually in groups), along with their corresponding frequencies (counts).
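
As an illustration, the sketch below tallies a small grouped frequency table. The scores and the class limits (50-59, 60-69, and so on) are assumptions picked for the example.

```python
# Hypothetical test scores.
data = [52, 55, 61, 64, 67, 70, 73, 75, 78, 81, 88, 93]

# Each class is given as (lower class limit, upper class limit).
classes = [(50, 59), (60, 69), (70, 79), (80, 89), (90, 99)]

# Class frequency: how many observations fall inside each class.
frequency = {
    (lo, hi): sum(1 for x in data if lo <= x <= hi)
    for lo, hi in classes
}

for (lo, hi), count in frequency.items():
    print(f"{lo}-{hi}: {count}")
```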
-
Lower class limits
are the smallest numbers that can belong to the different classes.
-
Upper class limits
are the largest numbers that can belong to the different classes.
-
Class boundaries
are the numbers used to separate classes, but without the gaps created by class limits.
-
Class midpoints
are the midpoints of the classes. Each class midpoint can be found by adding the lower class limit to the upper class limit and dividing the sum by 2.
-
class width
is the difference between two consecutive lower class limits or two consecutive lower class boundaries.
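
The class definitions above can be worked through on a small example. The three classes (50-59, 60-69, 70-79) are hypothetical; the boundaries are placed halfway across the gap between one class's upper limit and the next class's lower limit.

```python
# Hypothetical classes, each as (lower class limit, upper class limit).
classes = [(50, 59), (60, 69), (70, 79)]

# Class midpoint: (lower limit + upper limit) / 2.
midpoints = [(lo + hi) / 2 for lo, hi in classes]

# Class width: difference between two consecutive lower class limits.
width = classes[1][0] - classes[0][0]

# Gap between one class's upper limit and the next class's lower limit.
gap = classes[1][0] - classes[0][1]

# Class boundaries split each gap in half, so adjacent classes touch.
boundaries = [classes[0][0] - gap / 2] + [hi + gap / 2 for _, hi in classes]

print("midpoints: ", midpoints)
print("width:     ", width)
print("boundaries:", boundaries)
```

Consecutive boundaries are also one class width apart, which matches the second form of the class width definition.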
-
relative frequency histogram
is a bar graph in which the heights of the bars represent the proportion of occurrence for particular classes.
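
A short sketch of the computation behind the bar heights, assuming a made-up frequency table: each relative frequency is the class frequency divided by the total number of observations.

```python
# Hypothetical class frequencies (class label -> count).
frequencies = {"50-59": 2, "60-69": 3, "70-79": 4, "80-89": 2, "90-99": 1}

total = sum(frequencies.values())

# Relative frequency = class frequency / total number of observations.
relative = {label: count / total for label, count in frequencies.items()}

# Crude text version of the histogram: bar length follows the frequency.
for label, proportion in relative.items():
    print(f"{label}: {proportion:.3f} {'#' * frequencies[label]}")
```

The relative frequencies always add up to 1, whatever the class frequencies are.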
-
Central tendency
the tendency of the data to cluster, or center, about certain numerical values.
-
Variability
is the spread of the data. It shows how strongly the data cluster around the central value(s).
-
Mean
found by summing up all the measurements and then dividing by the number of measurements.
-
Median
middle number when the measurements are arranged in numerical order (with an even number of measurements, the mean of the two middle numbers). It is also called the 50th percentile since 50% of the data lie below the median and 50% lie above.
-
Mode
data value that occurs most frequently.
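
The three measures of central tendency can be computed with Python's standard `statistics` module. The data set below is hypothetical; it has an even number of values, so the median is the average of the two middle numbers.

```python
from statistics import mean, median, mode

# Hypothetical measurements, already in numerical order.
data = [2, 3, 3, 5, 7, 10]

data_mean = mean(data)      # sum of measurements / number of measurements
data_median = median(data)  # middle value; average of 3 and 5 here
data_mode = mode(data)      # most frequent value

print("mean:  ", data_mean)
print("median:", data_median)
print("mode:  ", data_mode)
```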
-
Skewed
- when one side of the distribution has more extreme values than the other. If the population mean is greater than or less than the population median, the distribution is skewed.
-
Range
- largest measurement minus the smallest measurement.
- Range = Max - Min
- Variance
- for a sample of n measurements is equal to the sum of the squared distances from the mean divided by n - 1.
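
A worked sketch of the range and the sample variance on made-up data. The manual calculation follows the definition directly (sum of squared distances from the mean, divided by n - 1); `statistics.variance` from the standard library is used only as a cross-check.

```python
from statistics import variance

# Hypothetical sample of n = 6 measurements.
data = [2, 4, 4, 5, 7, 8]
n = len(data)

# Range: largest measurement minus smallest measurement.
data_range = max(data) - min(data)

# Sample variance from the definition.
sample_mean = sum(data) / n
squared_distances = [(x - sample_mean) ** 2 for x in data]
sample_variance = sum(squared_distances) / (n - 1)

print("range:   ", data_range)
print("variance:", sample_variance)
print("check:   ", variance(data))  # library value, also divides by n - 1
```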