-
Element
Entity on which data are collected
Ex: A player
-
Observation
set of measurements obtained for a particular element
-
Variable
characteristic of an element
-
Variable
Categorical (qualitative)
non-numerical data that is classified into categories
Ex: Position or team
-
Variable:
Categorical:
Nominal
categorical data which have no meaningful order
Ex: position, team
-
Variable:
Categorical:
Ordinal
categorical data which can be ordered.
Ex: shirt size – small, medium, large
-
Variable:
Quantitative
numerical data that is measured on a numerical scale
Ex: Points scored in a game
-
Variable:
Quantitative:
Interval
numerical data with meaningful differences between values but no true 0 point
Ex: Temperature in °F or °C
-
Variable:
Quantitative:
Ratio
numerical data with a true 0 point
Ex: points scored
-
Cross Sectional Data
data that is collected at the same, or approximately the same, point in time
Ex: points scored in a specific week
-
Time Series
data collected over different time periods
Ex: points scored over multiple seasons
-
Descriptive Statistics
uses tables, graphs, and numerical methods to summarize data
-
Inferential Statistics
uses data from a sample to make estimates or test hypotheses about the characteristics of a population
-
Population
the set of ALL elements of interest in a study
-
Sample
a SUBSET of a population. A sample is used to estimate characteristics of the population
-
Frequency Distribution
table that summarizes the number of items that occur in non-overlapping categories
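A minimal sketch of building a frequency distribution in Python. The list of positions is made-up data used only for illustration.

```python
from collections import Counter

# Hypothetical sample: positions of eight players (made-up data).
positions = ["guard", "forward", "center", "guard",
             "forward", "guard", "center", "guard"]

# Counter tallies how many items fall into each non-overlapping category.
frequency = Counter(positions)

# Print the table, most frequent category first.
for category, count in frequency.most_common():
    print(f"{category:<8} {count}")
```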
-
Histogram
graphical way to display quantitative data. Uses intervals to display frequency table data
-
Correlation
shows an association between 2 variables
-
Measures of Central Tendency
Mean
the average of a sample of (n) observations.
The mean is sensitive to extreme values
-
Measures of Central Tendency
Median
the middle value of the ordered data, with exactly ½ of the observations on either side of that point
The median is resistant to extreme values
-
Measures of Central Tendency
Mode
the observation that occurs most frequently.
Can have 2 modes (bimodal)
or more than 2 modes (multimodal)
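A small sketch comparing the three measures of central tendency with Python's standard statistics module. The points data is hypothetical; the last value is deliberately extreme to show how the mean reacts while the median does not.

```python
import statistics

# Hypothetical points scored in seven games; the last game is an extreme value.
points = [10, 12, 12, 14, 15, 16, 61]

mean = statistics.mean(points)        # pulled upward by the extreme value 61
median = statistics.median(points)    # middle value of the sorted data
modes = statistics.multimode(points)  # list of all most frequent values

print(mean, median, modes)
```

With two or more values tied for most frequent, multimode returns all of them, which corresponds to the bimodal and multimodal cases.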
-
Statistic
the numeric measure of SAMPLE data
-
Parameter
the numeric measure of POPULATION data
-
Types of Distribution
Symmetric
mean = median
-
Types of Distribution
Skewed Right (positive)
median is best measure
Mean is greater than the median
-
Types of Distribution
Skewed Left (negative)
median is best measure.
Mean is less than median
-
Types of Distribution
Percentile
a data value such that at least p% of the observations fall at or below it and at least (100 - p)% fall at or above it
-
To find percentile
o Arrange observations in increasing order
o Compute the index: i = (p/100)*n
o If the index (i) is an integer, take the average of the values in positions i and i + 1
o If the index (i) is not an integer, round up: the next integer greater than i gives the position of the percentile
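The steps above can be sketched in Python as follows. The function name percentile and the sample data are assumptions made for illustration, and the sketch only handles 0 < p < 100.

```python
import math

def percentile(values, p):
    """Return the p-th percentile using the index rule i = (p/100) * n."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    ordered = sorted(values)          # step 1: increasing order
    n = len(ordered)
    i = p * n / 100                   # step 2: index (same as (p/100)*n)
    if i.is_integer():
        i = int(i)
        # Positions in the rule are 1-based, so position i is ordered[i - 1].
        return (ordered[i - 1] + ordered[i]) / 2
    # Not an integer: round up to the next position.
    return ordered[math.ceil(i) - 1]

# Hypothetical data with n = 8 observations.
data = [15, 20, 25, 30, 35, 40, 45, 50]
print(percentile(data, 50))  # i = 4, so average positions 4 and 5
print(percentile(data, 30))  # i = 2.4, so use position 3
```

Library routines often interpolate differently, so their results may not match this rule exactly.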
-
Interquartile Range (IQR)
the difference between the 75th and 25th percentiles (Q3 - Q1). Spans the middle 50% of the data set
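A brief sketch of the range between the 25th and 75th percentiles using statistics.quantiles. Note the assumption: this function interpolates between data points by default, so its quartiles can differ slightly from ones found with a round-up index rule. The data is hypothetical.

```python
import statistics

# Hypothetical data set.
data = [15, 20, 25, 30, 35, 40, 45, 50]

# n=4 splits the data into quartiles and returns the three cut points.
q1, q2, q3 = statistics.quantiles(data, n=4)

# Width of the span that holds the middle 50% of the data.
iqr = q3 - q1
print(q1, q3, iqr)
```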
-
Measures of Variability and Dispersion
Range
the difference between the largest and smallest values in a data set
-
Measures of Variability and Dispersion
Variance
based on the squared difference between each value and the mean
- Population variance (σ²)
- has (N) in the denominator
- Sample variance (s²)
- has (n-1) in the denominator
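A short sketch of the two variance formulas, computed by hand and then compared with the standard library. The data values are made up for illustration.

```python
import statistics

# Hypothetical data set.
data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)
mean = sum(data) / n

# Squared difference between each value and the mean.
squared_deviations = [(x - mean) ** 2 for x in data]

# Population variance divides by the number of values.
population_variance = sum(squared_deviations) / n
# Sample variance divides by (n - 1).
sample_variance = sum(squared_deviations) / (n - 1)

print(population_variance, statistics.pvariance(data))
print(sample_variance, statistics.variance(data))
```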
-
Measures of Variability and Dispersion
Standard Deviation
the square root of variance.
Easier to interpret than variance because it is in the same units as the original data
-
Measures of Variability and Dispersion
Coefficient of variation
measures how large the standard deviation is relative to the mean.
It is expressed in a percentage.
- (CV = (standard deviation / mean) * 100).
- Lower is better.
Used to compare the variability of data sets that have different standard deviations and means.
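A minimal sketch of the coefficient of variation for two hypothetical data sets on different scales. It assumes the sample standard deviation is the one intended; the function name is made up for this example.

```python
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean."""
    return statistics.stdev(values) / statistics.mean(values) * 100

# Hypothetical data sets with different means and standard deviations.
data_a = [10, 12, 14]      # mean 12, standard deviation 2
data_b = [100, 110, 120]   # mean 110, standard deviation 10

cv_a = coefficient_of_variation(data_a)
cv_b = coefficient_of_variation(data_b)
print(round(cv_a, 2), round(cv_b, 2))
```

Here data_b has the larger standard deviation but the smaller CV, so it is less variable relative to its mean.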
-
Measures of Distribution Shape and Relative Location
Z Scores
gives the number of standard deviations an observation is from the mean.
A z score of 0 indicates that the value is equal to the mean.
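A small sketch of computing z scores, z = (x - mean) / standard deviation. It assumes the sample standard deviation is used; the data is hypothetical.

```python
import statistics

def z_scores(values):
    """Number of standard deviations each observation is from the mean."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation (assumption)
    return [(x - mean) / s for x in values]

# Hypothetical data set with mean 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]
z = z_scores(data)

# Observations equal to the mean get a z score of 0.
for x, score in zip(data, z):
    print(x, round(score, 2))
```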
-
Measures of Distribution Shape and Relative Location
Outliers
observations with z scores greater than 2 in absolute value in highly skewed distributions, or greater than 3 in absolute value in normal distributions
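A rough sketch of flagging outliers by z score. The cutoff is left as a parameter because the appropriate value depends on the shape of the distribution; the function name and data are assumptions for illustration.

```python
import statistics

def find_outliers(values, threshold=3.0):
    """Return the values whose z score exceeds the threshold in absolute value."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation (assumption)
    return [x for x in values if abs((x - mean) / s) > threshold]

# Hypothetical scores: nineteen ordinary games and one extreme game.
scores = [10] * 19 + [100]
outliers = find_outliers(scores)
print(outliers)
```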
-
Measures of Distribution Shape and Relative Location
Chebyshev’s Theorem
Within +/- 2 standard deviations of the mean, at least 75% of the observations will fall within this range
Within +/- 3 standard deviations of the mean, at least 89% of the observations will fall within this range
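The 75% and 89% figures come from the general Chebyshev bound 1 - 1/k², which holds for any distribution when k > 1. A minimal sketch, with a function name chosen only for this example:

```python
def chebyshev_bound(k):
    """Minimum share of observations within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k ** 2

for k in (2, 3, 4):
    print(k, round(chebyshev_bound(k) * 100, 1))
```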
-
Measures of Distribution Shape and Relative Location
Empirical Rule (normal distribution)
Within +/- 1 standard deviation of the mean, approximately 68% of the observations will fall within this range
Within +/- 2 standard deviations of the mean, approximately 95% of the observations will fall within this range
Within +/- 3 standard deviations of the mean, almost all (about 99.7%) of the observations will fall within this range
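The empirical rule percentages can be checked against the standard normal distribution. A short sketch using statistics.NormalDist (Python 3.8 or later); the helper name share_within is made up for this example.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(share_within(k) * 100, 2))
```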
-
Measures of Distribution Shape and Relative Location
Correlation Coefficient
a measure of the linear relationship between 2 random variables
-
Measures of Distribution Shape and Relative Location
Correlation Coefficient
Univariate
data collected on one random variable
-
Measures of Distribution Shape and Relative Location
Correlation Coefficient
Bivariate
data collected on two random variables
-
Measures of Distribution Shape and Relative Location
Correlation Coefficient
Pearson product moment sample correlation coefficient
measures the strength of the linear relationship (r_xy).
The sign matches the direction (slope) of the relationship.
Must fall between -1 and +1.
- This is a POINT measurement.
- 0.00 – 0.29
- Little if any correlation
- 0.30 – 0.49
- Weak/Low correlation
- 0.50 – 0.69
- Moderate correlation
- 0.70 – 0.89
- Strong/High correlation
- 0.90 – 1.00
- Very strong/very high correlation
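A minimal sketch of the sample correlation coefficient and the interpretation bands above. It assumes the bands apply to the absolute value of the coefficient; the function names and data are hypothetical.

```python
import math

def pearson_r(x, y):
    """Sample correlation: S_xy / sqrt(S_xx * S_yy)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)

def describe(r):
    """Map |r| to the bands listed above (assumes bands use absolute value)."""
    size = abs(r)
    if size < 0.30:
        return "little if any"
    if size < 0.50:
        return "weak"
    if size < 0.70:
        return "moderate"
    if size < 0.90:
        return "strong"
    return "very strong"

# Hypothetical paired observations.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(round(r, 4), describe(r))
```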
-
Probability
Experimental Outcome
A sample point
-
Probability
Event
one or more sample points/experimental outcomes
-
Probability
Properties
The sum of the probabilities of all experimental outcomes must equal 1
Each probability must fall between 0 and 1, inclusive
-
Probability
When to use combination or permutation formula?
Combination when order is not important (C)
Permutation when order is important (P)
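A quick sketch of the two counting rules using math.comb and math.perm (Python 3.8 or later). Choosing 2 items from 5 is an arbitrary example.

```python
import math

n, k = 5, 2  # hypothetical: choose 2 items from 5

# Combinations: order does not matter, n! / (k! * (n - k)!).
combinations = math.comb(n, k)

# Permutations: order matters, n! / (n - k)!.
permutations = math.perm(n, k)

print(combinations, permutations)
```

Each combination of 2 items can be arranged in 2! = 2 orders, which is why there are twice as many permutations here.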
-
Probabilities
Methods (3)
Classical - # of favorable outcomes / total # of equally likely outcomes
Relative Frequency – based on the proportion of times an outcome occurs when an experiment is repeated many times
Subjective – based on experience or intuition. Used when no relevant data is available
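A rough sketch contrasting the classical and relative frequency methods for one event, rolling a 4 on a fair six-sided die. The simulation, its seed, and the number of trials are assumptions chosen for illustration.

```python
import random
from fractions import Fraction

# Classical method: 1 favorable outcome out of 6 equally likely outcomes.
classical = Fraction(1, 6)

# Relative frequency method: repeat the experiment many times and use the
# observed proportion. The seed is fixed only to make the run repeatable.
rng = random.Random(0)
trials = 60_000
hits = sum(1 for _ in range(trials) if rng.randint(1, 6) == 4)
relative_frequency = hits / trials

print(float(classical), relative_frequency)
```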
-
Probability
Events
a collection of sample points/experimental outcomes (has one or more sample points)
-
Discrete Probability Variables
Random Variables
a variable that associates a numerical value with each outcome
-
Discrete Probability Variables
Random Variables
Discrete
a finite number of values or an infinite sequence of values such as 0, 1, 2, ...
Ex: number of defective radios
-
Discrete Probability Variables
Random Variables
Discrete Properties
0 ≤ f(x) ≤ 1
Σf(x) = 1
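A minimal sketch of checking the two properties for a discrete probability function stored as a dictionary. It assumes the bounds are inclusive and uses a tolerance for the sum because of floating-point rounding; the function name and values are hypothetical.

```python
import math

def is_valid_pmf(pmf):
    """Check that every f(x) lies in [0, 1] and that the f(x) sum to 1."""
    probabilities = list(pmf.values())
    in_range = all(0 <= p <= 1 for p in probabilities)
    sums_to_one = math.isclose(sum(probabilities), 1.0)
    return in_range and sums_to_one

# Hypothetical distribution of the number of defective radios.
defects = {0: 0.50, 1: 0.30, 2: 0.15, 3: 0.05}
print(is_valid_pmf(defects))
```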
-
Discrete Probability Variables
Random Variables
Discrete uniform probability has the form of?
f(x) = 1/n, where n is the number of values the random variable can take
-
Discrete Probability Variables
Random Variables
Discrete
Expected Value
the mean of a discrete random variable: E(x) = μ = Σ x·f(x)
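A short sketch of the expected value as a probability-weighted sum of the values. The distribution below is made-up data for illustration.

```python
import math

# Hypothetical distribution: x = number of defective radios, f(x) = probability.
pmf = {0: 0.50, 1: 0.30, 2: 0.15, 3: 0.05}

# Weight each value by its probability and add the results.
expected_value = sum(x * p for x, p in pmf.items())

print(round(expected_value, 2))
```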
-
Discrete Probability Variables
Random Variables
Continuous
can take any numerical value in one or more intervals on the real number line.
Between any 2 points a 3rd can always be found, as with a time measurement.