
Element
Entity on which data are collected
Ex: Name of player

Observation
set of measurements obtained for a particular element

Variable
characteristic of an element

Variable
Categorical (qualitative)
non-numerical data that is classified into categories
Ex: Position or team

Variable:
Categorical:
Nominal
categorical data which have no meaningful order
Ex: position, team

Variable:
Categorical:
Ordinal
categorical data which can be ordered.
Ex: shirt size – small, medium, large

Variable:
Quantitative
numerical data that is measured on a numerical scale
Ex: Points scored in a game

Variable:
Quantitative:
Interval
numerical data with meaningful differences between values but no true 0 point
Ex: Temperature in °C or °F

Variable:
Quantitative:
Ratio
numerical data with a true 0 point
Ex: points scored

Cross Sectional Data
data that is collected at the same, or approximately the same, point in time
Ex: points scored in a specific week

Time Series
data collected over different time periods
Ex: points scored over multiple seasons

Descriptive Statistics
uses tables, graphs, and numerical methods to summarize data

Inferential Statistics
uses data from a sample to make estimates or test hypotheses about the characteristics of a population

Population
the set of ALL elements of interest in a study

Sample
a SUBSET of a population. Sample data are used to estimate characteristics of the population

Frequency Distribution
table that summarizes the number of items that occur in nonoverlapping categories
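
As an illustration, a frequency distribution can be tallied with Python's collections.Counter. This is only a sketch; the player positions below are made-up sample data, not values from these notes.

```python
from collections import Counter

# Hypothetical sample: positions of nine players (categorical data).
positions = ["guard", "forward", "center", "guard", "guard",
             "forward", "center", "guard", "forward"]

# Count how many observations fall in each nonoverlapping category.
frequency = Counter(positions)

# Print the table, most frequent category first.
for category, count in frequency.most_common():
    print(f"{category:<8} {count}")
```

Each observation lands in exactly one category, so the counts add up to the number of observations.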

Histogram
graphical way to display quantitative data. Uses intervals to display frequency table data

Correlation
shows an association between 2 variables

Measures of Central Tendency
Mean
the average of a sample: the sum of the n observations divided by n.
The mean is sensitive to extreme values

Measures of Central Tendency
Median
the middle value of the ordered data; half of the observations fall on either side of that point
The median is resistant to extreme values

Measures of Central Tendency
Mode
the value that occurs most frequently.
Can have 2 modes (bimodal)
or more than 2 modes (multimodal)
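
A small sketch of the three measures using Python's statistics module. The numbers are hypothetical; the point is how one extreme value moves the mean far more than the median.

```python
import statistics

points = [10, 12, 12, 15, 18]      # hypothetical points scored
with_extreme = points + [60]       # same data plus one extreme value

mean_before = statistics.mean(points)
median_before = statistics.median(points)
mean_after = statistics.mean(with_extreme)
median_after = statistics.median(with_extreme)

# multimode returns every value tied for most frequent (Python 3.8+).
modes = statistics.multimode([1, 1, 2, 2, 3])

print(mean_before, median_before)  # 13.4 12
print(mean_after, median_after)    # the mean jumps, the median barely moves
print(modes)                       # [1, 2]  (bimodal)
```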

Statistic
the numeric measure of SAMPLE data

Parameter
the numeric measure of POPULATION data

Types of Distribution
Symmetric
mean = median

Types of Distribution
Skewed Right (positive)
median is best measure
Mean is greater than the median

Types of Distribution
Skewed Left (negative)
median is best measure.
Mean is less than median

Types of Distribution
Percentile
the pth percentile is a data value such that at least p% of the observations fall at or below it and at least (100 − p)% fall at or above it

To find percentile
o Arrange observations in increasing order
o Compute the index: i = (p/100)*n
o If the index (i) is an integer, then take the average of the values in positions i and i + 1
o If the index (i) is not an integer, round up: use the value in the position of the next integer greater than i
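
The steps above can be sketched as a short Python function. The function name and the sample data are illustrative assumptions; the function is limited to 0 < p < 100 so that the index always points inside the data.

```python
import math

def percentile(values, p):
    """p-th percentile via the index rule i = (p/100) * n, for 0 < p < 100."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    ordered = sorted(values)        # arrange in increasing order
    n = len(ordered)
    i = p * n / 100                 # same as (p/100)*n, with less rounding error
    if i == int(i):
        k = int(i)
        # Integer index: average the k-th and (k+1)-th values (1-based positions).
        return (ordered[k - 1] + ordered[k]) / 2
    # Non-integer index: round up to the next position (1-based).
    return ordered[math.ceil(i) - 1]

points = [22, 15, 30, 18, 25, 27, 12, 20]   # hypothetical data, n = 8
print(percentile(points, 50))   # i = 4   -> average of 20 and 22 -> 21.0
print(percentile(points, 90))   # i = 7.2 -> 8th value -> 30
```

Other textbooks and software packages use slightly different interpolation rules, so results from library functions may not match this rule exactly.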

Interquartile Range (IQR)
the difference between the 75th percentile (Q3) and the 25th percentile (Q1). Covers the middle 50% of the data set

Measures of Variability and Dispersion
Range
the difference between the largest and smallest values in a data set

Measures of Variability and Dispersion
Variance
based on the squared difference between each value and the mean
Population variance (σ²)
has (N) in the denominator
Sample variance (s²)
has (n − 1) in the denominator

Measures of Variability and Dispersion
Standard Deviation
the square root of variance.
Easier to interpret than variance because it is in the same units as the original data
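
A minimal sketch comparing the population and sample versions with Python's statistics module. The data set is hypothetical.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]         # hypothetical observations, mean = 5

pop_var = statistics.pvariance(data)    # sum of squared deviations / n
samp_var = statistics.variance(data)    # sum of squared deviations / (n - 1)
pop_sd = statistics.pstdev(data)        # square root of the population variance
samp_sd = statistics.stdev(data)        # square root of the sample variance

print(pop_var, pop_sd)      # 4 and 2.0
print(samp_var, samp_sd)    # slightly larger, because the denominator is n - 1
```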

Measures of Variability and Dispersion
Coefficient of variation
measures how large the standard deviation is relative to the mean.
It is expressed in a percentage.
(CV = standard deviation / mean × 100).
A lower CV means less variability relative to the mean.
Used to compare data which has different Standard deviations and means.
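
A sketch of the CV formula in Python, using the sample standard deviation. The helper name and both data sets are made up for illustration; they are in different units, which is the situation where a CV comparison is useful.

```python
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation expressed as a percentage of the mean."""
    return statistics.stdev(values) / statistics.mean(values) * 100

heights_cm = [170, 175, 180, 185, 190]   # hypothetical heights
weights_kg = [60, 70, 80, 90, 100]       # hypothetical weights

cv_heights = coefficient_of_variation(heights_cm)
cv_weights = coefficient_of_variation(weights_kg)

print(round(cv_heights, 2))   # about 4.39 (%)
print(round(cv_weights, 2))   # about 19.76 (%)
```

Here the weights vary more relative to their mean than the heights do, even though the two sets cannot be compared directly by standard deviation.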

Measures of Distribution Shape and Relative Location
Z Scores
gives the number of standard deviations an observation is from the mean.
A z score of 0 indicates that the value is equal to the mean.

Measures of Distribution Shape and Relative Location
Outliers
observations with z scores beyond ±2 in highly skewed distributions or beyond ±3 in normal distributions
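
A sketch of computing z scores and flagging outliers in Python. The function names, the default threshold of 3, and the scores are assumptions for illustration; the sample standard deviation is used.

```python
import statistics

def z_scores(values):
    """z = (x - mean) / standard deviation, for each observation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)   # sample standard deviation
    return [(x - mean) / sd for x in values]

def find_outliers(values, threshold=3.0):
    """Observations whose z score is more than `threshold` from 0."""
    return [x for x, z in zip(values, z_scores(values)) if abs(z) > threshold]

# Hypothetical scores with one extreme value.
scores = [12, 14, 15, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 90]
print(find_outliers(scores))   # [90]
```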

Measures of Distribution Shape and Relative Location
Chebyshev’s Theorem
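At least (1 − 1/z²) of the observations fall within ±z standard deviations of the mean (z > 1), for any distribution
Within ±2 standard deviations, at least 75% of the observations will fall within this range
Within ±3 standard deviations, at least 89% of the observations will fall within this range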
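
The 75% and 89% figures come from the bound 1 − 1/z². The sketch below computes that bound and checks it against a made-up, strongly skewed data set; the function names are illustrative.

```python
import statistics

def chebyshev_bound(z):
    """Minimum share of observations within z standard deviations (z > 1)."""
    if z <= 1:
        raise ValueError("the bound is only informative for z > 1")
    return 1 - 1 / z**2

def share_within(values, z):
    """Observed share of values within z population standard deviations of the mean."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return sum(abs(x - mean) <= z * sd for x in values) / len(values)

skewed = [1, 1, 1, 2, 2, 3, 4, 5, 8, 40]   # hypothetical right-skewed data

print(chebyshev_bound(2))        # 0.75
print(share_within(skewed, 2))   # 0.9, which is at least 0.75
```

The bound is a guaranteed minimum, so the observed share is usually higher.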

Measures of Distribution Shape and Relative Location
Empirical Rule (normal distribution)
Within ±1 standard deviation, approximately 68% of the observations will fall within this range
Within ±2 standard deviations, approximately 95% of the observations will fall within this range
Within ±3 standard deviations, approximately 99.7% (almost all) of the observations will fall within this range
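
These percentages can be checked against the standard normal distribution with statistics.NormalDist (Python 3.8+). A small sketch; the helper name is an assumption.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

def normal_share_within(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(normal_share_within(k), 4))   # 0.6827, 0.9545, 0.9973
```

The share within 3 standard deviations is very close to, but not exactly, 100%.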

Measures of Distribution Shape and Relative Location
Correlation Coefficient
a numerical measure of the linear relationship between 2 random variables

Measures of Distribution Shape and Relative Location
Correlation Coefficient
Univariate
data collected on one random variable

Measures of Distribution Shape and Relative Location
Correlation Coefficient
Bivariate
data collected on two random variables

Measures of Distribution Shape and Relative Location
Correlation Coefficient
Pearson product-moment sample correlation coefficient
measures the strength of the linear relationship (rxy).
The sign depends on the slope of the data.
Must fall between −1 and +1.
This is a POINT measurement.
Strength by absolute value of rxy:
0.00 – 0.29: Little if any correlation
0.30 – 0.49: Weak/Low correlation
0.50 – 0.69: Moderate correlation
0.70 – 0.89: Strong/High correlation
0.90 – 1.00: Very strong/very high correlation
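
A sketch of the sample correlation coefficient computed from its definition, plus a helper that maps the absolute value onto the strength bands listed above. The function names, the handling of values between bands (for example 0.295), and the data are assumptions for illustration.

```python
import math

def pearson_r(x, y):
    """Sample correlation: sum of cross-deviations over the root of the product of squared deviations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def strength_label(r):
    """Map |r| to the bands above; cut-offs between bands are an assumption."""
    size = abs(r)
    if size < 0.30:
        return "little if any"
    if size < 0.50:
        return "weak"
    if size < 0.70:
        return "moderate"
    if size < 0.90:
        return "strong"
    return "very strong"

minutes = [10, 20, 30, 40, 50]   # hypothetical minutes played
points = [4, 9, 11, 18, 20]      # hypothetical points scored
r = pearson_r(minutes, points)
print(round(r, 3), strength_label(r))   # 0.985 very strong
```

The sign of r gives the direction of the relationship, so the label uses only its absolute value.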

Probability
Experimental Outcome
A sample point

Probability
Event
one or more sample points/experimental outcomes

Probability
Properties
The sum of the probabilities of all experimental outcomes must equal 1
Each probability must fall between 0 and 1, inclusive

Probabilities
When to use combination or permutation formula?
Combination when order is not important (C)
Permutations when order is important (P)
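
A short sketch with Python's math.comb and math.perm (Python 3.8+). The scenario, choosing 2 players out of 5, is a made-up example.

```python
import math

n_players, n_chosen = 5, 2   # hypothetical: pick 2 of 5 players

# Order does not matter: {A, B} is the same selection as {B, A}.
combinations = math.comb(n_players, n_chosen)

# Order matters: (A, B) and (B, A) are counted separately.
permutations = math.perm(n_players, n_chosen)

print(combinations)   # 10
print(permutations)   # 20
```

Each combination of 2 players can be ordered in 2! = 2 ways, which is why there are twice as many permutations.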

Probabilities
Methods (3)
Classical – # of favorable outcomes / total # of equally likely outcomes
Relative Frequency – used when an experiment is repeated many times
Subjective – based on experience or intuition. Used when no relative data is available
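
To illustrate the first two methods, the sketch below compares the classical probability of rolling a six with a relative frequency from a simulated experiment. The die example, the seed, and the number of rolls are assumptions chosen for illustration.

```python
import random

# Classical method: 1 favorable outcome out of 6 equally likely outcomes.
classical = 1 / 6

# Relative frequency method: repeat the experiment many times and count.
rng = random.Random(0)   # fixed seed so the run is repeatable
n_rolls = 60_000
sixes = sum(rng.randint(1, 6) == 6 for _ in range(n_rolls))
relative_frequency = sixes / n_rolls

print(round(classical, 4))
print(round(relative_frequency, 4))   # close to the classical value
```

With many repetitions the relative frequency settles near the classical probability.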

Probabilities
Events
a collection of sample points/experimental outcomes (has one or more sample points)

Discrete Probability Variables
Random Variables
a variable that associates a numerical value with each outcome

Discrete Probability Variables
Random Variables
Discrete
takes a finite number of values, or a countable sequence of values such as 0, 1, 2, ...
Ex: number of defective radios

Discrete Probability Variables
Random Variables
Discrete Properties
0 ≤ f(x) ≤ 1
Σf(x) = 1

Discrete Probability Variables
Random Variables
Discrete uniform probability function has the form of?
f(x) = 1/n, where n is the number of values the random variable can take

Discrete Probability Variables
Random Variables
Discrete
Expected Value
the mean of a discrete random variable: E(x) = μ = Σ x·f(x)
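
A sketch that checks the two discrete properties and computes an expected value as the probability-weighted sum of the values. The function names and both distributions (a fair die and a count of defective radios) are hypothetical; fractions are used so the sums are exact.

```python
from fractions import Fraction

def is_valid_pmf(f):
    """Every f(x) lies between 0 and 1, and the f(x) values sum to 1."""
    return all(0 <= p <= 1 for p in f.values()) and sum(f.values()) == 1

def expected_value(f):
    """E(x) = sum of x * f(x) over all values x."""
    return sum(x * p for x, p in f.items())

# Discrete uniform: a fair die, f(x) = 1/6 for each of the 6 values.
die = {x: Fraction(1, 6) for x in range(1, 7)}

# Non-uniform: hypothetical number of defective radios in a batch.
defects = {0: Fraction(1, 2), 1: Fraction(3, 10), 2: Fraction(1, 5)}

print(is_valid_pmf(die), expected_value(die))           # True 7/2
print(is_valid_pmf(defects), expected_value(defects))   # True 7/10
```

The expected value need not be a value the variable can actually take: a die never shows 3.5.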

Discrete Probability Variables
Random Variables
Continuous
may take any numerical value in an interval or collection of intervals on the real number line.
Between any 2 values a 3rd can always be found. Ex: a time measurement

