
Population
 a collection of persons, objects or items of interest.
 Whatever the researcher is studying

parameter
 a descriptive measure of the population. Usually denoted by Greek letters
 e.g. mean (µ), population variance (σ^2), population standard deviation (σ)
 data from a census are parameters

sample
a portion of the whole and, if taken properly, representative of the whole

statistic
 a descriptive measure of the sample. Usually denoted by Roman letters
 e.g. mean (x(bar)), sample variance (s^2), sample standard deviation (s)
 data from a sample are statistics

Descriptive Statistics
 Using data gathered on a group to describe or reach conclusions about that same group
 e.g. most athletic stats. The data is gathered from that group and conclusions are drawn about that group only. Basketball stats are about Basketball

Inferential Statistics
 gathering data from a sample and using the statistics generated to reach conclusions about the population from which the sample was taken
 sometimes referred to as inductive statistics

empirical rule
 The approximate percentage of values that lie within a given number of standard deviations of the mean of a set of data, if the data are normally distributed.
 Distance from the Mean    Values within Distance
 µ ± 1σ                    68%
 µ ± 2σ                    95%
 µ ± 3σ                    99.7%
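
 The percentages of the empirical rule can be checked against the standard normal distribution. This is only a sketch, assuming Python 3.8+ and its standard-library statistics.NormalDist; the helper name share_within is made up for this example.

```python
from statistics import NormalDist

def share_within(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    standard_normal = NormalDist(mu=0, sigma=1)
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

# Compare with the 68%, 95% and 99.7% figures of the empirical rule.
shares = {k: share_within(k) for k in (1, 2, 3)}
for k, share in shares.items():
    print(f"within {k} standard deviation(s): {share:.1%}")
```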

Population Mean
 µ = (∑x)/N
 where x = actual data values
 N = # total terms

standard deviation
 square root of the variance
 σ = sqrt(σ^2)
 σ = sqrt((∑(x - µ)^2)/N)

sum of squares of x
 SSx
 The sum of the squared deviations about the mean of a set of values

variance
 average of the squared deviations about the arithmetic mean for a set of numbers
 Population Variance
  σ^2 = (∑(x - µ)^2)/N

deviation from the mean
x - µ

mean absolute deviation (MAD)
 the average of the absolute values of the deviations around the mean for a set of numbers
 MAD = (∑|x - µ|)/N
 where
 x - µ = actual value of a given number minus the mean
 N = number of terms
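
 The population measures defined so far (mean, sum of squares, variance, standard deviation, MAD) can be computed together. A minimal sketch: the function name population_summary and the five data values are assumptions chosen for illustration, and the data are treated as a whole population (divide by N).

```python
import math

def population_summary(values):
    """Population mean, SSx, variance, standard deviation and MAD."""
    n = len(values)
    mu = sum(values) / n
    deviations = [x - mu for x in values]      # x - mu for every value
    ss_x = sum(d ** 2 for d in deviations)     # sum of squared deviations
    variance = ss_x / n                        # population variance
    return {
        "mean": mu,
        "ss_x": ss_x,
        "variance": variance,
        "std_dev": math.sqrt(variance),
        "mad": sum(abs(d) for d in deviations) / n,
    }

# Hypothetical population of five values.
summary = population_summary([5, 9, 16, 17, 18])
print(summary)
```

 For these values the mean is 13, SSx is 130, the variance is 26 and the MAD is 4.8.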

Chebyshev's Theorem
 at least (1 - 1/k^2) of the values will fall within ± k standard deviations of the mean, regardless of the shape of the distribution. Assume k > 1
 e.g. k = 2.5, 1 - 1/(2.5^2) = .84, so at least .84 of all values are within µ ± 2.5σ,
 or at least .84 of all values will be within 2.5 standard deviations of the mean, µ.
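
 The bound is a one-line calculation. A small sketch, with chebyshev_lower_bound as a made-up helper name:

```python
def chebyshev_lower_bound(k):
    """Minimum proportion of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem assumes k > 1")
    return 1 - 1 / k ** 2

# Same k as the example above.
bound = chebyshev_lower_bound(2.5)
print(round(bound, 2))
```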

sample variance
 variance: s^2 = (∑(x - x(bar))^2)/(n - 1)
 also
 s^2 = (∑x^2 - ((∑x)^2)/n)/(n - 1)
 where
 x = actual value
 x(bar) = sample mean
 n = sample size

sample standard deviation
 s = sqrt(s^2), where
 s^2 = (∑x^2 - ((∑x)^2)/n)/(n - 1)
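
 The two forms of the sample variance (the definition using deviations from the sample mean, and the shortcut using ∑x^2 and (∑x)^2) should give the same number. A sketch that checks this on made-up data; the function names are illustrative only.

```python
import math

def sample_variance_definition(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)

def sample_variance_shortcut(values):
    """Computational form: (sum of x^2 - (sum of x)^2 / n) / (n - 1)."""
    n = len(values)
    sum_x = sum(values)
    sum_x_squared = sum(x ** 2 for x in values)
    return (sum_x_squared - sum_x ** 2 / n) / (n - 1)

# Hypothetical sample of five values.
data = [5, 9, 16, 17, 18]
s_squared = sample_variance_shortcut(data)
s = math.sqrt(s_squared)  # sample standard deviation
print(sample_variance_definition(data), s_squared, round(s, 4))
```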

Percentiles
 measures of central tendency that divide a group of data into 100 parts
 87.7% = 87th Percentile
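 percentile location: i = (P/100)n
 where P = percentile
 i = percentile location
 n = number of values in the data set
 the values must first be sorted in ascending order
 if i is a whole number, then the Pth percentile is the average of the values at positions i and i+1
 if i is NOT a whole number, then the Pth percentile is the value at position (whole-number part of i) + 1
 e.g. i = 11.8: the whole-number part is 11, so the percentile is the 12th value
 e.g. i = 11: the percentile is the average of the 11th and 12th values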
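
 The percentile location rule can be written as a short function. This is a sketch of the rule as described in this entry (other textbooks and libraries use different interpolation rules); percentile_value and the eight data values are assumptions for illustration.

```python
def percentile_value(values, p):
    """Value of the p-th percentile (0 < p < 100) using the location rule."""
    if not 0 < p < 100:
        raise ValueError("p must be between 0 and 100, exclusive")
    ordered = sorted(values)
    n = len(ordered)
    # i = (P/100) * n, split into its whole-number part and a remainder
    whole, remainder = divmod(p * n, 100)
    whole = int(whole)
    if remainder == 0:
        # i is a whole number: average the values at positions i and i+1 (1-based)
        return (ordered[whole - 1] + ordered[whole]) / 2
    # i is not a whole number: take the value at position whole + 1 (1-based)
    return ordered[whole]

# Hypothetical data set of eight values.
data = [14, 12, 19, 23, 5, 13, 28, 17]
p30 = percentile_value(data, 30)  # i = 2.4, so the 3rd ordered value
p50 = percentile_value(data, 50)  # i = 4, so the average of the 4th and 5th values
print(p30, p50)
```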

frequency distribution
 a summary of data presented in the form of class intervals and frequencies
 e.g. 1 to under 3, 3 to under 5, etc.
 rule of thumb: use 5-15 classes

range
difference between the largest and smallest values in a set of data

classes
 rule of thumb: 5-15 classes
 arrangement of values in groups

cumulative frequency
running total of frequency through the classes of a frequency distribution

relative frequency
proportion of total frequency that is in any given class interval in a frequency distribution

class width
range/# classes
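
 The terms above (classes, class width, frequency, cumulative frequency, relative frequency) fit together as follows. A minimal sketch: the function name, the ten data values and the choice of 3 classes are assumptions for illustration, and the class width is rounded up so that every value lands in a class.

```python
import math

def frequency_distribution(values, num_classes):
    """Return (class width, rows) where each row is
    (lower, upper, frequency, cumulative frequency, relative frequency)."""
    low, high = min(values), max(values)
    width = math.ceil((high - low) / num_classes)  # range / number of classes, rounded up
    if low + width * num_classes <= high:
        width += 1  # make sure the largest value falls inside the last class
    rows = []
    running_total = 0
    for c in range(num_classes):
        lower = low + c * width
        upper = lower + width
        freq = sum(1 for x in values if lower <= x < upper)  # "lower to under upper"
        running_total += freq
        rows.append((lower, upper, freq, running_total, freq / len(values)))
    return width, rows

# Hypothetical raw (ungrouped) data.
data = [1, 2, 2, 3, 4, 4, 4, 5, 6, 6]
width, rows = frequency_distribution(data, 3)
for lower, upper, freq, cumulative, relative in rows:
    print(f"{lower} to under {upper}: f={freq}, cumulative={cumulative}, relative={relative}")
```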

histogram
vertical bar chart typically used to depict a frequency distribution

frequency polygon
graph in which line segments connect the dots depicting a frequency distribution

ogive
cumulative frequency polygon; most useful for running totals

pie chart
 data represented as slices of a whole
 slice angle = (interval frequency/total) * 360 degrees

stem & leaf
constructed by separating the digits of each number in the data into 2 groups (a stem and a leaf)

pareto
 Vertical bar chart that displays the most common types of defects
 ranked in order of occurrence, left to right

scatter plot
 2 dimensional plot of pairs of points from 2 variables
 good for attempting to determine the relationship between 2 variables

census
 gather data from a whole population
 data from a census are parameters

Levels of Data
 Lowest to Highest
 Nominal
 Ordinal
 Interval
 Ratio

Nominal
 Lowest level of data: Used only to classify or categorize
 e.g. doctor, lawyer, educator, other
 NONMETRIC Data, aka qualitative data.

Ordinal
 Higher than Nominal, can be used to rank or order subjects
 e.g. not helpful, somewhat helpful, moderately helpful, very helpful, extremely helpful
 NONMETRIC Data, aka qualitative data.

Interval
 Higher than Ordinal
 Distances between consecutive numbers have meaning and the data are always numerical
 e.g. temperature

Ratio
 Highest Level of data measurement
 Have the same properties as Interval, but they have an absolute zero, which indicates absence
 Ratio of two numbers is meaningful
 e.g. Height, weight, Kelvin temperature, passenger miles

Parametric Stats
Must be Interval or Ratio

NonParametric Stats
Can be used with nominal or ordinal data, and can also be used to analyze interval or ratio data

grouped data
data that have been organized into a frequency distribution

ungrouped data
raw data or data that have not been summarized in any way

median
 middle value in an ordered array of numbers
 for an array with an odd number of values, the median is the middle value
 for an array with an even number of values, the median is the average of the two middle values
 the median is located at position (n+1)/2
 e.g. for 77 terms the median is the (77+1)/2 = 39th term

Quartiles
 same rules as percentiles: if i is a whole number, Qx is the average of the values at positions i and i+1
 Q25 = Q1 = first 25% of values ending in the Q25 term
 Q50 = Q2 = first 50% of values ending in the Q50 term
 Q75 = Q3 = first 75% of values ending in the Q75 term
 Q2 is the median

measure of central tendency
yields information about the center, or middle part, of a group of values

mode
 the most frequently occurring value in a set of data
 bimodal: data set has two modes
 multimodal: data set has more than two modes

Inter Quartile Range : IQR
 The middle 50% of values
 IQR = Q3 - Q1
 e.g. if Q3 = the 12th term (70) and Q1 = the 4th term (5), IQR = 70 - 5 = 65

Coefficient of Variation
 The ratio of the standard deviation to the mean, expressed as a percentage and denoted CV
 CV = (σ/µ)100
 e.g. for σ=4.84 & µ = 64.4, CV = 7.5%

z score
 number of standard deviations a value (x) is above or below the mean of a set of numbers when the data are normally distributed
 z = (x - µ)/σ
 e.g. x = 1, µ = 4.28, σ = 2.491, z = -1.32
 x = 9, µ = 4.28, σ = 2.491, z = 1.89
 z scores still follow the empirical rule
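
 The z score and the coefficient of variation are both simple ratios. A sketch reproducing the worked numbers from these two entries, assuming the z score examples share the mean µ = 4.28; the function names are made up for this example.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

def coefficient_of_variation(mu, sigma):
    """Standard deviation as a percentage of the mean."""
    return sigma / mu * 100

z_low = z_score(1, mu=4.28, sigma=2.491)
z_high = z_score(9, mu=4.28, sigma=2.491)
cv = coefficient_of_variation(mu=64.4, sigma=4.84)
print(round(z_low, 2), round(z_high, 2), round(cv, 1))
```

 A negative z score means the value is below the mean; a positive z score means it is above.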

coefficient of correlation
 correlation: measure of the degree of relatedness of variables
 coefficient of correlation = r
 r = (big equation)

classical method of assigning probability
 involves an experiment which is a process that produces outcomes, and an event, which is an outcome of an experiment.
 P(E) = n_e/N
 Highest probability of an outcome is 1.
 Lowest probability is 0

a priori
probabilities can be determined prior to the experiment

intersection
 contains the elements common to both sets
 X = {1, 2, 3, 4}, Y = {2, 3, 6, 7}, X ∩ Y = {2, 3}

mutually exclusive events
 when the occurrence of one event precludes the occurrence of another event
 e.g. Male and Female, OK and Defective. A person cannot be both Male and Female, and a part cannot be both OK and Defective
 formula: P(X ∩ Y) = 0

independent events
 events wherein the occurrence or nonoccurrence of one of the events does not affect the occurrence or nonoccurrence of the other event.
 e.g. Coin tosses or Die Rolls. The previous event does not influence the following event
 formula: Independent Events X & Y
 P(X|Y) = P(X) and P(Y|X) = P(Y)

complement
 All the elementary events of an experiment not in A comprise its complement.
 e.g. If the experiment is rolling a die and the event is 5, then the complement is 1, 2, 3, 4, 6
 The complement of A is denoted A'
 P(A') = 1 - P(A)

relative frequency of occurrence method of assigning probabilities
the probability of an event occurring is equal to the number of times the event has occurred in the past divided by the total number of opportunities for the event to have occurred.

subjective probability
assigning probability based on the feelings or insights of the person determining the probability

mn counting rule
 For an operation that can be done m ways and a second operation that can be done n ways, the two operations can occur, in order, in mn ways.
 This rule can be extended to 3 or more operations
 e.g. # of groups possible with the following factors
 gender, marital status, economic class = 2 (m/f), 3 (single-never married/married/divorced), 3 (lower/middle/upper)
 2 * 3 * 3 = 18 groups. Therefore 18 samples could be taken to represent all groups.

sampling from a population with Replacement
 sampling n items from a population of size N with replacement would provide N^n possibilities
 e.g. A die being rolled 3 times in succession, how many different outcomes can occur?
 N = 6, n=3, 6^3 = 216
 A lottery of reusable numbers 6 digits long, each digit from 0-9
 N = 10, n = 6, 10^6 = 1,000,000

Combinations
 Sampling n items from a population of size N without replacement
 NCn = N!/(n!(N - n)!)
 e.g. three lawyers are to be sent to a conference from a pool of 16
 16!/(3!13!) = 560
 combination because once selected the lawyer can not be selected again
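
 The three counting rules above (mn rule, sampling with replacement, combinations) can be checked directly. A sketch assuming Python 3.8+, where math.prod and math.comb are available; the variable names are illustrative.

```python
import math

# mn counting rule, extended to three operations: 2 * 3 * 3
groups = math.prod([2, 3, 3])

# Sampling with replacement: N^n
die_outcomes = 6 ** 3        # die rolled 3 times
lottery_numbers = 10 ** 6    # 6 digits, each 0-9

# Combinations (sampling without replacement): N! / (n! (N - n)!)
lawyer_teams = math.comb(16, 3)

print(groups, die_outcomes, lottery_numbers, lawyer_teams)
```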

Spearman Rank
 r_s = 1 - (6∑d^2)/(n(n^2 - 1))
 where d = difference in the ranks of each pair
 n = number of pairs
 A value near +1 indicates a strong positive correlation
 A value near -1 indicates a strong negative correlation
 e.g. for x and y pairs, a Spearman's of -.830 indicates a strong inverse correlation,
 that is, when x is high y is low and vice versa
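
 The Spearman formula can be sketched in a few lines. This version assumes there are no tied values (ties need averaged ranks, which are not handled here); the function names and data are made up for illustration.

```python
def ranks(values):
    """Rank each value from 1 (smallest) to n (largest); assumes no ties."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

def spearman_rank(x, y):
    """r_s = 1 - (6 * sum of d^2) / (n * (n^2 - 1))."""
    if len(x) != len(y):
        raise ValueError("x and y must contain the same number of values")
    n = len(x)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - (6 * d_squared) / (n * (n ** 2 - 1))

# Hypothetical pairs: mostly increasing together, then perfectly inverse.
r_positive = spearman_rank([10, 20, 30, 40, 50], [1, 3, 2, 5, 4])
r_inverse = spearman_rank([1, 2, 3, 4, 5], [5, 4, 3, 2, 1])
print(r_positive, r_inverse)
```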

General Law of Addition
 P(A∪B) = P(A) + P(B) - P(A∩B)
 That is, probability of A + probability of B - probability of A & B together

Special Law of Addition
 Applies only if the events are mutually exclusive
 e.g. male or female, where P(A∩B) = .000
 Then P(A∪B) = P(A) + P(B)

General Law of Multiplication
 This gives the probability that both A & B will happen at the same time
 P(A∩B) = P(A) * P(B|A) = P(B) * P(A|B)
 P(A∩B) means that A & B MUST both happen.
 P(A|B) is the probability of A given that B is true

Special Law of Multiplication
If X, Y are independent, P(X∩Y) = P(X) * P(Y)

Independent Events X,Y
 To determine whether X & Y are independent events, test that the following are true
 P(X|Y) = P(X) and P(Y|X) = P(Y)

Conditional Probability
P(X|Y) = P(X∩Y)/P(Y) = (P(X)*P(Y|X))/P(Y)
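
The laws of addition and multiplication, the independence test and conditional probability can all be checked on one fair die roll using the classical method P(E) = n_e/N. A minimal sketch: the events A (even), B (greater than 3) and C (1 or 2) are chosen only for illustration, and prob and conditional are made-up helper names.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}  # one roll of a fair die

def prob(event):
    """Classical probability: outcomes in the event / total outcomes."""
    return Fraction(len(event & sample_space), len(sample_space))

def conditional(event, given):
    """P(event | given) = P(event and given) / P(given)."""
    return prob(event & given) / prob(given)

A = {2, 4, 6}  # even
B = {4, 5, 6}  # greater than 3
C = {1, 2}     # 1 or 2

# General law of addition: P(A or B) = P(A) + P(B) - P(A and B)
addition_holds = prob(A | B) == prob(A) + prob(B) - prob(A & B)

# General law of multiplication: P(A and B) = P(A) * P(B|A) = P(B) * P(A|B)
multiplication_holds = (
    prob(A & B) == prob(A) * conditional(B, A) == prob(B) * conditional(A, B)
)

# Independence test: P(X|Y) = P(X) and P(Y|X) = P(Y)
a_b_independent = conditional(A, B) == prob(A) and conditional(B, A) == prob(B)
a_c_independent = conditional(A, C) == prob(A) and conditional(C, A) == prob(C)

print(prob(A | B), conditional(A, B), a_b_independent, a_c_independent)
```

Here A and B are not independent (knowing the roll is greater than 3 raises the chance it is even from 1/2 to 2/3), while A and C are independent, so the special law of multiplication P(A∩C) = P(A) * P(C) applies to them.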

