
Two Parts of Statistical Inference
 1. Estimating unknown parameter(s) and constructing a (1−α)100% confidence interval for the unknown parameter(s)
 2. Testing hypotheses about the unknown parameter(s)

Estimator
A rule that tells us how to calculate the estimate based on the information contained in the sample. It is generally expressed as a formula that does not involve any unknown parameters. There are two types of estimators: Point Estimator and Interval Estimator

Point Estimator
An estimator given as a point or a single value

Unbiased Estimator
 Let θ̂ be the point estimator of the unknown population parameter θ [where θ could be μ or p or σ²]. If E(θ̂) = θ, then the point estimator θ̂ is an unbiased estimator of θ
 eg. E(x̄) = μ, so the sample mean x̄ is an unbiased estimator of μ
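
As a quick sanity check of unbiasedness (the distribution, numbers, and variable names below are my own, not from the notes), the average of many sample means should land near μ:

```python
import numpy as np

# Simulation sketch: the sample mean xbar is an unbiased estimator of mu,
# so averaging xbar over many repeated samples should come out close to mu.
rng = np.random.default_rng(0)
mu, sigma, n = 50.0, 10.0, 25
xbars = rng.normal(mu, sigma, size=(10_000, n)).mean(axis=1)
estimate_of_E_xbar = xbars.mean()   # an estimate of E(xbar); should be near mu
```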

Interval Estimator
When an interval is constructed around the point estimate, and it is stated that this interval is likely to contain the unknown population parameter(s) with a specific confidence level. This confidence level is usually denoted by (1−α)100%, where (1−α) is called the confidence coefficient. If (1−α)100% is not given, we usually use (1−α)100% = 95%

Interpretation of (1α)100% Confidence Interval
In repeated sampling under identical conditions, (1−α)100% of all confidence intervals constructed in this manner will enclose the unknown mean μ

3 quantities to decrease the width of the Confidence Interval
 1. Decrease the confidence level (1−α)100%, i.e. use a smaller z_{α/2} (not ideal: it lowers the probability that our confidence interval contains the unknown mean μ)
 2. Decrease the population variance σ² or σ (not ideal, as recalculating the variance is time consuming and costly, since we have to go through the entire population)
 3. Increase the sample size n (ideal)
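
A minimal sketch of how the width 2 · z_{α/2} · σ/√n behaves as n grows (the values of σ and α below are assumed, not from the notes):

```python
from scipy.stats import norm

# Assumed example values: known sigma, 95% confidence.
sigma, alpha = 12.0, 0.05
z = norm.ppf(1 - alpha / 2)          # z_{alpha/2}, about 1.96 for alpha = .05

def ci_width(n, sigma=sigma, z=z):
    # Width of the z-interval for mu: 2 * z_{alpha/2} * sigma / sqrt(n)
    return 2 * z * sigma / n ** 0.5

# Increasing n shrinks the interval; quadrupling n halves the width.
w36, w144 = ci_width(36), ci_width(144)
```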

Margin of Error (for the estimate of unknown mean μ)
Denoted by E and defined as the quantity that is subtracted from and added to the sample mean to obtain the (1−α)100% confidence interval: E = z_{α/2} σ/√n. Also called the "bound on the error of estimation" or "the maximum error" or "the estimation is within.."

Interpretation of E
We can say with (1−α)100% confidence that the maximum error is within ±E when estimating μ by x̄

The most conservative estimate of n
When we have no prior information about p or q, we use p = .5 and therefore q = .5, so that the variance of p̂, V(p̂), is maximized.
The sample size n obtained using p = .5 and q = .5 is called the most conservative estimate of n.
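
A sketch of the usual sample-size formula n = z_{α/2}² · p · q / E² with the conservative choice p = q = .5 (the values of α and E below are assumed, not from the notes):

```python
import math
from scipy.stats import norm

# Assumed example: 95% confidence, margin of error at most 3 percentage points.
alpha, E = 0.05, 0.03
z = norm.ppf(1 - alpha / 2)                  # z_{alpha/2}

# Most conservative estimate: p = q = .5 maximizes p*q, hence maximizes n.
n = math.ceil(z ** 2 * 0.5 * 0.5 / E ** 2)   # always round UP to a whole unit
```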

Statistical Hypothesis
A conjecture about the unknown population parameter(s). The conjecture may or may not be true. There are two types of statistical hypotheses for each situation, called the Null Hypothesis and the Alternative Hypothesis

Null Hypothesis
Denoted by H₀, and states that the unknown population parameter is equal to a specific value. The Null Hypothesis always has an equal sign in it, and this is the hypothesis that is actually tested.

Alternative Hypothesis
Denoted by H_A, and defined as the complement, negation, or opposite of the Null Hypothesis (H₀)

Type I error
denoted by α and represents the probability of rejecting H₀ given H₀ is true. The value of α is also called the significance level of the test.

Type II error
denoted by β and represents the probability of accepting H₀ given H₀ is false. Note: β ≠ (1−α)!

The Power of the Test
1−β, where β is the probability of a Type II error. Both β and α cannot be reduced simultaneously for fixed sample size n (one goes up when the other goes down). Increasing n raises the power of the test, as it lowers both β and α.
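
A sketch of this trade-off for an upper-tail z-test (all the numbers below are assumed, not from the notes):

```python
from scipy.stats import norm

# Assumed setup: H0: mu = 100 vs HA: mu > 100, true mean actually 103.
mu0, mu1, sigma, alpha = 100.0, 103.0, 10.0, 0.05

def power(n):
    # Rejection cutoff for xbar at significance level alpha
    crit = mu0 + norm.ppf(1 - alpha) * sigma / n ** 0.5
    # Power = P(reject H0 | true mean is mu1); beta = 1 - power
    return norm.sf((crit - mu1) / (sigma / n ** 0.5))

# With alpha held fixed, a larger n gives higher power (smaller beta).
p25, p100 = power(25), power(100)
```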

The Classical or Critical Value Approach to Testing Hypotheses
 I. Formulate H₀ and H_A
 II. Select an appropriate test statistic (z_calculated)
 III. Fix the level of significance (α) and formulate the decision rule
 IV. Write your conclusion in words

The Decision Rule
Aka the Critical Region or Rejection Region; it depends on H_A and α. If H_A is two-sided, we use −z_{α/2} and z_{α/2}, or −t(n−1, α/2) and t(n−1, α/2). Otherwise we use −z_α or z_α, in the same direction as H_A

The mean and standard deviation always have...
The same units!

the P-value
 An alternate method to test H₀. The P-value is the probability, assuming H₀ is true, that the statistic z_c would take a value as extreme as or more extreme than the actually observed value.
 In fact, the p-value is the smallest α (Type I error probability) at which H₀ can be rejected. Thus, we reject H₀ if α > p-value.
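
A minimal p-value sketch for a two-sided z-test (all numbers below are assumed, not from the notes):

```python
from scipy.stats import norm

# Assumed example: H0: mu = 50, known sigma, n = 64, observed xbar = 52.4.
mu0, sigma, n, xbar, alpha = 50.0, 8.0, 64, 52.4, 0.05

z_c = (xbar - mu0) / (sigma / n ** 0.5)   # calculated test statistic
p_value = 2 * norm.sf(abs(z_c))           # two-sided: area in BOTH tails
reject = p_value < alpha                  # same rule as "reject if alpha > p-value"
```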

Three methods for Testing Hypotheses
 a) The Classical or Critical Value Approach
 b) the p-value Approach
 c) If H_A is two-sided, the (1−α)100% confidence interval for μ (i.e., reject H₀ if μ₀ does not lie in the (1−α)100% confidence interval)

Assumptions for using (1−α)100% Confidence Interval for two populations when σ₁ and σ₂ are known
 1) Two samples are random & independent
 2) Both samples came from two independent, normal populations
 3) σ₁² (σ₁) and σ₂² (σ₂) are known

Assumptions for using (1−α)100% Confidence Interval based on the t-distribution
 1) Two samples are random & independent
 2) Both samples came from normal populations
 3) σ₁² (σ₁) and σ₂² (σ₂) are unknown but assumed equal

The point estimator for the unknown common variance σ² is
s_p², the pooled sample variance: s_p² = [(n₁ − 1)s₁² + (n₂ − 1)s₂²] / (n₁ + n₂ − 2)

To test the hypothesis about unknown p₁ & p₂
we combine (pool) the information given in both samples to compute the estimated variance of p̂₁ − p̂₂

To construct a (1−α)100% confidence interval for p₁ − p₂
we do NOT combine the information contained in both samples to compute the estimated variance
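
A sketch contrasting the two standard errors (the counts below are made up):

```python
# Made-up counts: x successes out of n trials in each sample.
x1, n1 = 60, 200
x2, n2 = 45, 180
p1, p2 = x1 / n1, x2 / n2

# TESTING H0: p1 = p2 -- pool both samples into one combined proportion.
pbar = (x1 + x2) / (n1 + n2)
se_test = (pbar * (1 - pbar) * (1 / n1 + 1 / n2)) ** 0.5

# CONFIDENCE INTERVAL for p1 - p2 -- do NOT pool; use each sample's own p-hat.
se_ci = (p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2) ** 0.5
```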

A goodness of fit test
Tests the Null Hypothesis that the observed frequencies follow a pattern or theoretical distribution. The test is called "goodness-of-fit" because the hypothesis tested is how well the observed frequencies fit a given pattern

The χ² (chi-square) goodness of fit test
used to test whether or not the sampled multinomial data are in agreement with the hypothesized distribution, OR to test 3 or more unknown population proportions.

In a goodness of fit test, when is the Null Hypothesis rejected?
A good agreement between the observed and expected frequencies results in a small value of χ². A perfect agreement would result in χ² = 0. Thus the Null Hypothesis is rejected if χ² is large [upper tail test]
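
A goodness-of-fit sketch using scipy.stats.chisquare, testing whether a die is fair (the observed counts are made up):

```python
from scipy.stats import chisquare

# Made-up counts from 120 rolls of a die; H0: the die is fair.
observed = [18, 22, 16, 25, 20, 19]
expected = [120 / 6] * 6                     # fair die: 20 expected per face

stat, p_value = chisquare(observed, f_exp=expected)
# stat = sum((O - E)^2 / E); small stat (large p-value) -> do not reject H0.
```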

For tests of Independence between Criteria A and B...
H₀: The two criteria A & B are
INDEPENDENT or not related (mnemonic: H₀I)

For tests of Independence between Criteria A and B...
H_A: The two criteria A & B are
DEPENDENT or related (mnemonic: H_AD)

For tests of independence, χ² is
χ² = Σᵢ Σⱼ (Oᵢⱼ − Eᵢⱼ)² / Eᵢⱼ, where Oᵢⱼ is the observed and Eᵢⱼ the expected frequency in cell (i, j)

For tests of Independence, Eᵢⱼ =
Eᵢⱼ = (i-th row total × j-th column total) / grand total

For a 2x2 Contingency table, testing for independence for two criteria is equivalent to testing
H₀: P₁ = P₂ vs H_A: P₁ ≠ P₂
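
A sketch of this equivalence using scipy.stats.chi2_contingency on a made-up 2x2 table (rows = two groups, columns = yes/no):

```python
from scipy.stats import chi2_contingency

# Made-up 2x2 contingency table of counts.
table = [[30, 70],
         [45, 55]]

# correction=False gives the plain chi-square statistic (no Yates correction),
# which equals z^2 from the two-proportion z-test of H0: P1 = P2.
stat, p_value, dof, expected = chi2_contingency(table, correction=False)
# dof = (rows - 1) * (cols - 1) = 1 for a 2x2 table.
```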

Test of Homogeneity
 A test of homogeneity involves testing
 H₀: the proportions of elements with certain characteristics in two or more different populations are the same
 vs
 H_A: the proportions are not the same

Analysis of Variance (ANOVA)
A procedure used to test the Null Hypothesis that the means of three or more populations are equal.

The grand mean for all k samples is
x̿ = (sum of all observations in all k samples) / n, where n = n₁ + n₂ + … + n_k is the total number of observations
SST, SSB and SSW must always be...
non-negative, since they are sums of squares

The point estimator for the unknown common variance σ² is
Mean Square Within (MSW) = SSW / (n − k)

Assumptions for ANOVA
 a. k samples are random and independent
 b. Each of the k samples came from a normal population
 c. σ₁², σ₂², σ₃², …, σ_k² are unknown but assumed equal
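
A one-way ANOVA sketch using scipy.stats.f_oneway on three made-up samples:

```python
from scipy.stats import f_oneway

# Made-up data: three independent samples; H0: mu1 = mu2 = mu3.
g1 = [48, 52, 50, 49, 51]
g2 = [55, 57, 54, 56, 58]
g3 = [50, 49, 53, 51, 52]

F, p_value = f_oneway(g1, g2, g3)   # F = MSB / MSW
# A small p-value means at least one population mean differs.
```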

The Statistical Linear Regression model is
y = A + Bx + ε, where A is the unknown intercept, B the unknown slope, and ε a random error term with mean 0
SSxx or SSyy must always be...
non-negative, since they are sums of squares

Sxy can be...
positive or negative, since its cross-product terms (x − x̄)(y − ȳ) can take either sign

To Construct a (1−α)100% Confidence Interval for unknown B, use
b ± t(n−2, α/2) · s_b, where s_b = s_e / √SSxx and s_e is the standard error of the estimate
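
A sketch of the slope interval b ± t(n−2, α/2) · s_b using scipy.stats.linregress (the (x, y) data below are made up):

```python
from scipy.stats import linregress, t

# Made-up data that is roughly linear in x.
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]

res = linregress(x, y)            # res.slope = b, res.stderr = s_b
n, alpha = len(x), 0.05
t_crit = t.ppf(1 - alpha / 2, df=n - 2)

# (1 - alpha)100% confidence interval for the unknown slope B.
lower = res.slope - t_crit * res.stderr
upper = res.slope + t_crit * res.stderr
```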

When solving for b in the estimated regression model...
write the equation out in general form ŷ = a + bx first. If the result reads "a − bx", that means "a + (−b)x", which implies b is negative

The population correlation
denoted by ρ (rho) and defined as a measure of the strength of the linear relationship between two variables, x & y.

The sample correlation
denoted by r. Since the population correlation is usually unknown, the point estimator of the population correlation is the sample correlation r.

The range for r is
−1 ≤ r ≤ 1

if r ≈ 1
we have a (nearly) perfect positive linear relationship between x & y. That is, as x increases, y increases.

if r ≈ −1
we have a (nearly) perfect negative linear relationship: as x increases, y decreases (and as x decreases, y increases)

if r~0
the variables x & y are not linearly related

Coefficient of Determination
denoted by ρ² for the population and r² for the sample. But ρ² is usually unknown, so we use r². The coefficient of determination for the sample represents the % of variation in the dependent variable y explained by the independent variable x in the estimated least squares regression model ŷ = a + bx. The higher the value of r², the better the fit.
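
A sketch computing r and r² with scipy.stats.linregress (the data below are made up):

```python
from scipy.stats import linregress

# Made-up (x, y) data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 6]

res = linregress(x, y)
r = res.rvalue          # sample correlation
r_squared = r ** 2      # fraction of variation in y explained by x
```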

The test statistic for H₀: ρ = 0 is
t = r√(n − 2) / √(1 − r²), with n − 2 degrees of freedom
If no value of α is given, use
α = .05 (i.e., a 95% confidence level)
For sample correlation r approximately equal to 1...
r ≈ 1 for values of r > 80% or .8

