Intro Stats II Final

  1. Two parts of Statistical Inference
    • 1. Estimating unknown parameter(s) and constructing (1-α)100% Confidence interval for unknown parameters
    • 2. Tests of Hypothesis about the unknown parameter(s)
  2. Estimator
    A rule that tells us how to calculate an estimate from the information contained in the sample. It is generally expressed as a formula that does not involve any unknown parameters. There are two types of estimators: the Point Estimator and the Interval Estimator
  3. Point Estimator
    An estimator given as a point or a single value
  4. Unbiased Estimator
    • Let theta-hat be the point estimator of an unknown population parameter theta (where theta could be μ, p, or σ²). If E(theta-hat) = theta, then theta-hat is an unbiased estimator of theta
    • e.g. E(x-bar) = μ, so the sample mean x-bar is an unbiased estimator of μ
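    Unbiasedness can be checked exactly on a tiny example by listing every possible sample. The sketch below assumes a made-up population of four values and samples of size 2 drawn with replacement, so all 16 ordered samples are equally likely; the population values and variable names are illustrative only.

```python
from itertools import product
from statistics import mean, pvariance, variance

# Hypothetical population (illustrative values only).
population = [1.0, 2.0, 3.0, 4.0]
mu = mean(population)             # population mean
sigma_sq = pvariance(population)  # population variance (divides by N)

# Every ordered sample of size 2 drawn with replacement; each is equally likely.
samples = list(product(population, repeat=2))

# The expected value of an estimator is its average over all possible samples.
expected_x_bar = mean(mean(s) for s in samples)
expected_s_sq = mean(variance(s) for s in samples)  # variance() divides by n-1

print(mu, expected_x_bar)        # E(x-bar) equals mu
print(sigma_sq, expected_s_sq)   # E(s^2) equals sigma^2
```

    In this setup the averages match the population values exactly, which is what E(theta-hat) = theta means.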
  5. Interval Estimator
    An interval constructed around the point estimate, together with a statement that the interval is likely to contain the unknown population parameter(s) at a specific confidence level. This confidence level is usually denoted by (1-α)100%, where (1-α) is called the confidence coefficient. If (1-α)100% is not given, we usually use (1-α)100%=95%
  6. Interpretation of (1-α)100% Confidence Interval
    In repeated sampling under identical conditions, (1-α)100% of all confidence intervals constructed in this manner will enclose the unknown mean μ
  7. 3 quantities to decrease the width of the Confidence Interval
    • 1. Confidence level (1-α)100% or zα/2 (not ideal, lowers the probability that our confidence interval contains the unknown mean μ)
    • 2. Population variance σ² or σ (not ideal, as recalculating the variance is time consuming and costly, since we have to go through the entire population)
    • 3. Sample size n (ideal)
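    The effect of each quantity on the width can be sketched numerically. This assumes the z-based interval for μ with known σ, so the width is 2·zα/2·σ/√n; the function name and the numbers are illustrative.

```python
from math import sqrt
from statistics import NormalDist

def ci_width(sigma, n, conf_level=0.95):
    """Width of the z-based confidence interval for the mean (sigma known)."""
    alpha = 1 - conf_level
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}
    return 2 * z * sigma / sqrt(n)

base = ci_width(sigma=10, n=25)                              # baseline
larger_n = ci_width(sigma=10, n=100)                         # 4x the sample size
lower_conf = ci_width(sigma=10, n=25, conf_level=0.90)       # lower confidence level

print(round(base, 3), round(larger_n, 3), round(lower_conf, 3))
```

    Quadrupling n halves the width without giving up any confidence, which is why increasing n is the preferred option.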
  8. Margin of Error (for the estimate of unknown mean μ)
    Denoted by E and defined as the quantity that is subtracted and added to the sample mean to obtain (1-α)100% confidence interval. Also called the "bound on the error of estimation" or "the maximum error" or "the estimation is within.."
  9. Interpretation of E
    We can say with (1-α)100% confidence that the error is at most E (that is, the estimate is within ±E of μ) when estimating μ by x-bar
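    As a small worked sketch, assuming σ is known so that E = zα/2·σ/√n, the margin of error and the resulting interval can be computed as follows. The sample figures (x-bar = 50, σ = 12, n = 36) are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(sigma, n, conf_level=0.95):
    """E = z_{alpha/2} * sigma / sqrt(n), assuming sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    return z * sigma / sqrt(n)

def mean_confidence_interval(x_bar, sigma, n, conf_level=0.95):
    """Subtract and add E to the sample mean."""
    e = margin_of_error(sigma, n, conf_level)
    return x_bar - e, x_bar + e

# Hypothetical sample summary.
E = margin_of_error(sigma=12, n=36)
low, high = mean_confidence_interval(x_bar=50, sigma=12, n=36)
print(round(E, 2), round(low, 2), round(high, 2))
```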
  10. The most conservative estimate of n
    When we have no prior information about p or q, we use p=.5 and therefore q=.5, so that the variance of p-hat, V(p-hat) = pq/n, is maximized.

    The sample size n obtained using p=.5 and q=.5 is called the most conservative estimate of n.
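    A minimal sketch of the calculation, assuming the usual formula n = (zα/2)²·p·q / E² rounded up to the next whole number. The function name and the 3% margin are illustrative.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_proportion(margin, conf_level=0.95, p=0.5):
    """Smallest n with z_{alpha/2} * sqrt(p*q/n) <= margin; p=0.5 is the conservative default."""
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    return ceil(z ** 2 * p * (1 - p) / margin ** 2)

n_conservative = sample_size_for_proportion(margin=0.03)        # no prior information
n_with_prior = sample_size_for_proportion(margin=0.03, p=0.2)   # prior guess p = .2

print(n_conservative)  # 1068
print(n_with_prior)    # 683
```

    Because pq is largest at p = q = .5, any other prior value of p gives a smaller required n, so the p = .5 answer is safe for every possible p.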
  11. Statistical Hypothesis
    A conjecture about the unknown population parameter(s). The conjecture may or may not be true. There are two types of statistical hypothesis for each situation, called the Null Hypothesis and the Alternative Hypothesis
  12. Null Hypothesis
    Denoted by H0, and states that the unknown population parameter is equal to a specific value. The Null Hypothesis always has an equal sign in it, and this is the hypothesis that is actually tested.
  13. Alternative Hypothesis
    Denoted by HA , and defined as the complement or negation or opposite of the Null Hypothesis (H0)
  14. Type I error
    The error of rejecting H0 when H0 is true. Its probability is denoted by α, which is also called the significance level of the test.
  15. Type II error
    The error of accepting (failing to reject) H0 when H0 is false. Its probability is denoted by β. Note: β≠(1-α)!
  16. The Power of the Test
    1-β, where β is the probability of a Type II error. For a fixed sample size n, β and α cannot both be reduced at the same time (one goes up when the other goes down). Increasing n increases the power of the test, because it allows β to be lowered without raising α.
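    The trade-off can be sketched for one specific case: an upper-tailed z test of H0: μ=μ0 against a particular true mean μ1 > μ0, with σ known. Under those assumptions β = Φ(zα − (μ1−μ0)√n/σ). The function name and the numbers are illustrative.

```python
from math import sqrt
from statistics import NormalDist

def power_upper_tailed_z(mu0, mu1, sigma, n, alpha=0.05):
    """Power (1 - beta) of the upper-tailed z test when the true mean is mu1."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha)      # critical value
    shift = (mu1 - mu0) * sqrt(n) / sigma        # how far mu1 sits from mu0, in standard errors
    beta = std_normal.cdf(z_alpha - shift)       # P(fail to reject H0 | mu = mu1)
    return 1 - beta

power_n25 = power_upper_tailed_z(100, 105, sigma=15, n=25)
power_n100 = power_upper_tailed_z(100, 105, sigma=15, n=100)               # larger n
power_small_alpha = power_upper_tailed_z(100, 105, sigma=15, n=25, alpha=0.01)  # smaller alpha

print(round(power_n25, 3), round(power_n100, 3), round(power_small_alpha, 3))
```

    With n fixed, lowering α lowers the power (raises β); raising n raises the power at the same α.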
  17. The Classical or Critical Value Approach to testing Hypothesis
    • I. Formulate H0 and HA
    • II. Select an appropriate test statistic (zcalculated)
    • III. Fix the level of significance (α) and formulate the decision rule
    • IV. Write your conclusion in words
  18. The Decision Rule
    Also known as the Critical Region or Rejection Region; it depends on HA and α. If HA is two-sided, we use zα/2 and -zα/2, or t(n-1, α/2) and -t(n-1, α/2). Otherwise we use zα or -zα, in the direction of HA
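    A minimal sketch of the decision rule for the z case only (the t case would need a t table or a statistics library). The function name and the labels "two-sided", "greater" and "less" are illustrative choices, not notation from the course.

```python
from statistics import NormalDist

def reject_h0(z_calc, alpha=0.05, alternative="two-sided"):
    """Return True when z_calc falls in the rejection region implied by HA and alpha."""
    std_normal = NormalDist()
    if alternative == "two-sided":            # HA: parameter != value
        return abs(z_calc) > std_normal.inv_cdf(1 - alpha / 2)
    if alternative == "greater":              # HA: parameter > value (upper tail)
        return z_calc > std_normal.inv_cdf(1 - alpha)
    if alternative == "less":                 # HA: parameter < value (lower tail)
        return z_calc < -std_normal.inv_cdf(1 - alpha)
    raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")

print(reject_h0(1.8))                          # False: 1.8 is inside (-1.96, 1.96)
print(reject_h0(1.8, alternative="greater"))   # True: 1.8 exceeds 1.645
```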
  19. The mean and standard deviation always have...
    The same units!
  20. the P-value
    • An alternative method to test H0. The p-value is the probability, assuming H0 is true, that the statistic zc would take a value as extreme as or more extreme than the value actually observed.
    • In fact, the p-value is the smallest significance level α at which H0 would be rejected for the observed data. Thus, we reject H0 if α>p-value.
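    For a z statistic, the p-value can be sketched directly from the standard normal distribution. The function name and the alternative labels are illustrative.

```python
from statistics import NormalDist

def p_value(z_calc, alternative="two-sided"):
    """Probability, under H0, of a z value at least as extreme as z_calc."""
    cdf = NormalDist().cdf
    if alternative == "two-sided":
        return 2 * (1 - cdf(abs(z_calc)))   # both tails
    if alternative == "greater":
        return 1 - cdf(z_calc)              # upper tail
    if alternative == "less":
        return cdf(z_calc)                  # lower tail
    raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")

alpha = 0.05
p_two_sided = p_value(1.8)
p_upper = p_value(1.8, alternative="greater")
print(round(p_two_sided, 4), alpha > p_two_sided)  # do not reject at alpha = .05
print(round(p_upper, 4), alpha > p_upper)          # reject at alpha = .05
```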
  21. Three methods for Testing Hypothesis
    • a) The Classical or Critical Value Approach
    • b) the p-value Approach
    • c) If HA is two-sided, the (1-α)100% confidence interval for μ (i.e., reject H0: μ=μ0 if μ0 does not lie in the (1-α)100% confidence interval)
  22. Assumptions for using (1-α)100% Confidence Interval for two populations when σ is known
    • 1) Two samples are random & independent
    • 2) Both samples came from two independent, normal populations
    • 3) σ1² and σ2² (equivalently σ1 and σ2) are known
  23. Assumptions for using (1-α)100% Confidence Interval for t-distribution
    • 1) Two samples are random & independent
    • 2) Both samples came from normal populations
    • 3) σ1² and σ2² (equivalently σ1 and σ2) are unknown but equal
  24. The point estimator for the unknown common variance σ² is
    The pooled sample variance, sp² = [(n1-1)s1² + (n2-1)s2²] / (n1+n2-2)
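    A short sketch of the standard pooled variance estimator used in the equal-variance two-sample t procedure, sp² = [(n1-1)s1² + (n2-1)s2²] / (n1+n2-2). The two samples are made-up numbers.

```python
from statistics import variance

def pooled_variance(sample1, sample2):
    """Weighted average of the two sample variances, weighted by degrees of freedom."""
    n1, n2 = len(sample1), len(sample2)
    s1_sq = variance(sample1)   # sample variance (divides by n1 - 1)
    s2_sq = variance(sample2)   # sample variance (divides by n2 - 1)
    return ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)

# Hypothetical data: s1^2 = 4 and s2^2 = 2.5.
sample1 = [4.0, 6.0, 8.0]
sample2 = [1.0, 2.0, 3.0, 4.0, 5.0]
sp_sq = pooled_variance(sample1, sample2)
print(sp_sq)  # (2*4 + 4*2.5) / 6 = 3.0
```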
  25. To test the hypothesis about unknown p1 & p2
    we combine (pool) the information given in both samples to compute the estimated variance of the difference p1-hat − p2-hat
  26. To construct a (1-α)100% confidence interval for p1 and p2
    we do NOT combine the information contained in both samples to compute the estimated variance
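    The difference between the two cases can be sketched as two standard-error functions, assuming the usual large-sample formulas: a pooled proportion under H0: p1=p2 for the test, and separate p-hats for the interval. Function names and counts are illustrative.

```python
from math import sqrt

def se_for_test(x1, n1, x2, n2):
    """Standard error of p1-hat - p2-hat under H0: p1 = p2, using the pooled proportion."""
    p_pooled = (x1 + x2) / (n1 + n2)
    return sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))

def se_for_interval(x1, n1, x2, n2):
    """Standard error of p1-hat - p2-hat for a confidence interval (no pooling)."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    return sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)

# Hypothetical counts: 60 successes out of 100 versus 40 out of 100.
se_test = se_for_test(60, 100, 40, 100)
se_ci = se_for_interval(60, 100, 40, 100)
print(round(se_test, 4), round(se_ci, 4))
```

    Pooling makes sense for the test because H0 asserts one common p; the interval makes no such assumption, so each sample keeps its own estimate.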
  27. A goodness of fit test
    Tests the Null Hypothesis that the observed frequencies follow a pattern or theoretical distribution. The test is called goodness-of-fit because the hypothesis tested is how well the observed frequencies fit a given pattern
  28. The χ² (chi-squared) goodness of fit test
    Used to test whether or not the sampled multinomial data is in agreement with the hypothesized distribution. OR: testing 3 or more unknown population proportions.
  29. In a χ² goodness of fit test, when is the Null Hypothesis rejected?
    A good agreement between the observed and expected frequencies results in a small value of χ². A perfect agreement would result in χ²=0. Thus the Null Hypothesis is rejected if χ² is large [upper tail test]
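    A minimal sketch of the statistic, assuming the usual form χ² = Σ (O − E)²/E with E = n·(hypothesized proportion). The observed counts are made up, and the critical value 5.991 (α = .05, 2 degrees of freedom) is taken from a standard χ² table rather than computed.

```python
def chi_square_gof(observed, hypothesized_props):
    """Sum of (O - E)^2 / E over all categories, with E = n * hypothesized proportion."""
    n = sum(observed)
    expected = [n * p for p in hypothesized_props]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical data: 3 categories, H0 proportions .25, .50, .25.
observed = [30, 50, 20]
hypothesized = [0.25, 0.50, 0.25]
stat = chi_square_gof(observed, hypothesized)

CRITICAL_05_DF2 = 5.991          # table value, assumed: alpha = .05, df = k - 1 = 2
reject = stat > CRITICAL_05_DF2  # upper tail test

print(stat, reject)  # 2.0 False
```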
  30. For tests of Independence between Criterion A and B...
    Ho: The two criteria A&B are
    INDEPENDENT or not related (HoI)
  31. For tests of Independence between Criterion A and B...
    HA: The two criteria A&B are
    DEPENDENT or related (HAD)
  32. For tests of independence, χ² is
    χ² = Σi Σj (Oij − Eij)² / Eij, with (r−1)(c−1) degrees of freedom for an r×c table
  33. For tests of Independence, Eij=
    (total of row i)(total of column j) / n, where n is the total sample size
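    A sketch of the calculation for a contingency table, assuming the standard forms Eij = (row i total)(column j total)/n and χ² = ΣΣ (Oij − Eij)²/Eij with (r−1)(c−1) degrees of freedom. The 2x2 table is made up.

```python
def expected_counts(table):
    """Expected frequency for each cell under independence."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def chi_square_independence(table):
    """Return the chi-square statistic and its degrees of freedom."""
    expected = expected_counts(table)
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

# Hypothetical 2x2 table of observed counts.
table = [[20, 30], [30, 20]]
expected = expected_counts(table)
stat, df = chi_square_independence(table)
print(expected, stat, df)  # every expected count is 25.0; stat = 4.0; df = 1
```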
  34. For a 2x2 Contingency table, testing for independence for two criteria is equivalent to testing
    H0: P1=P2 vs HA:P1≠P2
  35. Test of Homogeneity
    • A test of homogeneity involves testing the hypotheses
    • H0: the proportions of elements with certain characteristics in two or more different populations are the same
    • vs
    • HA: the proportions are not the same
  36. Analysis of Variance (ANOVA)
    A procedure used to test the Null Hypothesis that the means of three or more populations are equal.
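  37. The grand mean for all k-samples is
    The sum of all observations in all k samples divided by the total number of observations, n = n1+n2+...+nk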
  38. SST, SSB AND SSW must always be...
    non-negative, since they are sums of squares
  39. The point estimator for the unknown common variance σ² is
    Mean Square within (MSW)
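    The ANOVA quantities can be sketched in a few lines, assuming the standard one-way definitions: SSB compares sample means with the grand mean, SSW measures spread inside each sample, SST = SSB + SSW, MSW = SSW/(n−k), and F = MSB/MSW. The three samples are made up.

```python
def one_way_anova(samples):
    """Return the grand mean, sums of squares, mean squares and F for k samples."""
    k = len(samples)
    n = sum(len(s) for s in samples)
    grand_mean = sum(sum(s) for s in samples) / n
    means = [sum(s) / len(s) for s in samples]
    ssb = sum(len(s) * (m - grand_mean) ** 2 for s, m in zip(samples, means))
    ssw = sum((x - m) ** 2 for s, m in zip(samples, means) for x in s)
    sst = sum((x - grand_mean) ** 2 for s in samples for x in s)
    msb = ssb / (k - 1)
    msw = ssw / (n - k)   # point estimate of the common variance
    return {"grand_mean": grand_mean, "SSB": ssb, "SSW": ssw, "SST": sst,
            "MSB": msb, "MSW": msw, "F": msb / msw}

# Hypothetical data: k = 3 samples of size 3.
result = one_way_anova([[1, 2, 3], [2, 3, 4], [6, 7, 8]])
print(result)  # grand mean 4.0, SSB 42.0, SSW 6.0, SST 48.0, MSW 1.0, F 21.0
```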
  40. Assumptions for ANOVA
    • a. k-samples are random and independent
    • b. Each of the k-samples came from a normal population
    • c. σ1², σ2², σ3², ..., σk² are unknown but equal
  41. The Statistical Linear Regression model is
    y = A + Bx + ε, where A is the y-intercept, B is the slope, and ε is the random error term
  42. SSxx or SSyy must always be...
    positive, since each is a sum of squares
  43. Sxy can be...
    positive or negative
  44. To Construct a (1-α)100% Confidence Interval for unknown B, use
    b ± t(n-2, α/2)·sb , where sb = se/√SSxx and se is the standard deviation of errors
  45. When solving for b in the estimated regression model...
    write the equation out in general form first. If it is written as "a-bx", that means "a+(-b)x", which implies b is negative
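    A small sketch of the least squares estimates, assuming the usual formulas b = SSxy/SSxx and a = y-bar − b·x-bar. The data are made up so that the fitted line is y = 12 − 2x, which in general form is 12 + (−2)x.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the estimated line y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_xx = sum((x - x_bar) ** 2 for x in xs)                       # always >= 0
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))  # can be negative
    b = ss_xy / ss_xx
    a = y_bar - b * x_bar
    return a, b

# Hypothetical data lying exactly on y = 12 - 2x.
xs = [1, 2, 3, 4, 5]
ys = [10, 8, 6, 4, 2]
a, b = least_squares_line(xs, ys)
print(a, b)  # 12.0 -2.0
```

    Here SSxy is negative, so b is negative, matching the "a-bx" reading.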
  46. The population correlation
    denoted by ρ and defined as a measure of the strength of the linear relationship between two variables, x&y.
  47. The sample correlation
    denoted by r. Since the population correlation ρ is usually unknown, the point estimator of the population correlation is the sample correlation r.
  48. The range for ρ is
    -1 ≤ ρ ≤ 1 (the sample correlation r has the same range)
  49. if r~1
    we have a nearly perfect positive linear relationship between x&y. That is, if x increases, y increases.
  50. if r~-1
    we have a nearly perfect negative linear relationship between x&y. That is, if x increases, y decreases, and if x decreases, y increases
  51. if r~0
    the variables x&y are not linearly related
  52. Coefficient of Determination
    denoted by ρ² for the population and r² for the sample. But ρ (and hence ρ²) is usually unknown. The coefficient of determination for the sample represents the % of variation in the dependent variable y explained by the independent variable x in the estimated least squares regression model ŷ = a + bx. The higher the value of r², the better the model fits the data.
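    The sample correlation and coefficient of determination can be sketched from the same sums of squares, assuming the usual formula r = SSxy / √(SSxx·SSyy). The data are made up.

```python
from math import sqrt

def sample_correlation(xs, ys):
    """r = SSxy / sqrt(SSxx * SSyy); always between -1 and 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    ss_yy = sum((y - y_bar) ** 2 for y in ys)
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return ss_xy / sqrt(ss_xx * ss_yy)

# Hypothetical data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = sample_correlation(xs, ys)
r_squared = r ** 2   # share of the variation in y explained by x
print(round(r, 4), round(r_squared, 2))  # 0.7746 0.6
```

    In this example about 60% of the variation in y is explained by the fitted line.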
  53. The test statistic for H0: ρ = 0 is
    t = r√(n-2) / √(1-r²), with n-2 degrees of freedom
  54. If no value of α is given, use
    α = .05
  55. For sample correlation r, approximately equal to 1...
    treat r as approximately 1 (r≈1) for values of r > .8 (80%)
Study help for the Stats II final