-
1. When do we use 2-sample t, and when do we use paired t?
2. What is the major difference between paired-t and 2-sample t in the PLAN stage?
3. What are the conditions for using 2-sample t?
1. 2-sample t is used when we have two unrelated samples
paired t is used when we have two related samples
2. With 2-sample t, we won't do the subtraction right away like we do in paired-t.
- ex. 2-sample t: Ho: μM ≤ μF
- Ha: μM > μF
- ex. paired t: Ho: μA - μB ≤ 0
- Ha: μA - μB > 0
3.1. We must have two unrelated representative samples
3.2. There should be no outliers in the samples. (Check each sample separately)
3.3. Each population should be normal. (Check each sample separately)
3.4. Outliers and non-normality do not matter if the total number of readings is large (n1 + n2 ≥ 40)
-
1. What do we have to do differently in 2-sample t compared to paired t when checking for outliers and normality?
2. What do you do in Minitab when both samples for 2-sample t are in one column? (See ex. 21:53, p.505)
3. What do you do in Minitab when both samples for 2-sample t are in different columns?
1. Check for outliers by making a box-plot of each sample
Check for normality by drawing a histogram of each sample.
If the tallest bar in the histogram of a sample is not at the extreme left or right, then it's likely safe to assume that the population is normal.
- 2. If both samples are in one column, then use 'descriptive stats' with the "by" variable option. Hit "graphs" and select boxplot and histogram.
3. If the samples are in different columns, then either use descriptive stats and "graphs" or select "graph" in the Minitab toolbar.
-
Complete the FOUR STEP PROCESS
- STATE:
- How strong is the evidence that the average number of drinks all second year male students at this college claim to consume per sitting exceeds the average number of drinks all female second year students at this college claim to consume per sitting?
- PLAN:
- μM = Average number of drinks claimed to be consumed by all second year male students at this college per sitting.
μF = Average number of drinks claimed to be consumed by all second year female students at this college per sitting.
- After rewriting
- Ho: μF - μM ≥ 0
- Ha: μF - μM < 0
We'll use 2-sample t because we have two unrelated samples
- SOLVE:
- Conditions for 2-sample t
- 1. 2 unrelated representative samples
- 2. No outliers (check each boxplot separately)
- 3. Normal population (check each histogram separately)
- 4. If n1 + n2 ≥ 40, then disregard #2 and #3.
- Checking the conditions
- 1. The text tells us to assume SRS.
- 2-3. Since the sample size is over 40, outliers and non-normality don't matter.
- Test statistic: t = -5.19
- p-value: 0.000
- CONCLUDE:
- Since the p-value is less than 0.01, we have very strong evidence that the average number of drinks claimed to be consumed by all second year male students at this college per sitting exceeds the average number of drinks claimed to be consumed by all second year female students at this college per sitting.
-
1. When using 2-sample t and both samples are in the same column, what MUST you check in Minitab to ensure that you have the correct hypothesis written down?
2. What are the steps to ensure you do this correctly in Minitab?
(Use our hypothesis that the average number of drinks consumed by male students exceeds the average number of drinks consumed by female students).
3. After completing these steps in Minitab, what do we need to change in our FOUR STEP PROCESS?
- 1. You need to check which variable comes first.
-
In the example about drinking on campus, the data for female students comes first (stacked above the data for male students), so Minitab will subtract male from female when doing the calculations (F - M).
- 2a. First, go to 2-sample t, and pull down the menu to select 'Both samples are in one column'.
- 2b. Since our hypothesis is Ha: μM > μF, we would like to use a > alternative, but since Minitab put the data for female students on top, we have to use a < alternative, which states the same conclusion in reverse: the number of drinks consumed by female students is less than the number consumed by male students.
- We have to change the Alternative hypothesis in Minitab to 'Difference < Hypothesized difference'.
-
- Doing it this way will give us a p-value of 0.000
- If we were to do a > equation, then the p-value would be 1.000 and we would wrongly claim that there is no evidence that male students drink more than female students.
-
- *The point estimate is the 'difference'. Here, it is μF - μM = -2.245. If we were to calculate μM - μF instead, the difference would be positive: 2.245.
- 3. We have to rewrite the PLAN step from
- Ho: μM ≤ μF
- Ha: μM > μF
to
- Ho: μF - μM ≥ 0
- Ha: μF - μM < 0
-
1a. What is the point estimate?
1b. Interpret the point estimate for the example about student drinking
2. When estimating a confidence interval in Minitab for 2-sample t, what two steps do we need to remember?
3a. Use a 90% confidence interval to estimate the difference in the average number of drinks consumed by male and female students
3b. Interpret the estimate.
1a. The point estimate of μF - μM = -2.245
1b. I estimate that the average number of drinks claimed to be consumed by all second year male students at this college per sitting exceeds the average number of drinks claimed to be consumed by all second year female students at this college per sitting by 2.245 drinks.
2. In the 'options' section, make sure that 'Hypothesized difference' is set to 0. *It will always be 0 for this class*
- Make sure that 'Alternative hypothesis' box is set to 'Difference ≠ hypothesized difference'
-
3a. 90% CI for μF - μM: (-2.961, -1.529)
3b. I estimate that the average number of drinks claimed to be consumed by all second year male students at this college per sitting exceeds the average number of drinks claimed to be consumed by all second year female students at this college per sitting by somewhere between 1.529 and 2.961 drinks.
-
1. What are the two formulae for determining SEM?
2. How would you obtain the data for this problem on Minitab? (3 parts)
3a. What is a 90% CI for the above example?
3b. Interpret the CI
1. SEM = s/√n or s = SEM(√n)
- 2. Since we have no data, we have to use 2-sample t 'summarized data'.
-
Since SEM is given instead of s, we have to do the calculation s = SEM(√n)
- s1 = 0.05(5) = 0.25
- s2 = 0.03(5) = 0.15
-
- Now we can plug the information (sample size, sample mean and std. dev. for each sample) in the table
-
3a. A 90% CI for μA - μN: (0.0318, 0.2282)
3b. I estimate that the average percentage of time zoned out by all who consume alcohol exceeds the average percentage of time zoned out by all who don't consume alcohol by between 3.18% and 22.82%.
*If you have a fraction, make sure to convert it into a percentage when doing the interpretation.
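The SEM-to-standard-deviation step above can be sketched in Python. This is only an illustration: the function name `sd_from_sem` is made up here, and n = 25 is an assumption inferred from the factor of 5 (√n) used in the arithmetic above.

```python
import math

def sd_from_sem(sem, n):
    """Recover the sample standard deviation from SEM = s / sqrt(n)."""
    return sem * math.sqrt(n)

# Assumption: n = 25 per sample, inferred from the factor 5 = sqrt(n) in the notes.
n = 25
s1 = sd_from_sem(0.05, n)
s2 = sd_from_sem(0.03, n)
print(round(s1, 2), round(s2, 2))  # 0.25 0.15
```

These are the values that would go into the std. dev. boxes of the 'summarized data' dialog.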
-
1. Give examples of inference for a population percentage or proportion? (4)
2. What is "behind every percentage?"
Use the above examples to demonstrate this.
3. What is a major difference between estimating averages and percentages in terms of the data collected?
1a. Estimate the percentage of UFV students who are less than 18 years old
1b. Compare the percentage of overweight people in the Fraser Valley with the 2004 figure of 41% for all of Canada.
1c. Estimate the percentage of people at UFV who are left-handed
1d. Estimate the percentage of UFV students who brought a vehicle to campus today.
2. Behind every percentage is a closed question - a question that has a Yes or No answer.
- a. Are you less than 18 years old? Yes or No
- b. Are you overweight? Yes or No
- c. Are you left-handed? Yes or No
- d. Did you bring a vehicle to campus today? Yes or No
3. In order to deal with a percentage, the data collected are categorical. Each individual yields a Yes or a No
In order to deal with an average, the data collected are quantitative. Each individual yields a number.
-
1. Describe the parameter we use when estimating population percentages, and how it is denoted?
2. In general, what do we use to estimate this parameter, and how is it denoted?
1. The parameter we want to estimate is denoted as p.
p = population proportion
It is the fraction of yes answers in the population
2. We estimate p by the sample proportion, denoted as p̂ (p hat).
p̂ = the fraction of yes answers in the sample.
-
1. What is the formula for determining the fraction of yes answers in a sample, and what does each part stand for?
2. How would you go about determining the percentage of left-handed people on campus by using the class as a sample? (4 steps)
3. Interpret the estimate
1. p̂ = x / n
p̂ = the fraction of yes answers in the sample
x = the number of yes answers in the sample
n = the number of individuals in the sample
2a. Determine the number of individuals in the class (35)
b. Ask how many people are left-handed. (2)
- c. Use the formula
- p̂ = x / n
- d. Multiply by 100
- 0.057 x 100 = 5.7%
3. We estimate that 5.7% of all people on campus are left-handed.
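As a quick check of the four steps above, here is a minimal Python sketch using the class counts from this card (35 people, 2 left-handed). The helper name `sample_proportion` is just for illustration.

```python
def sample_proportion(x, n):
    """p-hat = x / n: the fraction of yes answers in the sample."""
    return x / n

p_hat = sample_proportion(2, 35)      # 2 left-handed out of 35 in the class
percent = round(p_hat * 100, 1)       # convert the proportion to a percentage
print(round(p_hat, 3), percent)       # 0.057 5.7
```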
-
1. What do we do when estimating physiological variables?
2. What is the difference between a percentage and a proportion?
3. What do you need to do when you go to interpret a proportion (fraction)?
1. Ask if there is any reason that the sample would be different than the larger population. Generally, you can infer about physiological variables to larger populations because there is generally nothing unique about the people in the sample.
- Like the % of left-handed people on campus.
- But always use your intuition on this one.
2. A percentage is a number between 0 and 100
A proportion is a number between 0 and 1.
- Multiply proportion by 100 to get percentage.
- Divide percentage by 100 to get proportion.
- 3. Convert it into a percentage.
- All interpretations should be in percentages because it is easier for people to understand.
-
1. What are the 2 conditions for using the "large sample" formula for calculating CI?
2. What is the formula for the large sample confidence interval? (don't need to memorize, just recognize)
3. How do you calculate a confidence interval using the large sample formula in Minitab?
1a. The sample is representative of the population
1b. The sample must contain at least 15 'yes' answers and at least 15 'no' answers.
*Minimum 30, and at least 15 of each*
- 3. Go to 'basic statistics' and then 1-proportion
-
Number of events = insert number of yes answers (x)
- Number of trials = insert number in the sample (n)
-
- Click 'options' and change the drop down box to 'normal approximation'. Also put in your confidence level.
-
- Click 'ok' and you will get your CI
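Question 2 on this card asks for the formula. The large-sample confidence interval for p is usually written as below, where z* is the critical value for the chosen confidence level (the notation is assumed to match the p̂, n, and z* used elsewhere in these notes):

```latex
\hat{p} \;\pm\; z^{*}\sqrt{\frac{\hat{p}\,(1-\hat{p})}{n}}
```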
-
1. In this question, what is the population and what is p?
2. Interpret the estimate
3.
Use a 90% confidence interval to estimate the percentage of staph infections amongst all patients admitted to hospitals for surgery.
4. Interpret the interval
5. Is the interval valid?
1. The population consists of all surgery patients.
p is the fraction of surgery patients who have staph
2. p̂ = x / n
p̂ = 1251 / 6771 = 0.18476
0.18476 x 100 = 18.48%
We estimate that 18.48% of all surgery patients have staph.
3. 90% CI for p: (0.177001, 0.192516).
4. We estimate that between 17.70% and 19.25% of all surgery patients have staph.
5. The conditions for the large sample CI formula are:
- 1. The sample is representative of the population
- 2. The sample must contain at least 15 yes answers and at least 15 no answers.
- Checking the conditions
- 1. The text tells us it uses a random sample.
- 2. There are 1251 yes answers in the sample, and 5520 no answers
*subtract yes answers from sample size to get number of no answers*
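The staph interval can be reproduced outside Minitab with a short Python sketch. It assumes the usual normal-approximation interval, p̂ ± z*·√(p̂(1 − p̂)/n); the function name `large_sample_ci` is hypothetical, and the 15-and-15 check mirrors the condition stated on this card.

```python
import math
from statistics import NormalDist

def large_sample_ci(x, n, conf_level):
    """Normal-approximation CI for a proportion: p-hat +/- z* sqrt(p-hat(1-p-hat)/n)."""
    if x < 15 or n - x < 15:
        raise ValueError("need at least 15 yes answers and 15 no answers")
    p_hat = x / n
    # z* leaves (1 - conf_level) / 2 in each tail of the standard normal
    z_star = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    margin = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

lo, hi = large_sample_ci(1251, 6771, 0.90)
print(f"{lo:.4f} to {hi:.4f}")  # 0.1770 to 0.1925
```

Multiplying both ends by 100 gives the percentages used in the interpretation.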
-
1. If the conditions for the large sample method are violated, what do we do?
2. Are the conditions for the use of the large-sample CI met? Explain
3. Give a 90% confidence interval for the proportion of those receiving gastric bypass surgery that maintained at least a 20% weight loss six years after surgery
4. Interpret the interval
1. Don't use the formula. If it is not representative, throw it out.
If it is representative but the sample size is too small (less than 15 yes answers and/or less than 15 no answers), then we need to use the plus-four method.
2. First, we need to find out how many yes and no answers we have.
We are only given a percentage (76%), so we'll have to turn it into a proportion.
Yes answers = 418 x 0.76 = 317.68
Now round that number to the nearest whole number
- Yes answers: 318
- No answers: 418 - 318 = 100
Checking the conditions
1. No information is given about the selection process of the 418 individuals in the sample. In practice we would need to contact the person who collected the data to find out.
2. We have 318 yes answers and 100 no answers in the sample, so it is ok to use the large-sample method.
3. 90% CI for p: (0.726442, 0.795088)
4. I estimate that between 72.64% and 79.51% of all extremely obese people who receive gastric bypass surgery maintain at least 20% weight loss six years after surgery.
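The count reconstruction and condition check in this answer can be sketched in Python as follows. The variable names are illustrative, and the interval uses the same normal-approximation formula that Minitab's 'normal approximation' option is assumed to apply.

```python
import math
from statistics import NormalDist

n = 418
yes = round(n * 0.76)   # 317.68 rounds to 318 yes answers
no = n - yes            # remaining individuals are no answers

# Large-sample condition from the notes: at least 15 of each
conditions_ok = yes >= 15 and no >= 15

p_hat = yes / n
z_star = NormalDist().inv_cdf(0.95)   # 90% confidence leaves 5% in each tail
margin = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
lo, hi = p_hat - margin, p_hat + margin
print(yes, no, conditions_ok, f"{lo:.4f} to {hi:.4f}")
```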
-
1. What are the 2 conditions for using the plus-four method?
2. What is the formula for getting a point estimate using the plus-four method?
3. How do you get a confidence interval on Minitab using the plus-four method?
1a. The sample is representative of the population
1b. The original sample must contain at least 10 individuals (n ≥ 10).
- 2. p̃ = (x + 2) / (n + 4)
- *We add 4 imaginary individuals to the equation. 2 yes answers, and 2 no answers.
3. Use Minitab 1-proportion
For events, enter the adjusted number of yes answers in the sample.
Adjust the number of yes answers (x + 2)
For trials, enter the adjusted sample size.
Adjusted sample size = (n + 4)
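A minimal Python sketch of the plus-four adjustment follows. The function name `plus_four_ci` is made up for illustration, and the example reuses the left-handed class counts from an earlier card (2 yes answers out of 35) only because they have fewer than 15 yes answers; it is not a worked example from the notes.

```python
import math
from statistics import NormalDist

def plus_four_ci(x, n, conf_level):
    """Plus-four CI: add 2 imaginary yes and 2 imaginary no answers, then use the usual formula."""
    if n < 10:
        raise ValueError("plus-four method needs n >= 10")
    x_adj, n_adj = x + 2, n + 4          # what you would type into events / trials
    p_tilde = x_adj / n_adj
    z_star = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    margin = z_star * math.sqrt(p_tilde * (1 - p_tilde) / n_adj)
    return p_tilde, p_tilde - margin, p_tilde + margin

p_tilde, lo, hi = plus_four_ci(2, 35, 0.90)
print(round(p_tilde, 4), round(lo, 4), round(hi, 4))
```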
-
1. In what practical situation is the plus-four method very useful?
2. If the conditions are met for both the large sample method and the plus-four method, which one do you use?
1. When estimating the incidence of an extremely rare disease, where there will be very few yes answers.
2. If the conditions are met for the large sample method, it doesn't matter which one you use
-
1. What is the formula for calculating the sample size for achieving a given margin of error in a CI for p?
2. What does p* stand for?
3. What do you do if p* is not available?
1. n ≥ (z*/m)² p*(1-p*)
*Put on cheat sheet side by side the other equation for estimating sample size
2. p* is a previous estimate of p if available.
It usually will come from a pilot study where they have an idea of what the population proportion will be
3. If p* is not available, use p* = 0.5.
-
1. Starting with a 75% estimate for Italians, how large a sample must you collect in order to estimate the proportion of PTC tasters within ±0.04 with 90% confidence?
*From z table, z* at 90% confidence is 1.645
2. Estimate the sample size required if you made no assumptions about the value of the proportion who could taste PTC.
- 1. You get 1 point for writing out the formula!
- n ≥ (z*/m)² p*(1-p*)
n ≥ (1.645 / 0.04)² 0.75(1 - 0.75)
n ≥ (1691.2656)(0.1875)
n ≥ 317.112 - Round up
n ≥ 318
- 2. You get 1 point for writing out the formula!
- n ≥ (z*/m)² p*(1-p*)
n ≥ (1.645 / 0.04)² 0.5(1 - 0.5)
n ≥ (1691.2656)(0.25)
n ≥ 422.8164 - Round Up!
n ≥ 423
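The two PTC calculations above can be checked with a small Python sketch of the formula n ≥ (z*/m)² p*(1-p*). The function name is hypothetical; the default p* = 0.5 follows the rule in these notes for when no prior estimate is available.

```python
import math

def sample_size_for_proportion(z_star, m, p_star=0.5):
    """Smallest whole n with n >= (z*/m)^2 * p*(1 - p*); always rounds up."""
    return math.ceil((z_star / m) ** 2 * p_star * (1 - p_star))

n_with_guess = sample_size_for_proportion(1.645, 0.04, p_star=0.75)
n_no_guess = sample_size_for_proportion(1.645, 0.04)
print(n_with_guess, n_no_guess)  # 318 423
```

Using p* = 0.5 gives the largest possible value of p*(1-p*), which is why the no-assumption sample size is the bigger of the two.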
-
1a. Survey companies like Angus Reid often say that the estimate of a percentage is "accurate to within 3 percentage points 19 times out of 20." This is a stock phrase.
What does 19 times out of 20 tell us?
1b. What does "accurate to within 3 percentage points" tell us?
2. What sample size will this get us?
z* at 95% confidence = 1.960
- 1a. That they are using a 95% confidence level.
- 19/20 x 100 = 95
1b. That the margin of error is 3%, or m=0.03
2. A sample size just over 1000.
- You get 1 point for writing out the formula!
- n ≥ (z*/m)² p*(1-p*)
n ≥ (1.96/0.03)² 0.5(1-0.5)
n ≥ (4268.4444)(0.25)
n ≥ 1067.111 -> Round up
n ≥ 1068
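To see why the answer is rounded up rather than to the nearest whole number, the sketch below runs the formula in reverse and computes the margin of error for n = 1067 and n = 1068. It assumes the worst case p* = 0.5 used above; `margin_of_error` is an illustrative name.

```python
import math

def margin_of_error(z_star, n, p_star=0.5):
    """Margin of error m = z* sqrt(p*(1 - p*) / n) for a planned sample of size n."""
    return z_star * math.sqrt(p_star * (1 - p_star) / n)

m_1067 = margin_of_error(1.96, 1067)
m_1068 = margin_of_error(1.96, 1068)
print(m_1067 > 0.03, m_1068 < 0.03)  # True True
```

With 1067 people the margin is still slightly above 0.03, so 1068 is the smallest sample that meets the target.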
-
We know that students are reluctant to report cheating. Suppose that we are interested in the percentage of students who would report cheating if they saw it. Suppose we suspect that the percentage of undergrad students who would report cheating is less than 20%.
FOUR STEP PROCESS
STATE: How strong is the evidence that less than 20% of all undergrad students would report cheating if they saw it?
- PLAN:
- p = proportion of all undergrad students who would report cheating if they saw it.
We'll use 1-proportion because we have 1 sample and we are dealing with a proportion.
- SOLVE:
- State the conditions:
- 1. Representative sample
- 2. npo ≥ 10 and n(1-po) ≥ 10
- Checking the conditions
- 1. The text tells us they used an SRS
- 2. n = 172, po = 0.2
(172)(0.2) = 34.4 and 172(1-0.2) = 137.6
The test is valid
- Test stat = -2.94
- p-value = 0.002
- CONCLUDE:
- Since the p-value is less than 0.01, we have very strong evidence that less than 20% of all undergrad students would report cheating if they saw it.
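The condition check and the p-value in the SOLVE step can be sketched in Python. The number of yes answers is not given in these notes, so the sketch starts from the reported test statistic (-2.94) rather than recomputing it, and it assumes the statistic is compared against the standard normal distribution, as in the 'normal approximation' method.

```python
from statistics import NormalDist

n, p0 = 172, 0.2

# Conditions from the notes: n*p0 >= 10 and n*(1 - p0) >= 10
expected_yes = n * p0
expected_no = n * (1 - p0)
valid = expected_yes >= 10 and expected_no >= 10

# Ha: p < 0.2 is a lower-tail test, so the p-value is the area to the left of the statistic
test_stat = -2.94
p_value = NormalDist().cdf(test_stat)
print(valid, round(p_value, 3))  # True 0.002
```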
-
1. What do you have to do differently when doing a significance test for a proportion in the STATE step compared to other significance tests?
2. What do you have to remember when doing a significance test for a proportion in the PLAN stage?
1. Don't use the word average, because we're estimating a percentage instead.
ex. How strong is the evidence that less than 20% of all undergrad students would report cheating if they saw it?
2. Use a fraction (0.2) instead of a percentage (20%) both when writing out the plan stage, and for computing in Minitab.
- ex.
- Ho: p ≥ 0.2
- Ha: p < 0.2
-
1. What are the conditions for using Minitab's 1-proportion to conduct a significance test for a proportion?
2. What are the steps to doing a significance test on Minitab?
Use the following example:
1a. The sample is representative of the population
1b. npo ≥ 10 and n(1-po) ≥ 10
* po is the proportion stated in the null hypothesis
2. Go to 1-proportion
Put in number of 'yes' answers in events box
Put in sample size (n) in trials box
- Click on 'perform hypothesis test' box, and put in the hypothesized proportion as a fraction
-
- Click on 'options' and put in the appropriate equation. This time, it is a 'less than' equation.
- Change 'method' to 'normal approximation' instead of exact. VERY IMPORTANT
-
- click 'ok'
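The same steps can be sketched in Python for checking a Minitab result by hand. This assumes the usual one-proportion z statistic, z = (p̂ - po) / √(po(1-po)/n). The function name and the counts in the usage line (30 yes answers out of 200) are hypothetical and are not the example this card refers to.

```python
import math
from statistics import NormalDist

def one_proportion_z_test(x, n, p0, alternative="less"):
    """One-proportion z test (normal approximation). Returns (z, p_value)."""
    if n * p0 < 10 or n * (1 - p0) < 10:
        raise ValueError("need n*p0 >= 10 and n*(1 - p0) >= 10")
    p_hat = x / n
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    lower_tail = NormalDist().cdf(z)
    if alternative == "less":
        p_value = lower_tail
    elif alternative == "greater":
        p_value = 1 - lower_tail
    else:  # two-sided: "not equal"
        p_value = 2 * min(lower_tail, 1 - lower_tail)
    return z, p_value

# Hypothetical counts, for illustration only
z, p_value = one_proportion_z_test(30, 200, 0.2, alternative="less")
print(round(z, 2), round(p_value, 3))
```

Choosing the wrong direction for `alternative` has the same effect as picking the wrong inequality in Minitab's options: the p-value becomes one minus the intended value.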