1. Define inference
2. What is the difference between a parameter and a statistic?
3. Describe the two branches of inference
1. Inference is the process of generalizing from a sample to a population.
2. A parameter is a number that describes a population, such as μ and σ.
A statistic is a number that describes a sample, such as x̅ and s.
*Generally, parameters are not known. We use statistics to estimate them.
3a. Estimation: Using statistics to produce estimates of parameters. For example, we use the sample mean (x̅) to estimate the population mean (μ). Likewise, we use the sample standard deviation (s) to estimate the population standard deviation (σ).
*In estimation, we do not have any preconceived idea about the parameter.
3b. Testing (i.e., significance testing): We first identify a preconceived idea that we have about the parameter. Then we use the data in the sample to measure the strength of evidence for the preconceived idea.
ex. Researchers think that .....
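For the estimation branch (3a), here is a minimal Python sketch of using statistics to estimate parameters. The IQ scores are made-up values for illustration only, and the variable names are not from the course.

```python
import statistics

# Hypothetical sample of IQ scores (made-up data for illustration only).
sample = [102, 115, 98, 121, 110, 107, 95, 118]

x_bar = statistics.mean(sample)   # sample mean, our estimate of mu
s = statistics.stdev(sample)      # sample standard deviation (n - 1 divisor), our estimate of sigma

print(f"x-bar = {x_bar:.2f}, s = {s:.2f}")
```

Here x̅ and s are statistics (they describe the sample); μ and σ remain unknown parameters.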
1. What MUST we have in order to obtain a reasonable estimate of a population parameter?
2. What is the difference between a point estimate, and a confidence interval?
Give an example of each.
3. What form does a confidence interval take?
1. We must have a representative sample from the population of interest.
2. A point estimate is a single number that estimates a parameter.
ex. See you at the coffee shop at 5:00pm.
A confidence interval, by contrast, is a range of numbers that estimates a parameter.
ex. See you at the coffee shop at 5:00pm, plus or minus 10 minutes.
3. point estimate ± margin of error.
1. How would we 'interpret the point estimate' if, from our sample of 74 seventh-grade girls, we found the mean IQ (x̅) to be 110.73?
2. What does the margin of error in a confidence interval take into account, and what does it NOT take into account?
3. What is the basic formula for a confidence interval for μ?
4. What does this formula assume, and what could we do differently if we could not make this assumption?
1. We estimate that the average IQ of all seventh-grade girls in the district is 110.73.
2. The margin of error takes into account the fact that there is variation amongst random samples (i.e., they're not all the same).
The margin of error does NOT take care of bias in sampling. If the sample suffers from under-coverage, non-response, or response bias, throw it away. It is useless!
3. x̅ ± z* (σ/√n)
z* is a standard normal value from the z* line of the t-table
- For example, a 90% confidence interval has a z* score of 1.645 according to the table.
4. This formula requires that we know the value of σ. We CANNOT use s, the standard deviation of a sample in place of σ.
If we know s instead of σ, we can use the formula:
x̅ ± t* (s/√n)
*This may be easier in many cases, because Minitab can always calculate s from the raw data, whereas we will need to be given the σ of a population.
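As a rough check of the z formula outside Minitab, the sketch below applies x̅ ± z*(σ/√n) to the IQ example in these notes (x̅ = 110.73, n = 74, known σ = 11). The function name `z_interval` is made up for illustration; z* is taken from Python's standard normal distribution rather than from the table.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(x_bar, sigma, n, confidence):
    """Return (low, high) for x_bar ± z* (sigma / sqrt(n))."""
    # z* leaves (1 - confidence) / 2 in each tail of the standard normal curve.
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin

low, high = z_interval(x_bar=110.73, sigma=11, n=74, confidence=0.90)
print(f"90% CI: ({low:.2f}, {high:.2f})")  # 90% CI: (108.63, 112.83)
```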
1. How would you go about finding a 90% confidence interval (CI) for the average IQ of all seventh-grade girls in the school district using Minitab? (3 Steps)
2. Interpret the confidence interval above.
- 1a. Stat - Basic Statistics - 1-sample Z
- 1b. Make sure 'One or more samples, each in a column' is selected - Select IQ - 'Known Standard deviation' type in 11 - Click on 'Options'
- 1c. in 'confidence level' box, type in 90.
As Minitab tells us, the 90% CI is (108.63, 112.83)
2. We estimate that the average IQ of all seventh-grade girls in the school district is somewhere between 108.63 and 112.83.
**Confirm that this is the correct way of interpreting CI
How would you use the t-table to find t* when using the formula x̅ ± t*(s/√n)? (3 steps)
n = 20
1a. In t-table, look down the 90% column.
1b. Find degrees of freedom
*Degrees of freedom = n - 1, and is always a whole number
20 - 1 = 19
- 1c. In 90% CI column, go down to row 19 to find t*
t* = 1.729
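Once t* has been read from the table, the t formula can be applied by hand or with a short sketch like the one below. The values of x̅ and s here are hypothetical (the notes give only n = 20 and t* = 1.729), and `t_interval` is an illustrative name.

```python
from math import sqrt

def t_interval(x_bar, s, n, t_star):
    """Return (low, high) for x_bar ± t* (s / sqrt(n))."""
    margin = t_star * s / sqrt(n)
    return x_bar - margin, x_bar + margin

n = 20
df = n - 1          # degrees of freedom = n - 1 = 19
t_star = 1.729      # from the t-table: 90% column, row 19

# x_bar = 50.0 and s = 8.0 are made-up summary values for illustration.
low, high = t_interval(x_bar=50.0, s=8.0, n=n, t_star=t_star)
print(f"df = {df}, 90% CI: ({low:.2f}, {high:.2f})")
```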
What is the effect of increasing the level of confidence, say from 90% to 95%?
- The margin of error gets bigger.
- The intervals get longer/wider
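To see this numerically, the sketch below computes the margin of error z*(σ/√n) at three confidence levels, reusing σ = 11 and n = 74 from the IQ example. The dictionary name `margins` is just for illustration.

```python
from math import sqrt
from statistics import NormalDist

sigma, n = 11, 74   # values from the IQ example in these notes

margins = {}
for confidence in (0.90, 0.95, 0.99):
    # z* grows as the confidence level grows, so the margin grows too.
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margins[confidence] = z_star * sigma / sqrt(n)
    print(f"{confidence:.0%}: z* = {z_star:.3f}, margin of error = {margins[confidence]:.2f}")
```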
1. In order to use the 1-sample Z formula, x̅ ± z*(σ/√n), what is the theoretical requirement?
2. What are the two ways of knowing this?
3. If we don't know the shape of the population, how can we find out, and what is a potential problem in doing this?
1. That the sample mean (x̅) has at least an approximately normal distribution.
2. Either the population is normal, or the sample is large enough (roughly n ≥ 30)
3. We can draw a histogram of the sample, but histograms of small samples are often very different from histograms of the population from which the sample came.
So we can't trust the histogram of a small sample to tell us much about the shape of the population.
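The "large enough sample" route in answer 2 can be illustrated with a small simulation. This is only a sketch under assumed settings: the population is a made-up, strongly right-skewed one (exponential with μ = 1 and σ = 1), and n = 30 with 5000 repetitions is an arbitrary choice.

```python
import random
import statistics
from math import sqrt

random.seed(1)
n, reps = 30, 5000
mu, sigma = 1.0, 1.0   # mean and SD of an exponential population with rate 1

# Draw many random samples of size n and record each sample mean.
sample_means = [
    statistics.fmean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(reps)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
theoretical_sd = sigma / sqrt(n)

# If x-bar is roughly normal, about 90% of sample means fall within 1.645 SDs of mu.
within = sum(abs(m - mu) <= 1.645 * theoretical_sd for m in sample_means) / reps

print(f"mean of x-bars = {mean_of_means:.3f}")
print(f"SD of x-bars   = {sd_of_means:.3f} (sigma/sqrt(n) = {theoretical_sd:.3f})")
print(f"share within 1.645 SDs of mu = {within:.3f}")
```

Even though the population is far from normal, the sample means behave roughly as a normal distribution would predict.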
1. What are the four conditions for using 1-sample z?
2. How can we ensure that these four conditions are met? (2 steps)
3. What does it mean when a question on the test asks if the intervals are valid?
1a. We must have a representative sample
1b. There should be no outliers in the sample, as x̅ is sensitive to outliers
1c. The population should be normal
1d. Outliers and non-normality do not matter if the sample is large (n ≥ 30)
2a. Check for outliers by making a box-plot of the sample
2b. Check for normality by drawing a histogram of the sample.
*If the tallest bar in the histogram of the sample is not at the extreme left or right, then it's likely safe to assume that the population is normal.
3. Check the assumptions / conditions for 1-sample Z as outlined above.
- a. Must have a representative sample
- b. If n ≥ 30, we don't need to worry about outliers
- c. More than 30, so we don't need to worry about normality.
*** If n < 30, go to 1-sample Z and make a boxplot and a histogram of the sample to find out if we can still use this method.
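The outlier check for a small sample can also be sketched in code. The version below assumes the common 1.5 × IQR rule that boxplots typically use to flag outliers; the data are made up, and quartile conventions differ slightly between software packages, so the fences may not match Minitab's exactly.

```python
import statistics

def find_outliers(data):
    """Flag values beyond 1.5 * IQR from the quartiles (the usual boxplot rule)."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < low_fence or x > high_fence]

# Hypothetical small sample (n < 30), so the conditions need checking.
sample = [98, 102, 105, 107, 110, 111, 113, 116, 119, 160]

needs_check = len(sample) < 30
outliers = find_outliers(sample)
print(f"n = {len(sample)}, needs check: {needs_check}, outliers: {outliers}")
```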
1. What does SRS stand for?
2. When do we use 1-sample z, and when do we use 1-sample t?
1. Simple Random Sample
*SRS means that the sample is representative because it is randomly selected.
2. 1-sample z is used when we know the value of the population standard deviation, σ
1-sample t is used when we DO NOT know the value of the population standard deviation, σ.
1. If asked to give a point estimate, how do you find it on Minitab?
2. Interpret the estimate (melting point of copper at 1084.8°C)
3. If we find that a 90% CI for μ is (1084.63, 1084.97), how do we interpret the interval?
4. When asked, is the interval valid, what can we say if the question does not tell us whether or not it is a representative sample?
1. Go to basic stats, and find the mean
2. We estimate that the average of all possible measurements of the melting point of copper is 1084.8°C.
3. We estimate that the average of all possible measurements of the melting point of copper is somewhere between 1084.63°C and 1084.97°C.
4. No information about the selection process is given, so we don't know if the sample is representative.
In practice, we would contact the person who did the measurement to find out more information.
1. What's more useful, 1-sample z or 1-sample t?
2. What is the difference between violating the first condition (representative sample) and the second and third?
3. What are the three possible situations for dealing with condition 1 when determining the validity of an interval?
1. While 1-sample z is used to introduce inference in class, it is not very useful in reality. This is because we are generally not given the standard deviation (σ) of a population.
2. If the first condition is violated and the sample is not representative, then the interval is NOT VALID.
If there are outliers, or if the shape is not normal, we can say that the interval is less accurate, but it may still be valid.
3a. Random = valid
b. No info = In practice, I would contact the person who produced the data.
c. Evidence it is not random = invalid
1. What would we do differently in Minitab if we didn't have a data set, but only had the parameters for 1-sample z?
2. What do you do with the negative numbers when interpreting the interval?
Give an example of CI(-4.527, -2.648) from the sample of bone mineral loss from nursing mothers.
3. When interpreting the interval, what 2 elements MUST be included and why?
1. If you don't have the raw data, click summarize data.
2. Say that it decreased by the number, or there was a loss of the number. Don't use negative numbers when describing the interval.
ex. We estimate that the average amount of bone mineral lost in 3 months by all nursing mothers is somewhere between 2.648% and 4.527%.
3. Must have the word average to describe the mean, because non-stat students will not understand what mean means.
Must have the word all, because we are using the sample to make an inference about the entire population.
1. In x̅ ± z*(σ/√n), what part is the margin of error?
2. What happens when we increase n?
1. The margin of error is the part after the ±, that is, z*(σ/√n).
2. When we increase n (sample size), the margin of error gets smaller.
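A quick sketch of how the margin of error z*(σ/√n) shrinks as n grows. It uses z* = 1.645 (90%) and σ = 11 from the IQ example; the sample sizes are arbitrary choices for illustration.

```python
from math import sqrt

z_star, sigma = 1.645, 11   # 90% confidence, sigma from the IQ example

def margin_of_error(n):
    """Margin of error z* (sigma / sqrt(n)) for a sample of size n."""
    return z_star * sigma / sqrt(n)

for n in (25, 100, 400):
    print(f"n = {n:3d}: margin of error = {margin_of_error(n):.3f}")
```

Because n sits under a square root, quadrupling the sample size only halves the margin of error.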
1. Define confidence
2. Give an example of what confidence means.
3. Give an example using IQ scores with a 95% confidence level.
4. What is the problem with most samples?
5. What is the difference between interpreting the interval and interpreting the confidence using IQ scores?
1. Confidence is the success rate for the method.
2. If we use 90% confidence, then 90% of random samples will yield CIs that enclose μ, and 10% of random samples will yield CIs that do not enclose μ.
3. There is a 95% chance that a random sample will yield a CI that encloses the population mean IQ, and a 5% chance that it will not.
This means that there is a 5% chance that we will get a bad sample, for example one made up mostly of people with IQ scores above the mean, or mostly of people with IQ scores below the mean.
4. In most cases, we only have one sample, and so we don't know if the sample is good, or bad.
5. Interpret the interval: I estimate that the average IQ score of all people in the population is between 94 and 110.
Interpret the confidence: I used a method that has a 90% success rate.
*Don't use the word 'confidence' when interpreting the confidence, because people who have not done a stat class will misinterpret what you mean by confidence.
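The idea of confidence as a success rate can be checked by simulation. The sketch below is illustrative only: it assumes a made-up normal population of IQ scores (μ = 100, σ = 15), draws many random samples, builds a 90% z interval from each, and counts how often the interval encloses μ.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

random.seed(0)
mu, sigma = 100, 15      # hypothetical population parameters (normally unknown)
n, reps = 25, 5000       # arbitrary sample size and number of repeated samples

z_star = NormalDist().inv_cdf(0.95)   # z* for 90% confidence
margin = z_star * sigma / sqrt(n)

hits = 0
for _ in range(reps):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    x_bar = fmean(sample)
    # Does this sample's interval enclose the true mean?
    if x_bar - margin <= mu <= x_bar + margin:
        hits += 1

success_rate = hits / reps
print(f"Share of intervals enclosing mu: {success_rate:.3f}")
```

The share should land close to 0.90. In practice we only ever see one of these samples, which is why we cannot tell whether ours is a good one or a bad one.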
1. Why is every statistic a variable, and how is it measured?
2. What is the formula for measuring how much x bar varies from sample to sample?
3. What is standard error?
4. What are the two equations for determining the standard error of the sample mean (SEM or s.e.)?
1. Every statistic is a variable because it varies from sample to sample, so every statistic has a standard deviation.
The variation of a statistic is measured by its standard deviation.
2. σx̅ = σ/√n
3. The standard error of a statistic is an estimate of its standard deviation.
For example, the standard deviation of the statistic x̅ is σx̅ = σ/√n
4. SEM = s/√n
s = SEM(√n)
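Both equations can be tried on a small data set. The measurements below are made up for illustration; the sketch computes s from the raw data, converts it to the SEM, and then recovers s from the SEM.

```python
import statistics
from math import sqrt

# Hypothetical measurements (made-up data for illustration only).
sample = [1084.6, 1084.9, 1084.7, 1085.0, 1084.8]

n = len(sample)
s = statistics.stdev(sample)   # sample standard deviation
sem = s / sqrt(n)              # SEM = s / sqrt(n)
recovered_s = sem * sqrt(n)    # s = SEM * sqrt(n)

print(f"n = {n}, s = {s:.4f}, SEM = {sem:.4f}, recovered s = {recovered_s:.4f}")
```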
1. What happens to the shape of the distribution as degrees of freedom increase?
2. How do you find SEM in Minitab?
3. What is the difference between the conditions for 1-sample z and 1-sample t?
1. As degrees of freedom increase, the t distribution looks more and more like the standard normal distribution.
2. Stat - Basic statistics - Display Descriptive Statistics
Then put in your variable, and click ok
SE mean is your SEM (the standard error of the mean)
3. 1-sample z requires a sample size of n ≥ 30, whereas 1-sample t requires a sample size of n ≥ 40 in order for outliers and non-normality to be disregarded.