1. What is a density function?
2. For heights of young women, what does N(64.2, 2.8) represent?
1. Density Function: The smooth curve you get when you connect the tops of the bars of a histogram and smooth the line out.
2. N(64.2, 2.8)
N = Normal Distribution (shape)
64.2 = the mean μ of the population of all young women
2.8 = the std. dev. σ of the population of all young women.
1. How do we find the percentage of young women shorter than 66 inches in Minitab?
2. What side of the 'input constant' number does Minitab make calculations for, and what do you need to do if you want to calculate the reverse?
3. How would you figure out the percentage of heights of young women between 66 and 70 inches?
4. How do you figure out the height that divides the tallest 10% of young women from the rest? (on Minitab)
1. Calc - probability distributions - normal - Select cumulative probability - enter the mean and std dev - select input constant - type in the 66
0.739842 x 100%
73.98% of young women are shorter than 66 inches.
2. ***Minitab only calculates the area to the left of the input constant.
* If you want the percentage of young women taller than
66 inches, subtract the percentage you get from 100%.
ex. 100% - 73.98% = 26.02%
3. Find the percentage for 70 and the percentage for 66, then subtract the smaller from the larger.
ex. 70 in is 98.09% and 66 in is 73.98%
98.09% - 73.98% = 24.11%
The percentage of heights of young women between 66 and 70 inches is about 24.1%.
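4. Calc - probability distributions - normal - Select Inverse cumulative probability
- enter the mean and std dev - select Input constant - type in 0.9
Minitab works from the left, so the cutoff for the tallest 10% is the height that 90% of young women are below (100% - 10% = 90%).
Make sure to convert the percentage to a decimal. 10% = 0.1, 86% = 0.86 etc.
0.9 = 67.79 inches (typing in 0.1 instead gives 60.61 inches, which is the cutoff for the shortest 10%)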
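The Minitab steps on these cards can be double-checked outside Minitab. Below is a minimal sketch using Python's standard-library `statistics.NormalDist`, which is an assumption of this example (the cards themselves only use Minitab). Here `cdf` plays the role of Cumulative probability and `inv_cdf` the role of Inverse cumulative probability; the variable names are illustrative only.

```python
from statistics import NormalDist

# Heights of young women, N(64.2, 2.8): mean 64.2 in, std dev 2.8 in
heights = NormalDist(mu=64.2, sigma=2.8)

# Cumulative probability: area to the LEFT of the input constant
below_66 = heights.cdf(66)
# Right side = 100% minus the left side
above_66 = 1 - below_66
# Between two values = left of the larger minus left of the smaller
between_66_70 = heights.cdf(70) - heights.cdf(66)

# Inverse cumulative probability: the height with the given area to its LEFT
cutoff_shortest_10 = heights.inv_cdf(0.10)  # 10% of women are below this height
cutoff_tallest_10 = heights.inv_cdf(0.90)   # 10% of women are above this height

print(f"shorter than 66 in:  {below_66:.2%}")
print(f"taller than 66 in:   {above_66:.2%}")
print(f"between 66 and 70:   {between_66_70:.2%}")
print(f"shortest 10% cutoff: {cutoff_shortest_10:.2f} in")
print(f"tallest 10% cutoff:  {cutoff_tallest_10:.2f} in")
```

Because both tools work from the left, an input of 0.10 returns the height that separates the shortest 10%, while 0.90 returns the height that separates the tallest 10% from the rest.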
1a. What does 'z' stand for?
b. What is the distribution (μ & σ) for the Z table?
2. What is another way of describing the first quartile?
3. How many quartiles are there, and how many pieces do they divide the data into?
4. How many percentiles are there, and how many pieces do they divide the data into?
5. Fill in the blanks.
1% of the numbers in the data set are smaller than the ____ percentile, and ____ are bigger.
75% of the numbers in the data set are smaller than the ____ percentile and ____ are bigger. Another name for the ____ percentile is ____.
1a. Standard normal
1b. N(0, 1)
2. The 25th percentile (25%, or 0.25 when inputting it in Minitab)
3. There are 3 quartiles - Q1, Q2 (median) and Q3, and they divide the data into 4 pieces.
4. There are 99 percentiles, and they divide the data into 100 pieces.
5. 1% of the numbers in the data set are smaller than the first percentile, and 99% are bigger.
75% of the numbers in the data set are smaller than the 75th percentile and 25% are bigger. Another name for the 75th percentile is the 3rd quartile (Q3).
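To see the counts on this card in action, here is a small sketch using `statistics.quantiles` from Python's standard library. The data set (the whole numbers 1 to 200) is hypothetical and chosen only to keep the arithmetic easy to follow; it is not from the course.

```python
from statistics import quantiles

# Hypothetical data set: the whole numbers 1 to 200 (not from the course)
data = list(range(1, 201))

# Cut points that split the data into 4 and into 100 equal pieces
quartile_cuts = quantiles(data, n=4)      # Q1, Q2 (median), Q3
percentile_cuts = quantiles(data, n=100)  # 1st through 99th percentile

q1, q2, q3 = quartile_cuts
print(len(quartile_cuts), "quartiles divide the data into", len(quartile_cuts) + 1, "pieces")
print(len(percentile_cuts), "percentiles divide the data into", len(percentile_cuts) + 1, "pieces")

# Share of the data that is smaller than Q3 (the 75th percentile)
share_below_q3 = sum(value < q3 for value in data) / len(data)
print(f"{share_below_q3:.0%} of the values are smaller than Q3")

# The 75th percentile is the 75th cut point (index 74, counting from 0)
p75 = percentile_cuts[74]
```

The cut points are always one fewer than the pieces, which is why there are 3 quartiles for 4 pieces and 99 percentiles for 100 pieces.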
1. What is a common mistake people make when talking about percentiles?
2. How do we know that the variable has a normal distribution?
3. What is important to remember when comparing the distribution of a sample and the population from which it came?
1. People often say that they are 'in' the 90th percentile. You can't be 'in' a percentile because it is the cutoff point (line). You can either be above the 90th percentile, or below it.
2. If there is no data set, we will be told that the variable is normal. If there is a data set, we can get some idea by making a histogram of the sample.
3. The distribution of a small sample does not necessarily look like the distribution of the population from which it came.
Consequently we cannot rely on histograms of small samples to tell us much about the shape of the population.
1. What is the formula for converting 'X' into a standard normal variable 'Z'?
2. What is the formula for converting z score into x scores?
1. z = (x - μ) / σ
2. x = zσ + μ
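As a quick check of the two conversions, here is a minimal Python sketch. The function names `to_z` and `to_x` are made up for this example. Note that the subtraction x - μ has to happen before dividing by σ, which is why the code puts it in parentheses.

```python
def to_z(x, mu, sigma):
    """Standardize x: subtract the mean first, then divide by the std dev."""
    return (x - mu) / sigma


def to_x(z, mu, sigma):
    """Convert a z-score back to the original scale: x = z * sigma + mu."""
    return z * sigma + mu


# N(64.2, 2.8) from the height cards
mu, sigma = 64.2, 2.8

z_66 = to_z(66, mu, sigma)       # how many std devs 66 inches is above the mean
x_back = to_x(z_66, mu, sigma)   # converting back should return 66

print(f"z = {z_66:.2f}")
print(f"x = {x_back:.1f}")
```

A height of 66 inches is 1.8 inches above the mean, and 1.8 / 2.8 is about 0.64 standard deviations.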
1. Does the z table work to the right or the left?
2. If you want to figure out the reverse, what are the two methods?
1. Like Minitab, z tables give percentages to the left of the z score.
2a. Flip the sign of the z score: make it negative if it is positive, or positive if it is negative.
ex. If you want to know what is to the right of z-score 1.25, find the cumulative proportion for -1.25.
-1.25 gives 0.1056, and 0.1056 x 100% = 10.56%
2b. Convert the z score to a percentage, and subtract it from 100%.
ex. If the z-score is 1.25, it gives a cumulative proportion of 0.8944.
- 0.8944 x 100% = 89.44%
- 89.44% of z values are less than 1.25
100% - 89.44% = 10.56%
So 10.56% of z-values are more than (to the right of) a z-score of 1.25
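Both methods should give the same answer, because the standard normal curve is symmetric around 0. Here is a short sketch that checks this, using Python's standard-library `statistics.NormalDist` as a stand-in for the z table (an assumption of this example, not part of the course materials).

```python
from statistics import NormalDist

# Standard normal distribution, N(0, 1), standing in for the z table
z_table = NormalDist()

z = 1.25
left_of_z = z_table.cdf(z)             # what the z table gives directly

# Method a: flip the sign of the z score and look that up
right_by_sign_flip = z_table.cdf(-z)

# Method b: subtract the left-hand area from 1 (that is, from 100%)
right_by_subtraction = 1 - left_of_z

print(f"left of {z}:          {left_of_z:.4f}")
print(f"right (sign flip):   {right_by_sign_flip:.4f}")
print(f"right (subtraction): {right_by_subtraction:.4f}")
```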
1. What are the three steps to finding the percentage of young women over 66 inches using the z-table? N(64.2, 2.8)
2. In a bell-curve (normal distribution), what is the percentage that falls between the following ranges:
a. 1 standard deviation from the mean
(Between μ - σ and μ + σ)
b. 2 standard deviations from the mean
(Between μ - 2σ and μ + 2σ)
c. 3 standard deviations from the mean
(Between μ - 3σ and μ + 3σ)
- Step 1: Draw a picture that includes the following information:
- Step 2: Standardize the x variable
- z = (x - μ) / σ
- Step 3: Use the z-tables to find the percentage.
** For percentages to the right, or over 66 inches, you need to either use the reverse z-score (negative if you have a positive number) or subtract the percentage you get from 100%.
2a. About 68% - b. About 95% - c. About 99.7%
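The standardize-then-look-up steps and the 68 / 95 / 99.7 percentages can both be reproduced with a few lines of Python. This is only a sketch: `statistics.NormalDist` stands in for the z table, and the variable names are made up for the example.

```python
from statistics import NormalDist

z_table = NormalDist()  # standard normal, N(0, 1)

# Step 2: standardize x = 66 for N(64.2, 2.8)
mu, sigma = 64.2, 2.8
z = (66 - mu) / sigma

# Step 3: the table gives the area to the LEFT, so subtract from 1 for "over 66"
left_of_66 = z_table.cdf(z)
over_66 = 1 - left_of_66
print(f"z = {z:.2f}, over 66 inches: {over_66:.2%}")

# Area within 1, 2 and 3 standard deviations of the mean
within = {k: z_table.cdf(k) - z_table.cdf(-k) for k in (1, 2, 3)}
for k, area in within.items():
    print(f"within {k} std dev: {area:.1%}")
```

The exact areas are slightly different from the rounded 68%, 95% and 99.7% that are usually memorized.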
1. Are the following variables positively or negatively associated?
a. GPA and time spent on homework
b. GPA and hours spent working
c. GPA and stress
d. Time spent practicing golf, and score
2. When analyzing two variables (in a scatterplot), what type of variables do they have to be?
- 1a. Positive,
- b. Negative
- c. Both positive and negative (a bit of stress can motivate you to do better, but too much and you'll start shutting down).
- d. Negative - As you spend more hours practicing, your golf score will go down.
2. Must be quantitative variables.
1. What is the difference between a response variable and an explanatory variable?
2. In the example of coral reef growth and sea surface temperature, which is the response variable and which is the explanatory variable?
- 1. Response variable - measures the outcome of a study
*Whatever you're estimating is your response variable
- Explanatory variable - explains or influences changes in the response variable
2. Coral growth is the response variable because this is what we are trying to estimate.
Sea temperature is the explanatory variable because this is what we are using to explain the changes in coral reef growth.
1. What does a scatterplot show us?
2. Describe the four terms we use to describe the relationship between the variables in a scatterplot.
- 1. A scatterplot shows the relationship between two quantitative variables.
- 2. Direction (or association): is either positive or negative. The direction is positive if the response variable increases when the explanatory variable increases, and is negative if the response variable decreases when the explanatory variable increases (and vice-versa).
- Form: The shape of the curve. We are only going to use linear (straight line), but there are also non-linear forms.
Some of these include growth curves, exponential curves and parabolas (up then down).
- Strength: How close are the dots to a smooth line drawn through the dots?
- Outliers: Exceptions to the general relationship.
1. In Minitab, how do you make a scatterplot?
2. How would we describe a scatterplot in a sentence or two using coral growth and sea temperature as an example.
3. How would we describe a scatterplot in a sentence or two using suicide rate and homicide rate as an example
1. Graph - scatterplot - select the response variable for y and the explanatory variable for x
2. We see a fairly strong negative linear relationship. All locations followed that pattern (if there are no outliers).
or if there is an outlier as the dot on the right could suggest....
In one location, the growth was lower than one would expect for that temperature, at a temperature of 26.6 degrees and a growth rate of 0.79.
*hover over the dot to see the values for the two variables
*This describes the direction, form, strength and outliers. Make sure to include outliers in the description as above, even if there are none.
3. We see a weak non-linear relationship that increases and then decreases. All counties followed this pattern.
1. What is the "very specific meaning" for correlation? (4 parts)
2. How is correlation denoted?
- 1. Correlation:
- a. is a number (between -1 and 1)
- b. that measures the direction and strength of
- c. the linear relationship
- d. between two quantitative variables.
2. Correlation is denoted as r
1. Correlation (r) can only be used to measure what type of relationship between what type of variables?
2. What does a positive value for r mean and vice-versa?
3. What does r=1 mean, what does r= -1 mean?
4. What does r=0 mean?
5. What must we also consider when determining the strength of r, and give an example
1. Correlation can only be used to measure the linear relationship between two quantitative variables
2. A positive value for r (e.g. r = 0.9) indicates a positive association between the two variables, whereas a negative value for r (e.g. r = -0.1) indicates a negative association between the variables.
3. If r = 1, it means the dots lie exactly on a forward-leaning (upward-sloping) straight line.
If r = -1, it means the dots lie exactly on a backward-leaning (downward-sloping) straight line.
*therefore, the closer the dots are to a forward-leaning straight line, the closer r is to 1, and the closer the dots are to a backward-leaning straight line, the closer r is to -1.
4. r = 0, means that there is no linear relationship; however, there might be a very strong non-linear relationship as r cannot be used to measure the strength of non-linear relationships.
5. Sample size -
ex. a correlation of 0.8 is only weak evidence of a relationship if there are just a few individuals in the data set (a handful of dots can line up by chance), whereas a correlation of 0.8 is very strong evidence if there are thousands of individuals in the data set.
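Minitab calculates r for you, but a small sketch can make the answers about r = 1, r = -1 and r = 0 concrete. The function `pearson_r` below is written for this example using the standard Pearson correlation formula, and the three tiny data sets are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation r between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of the deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical x values, symmetric around 0
xs = [-2, -1, 0, 1, 2]

r_up = pearson_r(xs, [2 * x + 1 for x in xs])       # dots on a forward-leaning line
r_down = pearson_r(xs, [-3 * x + 4 for x in xs])    # dots on a backward-leaning line
r_parabola = pearson_r(xs, [x ** 2 for x in xs])    # perfect, but non-linear, pattern

print(f"forward-leaning line:  r = {r_up:.2f}")
print(f"backward-leaning line: r = {r_down:.2f}")
print(f"parabola:              r = {r_parabola:.2f}")
```

The parabola case shows why r = 0 only rules out a linear relationship: the dots follow y = x² perfectly, yet r does not pick it up.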
1. How do we find the correlation (r) on Minitab?
2. What is r between sea surface temperatures and coral growth?
1. What is the problem with the following sentence, and how would we fix it?
Gender and salary are correlated because men earn more than women
2. What is the problem with the following sentence, and how would we fix it?
Fuel consumption and speed are correlated
1. Gender is NOT a quantitative variable, so we can't speak of correlation. Instead we can say the two variables are related, connected or associated.
2. Fuel consumption and speed are NOT correlated because they have a non-linear relationship. Fuel consumption goes up until you reach 60 mph, then it goes down.
Instead, you can say they are connected or related.