
1. What is the shape of the distribution for data involving money (generally)?
2. What is the shape (generally) of things that occur in nature?
1. Skewed to the right
2. Symmetric (bell curve)

1. Describe:
Mean
Median
2. What is the major difference between the two?
1. Mean: the average
Median: The number that has half the data bigger than it and half the data smaller than it. 50% on each side.
2. The median is what is typical for the individuals in the data set, whereas the average is often not typical.
This is because high outliers inflate the average, so the mean follows the tail, whereas the median is stuck in the middle.
* The mean is sensitive to outliers and skewness
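
A small sketch of why the mean follows the tail while the median stays put. The income figures are made up for illustration; they are not from the course data.

```python
from statistics import mean, median

# Hypothetical incomes in thousands of dollars (illustrative values only).
incomes = [30, 32, 35, 38, 40, 42, 45]

# The same data with one high outlier added.
with_outlier = incomes + [400]

# Without the outlier, mean and median are close together.
print("no outlier:   mean =", round(mean(incomes), 2), " median =", median(incomes))

# With the outlier, the mean is pulled toward the tail; the median barely moves.
print("with outlier: mean =", round(mean(with_outlier), 2), " median =", median(with_outlier))
```

One extra reading more than doubles the mean here, while the median only shifts from 38 to 39.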

1. How do you find the five number summary in Minitab?
2. On a bell curve, where are the mean and median?
3. What do we use to describe what is 'typical'?
1. Go to the Stat menu at the top.
Click on Basic Statistics, then Display Descriptive Statistics.
Put the variable (dollars) in the Variables box.
* Don't put the individuals (countries) in the By variables box; otherwise it will give you stats for each country rather than for the whole data set.
2. If the shape of the distribution is completely symmetric, both the mean and median will be in the middle. Therefore mean = median
3. The median. If the shape is symmetric, you can use either the median or the mean.

What are the 3 numbers used to describe the variability (spread) of a distribution, and how would you calculate each?
1. Range: Max - Min. This is the spread of the whole data set.
* The range is sensitive to outliers and skewness.
2. Interquartile range (IQR): Q3 - Q1
This is the spread of the middle 50% of the data.
3. Standard deviation: A measure of the typical distance from the mean.
Tells you how tightly the readings are clustered around the mean in a set of data.
It is kind of the 'mean distance from the mean'.
Technically it is the square root of the average squared distance from the mean.
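
A minimal sketch of the three spread calculations, using made-up readings. It assumes the sample version of the standard deviation (dividing by n - 1) and uses Python's default quartile method. Quartile conventions differ between packages, so Minitab may report slightly different Q1 and Q3 values on other data.

```python
from math import sqrt
from statistics import quantiles, stdev

# Hypothetical readings (illustrative values only).
data = [2, 4, 4, 5, 7, 9, 11]

# 1. Range: Max - Min
data_range = max(data) - min(data)

# 2. IQR: Q3 - Q1 (quantiles with n=4 returns Q1, Q2, Q3)
q1, q2, q3 = quantiles(data, n=4)
iqr = q3 - q1

# 3. Sample standard deviation, worked out by hand:
#    square each distance from the mean, average them (dividing by n - 1),
#    then take the square root.
n = len(data)
x_bar = sum(data) / n
s = sqrt(sum((x - x_bar) ** 2 for x in data) / (n - 1))

print("range =", data_range)
print("IQR   =", iqr)
print("s     =", round(s, 4), "(library stdev:", round(stdev(data), 4), ")")
```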

1. Describe Q1 and Q3
2. What does the Five Number Summary consist of?
3. What is Q2?
1. Q1 is the dividing line between the bottom 25% of the data and the rest. Q1 is called the first quartile.
Q3 is the dividing line between the top 25% of the data and the rest; 75% of the readings are smaller than Q3. Q3 is called the third quartile.
2. The five number summary consists of:
 a. the minimum
 b. Q1
 c. Median
 d. Q3
 e. Maximum
3. Q2 is the same as the median, dividing the data into 2 pieces. 50% on each side.
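
The five number summary can be sketched as a small helper. `five_number_summary` is a hypothetical name and the spending values are made up. The quartiles come from Python's default method, which is not necessarily the one Minitab uses.

```python
from statistics import quantiles

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    # quantiles(..., n=4) returns the three cut points Q1, Q2, Q3;
    # Q2 is the median.
    q1, q2, q3 = quantiles(values, n=4)
    return min(values), q1, q2, q3, max(values)

# Hypothetical readings (illustrative values only).
spending = [1, 3, 5, 7, 9, 11, 13]

labels = ["Minimum", "Q1", "Median", "Q3", "Maximum"]
for label, value in zip(labels, five_number_summary(spending)):
    print(label + ":", value)
```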

How would you describe IQR in a sentence using the health care data?
The middle 50% of the amounts spent on health care are spread over $3473.

1. What is a box-and-whisker plot useful for?
2. What is the Five Number Summary for the following?
3. What would be the range, and what would be the IQR?
1. It gives us the shape of the distribution from the five number summary data.
2. Minimum: approx. 200
Q1: 1000
Median: 2000
Q3: 4500
Maximum: 9000
3. Range: 9000 - 200 = approx. 8800
IQR: 4500 (Q3) - 1000 (Q1) = 3500

1. What 2 things are important to know when doing a box-and-whisker plot in Minitab?
2. What is a limitation of box-and-whisker plots?
3. What is an important thing to remember when describing the boxplot?
1a. Minitab detects outliers, and won't draw the whisker out to an outlier.
b. If you hover over the box, it will tell you the Five Number Summary.
2. It won't show you how many peaks there are.
3. Make sure to talk about outliers, even if there are none.
ex. No countries spent a lot more or a lot less on healthcare than all the others.
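
One common convention for flagging outliers on a boxplot is the 1.5 x IQR rule: anything more than 1.5 IQRs beyond the box is flagged. The sketch below assumes that rule; these notes do not say which formula Minitab actually applies. The function names and the readings are hypothetical, and the Q1 and Q3 values are simply plugged in by hand.

```python
def iqr_fences(q1, q3, k=1.5):
    """Return (lower, upper) fences; readings outside them count as outliers."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def find_outliers(values, q1, q3, k=1.5):
    """List the readings that fall outside the fences."""
    lower, upper = iqr_fences(q1, q3, k)
    return [v for v in values if v < lower or v > upper]

# Hypothetical readings, with Q1 = 1000 and Q3 = 4500 supplied by hand.
readings = [200, 1000, 2000, 4500, 9000, 12000]
lower, upper = iqr_fences(1000, 4500)

print("fences:", lower, "to", upper)
print("outliers:", find_outliers(readings, 1000, 4500))
```

Under this rule a reading of 9000 is still inside the upper fence, so the whisker would reach it, while 12000 would be flagged.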

1. What is an important point to consider when determining outliers in Minitab?
2. What is the advantage of a histogram over a boxplot, and vice versa?
3. How do you do a side-by-side boxplot if you have more than one category? (2 methods)
1. Minitab has several different formulae for detecting outliers, so we might get different numbers depending on whether we use a stem-and-leaf plot or a box-and-whisker plot.
The important thing is to identify and describe them.
2. Histograms are better for determining shape, and boxplots are better for outliers.
3a. Choose One Y, Simple.
Click Multiple Graphs, then the By Variables tab, and in the 'By variables with groups in separate panels' box, put in the category (weather).
b. Or go to the Stat menu, Basic Statistics, Display Descriptive Statistics, fill in the Variables and By variables boxes, and then select the boxplot from the Graphs button.

1. How do you find StDev in Minitab? (temperature data)
2. What are the two data summaries, and when do we use them?
3. Which data summary do we use if the distribution is symmetric and has no outliers?
4. Which data summary do we use if the distribution is symmetric and the data set is large (30 or more readings) with some outliers?
1. Same as for the Five Number Summary:
Stat, Basic Statistics, Display Descriptive Statistics.
Put temperature in the Variables box.
2a. Mean and standard deviation: When the data has no outliers and is not skewed. The mean and standard deviation are sensitive to outliers and skewness, so otherwise they won't give an accurate picture of the data.
b. Five Number Summary: When the data is skewed, and/or has outliers.
* You need a picture (graph) of the data to decide.
3. We can use either 'mean and standard deviation' or 'five number summary'.
* Use one or the other, not both.
4. You can use mean and standard deviation because the effect of a few outliers is small when the data set is large.
*You can still use the five number summary in this case if you wish.

Define:
1. Population
2. Sample
1. A population comprises ALL the individuals of interest.
ex. all women under 30, all schools in the district, all trees in a certain forest, etc.
2. A sample comprises only the individuals we actually observe.
* Samples are obtained in the hope that they represent the population.

1. Does the healthcare data represent a sample or a population?
2. Is the data on the number of seal pups born a population or a sample?
1. A population, because it comprises the richest 35 countries and does not leave any out. This is the entire group of interest.
2. It is a population, because the variable is changing over time, so we can't be confident about using it as a sample to project forward or backward in time.
* If the pattern were consistent, then we could feel confident about using it as a sample.

Give the notation and name for the following:
1. The mean of a sample
2. The mean of a population
3. The standard deviation of a sample
4. The standard deviation of a population
1. The mean of a sample = x̄ (x-bar)
2. The mean of a population = μ (mu)
3. The standard deviation of a sample = s
4. The standard deviation of a population = σ (sigma)
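
For reference, the standard textbook formulas behind the four symbols are written out below. The notes themselves do not give them. Here n is the number of readings in the sample and N is the number of individuals in the population.

```latex
% Sample mean and sample standard deviation (n readings)
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i
\qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2}

% Population mean and population standard deviation (N individuals)
\mu = \frac{1}{N}\sum_{i=1}^{N} x_i
\qquad
\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2}
```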

Are 'all young women in the world' a population or a sample? Why?
Population.
If you've got a group and can't generalize beyond it, it's a population.
This is true even if a sample had to be used to get the information.

