-
1. What is the shape of the distribution for data involving money (generally)?
2. What is the shape (generally) of things that occur in nature?
1. Skewed to the right
2. Symmetric (bell curve)
-
1. Describe:
Mean
Median
2. What is the major difference between the two?
1. Mean: the average (add up all the values and divide by how many there are).
Median: the middle value, with half the data bigger than it and half the data smaller than it (50% on each side).
2. The median describes what is typical for the individuals in the data set, whereas the mean often does not.
This is because high outliers inflate the mean: the mean is pulled toward the tail, while the median stays in the middle.
* The mean is sensitive to outliers and skewness
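A quick way to see why the mean follows the tail: the Python sketch below computes both summaries for a small right-skewed list with one high outlier. The numbers are made up for illustration; they are not the health care data.

```python
from statistics import mean, median

# Made-up, right-skewed amounts: four ordinary values and one high outlier.
amounts = [1000, 1200, 1500, 1800, 9000]

print(mean(amounts))    # 2900: the outlier pulls the mean toward the tail
print(median(amounts))  # 1500: the median stays at the middle value
```

Only the 9000 is unusual, yet the mean lands above four of the five values, while the median is unaffected by how large the outlier is.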
-
1. How do you find the five number summary on MiniTab?
2. On a bell curve, where are the mean and median?
3. What do we use to describe what is 'typical'?
1. Go to the Stat menu at the top.
Click Basic Statistics, then Display Descriptive Statistics.
Put the variable (dollars) in the Variables box.
* Don't put the individuals (countries) in the By variables box; otherwise it will give you stats for each country rather than for the whole data set.
2. If the distribution is perfectly symmetric, the mean and median are both in the middle, so mean = median.
3. The median. If the shape is symmetric, you can use either the median or the mean.
-
Name the 3 numbers used to describe the variability (spread) of a distribution. How would you calculate each?
1. Range: Max - Min. This is the spread of the whole data set.
*The range is sensitive to outliers and skewness.
2. Interquartile range (IQR): Q3 - Q1
- This is the spread of the middle 50% of the data.
3. Standard deviation: a measure of the typical distance of the data values from the mean.
It tells you how tightly the data values are clustered around the mean.
- Informally, it is like the 'mean distance from the mean'.
- Technically, it is the square root of the average squared distance from the mean (for a sample, the squared distances are summed and divided by n - 1).
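The three spread measures can be checked by hand with Python's standard `statistics` module. This is a sketch with hypothetical readings chosen so the arithmetic is easy to follow; quartile conventions differ between programs, so Minitab may report slightly different Q1 and Q3 for some data sets.

```python
import math
from statistics import quantiles, stdev

# Hypothetical readings (sorted here only to make them easy to read).
data = [2, 4, 4, 4, 5, 5, 7, 9]

# 1. Range: max - min (uses only the two most extreme values).
data_range = max(data) - min(data)

# 2. IQR: Q3 - Q1. quantiles(n=4) returns the three cut points [Q1, Q2, Q3].
q1, _, q3 = quantiles(data, n=4)
iqr = q3 - q1

# 3. Sample standard deviation: square the distances from the mean,
#    add them up, divide by (n - 1), then take the square root.
xbar = sum(data) / len(data)
sd_by_hand = math.sqrt(sum((x - xbar) ** 2 for x in data) / (len(data) - 1))

print(data_range, iqr)                              # 7 2.5
print(round(sd_by_hand, 3), round(stdev(data), 3))  # both about 2.138
```

The hand-written formula and `statistics.stdev` agree, which is a useful check that the "square root of the average squared distance" description is being applied correctly.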
-
1. Describe Q1 and Q3
2. What does the Five Number Summary consist of?
3. What is Q2?
1. Q1 is the dividing line between the bottom 25% of the data and the rest. Q1 is called the first quartile.
Q3 is the dividing line between the top 25% of the data and the rest; 75% of the readings are smaller than Q3. Q3 is called the third quartile.
2. The five number summary consists of:
- a. the minimum
- b. Q1
- c. Median
- d. Q3
- e. Maximum
3. Q2 is the same as the median: it divides the data into two halves, with 50% on each side.
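Outside Minitab, the five number summary can be sketched in Python. `five_number_summary` is a hypothetical helper, not a library function; it relies on `statistics.quantiles` with its default `'exclusive'` method, which is assumed (not verified here) to match Minitab's quartile convention.

```python
from statistics import quantiles

def five_number_summary(values):
    """Hypothetical helper: return (minimum, Q1, median, Q3, maximum)."""
    # quantiles() sorts a copy internally; the middle cut point (Q2) is the median.
    q1, q2, q3 = quantiles(values, n=4)
    return min(values), q1, q2, q3, max(values)

print(five_number_summary([9, 4, 2, 5, 4, 7, 4, 5]))  # (2, 4.0, 4.5, 6.5, 9)
```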
-
How would you describe IQR in a sentence using the health care data?
The middle 50% of the amounts spent on health care are spread over a range of $3473.
-
1. What is a box-and-whisker plot useful for?
2. What is the Five Number Summary for the following box plot?
3. What would be the range, and what would be the IQR?
1. It shows the shape of the distribution, drawn from the five number summary.
2. Minimum: approx. 200
Q1: 1000
Median: 2000
Q3: 4500
Maximum: 9000
3. Range: 9000 - 200 = approx. 8800
IQR: 4500 (Q3) - 1000 (Q1) = 3500
-
1. What 2 things are important to know when doing a Box-and Whisker plot in Minitab?
2. What is a limitation of box-and-whisker plots?
3. What is an important thing to remember when describing the box-plot?
1a. Minitab detects outliers, marks them separately, and does not extend the whisker to them.
b. If you hover over the box, it will tell you the Five Number Summary.
2. It doesn't show how many peaks (modes) the distribution has.
3. Make sure to talk about outliers, even if there are none.
ex. No countries spent much more or much less on health care than the others.
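These notes don't state Minitab's exact outlier rule. A common convention for box plots, assumed here to be Minitab's default, flags any value more than 1.5 × IQR below Q1 or above Q3 and stops each whisker at the most extreme value that is not flagged. `boxplot_outliers` is a hypothetical helper written for this sketch, and the data are made up.

```python
from statistics import quantiles

def boxplot_outliers(values, k=1.5):
    """Hypothetical helper: flag values beyond the fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low_fence = q1 - k * iqr
    high_fence = q3 + k * iqr
    outliers = [x for x in values if x < low_fence or x > high_fence]
    # Whiskers end at the most extreme values that are NOT outliers.
    inside = [x for x in values if low_fence <= x <= high_fence]
    return outliers, (min(inside), max(inside))

print(boxplot_outliers([2, 4, 4, 4, 5, 5, 7, 30]))  # ([30], (2, 7))
```

Here the fences are 0.25 and 10.25, so 30 is drawn as a separate point and the upper whisker stops at 7, matching the behaviour described in 1a.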
-
1. What is an important point to consider when determining outliers in Minitab?
2. What is the advantage of a histogram over a box-plot, and vice-versa?
3. How do you do a side-by-side boxplot if you have more than one category? (2 methods)
1. Minitab uses different formulas for detecting outliers in different displays, so we might get different outliers depending on whether we use a stem-and-leaf plot or a box-and-whisker plot.
The important thing is to identify and describe them.
2. Histograms are better for determining shape, and box-plots are better for outliers.
3. Method 1: Choose the boxplot type One Y, Simple.
Click Multiple Graphs, then the By Variables tab, and put the category (weather) in the 'By variables with groups in separate panels' box.
Method 2: Go to Stat > Basic Statistics > Display Descriptive Statistics, fill in the Variables and By variables boxes, then click the Graphs button and select the boxplot.
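The idea behind the By variables box is to split the readings by category and summarize each group separately, which is what a side-by-side box plot then draws. The Python sketch below mimics that grouping step with made-up (weather, temperature) pairs; it is an analogy to the Minitab steps, not a reproduction of them.

```python
from collections import defaultdict
from statistics import quantiles

# Made-up (weather, temperature) readings for illustration.
readings = [
    ("sunny", 25), ("sunny", 28), ("sunny", 22), ("sunny", 30),
    ("rainy", 15), ("rainy", 18), ("rainy", 12), ("rainy", 16),
]

# Split the readings by category, like putting "weather" in the By variables box.
groups = defaultdict(list)
for weather, temp in readings:
    groups[weather].append(temp)

# Summarize each group separately: (min, Q1, median, Q3, max).
summaries = {}
for weather, temps in groups.items():
    q1, med, q3 = quantiles(temps, n=4)
    summaries[weather] = (min(temps), q1, med, q3, max(temps))

for weather, summary in summaries.items():
    print(weather, summary)
```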
-
1. How do you find StDev in Minitab? (temperature data)
2. What are the two data summaries, and when do we use them?
3. Which data summary do we use if the distribution is symmetric and has no outliers?
4. Which data summary do we use if the distribution is symmetric and the data set is large (30 or more readings) with some outliers?
1. Same as for the Five Number Summary:
Stat > Basic Statistics > Display Descriptive Statistics
Put temperature in the Variables box.
2a. Mean and standard deviation: use when the data has no outliers and is not skewed. Mean and standard deviation are sensitive to outliers and skewness, so in those cases they won't give an accurate picture of the data.
b. Five Number Summary: use when the data is skewed and/or has outliers.
*You need a picture (e.g. a histogram or box plot) to decide.
3. We can use either 'mean and standard deviation' or 'five number summary'.
* Use one or the other, not both.
4. You can use mean and standard deviation because the effect of a few outliers is small when the data set is large.
*You can still use the five number summary in this case if you wish.
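To see why a few outliers matter less in a large data set, the sketch below adds the same single outlier to a small list and to a larger one with the same mean, and measures how far the mean moves. The values are made up for illustration.

```python
from statistics import mean

small = [10, 11, 12, 13, 14]  # 5 readings, mean 12
large = small * 8             # 40 readings, same values repeated, mean 12
outlier = 60

# How far does adding one outlier move the mean?
shift_small = mean(small + [outlier]) - mean(small)
shift_large = mean(large + [outlier]) - mean(large)

print(shift_small)            # 8
print(round(shift_large, 2))  # 1.17
```

The same outlier shifts the mean by 8 in the small list but only by about 1.17 in the list of 40 readings.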
-
Define:
1. Population
2. Sample
1. A population comprises ALL the individuals of interest.
2. A sample comprises only the individuals we actually observe.
ex. Populations: all women under 30, all schools in the district, all trees in a certain forest, etc.
*Samples are obtained in the hope that they represent the population.
-
1. Does the healthcare data represent a sample or a population?
2. Is the data on the number of seal pups born a population or a sample?
1. A population, because it comprises the 35 richest countries and does not leave any out. It is the entire group of interest.
2. It is a population, because the variable keeps changing, so we can't be confident about using it as a sample to project forward or backward in time.
*If the pattern were consistent, then we could feel confident about using it as a sample.
-
Give the notation and name for the following:
1. The mean of a sample
2. The mean of a population
3. The standard deviation of a sample
4. The standard deviation of a population
1. The mean of a sample = x bar (x̄)
2. The mean of a population = mu (μ)
3. The standard deviation of a sample = s
4. The standard deviation of a population = sigma (σ)
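The sample and population versions of the mean are computed the same way, but the standard deviations differ: σ divides the summed squared distances by n, while s divides by n - 1. Python's `statistics` module provides both as `pstdev` (population) and `stdev` (sample); the data below are made up.

```python
from statistics import mean, pstdev, stdev

values = [2, 4, 4, 4, 5, 5, 7, 9]

# Treated as a population: mu and sigma (divide by n).
mu = mean(values)
sigma = pstdev(values)

# Treated as a sample: x-bar and s (divide by n - 1).
x_bar = mean(values)
s = stdev(values)

print(mu, sigma)           # 5 2.0
print(x_bar, round(s, 3))  # 5 2.138
```

Because s divides by a smaller number, it is always a little larger than σ for the same values.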
-
Are 'all young women in the world' a population or a sample? Why?
Population
If you have a group and can't generalize beyond it, it's a population.
This is true even if a sample had to be used to get the information.