
Define:
Individuals
Variable
Individuals: The objects described by a set of data. Individuals may be people, but they may also be animals or things.
ex. dogs, schools, years
Variables: A characteristic of an individual. A variable might vary from one individual to another.
ex. hair colour of dog, age of school, amount spent on healthcare in a year

How do you access Q drive?
1. Go to Computer
2. Instructor Files
3. Math & Stat
4. Gillian
5. Stat 104

What is the difference between raw data and summarized data?
Give an example of summarized data citing individual and variable
Raw data shows a list of the individuals and the information for each individual is presented.
Summarized data does not list individuals with information alongside. The data is already summarized.
 ex. Car colour in percentages
 White  35%
 Black  17%
 Silver  12% etc.
Here the individual is the car, and the variable is the colour.
ex. Percentage of eye colour in the class.
 Individual  students
 Variable  Colour of eyes
*With summarized data you can't pick out an individual by going from one line to the next.
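The step from raw data to summarized data can be sketched in Python. This is only an illustration: the list of car colours and the function name summarize are made up for the example, not taken from the course data.

```python
from collections import Counter

# Hypothetical raw data: one entry per individual (car), recording the variable (colour).
raw_colours = ["White", "Black", "White", "Silver", "White",
               "Black", "Silver", "White", "Red", "White"]

def summarize(values):
    """Return the percentage of individuals that fall in each category."""
    counts = Counter(values)
    total = len(values)
    return {category: 100 * count / total for category, count in counts.items()}

# Summarized data: the individual cars are no longer listed, only the percentages.
summary = summarize(raw_colours)
for colour, pct in summary.items():
    print(f"{colour}  {pct:.0f}%")
```

The raw list has one line per car. The summary has one line per colour, so a single car can no longer be found in it.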

Describe the two types of variables
Give examples of each.
1. Categorical Variable: A qualitative characteristic that describes an individual in a non-numeric way. The information is usually expressed as words.
ex. Gender of person, opinion of person, primary colour of dog
2. Quantitative Variable: A characteristic that describes an individual in a numeric way.
The numbers must have numeric meaning: it must make sense to form an average. So a student number does not count, nor does gender coded as male = 1 and female = 0,
nor opinions on a Likert scale, because there is no concept of distance between Disagree and Strongly Disagree.
ex. height of person, weight of dog, speed of car, number of children per family.

1. What is the distribution of a variable?
2. a. What are the two charts used to display categorical variables?
b. How do we describe categorical variables? (3 points)
3. a. What are the five charts used to display quantitative variables?
b. How do we describe quantitative variables? (4 points)
1. The distribution of a variable gives the possible values of the variable along with how often they occur.
 Sometimes presented in tables
 ex. The car colour data table gives the distribution of car colours.
2a. bar charts and pie charts
b. Most common, least common and anything else of interest
3a. Histograms, dotplots, stem-and-leaf plots, box-and-whisker plots.
*Time-series plots: only for variables that fluctuate over time
b. Center, spread, shape and outliers

How do you make a bar chart on Minitab?
What will be on the Horizontal Axis and what will be on the Vertical Axis?
* What do you need to do after the graph comes up?
1. Go to Graph > Bar Chart
2. Bars represent: Values from a table
3. Simple
 4. Graph Variable: Select the numerical data you want to show in your graph.
 ex. number of people with a certain eye colour. *Vertical Axis
 5. Categorical Variable: Select the categorical variable you want to graph.
 *Horizontal Axis
 ex. eye colour
*Change the name of the table to give a clear description of what data is being presented.

Brown eyes  55%
Blue eyes  26%
Hazel eyes  10%
Green eyes  7%
How would you describe the distribution of eye colour?
1. The most common eye colour is brown. 55% of people in the room have brown eyes.
2. The least common eye colour is green. 7% of people in the room have green eyes.
 3. Something else of interest:
 Brown eyes are about twice as common as blue eyes.
Hazel and green eyes are approximately equally common.
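The three-point description can be checked with a short Python sketch. The percentages are the ones in the eye-colour table above. The variable names are only illustrative.

```python
# Percentages from the eye-colour table in the notes.
eye_colour = {"Brown": 55, "Blue": 26, "Hazel": 10, "Green": 7}

# Point 1: the most common category is the one with the largest percentage.
most_common = max(eye_colour, key=eye_colour.get)
# Point 2: the least common category is the one with the smallest percentage.
least_common = min(eye_colour, key=eye_colour.get)
# Point 3: something else of interest, here how brown compares with blue.
ratio = eye_colour["Brown"] / eye_colour["Blue"]

print(f"Most common: {most_common} ({eye_colour[most_common]}%)")
print(f"Least common: {least_common} ({eye_colour[least_common]}%)")
print(f"Brown is about {ratio:.1f} times as common as blue")
```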

1. What are the two necessary components of a pie chart?
2. How do you make a pie chart on MiniTab?
 1. Pie charts can be used to describe a categorical variable if...
 a. Each individual is included exactly once
 ex. each person in the room (eye colour)
b. The percentages add up to 100%
*In other words, the categories are all part of a single pie
2a. Graph > Pie Chart
b. Click on Value from a Table box
 c. Categorical Variable: Insert the categories
 ex. Eye colour
 d. Summary variables: Insert the numerical data you want to represent.
 ex. Number of people with x eye colour
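The two pie-chart conditions can be written as a quick check in Python. This is a sketch under assumptions: the function name suitable_for_pie_chart and both sets of counts are hypothetical. Counts alone cannot prove that nobody was counted twice, so treat it as a necessary check only.

```python
import math

def suitable_for_pie_chart(counts, n_individuals):
    """Check the two pie-chart conditions for a table of category counts."""
    # Condition a: every individual is included exactly once,
    # so the category counts must add up to the number of individuals.
    each_counted_once = sum(counts.values()) == n_individuals
    # Condition b: the percentages must add up to 100%.
    percentages = [100 * c / n_individuals for c in counts.values()]
    adds_to_100 = math.isclose(sum(percentages), 100.0)
    return each_counted_once and adds_to_100

# Hypothetical class of 30 students, each with exactly one eye colour.
eye_counts = {"Brown": 17, "Blue": 8, "Hazel": 3, "Green": 2}
print(suitable_for_pie_chart(eye_counts, 30))   # True

# Hypothetical survey of the same 30 students where several pets could be ticked.
pet_counts = {"Dog": 18, "Cat": 15, "Fish": 6}
print(suitable_for_pie_chart(pet_counts, 30))   # False
```

In the second table some students appear in more than one category, so the categories are not parts of a single pie.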

What do you have to do to make a Histogram more readable? (2)
1. Click on bottom numbers, change position of ticks, then click on the Binning tab, and change Interval Type to Cutpoint instead of Midpoint.
2. Change the labels so they describe the data better.

Define:
1. Center
2. Spread
3. Shape
4. Outliers
*How do you calculate these?
** How would you put these into a sentence, using the health care data?
1. Center: The value of the variable that has half of the individuals below it and half above it
Position of the center = (n + 1) / 2, where n is the number of individuals in the data set. Count to this position in the ordered data.
* If the position is not a whole number, the center is the average of the two values on either side of it.
2. Spread: How much the variable varies, calculated as Max - Min
3. Shape: The shape of a distribution. Either symmetric or skewed.
4. Outliers: Generally, an individual that does not follow the overall pattern of the data. An individual whose value is a lot more or a lot less than the others.
*In histograms, stemplots and dotplots, outliers can be identified because there is a gap between certain individuals and the others.
** Center: The typical amount spent on health care is $2200
Spread: The amounts spent are spread over $10,000.
Shape: The distribution of the amounts spent on health care is skewed to the right because the spread of the top 50% of the amounts spent is more than the spread of the bottom 50% of the amounts spent
Outliers: One country spent a lot more on health care than all the other countries. The country spent approx. $9500 on health care.
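The center and spread calculations can be sketched in Python as follows. The function names and the list of amounts are hypothetical, chosen only to illustrate the (n + 1) / 2 position rule and Max - Min. They are not the actual health care data.

```python
def center(values):
    """Find the center using the (n + 1) / 2 position rule."""
    ordered = sorted(values)
    n = len(ordered)
    position = (n + 1) / 2               # 1-based position in the ordered data
    if position.is_integer():
        return ordered[int(position) - 1]
    # Position is not a whole number: average the values on either side.
    lower = ordered[int(position) - 1]
    upper = ordered[int(position)]
    return (lower + upper) / 2

def spread(values):
    """Spread is calculated as Max - Min."""
    return max(values) - min(values)

# Hypothetical amounts spent (in dollars) by 7 countries.
spending = [200, 900, 1500, 2200, 3100, 4000, 9100]
print(center(spending))        # 2200  (position (7 + 1) / 2 = 4)
print(spread(spending))        # 8900
print(center([1, 2, 3, 4]))    # 2.5   (position 2.5, average of 2 and 3)
```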

1. When do we NOT consider outliers?
2. What does it mean when a shape is skewed to the right?
3.
1. When determining the shape. Outliers DO NOT cause skewness.
2. The distribution has a long tail on the right. The spread of the lower 50% is less than the spread of the top 50%.

1. In a stem-and-leaf plot, what are the leaf units?
2. How do you read a stem plot?
3. What is the center, spread, shape and outliers?
1. The leaf units tell us the units for the data. ex. If the leaf unit is 100, then the numbers are in hundreds.
2. When reading a number from a stemplot, put the stem and leaf together and multiply by the leaf unit.
ex. If the stem is 60, and the leaf is 2, and the unit is 10, the number is 602 x 10 = 6020.
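The reading rule can be written as a small Python function. This is a sketch that assumes each leaf is a single digit and the leaf unit is a whole number. The name stem_leaf_value is made up for the example.

```python
def stem_leaf_value(stem, leaf, leaf_unit):
    """Put the stem and the single-digit leaf together, then multiply by the leaf unit."""
    joined = stem * 10 + leaf       # e.g. stem 60 and leaf 2 become 602
    return joined * leaf_unit

print(stem_leaf_value(60, 2, 10))   # 6020, the example from the notes
print(stem_leaf_value(2, 2, 100))   # 2200
```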
 3.

Center: position = (n + 1) / 2
(35 + 1) / 2 = 18
The left-hand column shows a running count of how many values have been passed by the end of each row. Count to the 18th value from either the top or the bottom.
22 x 100 = 2200
Typically, the 35 countries that had the highest GDP in 2013 spent $2200 on health care.
Spread: 9100 - 200 = 8900
The amounts spent on health care in 2013 are spread over $8900
Shape: The distribution of the amounts spent on health care in 2013 is skewed to the right because the spread of the top 50% is more than the spread of the bottom 50% of the amounts spent.
Outliers: One country spent a lot more than all the other countries. This country spent $9100.

1. Why use a stem-and-leaf plot over a histogram, and vice versa? (2 points)
2. How do you find outliers on a stem-and-leaf plot using Minitab?
3. How do you describe the healthcare data if there are no outliers?
4. Because the results of reading different graphs are not always the same, what do you need to do on a test?
1a. The numbers can generally be read more accurately from a stem-and-leaf plot, but histograms give a better visual illustration.
b. If you have a large dataset, histograms are better. For a small dataset, histograms are useless and a stem-and-leaf plot or a dotplot is better.
 2. Click on box called Trim Outliers
 LO means a low outlier, HI means a high outlier.
3. No countries spent a lot more or a lot less than the other countries on healthcare.
4. Indicate which graph you have drawn.

How do you make a side-by-side histogram if there is more than one category?
 Go to Stat bar at the top
 Basic Statistics
 Display Descriptive Statistics...
 Select Variables and By Variables
Click Graphs button, and select Histogram of data

1. What is a time-series plot, and what does it show?
2. Describe the 3 features of a time-series plot
3. How do you make a time-series plot in Minitab?
4. How would you describe this time series plot?
 1. A collection of readings of a variable taken sequentially in time.
 ex. recorded every year.
It shows how the variable has changed over time.
2a. Trend: the overall change in the variable during the time period for which we have data.
Trends may be increasing, decreasing or constant. A trend is constant if the numbers fluctuate but do not generally get bigger or smaller.
b. Seasonality: A repeating pattern that continues throughout the time period for which we have data.
ex. Temperatures go up in summer, and down in winter every year.
c. Random Fluctuations: irregular short term changes up or down, including spikes.
Time series graphs are never smooth. The line wobbles, and these are random fluctuations.
3a. In the Series box, put the variable, not the year (individual).
b. Click Time/Scale button.
c. Under Time Scale: column, select Stamp
d. In Stamp columns box, select year and then click 'ok' 'ok'.
 4.

Trend: Increased from approximately 1980 until 1992, and then levelled off.
Seasonality: No repeating patterns.
*Large random fluctuation in 2002.

1. Why should bar charts NOT be used for time series data?
2. Why should time series plots NOT be used to describe the distribution of categorical variables?
1. Because you want to see how data changes over time. Bar charts are choppy, and do not show random fluctuations.
2. Because categorical variables have no concept of distance.
ex. Percentage of students with each letter grade.

