1. What is a two-way table? (2 points)
2. What direction does a row go in, and what direction does a column go?
3. What is the first thing you should do when starting to interpret a two-way table?
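1. A table in which individuals have been classified according to two variables.
The variables may be categorical or quantitative, but a quantitative variable must first be grouped into classes so that each individual falls into exactly one cell.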
- 2. Row: Horizontal
- Column: Vertical
3. Make an extra row and an extra column for the totals. Then total each column and row up. Also, get a final total of all individuals in the table to put on the bottom right corner.
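As a minimal sketch of the totalling step, the snippet below uses the counts from the gaming example that appears later in these notes. The layout (rows for gaming status, columns for grades) and the variable names are assumptions made for illustration; the "C's" label for the middle grade group is also assumed, since the notes do not name it.

```python
# Observed counts from the gaming example in these notes.
# Assumed layout: rows = players, non-players;
# columns = A's and B's, C's (assumed label), D's and F's.
observed = [
    [736, 450, 193],
    [205, 144, 80],
]

# Extra column: one total per row.
row_totals = [sum(row) for row in observed]

# Extra row: one total per column (zip(*...) walks the table column by column).
col_totals = [sum(col) for col in zip(*observed)]

# Bottom-right corner: the total number of individuals in the table.
grand_total = sum(row_totals)

print(row_totals)   # [1379, 429]
print(col_totals)   # [941, 594, 273]
print(grand_total)  # 1808
```

The row totals and the column totals each add up to the same grand total, which is a quick check that no cell was missed.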
1. What is the marginal distribution of a two-way table?
2. What is the conditional distribution of a two-way table?
3a. How many people does this table describe, and how many have played video games?
3b. How would you calculate the marginal distribution of grades, and how would this look in a table? (visualize)
4. How would you figure out the conditional distribution for players and non-players, and what would this look like in a table? (visualize)
1. The row totals and column totals in a two-way table give the marginal distributions of the two individual variables.
It is clearer to present these distributions as percents of the table total.
The marginal distributions tell us nothing about the relationship between the variables.
2. There are two sets of conditional distributions in a two-way table: the distributions of the row variable for each fixed value of the column variable, and vice versa.
To find the conditional distribution of the row variable for one specific value of the column variable, look only at that one column in the table. Express each entry in the column as a percent of the column total.
There is a separate conditional distribution for each value of the other variable.
- 3a. Add up all the people in the table.
- 736 + 450 + 193 + 205 + 144 + 80 = 1808
- This table describes 1808 people.
- 736 + 450 + 193 = 1379
- 1379 people played video games
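3b. Divide each grade total by the table total.
A's and B's:
(736 + 205) / 1808 = 0.5205 x 100 = 52.05%
C's:
(450 + 144) / 1808 = 0.3285 x 100 = 32.85%
D's and F's:
(193 + 80) / 1808 = 0.1510 x 100 = 15.10%
- The complete marginal distribution for grades is: A's and B's 52.05%, C's 32.85%, D's and F's 15.10%.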
- 4. There are 1379 players (736 + 450 + 193).
- Of these, 53.37% earned A's and B's. (736/1379) x 100
C's: 450 / 1379 x 100 = 32.63%
D's and F's: 193 / 1379 x 100 = 14.00%
Now do the same process for non-gamers.
There are 429 non-gamers (205 + 144 + 80).
- Of these, 47.79% earned A's and B's. (205/429) x 100
C's: 144 / 429 x 100 = 33.57%
D's and F's: 80 / 429 x 100 = 18.65%
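The marginal and conditional calculations in answers 3b and 4 can be sketched in a few lines. This is an illustrative sketch only: the dictionary layout and variable names are assumptions, and percentages are rounded to two decimals to match the hand calculations.

```python
# Observed counts per gaming group, in grade order:
# A's and B's, the middle grade group, D's and F's.
observed = {
    "players": [736, 450, 193],
    "non-players": [205, 144, 80],
}

grand_total = sum(sum(counts) for counts in observed.values())

# Marginal distribution of grades: each grade total as a percent of the table total.
grade_totals = [sum(col) for col in zip(*observed.values())]
marginal = [round(100 * t / grand_total, 2) for t in grade_totals]

# Conditional distribution of grades within each gaming group:
# each count as a percent of that group's own total.
conditional = {
    group: [round(100 * n / sum(counts), 2) for n in counts]
    for group, counts in observed.items()
}

print(marginal)                    # [52.05, 32.85, 15.1]
print(conditional["players"])      # [53.37, 32.63, 14.0]
print(conditional["non-players"])  # [47.79, 33.57, 18.65]
```

Comparing the two conditional distributions side by side is what reveals a possible relationship; the marginal distribution alone cannot.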
There are two types of significance tests for inference from a two-way table.
Which one do we do in this class, what is it used for and what is another name for it?
The chi-squared test of independence (also known as the chi-squared test of association) is used to find evidence for a relationship (dependence, connection, link, etc.) between two variables.
*The other test is a chi-squared test of homogeneity, but we don't touch on it in this course.
1. What are the conditions for a chi-squared test of independence?
2. In the worksheet, which numbers are the observed counts and which are the expected counts?
1a. The sample is representative of the population
1b. All expected counts are at least 5. A few isolated expected counts below 5 do not matter, provided that they are not all in one row or all in one column.
*Need to check on Minitab to find out*
2. Observed counts: 736, 450, 193, 205, 144 and 80
Expected counts: 717.7, 453.1, 208.2, 223.3, 140.9 and 64.8
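The expected counts listed above can be reproduced from the totals, assuming the usual rule that each expected count is (row total x column total) / table total. The sketch below is illustrative; the variable names and the row/column layout are assumptions.

```python
# Observed counts: rows = players, non-players; columns = the three grade groups.
observed = [
    [736, 450, 193],
    [205, 144, 80],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count for each cell: row total * column total / table total.
expected = [
    [r * c / grand_total for c in col_totals]
    for r in row_totals
]

rounded = [[round(e, 1) for e in row] for row in expected]
print(rounded)  # [[717.7, 453.1, 208.2], [223.3, 140.9, 64.8]]

# Condition check: the smallest expected count should be at least 5.
smallest = min(min(row) for row in expected)
print(smallest >= 5)  # True
```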
1. What is a table of observed counts?
2. What does the sample of teenage boys who played games and didn't play games represent?
- 1. Observed counts are simply the numbers from a sample that are put into a table.
2. The sample of teenaged boys represents the population of all teenaged boys at all similar schools.
In the population of all 14-18 year-old boys at similar schools, is there a relationship between gaming status and grades? That is, do grades depend on gaming status?
You know what to do!
- STATE: How strong is the evidence that grades depend on gaming status?
NO PARAMETER TO WORRY ABOUT
- Ho: Grades do not depend on gaming status
- Ha: Grades depend on gaming status
We'll use a χ² test of association because we're looking for a relationship between two variables.
- SOLVE: State the Conditions
- 1. Representative sample
- 2. All expected counts are at least 5. A few isolated expected counts below 5 do not matter, provided that they are not all in one row or column.
- Checking the Conditions
- 1. No information is given about how the boys were selected. In practice we would contact the person who collected the data to find out.
- 2. All expected counts exceed 5. For example, 717.7
- Test stat: 6.739
- p-value: 0.034
*Always use the Pearson p-value*
- CONCLUDE: Since the p-value is between 0.01 and 0.05, we have strong evidence that grades depend on gaming status in the population of all teenage boys at similar schools.
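The test statistic and p-value quoted in the SOLVE step come from software, but they can be checked by hand. The sketch below assumes the Pearson statistic (the sum of (observed - expected)^2 / expected over all cells) and uses the fact that, for 2 degrees of freedom only, the chi-squared upper-tail probability equals exp(-x / 2). Variable names are illustrative.

```python
import math

# Observed counts: rows = players, non-players; columns = the three grade groups.
observed = [
    [736, 450, 193],
    [205, 144, 80],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Pearson chi-squared statistic: sum over cells of (observed - expected)^2 / expected.
chi_sq = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        exp_count = row_totals[i] * col_totals[j] / grand_total
        chi_sq += (obs - exp_count) ** 2 / exp_count

# Degrees of freedom for a 2 x 3 table: (2 - 1) * (3 - 1) = 2.
# For df = 2 specifically, the upper-tail probability is exp(-x / 2).
p_value = math.exp(-chi_sq / 2)

print(round(chi_sq, 3), round(p_value, 3))  # 6.739 0.034
```

The exp(-x / 2) shortcut does not carry over to other degrees of freedom; statistical software handles the general case.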
1. What type of variables do we use for a chi-square test?
2. Does a chi-square test establish causation?
3. What is a lurking variable for the association between gaming status and grades?
1. Categorical variables. A quantitative variable can be used only after its values have been grouped into categories.
2. No, it only gives evidence of an association.
3. Higher household income for the gamers could be causing them to get better grades, so we can't claim causation. Higher-income households have more money for games and also for tutors to help with school.
1. What do the expected counts mean? Use gaming and grades as an example.
2. How is the chi-square statistic calculated (generally speaking)?
3. What does it mean when the chi-square statistic is large, and when it is small?
1. The expected counts are the numbers of boys we would expect in each cell if grades were the same for gamers and non-gamers.
- If grades were the same for gamers and non-gamers, then the grade distribution for gamers would be the same as the grade distribution for non-gamers and this would be estimated by the marginal distribution of grades.
For example, 52.046% of 1379 (the total number of gamers), or 717.7 students, would be expected to get A's and B's.
The same goes for the expected counts for non-gamers. If grades were the same for non-gamers as they were for gamers, then 52.046% of 429 (the total number of non-gamers), or 223.3 students, would be expected to get A's and B's.
2. The chi-square statistic compares each observed count with the corresponding expected count.
- Don't memorize the specific formula; just have an idea of how it works.
3. If the observed counts are very different from the expected counts, then the chi-squared statistic will be large and we will conclude that grades depend on gaming status.
If the observed counts are close to the expected counts, then the chi-squared statistic will be small, and we will conclude that we can't claim that grades depend on gaming status.
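For reference only, since the notes advise against memorizing it, the comparison described in answer 2 is conventionally written as a sum over every cell of the table (this is the standard Pearson form, stated here as an assumption about what the course software computes):

```latex
\chi^2 = \sum_{\text{all cells}}
  \frac{(\text{observed count} - \text{expected count})^2}{\text{expected count}}
```

Every term is zero or positive, so any cell where the observed count is far from the expected count pushes the statistic up, which is why a large value counts as evidence against "no relationship".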
1. What is the p-value if the chi-squared test statistic is 6.739 with 2 df?
2. What is the p-value if the chi-squared test statistic is 14.53 with 10 df?
3. How do we figure out the degrees of freedom (df) for a chi-squared test involving a two-way table?
4. Calculate df for the gaming example.
1. p-value = between 0.025 and 0.05
- 2. p-value = 0.15
3. In a chi-squared test involving a two-way table that has r rows and c columns, the degrees of freedom are given by:
df = (r - 1) x (c - 1)
4. 2 rows, 3 columns
(2 - 1) x (3 - 1) = 1 x 2 = 2
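The table look-ups in answers 1 and 2 can be cross-checked numerically. The sketch below is a hedged illustration, not the course's method: both function names are hypothetical, and the p-value helper uses a closed form that is valid only when the degrees of freedom are even (which covers both 2 and 10).

```python
import math

def degrees_of_freedom(rows, cols):
    """df for a two-way table with the given numbers of rows and columns."""
    return (rows - 1) * (cols - 1)

def chi2_p_value_even_df(stat, df):
    """Upper-tail chi-squared probability; this closed form needs even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive even df")
    half = stat / 2
    # For df = 2m: P(X > x) = exp(-x/2) * sum_{k=0}^{m-1} (x/2)^k / k!
    partial_sum = sum(half ** k / math.factorial(k) for k in range(df // 2))
    return math.exp(-half) * partial_sum

print(degrees_of_freedom(2, 3))                   # 2
print(round(chi2_p_value_even_df(6.739, 2), 3))   # 0.034
print(round(chi2_p_value_even_df(14.53, 10), 2))  # 0.15
```

The first p-value falls between 0.025 and 0.05, consistent with the table-based bracket, and the second matches the tabled value of 0.15.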
1. People often try to use chi-square tests as soon as they see a two-way table.
But what are the three things that need to be in place in order to use a chi-square test?
1. The numbers are counts of individuals (not percentages/summarized data)
2. Each individual is counted exactly once
3. The conditions are satisfied (representative sample and expected counts all at least 5...)