STAT 104 - Chapter 6: Two-way tables, Chapter 25: Chi-Squared Tests

  1. 1. What is a two-way table? (2 points)

    2. What direction does a row go in, and what direction does a column go?

    3. What is the first thing you should do when starting to interpret a two-way table?
    1. A table where individuals have been classified according to two variables.

    The variables are categorical. A quantitative variable can be used only after its values are grouped into classes.

    • 2. Row: Horizontal
    •     Column: Vertical

    3. Make an extra row and an extra column for the totals. Then total each column and row up. Also, get a final total of all individuals in the table to put on the bottom right corner.
  2. 1. What is the marginal distribution of a two-way table?

    2. What is the conditional distribution of a two-way table?

    3a. How many people does this table describe, and how many have played video games?

    3b. How would you calculate the marginal distribution of grades, and how would this look in a table? (visualize)
     Image Upload 1

    4. How would you figure out the conditional distribution for players and non-players, and what would this look like in a table? (visualize)
    1. The row totals and column totals in a two-way table give the marginal distribution of the two individual variables. 

    It is clearer to present these distributions as percents of the table total. 

    Marginal distribution tells us nothing about the relationship between the variables. 

    2. There are two sets of conditional distributions in a two-way table: the distribution of the row variable for each fixed value of the column variable, and the distribution of the column variable for each fixed value of the row variable.

    To find the conditional distribution of the row variable for one specific value of the column variable, look only at that one column in the table. Find each entry in the column as a percent of the column total.

    There is a separate conditional distribution for each value of the other variable.

    • 3a. Add up all the people in the table.
    • 736 + 450 + 193 + 205 + 144 + 80 = 1808
    • This table describes 1808 people.

    • 736 + 450 + 193 = 1379
    • 1379 people played video games

    3b. A's and B's: (736 + 205) / 1808 = 0.5205 x 100 = 52.05%

    C's: (450 + 144) / 1808 = 0.3285 x 100 = 32.85%

    D's and F's: (193 + 80) / 1808 = 0.1510 x 100 = 15.10%

    • The complete marginal distribution for grades is:
    • Image Upload 2
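
As a rough check on the arithmetic in 3b, here is a minimal Python sketch of the marginal distribution of grades. The dictionary layout and the names `counts`, `column_totals`, and `marginal` are illustrative choices, not something from the course or Minitab; the six counts are the ones used in this card.

```python
# Observed counts from the gaming example.
# Rows are gaming status, columns are grade groups (layout chosen for illustration).
counts = {
    "Gamers": {"A's and B's": 736, "C's": 450, "D's and F's": 193},
    "Non-gamers": {"A's and B's": 205, "C's": 144, "D's and F's": 80},
}
grades = ["A's and B's", "C's", "D's and F's"]

# Column totals: add the gamers and non-gamers within each grade group.
column_totals = {g: sum(row[g] for row in counts.values()) for g in grades}

# Table total: every individual in the table, counted once.
grand_total = sum(column_totals.values())

# Marginal distribution: each column total as a percent of the table total.
marginal = {g: 100 * column_totals[g] / grand_total for g in grades}

for g in grades:
    print(f"{g}: {column_totals[g]} / {grand_total} = {marginal[g]:.2f}%")
```

The three percents add up to 100 (allowing for rounding), which is a quick way to catch a slip in the totals.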

    • 4. There are 1379 players (736 + 450 + 193). 
    • Of these, 53.37% earned A's and B's. (736/1379) x 100

    C's: 450 / 1379 x 100 = 32.63%

    D's and F's: 193 / 1379 x 100 = 14.00%

    Now do the same process for non-gamers.

    There are 429 non-gamers (205 + 144 + 80)

    • Of these, 47.79% earned A's and B's.
    • (205/429) x 100
    • C's: 144 / 429 x 100 = 33.57%
    • D's and F's: 80 / 429 x 100 = 18.65%
    • Image Upload 3
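
The conditional distributions in part 4 can be sketched in Python as follows. This assumes the table is laid out with gaming status in the rows and grade groups in the columns, so each row is divided by its own row total; the helper name `conditional_distribution` is made up for this example.

```python
# Observed counts, one list per gaming status.
# Order within each list: A's and B's, C's, D's and F's.
observed = {
    "Gamers": [736, 450, 193],
    "Non-gamers": [205, 144, 80],
}
grades = ["A's and B's", "C's", "D's and F's"]


def conditional_distribution(row):
    """Return each count in the row as a percent of that row's total."""
    total = sum(row)
    return [100 * count / total for count in row]


# One conditional distribution of grades for each gaming status.
conditional = {group: conditional_distribution(row) for group, row in observed.items()}

for group, percents in conditional.items():
    print(f"{group} (n = {sum(observed[group])})")
    for grade, pct in zip(grades, percents):
        print(f"  {grade}: {pct:.2f}%")
```

Comparing the two printed distributions side by side is what shows whether grades look different for gamers and non-gamers.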
  3. How would you make a bar chart for the conditional distribution of gamers and non-gamers on Minitab?
    • Make a Label column for your response variable and put the categories in each row. Then make separate columns for the different categories of your explanatory variable, and put the data into the rows like this.
    • Image Upload 4
    • Choose 'Values from a table', then choose 'two-way table cluster'
    • Image Upload 5
    • Put your explanatory variables (all categories) into the 'graph variables' box.
    • Then put your response variable into the 'row labels' box.

    • **Change 'table arrangement' to 'rows are outermost categories and columns are innermost'
    • Image Upload 6
    • Click ok and you'll get your graph
    • Image Upload 7
  4. There are two types of significance tests for inference from a two-way table.

    Which one do we do in this class, what is it used for and what is another name for it?
    Chi-squared test of independence (also known as the chi-squared test of association) is used to find evidence for a relationship (dependence, connection, link etc.) between two variables.

    *The other test is a chi-squared test of homogeneity, but we don't touch on it in this course.
  5. 1. What are the conditions for a chi-squared test of independence?

    2. In the worksheet, which numbers are the observed counts and which are the expected counts?
    Image Upload 8
    1a. The sample is representative of the population

    1b. All expected counts are at least 5. A few isolated expected counts below 5 do not matter, provided that they are not all in one row or all in one column. 

    *Need to check on Minitab to find out*

    2. Observed counts: 736, 450, 193, 205, 144 and 80

    Expected counts: 717.7, 453.1, 208.2, 223.3, 140.9 and 64.8
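
A minimal Python sketch of condition 1b, for checking by hand what Minitab reports. It assumes the usual formula expected count = (row total x column total) / table total, which this card does not spell out; the variable names are illustrative.

```python
# Observed counts: first row gamers, second row non-gamers.
# Columns: A's and B's, C's, D's and F's.
observed = [
    [736, 450, 193],
    [205, 144, 80],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Assumed formula: expected = row total * column total / table total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Condition 1b: every expected count should be at least 5.
all_at_least_5 = all(e >= 5 for row in expected for e in row)

for row in expected:
    print([round(e, 1) for e in row])
print("All expected counts at least 5:", all_at_least_5)
```

The rounded values match the expected counts listed in answer 2, and the smallest one (64.8) is well above 5.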
  6. 1. What is a table of observed counts?

    2. What does the sample of teenage boys who played games and didn't play games represent?
    • 1. Observed counts are simply the numbers from a sample that are put into a table.
    • Image Upload 9

    2. The sample of teenaged boys represents the population of all teenaged boys at all similar schools.
  7. In the population of all 14-18 year-old boys at similar schools, is there a relationship between gaming status and grades? That is, do grades depend on gaming status?

    You know what to do!
    Image Upload 10
    • STATE: 
    • How strong is the evidence that grades depend on gaming status?

    PLAN: *NO PARAMETER TO WORRY ABOUT

    • Ho: Grades do not depend on gaming status
    • Ha: Grades depend on gaming status

    We'll use a χ² test of association because we're looking for a relationship between two variables. 

    • SOLVE: 
    • State the Conditions
    • 1. Representative sample
    • 2. All expected counts are at least 5. A few isolated expected counts below 5 do not matter, provided that they are not all in one row or column. 

    • Checking the Conditions
    • 1. No information is given about the selection process concerning how they chose the boys. In practice we would contact the person who collected the data to find out.

    • 2. All expected counts exceed 5. The largest is 717.7 and the smallest is 64.8.
    • Image Upload 11

    • Test stat: 6.739
    • p-value: 0.034

    *Always use the Pearson p-value*

    • CONCLUDE: 
    • Since the p-value (0.034) is between 0.01 and 0.05, we have strong evidence that grades depend on gaming status in the population of all teenage boys at similar schools.
  8. How do you perform a chi-squared test of association in Minitab?
    • 1. Type the observed counts into Minitab
    • Image Upload 12
    • Image Upload 13
    • Go to 'Stat' -> 'Tables' -> 'Chi-Square Test for Association'
    • Image Upload 14
    • Change drop down menu to 'summarized data in a two-way table'

    • Now put each column from the two-way table into the box called 'Columns containing the table'
    • Image Upload 15
    • Press 'ok'
    • Image Upload 16
  9. 1. What type of variables do we use for a chi-square test?

    2. Does a chi-square test establish causation?

    3. What is a lurking variable for the association between gaming status and grades?
    1. Categorical variables. A quantitative variable can be used only if its values are first grouped into categories.

    2. No, it only gives evidence of an association.

    3. Household income. Higher-income households have more money for games and also for tutors to help with school, so higher income, rather than gaming itself, could be behind the gamers' better grades. This is why we can't claim causation.
  10. 1. What do the expected counts mean? Use gaming and grades as an example.

    2. How is the chi-square statistic calculated (generally speaking)?

    3. What does it mean when the chi-square statistic is large and if its small?
    1. The expected counts are the numbers of boys we would expect in each cell if grades were the same for gamers and non-gamers. 

    • If grades were the same for gamers and non-gamers, then the grade distribution for gamers would be the same as the grade distribution for non-gamers and this would be estimated by the marginal distribution of grades. 
    • Image Upload 17

    For example, 52.046% of 1379 (the total number of gamers) would be expected to get A's and B's, which gives 717.7 gamers.

    The same goes for the expected counts for non-gamers. If grades were the same for non-gamers as for gamers, then 52.046% of 429 (the total number of non-gamers), or 223.3 students, would be expected to get A's and B's.

    Image Upload 18

    2. The chi-square statistic compares each observed count with the corresponding expected count. 

    • ex. 
    • Image Upload 19
    • Don't memorize specific formula, just have an idea how it works.

    3. If the observed counts are very different from the expected counts, then the chi-squared statistic will be large and we will conclude that grades depend on gaming status. 

    If the observed counts are close to the expected counts, then the chi-squared statistic will be small, and we cannot conclude that grades depend on gaming status.
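
To get a feel for how the statistic works (no need to memorize it), here is a small Python sketch. It assumes the standard Pearson form, the sum over all cells of (observed - expected)^2 / expected, with expected counts computed as row total x column total / table total; the variable names are illustrative.

```python
# Observed counts: first row gamers, second row non-gamers.
# Columns: A's and B's, C's, D's and F's.
observed = [
    [736, 450, 193],
    [205, 144, 80],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected counts if grades were the same for gamers and non-gamers.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Each cell contributes (observed - expected)^2 / expected to the statistic.
contributions = [
    [(o - e) ** 2 / e for o, e in zip(obs_row, exp_row)]
    for obs_row, exp_row in zip(observed, expected)
]
chi_square = sum(sum(row) for row in contributions)

print(f"chi-square statistic = {chi_square:.3f}")
```

Cells where the observed count is far from the expected count contribute the most. Here the largest contribution comes from the non-gamers with D's and F's (80 observed against 64.8 expected).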
  11. 1. What is the p-value if the chi-squared test statistic is 6.739 with 2 df?
    Image Upload 20

    2. What is the p-value if the chi-squared test statistic is 14.53 with 10 df?

    3. How do we figure out the degrees of freedom (df) for a chi-squared test involving a two-way table?

    4. Calculate df for the gaming example.
    Image Upload 21
    1. p-value = between 0.025 and 0.05

    • 2. p-value = 0.15
    • Image Upload 22

    3. In a chi-squared test involving a two-way table that has r rows and c columns, the degrees of freedom are given by:

    (r-1)(c-1)

    4. 2 rows, 3 columns

    (2-1)(3-1) = 

    1 x 2 = 

    2 df
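
The table lookups in this card can be cross-checked with a short Python sketch. It uses a closed form for the chi-square upper-tail probability that holds only when df is even; that identity is a standard result, not something from these notes, and the function names are made up for this example. For odd df, use the table or software instead.

```python
import math


def degrees_of_freedom(n_rows, n_cols):
    """df for a two-way table with n_rows rows and n_cols columns."""
    return (n_rows - 1) * (n_cols - 1)


def chi_square_p_value_even_df(statistic, df):
    """Upper-tail probability for a chi-square statistic when df is even.

    Uses the identity P(X > x) = exp(-x/2) * sum_{k < df/2} (x/2)^k / k!,
    which is valid only for positive even df.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only covers positive even df")
    half = statistic / 2
    terms = sum(half ** k / math.factorial(k) for k in range(df // 2))
    return math.exp(-half) * terms


df_gaming = degrees_of_freedom(2, 3)
p_gaming = chi_square_p_value_even_df(6.739, df_gaming)
p_ten_df = chi_square_p_value_even_df(14.53, 10)

print(f"df = {df_gaming}, p-value = {p_gaming:.3f}")
print(f"df = 10, p-value = {p_ten_df:.2f}")
```

The first p-value falls between 0.025 and 0.05, matching the table lookup in part 1, and the second agrees with the 0.15 found in part 2.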
  12. 1. People often try to use chi-square tests as soon as they see a two-way table.

    But what are the three things that need to be in place in order to use a chi-square test?
    1. The numbers are counts of individuals (not percentages/summarized data)

    2. Each individual is counted exactly once

    3. The conditions are satisfied (representative sample and expected counts all at least 5...)
Author: MissionMindhack
ID: 347465
Card Set: STAT 104 - Chapter 6: Two-way tables, Chapter 25: Chi-Squared Tests
Description: Final Exam Prep