INAF 5016 - QSS 2 - Causality 2

  1. 1. What is the causal effect we are trying to quantify in the resume experiment?

    2. What is the key to understanding causal inference?

    3. What is causal inference? 

    4. What is the fundamental problem of causal inference? Give an example or two
    1. The causal effects of applicants’ names on their likelihood of receiving a callback from a potential employer

    2. To think about the counterfactual

    3. Causal inference is a comparison between the factual (i.e., what actually happened) and the counterfactual (i.e., what would have happened if a key condition were different). 

    4. The fundamental problem of causal inference arises because we cannot observe the counterfactual outcomes.

    ex. In the resume data, the first applicant on the list, 'Allison', was not called back. Would the outcome have been different if the name had been Lakisha? The experimenters did not send the same resume to the same employer with only the name changed, so we will never know.

    ex. In order to know the causal effect of increasing the minimum wage in a state that raised it, we would need to observe the unemployment rate that would have resulted if that state had not raised the minimum wage. Clearly, we would never be able to directly observe this counterfactual unemployment rate.
  2. 1. What is a key causal variable of interest also called?

    2. How do we determine if the variable T causes a change in the variable Y?

    3. How would you apply T and Y to the resume experiment?

    4. What does Yi (1) represent?

    What does Ti represent?
    1. A treatment variable

    2. To determine whether a treatment variable of interest, T, causes a change in an outcome variable Y, we must consider two potential outcomes, i.e., the potential values of Y that would be realized in the presence and absence of the treatment, denoted by Y(1) and Y(0), respectively.

    3. T may represent the race of a fictitious applicant (T = 1 is a black-sounding name and T = 0 is a white-sounding name) while Y denotes whether a potential employer who received the résumé called back. Then, Y(1) and Y(0) represent whether a potential employer calls back when receiving a résumé with stereotypically black and white names, respectively.

    4a. Yi(1) represents the potential outcome under the treatment condition for the ith observation.

    b. Ti is the treatment variable for the same observation.
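
The notation above can be made concrete with a small table. This is only a sketch with made-up values: the data frame `po` and its columns are hypothetical and do not come from the resume data. It shows that for each observation only the potential outcome matching the assigned treatment is seen, while the other one stays missing.

```r
# Hypothetical potential outcomes for four employers (all values are made up).
po <- data.frame(
  employer = 1:4,
  treat    = c(1, 0, 0, 1),  # T_i: 1 = black-sounding name, 0 = white-sounding name
  y1       = c(0, 1, 0, 1),  # Y_i(1): callback if the treatment is received
  y0       = c(1, 1, 0, 1)   # Y_i(0): callback if the treatment is not received
)

# The observed outcome is the potential outcome under the assigned treatment.
po$y_obs <- ifelse(po$treat == 1, po$y1, po$y0)

# What a researcher actually sees: the counterfactual outcome is missing (NA).
po$y1_seen <- ifelse(po$treat == 1, po$y1, NA)
po$y0_seen <- ifelse(po$treat == 0, po$y0, NA)

po[, c("employer", "treat", "y_obs", "y1_seen", "y0_seen")]
```

Every row has exactly one `NA` among `y1_seen` and `y0_seen`, which is the fundamental problem of causal inference in table form.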

  3. 1. What are immutable characteristics? Give examples

    2. Why are they a problem, and how did the researchers in the resume experiment get around it?
    1. A characteristic that cannot be manipulated.

    ex. Gender and race.

    2. In order to determine cause, you must be able to manipulate the treatment variables.

    Many scholars believe that causal questions about these characteristics are not answerable. In fact, there exists a mantra which states, “No causation without manipulation.” 

    However, the résumé experiment provides a clever way of addressing an important social science question about racial discrimination. 

    The researchers of this study manipulated potential employers’ perception of job applicants’ race by changing the names on identical résumés.
  4. 1. What is an RCT?

    2. What is the SATE?

    3. Since we don't know the counterfactual, we have to estimate it.

    How do we estimate the counterfactual outcome for the treatment group? (3 parts)
    1. A randomized controlled trial:

    Researchers randomly assign the receipt of treatment.

    An RCT is often regarded as the gold standard for establishing causality in many scientific disciplines because it enables researchers to isolate the effects of a treatment variable and quantify uncertainty. 

    2. Sample average treatment effect: 

    the sample average of individual-level causal effects 

    3. In order to estimate the average counterfactual outcome for the treatment group, we may use the observed average outcome of the control group. 

    Similarly, we can use the observed average outcome of the treatment group as an estimate of the average counterfactual outcome for the control group.

    This suggests that the SATE can be estimated by calculating the difference in the average outcome between the treatment and control groups or the difference-in-means estimator.
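
The difference-in-means estimator described above can be sketched in a few lines of R. The data frame `toy` and its values are invented for illustration and are not the resume data.

```r
# Toy data (made up): treat = 1 for the treatment group, 0 for the control group;
# call = 1 if the employer called back, 0 otherwise.
toy <- data.frame(
  treat = c(1, 1, 1, 1, 0, 0, 0, 0),
  call  = c(0, 0, 1, 0, 1, 0, 1, 0)
)

# Observed average outcome in each group.
mean_treated <- mean(toy$call[toy$treat == 1])
mean_control <- mean(toy$call[toy$treat == 0])

# Difference-in-means estimate of the SATE.
sate_hat <- mean_treated - mean_control
sate_hat
```

Here the control group's average stands in for the treatment group's unobserved counterfactual average, and vice versa.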
  5. 1. What were the treatment and control groups in the resume experiment?

    2. What does randomization help to do, and what is it good for?
    1. The treatment group consists of the potential employers who were sent résumés with black-sounding names.

    In contrast, the control group comprises other potential employers who received the résumés with stereotypically white names.

    2. By randomly assigning each subject to either the treatment or control group, we ensure that these two groups are similar to each other in every aspect. 

    Even though they consist of different individuals, the treatment and control groups are on average identical to each other in terms of all pre-treatment characteristics, both observed and unobserved.

    The randomization of treatment assignment guarantees that the average difference in outcome between the treatment and control groups can be attributed solely to the treatment, because the two groups are on average identical to each other in all pretreatment characteristics.
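
A small simulation can illustrate why randomization balances pretreatment characteristics on average. This is a sketch under assumptions: the variable `age`, the sample size, and the seed are all arbitrary choices, not part of any study mentioned here.

```r
set.seed(1)                          # arbitrary seed so the sketch is reproducible
n   <- 10000
age <- rnorm(n, mean = 40, sd = 10)  # a made-up pretreatment characteristic

# Randomly assign exactly half of the subjects to the treatment group.
treat <- sample(rep(c(0, 1), each = n / 2))

# Average of the pretreatment variable within each group.
balance <- tapply(age, treat, mean)
balance

# The difference should be close to zero relative to the spread of age.
diff_age <- unname(balance["1"] - balance["0"])
diff_age
```

The two groups contain different individuals, yet their average `age` differs only by chance. The same logic applies to characteristics that are never measured.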
  6. 1. What is internal validity?

    2. What is the advantage of RCTs?

    3. What is external validity?

    4. What is the weakness of RCTs?

    5. Name three reasons for this.
    1. Internal validity is the extent to which the causal assumptions are satisfied in the study.

    2. RCTs, when successfully implemented, can yield valid estimates of causal effects. For this reason, RCTs are said to have a significant advantage for internal validity.

    3. External validity is the extent to which the conclusions can be generalized beyond a particular study.

    4. The strong internal validity of RCTs often comes with a compromise in external validity.

    5a. One common reason for a lack of external validity is that the study sample may not be representative of a population of interest.

    For ethical and logistical reasons, RCTs are often done using a convenience sample of subjects who are willing to participate. This is an example of sample selection bias.

    b. Another potential problem of external validity is that RCTs are often conducted in an environment (e.g., laboratory) quite different from real-world situations.

    c. In addition, RCTs may use interventions that are unrealistic in nature.

    * As we saw in the résumé experiment, however, researchers have attempted to overcome these problems by conducting RCTs in the field and making their interventions as realistic as possible.
  7. 1. What is the Hawthorne effect?

    2. What is an observational study, and how does it differ from an RCT in terms of validity?

    3. What is the important assumption in observational studies?
    1. A phenomenon where subjects may behave differently if they are aware of being observed by researchers.

    2. In an observational study, researchers simply observe naturally occurring events and collect and analyze the data.

    Researchers do not conduct an intervention.

    In such studies, internal validity is likely to be compromised because of possible selection bias, but external validity is often stronger than that of RCTs.

    This is because researchers can examine treatments that are implemented among a relevant population in a real-world environment, so the findings are typically more generalizable than those of RCTs.

    3. The important assumption of observational studies is that the treatment and control groups must be comparable with respect to everything related to the outcome other than the treatment.
  8. 1. How can we use the tapply() function to compute the turnout for each treatment group in the social experiment on voting? (three parts)

    2. How would you interpret the results? 

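    1. tapply() takes three arguments: the variable to summarize (the turnout variable), the grouping variable (the treatment group), and the function to apply within each group (mean).

    [Image: R code and output]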

    2. We find that the naming-and-shaming GOTV message substantially increases turnout.

    Compared to the control group turnout, the naming-and-shaming message increases turnout by 8.1 percentage points, whereas the civic duty message has a much smaller effect of 1.8 percentage points.

    It is interesting to see that the Hawthorne effect of being observed is somewhat greater than the effect of the civic duty message, though it is far smaller than the effect of the naming-and-shaming message.
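
As a rough sketch of the computation behind these comparisons, the code below applies `tapply()` to a tiny invented data set. The data frame `toy`, its column names, and all values are hypothetical, so the numbers do not match the 8.1 and 1.8 percentage points reported above.

```r
# Toy stand-in for the voting data (values are made up, not the study's).
toy <- data.frame(
  messages = c("Control", "Control", "Control", "Control",
               "Civic Duty", "Civic Duty",
               "Hawthorne", "Hawthorne",
               "Neighbors", "Neighbors"),
  voted    = c(0, 0, 1, 0, 1, 0, 0, 1, 1, 1)
)

# tapply(X, INDEX, FUN): the values, the grouping variable, the function.
turnout <- tapply(toy$voted, toy$messages, mean)
turnout

# Turnout in the control group, computed directly.
control_mean <- mean(toy$voted[toy$messages == "Control"])

# Estimated effect of each message relative to the control group.
effects <- turnout - control_mean
effects
```

Multiplying `effects` by 100 expresses the differences in percentage points, which is the scale used in the interpretation above.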
  9. 1. What are confounders?

    2. What is confounding bias?
    1. The pretreatment variables that are associated with both the treatment and outcome variables

    They are the variables that are realized prior to the administration of treatment and hence are not causally affected by the treatment.

    However, they may determine who is likely to receive the treatment and influence the outcome.

    2. Confounding bias is the bias that arises from such variables. Their existence is said to confound the causal relationship between the treatment and outcome, making it difficult to draw causal inferences from observational data unless the confounders are adjusted for.

    **  The possible existence of confounding bias is the reason behind the existence of the popular mantra, “Association does not necessarily imply causation.”
  10. 1. What is the problem with self-selection bias?
    1. Lack of control over treatment assignment means that those who self-select into the treatment group may differ significantly from those who do not in terms of observed and unobserved characteristics.

    This makes it difficult to determine whether the observed difference in outcome between the treatment and control groups is due to the difference in the treatment condition or the differences in confounders.
  11. 1. How do researchers try to minimize confounding bias in observational studies?

    2. Describe the before-and-after design.
    1. One simple way is the statistical method called subclassification. The idea is to make the treatment and control groups as similar to each other as possible by comparing them within a subclass, i.e., a subset of observations defined by shared values of pretreatment variables.

    2. The before-and-after design examines how the outcome variable changed from the pretreatment period to the posttreatment period for the same set of units. The design is able to adjust for any confounding factor that is specific to each unit but does not change over time.

    However, the design does not address possible bias due to time-varying confounders.
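
Both strategies can be sketched with invented data. The data frames `obs` and `units`, their column names, and all values below are hypothetical and only illustrate the mechanics.

```r
# Subclassification: compare treated and control units within each subclass.
# 'chain' is a made-up pretreatment variable that defines the subclasses.
obs <- data.frame(
  chain = c("A", "A", "A", "A", "B", "B", "B", "B"),
  treat = c(1, 1, 0, 0, 1, 1, 0, 0),
  y     = c(0.4, 0.6, 0.3, 0.5, 0.2, 0.2, 0.1, 0.3)
)
by_subclass <- sapply(split(obs, obs$chain), function(d) {
  mean(d$y[d$treat == 1]) - mean(d$y[d$treat == 0])
})
by_subclass

# Before-and-after design: the same units measured in both periods.
units <- data.frame(
  before = c(0.30, 0.25, 0.40, 0.35, 0.20),
  after  = c(0.35, 0.25, 0.50, 0.40, 0.25)
)
ba_estimate <- mean(units$after - units$before)
ba_estimate
```

The before-and-after estimate removes anything fixed within a unit, but a change that affects all units between the two periods would still be mixed into it.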
  12. 1. Which is more sensitive to outliers, the mean or the median?

    2. What is the formula for calculating the median?

    3. What is IQR, and how do you compute it in R?

    4. What is a quantile? Give examples

    5. How do you compute quantiles in R?
    1. The mean

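    Ex. A single observation of extreme value can dramatically change the mean, but it will not affect the median as much. The median of {1, 3, 4, 10, 82} is 4, but the mean is pulled up to 20 by the single value 82.

    2. Sort the n observations. If n is odd, the median is the value at position (n + 1) / 2; if n is even, it is the average of the values at positions n / 2 and n / 2 + 1.

    3. Interquartile range (IQR) = the difference between the upper and lower quartiles (i.e., the 75th percentile and the 25th percentile). In R: IQR()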

    The IQR represents the range that contains the middle 50% of the data, thereby measuring the spread of a distribution.


    4. Quantiles divide the observations into a certain number of equally sized groups.

    ex. terciles (which divide the data into 3 groups), quintiles (5 groups), deciles (10 groups), and percentiles (100 groups).

    5. quantile()

    ex. for deciles (10 groups)

    quantile(minwageNJ$wageBefore, probs = seq(from = 0, to = 1, by = 0.1))
    ##   0%  10%  20%  30%  40%  50%  60%  70%  80%  90% 100%
    ## 4.25 4.25 4.25 4.25 4.50 4.50 4.65 4.75 5.00 5.00 5.75
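
The summary statistics on this card can be checked on the small example vector from the first answer. This is a sketch using only base R; the vector `y` is an extra made-up example for the even-n case.

```r
x <- c(1, 3, 4, 10, 82)

mean(x)    # pulled up by the extreme value 82
median(x)  # the middle value, barely affected by 82

# Position of the median when n is odd: (n + 1) / 2 in the sorted data.
n <- length(x)
sort(x)[(n + 1) / 2]

# When n is even, median() averages the two middle values.
y <- c(1, 3, 4, 10)
median(y)

# IQR() is the 75th percentile minus the 25th percentile.
quartiles <- quantile(x, probs = c(0.25, 0.75))
quartiles
IQR(x)

# Quartiles, including the minimum (0%) and maximum (100%).
quantile(x, probs = seq(from = 0, to = 1, by = 0.25))
```

Note that `quantile()` interpolates between data points by default, so results for small samples can differ slightly from hand calculations that use another convention.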
  13. 1. What is RMS?

    2. What is standard deviation?

    3. What is the formula for standard deviation? *not sure if we need to know this

    4. How do you calculate standard deviation in R?
    1. The root mean square (RMS) describes the magnitude of a variable: it is the square root of the mean of the squared values.

    2. The standard deviation measures the spread of a distribution by quantifying how far away data points are, on average, from their mean.

    Specifically, the standard deviation is defined as the RMS of deviation from the average:

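    3. $\text{SD} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2}$, where $\bar{x}$ is the mean of the $n$ observations; dividing by $n - 1$ instead of $n$ is also common.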

    4. sd()



    ## [1] 0.2304592
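
The link between the RMS and the standard deviation can be sketched as follows. The helper `rms()` and the vector `x` are made up for illustration. One fact worth knowing is that R's `sd()` divides by n - 1 rather than n, so it differs slightly from the RMS of deviations.

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)  # made-up data with mean 5

# Hypothetical helper: root mean square of a numeric vector.
rms <- function(v) sqrt(mean(v^2))

rms(x)                      # magnitude of the variable itself

# Standard deviation as the RMS of deviations from the mean (divides by n).
sd_n <- rms(x - mean(x))
sd_n

# sd() in R divides by n - 1, so it is slightly larger.
sd(x)
sd_n1 <- sqrt(sum((x - mean(x))^2) / (length(x) - 1))
sd_n1
```

For large samples the two versions are nearly identical, since dividing by n or n - 1 makes little difference.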