Step 3 Biostatistics I

  1. External validity (generalizability)
    • It is defined as the applicability of a study's results beyond the group that was initially assessed.
    • It answers the question, "How generalizable are the results of a study to other populations?"
    • Evaluating the inclusion/exclusion criteria is helpful in determining a study's external validity (eg, a study in middle-aged women would not necessarily be generalizable to elderly men).
  2. Internal validity
    • It relates to conclusions regarding cause and effect in a study and answers the question, "Are we observing/measuring what we think we are observing/measuring?"
    • The major threat to internal validity is confounding.
  3. Power
    Power is the ability to detect an effect if that effect exists (power = 1 - beta, where beta is the type II error rate). Power depends on sample size (larger size increases power), effect size and variability (standard deviation), and the chosen alpha (significance) level.
  4. Cross-sectional studies
    • Individuals are assessed at a specific point in time to determine whether or not they have a certain risk factor and a certain disease of interest.
    • These studies can be descriptive or analytic (can infer but not prove causation).
    • Subjects are assessed for the occurrence of risk factor exposure and for the presence of disease; some will not have been exposed or have the disease.
  5. ANalysis Of VAriance (ANOVA) test
    • It is best suited for a scenario where the mean values of a continuous variable in several groups (categorical variable) are being compared.
    • The ANOVA test gives an F statistic (based on the variation within and between the different groups) that can be used to obtain a p-value.
    • Specific conditions (eg, homoscedasticity, normality) need to be satisfied for the test to be valid.
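As a rough illustration of how the F statistic compares between-group and within-group variation, here is a minimal sketch in plain Python. The function name `one_way_anova_f` and the blood pressure values are hypothetical, chosen only for this example; converting F to a p-value additionally requires the F distribution with (k - 1, n - k) degrees of freedom, which is not shown.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of numeric groups."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k = len(groups)        # number of groups
    n = len(all_values)    # total number of observations
    # Between-group variation: distance of each group mean from the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group variation: spread of observations around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical systolic blood pressures (mm Hg) in three treatment groups
bp_groups = [[120, 122, 124], [130, 132, 134], [140, 142, 144]]
f_statistic = one_way_anova_f(bp_groups)
print(f_statistic)  # 75.0
```

A large F (group means far apart relative to the scatter inside each group) is what leads to a small p-value.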
  6. Fisher's exact test
    • It is used to study the association between 2 categorical variables when the number of observations is small.
    • For example, the association between 2 categorical variables (eg, treatment result [success/failure] and gender [man/woman]) can be determined in a small study with 20 subjects.
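The 20-subject scenario can be sketched with the hypergeometric distribution, which is what the exact test is built on. This is an illustrative sketch only: the function name `fisher_exact_two_sided` and the cell counts are assumptions, and the two-sided rule used here (sum all tables no more likely than the observed one) is one common convention.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Add up every table that is as likely as, or less likely than, the observed one
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )


# Hypothetical data: men 8 successes / 2 failures, women 3 successes / 7 failures
p_value = fisher_exact_two_sided(8, 2, 3, 7)
print(round(p_value, 4))  # 0.0698
```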
  7. McNemar test
    It compares the difference between 2 paired proportions; patients serve as their own control (eg, success/failure before and after treatment in the same subjects).
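A minimal sketch of the McNemar statistic, assuming the usual layout in which only the discordant pairs matter (b = success before but failure after, c = failure before but success after). The function name and counts are hypothetical, and no continuity correction is applied.

```python
from math import erfc, sqrt


def mcnemar_test(b, c):
    """Return (chi-squared statistic, p-value) from the two discordant pair counts."""
    statistic = (b - c) ** 2 / (b + c)
    # Upper-tail probability of a chi-squared variable with 1 degree of freedom
    p_value = erfc(sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical study: 15 patients changed one way, 5 changed the other way
statistic, p_value = mcnemar_test(15, 5)
print(statistic, round(p_value, 4))  # 5.0 0.0253
```

Concordant pairs (same result before and after) carry no information about a change, which is why they do not appear in the formula.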
  8. Paired t-test
    It can test the difference between 2 paired means; patients serve as their own control (eg, mean blood pressure before and after treatment in the same subjects).
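The paired t statistic is the mean of the within-patient differences divided by its standard error. The sketch below uses hypothetical blood pressure readings and a hypothetical function name; obtaining the p-value would require the t distribution with n - 1 degrees of freedom, which is not computed here.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(before, after):
    """Return (t statistic, degrees of freedom) for paired measurements."""
    differences = [b - a for b, a in zip(before, after)]
    n = len(differences)
    standard_error = stdev(differences) / sqrt(n)
    return mean(differences) / standard_error, n - 1


# Hypothetical systolic blood pressure (mm Hg) in the same 4 patients
before = [150, 160, 155, 165]
after = [140, 155, 150, 150]
t_stat, df = paired_t_statistic(before, after)
print(round(t_stat, 3), df)  # 3.656 3
```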
  9. Pearson chi-squared test
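    It is used to test the association between 2 categorical variables when the sample size is large (the exact test in item 6 is preferred when the number of observations is small).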
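A short sketch of how the chi-squared statistic is built from observed and expected counts. The function name and the table values are assumptions made for illustration; the p-value would come from the chi-squared distribution with (rows - 1) x (columns - 1) degrees of freedom.

```python
def chi_squared_statistic(table):
    """Pearson chi-squared statistic for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if the two variables were independent
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected
    return chi2


# Hypothetical 2x2 table: rows = treatment A/B, columns = success/failure
chi2 = chi_squared_statistic([[30, 20], [20, 30]])
print(chi2)  # 4.0
```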
  10. Effect modification
    • It is present when the effect of the main exposure on the outcome is modified by the level of another variable. Separate measures of outcome should be reported for each level of an effect modifier.
    • Effect modification is NOT a bias, and it is not due to flaws in the design or analysis phases of the study.
    • It should not be controlled or adjusted, but the RRs should be reported for each level of the effect modifier.
  11. Area under the curve (AUC) of a receiver-operating characteristic (ROC) curve
    • It is a reflection of diagnostic accuracy.
    • A larger AUC means better discrimination and higher diagnostic accuracy.
    • A perfect test would have an AUC of 1.0 and a nondiscriminating test would have an AUC of 0.5 (similar to a coin toss).
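One way to see why 1.0 is perfect and 0.5 is a coin toss: the AUC equals the probability that a randomly chosen diseased patient has a higher test score than a randomly chosen healthy patient (ties count as half). The sketch below assumes that interpretation; the function name and scores are hypothetical.

```python
def auc_from_scores(diseased_scores, healthy_scores):
    """AUC as the fraction of diseased/healthy pairs ranked correctly."""
    correct = 0.0
    for d in diseased_scores:
        for h in healthy_scores:
            if d > h:
                correct += 1.0
            elif d == h:
                correct += 0.5  # a tie is as good as a coin toss
    return correct / (len(diseased_scores) * len(healthy_scores))


# Hypothetical test scores (higher = more suggestive of disease)
auc = auc_from_scores([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])
print(round(auc, 3))  # 0.889
```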
  12. Odds ratio (OR)
    • It is a measure of association between an exposure and an outcome.
    • It represents the odds that an outcome will occur in the presence of a particular exposure compared to the odds that the outcome will occur in the absence of that exposure.
    • An OR more than 1 means that the exposure is associated with higher odds of the outcome and an OR less than 1 means that the exposure is associated with lower odds of the outcome.
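A minimal sketch of the calculation from a 2x2 table, with hypothetical counts and a hypothetical function name:

```python
def odds_ratio(exposed_cases, exposed_noncases, unexposed_cases, unexposed_noncases):
    """Odds of the outcome in the exposed divided by the odds in the unexposed."""
    odds_exposed = exposed_cases / exposed_noncases
    odds_unexposed = unexposed_cases / unexposed_noncases
    return odds_exposed / odds_unexposed


# Hypothetical data: 20 of 100 exposed and 10 of 100 unexposed develop the outcome
or_value = odds_ratio(20, 80, 10, 90)
print(round(or_value, 2))  # 2.25
```

Here the OR is above 1, so the exposure is associated with higher odds of the outcome.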
  13. Latency of a disease
    • The concept of a latent period can be applied to both disease pathogenesis and exposure to risk modifiers.
    • Inciting pathologic events (or exposure to risk modifiers) sometimes occur years before clinical manifestations become evident.
    • In addition, exposure to risk modifiers may need to continue over a long time period before disease outcome is affected.
  14. Likelihood ratio
    • Likelihood ratio (LR) is an expression of sensitivity and specificity that can be used to assess the value of a diagnostic test.
    • The positive likelihood ratio (LR+) represents the value of a positive test result, and the negative likelihood ratio (LR-) represents the value of a negative test result.
    • LRs are calculated from sensitivity and specificity so they do not change with disease prevalence.
    • Other advantages of LRs are that they can be used with tests that have more than 2 possible results; they can also be used to combine the results of multiple diagnostic tests and to calculate post-test probability for a target disorder (post-test odds = pre-test odds * LR).
  15. Positive likelihood ratio
    • LR+ = sensitivity / (1 - specificity).
    • This is the probability of a patient with the disease testing positive divided by the probability of a patient without the disease testing positive. It is equivalent to: (true positive rate/false positive rate).
  16. Negative Likelihood Ratio
    • LR- = (1 - sensitivity) / specificity.
    • This is the probability of a patient with the disease testing negative divided by the probability of a patient without the disease testing negative. It is equivalent to: (false negative rate/true negative rate).
  17. Positive and Negative Predictive Value
    • PPV is defined as the percentage of individuals that actually have the disease out of all positive test results.
    • Unlike sensitivity and specificity, positive and negative predictive values are not intrinsic characteristics of the test. They are dependent on the prevalence of the disease in the tested population.
    • In a patient population with low disease prevalence, the PPV is likely to be low due to a high number of false positives relative to true positives.
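The dependence of PPV on prevalence can be made concrete with Bayes' theorem. This is a sketch under assumed values (a test with 90% sensitivity and 90% specificity); the function name is hypothetical.

```python
def ppv_from_prevalence(sensitivity, specificity, prevalence):
    """PPV = true positives / all positives, expressed per unit of population."""
    true_positives = sensitivity * prevalence
    false_positives = (1 - specificity) * (1 - prevalence)
    return true_positives / (true_positives + false_positives)


# Same hypothetical test applied to a low- and a high-prevalence population
ppv_low = ppv_from_prevalence(0.90, 0.90, 0.01)
ppv_high = ppv_from_prevalence(0.90, 0.90, 0.50)
print(round(ppv_low, 3), round(ppv_high, 3))  # 0.083 0.9
```

With 1% prevalence, false positives outnumber true positives roughly 11 to 1, so most positive results are wrong even though the test itself has not changed.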
  18. Likelihood Ratio
  19. Meta-analysis
    • Pooling the data from several studies to perform an analysis is called meta-analysis.
    • Meta-analysis is a useful epidemiologic tool that is employed to increase the power (i.e., the ability to detect the difference in the outcome of interest between groups) of a study.
    • If the outcome is rare or the difference between the groups is small, it is difficult for even a single large-scale study to detect the difference and reach statistical significance.
  20. Lead-time bias
    It occurs when a screening test detects the disease at an earlier point in time (making it look like the survival rate increased), but the associated prognosis of the disease does not actually change.
  21. Standardized mortality ratio (SMR)
    • It represents an adjusted measure of the overall mortality, and is typically used in occupational epidemiology.
    • Mortality is typically adjusted for age (less commonly for gender, race and other factors). The standard population is used for comparison.
    • The SMR is calculated using the following formula:
    • SMR = observed number of deaths/expected number of deaths
    • The expected number of deaths is calculated using age-specific rates of death in the standard population (e.g. total US population). The observed number of deaths in the population of interest (e.g. miners) is then divided by the expected number to obtain the SMR.
    • An SMR of 1.75 indicates that the observed number of deaths in miners is 1.75 times the number expected from the death rates of the standard population (e.g. total US population), i.e. 75% higher than expected.
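The two-step calculation (expected deaths from age-specific standard rates, then observed/expected) can be sketched as below. All rates and counts are made-up numbers chosen so that the result matches the SMR of 1.75 discussed above; the function name is hypothetical.

```python
def standardized_mortality_ratio(observed_deaths, strata):
    """strata: list of (death rate in standard population, people in study group)."""
    # Deaths we would expect if the study group died at the standard rates
    expected_deaths = sum(rate * people for rate, people in strata)
    return observed_deaths / expected_deaths


# Hypothetical miners, by age band: (standard-population death rate, number of miners)
age_strata = [(0.002, 1000), (0.004, 2000), (0.02, 500)]
smr = standardized_mortality_ratio(35, age_strata)
print(round(smr, 2))  # 1.75
```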
  22. Recall bias
    • It should always be considered as a potential problem in case-control studies because it can cause an overestimation of the effect of an exposure.
    • Other typical scenarios that may illustrate recall bias are: women whose husbands are diagnosed with lung cancer tend to over-report the number of cigarettes smoked daily by the patient; patients with melanoma tend to over-report low tanning ability.
  23. Observer bias
    • It results in misclassification of the outcome (e.g., labeling a healthy baby as having an abnormality and vice versa).
    • It is explained by the knowledge of the exposure status of the patients by the physician who makes the diagnosis.
    • It is defined as a misclassification of events that can result from knowing the exposure status of a patient.
    • It can be effectively reduced by using the blinding technique.
  24. Relative risk
    • It represents a measure of outcome in follow-up studies.
    • It is the risk ratio which compares the risk among the exposed to the risk among the unexposed.
    • It is the risk of the exposed, divided by the risk of the unexposed.
  25. Attributable risk percent (ARP)
    • ARP is a measure of excess risk.
    • It estimates the proportion of the disease in exposed subjects that is attributed to exposure status.
    • Two approaches can be used to calculate ARP.
    • The first approach uses the following formula: ARP = (Risk in exposed - Risk in unexposed)/Risk in exposed.
    • The other approach uses relative risk (RR) to calculate ARP: ARP = (RR - 1)/RR.
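The two approaches are algebraically the same, which a quick check with hypothetical risks (10% in exposed, 4% in unexposed) illustrates. The function names are assumptions for this sketch.

```python
def arp_from_risks(risk_exposed, risk_unexposed):
    """ARP = (risk in exposed - risk in unexposed) / risk in exposed."""
    return (risk_exposed - risk_unexposed) / risk_exposed


def arp_from_relative_risk(relative_risk):
    """ARP = (RR - 1) / RR."""
    return (relative_risk - 1) / relative_risk


# Hypothetical risks: 10% in exposed, 4% in unexposed (RR = 2.5)
arp_a = arp_from_risks(0.10, 0.04)
arp_b = arp_from_relative_risk(0.10 / 0.04)
print(round(arp_a, 2), round(arp_b, 2))  # 0.6 0.6
```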
  26. Population attributable risk percent (PARP)
    • PARP estimates the proportion of the disease in the population that is attributed to the exposure.
    • Unlike attributable risk percent, PARP is the measure of excess risk in the total population, not only in exposed subjects.
    • PARP can be calculated using the following approach:
    • PARP = (Risk in the total population - Risk in unexposed)/Risk in the total population.
    • Knowing the risk of ischemic stroke in the exposed (smokers, 0.1%), the risk in the unexposed (nonsmokers, 0.05%), and the prevalence of exposure in the population (0.5, or 50%), it is possible to calculate the risk in the total population:
    • Risk in the total population = 0.1% x 0.5 + 0.05% x 0.5 = 0.075%
    • Now we can calculate PARP: PARP = (0.075 - 0.05)/0.075 = 0.33 (33%)
    • The following interpretation is valid in this case: 33% of the ischemic strokes observed in the population can be attributed to smoking.
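The worked stroke example can be reproduced in a few lines. The function name is hypothetical; risks are entered in percent, exactly as in the example.

```python
def population_attributable_risk_percent(risk_exposed, risk_unexposed, exposure_prevalence):
    """PARP = (risk in total population - risk in unexposed) / risk in total population."""
    # Total-population risk is a prevalence-weighted average of the two group risks
    risk_total = (
        risk_exposed * exposure_prevalence
        + risk_unexposed * (1 - exposure_prevalence)
    )
    return (risk_total - risk_unexposed) / risk_total


# Values from the example: 0.1% in exposed, 0.05% in unexposed, 50% exposed
parp = population_attributable_risk_percent(0.1, 0.05, 0.5)
print(round(parp, 2))  # 0.33
```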
  27. Confounding
    • It occurs when the exposure-disease relationship is obscured by the effect of an extraneous factor that is associated with both.
    • Randomization helps to remove the effects of both known and unknown confounders.
    • In observational studies (eg, cohort studies), the analysis can adjust only for confounders that were measured (eg, age and gender); additional confounding factors that were not taken into account lead to residual confounding.
  28. Power
    • Power is the probability to detect a difference in the outcome of interest between two groups, if such a difference exists.
    • A bigger sample size will increase the power of the study (the ability to detect the difference), making it more likely that a true difference reaches statistical significance.
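To show how sample size drives power, here is a rough sketch using the normal approximation for comparing two group means with a two-sided test. The function name, the assumed difference (5 units), and the assumed standard deviation (10 units) are all hypothetical, and the negligible contribution of the opposite tail is ignored.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(difference, sd, n_per_group, alpha=0.05):
    """Approximate power to detect `difference` between two equal-sized groups."""
    std_normal = NormalDist()
    z_critical = std_normal.inv_cdf(1 - alpha / 2)
    # Standard error of the difference between two group means
    standard_error = sd * sqrt(2 / n_per_group)
    return std_normal.cdf(abs(difference) / standard_error - z_critical)


# Hypothetical true difference of 5 units with a standard deviation of 10
power_small = approx_power(5, 10, 20)
power_large = approx_power(5, 10, 64)
print(round(power_small, 2), round(power_large, 2))
```

Under these assumptions, about 64 subjects per group gives roughly 80% power, while 20 per group leaves the study underpowered.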
  29. Number needed to treat (NNT)
    • It is the number of patients who need to be treated in order to prevent one additional bad outcome.
    • It is an important way to present the results of a study or assess the usefulness of treatment or prophylaxis. Sometimes, it is more convenient for practitioners to use NNT than measures of association (which represent the strength of association, not the practical aspects of treatment efficacy).
    • The calculation of NNT is easy: it is the inverse of absolute risk reduction (ARR), ie, NNT = 1/ARR, where ARR = risk in the control group - risk in the treatment group.
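A minimal sketch of the NNT calculation with hypothetical risks. By convention the result is rounded up to a whole patient; the small `round` step is only there to guard against floating-point noise, and the function name is an assumption.

```python
from math import ceil


def number_needed_to_treat(control_risk, treated_risk):
    """NNT = 1 / ARR, rounded up to the next whole patient."""
    absolute_risk_reduction = control_risk - treated_risk
    if absolute_risk_reduction <= 0:
        raise ValueError("treatment does not reduce risk; NNT is undefined")
    # round() removes floating-point noise before rounding up
    return ceil(round(1 / absolute_risk_reduction, 9))


# Hypothetical trial: bad outcome in 10% of controls vs 6% of treated patients
nnt = number_needed_to_treat(0.10, 0.06)
print(nnt)  # 25
```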
  30. Interpretation of Relative Risk
    • Hypothetically, assume that the risk of sudden death in the low-dose group was 1.0% and the risk in the high-dose group was 2.5%.
    • The relative risk (low-dose vs high-dose) would be 1.0/2.5 = 0.4. That means that the low-dose group has a risk of sudden death that is 40% of the risk in the high-dose group.
    • While assessing measures of effect, it is very important to recognize which groups are being compared, as the resulting interpretation can differ strikingly.
    • If the RR of an outcome in group A as compared to group B is X, then the RR in group B as compared to group A is 1/X.
  31. Sensitivity of a test
    • It is the capacity of the test to correctly identify patients who have the disease.
    • It is obtained by dividing the number of true positives by the number of people who have the disease (true positives + false negatives).
  32. Correlation coefficient (r)
    • The correlation coefficient shows the strength and direction (positive or negative) of linear association between 2 variables but does not imply causality.
    • Correlation coefficient values range from -1 to 1.
    • The sign indicates a positive or negative direction of linear association between 2 variables; the null value is 0, which denotes no association.
    • The closer the r value is to -1 or 1, the stronger the linear association.
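A short sketch of the Pearson correlation coefficient computed from first principles, with made-up data showing the two extremes. The function name is hypothetical.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x, mean_y = mean(xs), mean(ys)
    # Co-variation of x and y, scaled by the variation of each on its own
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / sqrt(sum_xx * sum_yy)


# Hypothetical data: perfectly increasing and perfectly decreasing relationships
r_positive = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
r_negative = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])
print(r_positive, r_negative)  # 1.0 -1.0
```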
  33. Survival analysis
    • It is used to analyze follow-up studies and clinical trials.
    • It accounts not only for the number of events in both groups, but also for the timing of the events.
  34. Ecologic (correlational) study
    • It is important to remember that ecologic studies give population-level information, not individual-level information.
    • Applying population-level information to an individual level may lead to a bias called ecologic fallacy.
  35. PPV and NPV
    • Positive predictive value (PPV) [PPV = TP / (TP + FP)] is the probability that a positive test correctly identifies an individual who actually has the disease.
    • A higher specificity increases the PPV of the test.
    • Negative predictive value (NPV) [NPV = TN / (TN + FN)] is the probability that a negative test correctly identifies an individual who does not have the disease.
    • A higher sensitivity increases the NPV of the test.
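The formulas above, plus sensitivity and specificity, can all be read off the same 2x2 table. This sketch uses hypothetical counts and a hypothetical function name.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Return sensitivity, specificity, PPV, and NPV from 2x2 table counts."""
    return {
        "sensitivity": tp / (tp + fn),  # diseased patients who test positive
        "specificity": tn / (tn + fp),  # healthy patients who test negative
        "ppv": tp / (tp + fp),          # positive tests that are correct
        "npv": tn / (tn + fn),          # negative tests that are correct
    }


# Hypothetical study: 100 diseased (90 detected) and 200 healthy (30 false alarms)
metrics = diagnostic_metrics(tp=90, fp=30, fn=10, tn=170)
print({name: round(value, 3) for name, value in metrics.items()})
```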
  36. Publication bias
    • It occurs when trials with significant positive results are published but trials with negative/null results are not.
    • Effect modification: occurs when an external variable differentially modifies (positively or negatively) the effect of a risk factor on the disease of interest in different groups.
  37. Confidence Interval and Power of a study
    • If the confidence interval of a ratio measure (e.g., relative risk or odds ratio) includes the null value of 1.0, the result is not statistically significant.
    • The power of a study is the ability to detect the difference between two groups (treated vs non-treated, exposed vs non-exposed).
    • Increasing the sample size increases the power of a study. As a result, the confidence interval of the point estimate (e.g., odds ratio) becomes tighter.
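The effect of sample size on the confidence interval can be sketched with the standard log-based (Woolf) approximation for an odds ratio. The counts are hypothetical, as is the function name; the second study is simply the first with every cell multiplied by 10.

```python
from math import exp, log, sqrt


def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Return (OR, lower, upper) for a 2x2 table using the log (Woolf) method."""
    odds_ratio = (a * d) / (b * c)
    # Standard error of ln(OR) shrinks as the cell counts grow
    se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = exp(log(odds_ratio) - z * se_log_or)
    upper = exp(log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical small study, and the same proportions with 10 times the subjects
small = odds_ratio_with_ci(20, 80, 10, 90)
large = odds_ratio_with_ci(200, 800, 100, 900)
print([round(v, 2) for v in small])
print([round(v, 2) for v in large])
```

Both studies give the same point estimate (OR = 2.25), but only the larger one has an interval that excludes 1.0.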
  38. How does one decide on which measure of central tendency to use?
    • For continuous data that are normally distributed, the mean is the best choice for central tendency.
    • For ordinal data or continuous data that are highly skewed, the median is the best choice for central tendency. Ordinal data can be ordered in a logical fashion (eg, freshman, sophomore, junior, and senior).
    • For nominal data, using the mode or describing the frequency or proportion of subjects in each category is best. Nominal data cannot be ordered in a logical fashion (eg, male and female).
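A small illustration with made-up data, using Python's standard `statistics` module: a single extreme value pulls the mean far from the typical patient, while the median is unaffected, and the mode works even for unordered categories.

```python
from statistics import mean, median, mode

# Hypothetical, highly skewed continuous data: length of hospital stay in days
stay_days = [1, 2, 2, 3, 100]
print(mean(stay_days))    # 21.6 (pulled upward by the single outlier)
print(median(stay_days))  # 2 (better summary of the typical stay)

# Hypothetical nominal data: blood type cannot be ordered, so report the mode
blood_types = ["A", "O", "O", "B", "O"]
print(mode(blood_types))  # O
```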
  39. Interpretation of Sensitivity and Specificity
    • A test is more sensitive when it is able to diagnose more patients who actually have the disease (more true positives and fewer false negatives). This means that fewer "sick" patients will have a negative test result.
    • A test is more specific than others when it is better able to identify patients who are actually healthy (more true negatives and fewer false positives). This means that fewer "healthy" patients will have a positive test result.
  40. Using Diagnostic Tests in Clinical Practice
    • Screening tests should be used in an orderly and efficient manner.
    • The more sensitive test (test with fewer false negatives) should be used first, because few patients with the disease will be missed and a negative result makes the disease unlikely.
    • A more specific test is then ordered. A positive result from the more specific test (test with fewer false positives) will mean that the patient is less likely to be healthy, and this result will confirm your diagnosis.
  41. Odds Ratio
Author: Ashik863
ID: 336433
Card Set: Step 3 Biostatistics I
Description: Mean, SD, Mode, Median