Statistics - PAP-575

  1. Definition of Statistics
    The theory, procedures, and methodology by which data are summarized.
  2. Descriptive Statistics
    serve the function of numerically describing a phenomenon, such as how many patients are diagnosed with lung cancer per year or the average cholesterol levels among patients in the control group and the patients in the treatment group.
  3. Inferential Statistics
    make predictions about a population based on a representative sample. (Outcome of treatment group is assumed to predict how the population will respond.)
  4. For a normal distribution with a mean of 500 and a standard deviation of 100, a score of 600 is better than what percent of the scores?
    About 84% (600 is one standard deviation above the mean: 50% + 34%)
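The percentile in this card can be checked with a minimal sketch using Python's standard library. The variable names are illustrative and not part of the card set.

```python
from statistics import NormalDist

# Distribution from the card: mean 500, standard deviation 100.
scores = NormalDist(mu=500, sigma=100)

# Fraction of scores that fall below 600 (one standard deviation above the mean).
fraction_below_600 = scores.cdf(600)
print(f"{fraction_below_600:.1%}")  # prints 84.1%
```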
  5. Percent under normal curve within:
    • ±1 st dev: 68%
    • ±2 st dev: 95%
    • ±3 st dev: 99.7%
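As a rough check of the 68/95/99.7 rule, the sketch below computes the area under a standard normal curve within k standard deviations of the mean. The helper name `area_within` is hypothetical.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

def area_within(k):
    """Area under the standard normal curve between -k and +k standard deviations."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, f"{area_within(k):.1%}")
```

The unrounded areas are roughly 68.3%, 95.4%, and 99.7%, which the card rounds to 68%, 95%, and 99.7%.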
  6. On a normal curve, where are the mean and median?
    Center of the curve (highest point)
  7. A linear regression with r² = 0.45 means:
    45% of the variation in the response variable can be accounted for by variation in the predictor variable
  8. In a positive correlation scatter plot, as x increases
    y increases
  9. Predicted value
    an estimate of the response variable based on the linear model
  10. Residual
    The difference between the observed and predicted values (observed minus predicted)
  11. Observed value
    The value of an individual response variable found in a sample.
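The regression terms in the cards above (r², predicted value, residual, observed value) can be illustrated with a small least-squares fit done by hand. This is only a sketch: the data are made up for illustration, and all names are hypothetical.

```python
from statistics import mean

# Hypothetical data, made up for illustration.
x = [1, 2, 3, 4, 5]          # predictor variable
y = [2, 4, 5, 4, 5]          # observed values of the response variable

x_bar, y_bar = mean(x), mean(y)

# Least-squares slope and intercept of the linear model.
slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
intercept = y_bar - slope * x_bar

# Predicted values come from the model; residuals are observed minus predicted.
predicted = [intercept + slope * xi for xi in x]
residuals = [yi - pi for yi, pi in zip(y, predicted)]

# r squared: share of the variation in y accounted for by the model.
ss_residual = sum(r ** 2 for r in residuals)
ss_total = sum((yi - y_bar) ** 2 for yi in y)
r_squared = 1 - ss_residual / ss_total
```

For these made-up numbers the slope is 0.6 and r² is 0.6, so 60% of the variation in y is accounted for by x.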
  12. In order to use a t-test, the sample data needs to be
    unimodal, symmetric distribution
  13. In a scatterplot with dots that seem to follow a linear pattern, the variables are
    correlated (linearly related)
  14. t-test
    a test for differences between groups, using continuous data, with an unknown population standard deviation
  15. z-test
    a test for differences between groups, using continuous data, with a known population standard deviation
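The practical difference between the two tests is which standard deviation goes into the standard error. The sketch below uses a one-sample comparison against a fixed mean for simplicity, rather than the between-group comparison the cards describe. The sample, `null_mean`, and `known_sigma` are all assumptions made up for illustration.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

# Hypothetical sample, made up for illustration.
sample = [102, 98, 110, 105, 95, 108, 101, 99]
null_mean = 100      # population mean under the null hypothesis (assumed)
known_sigma = 5      # assumed known population SD (needed for the z-test only)

n = len(sample)
sample_mean = mean(sample)

# z-test: the standard error uses the known population standard deviation.
z_stat = (sample_mean - null_mean) / (known_sigma / sqrt(n))
z_p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z_stat)))

# t-test: the standard error uses the sample standard deviation instead, and
# the statistic is compared with a t distribution on n - 1 degrees of freedom.
t_stat = (sample_mean - null_mean) / (stdev(sample) / sqrt(n))
degrees_of_freedom = n - 1
```

The standard library has no t distribution, so the sketch stops at the t statistic; its p-value would come from a t table or a statistics package.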
  16. ANOVA (F-test)
    • a test for differences between more than two groups whose group sizes are nearly equal. The data must be normally distributed and the groups must have similar variance. Similar to a t-test.
    • (ANalysis Of VAriance)
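A one-way ANOVA F statistic is the ratio of between-group to within-group variability. The sketch below computes it by hand for three small groups; the data are made up for illustration and the names are hypothetical.

```python
from statistics import mean

# Hypothetical measurements for three groups, made up for illustration.
groups = [
    [4, 5, 6],
    [6, 7, 8],
    [8, 9, 10],
]

all_values = [v for g in groups for v in g]
grand_mean = mean(all_values)
k = len(groups)              # number of groups
n_total = len(all_values)    # total number of observations

# Variation of the group means around the grand mean.
ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
# Variation of the observations around their own group mean.
ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)

ms_between = ss_between / (k - 1)
ms_within = ss_within / (n_total - k)
f_stat = ms_between / ms_within
```

Here F works out to 12 on (2, 6) degrees of freedom. Judging significance requires an F table or a statistics package.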
  17. Chi-squared test
    • a non-parametric test for differences or associations, used with categorical data (counts). When expected cell counts are small, Fisher's exact test is generally preferred.
    • (χ²)
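The chi-squared statistic compares observed counts with the counts expected if the row and column variables were unrelated. A minimal sketch for a 2x2 table follows; the counts are made up for illustration.

```python
# Hypothetical 2x2 table of counts, made up for illustration:
# rows = treatment / control, columns = cured / not cured.
observed = [
    [30, 10],
    [20, 20],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi_squared = 0.0
for i, row in enumerate(observed):
    for j, count in enumerate(row):
        # Expected count if rows and columns were independent.
        expected = row_totals[i] * col_totals[j] / grand_total
        chi_squared += (count - expected) ** 2 / expected

degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)
```

For this table the statistic is about 5.33 on 1 degree of freedom, which exceeds the 3.841 critical value for alpha = 0.05.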
  18. Median
    value that splits the distribution in half
  19. Mode
    most frequently occurring value
  20. Standard Deviation
    a measure of the typical distance of values in a sample from its mean; the square root of the variance
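These three summaries are available directly in Python's `statistics` module. The sample below is made up for illustration.

```python
from statistics import median, mode, stdev

values = [2, 3, 3, 5, 7, 10]  # hypothetical sample

middle = median(values)        # splits the sorted values in half
most_common = mode(values)     # most frequently occurring value
spread = stdev(values)         # sample standard deviation (n - 1 denominator)
```

For this sample the median is 4, the mode is 3, and the standard deviation is about 3.03.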
  21. Correlation
    type of test which checks for relationships between variables
  22. Reliability
    the degree to which a study's data are consistent
  23. Validity
    the degree to which a study's data are accurate
  24. Qualitative Data Types
    • Ordinal
    • Not measurable (observations)
    • Sex (Male/Female)
    • Positive/Negative test results
    • Likert Scale
    • Studied by non-parametric statistical testing (χ²)
  25. Likert Scale:
    • 1. Strongly Disagree
    • 2. Disagree
    • 3. Neutral
    • 4. Agree
    • 5. Strongly Agree
  26. Nominal
    • Categorical Data Types
    • (Nominal - urban, suburban, rural)
  27. Binary
    • Categorical Data Type
    • (Binary - dead/alive)
  28. Ordinal
    • Categorical Data Type
    • (Ordinal - mild, moderate, severe)
    • Likert Scale
  29. Categorical Data Types
    • (Nominal - urban, suburban, rural)
    • (Binary - dead/alive)
    • (Ordinal - mild, moderate, severe)
  30. Quantitative Data Types
    • Patient age
    • Weight
    • Cholesterol level
    • (Continuous Data types)
  31. Continuous Data Types
    • (numerical)
    • Lab values
    • Interval data
    • Ratio data (contains true zero)
    • Studied by parametric statistical testing (t-test)
  32. Variance
    The sum of the squared deviations from the mean divided by (n-1); its square root is the standard deviation
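The sample variance is the sum of squared deviations from the mean divided by (n-1), and taking its square root gives the standard deviation. The sketch below works this out by hand on made-up data; `variance` and `stdev` are imported only to compare against.

```python
from math import sqrt
from statistics import mean, stdev, variance

values = [2, 3, 3, 5, 7, 10]  # hypothetical sample
n = len(values)
sample_mean = mean(values)

# Sum of squared deviations from the mean.
sum_of_squares = sum((v - sample_mean) ** 2 for v in values)

sample_variance = sum_of_squares / (n - 1)   # variance
sample_sd = sqrt(sample_variance)            # standard deviation
```

Both hand-computed values should match `variance(values)` and `stdev(values)` from the standard library.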
  33. Simple random sample
    Selecting subjects such that each subject is chosen entirely by chance and every member of the population has an equal chance of being selected
  34. Stratified random sample
    Selecting subjects such that each subject is randomly chosen from categories of participants that meet predetermined criteria
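The two sampling schemes can be contrasted with a short sketch. The population, the `stratified_sample` helper, and the proportional allocation rule are all assumptions made for illustration; other allocation rules exist.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: (patient_id, sex) pairs, 60 female and 40 male.
population = [(i, "F") for i in range(60)] + [(i, "M") for i in range(60, 100)]

# Simple random sample: every member has the same chance of selection.
simple_sample = rng.sample(population, 10)

def stratified_sample(pop, key, total, rng):
    """Draw randomly within each stratum, in proportion to its share of pop."""
    strata = {}
    for member in pop:
        strata.setdefault(key(member), []).append(member)
    chosen = []
    for members in strata.values():
        share = round(total * len(members) / len(pop))
        chosen.extend(rng.sample(members, share))
    return chosen

stratified = stratified_sample(population, key=lambda m: m[1], total=10, rng=rng)
```

With these numbers the stratified sample always contains 6 females and 4 males, whereas the simple random sample's split varies by chance. Rounding the shares can make the total differ slightly from `total` for other population splits.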
  35. Skewed Distribution
    Median is the best summary of the center of the distribution
  36. Researchers want to know if regular exercise or good nutrition is more effective for weight loss. What is the best null hypothesis for this study?
    "Exercise and nutrition are equally effective"
  37. How many modes does a normal distribution have?
    One (it is unimodal)
  38. Alpha
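    • The significance level: the threshold probability, chosen before the study, below which a result is considered statistically significant
    • Probability of a Type 1 Error (when the null hypothesis is true)
    • The p-value is compared to alpha; if alpha is 0.05, p<0.05 is considered significant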
  39. Beta
    β is the probability of a Type II error (False Negative), often set at 0.20; power is defined as 1-β and is the probability that a false null hypothesis will be correctly rejected
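The relationship between β and power can be sketched for the simplest case, a one-sided, one-sample z-test. The function `z_test_power` and its parameters are hypothetical, and the formula is a normal approximation that assumes a known population standard deviation.

```python
from math import sqrt
from statistics import NormalDist

def z_test_power(effect_size, n, alpha=0.05):
    """Approximate power of a one-sided, one-sample z-test.

    effect_size is the true shift in the mean, in units of the population SD.
    """
    z_crit = NormalDist().inv_cdf(1 - alpha)
    # Probability that the test statistic lands beyond the critical value.
    return 1 - NormalDist().cdf(z_crit - effect_size * sqrt(n))

power = z_test_power(effect_size=0.5, n=25)
beta = 1 - power  # probability of a Type II error
```

With an effect of half a standard deviation and 25 subjects, power comes out close to 0.80, so β is close to the conventional 0.20. Increasing `n` raises the power.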
  40. Critical value
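    The value of the test statistic that corresponds to alpha; the null hypothesis is rejected when the test statistic is more extreme than the critical value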
  41. Test statistic
    The value that results from the statistical hypothesis test (e.g. t, F, χ²; often a score)
  42. p-value
    The probability, assuming the null hypothesis is true, of obtaining a test statistic at least as extreme as the one observed; graphically, the proportion of area under the curve beyond the test statistic. If p<0.05, the result is considered significant.
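For a z statistic, the p-value is the tail area of the standard normal curve beyond the observed value. The helper names below are hypothetical, and this is only a sketch for the normal case.

```python
from statistics import NormalDist

def one_tailed_p(z):
    """Area under the standard normal curve above z."""
    return 1 - NormalDist().cdf(z)

def two_tailed_p(z):
    """Area in both tails beyond |z|."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

p = two_tailed_p(1.96)  # very close to 0.05
```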
  43. Publication bias
    When an article is not published for some reason other than the quality of the article or the study
  44. Type 1 Error
    • Incorrectly rejecting the null hypothesis
    • (Saying there is a difference when in fact there is not.)
    • False Positive
  45. Type 2 Error
    • Incorrectly failing to reject the null hypothesis
    • False Negative
  46. Alpha halves
    The risk of Type 1 error in each tail when a test of equality is performed. If equality is tested, the researchers assume that either group could turn out superior, so alpha is divided in half between the two tails of the distribution (0.025 in each tail for an alpha of 0.05). A test of equality is therefore referred to as a two-tailed test.
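Halving alpha moves the critical value further from the mean. A small sketch for a z-test, with illustrative variable names:

```python
from statistics import NormalDist

alpha = 0.05

# One-tailed test: all of alpha sits in one tail.
one_tailed_critical = NormalDist().inv_cdf(1 - alpha)

# Two-tailed test: alpha is split, so each tail holds alpha / 2.
two_tailed_critical = NormalDist().inv_cdf(1 - alpha / 2)
```

These come out to about 1.645 and 1.960 respectively.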
  47. Confidence Interval
    The range of values, calculated from a sample, that is expected to contain the true population parameter with a given level of confidence (commonly 95%). A narrower CI indicates a more precise estimate.
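A confidence interval for a mean can be sketched as the point estimate plus or minus a multiple of the standard error. The function name and sample are hypothetical, and the sketch uses the normal approximation; for a sample this small, a t-based multiplier would give a somewhat wider interval.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(sample, confidence=0.95):
    """Normal-approximation confidence interval for the population mean."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)   # about 1.96 for 95%
    standard_error = stdev(sample) / sqrt(len(sample))
    centre = mean(sample)
    return centre - z * standard_error, centre + z * standard_error

sample = [102, 98, 110, 105, 95, 108, 101, 99]  # hypothetical data
lower, upper = mean_confidence_interval(sample)
```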
  48. Frequency
    A descriptive statistic showing the number of observations or the proportion of observations (if expressed as a percentage)
  49. A correlation coefficient (r) of 1.0 means:
    the variables have a perfect positive linear correlation
  50. A variable, other than the predictor variable, that influences the response variable is referred to as:
    a lurking variable
  51. Three questions for data studies
    • Is the study valid?
    • Are the results important?
    • Can the results help you?
  52. Positive skew
    • Right skewed - outliers are at the right (more positive) end of the graph
    • (Age of PA Class of 2016)
  53. Negative skew
    • Left skewed - outliers are at the left (more negative) end of the graph
    • (Age at death)
  54. Mann-Whitney
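    Mann-Whitney U test: a non-parametric test that compares two independent groups; an alternative to the t-test when the data are ordinal or not normally distributed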
  55. SEM
    Standard error of the mean: SEM = S/√n, where S is the sample standard deviation and n is the sample size. The larger the sample, the smaller the SEM (to halve the SEM, 4X the sample size is needed, given the square root relationship).
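The square root relationship is easy to see numerically. In this sketch the standard deviation of 10 and the sample sizes are made up for illustration.

```python
from math import sqrt

def standard_error(sample_sd, n):
    """Standard error of the mean: S divided by the square root of n."""
    return sample_sd / sqrt(n)

sem_25 = standard_error(10, 25)    # 10 / 5  = 2.0
sem_100 = standard_error(10, 100)  # 10 / 10 = 1.0, half of sem_25
```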
  56. (Mode < median < mean) describes what kind of distribution?
    Positively (right) skewed
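A small made-up sample with a long right tail shows the ordering. A few large values pull the mean upward while the median and mode stay put.

```python
from statistics import mean, median, mode

# Hypothetical right-skewed sample: most values are small, a few are large.
values = [1, 2, 2, 2, 3, 3, 4, 5, 9, 19]

sample_mode = mode(values)      # 2
sample_median = median(values)  # 3
sample_mean = mean(values)      # 5
```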
  57. A researcher notes that the 95% confidence interval is very wide in the first component of the clinical trial. Generally speaking, what would be the best way to halve the width of the confidence interval and maintain the same confidence level?
    quadruple the number of subjects in the study
  58. A test comparison of the effectiveness of a modified penicillin (94% cure rate) compared to (standard) penicillin G (82% cure rate) with respect to treating a pharyngeal abscess generates a p < 0.05. What does this mean?
    If the two drugs were truly equally effective (the null hypothesis), the probability of observing a difference in cure rates at least this large by chance alone would be less than 0.05.
  59. Statistical power addresses what
    Type II errors
  60. Kaplan-Meier curve
    • Survivor curve
    • plots the estimated proportion of subjects surviving over time, stepping down each time an event occurs
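The step-down shape of a Kaplan-Meier curve comes from the product-limit estimate: at each event time, the running survival probability is multiplied by the fraction of at-risk subjects who did not have the event. The function name and data below are hypothetical, and this is a minimal sketch rather than a full implementation (no confidence bands, no plotting).

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each time with an event.

    times:  follow-up time for each subject
    events: 1 if the event was observed at that time, 0 if censored
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        leaving = sum(1 for ti in times if ti == t)  # events plus censored
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve

# Hypothetical follow-up data, made up for illustration.
times = [1, 2, 2, 3, 4, 5]
events = [1, 1, 0, 1, 0, 1]
curve = kaplan_meier(times, events)
```

Censored subjects lower the number at risk without causing a step down, which is why the curve has no entry at time 4.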
  61. With random sampling, we are testing to see if our samples are representing:
    the same underlying population (null hypothesis) or two different underlying populations (alternative hypothesis)
  62. Bonferroni correction
    (alpha)/(number of comparisons) ex: With 200 comparisons, 0.05/200 = 0.00025 would be the adjusted threshold that each individual p value must fall below to reject the null hypothesis of no difference
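The arithmetic of the correction can be sketched as dividing the overall alpha by the number of comparisons and holding every individual p value to that stricter threshold. The function name and the p values are made up for illustration.

```python
def bonferroni_threshold(alpha, n_comparisons):
    """Per-comparison significance threshold under a Bonferroni correction."""
    return alpha / n_comparisons

threshold = bonferroni_threshold(0.05, 200)   # 0.00025

# Hypothetical p values from three of the comparisons.
p_values = [0.0001, 0.004, 0.03]
significant = [p < threshold for p in p_values]
```

Only the first p value remains significant after correction, although all three are below the uncorrected 0.05.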
  63. Point Estimate
    A single value (statistic) calculated from a sample (e.g. a mean or a proportion) and used to estimate a population parameter
  64. What value should not be included in a Confidence Interval for it to be considered significant?
    When evaluating a difference, a 95% CI that includes zero is non-significant; when evaluating a ratio, a 95% CI that includes one is non-significant.
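This rule reduces to checking whether the interval contains the "no effect" value: 0 for a difference, 1 for a ratio. The helper name and the intervals below are hypothetical.

```python
def ci_is_significant(lower, upper, null_value):
    """True when the confidence interval excludes the no-effect value."""
    return not (lower <= null_value <= upper)

# Difference in means with a 95% CI of (-1.2, 3.4): includes 0.
difference_significant = ci_is_significant(-1.2, 3.4, null_value=0)

# Ratio (e.g. relative risk) with a 95% CI of (1.1, 2.3): excludes 1.
ratio_significant = ci_is_significant(1.1, 2.3, null_value=1)
```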