
Definition of Statistics
The theory, procedures, and methodology by which data are summarized.

Descriptive Statistics
serve the function of numerically describing a phenomenon, such as how many patients are diagnosed with lung cancer per year or the average cholesterol levels among patients in the control group and the patients in the treatment group.

Inferential Statistics
make predictions about a population based on a representative sample. (Outcome of treatment group is assumed to predict how the population will respond.)

For a normal distribution, with a mean of 500 and standard deviation of 100. A score of 600 is what percent better than the rest of the scores.
84%

Percent under normal curve for:
1 st dev
2 st dev
3 st dev

On a normal curve, where is the mean and median?
Center of the curve (highest point)

A linear regression test r2 = 0.45 means:
45% of the variation in the response variable can be accounted for by variation in the predictor variable

In a positive correlation scatter plot, as x increases
y increases

Predictive value
an estimate of the response variable based on the linear model

Residual
The difference between the predicted and observed values

Observed value
The value of an individual response variable found in a sample.

In order to use a ttest, the sample data needs to be
unimodal, symmetric distribution

In a Scatterplot with dots that seem to follow a linear pattern the variables are
correlated

ttest
a test for differences between groups, using continuous data, with an unknown population standard deviation

ztest
a test for differences between groups, using continuous data, with a known population standard deviation

ANOVA (Ftest)
 a test for differences between more than two groups whose group sizes are nearly equal. The data must be normally distributed and the groups must have similar variance. Similar to a ttest.
 (ANalysis Of VAriance)

Chisquared test
 a test which can be used for differences or correlations used with categorical data, nonnormal distributions, or small sample sizes.
 (χ2)

Median
value that splits the distribution in half

Mode
most frequently occurring value of values

Standard Deviation
the average distance of values in a sample from its mean

Correlation
type of test which checks for relationships between variables

Reliability
the degree to which a study's data are consistent

Validity
the degree to which a study's data are accurate

Qualitative Data Types
 Ordinal
 Not measurable (observations)
 Sex (Male/Female)
 Positive/Negative test results
 Likert Scale
 Studied by Nonparametric statistical testing (Chi^{2})

Likert Scale:
 1. Strongly Disagree
 2. Disagree
 3. Neutral
 4. Agree
 5. Strongly Agree

Nominal
 Categorical Data Types
 (Nominal  urban, suburban, rural)

Binary
 Categorical Data Type
 (Binary  dead/alive)

Ordinal
 Categorical Data Type
 (Ordinal  mild, moderate, severe)
 Likert Scale

Categorical Data Types
 (Nominal  urban, suburban, rural)
 (Binary  dead/alive)
 (Ordinal  mild, moderate, severe)

Quantitative Data Types
 Patient age
 Weight
 Cholesterol level
 (Continuous Data types)

Continuous Data Types
 (numerical)
 Lab values
 Integer data
 Ratio data (contains true zero)
 Studied by parametric statistical testing (ttest)

Variance
The square root of the sum of the squared deviations divided by (n1)

Simple random sample
Selecting subjects such that each subject is chosen entirely by chance and every member of the population had an equal chance of being selected

Stratified random sample
Selecting subjects such that each subject is randomly chosen from categories or participants that meet predetermined criteria

Skewed Distribution
Median is the best summary of the center of the distribution

Researchers want to know if regular exercise or good nutrition is more effective for weight loss. What is the best null hypothesis for this study?
"Exercise and nutrition are equally effective"

How many modes does a normal distribution have?
1

Alpha
 The probability that the outcome happened by random chance alone
 Probability of a Type 1 Error
 This is compared to the pvalue, if p<0.05 it is considered significant

Beta
β, often set at 0.20; power defined as 1β; power indicates probability that a type II error (False Negative) will be rejected

Critical value
Numerical value associated with alpha

Test statistic
The value that results from the statistical hypothesis test (e.g. t, F, χ²; often scarce)

pvalue
The proportion of area under the normal curve beyond the test statistic, p<0.05 it is considered significant, the probability that the test statistic represents normal variability

Publication bias
When an article is not published for some reason other than the quality of the article or the study

Type 1 Error
 Incorrectly rejecting the null hypothesis
 (Saying there is a difference when in fact there is not.)
 False Positive

Type 2 Error
 Incorrectly failing to reject the null hypothesis
 False Negative

Alpha halves
The risk of Type 1 error when a test of equality is performed. If equality is tested, the researchers assume that either group could turn out superior, in which case alpha should be divided in half. When a test of equality is performed it is referred to as a twotailed test, meaning there are two sides.

Confidence Interval
The range of the expected values that contains the true population's parameter within a given percentage likelihood (95%). A narrow range of CI is best.

Frequency
A descriptive statistic showing the number of observations or the proportion of observations (if expressed as a percentage)

A correlation coefficient (r) of 1.0 means:
the variables are 100% positively correlated

A variable, other than the predictor variable, that influences the response variable is referred to as:
a lurking variable

Three questions for data studies
 Is the study valid?
 Are the results important?
 Can the results help you?

Positive skew
 Right skewed  outliers are to the right end (more positive) end of graph
 (Age of PA Class of 2016)

Negative skew
 Left skewed  outliers are the left end (more negative) end of the graph
 (Age at death)

MannWhitney
utest for nonparametric tests

SEM
SEM = S/√n, so that the more in your sample, the smaller the SEM is (to halve the SEM, need 4X the sample number, given the square root relationship).

(Mode < median < mean) describes what kind of distribution?
Positively (right) skewed

A researcher notes that the 95% confidence interval is very wide in the first component of the clinical trial. Generally speaking, what would be the best way to halve the width of the confidence interval and maintain that level of statistical significance?
quadruple the number of subjects in the study

A test comparison of the effectiveness of a modified penicillin (94% cure rate) compared to (standard) penicillin G (82% cure rate) with respect to treating a pharyngeal abscess generates a p < 0.05. What does this mean?
The probability is less than 0.05 that the new drug is only as effective as or less effective than regular penicillin.

Statistical power addresses what
Type II errors

KaplanMeier curve
 Survivor curve
 plot changes over time as they occur

With random sampling, we are testing to see if our samples are representing:
the same underlying population (null hypothesis) or two different underlying populations (alternative hypothesis)

Bonferroni correction
(p value of interest)/(n of observations) ex: With 200 observations, 0.05/200= 0.00025 would be the “equivalent” p value that would suggest the null hypothesis or no difference

Point Estimate
A single value (statistic) that is measured from a sample (e.g. a mean or a proportion)

What value should not be included in a Confidence Interval for it to be considered significant?
When evaluating a difference a 95% CI including zero is nonsignificant, when evaluating a ratio the 95% CI including one is nonsignificant.

