
Nominal:
 CATEGORY
 NO ORDER
 categorical measures that do not have an order
 e.g. color (red/blue/green/etc);
 types of teeth (molars/incisors/premolars/canine)

Ordinal:
 CATEGORY
 ORDER of intensity
 categorical measures that have an order of intensity/degree
 e.g. stage of oral cancer (stages I to IV); curvature of dental
 root (straight/slight curvature/pronounced curvature)

Interval:
 CONTINUOUS
 NO TRUE ZERO
 ex: dates

Ratio:
 CONTINUOUS
 TRUE ZERO
 ex. perio pocket depth

Continuous Measure
 Interval: measures that do not have a true zero; the relative difference is the key
 e.g. temperature; dates;
 Ratio: measures that have a true zero
 e.g. depth of periodontal pocket; size of oral lesion.

Range
 Distance between the largest and the smallest observation
 Simplest measure of variability
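The range can be computed in one line; a minimal sketch with hypothetical pocket-depth data:

```python
# Range: the simplest measure of variability (largest minus smallest observation).
pocket_depths_mm = [2, 3, 3, 4, 5, 7]  # hypothetical perio pocket depths
data_range = max(pocket_depths_mm) - min(pocket_depths_mm)
print(data_range)  # 5
```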

Percentile
 Point below which a specified percent of observations lie
 Percentile of an observation x is given by:
[(# of obs less than x) + 0.5] / (total # of obs in data) × 100
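The percentile formula above translates directly into code; a sketch with made-up data:

```python
def percentile_of(x, data):
    """Percentile of observation x: [(# of obs less than x) + 0.5] / n * 100."""
    below = sum(1 for v in data if v < x)
    return (below + 0.5) / len(data) * 100

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(percentile_of(7, data))  # (6 + 0.5) / 10 * 100 = 65.0
```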

Central Location
 The value on which a distribution tends to
 center
 Mean: the arithmetic average
 Median: the middle item of the data set
 Mode: the most frequent value
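All three measures of central location are in Python's standard library `statistics` module; a sketch with hypothetical values:

```python
import statistics

values = [2, 3, 3, 4, 10]  # hypothetical measurements
mean = statistics.mean(values)      # arithmetic average
median = statistics.median(values)  # middle item of the sorted data
mode = statistics.mode(values)      # most frequent value
```

Note how the single large value 10 pulls the mean (4.4) above the median (3): the median is more robust to outliers.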

Confidence Interval (CI)
 Measures the likelihood that the true value of a population parameter (e.g., mean) is within the margin of error of the sample estimate.
 95% CI is the range of values that would cover the true population parameter 95% of the time.
 95% CI for a normal distribution: will “capture” µ 95% of the time.
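A normal-approximation 95% CI for a mean is the estimate ± z × SE; a sketch with hypothetical sample values (assumes a large-enough sample for the normal approximation):

```python
import statistics

sample = [5.1, 4.8, 5.3, 5.0, 4.9, 5.2, 5.1, 4.7]  # hypothetical measurements
n = len(sample)
mean = statistics.mean(sample)
se = statistics.stdev(sample) / n ** 0.5        # standard error of the mean
z = statistics.NormalDist().inv_cdf(0.975)      # ~1.96 for a 95% CI
lower, upper = mean - z * se, mean + z * se     # interval that "captures" mu 95% of the time
```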

Descriptive Statistics
 Dispersion
 Variance: measures the variation
 Standard Deviation (SD): the square root of the
 variance, denoted by σ; has the same unit as x
 Standard Error (SE): an estimate of the precision of parameter estimates. It measures the variability of an estimate due to sampling.
 Kurtosis: characterizes the relative peakedness or
 flatness of a distribution (−2 to infinity)
 Skewness: measures the asymmetry of a distribution (−3 to 3)
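The dispersion measures can be computed from moments of the data; a sketch with hypothetical values (the moment-based, population-form skewness and excess kurtosis shown here are one common convention among several):

```python
import statistics

x = [2.0, 3.0, 3.0, 4.0, 8.0]  # hypothetical data
n = len(x)
mean = statistics.mean(x)
var = statistics.variance(x)   # sample variance (n - 1 denominator)
sd = statistics.stdev(x)       # square root of variance; same units as x
se = sd / n ** 0.5             # precision of the sample mean

# Central moments for shape statistics:
m2 = sum((v - mean) ** 2 for v in x) / n
m3 = sum((v - mean) ** 3 for v in x) / n
m4 = sum((v - mean) ** 4 for v in x) / n
skewness = m3 / m2 ** 1.5           # 0 for a symmetric distribution
excess_kurtosis = m4 / m2 ** 2 - 3  # 0 for a normal distribution
```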

Frequency
 Most commonly used method to describe categorical measures
 Consists of categories, the number of observations,
 and the percentage corresponding to each category.
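A frequency table for a categorical measure is a count plus a percentage per category; a sketch using hypothetical tooth-type data:

```python
from collections import Counter

# Hypothetical categorical data: types of teeth observed
teeth = ["molar", "incisor", "molar", "canine", "premolar", "molar", "incisor"]
counts = Counter(teeth)
total = len(teeth)
for category, n in counts.most_common():
    print(f"{category:<10} {n:>3} {100 * n / total:6.1f}%")
```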


Hypothesis Testing
 Goal: judge the evidence for a hypothesis
 Steps for hypothesis testing
 ♦ Stating the null & alternative hypothesis
 ♦ Choosing an appropriate statistical test
 ♦ Conducting the statistical test to obtain the p-value
 ♦ Comparing the p-value against a fixed cutoff for statistical significance – α (usually 0.05) – and making a conclusion

Type I error
 REJECT TRUE NULL
 Reject a null hypothesis when it is true: we have
 committed a Type I error (α error; typically 0.05).

Type II error
 ACCEPT FALSE NULL
 Accept a null hypothesis when it is false: we have
 committed a Type II error (β error; typically 0.2).

P-value of a test
Probability that the test statistic assumes a value as extreme as, or more extreme than, that observed, given that the null hypothesis is true.

Power
 (1 − β): the probability that you reject the
 null hypothesis, given that the alternative hypothesis
 is true.

Parametric test
 Statistical procedures based on distribution assumptions
 t-test
 Analysis of Variance (ANOVA)
 Chi-Square test

Nonparametric test
 Statistical procedures not based on distribution assumptions
 Sign test
 Kruskal-Wallis test (nonparametric ANOVA)

Two-group t-test:
Compare whether two independent groups have the same mean of a normally distributed variable with unknown variance.
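The pooled-variance t statistic can be computed by hand; a minimal sketch with hypothetical group data (in practice you would compare |t| to the critical value of the t distribution with nx + ny − 2 degrees of freedom, or use a library routine):

```python
import statistics

def two_sample_t(x, y):
    """Pooled-variance t statistic for two independent samples."""
    nx, ny = len(x), len(y)
    # Pooled estimate of the common (unknown) variance:
    sp2 = ((nx - 1) * statistics.variance(x)
           + (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    return (statistics.mean(x) - statistics.mean(y)) / (sp2 * (1 / nx + 1 / ny)) ** 0.5

group_a = [5.1, 4.9, 5.4, 5.0, 5.2]  # hypothetical measurements, group A
group_b = [4.2, 4.5, 4.1, 4.4, 4.3]  # hypothetical measurements, group B
t = two_sample_t(group_a, group_b)   # df = len(group_a) + len(group_b) - 2
```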

ANOVA
 Tests equality of means among multiple groups
 Uses the F-test. It is a generalization of the t-test and equivalent to the t-test when comparing two groups.
 Data will need to satisfy several assumptions (e.g., the outcome has a normal distribution; equal variance for each group; the data are independent between and within groups.)
 Example
 Null: means of all groups are equal
 F-stat exceeds the critical value at the 5% level, with a p-value of 0.000 < 0.05
 not all means of the three groups are the same.
 Pairwise comparison of means
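The one-way ANOVA F statistic is the ratio of between-group to within-group mean squares; a sketch computing it by hand for three hypothetical groups:

```python
import statistics

def one_way_f(groups):
    """One-way ANOVA F statistic: between-group MS / within-group MS."""
    k = len(groups)                       # number of groups
    n = sum(len(g) for g in groups)       # total number of observations
    grand = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (statistics.mean(g) - grand) ** 2 for g in groups)
    ss_within = sum(sum((v - statistics.mean(g)) ** 2 for v in g) for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))  # df = (k-1, n-k)

# Hypothetical data: three groups with clearly different means
f = one_way_f([[5.1, 4.9, 5.0], [4.2, 4.4, 4.3], [3.1, 3.0, 3.2]])
```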

ChiSquare Test
 Compare observed data with the data we would expect to
 obtain according to a specific hypothesis.
 Steps of the χ² goodness-of-fit test:
 Divide the data into c categories;
 Estimate k parameters of the probability model with your
 hypothesis;
 Compute observed and corresponding expected cell
 frequencies;
 Test Statistic:
 1. Create 6 intervals (categories): X ≤16.25, 16.25 < X ≤ 17.20, 17.20 <
 X ≤ 18.15, 18.15 < X ≤ 19.10, 19.10 < X ≤ 20.05, and 20.05 < X.
 2. Null hypothesis H0: the underlying distribution from which the
 measurements came is N(18.37, 1.92), i.e. the normal distribution
 with mean 18.37, variance 1.92.
 3. Calculate the observed frequency and expected frequency.
 The p-value is 0.1072, so we accept the null hypothesis.
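The χ² statistic itself is just the sum of (observed − expected)²/expected over the categories; a sketch with hypothetical counts for six intervals (the counts below are illustrative, not the ones from the worked example):

```python
def chi_square_stat(observed, expected):
    """Goodness-of-fit statistic: sum of (O - E)^2 / E over categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [8, 14, 28, 27, 15, 8]   # hypothetical observed counts in 6 intervals
expected = [10, 15, 25, 25, 15, 10] # expected counts under the fitted model
x2 = chi_square_stat(observed, expected)
# Compare x2 to the chi-square distribution with df = c - 1 - k
# (c categories, k parameters estimated from the data).
```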

Sign test
 Used to test if there is a difference between paired
 samples.
 Independent pairs of sample data are collected:
 (x1, y1), (x2, y2), …; the differences of the pairs are
 calculated, and zeros are ignored.
 The null hypothesis is: equal numbers of positive
 and negative differences.
 A one-sided sign test has p-value 0.1719, indicating that it is not significant at the 5% level: no statistically significant difference in the number of patients seen between the two offices.
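Under the null hypothesis the signs follow Binomial(n, 0.5), so the one-sided p-value is a binomial tail probability. A sketch; the 7-vs-3 split of positive and negative differences is a hypothetical one chosen because it reproduces the 0.1719 p-value quoted above:

```python
from math import comb

def sign_test_p(pos, neg):
    """One-sided sign test: P(X >= larger count) under Binomial(n, 0.5)."""
    n = pos + neg            # zeros have already been dropped
    k = max(pos, neg)
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n

p = sign_test_p(pos=7, neg=3)  # hypothetical split of paired differences
print(round(p, 4))  # 0.1719
```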

KruskalWallis (KW) Test
 Based on the rank of observations to compare the distribution of a continuous variable among more than two groups—nonparametric ANOVA.
 The only assumption required for the population distributions is that they are independent and continuous.
 Many software packages provide this test (e.g., kwallis in Stata).
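The KW statistic ranks the pooled observations and compares rank sums across groups; a minimal sketch for the no-ties case with hypothetical data (software routines additionally correct for tied ranks):

```python
def kruskal_h(groups):
    """Kruskal-Wallis H statistic (assumes no tied values)."""
    pooled = sorted(v for g in groups for v in g)
    rank = {v: i + 1 for i, v in enumerate(pooled)}  # rank 1 = smallest
    n = len(pooled)
    h = 12 / (n * (n + 1)) * sum(
        sum(rank[v] for v in g) ** 2 / len(g) for g in groups
    ) - 3 * (n + 1)
    return h  # compare to chi-square with (# groups - 1) df

# Hypothetical data: three clearly separated groups
h = kruskal_h([[1.2, 2.3, 3.1], [4.5, 5.6, 6.7], [7.8, 8.9, 9.1]])
```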

Analysis of Covariance (ANCOVA)
 Continuous outcome
 Merger of ANOVA and Regression

Logistic Regression
 binary outcome
 Simple: single predictor
 Multiple: two or more predictors
 Dependent variable is binary
 Logistic function is nonlinear in terms of the probability of event

Linear Regression
 continuous outcome
 Simple: single predictor
 Multiple: two or more predictors
 Terminology pairs (outcome variable vs. predictor variable):
 dependent vs. independent
 predicted vs. predictors
 response vs. explanatory
 outcome vs. covariates

Logistic Regression
 The dependent variable is binary (e.g. whether inflammation of the gingiva presents.)
 Logistic function is nonlinear in terms of the probability of event.
 The parameter estimates can be expressed as odds ratios, which describe the relationship between exposure and
 outcome, controlling for other factors.
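The logistic function and the odds-ratio interpretation of a coefficient can be shown directly; a sketch with hypothetical fitted coefficients (b0 and b1 are made-up values, not output from a real model):

```python
from math import exp

def logistic(b0, b1, x):
    """P(event) = 1 / (1 + e^-(b0 + b1*x)): nonlinear in the probability."""
    return 1 / (1 + exp(-(b0 + b1 * x)))

b0, b1 = -2.0, 0.8        # hypothetical coefficients; b1 is a log odds ratio
odds_ratio = exp(b1)      # odds multiply by this factor per one-unit increase in x

def odds(p):
    return p / (1 - p)
```

The defining property: increasing x by one unit multiplies the odds of the event by exp(b1), regardless of the starting x.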

Analysis of Covariance (ANCOVA)
 A method for comparing mean values of the outcome between groups when adjusting for covariates (e.g., compare mean LOA across groups, adjusting for age)
 The response is continuous and the covariates can be both continuous and categorical
 An extension of ANOVA or a combination of ANOVA and linear regression

Statistical significance
 Since statistical significance is the desired outcome of a study, planning to have enough sample size is of prime importance.
 – Due to limitations of resources and availability of subjects, we can often only get a limited sample size.

Sample Size & Statistical Power
 Five key factors
 1. Sample size: the minimum number of unique subjects in your data required to detect a certain difference
 2. Effect size: the difference between parameters to be tested (e.g., difference in LOA between groups)
 3. Significance level (Type I error): the probability that we reject a null hypothesis when it is true (commonly 0.05)
 4. Power: the probability of rejecting a null hypothesis when it is false (equals 1 − Type II error; commonly 0.8)
 5. Variability: variation of the outcome measure
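The five factors combine in the standard normal-approximation sample-size formula for comparing two means; a sketch (assumes the SD is known and equal in both groups, and uses the usual z-based approximation rather than the slightly larger t-based answer):

```python
from statistics import NormalDist

def n_per_group(effect, sd, alpha=0.05, power=0.8):
    """Subjects per group to detect `effect` with the given alpha and power:
    n = 2 * (z_{1-alpha/2} + z_{1-beta})^2 * sd^2 / effect^2."""
    z = NormalDist().inv_cdf
    return 2 * (z(1 - alpha / 2) + z(power)) ** 2 * sd ** 2 / effect ** 2

# Hypothetical planning values: detect a difference of 1.0 with SD 2.0
n = n_per_group(effect=1.0, sd=2.0)  # round up to the next whole subject
```

The formula makes the trade-offs explicit: halving the effect size quadruples n, while higher power or smaller alpha increase the z terms and therefore n.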

