
Types of statistics
 Descriptive stats (summarise the data and represent it in an easily understandable way: graphs, charts, tables, average, range)
 Inferential stats (sample vs. population, experimental design, hypothesis testing)

Sample vs population
 Samples should be:
 representative and unbiased (males & females, all ages)
 every type of subject should have the same chance of being included
 (random sampling)

Types of Data
 Categorical
 continuous
 discrete

Categorical Data
 nominal (unordered) eg dog breeds
 ordinal (ordered) eg body condition score of cattle

Continuous data
 Any value within a range is theoretically possible (not just whole numbers)
 eg: weight, height

Discrete data
 can only take integer values (whole numbers)
 eg: number of piglets in a litter

Bar Charts
Good for showing frequencies (categorical or discrete data)

Scatterplot
Used to show the relationship between two continuous variables (correlation and regression)

Descriptive Statistics
 Data is collected so that we can obtain INFORMATION about a certain topic
 when only a few observations are made, it might be easy to see a potential relationship
 as more data is collected it's more difficult to obtain an overall picture

Histogram
Shows the distribution of quantitative data

Stem and leaf
(not often used) shows the shape of the distribution while keeping the individual values, so the number of observations above or below the median can be counted

Boxplot
 single, or side by side to compare groups
 line inside the box represents the median value; the box spans the interquartile range

Numerical measures
 numerical measures are used to summarise the position, spread and shape of a data distribution
 used to describe the data so that we have a general idea of the data that we have and the population that it might have come from or represent

Measures of central tendency
 "averages" or the middle of data
 Mean= sum of all observations/number of observations
 median= middle observation (half smaller, half larger)
 mode= value that occurs most frequently
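A minimal Python sketch of the three averages, using the standard-library `statistics` module on a made-up list of litter sizes (illustrative numbers only, not real data):

```python
import statistics

# Made-up litter sizes, for illustration only
litter_sizes = [8, 10, 10, 11, 12, 12, 12, 14]

mean = sum(litter_sizes) / len(litter_sizes)  # sum of observations / number of observations
median = statistics.median(litter_sizes)      # middle value; with an even n, the average of the two middle values
mode = statistics.mode(litter_sizes)          # most frequent value

print(mean, median, mode)  # 11.125 11.5 12
```

Here the three averages differ slightly because the data are not perfectly symmetrical.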

Measures of spread
 Range: difference between the largest and smallest observation
 Standard deviation: measure of spread about the mean (SD = square root of variance)
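A short sketch of the two spread measures on made-up body weights, assuming the usual sample variance (dividing by n - 1):

```python
import math
import statistics

# Made-up body weights (kg), for illustration only
weights = [4.0, 5.0, 6.0, 7.0, 8.0]

data_range = max(weights) - min(weights)  # largest - smallest observation
variance = statistics.variance(weights)   # sample variance (divides by n - 1)
sd = math.sqrt(variance)                  # SD = square root of variance

print(data_range, variance, round(sd, 3))  # 4.0 2.5 1.581
```

`statistics.stdev(weights)` gives the same SD in one call; `statistics.pvariance` divides by n instead of n - 1.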

Shape (skewness/kurtosis/etc)
 Various shape statistics exist:
 Skewness (is it symmetrical or not)
 Kurtosis (how peaked the distribution is and how heavy its tails are)
 (and more)
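One common way to put numbers on shape is the moment-based skewness and excess kurtosis. This is a sketch of that one definition (software packages often apply small-sample adjustments, so their values can differ slightly); `shape_stats` is a hypothetical helper name:

```python
def shape_stats(xs):
    """Moment-based skewness and excess kurtosis (one of several common definitions)."""
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n  # 2nd central moment (population variance)
    m3 = sum((x - m) ** 3 for x in xs) / n  # 3rd central moment
    m4 = sum((x - m) ** 4 for x in xs) / n  # 4th central moment
    skewness = m3 / m2 ** 1.5               # 0 for a symmetrical sample
    excess_kurtosis = m4 / m2 ** 2 - 3      # 0 for a normal distribution
    return skewness, excess_kurtosis

print(shape_stats([1, 2, 3, 4, 5]))         # symmetrical: skewness 0.0
print(shape_stats([1, 1, 1, 2, 2, 3, 10]))  # long right tail: positive skewness
```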

Probability Theory
 Generally very poorly understood
 describes outcomes that depend on chance
 eg rolling a dice, tossing a coin, being infected with a disease, pups in a litter, etc.
 can almost never predict an outcome w/ total accuracy, but can describe what MIGHT happen, or the probability of different outcomes
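To illustrate "can't predict one outcome, but can describe the probability", here is a small sketch with a fair die: the exact probability of at least one six in four rolls, then a seeded simulation showing the long-run proportion settling near it:

```python
import random
from fractions import Fraction

# Exact probability of at least one six in four rolls of a fair die
p_no_six = Fraction(5, 6) ** 4
p_at_least_one_six = 1 - p_no_six
print(p_at_least_one_six, float(p_at_least_one_six))  # 671/1296, about 0.518

# Simulation: any single set of four rolls is unpredictable,
# but the long-run proportion settles near the exact probability
random.seed(1)  # fixed seed so the run is reproducible
trials = 100_000
hits = sum(
    any(random.randint(1, 6) == 6 for _ in range(4))
    for _ in range(trials)
)
print(hits / trials)
```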

Probability distributions
 The probability of an outcome given that we know what happens in the 'system' (variability, predict the future)
 What we believe about the 'system' given that we know the outcome (uncertainty, estimating the true population parameters)

Normal distribution
 (Gaussian, Bell curve)
 Described by mean, sd
 data can be any continuous value
 symmetrical distribution
 mean=median=mode
 ex: birth weights, heights, live weight gains, body temperatures, serum biochemistry parameters
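A sketch using the standard library's `statistics.NormalDist`, with made-up parameters for calf birth weight (mean 40 kg, SD 5 kg). It shows that mean, median and mode coincide and that about 95% of values fall within mean ± 1.96 SD:

```python
from statistics import NormalDist

# Hypothetical calf birth weights: mean 40 kg, SD 5 kg (made-up parameters)
birth_weight = NormalDist(mu=40, sigma=5)

# Symmetrical: mean, median and mode coincide
print(birth_weight.mean, birth_weight.median, birth_weight.mode)

# About 95% of values lie within mean +/- 1.96 SD
z = NormalDist().inv_cdf(0.975)  # about 1.96 for the standard normal
lower = birth_weight.mean - z * birth_weight.stdev
upper = birth_weight.mean + z * birth_weight.stdev
print(round(birth_weight.cdf(upper) - birth_weight.cdf(lower), 3))  # 0.95
```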

Poisson distribution
 used for count data (integer)
 described by mean only
 asymmetrical (right-skewed) distribution, especially when the mean is small
 mean, median and mode are generally not equal
 examples: pups in a litter, cars on the street, earthquakes in a year
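A sketch of the Poisson probability formula, P(X = k) = mean^k × e^(-mean) / k!, with a made-up mean of 2.8 events per year. `poisson_pmf` is a hypothetical helper name; the median is taken as the smallest count whose cumulative probability reaches 0.5:

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson count with mean lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)

lam = 2.8       # made-up mean, e.g. earthquakes per year
ks = range(30)  # probabilities beyond 29 are negligible for this mean
probs = [poisson_pmf(k, lam) for k in ks]

mode = max(ks, key=lambda k: probs[k])  # most likely count

cumulative = 0.0
for k in ks:  # median: smallest k with P(X <= k) >= 0.5
    cumulative += probs[k]
    if cumulative >= 0.5:
        median = k
        break

mean = sum(k * p for k, p in zip(ks, probs))
print(round(mean, 3), median, mode)  # 2.8 3 2
```

With this mean, the mean (2.8), median (3) and mode (2) all differ.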

Binomial distribution
 used for binary outcomes (yes/no, pass/fail, m/f, dies/survives)
 described by the probability of a success at each trial, and the number of trials
 ex: number of heads out of 10 tosses of a coin, number of female calves from sexed semen, number of you who will pass exams
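A sketch of the binomial formula, P(k successes) = C(n, k) × p^k × (1 - p)^(n - k), for the coin example (10 tosses, p = 0.5). `binomial_pmf` is a hypothetical helper name:

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials, each with success probability p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p = 10, 0.5  # 10 tosses of a fair coin
print(binomial_pmf(5, n, p))  # 0.24609375  (= 252/1024)

# P(8 or more heads out of 10)
print(sum(binomial_pmf(k, n, p) for k in range(8, n + 1)))  # 0.0546875  (= 56/1024)
```

The expected number of successes is n × p, here 5 heads.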

Hypothesis testing
 used by research scientists, eg:
 does drug A kill mice faster than drug B?
 do a greater proportion of smokers than nonsmokers get lung cancer?
 (also relevant to vets)
 5 Steps!!

5 steps to Hypothesis testing
 Think of a question you want to ask
 put the question into a testable format
 collect the data
 apply the correct statistical test
 interpret the results of the test

Generating a hypothesis to test:
 what do we want to find?
 how many groups are we comparing?
 typically a simple question with a yes or no answer
 ex: are these 2 groups of calves growing at the same rate? Did pyoderma cases given Synulox recover at the same speed as those given ampicillin?

The 'NULL' hypothesis (and alternative hypothesis)
 The baseline belief that there is NO difference between groups/drugs (denoted H_{0})
 The alternative hypothesis:
 the opposite of the baseline belief: there IS a difference between groups/drugs (denoted H_{1})

Hypothesis testing
 Goal is to provide evidence that the 'null' hypothesis is WRONG, i.e. that there is a difference between groups/drugs
 BUT! we have to account for the effects of outcomes being uncertain
 i.e. show that the difference between the groups/drugs is more than would be expected by chance alone

Rejecting the Null hypothesis
 It is always possible that the difference between 2 sets of observations is entirely chance! (that the pops. are really the same even though the samples look diff.)
 this becomes less and less likely as the magnitude of the difference increases and as the number of observations increases

Confidence intervals
 use confidence intervals to look at the data in a more formal way
 do the confidence intervals for the parameter of interest in each group overlap? (a 95% CI leaves 2.5% above and 2.5% below)
 The more data we have, the smaller the confidence intervals become
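A sketch showing why the interval shrinks with more data: the half-width is 1.96 × SD / √n, so quadrupling n halves the width. It uses the normal multiplier (a large-sample approximation; small samples should use the slightly larger t multiplier), made-up numbers, and a hypothetical helper name `approx_ci95`:

```python
import math
from statistics import NormalDist

def approx_ci95(mean, sd, n):
    """Approximate 95% CI for a population mean: mean +/- 1.96 * SD / sqrt(n).
    Normal multiplier only; with small samples a t multiplier is more appropriate."""
    z = NormalDist().inv_cdf(0.975)  # about 1.96: leaves 2.5% in each tail
    se = sd / math.sqrt(n)           # standard error of the mean
    return mean - z * se, mean + z * se

# Same made-up mean and SD (e.g. daily gain 250 g, SD 20 g), increasing sample size
for n in (10, 40, 160):
    lo, hi = approx_ci95(250.0, 20.0, n)
    print(n, round(lo, 1), round(hi, 1), "width:", round(hi - lo, 1))
```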

Rejecting the null hypothesis with Confidence intervals
 the amount of overlap in confidence intervals is related to the P value (the probability of seeing a difference at least this large if H_{0} were true)
 if there is LITTLE overlap, we reject H_{0}
 how little is set by the significance level (usually P < 0.05)
 this makes no comment at all about the magnitude (or biological impact) of the difference

Failing to reject the null hypothesis
 if there is not enough evidence that the groups are different, we cannot reject the null hypothesis
 (this does not necessarily mean that there really is no difference, only that we couldn't find any difference in the samples obtained)

Statistical significance DOES NOT EQUAL biological relevance
Remember this!

Comparing means
Using a t-test

Comparison of means
can compare one mean (with a fixed number), two means, or more than two means

Compare ONE mean (with a fixed number)
 Confidence interval approach
 calculate the 95% confidence interval from the sample mean, size and SD; does the interval include the fixed number?
 Null: population mean = XXX
 Alt: population mean does not = XXX

Significance Testing
 looks at how far the observed sample mean is from the hypothesised population mean (the fixed number)
 if the P value is lower than 0.05 then it is significant (reject null); if greater than 0.05 then it is NOT significant and we fail to reject the null
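A sketch of both one-mean approaches on made-up rectal temperatures from 10 dogs, compared with a fixed value of 38.5 °C. The critical value 2.262 (two-sided 5%, 9 df) is hard-coded from t tables as an assumption of this sketch; `scipy.stats.ttest_1samp` would give the P value directly:

```python
import math
import statistics

# Made-up rectal temperatures (°C) of 10 dogs; fixed value to compare with: 38.5
temps = [38.9, 39.1, 38.7, 39.0, 38.8, 39.2, 38.6, 39.0, 38.9, 39.3]
mu0 = 38.5

n = len(temps)
mean = statistics.mean(temps)
se = statistics.stdev(temps) / math.sqrt(n)  # standard error of the mean

t = (mean - mu0) / se  # how many standard errors the sample mean is from mu0
t_crit = 2.262         # two-sided 5% critical value of t with n - 1 = 9 df (from t tables)

ci = (mean - t_crit * se, mean + t_crit * se)  # 95% CI for the population mean
print(round(t, 2), [round(x, 2) for x in ci])
print("reject H0" if abs(t) > t_crit else "fail to reject H0")
```

With these numbers t is about 6.5, well beyond 2.262, and the 95% CI (about 38.8 to 39.1) does not include 38.5, so both approaches reject H0.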

comparison of TWO means (with each other)
 t-test
 95% CI for difference between means
 take mean of each group
 null: means are the same
 alt: means are not the same
 P > 0.05 fail to reject null
 P < 0.05 reject null
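A sketch of the two-sample (pooled variance, Student's) t-test and the 95% CI for the difference, on made-up calf weights. It assumes both populations share the same SD; the critical value 2.145 (two-sided 5%, 14 df) is taken from t tables. `scipy.stats.ttest_ind(group_a, group_b)` performs the same pooled test and returns a P value:

```python
import math
import statistics

# Made-up weights (kg) of two groups of 8 calves, for illustration only
group_a = [10, 12, 11, 13, 12, 14, 11, 13]
group_b = [13, 15, 14, 16, 15, 17, 14, 16]

n_a, n_b = len(group_a), len(group_b)
diff = statistics.mean(group_b) - statistics.mean(group_a)

# Pooled variance (assumes both populations have the same SD)
pooled_var = ((n_a - 1) * statistics.variance(group_a) +
              (n_b - 1) * statistics.variance(group_b)) / (n_a + n_b - 2)
se_diff = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))

t = diff / se_diff
t_crit = 2.145  # two-sided 5% critical value of t with 8 + 8 - 2 = 14 df (from t tables)
ci = (diff - t_crit * se_diff, diff + t_crit * se_diff)

print(round(t, 2), [round(x, 2) for x in ci])
print("P < 0.05: reject H0" if abs(t) > t_crit else "P > 0.05: fail to reject H0")
```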

Paired Values
 Pre- and post-treatment (somatic cell count in subclinical mastitis, creatine kinase in exertional rhabdomyolysis)
 before, on or after a certain date (hormone levels for oestrus detection)
 compare the same thing at 2 different times in the same animal

Paired t-tests
 (example)
 weight before diet and weight at 3 months on diet
 95% CI for mean difference
 t-test of mean difference (= 0 versus not = 0)
 observations are NOT independent!!! Pairing removes the variation between animals, so the CI becomes tighter
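A sketch of the diet example with made-up weights for 6 dogs. The paired analysis works on each animal's own difference; the unpaired analysis of the same numbers is included only for contrast, to show how between-animal variation hides a consistent within-animal change:

```python
import math
import statistics

# Made-up weights (kg) of the same 6 dogs before the diet and after 3 months on it
before = [30, 25, 40, 35, 28, 32]
after = [29, 23, 39, 32, 26, 29]

# Paired analysis: work with the within-animal differences
diffs = [a - b for a, b in zip(after, before)]
n = len(diffs)
se_paired = statistics.stdev(diffs) / math.sqrt(n)
t_paired = statistics.mean(diffs) / se_paired  # tests mean difference = 0, df = n - 1 = 5

# For contrast only: the (incorrect) unpaired analysis of the same numbers
pooled_var = (statistics.variance(before) + statistics.variance(after)) / 2  # equal group sizes
se_unpaired = math.sqrt(pooled_var * 2 / n)
t_unpaired = (statistics.mean(after) - statistics.mean(before)) / se_unpaired

print(round(t_paired, 2), round(t_unpaired, 2))
```

The paired t (about -5.5) is well beyond the two-sided 5% critical value for 5 df (2.571), while the unpaired t (about -0.6) is not.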

Comparison of means (more than 2 means)
 comparison of means may be extended to 3 or more groups
 more complex: takes into account the variance between and within groups
 ex: are the daily growth rates of pigs in 3 rearing units different?

ANOVA (analysis of Variance)
 null: all means are the same
 alt: at least one mean is different
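 F (variance ratio: between-group variance / within-group variance)
 P < 0.05 = evidence of diff. between population means (but which ones!!??)
 Must then compare each pair of groups (post hoc comparisons) to find out which ones differ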
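A sketch of the one-way ANOVA F ratio for the pig example, using made-up daily growth rates in 3 rearing units. `scipy.stats.f_oneway(*groups)` computes the same F together with its P value:

```python
import statistics

# Made-up daily growth rates (g/day) of pigs in 3 rearing units, for illustration only
units = {
    "unit 1": [610, 620, 630],
    "unit 2": [640, 650, 660],
    "unit 3": [670, 680, 690],
}

groups = list(units.values())
all_values = [x for g in groups for x in g]
grand_mean = statistics.mean(all_values)
k, n_total = len(groups), len(all_values)

# Between-group variation: how far each group mean is from the grand mean
ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
# Within-group variation: spread of observations around their own group mean
ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)

ms_between = ss_between / (k - 1)      # df = k - 1 = 2
ms_within = ss_within / (n_total - k)  # df = N - k = 6
f_ratio = ms_between / ms_within
print(f_ratio)  # 27.0
```

F = 27 is far above the 5% critical value of F with 2 and 6 df (about 5.14), so at least one unit differs; pairwise comparisons are still needed to say which.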

Compare Ranks
 use a nonparametric equivalent of a t-test (or similar)
 use if data is NOT normally distributed

Nonparametric tests
 compare ranks
 corresponding confidence interval is for the difference in population medians
 tests work by ranking all the scores together, computing the average rank of each group and then testing:
 Null: there is no diff. between sum of ranks of groups
 Alt: a diff. exists between sum of ranks of groups
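A sketch of the ranking step behind the Mann-Whitney U test: rank both groups together (tied values share the average rank), sum the ranks of one group, and convert that rank sum into U. The helper names are hypothetical and the scores are made up; `scipy.stats.mannwhitneyu(a, b)` returns the statistic with a P value:

```python
def average_ranks(values):
    """Rank values from smallest (rank 1) upward; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over any run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = shared_rank
        i = j + 1
    return ranks

def mann_whitney_u(a, b):
    """Rank both groups together, then convert group a's rank sum into the U statistic."""
    ranks = average_ranks(a + b)
    rank_sum_a = sum(ranks[: len(a)])
    u_a = rank_sum_a - len(a) * (len(a) + 1) / 2
    u_b = len(a) * len(b) - u_a
    return rank_sum_a, u_a, u_b

# Made-up scores for two small groups, for illustration only
print(mann_whitney_u([3, 5, 8], [6, 9, 10, 12]))  # (7.0, 1.0, 11.0)
```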

Parametric vs. nonparametric (equivalents)
 Parametric : Nonparametric
 1-sample t-test : 1-sample Wilcoxon signed rank test
 2-sample t-test : Mann-Whitney U test / Wilcoxon rank sum test
 Paired t-test : Wilcoxon signed rank test
 One-way ANOVA : Kruskal-Wallis test

Comparison of proportions
 use a Chi-square test (or equivalent)
 these are used for categorical data
 null: proportion of infection is the same for draped and undraped cases OR Null: That drape use and infection are independent
 alt: proportion of infection is DIFFERENT between draped and undraped cases
 eg:
 252 surgical colic cases
 102 used a drape, 150 did not
 73 post-op infections
 is there a difference in proportion of infection in those that used a drape compared to those who did not?
 Chi-square test statistic is then based on the differences between observed and expected values
 (remember this shows an association/relationship; it DOES NOT SAY that leaving a case undraped WILL cause infection, and bias may play a part....)
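The expected values in the colic example can be worked out from the margins alone (102 draped, 150 undraped, 73 infections out of 252): each expected count is row total × column total / grand total. The observed count in each cell is not given in these notes, so this sketch only computes expected counts and defines the statistic; the helper names are hypothetical. `scipy.stats.chi2_contingency` does the whole test from an observed table:

```python
def expected_counts(row_totals, col_totals):
    """Expected count for each cell if the two factors are independent:
    row total * column total / grand total."""
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over every cell."""
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )

# Margins from the colic example: rows = drape / no drape, columns = infection / no infection
rows = [102, 150]
cols = [73, 252 - 73]
expected = expected_counts(rows, cols)
for label, row in zip(["drape", "no drape"], expected):
    print(label, [round(x, 1) for x in row])
```

If drape use and infection were independent, about 29.5 infections would be expected among the draped cases. Given the four observed counts, a 2 × 2 table has 1 df, and a chi-square value above 3.84 corresponds to P < 0.05.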

