-
Types of statistics
- Descriptive stats (summarise the data and present it in an easily understandable way: graphs, charts, tables, averages, ranges)
- Inferential stats (sample vs. pop., experimental design, hypothesis testing)
-
Sample vs population
- samples should be:
- representative and unbiased (males & females, all ages)
- every type of subject should have the same chance of being included
- (normal distribution/random sampling)
-
Types of Data
- Categorical
- continuous
- discrete
-
Categorical Data
- nominal (unordered) eg dog breeds
- ordinal (ordered) eg body condition score of cattle
-
Continuous data
- Any positive value theoretically possible
- eg: weight, height
-
discrete data
- can only be integer values (whole numbers)
- eg: number of piglets in a litter
-
Bar Charts
Good for frequency (categorical or discrete)
-
Scatterplot
used for correlation and regression (relationships between two continuous variables)
-
Descriptive Statistics
- Data is collected so that we can obtain INFORMATION about a certain topic
- when only a few observations are made, it might be easy to see a potential relationship
- as more data are collected it becomes more difficult to obtain an overall picture just by looking
-
Histogram
Shows the distribution of quantitative (continuous) data
-
Stem and leaf
(not often used) similar to a histogram but keeps the individual data values, so the number of observations above or below the median can be counted directly
-
Boxplot
- single, or side-by-side to compare groups
- the line inside the box represents the median value
-
Numerical measures
- numerical measures are used to summarise the position, spread and shape of a data distribution
- used to describe the data so that we have a general idea of the data that we have and the population that it might have come from or represent
-
Measures of central tendency
- "averages" or the middle of data
- Mean= sum of all observations/number of observations
- median= middle observation (half smaller, half larger)
- mode= value that occurs most frequently
-
Measures of spread
- Range- difference between largest and smallest observation
- standard deviation- measure of spread about the mean (SD= Square root of variance)
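- a minimal sketch of these measures in Python, using the built-in statistics module on a small made-up set of litter sizes (values are for illustration only):
```python
import statistics

# Made-up litter sizes (number of piglets per litter), for illustration only
litters = [8, 10, 11, 11, 12, 13, 14]

mean = statistics.mean(litters)            # sum of observations / number of observations
median = statistics.median(litters)        # middle observation (half smaller, half larger)
mode = statistics.mode(litters)            # value that occurs most frequently
data_range = max(litters) - min(litters)   # largest minus smallest observation
sd = statistics.stdev(litters)             # sample SD = square root of the sample variance

print(mean, median, mode, data_range, sd)
```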
-
Shape (skewness/kurtosis/etc)
- Various shape statistics exist:
- Skewness (is it symmetrical or not)
- Kurtosis (how concentrated the data are around the mean / how heavy the tails are)
- (and more)
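- a quick sketch of the shape statistics using scipy (assuming scipy is available; the data are made up):
```python
import numpy as np
from scipy import stats

# Made-up, right-skewed sample, for illustration only
x = np.array([2.1, 2.4, 2.5, 2.6, 2.8, 3.0, 3.9, 5.2])

print(stats.skew(x))      # ~0 for a symmetrical distribution, > 0 if right-skewed
print(stats.kurtosis(x))  # excess kurtosis: 0 for a normal distribution
```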
-
Probability Theory
- Generally very poorly understood
- describes outcomes that depend on chance
- eg rolling a die, tossing a coin, becoming infected with disease, pups in a litter, etc.
- can almost never predict an outcome w/ total accuracy, but can describe what MIGHT happen, or the probability of different outcomes
-
Probability distributions
- The probability of an outcome given that we know what happens in the 'system' (variability, predict the future)
- What we believe about the 'system' given that we know the outcome (uncertainty, estimating the true population parameters)
-
Normal distribution
- (Gaussian, Bell curve)
- Described by mean, sd
- data can be any continuous value
- symmetrical distribution
- mean=median=mode
- ex: birth weights, heights, live weight gains, body temperatures, serum biochemistry parameters
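- a small sketch simulating a normal distribution in Python to illustrate 'described by mean and sd' (the 40 kg / 5 kg values are made up, not real birth-weight data):
```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical birth weights: mean 40 kg, SD 5 kg (values chosen purely for illustration)
weights = rng.normal(loc=40, scale=5, size=10_000)

# For a symmetrical distribution the mean and median are (approximately) equal
print(weights.mean(), np.median(weights))
```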
-
Poisson distribution
- used for count data (integer)
- described by mean only
- asymmetrical distribution
- mean ≠ median ≠ mode
- examples: pups in a litter, cars on a street, earthquakes in a year
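- a similar sketch for count data, assuming a hypothetical mean of 6 pups per litter (illustration only):
```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical litter sizes: Poisson with mean 6 (value chosen purely for illustration)
litters = rng.poisson(lam=6, size=10_000)

# Counts are whole numbers and the distribution is asymmetrical,
# so the mean and median need not coincide
print(litters.mean(), np.median(litters))
```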
-
Binomial distribution
- used for binary outcomes (yes/no, pass/fail, m/f, dies/survives)
- described by the probability of a success at each trial, and the number of trials
- ex: number of heads out of 10 tosses of a coin, number of female calves from sexed semen, number of exams you will pass
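- a sketch of the coin-toss example with scipy: the distribution is fully described by the number of trials and the probability of success:
```python
from scipy import stats

# Number of heads out of 10 tosses of a fair coin: n = 10 trials, p = 0.5 per trial
coin = stats.binom(n=10, p=0.5)

print(coin.pmf(5))   # probability of exactly 5 heads
print(coin.mean())   # expected number of heads (n * p = 5)
```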
-
Hypothesis testing
- used by research scientists:
- does drug A kill mice faster than drug B?
- do a greater proportion of smokers than non-smokers get lung cancer?
- (also relevant to vets)
- 5 Steps!!
-
5 steps to Hypothesis testing
- Think of a question you want to ask
- put the question into a testable format
- collect the data
- apply the correct statistical test
- interpret the results of the test
-
Generating a hypothesis to test:
- what do we want to find?
- how many groups are we comparing?
- typically a simple question with a yes or no answer
- ex: are these 2 groups of calves growing at the same rate? did pyoderma cases given synulox recover at the same speed as those given ampicillin?
-
The 'NULL' hypothesis (and alternative hypothesis)
- The baseline belief- there is NO difference in groups/drugs (denoted H0)
- the alternative hypothesis:
- opposite of the baseline belief- there IS a difference in groups/drugs (denoted H1)
-
Hypothesis testing
- Goal is to provide evidence that the 'Null' hypothesis is WRONG!- there is a difference between groups/drugs
- BUT! we have to account for the effects of outcomes being uncertain
- so we must show that the difference between the groups/drugs is larger than would be expected by chance alone
-
Rejecting the Null hypothesis
- It is always possible that the difference between 2 sets of observations is entirely chance! (that the pops. are really the same even though the samples look diff.)
- this becomes less and less likely as the magnitude of the difference increases and the number of observations increases
-
Confidence intervals
- use confidence intervals to look at the data in a more formal way
- do the confidence intervals for the parameter of interest in each group overlap? (a 95% CI leaves 2.5% above and 2.5% below)
- The more data we have, the smaller the confidence intervals become
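- a sketch of a 95% confidence interval for a mean using scipy; the daily weight gains below are made up, and the t distribution is used because the population SD is unknown:
```python
import numpy as np
from scipy import stats

# Made-up daily weight gains (kg/day), for illustration only
gains = np.array([0.81, 0.93, 0.78, 0.88, 0.95, 0.84, 0.90, 0.86])

mean = gains.mean()
se = stats.sem(gains)   # standard error of the mean
low, high = stats.t.interval(0.95, len(gains) - 1, loc=mean, scale=se)

# With more data the standard error shrinks, so the interval gets narrower
print(f"95% CI: {low:.3f} to {high:.3f}")
```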
-
Rejecting the null hypothesis with Confidence intervals
- the amount of overlap in confidence intervals reflects the probability (p-value) with which we reject the null hypothesis
- if there is LITTLE overlap, we reject H0
- how little overlap counts as 'little' is set by the p-value (conventionally 0.05)
- this makes no comment at all about the magnitude (or biological impact) of the difference
-
Failing to reject the null hypothesis
- if there is not enough evidence to prove the groups are different we cannot reject the null hypothesis
- (this does not necessarily mean that there really is no difference, only we couldn't find any difference in the samples obtained)
-
Statistical significance DOES NOT EQUAL biological relevance
Remember this!
-
Can compare means by?
Using a t-test
-
Comparison of means...?
can compare one mean (with a fixed value), two means, or more than two means
-
Compare ONE mean (with a fixed number)
- Confidence interval approach
- look at the sample mean, sample size and sd; construct the 95% confidence interval... does the fixed number fall within the confidence interval?
- Null: population mean = XXX
- alt: population mean ≠ XXX
-
Significance Testing
- looks at how far the observed sample mean is from the population mean
- if the P value is lower than 0.05 the result is significant (reject the null); if it is greater than 0.05 it is NOT significant and we fail to reject the null
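- a sketch of the one-sample case in scipy: testing whether a sample mean differs from a fixed value (both the weights and the fixed value of 40 kg are made up):
```python
import numpy as np
from scipy import stats

# Made-up birth weights (kg); test against a hypothesised population mean of 40 kg
weights = np.array([38.2, 41.5, 39.8, 42.1, 40.3, 37.9, 43.0, 39.5])

t_stat, p_value = stats.ttest_1samp(weights, popmean=40)

# P < 0.05: significant, reject the null; P > 0.05: not significant, fail to reject
print(t_stat, p_value)
```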
-
comparison of TWO means (with each other)
- T-test
- 95% CI for difference between means
- take mean of each group
- null: means are the same
- alt: means are not the same
- P > 0.05: fail to reject null
- P < 0.05: reject null
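- a sketch of an unpaired two-sample t-test in scipy on two made-up groups of calves:
```python
import numpy as np
from scipy import stats

# Made-up daily growth rates (kg/day) for two groups of calves, illustration only
group_a = np.array([0.85, 0.91, 0.78, 0.88, 0.95, 0.82])
group_b = np.array([0.74, 0.80, 0.77, 0.83, 0.79, 0.72])

# Null: the two population means are the same
t_stat, p_value = stats.ttest_ind(group_a, group_b)

print(t_stat, p_value)   # reject the null if p < 0.05
```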
-
Paired Values
- Pre and post treatment (somatic cell count for subclinical mastitis, creatine kinase for exertional rhabdomyolysis)
- before and after a certain date (hormone levels for oestrus detection)
- compare the same thing at 2 different times in the same animal
-
Paired T-tests
- (example)
- weight before diet and weight at 3 months on diet
- 95% CI for mean difference
- T-test of mean difference (=0 versus not = 0)
- NOT independent!!! CI becomes tighter
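- a sketch of the paired case in scipy, using made-up weights for the same animals before and at 3 months on the diet:
```python
import numpy as np
from scipy import stats

# Made-up weights (kg) for the same dogs before and after 3 months on a diet
before = np.array([32.0, 28.5, 35.2, 30.1, 27.8, 33.4])
after = np.array([30.1, 27.0, 33.8, 28.9, 26.5, 31.9])

# Null: the mean within-animal difference is 0; the pairing (same animal measured
# twice) is what distinguishes this from an unpaired two-sample t-test
t_stat, p_value = stats.ttest_rel(before, after)

print(t_stat, p_value)
```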
-
Comparison of means (more than 2 means)
- comparison of means may be extended to 3 (or more) groups
- more complex: takes into account the variance between and within groups
- ex: are the daily growth rates of pigs in 3 rearing units different?
-
ANOVA (analysis of Variance)
- null: all means are the same
- alt: at least one mean is different
- F (variance ratio)
- P < 0.05 = evidence of diff. between population means (but which ones!!??)
- must then compare each pair of means (pairwise / post-hoc comparisons) to see which ones differ
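- a sketch of a one-way ANOVA in scipy on three made-up rearing units; a significant F only says that at least one mean differs, so pairwise follow-up is still needed:
```python
import numpy as np
from scipy import stats

# Made-up daily growth rates (kg/day) of pigs in three rearing units
unit_1 = np.array([0.62, 0.65, 0.60, 0.66, 0.63])
unit_2 = np.array([0.70, 0.72, 0.68, 0.74, 0.71])
unit_3 = np.array([0.61, 0.64, 0.63, 0.60, 0.65])

# F is the ratio of between-group variance to within-group variance
f_stat, p_value = stats.f_oneway(unit_1, unit_2, unit_3)

print(f_stat, p_value)   # p < 0.05: at least one mean differs (pairwise tests needed to say which)
```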
-
Compare Ranks
- use a non-parametric equivalent of a t-test (or similar)
- use if data is NOT normally distributed
-
Non-parametric tests
- compare ranks
- the corresponding confidence interval is for the difference in population medians
- tests work by ranking the scores, computing the average rank, and then testing:
- Null: there is no diff. between sum of ranks of groups
- Alt: a difference exists between sum of ranks of groups
-
Parametric vs. non-parametric (equivalents)
- Parametric -> Non-parametric equivalent
- 1-sample t-test -> 1-sample Wilcoxon signed rank test
- 2-sample t-test -> Mann-Whitney U test / Wilcoxon rank sum test
- Paired t-test -> Wilcoxon signed rank test
- One-way ANOVA -> Kruskal-Wallis test
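- a sketch of the 2-sample non-parametric equivalent (Mann-Whitney U / Wilcoxon rank sum) in scipy, on two made-up groups that are not assumed to be normally distributed:
```python
import numpy as np
from scipy import stats

# Made-up, clearly skewed scores for two groups, illustration only
group_a = np.array([1, 2, 2, 3, 4, 9, 15])
group_b = np.array([3, 5, 6, 7, 8, 12, 20])

# Null: no difference between the groups once the scores are converted to ranks
u_stat, p_value = stats.mannwhitneyu(group_a, group_b)

print(u_stat, p_value)
```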
-
Comparison of proportions
- use a Chi-square test (or equivalent)
- these are used for categorical data
- Null: the proportion of infection is the same for draped and undraped cases (equivalently: drape use and infection are independent)
- alt: proportion of infection is DIFFERENT between draped and undraped cases
- eg:
- 252 surgical colic cases
- 102 used a drape, 150 did not
- 73 post-op infections
- is there a difference in proportion of infection in those that used a drape compared to those who did not?
- Chi-square test statistic is then based on the differences between observed and expected values
- (remember this shows an association/relationship only; it DOES NOT say that leaving a case undraped WILL cause infection, and bias or confounding could still be at play)
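- a sketch of a chi-square test on a 2x2 table in scipy. The notes give only the totals (102 draped, 150 undraped, 73 infections), not how the infections split between the groups, so the 20/53 split below is invented purely to show the mechanics:
```python
from scipy.stats import chi2_contingency

# Rows: draped, undraped; columns: infected, not infected.
# The 20/53 split of the 73 infections is hypothetical (not given in the notes);
# row totals still match the 102 draped and 150 undraped cases.
table = [[20, 82],
         [53, 97]]

# Expected counts assume drape use and infection are independent (the null)
chi2, p_value, dof, expected = chi2_contingency(table)

print(chi2, p_value)
print(expected)   # the test statistic compares observed counts with these expected counts
```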