When the null hypothesis is rejected despite it actually being true (finding a false effect)
What is a type II error? (regarding power)
when the null hypothesis is retained despite there being an effect present
What is power a measure of?
the probability of correctly rejecting a false null hypothesis
What are the 5 factors affecting power?
size of alpha, direction of the alternative hypothesis, size of effect (gamma), size of sigma (variability), size of n (sample size)
How does size of alpha affect power?
Lowering alpha (eg, from 0.05 to 0.01) reduces power, bcos it increases the chance of a type II error (more likely to miss an effect). Raising alpha increases power
How does size of effect (gamma) affect power?
The larger the effect, the greater the power. A smaller effect is harder to detect, which increases the risk of a type II error (retaining the null incorrectly)
What is the meaning of effect size (gamma)?
Gamma measures the size of an effect found in terms of standard deviation. So a gamma of 1.5 literally means that the hypothesised mean is 1.5 SD away from the true mean
Which are the 2 (of the 5) factors affecting power, which do so by affecting the size of standard error?
size of sigma & size of n
What are the values for effect size based on Cohen's rule of thumb guidelines?
small gamma=0.2, medium gamma=0.5, large gamma=0.8
What does it mean if you have a power of 0.98?
It means that 98% of the time, you will detect the effect (if an effect is present)
What is the difference between gamma & delta? (power)
Gamma is a measure of the effect size between 2 means & tells us how many SD apart they are (which can be used to calculate delta).
Delta is a statistic which takes both gamma (effect) and n (sample size) into account, it tells us how far apart they are in terms of SE
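A minimal sketch of how gamma, n and alpha combine into power. It assumes a one-sample z-test, where delta = gamma * sqrt(n); that formula, the function name `power_one_sample` and the example numbers are assumptions for illustration and are not taken from these cards. Other designs use a different formula for delta.

```python
from math import sqrt
from statistics import NormalDist


def power_one_sample(gamma, n, alpha=0.05, two_tailed=True):
    """Approximate power of a one-sample z-test (illustrative assumption)."""
    z = NormalDist()                 # standard normal distribution
    delta = gamma * sqrt(n)          # effect expressed in standard errors
    tail = alpha / 2 if two_tailed else alpha
    z_crit = z.inv_cdf(1 - tail)     # critical value for the chosen alpha
    # Probability of landing beyond the critical value in the direction of
    # the true effect (the tiny opposite-tail probability is ignored).
    return 1 - z.cdf(z_crit - delta)


# Hypothetical example: medium effect (gamma = 0.5) with n = 32
print(round(power_one_sample(0.5, 32), 2))
print(round(power_one_sample(0.5, 32, alpha=0.01), 2))  # stricter alpha, lower power
```

Trying different values of gamma, n, alpha and `two_tailed` shows the direction in which each factor moves power.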
What is the usual desired level of power?
usually 80% or higher
What is a t-value a measure of?
Number of standard errors that 2 means are apart by
What is ρ?
The correlation in the population (as opposed to r, the correlation in the sample)
What is an independent variable?
The variable that is manipulated in an experiment, in order to measure the resulting change in the dependent variable (eg, manipulating number of shop assistants in a store in order to measure how many customers are served)
What is a dependent variable?
The variable that is measured for change, as a result of the independent variable/s affecting it (eg, severity of illness changing as a result of drugs given)
In Cohen's guide to effect sizes for correlations, what are the values for small, medium, & large?
(small) r= 0.1
(medium) r= 0.3
(large) r =0.5
What is multiple regression? (as opposed to single)
Where we use more than 1 X (independent variable) to predict Y (dependent variable)
What is the assumption about effect size (gamma), that we make when talking about correlation?
That correlation IS the effect size (ie, r=gamma)
In the regression equation of a straight line, Y = a + bx, what does "a" stand for?
The Y intercept (ie, in Y = 3 + 2x, if x = 0 then Y = 3)
In the regression equation of a straight line, Y = a + bx, what does "b" stand for?
The slope (ie, how much the line goes up for every unit you go across)
What does it mean if instead of having Y = a + bx, we have Y = a - bx?
It means that the slope is negative rather than positive, so the regression line falls as x increases (Y goes down by b for every unit you go across)
What does SS stand for?
The sum of squared deviations around a mean
How is SS total [ Σ(Y - My)² ] calculated?
SS regression [ Σ(Y' - My)² ] + SS residual [ Σ(Y - Y')² ]
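The decomposition can be checked numerically. The sketch below fits a least-squares line Y' = a + bx to a small made-up data set (the numbers and the function name `ss_decomposition` are hypothetical) and confirms that SS total equals SS regression plus SS residual.

```python
def ss_decomposition(xs, ys):
    """Return (SS total, SS regression, SS residual) for a least-squares line."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    # Least-squares slope (b) and intercept (a) for Y' = a + b*x
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    preds = [a + b * x for x in xs]
    ss_total = sum((y - my) ** 2 for y in ys)              # observed Y around the mean
    ss_regression = sum((p - my) ** 2 for p in preds)      # predicted Y' around the mean
    ss_residual = sum((y - p) ** 2 for y, p in zip(ys, preds))  # observed Y around Y'
    return ss_total, ss_regression, ss_residual


# Made-up data for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
ss_total, ss_regression, ss_residual = ss_decomposition(xs, ys)
print(ss_total, ss_regression + ss_residual)
```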
What is the defining feature of a Quasi-experiment?
There is no random allocation
What is considered a large sample size?
Typically 30 or more
What is the "standard error of estimate"?
The standard deviation of the distribution of observed scores around the corresponding predicted score (SD of the errors), ie, Syx measures predictive error. So if Syx = 2.2 then approx 68% of actual Y scores will be within 2.2 points of Y' (bcos 68% of scores are within 1 SD of the mean)
What percentages of scores are within 1, 2 & 3 standard deviations of the mean? (3 separate percentage values)
68% are within 1 SD
95% are within 2 SD
99.7% are within 3 SD
What does calculating the "standardised residual" tell us?
It can answer questions like: How unusual is it to obtain a particular Y, given Y' = ?
(ie, if Y' = 27, what percentage of scores would be likely to have a Y of 30 or more)
How do you work out the area beyond your Zc score? (as opposed to the area from the mean to your Zc)
Look up the area of the Zc on the tables, then calculate 0.5-area
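A short sketch of the two steps above: turn an observed Y into a standardised residual, then find the area beyond that z score using "0.5 minus the area from the mean to z". It assumes the errors are normally distributed and borrows Y' = 27, Y = 30 and Syx = 2.2 from the earlier cards purely as example numbers; combining them this way, and the function names, are assumptions. The normal CDF stands in for the printed tables.

```python
from statistics import NormalDist


def standardised_residual(y, y_pred, s_yx):
    """How many standard errors of estimate the observed Y lies from Y'."""
    return (y - y_pred) / s_yx


def area_beyond(z):
    """Tail area beyond z, computed as 0.5 minus the area from the mean to z."""
    area_mean_to_z = NormalDist().cdf(abs(z)) - 0.5   # what a z table usually lists
    return 0.5 - area_mean_to_z


z = standardised_residual(30, 27, 2.2)
tail = area_beyond(z)
print(round(z, 2), round(tail, 3))
```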
What happens when you restrict the range of values of either X or Y? (like limiting UAI [X] to only >95)
It reduces the observed correlation between X & Y (eg, if r = 0.5, restricting X to the top 5%, now r < 0.5)
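Restriction of range can be seen in a small simulation. The sketch below draws made-up (X, Y) pairs from a population with a correlation of about 0.5, then recomputes r using only the top 5% of X. The sample size, the seed and the helper `pearson_r` are assumptions for illustration.

```python
import random
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient computed from raw scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


rng = random.Random(0)   # fixed seed so the run is repeatable
rho = 0.5                # population correlation used to generate the data
pairs = []
for _ in range(20000):
    x = rng.gauss(0, 1)
    y = rho * x + sqrt(1 - rho ** 2) * rng.gauss(0, 1)
    pairs.append((x, y))

r_full = pearson_r([p[0] for p in pairs], [p[1] for p in pairs])

# Keep only the top 5% of X values (1000 of the 20000 pairs)
pairs.sort(key=lambda p: p[0])
top = pairs[-1000:]
r_restricted = pearson_r([p[0] for p in top], [p[1] for p in top])

print(round(r_full, 2), round(r_restricted, 2))
```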
Why should a prediction equation be cross-validated with another sample?
Because "confidence limits" are not the same as "confidence intervals". Variability of Y values around Y' is due to real variability in the population. "C. Limits" do not take sampling variability into account & should really only be used for large samples.
What does the strength of correlation between X & Y do to the width of the confidence limit interval for Y?
The stronger the correlation, the smaller the width of the interval
Apart from the size of the relationship between X & Y, what else does the size of the correlation coefficient depend on?
Also on the variability of X & Y (restricting the range of values of either X or Y that you measure also reduces the correlation between X & Y)
What kind of X2 (chi) test would you use if you have 1 categorical variable?
Goodness of fit
What kind of X2 (chi) test would you use if you have more than 1 categorical variable?
Test of independence (eg, divorced/married = 1 categorical variable, ethnicity = 1 categorical variable). We want to see if marital status is independent from ethnicity or if something is going on there.
When can we state direction of effect in X2 (chi) "goodness of fit" test, & when can't we?
We can when there are only 2 categories & df = 1. We can't state direction if there are more than 2 categories & df > 1
What else determines the value of X2 (chi), (apart from the discrepancy between fo & fe)?
It is also (fo - fe)² relative to fe that determines the value of X2 (it depends on the magnitude of what you would expect in the 1st place)
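A small sketch of the point above, using the usual formula: the sum of (fo - fe)² / fe over all categories. The frequencies are made up; both examples have the same discrepancy of 5 per category, but the chi-square value is much smaller when the expected frequencies are large.

```python
def chi_square(observed, expected):
    """Sum of (fo - fe)^2 / fe across categories."""
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))


# Same discrepancy (5 per category), different expected frequencies
small_fe = chi_square([15, 5], [10, 10])        # 25/10 + 25/10
large_fe = chi_square([105, 95], [100, 100])    # 25/100 + 25/100
print(small_fe, large_fe)  # 5.0 0.5
```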
What are the assumptions for using a X2 (chi) test?
Random sampling (just like other tests); Observations are independent (categories are mutually exclusive bcos each participant is only counted once); All participants are categorised (can't leave anyone out)
What is Cramer's phi?
Like a correlation coefficient bcos it is bound by 0 to 1.
Different though bcos it is not directional.
It is an Index of how strong your effect is between treatment & outcome, independent of sample size.
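A sketch of the calculation, assuming the commonly used formula: square root of X2 / (N * (k - 1)), where k is the smaller of the number of rows and columns. The tables are made up, and the function name `cramers_phi` is hypothetical. Doubling every frequency doubles chi-square but leaves Cramer's phi unchanged, which illustrates the "independent of sample size" point.

```python
from math import sqrt


def cramers_phi(table):
    """Cramer's phi for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, fo in enumerate(row):
            fe = row_totals[i] * col_totals[j] / n   # expected frequency under independence
            chi2 += (fo - fe) ** 2 / fe
    k = min(len(table), len(table[0]))               # smaller of rows and columns
    return sqrt(chi2 / (n * (k - 1)))


# Made-up treatment x outcome tables; the second is the first with every count doubled
print(cramers_phi([[30, 10], [10, 30]]))
print(cramers_phi([[60, 20], [20, 60]]))
```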
How do the calculations for X2 (chi) differ from all the other calculations, when it comes to using SPSS?
If you already have the frequencies, it's actually easier to calculate by hand
What is a time-saving feature of "factorial designs" regarding hypothesis testing?
They can test more than 1 hypothesis at once
How many factors (& levels) are there in a "factorial design" which looks at gender & level of stress (high or low)?
2 factors, each with 2 levels.
Factor 1 is Gender with 2 levels: Male & Female.
Factor 2 is Stress Level with 2 levels: High & Low
When plotting a graph for data in a factorial design, which factor is plotted on the horizontal axis, & on which axis is the DV placed?
We plot Factor B (manipulated factor) on the horizontal axis, & place the DV on the vertical axis (with the lines inside the graph representing Factor A)
What is a main effect? (ie, how is it calculated)
Is there an effect of Factor B when you average across Factor A, & is there an effect of Factor A when you average across Factor B (this can be seen by working out the marginal means)
What is an interaction effect (ie, how is it calculated)?
Whether the effect of one factor changes across the levels of the other, seen by comparing differences between cell values, either in columns or rows (eg, does the difference between b1 & b2 differ across levels of A?)
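A sketch of both calculations for a 2 x 2 design, using made-up cell means and assuming equal cell sizes (so marginal means are simple averages of the cell means). The names `cells` and `marginal_mean` are hypothetical.

```python
# Made-up cell means, keyed by (level of Factor A, level of Factor B)
cells = {
    ("a1", "b1"): 10, ("a1", "b2"): 14,
    ("a2", "b1"): 12, ("a2", "b2"): 22,
}


def marginal_mean(cells, factor_index, level):
    """Average the cell means at one level of a factor, across the other factor."""
    values = [m for key, m in cells.items() if key[factor_index] == level]
    return sum(values) / len(values)


# Main effects: compare marginal means
main_a = marginal_mean(cells, 0, "a2") - marginal_mean(cells, 0, "a1")
main_b = marginal_mean(cells, 1, "b2") - marginal_mean(cells, 1, "b1")

# Interaction: does the b2 - b1 difference change across levels of A?
diff_at_a1 = cells[("a1", "b2")] - cells[("a1", "b1")]
diff_at_a2 = cells[("a2", "b2")] - cells[("a2", "b1")]
interaction = diff_at_a2 - diff_at_a1

print(main_a, main_b, interaction)  # 5.0 7.0 6
```

A non-zero `interaction` corresponds to non-parallel lines when the cell means are plotted.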
As a general rule, how big does the difference in marginal means have to be to say there is a main effect?
1-2 points or less = no main effect, More than this & you can say there is a main effect
For between-subjects design, what else increases as the number of factors increases?
Sample size (although this can be avoided by using repeated measures - within subjects design)
What are 4 ways you can decrease random (error) variance in estimates?
Reducing individual differences
Using repeated measures design
Reducing measurement errors (proxy measures)
Increasing sample size
What is statistical conclusion validity?
whether the DV & IV covary & how strongly (relates to power)
What is internal validity?
Is the IV really responsible for the observed association with the DV, or is there a more plausible alternative explanation?
What is construct validity?
Do the DV & IV represent the constructs they are presumed to?
What is external validity?
To what extent can the causal relationship be generalised across alternative measures, persons, settings, times?
Threats to statistical conclusion validity? (IV & DV covariation & strength & power)
low power, violation of test assumptions (normal distribution, observations independent), fishing for significance, unreliable measures, unreliable implementation of IV
Threats to internal validity? (IV & DV the most plausible explanation for relationship)
(experimental design) alternative hypotheses not ruled out, maturation, (control group) testing effects, instrumentation changes, regression to the mean, attrition, (random allocation) diffusion of IV to other groups, control rivalry, (blind raters) experimenter bias, demand characteristics
Threats to construct validity? (IV & DV represent what they should)
Threats to external validity? (generalising ability)