What is the definition of a test?
- A standardized procedure for sampling behavior and describing it with categories or scores.
What are the common characteristics that most tests share?
- Standardized procedure, behavior sample, scores or categories, norms or standards, prediction of nontest behavior.
What is a norm-referenced test?
- Tests that use a well-defined population of persons as their interpretive frame of reference.
What is a criterion-referenced test?
- Tests that measure what a person can do rather than comparing results to the performance levels of others.
What are the different types of psychological tests?
- Intelligence tests: measure an individual's ability in relatively global areas such as verbal comprehension, perceptual organization, or reasoning, and thereby help determine potential for scholastic work or certain occupations.
- Aptitude tests: measure the capability for a skill.
- Achievement tests: measure a person's degree of learning, success, or accomplishment in a subject area.
- Personality tests: measure the traits, qualities, or behaviors that determine a person's individuality.
- Interest inventories: measure an individual's preference for certain activities.
- Behavioral procedures: objectively describe and count the frequency of a behavior, identifying the antecedents and consequences of the behavior.
- Neuropsychological tests: measure cognitive, sensory, perceptual, and motor performance to determine the extent, locus, and behavioral consequences of brain damage.
What are psychological tests primarily used to do?
- Diagnosis and treatment
- Program evaluation
What are the common uses of tests?
- Make decisions about persons.
What are desirable test administration procedures?
- Examiners must be familiar with the materials and directions before administering tests.
What are the primary responsibilities of test publishers?
- Publication: tests should not be released prematurely
- Marketing: advertise tests honestly
- Distribution: sell only to trained people, according to qualification levels A, B (BA), and C
What are the three levels of qualifications test users
must meet for purchasing tests?
- A: nonpsychologists, such as business executives or educational administrators
- B: those who have completed an advanced course in testing at a college (e.g., aptitude and personality tests)
- C: a master's degree at minimum
- (These levels determine what kinds of tests can be bought at each level.)
What is meant when we say testing should be in the
“best interest of the client?”
- Tests should be given to benefit the client, not harm them.
What is the Tarasoff case? What is meant by ‘duty to
warn’ and when does a psychologist have a duty to warn?
- Duty to warn: you must warn the persons in danger, and you must notify the authorities if a client is abusing children, the elderly, themselves, or others.
- The Tarasoff case: an Indian graduate student stabbed another student to death. The campus therapist knew of the threat but only reported it to campus police and never warned the victim.
What is informed consent? When, in regard to testing, is it not required?
- Test takers or their representatives are made aware, in plain language, of the reasons for testing, the types of tests being used, and how the results will be measured and used. Informed consent requires:
- i. Disclosure: the client receives sufficient information
- ii. Competency: the client is mentally able to give consent
- iii. Voluntariness: consent is given freely
What is meant when we say test results must be given
“in a language the test taker can understand?”
- Account for linguistic barriers (e.g., ESL) and use vocabulary appropriate to the test taker's age and mental ability.
What does the book recommend for how to consider the
impact of cultural background on test results?
- a. Avoid stereotype threat
- b. Adopt a frame of reference
What was the first use of testing?
- Chinese civil service testing, ca. 2200 BC.
What does the “brass instruments” era of testing refer to?
- Tools used to measure sensory thresholds and reaction times, erroneously thought to measure intelligence.
- The 1800s, in Europe and Great Britain.
Who developed the first intelligence test and why?
- Binet and Simon, 1905. The goal was to identify which children could or could not learn in a typical classroom environment.
When did intelligence testing make it to the U.S.? Why was this important?
- 1916. The test was translated and made culturally relevant to the USA. Goddard, however, was a nativist who misused the results.
What tests were developed for use with Army recruits, and what were the positive and negative results of these tests?
- i. Following oral directions
- ii. Math
- iii. Judgment
- iv. Synonym-antonym
- v. Disarranged sentences
- vi. Number series completion
- vii. Analogies
- viii. Information
- Positive effects:
- i. Psychologists gained experience in the psychometrics of test construction
- ii. Test construction became a science
- Negative effects:
- i. The Army spent money and didn't really even use the test scores
- ii. Recruits wouldn't understand the directions or fell asleep
What are projective tests designed to measure?
- Responses to ambiguous stimuli that disclose innermost needs, fantasies, and conflicts.
What is the MMPI? When was the most recent version published?
- The Minnesota Multiphasic Personality Inventory; 2003.
How does test interpretation work for norm-referenced tests?
- One's score is compared to a standardization sample.
How does test interpretation work for criterion-referenced tests?
- Compare the raw score to a set standard.
How can we summarize and pictorially represent a distribution of scores?
- Histogram, frequency polygon, bell-shaped curve.
Which is strongly influenced by outliers?
What is the standard deviation (SD)?
- The degree of dispersion in a group of scores.
What percent of scores fall within 1 SD in a normal distribution?
- 68%
What percent of scores fall within 2 SD in a normal distribution?
- 95%
What percent of scores fall within 3 SD in a normal distribution?
- 99.7%
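These percentages can be checked numerically; a minimal sketch using Python's standard library, assuming a normal distribution of scores:

```python
from statistics import NormalDist

# Proportion of scores within k standard deviations of the mean
# for a normal distribution (the "68-95-99.7" rule).
nd = NormalDist(mu=0, sigma=1)
for k in (1, 2, 3):
    within = nd.cdf(k) - nd.cdf(-k)
    print(f"within {k} SD: {within:.1%}")
```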
What is a percentile rank?
- The percentage of persons in the standardization sample who scored below a specific raw score.
How do you calculate a percentile rank?
- Number of scores below the target raw score, divided by the total number of examinees, multiplied by 100.
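The formula above translates directly into code; a sketch with a hypothetical sample of scores:

```python
def percentile_rank(scores, target):
    """Percent of scores in the sample that fall below the target raw score."""
    below = sum(1 for s in scores if s < target)
    return below / len(scores) * 100

# Hypothetical standardization sample
sample = [55, 60, 62, 65, 70, 71, 75, 80, 88, 90]
print(percentile_rank(sample, 75))  # 6 of 10 scores are below 75 -> 60.0
```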
What are the benefits of using percentiles?
- Easy to obtain and understand.
What is a standardized score?
- Expresses the distance from the mean in standard
- deviation units.
What is a z score, and how is it calculated?
- The most basic standard score for an examinee: z = (raw score - mean) / SD.
What are the M & SD of z scores?
- M = 0, SD = 1
What are the M & SD of T scores?
- M = 50, SD = 10
What are the M & SD of IQ scores?
- M = 100, SD = 15
What are the M & SD of CEEB scores?
- M = 500, SD = 100
How can we calculate the various standardized scores once we have the z scores?
- New score = new mean + (z × new SD).
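The conversion can be sketched in code, using the means and SDs listed above (the raw score of 115 is a hypothetical example):

```python
def z_score(raw, mean, sd):
    # z expresses distance from the mean in SD units
    return (raw - mean) / sd

def from_z(z, new_mean, new_sd):
    # Any standardized score is the z score rescaled to a new mean and SD
    return new_mean + z * new_sd

z = z_score(raw=115, mean=100, sd=15)  # 1.0
print(from_z(z, 50, 10))    # T score  -> 60.0
print(from_z(z, 100, 15))   # IQ score -> 115.0
print(from_z(z, 500, 100))  # CEEB     -> 600.0
```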
What is meant by test standardization?
- Test results are compared
- to norms.
What are norms?
- Statistical summary of scores from the norm group or
- standardization sample.
What types of scores can be used as norms?
- Sample is large and representative
- and the raw score distribution is only mildly nonnormal.
What are the important issues to consider when
developing test norms
- If a norm-referenced test's norm group doesn't represent the population for whom the test is intended, the norms are useless and all comparisons made with them will be useless.
What factors should be considered in selecting a norm group?
- Age, grade, sex, education, SES, ethnic group, geographic region.
What is random sampling?
- Every person in the target population has an equal chance of being selected.
What is stratified random sampling?
- Putting constraints on your randomness: people are chosen randomly from within each stratum.
What is cluster sampling?
- Divide the population into geographical clusters, then randomly sample from each cluster.
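The sampling ideas above can be sketched in code; the population, the two regions, and the per-stratum sample size here are all hypothetical:

```python
import random
from collections import defaultdict

def stratified_sample(population, strata_key, n_per_stratum, seed=0):
    """Randomly sample n people from within each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for person in population:
        strata[strata_key(person)].append(person)
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

# Hypothetical population: (id, region) pairs, 60 north and 40 south
population = [(i, "north" if i < 60 else "south") for i in range(100)]
sample = stratified_sample(population, strata_key=lambda p: p[1],
                           n_per_stratum=5)
print({region: len(people) for region, people in sample.items()})
# {'north': 5, 'south': 5}
```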
What is classical test theory?
- Test scores result from the influence of two factors: consistency (the true score) and inconsistency (errors of measurement).
What are the four assumptions of classical test theory?
- a. Measurement errors are random
- b. The mean error of measurement is zero
- c. True scores and error scores are uncorrelated
- d. Errors on different tests are uncorrelated
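The first two assumptions can be illustrated with a small simulation of observed score = true score + random error (the true score of 50 and error SD of 3 are arbitrary choices):

```python
import random

# Classical test theory: observed score X = true score T + random error E.
# With random errors, the mean error approaches zero over many measurements.
rng = random.Random(42)
true_score = 50
observed = [true_score + rng.gauss(0, 3) for _ in range(10_000)]
errors = [x - true_score for x in observed]
mean_error = sum(errors) / len(errors)
print(round(mean_error, 2))  # close to 0
```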
What are the two types of error?
- Unsystematic and systematic.
Reliability is concerned with what type of error?
- Unsystematic error.
What is the range of a reliability coefficient? What is considered high?
- From −1 to +1; coefficients in the .90s are generally considered high.
What are the types of reliability that consider temporal stability?
- Test-retest, alternate-forms.
What are the types of reliability that consider internal consistency?
- a. Split-half reliability
- b. Spearman-Brown formula
- c. Coefficient alpha
- d. Kuder-Richardson estimate of reliability
- e. Interscorer reliability
What are the benefits and drawbacks of the different
types of reliability?
- The split-half approach is not precise; coefficient alpha yields a higher reliability estimate.
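The Spearman-Brown correction and coefficient alpha can be sketched as follows; the half-test correlation of .70 is a hypothetical value:

```python
def spearman_brown(r_half):
    # Corrects a split-half correlation up to full test length
    return 2 * r_half / (1 + r_half)

def coefficient_alpha(item_scores):
    # item_scores: one list per item, each holding the examinees' scores
    k = len(item_scores)
    n = len(item_scores[0])
    def variance(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)
    totals = [sum(item[i] for item in item_scores) for i in range(n)]
    item_var = sum(variance(item) for item in item_scores)
    return k / (k - 1) * (1 - item_var / variance(totals))

print(round(spearman_brown(0.70), 3))  # 0.824
```

Perfectly homogeneous items (every item giving the same scores) yield an alpha of 1.0, which is a handy sanity check for the formula.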
What is considered acceptable reliability for research purposes?
- Generally around .70 or higher.
What is acceptable for tests used to make important decisions about individuals?
- .90 or higher.
What is the standard error of measurement (SEM)?
- An index of how much, on average, an individual's score might vary if they were to take the test repeatedly.
What is the relationship between reliability and the SEM?
- The more reliable the test, the less measurement error there is.
How do we calculate confidence intervals?
- Observed score ± z × SEM (e.g., ±1.96 × SEM for a 95% interval).
What does a confidence interval tell us?
- How accurately we can predict what the next test score would be if the examinee retook the exam.
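A sketch of the SEM and a score-band confidence interval, using the standard formula SEM = SD × √(1 − r); the SD of 15, reliability of .91, and observed score of 108 are hypothetical:

```python
import math

def sem(sd, reliability):
    # Standard error of measurement: expected spread of an
    # individual's scores over repeated testings
    return sd * math.sqrt(1 - reliability)

def confidence_interval(score, sd, reliability, z=1.96):
    # 95% band: observed score +/- z * SEM
    e = z * sem(sd, reliability)
    return (score - e, score + e)

print(round(sem(15, 0.91), 2))           # 4.5
print(confidence_interval(108, 15, 0.91))
```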
What is Item Response Theory?
- The idea that mathematical models can be used to develop scales containing highly discriminating items, thereby increasing test reliability. Error is eliminated in the test-development phase.
What is validity?
- Whether a test measures what it claims to measure.
What is the relationship between reliability and validity?
- A test must be reliable before it can be valid.
Can a test be valid if it is not reliable?
- No.
What type of error affects validity?
- Both systematic and unsystematic error.
What is the trinitarian model?
- A three-part model describing validation procedures.
What types of validity does the trinitarian model encompass?
- Content validity, criterion-related validity (concurrent/predictive validity), and construct validity.
What is face validity?
- The appearance of appropriateness from the test taker's perspective.
How is face validity different from other types of validity?
- It is not an actual measure of validity.
What is content validity and how is it assessed?
- Are the items on the test a good representative sample of the domain we are measuring? It is assessed by experts.
What is criterion-related validity? What types of validity are encompassed under it?
- The extent to which a test correlates with nontest behaviors, called the criterion.
- It encompasses concurrent validity and predictive validity.
What are the differences between concurrent
& predictive validity?
- Concurrent validity: the test is correlated with a criterion measure available at the time of testing. Predictive validity: the test is correlated with a criterion that becomes available in the future.
What is the standard error of the estimate?
- The margin of error expected in the predicted criterion score; it tells us how accurately test scores can predict performance on the criterion.
How is the SEE related to predictive validity?
- The higher the correlation between test and criterion, the less error there is in the predictions made from the test.
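The standard formula, SEE = SD of the criterion × √(1 − r²), captures this relationship; a sketch with hypothetical values:

```python
import math

def see(sd_criterion, validity_r):
    # Standard error of the estimate: margin of error in the
    # predicted criterion score
    return sd_criterion * math.sqrt(1 - validity_r ** 2)

# Higher test-criterion correlation -> smaller prediction error
print(round(see(10, 0.0), 2))  # 10.0 (no predictive power)
print(round(see(10, 0.6), 2))  # 8.0
print(round(see(10, 1.0), 2))  # 0.0 (perfect prediction)
```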
What is a typical validity coefficient for tests?
- Rarely greater than .60; predictive validity is still considered useful at lower values.
What is decision theory used for?
- The use of test scores as a decision-making tool.
What is an expectancy table?
- A visual tool that helps decision makers chart data and set cutoff points.
How does expectancy table relate to predictive validity?
- If we use tests to make decisions, then those tests must have strong predictive validity.
In decision theory, what is considered a hit?
- When a correct prediction occurred.
In decision theory, what is considered a miss?
- When we predicted incorrectly: either a false positive or a false negative.
In decision theory, what is considered a false negative?
- When someone was predicted to fail but actually succeeded.
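The hit/miss terminology can be sketched as a small function over hypothetical predicted-vs-actual pairs (True = predicted or actual success):

```python
def classify(predicted, actual):
    """Label one prediction as a hit, false positive, or false negative."""
    if predicted == actual:
        return "hit"
    return "false positive" if predicted else "false negative"

cases = [(True, True), (True, False), (False, True), (False, False)]
for pred, act in cases:
    print(pred, act, "->", classify(pred, act))
```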
What is construct validity?
- Tests designed to measure constructs (e.g., personality) must estimate the existence of an inferred, underlying characteristic based on a limited sample of behavior. It is useful for tests that don't have a well-defined domain of content.
What does Construct validity involve?
- Construct validity involves the theoretical meaning of test scores.
What are the ways we can demonstrate a test has construct validity?
- a. Expert opinion
- b. Test homogeneity: see if items intercorrelate with one another
- c. Developmental change: if the test measures something that changes with age
- d. Theory-consistent group differences: do people with different characteristics score differently, in a way we would expect?
- e. Theory-consistent intervention effects: do test scores change as expected based on an intervention?
- f. Classification accuracy: how well a test can classify people on the construct being measured
- i. Sensitivity: accurately identify those with the trait
- ii. Specificity: accurately identify those without the trait
- g. Intercorrelations among tests: looking for similarities or differences with scores on other tests (convergent validity / discriminant validity)
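Sensitivity and specificity can be computed from a 2×2 classification table; the counts below are hypothetical screening results:

```python
def sensitivity(true_pos, false_neg):
    # Proportion of people WITH the trait whom the test correctly identifies
    return true_pos / (true_pos + false_neg)

def specificity(true_neg, false_pos):
    # Proportion of people WITHOUT the trait whom the test correctly identifies
    return true_neg / (true_neg + false_pos)

# Hypothetical results: 40 true positives, 10 false negatives,
# 45 true negatives, 5 false positives
print(sensitivity(40, 10))  # 0.8
print(specificity(45, 5))   # 0.9
```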
What is convergent validity?
- Supported when tests measuring the same construct are found to correlate. For example, tests that measure depression should correlate with one another.
What is discriminant validity?
- Supported when tests measuring different or unrelated constructs are found NOT to correlate with one another.
What do convergent and discriminant validity tell us?
- Whether you're comparing apples to oranges or not.