How do we broadly define validity of a test?
measure of how well a particular measure fulfills the function for which it is being used
2nd characteristic of psychometric soundness
“Does the test measure what it is supposed to measure?”
Conceptually, how does validity differ from reliability? (e.g., internal vs. external criteria)
Reliability is a necessary but not a sufficient condition for validity, especially in establishing criterion-related validity
- In order to be predictable, the criterion itself must be reliable
- The test used to predict the criterion may itself be reliable, but if the criterion it is trying to predict is not (i.e., contains measurement error), the test will not accurately predict the criterion
- e.g., measuring “teacher success”
Reliability and validity are related because it is difficult to obtain evidence for validity unless a measure has reasonable reliability. On the other hand, a measure can have high reliability without supporting evidence for its validity. If a test is not reliable, then one cannot demonstrate that it has any meaning.
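A minimal sketch of why an unreliable criterion caps validity. It relies on the standard psychometric result that a validity coefficient cannot exceed the square root of the product of the test's reliability and the criterion's reliability. The function name and the reliability values are hypothetical, chosen only for illustration.

```python
import math


def max_validity(test_reliability: float, criterion_reliability: float) -> float:
    """Upper bound on the test-criterion correlation, given both reliabilities."""
    for value in (test_reliability, criterion_reliability):
        if not 0.0 <= value <= 1.0:
            raise ValueError("reliability coefficients must lie between 0 and 1")
    return math.sqrt(test_reliability * criterion_reliability)


# Hypothetical values: a reliable test (.81) predicting an unreliable
# criterion, such as ratings of "teacher success" (.49).
ceiling = max_validity(0.81, 0.49)
print(round(ceiling, 2))  # 0.63
```

Even with a highly reliable test, the measurement error in the criterion keeps the best attainable validity coefficient well below 1 in this made-up case.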
3 Types of Validity
Content validity: extent to which the test consists of a representative sample of the content universe and/or behavior of the domain being assessed
- Based almost entirely on rational analysis
- Careful and critical examination of test items in relation to the objective of the test
- No empirical level of analysis
- Only type of evidence besides face validity that is logical rather than statistical. We ask whether the items are a fair sample of the total potential content. Establishing content validity evidence for a test requires good logic, intuitive skills, and perseverance.
- Ex. Professor tells the class that the test will cover chapters 1, 2, 3; the test questions are actually taken from chapters 4, 5, 6
What is face validity? Why is it important to a test? What are some potential consequences if a test lacks face validity?
The appearance, at a surface level, that a test measures what it purports to measure.
Not recognized as a legitimate category of validity, although commonly referenced
- Highly important characteristic in many instances.
- It is commonly used in the testing literature
- Crucial to have a test that “looks like” it is valid. These appearances can help motivate test takers because they can see that the test is relevant.
A test has face validity if the items seem to be reasonably related to the perceived purpose of the test.
Criterion-related validity: extent to which a test corresponds with a particular criterion or standard against which the test is being compared
- Typically used when the objective is to predict subsequent or future performance on a particular criterion
- Empirically driven
A criterion is the standard against which the test is compared. For example, a test might be used to predict whether engaged couples will have successful marriages or get divorced. Marital success is the criterion, but it cannot be known at the time the couples take the premarital test. The reason for gathering criterion validity evidence is that the test or measure is to serve as a “stand-in” for the measure we are really interested in. In the marital example, the premarital test serves as a stand-in for estimating future marital happiness.
How do we go about demonstrating criterion-related validity (i.e., specific methods)? Be able to define and apply each.
Predictive validity: the test or measure predicts future performance in relation to a particular criterion, i.e., a measure of how well predictions agree with subsequent outcomes
Example - predict which job candidates are most likely to be successful based on information from a personality measure
Applications of predictive validity:
- SAT, success in college
- Car insurance companies, setting rates
- “Kindergarten Round-up”
Concurrent validity: assessment of the simultaneous relationship between a test and some criterion or other measure, e.g., a measure of reading achievement and school performance in the area of reading
Examination of variables being assessed in real time
Example - a learning disability test and school performance. The test and the criterion measures are taken at the same time. This helps to explain why the person is now having difficulty in school and gives diagnostic information to guide the development of individualized learning programs.
Concurrent evidence for validity applies when the test and the criterion can be measured at the same time.
What is a construct? Why is it so difficult to measure?
Any variable built by mental synthesis
- Unobservable, postulated
- Lacks true objectivity
- Evolves from theory or from one’s ideas
Establishing construct validity involves assembling evidence about what a test means. This is done by showing the relationship between a test and other tests and measures. Each time a relationship is demonstrated, one additional bit of meaning can be attached to the test. Over a series of studies, the meaning of the test gradually begins to take shape. It is an ongoing process that is similar to amassing support for a complex scientific theory.
- Has multiple explanations or definitions, e.g., critical thinking
Define Construct Validity
Degree to which test accurately describes individuals who manifest some abstract psychological trait or ability
Required when no single criterion or universe of content is accepted as entirely adequate to define quality being measured
What methods do we use to demonstrate construct validity? Be able to identify and apply
(1) Logical analysis
- Analysis of construct validity consists of both logical and empirical components
- e.g., for a test of “emotional stability,” the content of items to be included on the measure should be relevant to the assessment of emotional stability
(2) Empirical analysis: investigate certain predictions that should hold true
- High correlations between inventory scores and peer ratings
- Observed differences in scores between individuals from mental health settings vs. the “normal” population
- Some observed relationship between teachers’ ratings of citizenship/classroom behavior and inventory scores
Two types of evidence to support construct validity:
(1) Convergent - evidence that the test/measure correlates well with other tests believed to measure the same construct, i.e., two tests “converge” on the same construct; a demonstration of “sameness” with some established, external source.
(2) Discriminant (or divergent) - the test demonstrates low correlations with measures of unrelated constructs, i.e., evidence to support what the test does not measure; a demonstration of uniqueness. By showing evidence that the test does not tap into the same constructs as others, one can argue it is measuring a unique construct, e.g., “High Standards” and “Discrepancy” on the APS-R.
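The two kinds of evidence can be illustrated with a small sketch that correlates a new test with an established measure of the same construct (convergent) and with a measure of an unrelated construct (discriminant). All scores below are made up for illustration, and the variable and function names are hypothetical; a real study would use far larger samples.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when a variable has no variance")
    return cross / math.sqrt(ss_x * ss_y)


# Made-up scores for six hypothetical test takers.
new_test = [10, 12, 14, 16, 18, 20]
established_same_construct = [11, 13, 13, 17, 18, 21]
unrelated_construct = [5, 9, 4, 8, 5, 8]

convergent_r = pearson_r(new_test, established_same_construct)
discriminant_r = pearson_r(new_test, unrelated_construct)

print(f"convergent r   = {convergent_r:.2f}")    # expected to be high
print(f"discriminant r = {discriminant_r:.2f}")  # expected to be low
```

A high convergent correlation together with a low discriminant correlation is the pattern that supports construct validity; either one alone is weaker evidence.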
- No consensus rules for the size of validity coefficient needed to provide meaningful information
  - Coefficients of .60 or higher are rare
  - .30 - .40 considered to be acceptable
- When the coefficient is statistically significant, the test may be useful because its scores provide more information about one’s performance than would be known by chance alone
  - Even tests with low validity coefficients can be useful
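One common way to read a validity coefficient is to square it, which gives the proportion of variation in the criterion that is accounted for by the test. The sketch below applies this to the coefficient sizes mentioned above; the function name is hypothetical and the squared-coefficient reading is added here as an interpretive aid, not taken from the notes.

```python
def variance_explained(validity_coefficient: float) -> float:
    """Proportion of criterion variance accounted for by the test (r squared)."""
    if not -1.0 <= validity_coefficient <= 1.0:
        raise ValueError("a correlation must lie between -1 and 1")
    return validity_coefficient ** 2


for r in (0.30, 0.40, 0.60):
    print(f"r = {r:.2f} -> r^2 = {variance_explained(r):.2f}")
# r = 0.30 -> r^2 = 0.09
# r = 0.40 -> r^2 = 0.16
# r = 0.60 -> r^2 = 0.36
```

Under this reading, a coefficient of .40 accounts for about 16% of the variation in the criterion: more than chance alone provides, while leaving most of the variation unexplained.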