
Must-Have Components of a Comprehensive Exam
• Hearing Screening
• Case History
• Oral Mechanism Exam
• Communication Sample
• Specific Areas of Evaluation

Specific Areas of Evaluation
• speech sound production
• language
• fluency
• voice

Validity
• the extent to which a test measures what it claims to measure
• closely tied to the purpose of the test
*tests may be stronger in one type of validity than another, or have aspects of multiple types

Construct Validity
Does the test measure what it says it will measure?

Content Validity
• How well do the test materials measure what the test aims to measure?
• Is the test comprehensive enough?

Criterion Validity
the extent to which a score predicts future performance on the same or a similar task
ex) some schools use the GRE as a criterion validity measure of how well a student will do in graduate school

Three Types of Validity
• construct
• content
• criterion

Reliability
can the test be repeated under similar conditions with similar results

Three Types of Reliability
• Interrater
• Test-Retest
• Split-Half

Interrater Reliability
agreement between two different administrators when scoring client responses
ex) if one person administers the test while another observes and also marks client responses, the two should have the same results

Test-Retest Reliability
the consistency of test results over time
ex) if one person delivers a test and I deliver the same test a week later, we should get similar results

Split-Half Reliability
• split the test in half, score both halves; the results should be similar
• refers to the internal consistency of a test

Normal Distribution of Scores
• theory of the predictability of scores
• scores fall at predictable distances from the mean
• data are symmetrical
• bell curve; approximately 34%, 14%, and 2% of scores fall within the first, second, and third standard deviations on either side of the mean
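The 34 / 14 / 2 percent bands can be recovered from the standard normal cumulative distribution function (CDF); a minimal sketch using only the standard library:

```python
# Standard normal CDF built from math.erf; each band is the percentage
# of scores between adjacent standard deviations above the mean.
from math import erf, sqrt

def norm_cdf(z):
    """Proportion of scores at or below z standard deviations from the mean."""
    return 0.5 * (1 + erf(z / sqrt(2)))

band_0_to_1 = 100 * (norm_cdf(1) - norm_cdf(0))  # ≈ 34%
band_1_to_2 = 100 * (norm_cdf(2) - norm_cdf(1))  # ≈ 14%
band_2_to_3 = 100 * (norm_cdf(3) - norm_cdf(2))  # ≈ 2%
print(round(band_0_to_1), round(band_1_to_2), round(band_2_to_3))  # → 34 14 2
```

By symmetry, the same percentages apply below the mean.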

Raw Score
• the initial score on a test
• has no interpretive value on its own
• must be changed to more meaningful scores (converted/derived scores)

Grade/Age Equivalence
• the median raw score for an age/grade
• usually not used
• least useful, most dangerous
• leads to misunderstandings of child performance
• skills develop faster at early ages, so raw scores increase at a greater rate

Commonly Used Converted Scores
• Percentile Rank
• Standard Scores
• Z-scores
• T-scores
• Stanines

Percentile Rank
the percentage of subject scores that fall at or below a particular raw score
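That definition translates directly into code. A minimal sketch, with a made-up normative sample of raw scores:

```python
# Percentile rank = percentage of sample scores at or below a raw score.
norm_sample = [42, 47, 50, 50, 53, 55, 58, 60, 61, 65]  # made-up raw scores

def percentile_rank(raw, sample):
    """Percentage of sample scores at or below the given raw score."""
    at_or_below = sum(1 for s in sample if s <= raw)
    return 100 * at_or_below / len(sample)

print(percentile_rank(55, norm_sample))  # 6 of 10 scores are <= 55 → 60.0
```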

Standard Scores
• Mean = 100
• Standard Deviation = 15

Z-scores
• Mean = 0
• Standard Deviation = 1

T-scores
• Mean = 50
• Standard Deviation = 10
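These three scales all describe the same position in the distribution, just with different means and standard deviations: any score can be expressed as a z-score and then rescaled. A sketch with a made-up observed score:

```python
# Converting among score scales via the z-score.
def to_z(score, mean, sd):
    """Express a score as standard deviations from its scale's mean."""
    return (score - mean) / sd

def from_z(z, mean, sd):
    """Rescale a z-score onto a scale with the given mean and SD."""
    return mean + z * sd

standard_score = 115                 # one SD above the mean on the standard scale
z = to_z(standard_score, 100, 15)    # → 1.0
t = from_z(z, 50, 10)                # → 60.0 on the T-score scale
print(z, t)
```

So a standard score of 115, a z-score of 1, and a T-score of 60 all mark the same point: one standard deviation above the mean.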

Stanine
standard score bands that divide distribution into 9 parts

Standard Error of Measurement (SEM)
• a statistic used to increase precision in determining whether an observed score is close to the true score
• estimates the range in which the true score on a test falls
• no test is error-free or perfectly reliable

Calculate SEM based on...
• 1. an estimate of test reliability
• 2. the mean and standard deviation of scores (from the normative sample)
• 3. the test taker's observed score
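The first two ingredients combine in the standard formula SEM = SD × √(1 − reliability) (the usual textbook formula; the notes don't spell it out). A sketch with made-up numbers:

```python
# SEM from the normative-sample SD and the test's reliability coefficient.
from math import sqrt

def sem(sd, reliability):
    """Standard error of measurement: SD * sqrt(1 - reliability)."""
    return sd * sqrt(1 - reliability)

print(round(sem(15, 0.91), 1))  # SD = 15, reliability = 0.91 → 4.5
```

Note how a perfectly reliable test (reliability = 1) would give SEM = 0, matching the point above that no real test achieves this.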

Confidence Interval
• all the values within the range defined by the confidence limits of a sample statistic
• the calculation of SEM lies within a confidence interval
ex) observed score = 50; a true score of 53 falls within the 95% confidence interval
*we are confident that the subject's true score will fall in that range in 95 out of 100 test administrations
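A 95% confidence interval is commonly built as the observed score plus or minus 1.96 standard errors (1.96 is the z-value bounding the middle 95% of a normal distribution). The SEM value below is made up for illustration:

```python
# Confidence interval around an observed score, given the test's SEM.
def confidence_interval(observed, sem, z=1.96):
    """Return (lower, upper) limits: observed ± z * SEM."""
    return (observed - z * sem, observed + z * sem)

low, high = confidence_interval(observed=50, sem=1.5)
print(round(low, 2), round(high, 2))  # → 47.06 52.94
```

A wider interval (larger SEM or higher confidence level) means less precision in pinning down the true score.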

Criteria for Evaluating Formal Tests
• test administration and scoring
• reliability
• normative sample
• validity

Common Errors in Use of Norm-Referenced Tests
• 1. measuring treatment progress
• 2. analyzing individual test items for treatment target selection
• 3. forgetting they distort what they measure
• 4. ignoring the cultural makeup of the normative sample

Basal
• the point from which progress is recorded
• the level at and below which an individual passes all items on a test

Ceiling
• the highest item number where a certain number of items have been failed
• it is assumed that all items above this level are incorrect

