
Validity and Reliability

Think in terms of the purpose of tests and the consistency with which that purpose is fulfilled/met.


[Figure: four cases contrasted - neither valid nor reliable; fairly valid but not very reliable; reliable but not valid; valid and reliable]

Validity

- Depends on the PURPOSE
  - E.g. a ruler may be a valid measuring device for length, but isn't very valid for measuring volume
- Measuring what it is supposed to
- A matter of degree (how valid?)
- Specific to a particular purpose!
- Must be inferred from evidence; cannot be directly measured
- Evidence relating to learning outcomes:
  1. Content coverage (relevance?)
  2. Level & type of student engagement (cognitive, affective, psychomotor) appropriate?

Reliability

- Consistency in the type of result a test yields
  - Across time & space
  - Across participants
- Not a perfectly similar result, but very close to being similar
- When someone says you are a reliable person, what do they really mean?
- Are you a reliable person? What do you think?

- Forced-choice assessment forms are high in reliability, but weak in validity (true/false)
- Performance-based assessment forms are high in both validity and reliability (true/false)
- A test item is said to be unreliable when most students answered the item wrongly (true/false)
- When a test contains items that do not represent the content covered during instruction, it is known as an unreliable test (true/false)
- Test items that do not successfully measure the intended learning outcomes (objectives) are invalid items (true/false)
- Assessments that do not represent student learning well enough are definitely invalid and unreliable (true/false)
- A valid test can sometimes be unreliable (true/false)
- If a test is valid, it is reliable! (a by-product)

Question

In the context of what you understand about VALIDITY and RELIABILITY, how do you go about establishing/ensuring them in your own test papers?

Indicators of quality

- Validity
- Reliability
- Utility
- Fairness

Question: how are they all inter-related?

Types of validity measures

- Face validity
- Construct validity
- Content validity
- Criterion validity
  1. Predictive
  2. Concurrent
- Consequences validity

Face Validity

- Does it appear to measure what it is supposed to measure?
- Example: Let's say you are interested in measuring "propensity towards violence and aggression". By simply looking at the following items, state which ones qualify to measure the variable of interest:
  - Have you been arrested?
  - Have you been involved in physical fighting?
  - Do you get angry easily?
  - Do you sleep with your socks on?
  - Is it hard to control your anger?
  - Do you enjoy playing sports?

Construct Validity

- Does the test measure the human CHARACTERISTIC(s) it is supposed to?
- Examples of constructs or human characteristics:
  - Mathematical reasoning
  - Verbal reasoning
  - Musical ability
  - Spatial ability
  - Mechanical aptitude
  - Motivation
- Applicable to PBA/authentic assessment
- Each construct is broken down into its component parts
  - E.g. motivation can be broken down into:
    - Interest
    - Attention span
    - Hours spent
    - Assignments undertaken and submitted, etc.
  - All of these sub-constructs put together measure motivation

Content Validity

- How well do elements of the test relate to the content domain?
- How closely does the content of the questions in the test relate to the content of the curriculum?
- Directly relates to the instructional objectives and the fulfillment of the same!
- A major concern for achievement tests (where content is emphasized)
- Can you test students on things they have not been taught?

How to establish Content Validity?

- Instructional objectives (looking at your list)
- Table of Specification
- E.g. at the end of the chapter, the student will be able to do the following:
  1. Explain what stars are
  2. Discuss the types of stars and galaxies in our universe
  3. Categorize different constellations by looking at the stars
  4. Differentiate between our star, the sun, and all other stars

Table of Specification (An Example)
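The example table itself is not reproduced in this extract. As a purely illustrative layout (the cognitive levels and item counts below are hypothetical, not taken from the original), a table of specification for the four objectives above might look like this:

Content / Objective                    Remember   Understand   Apply   Total items
1. What stars are                          2           1         0          3
2. Types of stars and galaxies             1           2         1          4
3. Categorizing constellations             0           1         2          3
4. Our star (the sun) vs. other stars      1           1         1          3
Total                                      4           5         4         13

Each cell shows how many items target a given objective at a given cognitive level, so the test mirrors both the content taught and the emphasis placed on it.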

Criterion Validity

- The degree to which content on a test (the predictor) correlates with performance on relevant criterion measures (a concrete criterion in the "real" world)
- If they do correlate highly, it means that the test (predictor) is a valid one!
- E.g. if you taught skills relating to public speaking and had students do a test on it, the test can be validated by looking at how it relates to the actual performance (public speaking) of students inside or outside of the classroom

Two Types of Criterion Validity

- Concurrent criterion validity = how well performance on a test estimates current performance on some valued measure (criterion). E.g. a test of dictionary skills can estimate students' current skill in the actual use of a dictionary (observation).
- Predictive criterion validity = how well performance on a test predicts future performance on some valued measure (criterion). E.g. a reading readiness test might be used to predict students' achievement in reading.
- Both are only possible IF the predictors are VALID (see the sketch below)
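As a rough sketch of how such a validity coefficient could be estimated, the snippet below correlates test scores (the predictor) with ratings of actual performance (the criterion). The use of Python/SciPy and every value in it are hypothetical illustrations, not material from the original slides.

```python
# Hedged sketch: estimating a criterion validity coefficient as the correlation
# between test scores (predictor) and a real-world criterion measure.
from scipy.stats import pearsonr

# Hypothetical scores on a written public-speaking test (predictor)
test_scores = [62, 75, 58, 90, 70, 84, 66, 79, 73, 88]
# Hypothetical ratings of the same students' actual speeches (criterion)
speech_ratings = [3.1, 3.8, 2.9, 4.6, 3.5, 4.2, 3.0, 4.0, 3.6, 4.5]

r, p_value = pearsonr(test_scores, speech_ratings)
print(f"validity coefficient r = {r:.2f} (p = {p_value:.3f})")

# A high positive r suggests the test tracks the criterion well. If the criterion
# is measured at the same time, this is concurrent validity; if it is measured
# later (e.g. next semester's speeches), it is predictive validity.
```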

Consequences Validity

- The extent to which the assessment served its intended purpose
- Did the test improve performance? Motivation? Independent learning?
- Did it distort the focus of instruction?
- Did it encourage or discourage creativity? Exploration? Higher-order thinking?

Factors that can lower Validity

- Unclear directions
- Difficult reading vocabulary and sentence structure
- Ambiguity in statements
- Inadequate time limits
- Inappropriate level of difficulty
- Poorly constructed test items
- Test items inappropriate for the outcomes being measured
- Tests that are too short
- Improper arrangement of items (complex to easy?)
- Identifiable patterns of answers
- Teaching
- Administration and scoring
- Students
- Nature of the criterion

Reliability

- A measure of the consistency of test results from one administration of the test to the next
- Generalizability and consistency (interwoven concepts): if a test item is reliable, it can be correlated with other items to collectively measure a construct or content mastery
- A component of validity
- Length of assessment (a longer assessment is less distorted by chance factors)

Measuring Reliability

- Test-retest: give the same test twice to the same group, with any time interval between tests
- Equivalent forms (similar in content, difficulty level, arrangement, type of assessment, etc.): give two forms of the test to the same group in close succession
- Split-half: the test has two equivalent halves; give the test once, then score the two equivalent halves (odd items vs. even items)
- Cronbach's alpha (SPSS): inter-item consistency - one test, one administration
- Inter-rater consistency (subjective scoring): calculate the percent of exact agreement, or use Pearson's product-moment correlation and find the coefficient of determination (SPSS); see the sketch below
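The slides point to SPSS for these computations. As a hedged sketch of the same ideas in Python (NumPy/SciPy), using hypothetical item scores and rater marks, split-half reliability, Cronbach's alpha, and inter-rater correlation could be computed like this:

```python
# Hedged sketch: split-half reliability, Cronbach's alpha, and inter-rater
# consistency on hypothetical data (rows = students, columns = items).
import numpy as np
from scipy.stats import pearsonr

scores = np.array([
    [1, 1, 0, 1, 1, 0, 1, 1],
    [1, 0, 0, 1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 1, 0, 0, 1],
    [1, 1, 1, 0, 1, 1, 1, 0],
    [0, 0, 0, 1, 0, 0, 1, 0],
])

# Split-half: correlate odd-item totals with even-item totals, then step the
# half-test correlation up to full length with the Spearman-Brown formula.
odd_totals = scores[:, 0::2].sum(axis=1)
even_totals = scores[:, 1::2].sum(axis=1)
r_half, _ = pearsonr(odd_totals, even_totals)
split_half = 2 * r_half / (1 + r_half)

# Cronbach's alpha: inter-item consistency from a single administration.
k = scores.shape[1]
item_variances = scores.var(axis=0, ddof=1)
total_variance = scores.sum(axis=1).var(ddof=1)
alpha = (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

print(f"split-half reliability = {split_half:.2f}")
print(f"Cronbach's alpha       = {alpha:.2f}")

# Inter-rater consistency: correlate two raters' marks on the same essays and
# square the correlation to obtain the coefficient of determination (r^2).
rater_a = [12, 15, 9, 18, 14, 11]
rater_b = [13, 14, 10, 17, 15, 10]
r_raters, _ = pearsonr(rater_a, rater_b)
print(f"inter-rater r = {r_raters:.2f}, r^2 = {r_raters ** 2:.2f}")
```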

How to improve Reliability?

- Quality of items: concise statements, homogeneous wording (some sort of uniformity)
- Adequate sampling of the content domain; comprehensiveness of items
- Longer assessment: less distorted by chance factors (see the note after this list)
- Develop a scoring plan (esp. for subjective items, e.g. rubrics)
- Ensure VALIDITY
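One standard way to quantify the "longer assessment" point above (not stated in the original, but the same formula used to step up the split-half coefficient) is the Spearman-Brown prophecy formula: if a test with reliability r is lengthened to n times its size with comparable items, the predicted reliability is n*r / (1 + (n - 1)*r). For example, doubling a test whose reliability is 0.60 predicts 2(0.60) / (1 + 0.60) = 0.75.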
