Learning Outcomes
INTRODUCTION
There is a lot of debate about how to assess learning, and especially about how to
evaluate performance. Our objectives give us guidance on what to assess, because they
are written in terms of what the learners should be able to do. Based on these
objectives, it is very useful to identify the activities and skills which the learners will
carry out, the conditions under which they will perform these tasks and activities, the
possible results which might be obtained, and the standards by which their
performance will be measured.
4. Ask the learner to formulate and solve her own problem by selecting,
generating, and applying facts and principles (e.g., What do I see as the problem
here, and how can I reach a satisfying solution?).
5. Ask the learner to perform tasks that show mastery of the learning outcomes.
Assessment is a general term that includes the different ways that teachers use
to gather information in the classroom: information that helps teachers understand their
students, information used to plan and monitor classroom instruction, information used
to establish a worthwhile classroom culture, and information used for testing and
grading. The most common form of assessment is giving a test. Since a test is a form of
assessment, it also answers the question, “How does an individual student perform?”
A test is a formal and systematic instrument, usually a paper-and-pencil procedure,
designed to assess the quality, ability, skill, or knowledge of the students by giving a set
of questions in a uniform manner. A test is one of the many types of assessment
procedures used to gather information about the performance of students.
Hence, testing is one of the different methods used to measure the level of performance
or achievement of the learners. Testing also refers to the administration, scoring, and
interpretation of the procedures designed to get information about the extent of the
performance of the students. Oral questioning, observations, projects, performances,
and portfolios are other assessment processes that will be discussed later in detail.
After collecting the assessment data, the teacher will use it to make decisions
or judgments about the performance of the students in a certain instruction.
Evaluation refers to the process of judging quality, of what is good and what
is desirable. It is the comparison of data to a set of standards or learning criteria for the
purpose of judging worth or quality. For example, in judging the quality of an essay
written by students about their opinion of the first State of the Nation Address of
Pres. Benigno C. Aquino, evaluation occurs after the assessment data have been
collected and synthesized, because only at this time is the teacher in a position to make
a judgment about the performance of the students. Teachers evaluate how well, or to
what extent, the students attained the instructional outcomes.
Nature of Assessment
1. Maximum Performance
It is used to determine what individuals can do when performing at their
best. Examples of instruments using maximum performance are aptitude
tests and achievement tests.
2. Typical Performance
It is used to determine what individuals will do under natural conditions.
Examples of instruments using typical performance are attitude, interest,
and personality inventories, observational techniques and peer appraisal.
Format of Assessment
1. Fixed-choice Test
An assessment used to measure knowledge and skills effectively and
efficiently. A standard multiple-choice test is an example of an instrument
used in a fixed-choice test.
2. Complex-performance Assessment
An assessment procedure used to measure the performance of the learner
in context and on problems valued in their own right. Examples of
instruments used in complex-performance assessment are hands-on
laboratory experiments, projects, essays, and oral presentations.
“Teaching and learning are reciprocal processes that depend on and affect one
another” (Swearingen, 2002; Kellough, 1999). The assessment component of the
instructional process deals with the learning progress of the students and the
teacher’s effectiveness in imparting knowledge to the students.
Planning for assessment should start when the teacher plans his instruction,
that is, from the writing of learning outcomes up to the time the teacher assesses the
extent to which the learning outcomes were achieved. Teachers make decisions from
the beginning of instruction up to the end of instruction. There are four roles of
assessment in the instructional process. The first is placement assessment, a type of
assessment given at the beginning of instruction. The second and third are formative
assessment and diagnostic assessment, given during instruction, and the last is
summative assessment, given at the end of instruction.
1. Beginning of Instruction
Placement assessment, according to Gronlund, Linn, and Miller (2009), is
concerned with the learner's entry performance and typically focuses on these
questions: Does the learner possess the knowledge and skills needed to begin the
planned instruction? To what extent has the learner already developed the
understanding and skills that are the goals of the planned objectives? To what extent
do the student's interests, work habits, and personality indicate that one mode of
instruction might be better than another? The purpose of placement assessment
is to determine the prerequisite skills, the degree of mastery of the course objectives,
and the best mode of learning.
2. During Instruction
During the instructional process, the main concern of a classroom teacher
is to monitor the learning progress of the students. The teacher should assess
whether students achieved the intended learning outcomes set for a particular
lesson. If the students achieved the planned learning outcomes, the teacher
should provide feedback to reinforce learning. Recent research shows that
providing feedback to students is the most significant strategy for moving students
forward in their learning. Garrison and Ehringhaus (2007) stressed in their paper
“Formative and Summative Assessment in the Classroom” that feedback provides
students with an understanding of what they are doing well and links it to classroom
learning. If the outcomes are not achieved, the teacher will give group or individual
remediation. During this process we shall consider formative assessment and
diagnostic assessment.
Formative assessment is a type of assessment used to monitor the
learning progress of the students during instruction. The purposes of formative
assessment are the following: to provide immediate feedback to both student
and teacher regarding the successes and failures of learning; to identify the
learning errors that are in need of correction; to provide teachers with
information on how to modify instruction; and to improve learning and
instruction.
Diagnostic assessment is a type of assessment given at the beginning of
instruction or during instruction. It aims to identify the strengths and
weaknesses of the students regarding the topics to be discussed. The purposes of
diagnostic assessment are to determine the level of competence of the students;
to identify the students who already have knowledge about the lesson; to
determine the causes of learning problems that cannot be revealed by formative
assessment; and to formulate a plan for remedial action.
3. End of Instruction
Summative assessment is a type of assessment usually given at the end of
a course or unit. The purposes of summative assessment are to determine the
extent to which the instructional objectives have been met; to certify student
mastery of the intended learning outcomes, as well as to use it for assigning grades;
to provide information for judging the appropriateness of the instructional
objectives; and to determine the effectiveness of instruction.
1. Norm-referenced Interpretation
It is used to describe student performance according to relative position
in some known group. In this method of interpretation it is assumed that the
level of performance of students will not vary much from one class to another.
Example: ranks 5th in a classroom group of 40.
2. Criterion-referenced Interpretation
It is used to describe students’ performance according to a specified
domain of clearly defined learning tasks. This method of interpretation is used
when the teacher wants to determine how well the students have learned
specific knowledge or skills in a certain course or subject matter. Examples:
divides three-digit whole numbers correctly and accurately; multiplies binomial
terms correctly.
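The contrast between the two interpretations can be sketched in a few lines of code. This is an illustrative sketch only; the function names, the sample scores, and the 75% mastery threshold are invented for the example and do not come from the text.

```python
def norm_referenced_rank(class_scores, student_score):
    """Norm-referenced: describe a score by its relative position
    in a known group (rank 1 = highest score)."""
    ordered = sorted(class_scores, reverse=True)
    return ordered.index(student_score) + 1

def criterion_referenced_mastery(items_correct, items_total, threshold=0.75):
    """Criterion-referenced: compare performance against a fixed
    standard, regardless of how classmates performed."""
    return items_correct / items_total >= threshold

class_scores = [95, 88, 82, 70, 60]             # five students, 100-item test
print(norm_referenced_rank(class_scores, 82))   # → 3 (ranks 3rd in the group)
print(criterion_referenced_mastery(82, 100))    # → True (meets the 75% criterion)
```

The same raw score of 82 yields two different statements: a rank within the group (norm-referenced) and a judgment against a fixed standard (criterion-referenced).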
There are several ways of describing classroom tests and other assessment procedures.
The table below is a summary of the different types of assessment procedures,
adapted and modified from Gronlund, Linn, and Miller (2009).
Use in classroom instruction:

Placement: An assessment procedure used to determine the learner's prerequisite
skills, degree of mastery of the course objectives, and/or best modes of learning.

Formative: An assessment procedure used to determine the learner's learning
progress, provide feedback to reinforce learning, and correct learning errors.
Instruments: teacher-made tests, custom-made tests from textbook publishers,
observational techniques.

Diagnostic: An assessment procedure used to determine the causes of a learner's
persistent learning difficulties, such as intellectual, physical, emotional, and
environmental difficulties. Instruments: published diagnostic tests, teacher-made
diagnostic tests, observational techniques.

Summative: An assessment procedure used to determine end-of-course achievement
for assigning grades or certifying mastery of objectives. Instruments: teacher-made
survey tests, performance rating scales, product scales.

Methods of interpreting results:

Criterion-referenced: Used to describe student performance according to a specified
domain of clearly defined learning tasks. Example: multiplies three-digit whole
numbers correctly and accurately. Instruments: teacher-made tests, custom-made
tests from textbook publishers, observational techniques.

Norm-referenced: Used to describe a student's performance according to relative
position in some known group. Example: ranks 5th in a classroom group of 40.
Instruments: standardized aptitude and achievement tests, teacher-made survey
tests, interest inventories, adjustment inventories.
OTHER TYPES OF TEST
Other types of descriptive terms used to describe tests in contrasting types such
as the non-standardized versus standardized tests; objective versus subjective tests;
supply versus fixed-response test; individual versus group test; mastery versus survey
tests, speed versus power tests.
1. Objective test is a type of test in which two or more evaluators give an examinee
the same score.
2. Subjective test is a type of test in which the scores are influenced by the
judgment of the evaluators, meaning there is no single correct answer.
1. Supply test is a type of test that requires the examinees to supply an answer,
such as an essay test item or a completion or short-answer test item.
2. Fixed-response test is a type of test that requires the examinees to select an
answer from given options, such as a multiple-choice test, matching type test, or
true/false test.
1. Mastery test is a type of achievement test that measures the degree of mastery of
a limited set of learning outcomes, using criterion-referenced interpretation of the
results.
2. Survey test is a type of test that measures students’ general achievement over a
broad range of learning outcomes, using norm-referenced interpretation of the results.
1. Speed test is designed to measure the number of items an examinee can complete
within a given time limit. It contains test items of approximately uniform difficulty.
2. Power test is designed to measure the level of performance rather than speed of
response. It contains test items arranged in increasing order of difficulty.
MODES OF ASSESSMENT
Traditional Assessment
It is a type of assessment in which the students choose their answer from a given
list of choices. Examples of this type of assessment are the multiple-choice test,
standard true/false test, matching type test, and fill-in-the-blank test. In traditional
assessment, students are expected to recognize that there is only one correct or best
answer to the question asked.
Alternative Assessment
Performance-based Assessment
products. It also involves long-range projects, exhibits, and performances that are
linked to the curriculum. In this kind of assessment, the teacher is an important
collaborator in creating tasks, as well as in developing guidelines for scoring and
interpretation.
CHAPTER 2
Learning Outcomes
10. Write measurable and observable learning outcomes.
INTRODUCTION
Instructional goals and objectives play a very important role in both the instructional
process and the assessment process. They serve as a guide for both teaching and
learning, communicate the purpose of instruction to other stakeholders, and provide
guidelines for assessing the performance of the students. Assessing the learning
outcomes of the students is one of the most critical functions of teachers. A classroom
teacher should classify the objectives of the lesson because this is very important for the
selection of the teaching method and of the instructional materials. The
instructional material should be appropriate for the lesson so that the teacher can
motivate the students properly. The objectives can be classified according to the
learning outcomes of the lesson that will be discussed.
Goals and objectives are two different but related concepts. They are very
important, most especially when you want to achieve something for the students in
any classroom activity. Goals can never be accomplished without objectives; without
the objectives you need, you cannot accomplish what you want to achieve. Below are
the differences between goals and objectives.
Goals | Objectives
Broad | Narrow
General intentions | Precise
Intangible | Tangible
Abstract (less structured) | Concrete
Cannot be validated as is | Can be validated
Long-term aims: what you want to accomplish | Short-term aims: what you want to achieve
Hard to quantify or put in a timeline | Must be given a timeline to accomplish, to be more effective
Goals, General Educational Program Objectives, and Instructional Objectives
1. Audience
Who? Who are the specific people the objectives are aimed at?
2. Observable Behavior
What? What do you expect them to be able to do? This should be an overt,
observable behavior, even if the actual behavior is covert or mental in nature. If
you cannot see it, hear it, touch it, taste it, or smell it, you cannot be sure your
audience really learned it.
3. Special Conditions
The third component of an instructional objective is the special conditions
under which the behavior must be displayed by the students. How? Under what
circumstances will the learning occur? What will the student be given, or already
be expected to know, to accomplish the learning?
4. Stating Criterion Level
The fourth component of an instructional objective is the criterion level.
The criterion level of acceptable performance specifies how many of the items
the students must answer correctly for the teacher to attain his/her objectives.
How much? Must a specific set of criteria be met? Do you want total mastery
(100%), or do you want students to respond correctly 90% of the time, among
others? A common (and totally non-scientific) setting is 90% of the time.
Always remember that the criterion level need not be specified as a
percentage of the number of items correctly answered. It can also be stated as:
number of items correct; number of consecutive items correct; essential features
included (in the case of an essay question or paper); completion within a specified
time; or completion with a certain degree of accuracy.
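The alternative ways of stating a criterion level listed above can be made concrete with a short sketch. The function names and sample numbers here are invented for illustration and are not part of the text.

```python
def meets_percentage(items_correct, items_total, pct=0.90):
    """Criterion stated as a percentage of items answered correctly."""
    return items_correct / items_total >= pct

def longest_correct_run(item_results):
    """Criterion stated as a number of consecutive items correct.
    `item_results` is a list of booleans, one per item, in order."""
    best = run = 0
    for correct in item_results:
        run = run + 1 if correct else 0
        best = max(best, run)
    return best

def completed_in_time(seconds_used, seconds_allowed):
    """Criterion stated as completion within a specified time."""
    return seconds_used <= seconds_allowed

print(meets_percentage(27, 30))        # → True (27/30 = 90%)
print(longest_correct_run([True, True, False, True, True, True]))  # → 3
print(completed_in_time(540, 600))     # → True (9 of 10 minutes)
```

Each function expresses one way of phrasing the same idea: a fixed standard the student's performance is checked against, independent of how classmates perform.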
and discuss what was of interest; (3) understanding the concept of the normal
distribution. These examples specify only the activity or experience and a broad
educational outcome.
An instructional objective is a clear and concise statement of the skill or skills
that students are expected to perform or exhibit after discussing a certain lesson
or unit of instruction. The components of an instructional objective are the
observable behavior, the special conditions under which the behavior must be
exhibited, and the performance level considered sufficient to demonstrate mastery.
When a teacher develops instructional objectives, he must include an
action verb that specifies the learning outcome. Some educators and education
students often confuse learning outcomes with learning activities. A statement
that implies a certain product or end result of instruction is a learning outcome.
If an instructional objective is written as a means or process of attaining the end
product, then it is a learning activity; in that case, revise it so that the product of
the activity is stated.
Examples:
After developing learning outcomes, the next step the teacher must consider is to
identify whether each learning outcome is stated as a measurable and observable
behavior or as a non-measurable and non-observable behavior. If a learning outcome
is measurable, then it is observable; therefore, always state learning outcomes as
observable behaviors. Teachers should always develop instructional objectives that
are specific, measurable statements of the outcomes of instruction, indicating whether
instructional intents have been achieved (Kubiszyn, 2007). The following are examples
of verbs stated as observable learning outcomes and as unobservable learning
outcomes.
1. Recite the names of the characters in the story MISERY by Anton Chekhov.
2. Add two-digit numbers with 100% accuracy.
3. Circle the initial sounds of words.
4. Change the battery of an engine.
5. List the steps of hypothesis testing in order.
Below is a list of learning outcomes classified by learning objective. The
more specific outcomes should not be regarded as exhaustive; they are merely
suggestive of categories to be considered (Gronlund, Linn, and Miller, 2009).
1. Knowledge
1.1 Terminology
1.2 Specific facts
1.3 Concepts and principles
1.4 Methods and procedures
2. Understanding
2.1 Concepts and principles
2.2 Methods and procedures
2.3 Written materials, graph, maps, and numerical data
2.4 Problem situations
3. Application
3.1 Factual information
3.2 Concepts and principles
3.3 Methods and procedures
3.4 Problem-solving skills
4. Thinking skills
4.1 Critical thinking
4.2 Scientific thinking
5. General skills
5.1 Laboratory skills
5.2 Performance skills
5.3 Communication skills
5.4 Computational skills
5.5 Social skills
6. Attitudes
6.1 Social attitudes
6.2 Scientific attitudes
7. Interests
7.1 Personal interests
7.2 Educational interests
7.3 Vocational interests
8. Appreciations
8.1 Literature, art, and music
8.2 Social and scientific achievements
9. Adjustments
9.1 Social adjustments
9.2 Emotional adjustments
Bloom and other educators working on the cognitive domain established and
completed the hierarchy of educational objectives in 1956; it was called Bloom's
Taxonomy of the cognitive domain. The affective and psychomotor domains were
developed by other groups of educators.
1. The objectives should include all important outcomes of the course or subject
matter.
2. The objectives should be in harmony with the content standards of the state and
with the general goals of the school.
3. The objectives should be in harmony with the sound principles of learning.
4. The objectives should be realistic in terms of the abilities of the students, the
time, and the available facilities.
MATCHING TEST ITEMS TO INSTRUCTIONAL OBJECTIVES
When constructing test items, always remember that they should match the
instructional objectives. The learning outcomes and the learning conditions specified
in the test items should match the learning outcomes and conditions stated in the
objectives. If a test developer follows this basic rule, the test is ensured to have
content validity. Content validity is very important because your goal is to assess the
achievement of the students; hence, do not ask tricky questions. To measure the
achievement of the students, ask them to demonstrate mastery of the skills specified
in the conditions of the instructional objectives.
1. Objective: Discriminate fact from opinion in Pres. Benigno C. Aquino's first
State of the Nation Address (SONA).
Test item: From the State of the Nation Address (SONA) speech of President
Aquino, give five (5) examples of facts and five (5) examples of opinions.
Match? Yes

2. Objective: Recall the names and capitals of all the provinces of Regions I and II
in the Philippines.
Test item: List the names and capitals of two provinces in Region I and three
provinces in Region II.
Match? No

3. Objective: List the main events in chronological order, after reading the short
story A VENDETTA by Guy de Maupassant.
Test item: From the short story A VENDETTA by Guy de Maupassant, list the
main events in chronological order.
Match? Yes

4. Objective: Circle the nouns and pronouns from the given list of words.
Test item: Give five examples of pronouns and five examples of verbs.
Match? No

5. Objective: Make a freehand drawing of Region II using your map as a guide.
Test item: Without using your map, draw the map of Region II.
Match? No
Lorin Anderson, a former student of Bloom, together with Krathwohl, revised
Bloom's taxonomy of the cognitive domain in the mid-90s in order to fit the more
outcome-focused modern education objectives. There are two major changes: (1) the
names of the six categories were changed from nouns to active verbs, and (2) the
order of the last two (highest) levels was rearranged, as shown in the figure below.
This new taxonomy reflects a more active form of thinking and is perhaps more
accurate.
1956 | 2001
Evaluation | Creating
Synthesis | Evaluating
Analysis | Analyzing
Application | Applying
Comprehension | Understanding
Knowledge | Remembering
(noun form) | (verb form)
*Adapted with written permission from Leslie Owen Wilson's Curriculum Pages,
Beyond Bloom: A New Version of the Cognitive Taxonomy.
3. Application (1956): the ability to use learned material, or to implement material
in new and concrete situations. Examples of verbs that relate to this function:
apply, relate, develop, translate, use, operate, organize, employ, restructure,
interpret, demonstrate, illustrate, practice, calculate, show, exhibit, dramatize.

3. Applying (2001): Objectives written at the applying level require the learner to
implement (use) the information: carrying out or using a procedure through
executing or implementing. Applying relates and refers to situations where
learned material is used through products like models, presentations, interviews,
or simulations. Sample verbs appropriate for objectives written at the applying
level: apply, relate, develop, translate, use, operate, organize, employ,
restructure, interpret, demonstrate, illustrate, practice, calculate, show, exhibit,
dramatize.

4. Analysis (1956): the ability to break down or distinguish the parts of the
material into their components so that their organizational structure may be
better understood. Examples of verbs that relate to this function: analyze,
compare, probe, inquire, examine, contrast, categorize, differentiate, investigate,
detect, survey, classify, deduce, experiment, scrutinize, discover, inspect, dissect,
discriminate, separate.

4. Analyzing (2001): Objectives written at the analyzing level require the learner
to break the information into component parts and describe the relationships:
breaking material or concepts into parts, determining how the parts relate or
interrelate to one another or to an overall structure or purpose. Mental actions
included in this function are differentiating, organizing, and attributing, as well
as being able to distinguish between the components or parts. When one is
analyzing, he/she can illustrate this mental function by creating spreadsheets,
surveys, charts, diagrams, or graphic representations. Sample verbs appropriate
for objectives written at the analyzing level: analyze, compare, probe, inquire,
examine, contrast, categorize, differentiate, investigate, detect, survey, classify,
deduce, experiment, scrutinize, discover, inspect, dissect, discriminate, separate.
Sample verbs appropriate for objectives written at the evaluating level: appraise,
choose, compare, conclude, decide, defend, evaluate, give your opinion, judge,
justify, prioritize, rank, rate, select, support, value.
6. Evaluation (1956): the ability to judge, check, and even critique the value of
material for a given purpose. Examples of verbs that relate to this function:
judge, assess, compare, evaluate, conclude, measure, deduce, argue, decide,
choose, rate, select, estimate, validate, consider, appraise, value, criticize, infer.

6. Creating (2001): Objectives written at the creating level require the student to
generate new ideas and ways of viewing things: putting elements together to
form a coherent or functional whole; reorganizing elements into a new pattern
or structure through generating, planning, or producing. Creating requires users
to put parts together in new ways or to synthesize parts into something new and
a different form or product. This process is the most difficult mental function in
the new taxonomy. This level used to be No. 5 in Bloom's taxonomy, where it
was known as synthesis. Sample verbs appropriate for objectives written at the
creating level: change, combine, compose, construct, create, invent, design,
formulate, generate, produce, revise, reconstruct, rearrange, visualize, write,
plan.
Cognitive Domain
Instructional Objective:
At the end of the topic, the students should be able to identify the
different steps in testing a hypothesis.
Test Item:
What are the different steps in testing a hypothesis?
2. Comprehension involves students' ability to read course content, interpret
important information, and put others' ideas into their own words. Test questions
should focus on the use of facts, rules, and principles.
Instructional objective:
At the end of the lesson, the students should be able to summarize the
main events of the story INVICTUS in grammatically correct English.
Test Item:
Summarize the main events in the story INVICTUS in grammatically
correct English.
3. Application involves students taking new concepts and applying them to new
situations. Test questions focus on applying facts and principles.
Instructional objective:
At the end of the lesson the students should be able to write a short poem
in iambic pentameter.
Test Item:
Write a short poem in iambic pentameter.
4. Analysis involves students' ability to take new information, break it down into
parts, and differentiate between them. Test questions focus on the separation of
a whole into its component parts.
Instructional objective:
At the end of the lesson, the students should be able to describe the
statistical tools needed in testing the difference between two means.
Test Item:
What kind of statistical test would you run to see if there is a significant
difference between the pre-test and the post-test?
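For readers who want to see what answering this item involves, a dependent (paired) t-test is the usual choice when each student has both a pre-test and a post-test score. The sketch below, with invented scores, computes only the t statistic; a complete analysis would also compare it against a critical value or p-value. None of the numbers here appear in the original text.

```python
from statistics import mean, stdev

def paired_t_statistic(pre, post):
    """t = mean(d) / (stdev(d) / sqrt(n)), where d[i] = post[i] - pre[i].
    Uses the sample standard deviation of the paired differences."""
    diffs = [b - a for a, b in zip(pre, post)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / n ** 0.5)

pre = [60, 55, 70, 65, 58]    # hypothetical pre-test scores
post = [72, 60, 78, 70, 66]   # the same students after instruction
print(round(paired_t_statistic(pre, post), 2))  # → 5.9
```

With n - 1 = 4 degrees of freedom, a t statistic this large would lead the teacher to conclude that the post-test scores differ significantly from the pre-test scores.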
Instructional objective:
At the end of the lesson, the students should be able to compare and
contrast the two types of error.
Test Item:
What is the difference between a Type I and a Type II error?
Instructional objective:
At the end of the lesson, the students should be able to draw a conclusion
about the relationship between two means.
Test Item:
What should the researcher conclude about the relationship in the
population?
Affective Domain
4. Organization: ... comparing, relating, and synthesizing values. The learners are
willing to be an advocate.
Examples: ... organization, family, and self.
Sample verbs appropriate for objectives written at the organizing level: adheres,
alters, arranges, combines, compares, completes, defends, explains, formulates,
generalizes, identifies, integrates, modifies, orders, organizes, prepares, relates,
synthesizes.

5. Characterization by value or value set: The learner incorporates ideas completely
into practice, recognized by the use of them; the value system controls the
learner's behavior. Instructional objectives are concerned with the student's
general patterns of adjustment, such as personal, social, and emotional
adjustment. The learners are willing to change their behavior, lifestyle, or way
of life.
Examples: Shows self-reliance when working independently. Values people for
what they are, not how they look.
Sample verbs appropriate for objectives written at the characterizing level: acts,
discriminates, displays, influences, listens, modifies, performs, practices,
proposes, qualifies, questions, revises, serves, solves, verifies.
Psychomotor Domain
2. Set: ... mental, physical, and emotional sets. These three sets are dispositions
that predetermine a person's response to different situations (sometimes called
mindsets).
Examples: Recognizes one's abilities and limitations. Shows desire to learn a new
process (motivation). (Note: this subdivision of the psychomotor domain is closely
related to the "responding to phenomena" subdivision of the affective domain.)
Sample verbs appropriate for objectives written at the set level: begins, displays,
explains, moves, proceeds, reacts, shows, states, volunteers.
3. Guided Response: The early stages in learning a complex skill, which include
imitation and trial and error. Adequacy of performance is achieved by practicing.
Examples: Performs a mathematical equation as demonstrated. Follows
instructions to build a model.
Sample verbs appropriate for objectives written at the guided response level:
copies, traces, follows, reacts, reproduces, responds.
4. Mechanism: The intermediate stage in learning a complex skill. Learned
responses have become habitual and the movements can be performed with
some confidence and proficiency.
Examples: Uses a personal computer. Repairs a leaking faucet. Drives a car.
Sample verbs appropriate for objectives written at the mechanism level:
assembles, calibrates, constructs, dismantles, displays, fastens, fixes, grinds,
heats, manipulates, measures, mends, mixes, organizes, sketches.
5. Complex Overt Response: The skillful performance of motor acts that involve
complex movement patterns. Proficiency is indicated by a quick, accurate, and
highly coordinated performance requiring a minimum of energy. This category
includes performing without hesitation and automatic performance. For example,
players often utter sounds of satisfaction or expletives as soon as they hit a
tennis ball or throw a football, because they can tell by the feel of the act what
the result will produce.
Examples: Operates a computer quickly and accurately. Displays competence
while playing the piano.
Sample verbs appropriate for objectives written at the complex overt response
level: assembles, builds, calibrates, constructs, dismantles, displays, fastens,
fixes, grinds, heats, manipulates, measures, mends, mixes, organizes, sketches.
(Note: the key words are the same as for mechanism, but with adverbs or
adjectives indicating that the performance is quicker, better, more accurate, etc.)
6. Adaptation: Skills are well developed and the individual can modify movement
patterns to fit special requirements.
Examples: Responds effectively to unexpected experiences. Modifies instruction
to meet the needs of the learners.
Sample verbs appropriate for objectives written at the adaptation level: adapts,
alters, changes, rearranges, reorganizes, revises, varies.
7. Origination: Creating new movement patterns to fit a particular situation or
specific problem. Learning outcomes emphasize creativity based upon highly
developed skills.
Examples: Creates a new gymnastic routine.
Sample verbs appropriate for objectives written at the origination level: arranges,
builds, combines, composes, constructs, creates, designs, initiates, makes,
originates.
Aside from Simpson's (1972) discussion of the psychomotor domain, there are
two other popular versions commonly used by educators. The works of Dave (1975),
Harrow (1972), and Kubiszyn and Borich (2007) are discussed below.
Harrow (1972), Kubiszyn and Borich (2007)
CHAPTER 3
Learning Outcomes
7. Discuss the different formats of assessment tools;
8. Determine the advantages and disadvantages of the different test item formats;
9. Identify the different rules in constructing multiple-choice, matching type, completion, and true or false tests; and
10. Construct multiple-choice, matching type, completion, and true or false tests.
INTRODUCTION
Ebel and Frisbie (1999), as cited by Garcia (2008), listed five basic principles that should guide teachers in assessing the learning progress of students and in developing their own assessment tools. These principles are discussed below.
Assessing the performance of every student is a very critical task for the classroom teacher. It is very important that a classroom teacher prepare the assessment tool appropriately. Teacher-made tests are developed by a classroom teacher to assess the learning progress of the students within the classroom. They have strengths and weaknesses. The strength of a teacher-made test lies in its applicability and relevance in the setting where it is utilized. Its weaknesses are the limited time and resources available for the teacher to develop the test, and some of the technicalities involved in the development of the assessment tools.
Test constructors believe that every assessment tool should possess good qualities. Most of the literature considers validity and reliability the most common technical concepts in assessment. Any type of assessment, whether traditional or authentic, should be carefully developed so that it serves its intended purpose and yields results consistent with the type of assessment that will be utilized.
In this section, we shall discuss the different terms such as clarity of the learning target, appropriateness of an assessment tool, fairness, objectivity, comprehensiveness, and ease of scoring and administering. Once these qualities of a good test are taken into consideration in developing an assessment tool, the teacher will have accurate information about the performance of each individual pupil or student.
When a teacher plans classroom instruction, the learning target should be clearly stated and must focus on student learning objectives rather than teacher activity. The learning outcomes must be Specific, Measurable, Attainable, Realistic and Time-bound (SMART) as discussed in the previous chapter. The performance task of the students should also be clearly presented so that they can accurately demonstrate what they are supposed to do and how the final product should be done. The teacher should also discuss clearly with the students the evaluation procedures, the criteria to be used, and the skills to be assessed in the task.
The type of test used should always match the instructional objectives or
learning outcomes of the subject matter posed during the delivery of the instruction.
Teachers should be skilled in choosing and developing assessment methods appropriate
for instructional decisions. The kinds of assessment tools commonly used to assess the
learning progress of the students will be discussed in detail in this chapter and in the
succeeding chapter.
1. Objective Test. It is a type of test that requires students to select the correct
response from several alternatives or to supply a word or short phrase to answer
a question or complete a statement. It includes true-false, matching type, and
multiple-choice questions. The word objective refers to the scoring; it indicates
that there is only one correct answer.
2. Subjective Test. It is a type of test that permits the student to organize and
present an original answer. It includes either short answer questions or long
general questions. This type of test has no specific answer. Hence, it is usually
scored on an opinion basis, although there will be certain facts and
understanding expected in the answer.
3. Performance Assessment. Mueller (2010) defines it as an assessment in which students are asked to perform real-world tasks that demonstrate meaningful application of essential knowledge and skills. It can appropriately measure learning objectives which focus on the ability of the students to demonstrate skills or knowledge in real-life situations.
4. Portfolio Assessment. It is an assessment that is based on the systematic,
longitudinal collection of student work created in response to specific known
instructional objectives and evaluated in relation to the same criteria (Ferenz, K.,
2001). Portfolio is a purposeful collection of student’s work that exhibits that
student’s efforts, progress and achievements in one or more areas over a period
of time. It measures the growth and development of students.
5. Oral Questioning. This method collects assessment data by asking oral questions. It is the most commonly used of all forms of assessment in class, assuming that the learner hears and shares a common language with the teacher during instruction. The ability of the students to communicate orally is very relevant to this type of assessment. This is also a form of formative assessment.
6. Observation Technique. Another method of collecting assessment data is through observation. The teacher observes how students carry out certain activities, observing either the process or the product. There are two types of observation techniques: formal and informal. Formal observations are planned in advance, as when the teacher assesses an oral report or presentation in class, while informal observation is done spontaneously during instruction, such as observing the working behavior of students while performing a laboratory experiment in a biology class. The behavior of students during instruction is systematically monitored, described, classified, and analyzed.
7. Self-report. The response of the students may be used to evaluate both
performance and attitude. Assessment tools could include sentence completion,
Likert scales, checklists, or holistic scales.
scores when the test is administered twice to the same group of students, with a
reliability index of 0.61 or above.
3. Fairness means the test item should not have any biases. It should not be
offensive to any examinee subgroup. A test can only be good if it is fair to all the
examinees.
4. Objectivity refers to the agreement of two or more raters or test administrators concerning the score of a student. If two raters who assess the same student on the same test cannot agree on the score, the test lacks objectivity and neither of the scores from the judges is valid. Lack of objectivity reduces test validity in the same way that lack of reliability influences validity.
5. Scorability means that the test should be easy to score: directions for scoring should be clearly stated in the instructions. Provide the students with an answer sheet, and provide an answer key for the one who will check the test.
6. Adequacy means that the test should contain a wide range of sampling of items
to determine the educational outcomes or abilities so that the resulting scores
are representative of the total performance in the areas measured.
7. Administrability means that the test should be administered uniformly to all students so that the scores obtained will not vary due to factors other than differences in the students' knowledge and skills. There should be clear instructions for the students, the proctors, and even the one who will check the test or the test scorer.
8. Practicality and Efficiency refers to the teacher's familiarity with the methods used, the time required for the assessment, the complexity of the administration, the ease of scoring, the ease of interpretation of the test results, and the cost of the materials used, which must be kept as low as possible.
Let us discuss in detail the different steps in developing good assessment tools. Following these steps is very important so that the test items developed will measure the different learning outcomes appropriately. In this way, the teacher can measure what is supposed to be measured. Consider the discussion of each step below.
Examine the Instructional Objectives of the Topic Previously Discussed
The first step in developing an achievement test is to examine and go back to the
instructional objectives so that you can match them with the test items to be constructed.
A Table of Specification (TOS) is a chart or table that details the content and cognitive level assessed on a test, as well as the types and emphases of test items (Gareis and Grant, 2008). The table of specification is very important in establishing the validity and reliability of the test items. Validity of the test means that the assessment can be used to draw appropriate conclusions because the assessment is guarded against systematic error.
The table of specification provides the test constructor a way to ensure that the assessment is based on the intended learning outcomes. It is also a way of ensuring that the number of questions on the test is adequate to yield dependable results that are not likely caused by chance. It is likewise a useful guide in constructing a test and in determining the type of test items that you need to construct.
Below are the suggested steps in preparing a table of specification used by the
test constructor. Consider these steps in making a two-way chart table of specification.
See also format 1 of the Table of Specification for the other steps.
If properly prepared, a table of specification will help you limit the coverage of the test and identify the necessary skills or cognitive level required to answer each test item correctly.
The first format of a table of specification is composed of the specific objectives, the
cognitive level, type of test used, the item number and the total points needed in each
item. Below is the template of the said format.
Cognitive Level pertains to the intellectual skill or ability to correctly answer a test
item using Bloom’s taxonomy of educational objectives. We sometimes refer to this as
the cognitive demand of a test item. Thus, entries in this column could be "knowledge,
comprehension, application, analysis, synthesis, and evaluation."
Type of Test Item identifies the type or kind of test a test item belongs to. Examples
of entries in this column could be "multiple-choice, true or false, or even essay."
Item Number simply identifies the question number as it appears in the test.
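As an illustration, the format-1 columns described above can be held in a simple data structure. This is only a hypothetical sketch: the objectives, item numbers, and point values below are invented for illustration and are not taken from the book.

```python
# A hypothetical sketch of Table of Specification format 1.
# Each row carries the columns described above: specific objective,
# cognitive level, type of test item, item numbers, and points.
tos = [
    {"objective": "Solve for the mean of ungrouped data",   # invented example
     "cognitive_level": "application",
     "item_type": "multiple-choice",
     "item_numbers": [1, 2, 3],
     "points": 3},
    {"objective": "Define measures of central tendency",    # invented example
     "cognitive_level": "knowledge",
     "item_type": "true or false",
     "item_numbers": [4, 5],
     "points": 2},
]

# The points column should sum to the total points of the test.
total_points = sum(row["points"] for row in tos)
print(total_points)  # 5
```

Holding the chart as data like this makes it easy to check that the item numbers and points add up before the test is assembled.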
Number of items = (3 x 10) / 10 = 30 / 10 = 3

Number of items for the topic synthetic division = 3
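The computation above follows the usual proportional-allocation rule: the number of class sessions spent on a topic, multiplied by the desired total number of items, divided by the total number of sessions. A minimal sketch of that rule (the function name is ours, not the book's):

```python
def items_for_topic(topic_sessions, total_sessions, total_items):
    """Proportional allocation of test items to a topic:
    (sessions on topic x total items) / total sessions,
    rounded to the nearest whole item."""
    return round(topic_sessions * total_items / total_sessions)

# Synthetic division: 3 of 10 class sessions, 10-item test -> 3 items
print(items_for_topic(3, 10, 10))  # 3
```

The same call works for any topic row of the table of specification, so the allocations across topics always sum to approximately the planned test length.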
Note:
The number of items for each level will depend on the skills the teacher wants to develop in his students. At the tertiary level, the teacher must develop more higher-order thinking skills (HOTS) questions.
For the elementary and secondary levels, the guidelines in constructing tests stipulated in DepEd Order 33, Series 2004 must be followed: factual information 60%, moderately difficult or more advanced questions 30%, and higher-order thinking skills 10% for distinguishing honor students.
In this section, we shall discuss the different formats of objective test items, the steps in developing objective and subjective tests, and their advantages and limitations. The different guidelines for constructing the various types of objective and subjective test items will also be discussed in this section.
Kubiszyn and Borich (2007) suggested some general guidelines for writing test items to help classroom teachers improve the quality of the test items they write.
1. Begin writing items far enough in advance so that you will have time to revise them.
2. Match items to intended outcomes at an appropriate level of difficulty to provide a valid measure of the instructional objectives. Limit the question to the skill being assessed.
3. Be sure each item deals with an important aspect of the content area and not
with trivia.
4. Be sure the problem posed is clear and unambiguous.
5. Be sure each item is independent of all other items. The answer to one item should not be required as a condition for answering the next item. A hint to one answer should not be embedded in another item.
6. Be sure the item has one correct or best answer on which experts would agree.
7. Prevent unintended clues to the answer in the statement or question. Grammatical inconsistencies such as a or an give clues to the correct answer to those students who are not well prepared for the test.
8. Avoid replication of the textbook in writing test items; do not quote directly from
the textual materials. You are usually not interested in how well students
memorize the text. Besides, taken out of context, direct quotes from the text are
often ambiguous.
9. Avoid trick or catch questions in an achievement test. Do not waste time testing
how well the students can interpret your intentions.
10. Try to write items that require higher-order thinking skills.
Consider the following average time needed to answer each type of test item. The length of the testing period and the type of item used are also factors to consider in determining the number of items to include in an achievement test. These guidelines are very important in designing an appropriate assessment for college students.
Assessment Format                                     Average Time to Answer
True-false                                            30 seconds
Multiple-choice                                       60 seconds
Multiple-choice (higher-level learning objectives)    90 seconds
Short Answer                                          120 seconds
Completion                                            60 seconds
Matching                                              30 seconds per response
Short Essay                                           10-15 minutes
Extended Essay                                        30 minutes
Visual Image                                          30 seconds
The number of items included in a given assessment will also depend on the
length of the class period and the type of items utilized. The following guidelines will
assist you in determining an assessment appropriate for college-level students aside
from the previous formula discussed.
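One practical use of the average-time table above is to check whether a planned test fits the class period before finalizing it. The sketch below assumes the per-item times listed in the table; the planned item counts are hypothetical.

```python
# Average answering time per item format, in seconds (from the table above).
SECONDS_PER_ITEM = {
    "true-false": 30,
    "multiple-choice": 60,
    "short answer": 120,
    "matching (per response)": 30,
}

# Hypothetical plan for a one-hour class period.
planned = {"true-false": 10, "multiple-choice": 20, "matching (per response)": 10}

total_seconds = sum(SECONDS_PER_ITEM[fmt] * n for fmt, n in planned.items())
print(total_seconds / 60)  # 30.0 minutes of answering time
```

If the estimate approaches the full class period, trim the item counts, since distributing papers and giving directions also consume time.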
Yes No
The item is appropriate to measure a learning objective.
The item format is the most effective means of measuring the desired knowledge.
The item is clearly worded and can be easily understood by the target student population.
The items of the same format are grouped together.
Various item types are included in the assessment.
The students have enough time to answer all test items.
The test instructions are specific and clear.
The number of questions targeting each objective matches the weight of importance of that objective.
The scoring guidelines are discussed clearly and available to students.
After constructing the test items following the different principles of test construction, the next step is to assemble the test. There are two steps in assembling the test: (1) packaging the test; and (2) reproducing the test.
a. Group all test items with similar format. All items in similar format must be
grouped so that the students will not be confused.
b. Arrange test items from easy to difficult. The test items must be arranged
from easy to difficult so that students will answer the first few items correctly
and build confidence at the start of the test.
c. Space the test items for easy reading.
d. Keep items and options on the same page.
e. Place the illustrations near the description.
f. Check the answer key.
g. Decide where to record the answer.
Write Directions
Check the test directions for each item format to be sure they are clear for the students to understand. The test directions should state the numbers of items to which they apply; how to record the answers; the basis on which to select answers; and the criteria for scoring or the scoring system.
Before reproducing the test, it is very important to proofread the test items first for typographical and grammatical errors and make the necessary corrections, if any. If possible, let others examine the test to validate its content. This can save time during the examination and avoid distraction of the students.
Be sure to check your answer key so that the correct answers follow a fairly random sequence. Avoid patterns such as TFTFTF or TTFFF for a true or false test, and ABCDABCD patterns for a multiple-choice test. The number of true answers should be approximately equal to the number of false answers, and the correct answers should be distributed evenly among the multiple-choice options.
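One simple way to produce such a key is to build a balanced set of true and false answers and then shuffle it. This is only an illustrative sketch, not a procedure from the book:

```python
import random

def balanced_tf_key(n_items, seed=None):
    """Build a true/false answer key with (near-)equal counts of T and F,
    then shuffle it so no obvious pattern (TFTF..., TTFFF...) appears."""
    key = ["T"] * (n_items // 2) + ["F"] * (n_items - n_items // 2)
    random.Random(seed).shuffle(key)
    return key

key = balanced_tf_key(10, seed=1)
print(key.count("T"), key.count("F"))  # 5 5
```

The same idea extends to multiple-choice keys: build an even mix of A, B, C, and D, shuffle, and then spot-check that no long runs of one letter remain.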
Analyzing and improving the test should be done after checking, scoring and
recording the test. The details of this part will be discussed in the succeeding chapter.
There are two general types of test items to use in a paper-and-pencil achievement test: selection-type items and supply-type items.

Selection-type items require students to select the correct response from several options. These are also known as objective test items. Selection-type items can be classified as multiple-choice, matching type, true or false, or interpretative exercises. An objective test item requires only one correct answer per item.
Kinds of Objective Type Test
In this section, we shall discuss the different formats of objective test items: the general guidelines in constructing a multiple-choice test; guidelines in constructing the stem, options, and distracters; the advantages and disadvantages of multiple-choice tests; guidelines in constructing a matching type test and its advantages and disadvantages; guidelines in constructing true or false items and interpretative exercises; and the advantages and disadvantages of true or false items and interpretative exercises.
a. Multiple-choice Test
A multiple-choice item consists of three parts: the stem, the keyed option, and the incorrect options or alternatives. The stem presents the problem or question, usually expressed in completion form or question form. The keyed option is the correct answer. The incorrect options or alternatives are also called distracters or foils.
Guidelines in Constructing the Stem
1. Knowledge Level
The most stable measure of central tendency is the _______________.
A. Mean
B. Mean and median
C. Median
D. Mode
This kind of question is a knowledge level type because the students are required
only to recall the properties of the mean. The correct answer is option A.
2. Comprehension Level
Which of the following statements describes a normal distribution?
A. The mean is greater than the median.
B. The mean, median, and mode are equal.
C. The scores are more concentrated at one part of the distribution.
D. Most of the scores are high.
This kind of question is a comprehension level type because the students are
required to describe the scores that are normally distributed. The correct answer
is option B.
3. Application Level
What is the standard deviation of the following scores of 10 students in a
mathematics quiz: 10, 13, 16, 16, 17, 19, 20, 20, 20, 25?
A. 3.90
B. 3.95
C. 4.20
D. 4.25
This kind of question is an application level type because the students are asked to
apply the formula and solve for the standard deviation. The correct answer is option C.
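As a check on the keyed answer, option C (4.20) corresponds to the sample standard deviation, i.e. the formula that divides by n - 1, which Python's statistics module computes directly:

```python
import statistics

# The ten quiz scores from the item above.
scores = [10, 13, 16, 16, 17, 19, 20, 20, 20, 25]

# statistics.stdev uses the sample (n - 1) formula, matching the keyed answer.
sd = statistics.stdev(scores)
print(round(sd, 2))  # 4.2
```

Note that the population formula (statistics.pstdev, dividing by n) would give about 3.98 instead, so the item implicitly assumes the sample formula.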
4. Analysis Level
Which statistical test is used to test the mean difference between pre-test and
post-test scores?
A. Analysis of variance
B. t-test
C. Correlation
D. Regression analysis
This kind of question is an example of analysis level type because students are
required to distinguish which type of test is used. The correct answer is option B.
5. Ineffective in assessing the problem solving skills of the students.
6. Not applicable when assessing the student’s ability to organize and express ideas.
b. Matching Type
A matching type item consists of two columns. Column A contains the descriptions and is placed at the left side, while Column B contains the options and is placed at the right side. The examinees are asked to match each option with the description it is associated with.
Direction: Match the function of each part of the computer in Column A with its name
in Column B. Write the letter of your choice before the number.
Column A Column B
_____ 2. Considered the brain of the computer B. Hard Drive
_____ 10. Permits a computer to store large amounts of data J. Read Only Memory
K. Software
Another format of an objective type of test is the true or false test. In this type of test, the examinees determine whether the statement presented is true or false. A true or false item is an example of a "forced-choice test" because there are only two possible choices. The students are required to choose true or false in recognition of a correct or incorrect statement.
A true or false test is appropriate for assessing behavioral objectives such as "identify," "select," or "recognize." It is also suited to assess the knowledge and comprehension levels in the cognitive domain. This type of test is appropriate when there are only two plausible alternatives or distracters.
Direction: Write your answer before the number in each item. Write T if the
statement is true and F if the statement is false.
2. It is easier to prepare compared to multiple-choice and matching type tests.
3. It is easier to score because it can be scored objectively, compared to a test that depends on the judgment of the rater(s).
4. It is useful when there are only two alternatives.
5. The scores are more reliable than essay test scores.
1. Limited only to low-level thinking skills such as knowledge and comprehension, or recognition or recall of information.
2. High probability of guessing the correct answer (50%), compared to a multiple-choice item with four options (25%).
Supply-type items require students to create and supply their own answer or perform a certain task to show mastery of knowledge or skills. They are also known as constructed-response tests. Supply-type items or constructed-response tests are classified as:
Another way of assessing the performance of students is by using performance-based assessment and portfolio assessment, which are categorized under constructed-response tests. Let us discuss the details of the selection-type and supply-type test items in this section, while performance-based assessment and portfolio assessment will be discussed in the succeeding chapters.
A subjective test item requires the students to organize and present an original answer (essay test), perform a task to show mastery of learning (performance-based assessment and portfolio assessment), or supply a word or phrase to answer a certain question (completion or short-answer test).
An essay test is a form of subjective test. Essay tests measure complex cognitive skills or processes. This type of test has no one specific answer per student. It is usually scored on an opinion basis, although certain facts and understanding are expected in the answer. There are two kinds of essay items: extended response essays and restricted response essays.
The subjective type of test is another test format where the student supplies the answer rather than selecting the correct answer. In this section, we shall consider completion or short-answer items and essay items. There are two types of essay items according to the length of the answer: extended response essays and restricted response essays.
The teacher must present and discuss in advance the criteria used in assessing the students' answers to help them prepare for the test.
Examples of completion and short answer
Direction: Write your answer before the number in each item. Write the word(s), phrase, or symbol(s) needed to complete the statement.

Question form: Which supply-type item is used to measure the ability to organize and integrate material?
Completion form: The supply-type item used to measure the ability to organize and integrate material is called _________.
1. It is only appropriate for questions that can be answered with short responses.
2. It is difficult to score when the questions are not properly and clearly prepared. The question should be clearly stated so that the answer expected of the student is clear.
3. It can assess only knowledge, comprehension and application levels in Bloom’s
taxonomy of cognitive domain.
4. It is not adaptable in measuring complex learning outcomes.
5. Scoring is tedious and time consuming.
b. Essay Items
Essay items are appropriate for assessing students' ability to organize and present their original ideas. An essay test consists of a small number of questions wherein the examinee is expected to demonstrate the ability to recall factual knowledge, organize this knowledge, and present it in a logical and integrated answer.
There are two types of essay item: extended response and restricted response
essay.
An essay test that allows the students to determine the length and complexity of the response is called an extended response essay item (Kubiszyn and Borich, 2007). It is very useful in assessing the synthesis and evaluation skills of the students. When the objective is to determine whether students can organize, integrate, and express ideas and evaluate information, it is best to use an extended response essay test.
1. Present and describe the modern theory of evolution and discuss how
it is supported by evidence from the areas of (a) comparative
anatomy and (b) population genetics.
2. From the statement, "Wealthy politicians cannot offer fair representation to all the people," what do you think is the reasoning behind the statement? Explain your answer.
An essay item that places strict limits on both the content and the response given by the students is called a restricted response essay item. In this type of essay, the content is usually restricted by the scope of the topic to be discussed, and the limitations on the form of the response are indicated in the question.
When there is a restriction on the form and scope of the answer of the
students in an essay test, there can be advantages and disadvantages. The advantages
are: it is easier to prepare questions; it is easier to score; and it is more directly related
to the specific learning outcomes. The disadvantages are: it provides little opportunity
for the students to demonstrate their abilities to organize ideas, to integrate materials,
and to develop new patterns of answers; it measures learning outcomes at
comprehension, application and analysis levels only.
1. List the major facts and opinions in the first State of the Nation Address (SONA) of Pres. Benigno Cojuangco Aquino III. Limit your answer to one page only. The score will depend on the content, organization, and accuracy of your answer.
2. Point out the strengths and weaknesses of a multiple-choice test. Limit your answer to five strengths and five weaknesses. Explain each answer in not more than two sentences.
1. Choose a leader you admire most and explain why you admire him or her.
2. Pick a controversial issue in the Aquino administration. Discuss the issue and
suggest a solution.
3. If you were the principal of a certain school, describe how you would demonstrate your leadership ability inside and outside of the school.
4. Describe the difference between Norm-referenced assessment and Criterion-
referenced assessment.
5. Do you agree or disagree with the statement, "Education comes not from books but from practical experience"? Support your position.
1. It is easier to prepare and less time consuming compared to other paper and
pencil tests.
2. It measures higher-order thinking skills (analysis, synthesis and evaluation).
3. It allows students’ freedom to express individuality in answering the given
question.
4. The students have a chance to express their own ideas in order to plan their own
answer.
5. It reduces guessing compared to any of the objective types of test.
6. It presents a more realistic task to the students.
7. It emphasizes the integration and application of ideas.
1. It cannot provide an objective measure of the achievement of the students.
2. It needs so much time to grade and prepare scoring criteria.
3. The scores are usually not reliable, especially without scoring criteria.
4. It covers only a limited amount of content and objectives.
5. There is low variation of scores.
6. It usually encourages bluffing.
Yes No
The test item is appropriate for measuring the intended learning outcomes.
The test item task matches the learning task to be measured.
The questions constructed measure complex learning outcomes.
It is stated in the questions what is being measured and how the answers are to be evaluated.
The terminology used clarifies and limits the task.
All students are required to answer the same questions.
There is an established time limit for answering each question.
Provisions for scoring answers are given (criteria for evaluating answers).
CHAPTER 4
Learning Objectives
INTRODUCTION
After designing the assessment tools, package the test, administer it to the students, check the test papers, then score and record them. Return the test papers and give the students feedback on the results of the test.
Assuming that you have already written the instructional objectives, prepared the table of specification, and written test items that match the instructional objectives, the next thing to do is to package the test and reproduce it, as discussed in the previous chapter.
ADMINISTERING THE EXAMINATION
After constructing the test items and putting them in order, the next step is to administer the test to the students. The administration procedures greatly affect the performance of the students on the test. Test administration does not simply mean giving the test questions to the students and collecting the test papers after the given time. Below are guidelines for administering the test before, during, and after the examination.
Guidelines After the Examination
After the examination, the next activity the teacher needs to do is to score the test papers, record the results of the examination, return the test papers, and, last, discuss the test items in class so that you can analyze and improve the test items for future use.
1. Grade the papers (and add comments if you can); do test analysis (see the
module on test analysis) after scoring and before returning papers to students if
at all possible. If it is impossible to do your test analysis before returning the
papers, be sure to do it at another time. It is important to do both the evaluation
of your students and the improvement of your tests.
2. If you are recording grades or scores, record them in pencil in your class record before returning the papers. If there are errors or adjustments in grading, the grades are easier to change when recorded in pencil.
3. Return papers in a timely manner.
4. Discuss test items with the students. If students have questions, agree to look
over their papers again, as well as the papers of others who have the same
question. It is usually better not to agree to make changes in grades on the spur
of the moment while discussing the tests with the students but to give yourself
time to consider what action you want to take. The test analysis may have
already alerted you to a problem with a particular question that is common to
several students, and you may already have made a decision regarding, that
question (to disregard the question and reduce the highest possible score
according, to give all students credit for that question, among others).
After administering and scoring the test, the teacher should also analyze the quality of each item in the test. Through this, you can identify items that are good, items that need improvement, and items to be removed from the test. But when do we consider a test good? How do we evaluate the quality of each item in the test? Why is it necessary to evaluate each item in the test? Lewis Aiken (1997), an author on psychological and educational measurement, pointed out that a "postmortem" is just as necessary in classroom assessment as it is in medicine.
In this section, we shall introduce the technique to help teachers determine the
quality of a test item known as item analysis. One of the purposes of item analysis is to
improve the quality of the assessment tools. Through this process, we can identify the
items to be retained, revised, or rejected, as well as the lesson content that has or has
not been mastered.
There are two kinds of item analysis, quantitative item analysis and qualitative
item analysis (Kubiszyn and Borich, 2007).
Item Analysis
1. Item analysis data provide a basis for efficient class discussion of the test
results.
2. Item analysis data provide a basis for remedial work.
3. Item analysis data provide a basis for general improvement of classroom
instruction.
4. Item analysis data provide a basis for increased skills in test construction.
5. Item analysis procedures provide a basis for constructing a test bank.
There are three common types of quantitative item analysis which provide
teachers with three different types of information about individual test items. These are
difficulty index, discrimination index, and response options analysis.
1. Difficulty Index
It refers to the proportion of the number of students in the upper and
lower groups who answered an item correctly. The larger the proportion, the
more students have learned the content measured by the item. To
compute the difficulty index of an item, use the formula:

DF = n / N, where

DF = difficulty index
n = number of students selecting the correct answer in the upper group
and in the lower group
N = total number of students who answered the test
Level of Difficulty
To determine the level of difficulty of an item, find first the difficulty index
using the formula and identify the level of difficulty using, the range given below.
Index Range          Difficulty Level
0.00 – 0.20          Very Difficult
0.21 – 0.40          Difficult
0.41 – 0.60          Moderately Difficult
0.61 – 0.80          Easy
0.81 – 1.00          Very Easy
The higher the value of the index of difficulty, the easier the item is. Hence, more
students got the correct answer and more students mastered the content measured by
that item.
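As a sketch of the computation (the function names here are ours, not from the text; the level boundaries below 0.61 follow the standard ranges implied by this module's later analyses, e.g. a DF of 0.36 read as "Difficult" and 0.46 as "Moderately Difficult"):

```python
def difficulty_index(correct_upper, correct_lower, n_total):
    """DF = n / N: proportion of the upper and lower groups answering correctly."""
    return (correct_upper + correct_lower) / n_total

def difficulty_level(df):
    """Map a difficulty index to its verbal difficulty level."""
    if df <= 0.20:
        return "Very Difficult"
    if df <= 0.40:
        return "Difficult"
    if df <= 0.60:
        return "Moderately Difficult"
    if df <= 0.80:
        return "Easy"
    return "Very Easy"

# Example 2 later in this module: 6 + 4 correct answers out of 28 examinees
df = difficulty_index(6, 4, 28)   # about 0.36, i.e. a difficult item
```

Here `difficulty_index(6, 4, 28)` reproduces the later example's DF of about 0.36, which falls in the "Difficult" range.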
2. Discrimination Index
The discrimination index is the power of the item to discriminate between the
students who scored high and those who scored low in the overall test. In other words,
it is the power of the item to discriminate between the students who know the lesson
and those who do not.
It is computed as the number of students in the upper group who got the
item correctly minus the number of students in the lower group who got the item
correctly, divided by the number of students in the upper group or in the lower
group (use the higher number if the two groups are not equal in size).
Discrimination index is the basis of measuring the validity of an item. This
index can be interpreted as an indication of the extent to which overall
knowledge of the content area or mastery of the skills is related to the response
on an item.
1. Positive discrimination happens when more students in the upper group got the
item correctly than students in the lower group.
2. Negative discrimination occurs when more students in the lower group got the
item correctly than students in the upper group.
3. Zero discrimination happens when the numbers of students in the upper group
and in the lower group who answered the item correctly are equal; hence, the item
cannot distinguish between the students who performed well in the overall test and
the students whose performance was very poor.
Level of Discrimination
Ebel and Frisbie (1986) as cited by Hetzel (1997) recommended the use of Level
of Discrimination of an Item for easier interpretation.
Index Range       Discrimination Level
0.19 and below    Poor item, should be eliminated or needs to be revised
0.20 – 0.29       Marginal item, needs some revision
0.30 – 0.39       Reasonably good item, but possibly needs improvement
0.40 and above    Very good item
DI = (CUG – CLG) / D, where

DI = discrimination index value
CUG = number of students selecting the correct answer in the upper group
CLG = number of students selecting the correct answer in the lower group
D = number of students in the upper group (or in the lower group)
Note: Consider the higher number in case the sizes of the upper and lower groups are
not equal.
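A minimal sketch of the computation (the function name is ours), taking the larger group size as D when the groups are unequal:

```python
def discrimination_index(correct_upper, correct_lower, upper_size, lower_size):
    """DI = (CUG - CLG) / D, where D is the larger of the two group sizes."""
    d = max(upper_size, lower_size)
    return (correct_upper - correct_lower) / d

# Example 3 later in this module: CUG = 5, CLG = 8, groups of 14 each
di = discrimination_index(5, 8, 14, 14)   # about -0.21, a negative discrimination
```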
Yes No
1. Does the key discriminate positively?
2. Do the incorrect options discriminate negatively?
If the answers to questions 1 and 2 are both YES, retain the item.
If one answer is YES and the other is NO, revise the item.
If the answers to questions 1 and 2 are both NO, eliminate or reject the item.
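The three decision rules can be sketched as a small helper (the name and signature are ours, assuming a per-option discrimination value has already been computed for the key and for each distracter):

```python
def item_decision(key_di, distracter_dis):
    """Retain / revise / reject an item from the two checklist questions."""
    q1 = key_di > 0                              # does the key discriminate positively?
    q2 = all(di < 0 for di in distracter_dis)    # do all incorrect options discriminate negatively?
    if q1 and q2:
        return "retain"
    if q1 or q2:
        return "revise"
    return "reject"
```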
Distracter Analysis
1. Distracter
Distracter is the term used for the incorrect options in the multiple-choice type of
test, while the correct answer represents the key. It is very important for the test
writer to know whether the distracters are effective or good distracters. Using
quantitative item analysis, we can determine whether the options are good or the
distracters are effective.
Item analysis can identify non-performing test items, but it seldom
indicates the error or the problem in the given item. There are several factors to
consider when students fail to get the correct answer to a given question.
h. The student failed to study the lesson.
2. Miskeyed item
The test item is a potential miskey if more students from the upper
group choose an incorrect option than the key.
3. Guessing item
Students from the upper group have equal spread of choices among the given
alternatives. Students from the upper group guess their answers because of the
following reasons:
a. The content of the test is not discussed in the class or in the text.
b. The test item is very difficult.
c. The question is trivial.
4. Ambiguous item
This happens when more students from the upper group choose equally an
incorrect option and the keyed answer.
Consider the following examples in analyzing test items and some notes on
how to improve the items based on the results of item analysis.
Option        A    B*   C    D    E
Upper Group   3    10   4    0    3
Lower Group   4    4    8    0    4
Example 2. A class is composed of 50 students. Use 27% to get the upper and the
lower groups. Analyze the item given the following results. Option D is the correct
answer. What will you do with the test item?

Option              A    B    C    D*   E
Upper Group (27%)   3    1    2    6    2
Lower Group (27%)   5    0    4    4    1
1. Compute the difficulty index.
n = 6 + 4 = 10
N = 28
DF = n / N
DF = 10/28
DF = 0.36 or 36%
Example 3. A class is composed of 50 students. Use 27% to get the upper and the
lower groups. Analyze the item given the following results. Option E is the correct
answer. What will you do with the test item?

Option              A    B    C    D    E*
Upper Group (27%)   2    3    2    2    5
Lower Group (27%)   2    2    1    1    8
1. Compute the difficulty index:
n = 5 + 8 = 13
N = 28
DF = n / N
DF = 13/28
DF = 0.46 or 46%
2. Compute the discrimination index.
CUG = 5
CLG = 8
D = 14
DI = (CUG – CLG) / D
DI = (5 – 8)/14
DI = –3/14
DI = –0.21 or –21%
3. Make an analysis.
a. 46% of the students got the answer to the test item correctly; hence, the test item
is moderately difficult.
b. More students from the lower group got the item correctly; therefore, it has a
negative discrimination. The discrimination index is –21%.
c. No need to analyze the distracters because the item discriminates negatively.
d. Modify all the distracters because they are not effective. Most of the students
in the upper group chose the incorrect options. The options are effective if
most of the students in the lower group chose the incorrect options.
4. Conclusion: Reject the item because it has a negative discrimination index.
Example 4. Potential Miskeyed Item. Make an item analysis of the table below.
What will you do with a test item that is a potential miskey?

Option        A*   B    C    D    E
Upper Group   1    2    3    10   4
Lower Group   3    4    4    4    5
1. Compute the difficulty index.
n = 1 + 3 = 4
N = 40
DF = n / N
DF = 4/40
DF = 0.10 or 10%
2. Compute the discrimination index.
CUG = 1
CLG = 3
D = 20
DI = (CUG – CLG) / D
DI = (1 – 3)/20
DI = –2/20
DI = –0.10 or –10%
3. Make an analysis.
a. More students from the upper group chose option D than option A, even
though option A is supposedly the correct answer.
b. Most likely the teacher has written the wrong answer key.
c. The teacher should check whether he/she miskeyed the answer that
he/she thought was the correct answer.
d. If the teacher miskeyed it, he/she must recheck and retally the scores on the
students’ test papers before giving them back.
e. If option A is really the correct answer, revise the item to weaken option D;
distracters are not supposed to draw more attention than the keyed answer.
f. Only 10% of the students got the answer to the test item correctly, hence, the
test item is very difficult.
g. More students from the lower group got the item correctly, therefore a
negative discrimination resulted. The discrimination index is -10%.
h. No need to analyze the distracters because the test item is very difficult and
discriminates negatively.
4. Conclusion: Reject the item because it is very difficult and has a negative
discrimination.
Example 5. Ambiguous Item. Make an item analysis of the table below. Option E is
the correct answer. What will you do with the test item?

Option        A    B    C    D    E*
Upper Group   7    1    1    2    8
Lower Group   6    2    3    3    6
1. Compute the difficulty index.
n = 8 + 6 = 14
N = 39
DF = n / N
DF = 14/39
DF = 0.36 or 36%
2. Compute the discrimination index.
CUG = 8
CLG = 6
D = 20
DI = (CUG – CLG) / D
DI = (8 – 6)/20
DI = 2/20
DI = 0.10 or 10%
3. Make an analysis.
a. Only 36% of the students got the answer to the test item correctly; hence, the
test item is difficult.
b. More students from the upper group got the item correctly; hence, it
discriminates positively. The discrimination index is 10%.
c. About equal numbers of top students went for option A and option E, which
implies that they could not tell which is the correct answer. The students do
not know the content of the test; thus, reteaching is needed.
4. Conclusion: revise the test item because it is ambiguous.
Example 6. Guessing Item. Below is the result of item analysis for a test with students’
answers mostly based on guesses. Are you going to reject, revise, or retain the test item?

Option        A    B    C*   D    E
Upper Group   4    3    4    3    6
Lower Group   3    4    3    4    5
1. Compute the difficulty index.
n = 4 + 3 = 7
N = 39
DF = n / N
DF = 7/39
DF = 0.18 or 18%
2. Compute the discrimination index.
CUG = 4
CLG = 3
D = 20
DI = (CUG – CLG) / D
DI = (4 – 3)/20
DI = 1/20
DI = 0.05 or 5%
3. Make an analysis.
a. Only 18% of the students got the answer to the test item correctly, hence, the
test item is very difficult.
b. More students from the upper group got the correct answer to the test item;
therefore, the test item has a positive discrimination. The discrimination index
is 5%.
c. Students responded about equally to all alternatives, an indication that they
were guessing.
There are three possible reasons why students guess the answer to a test item:
The content of the test item had not yet been discussed in class
because the test was designed in advance;
The test item was so badly written that students had no idea what the
question was really about; and
The test item was very difficult, as shown by the low difficulty index
and low discrimination index.
4. Conclusion: Reject the item because it is very difficult; reteach the material to the
class.
Example 7. Ineffective Distracter. The table below shows an item analysis of a test item
with ineffective distracters. What can you conclude about the test item?

Option        A    B    C*   D    E
Upper Group   5    3    9    0    3
Lower Group   6    4    6    0    4
1. Compute the difficulty index.
n = 9 + 6 = 15
N = 40
DF = n / N
DF = 15/40
DF = 0.38 or 38%
2. Compute the discrimination index.
CUG = 9
CLG = 6
D = 20
DI = (CUG – CLG) / D
DI = (9 – 6)/20
DI = 3/20
DI = 0.15 or 15%
3. Make an analysis.
a. Only 38% of the students got the answer to the test item correctly, hence, the
test item is difficult.
b. More students from the upper group answered the test item correctly; as a
result, the item has a positive discrimination. The discrimination index is 15%.
c. Options A, B, and E are attractive distracters.
d. Option D is ineffective; therefore, replace it with a more realistic one.
4. Conclusion: Revise the item by changing option D.
CHAPTER 5
Learning Outcomes
INTRODUCTION
Statistics is a very important tool in the utilization of assessment data, most
especially in describing, analyzing, and interpreting the performance of students in
assessment procedures. Teachers should have the necessary background in the
statistical procedures used in the assessment of student learning in order to give a
correct description and interpretation of the achievement of students in a given test,
whether it is a classroom assessment conducted by the teacher or a division or national
assessment conducted by the Department of Education.
In this chapter, we shall discuss the important tools in analyzing and interpreting
assessment results. These statistical tools are measures of central tendency, measures
of variation, skewness, correlation, and different types of converted scores.
DEFINITION OF STATISTICS
Branches of Statistics
FREQUENCY DISTRIBUTION
1. Class Limit is the grouping or categories defined by the lower and upper limits.
Examples: LL – UL
10 – 14
15 – 19
20 – 24
Lower class limit (LL) represents the smallest number in each group.
Upper class limit (UL) represents the highest number in each group.
2. Class size (c.i) is the width of each class interval.
Examples: LL – UL
10 – 14
15 – 19
20 – 24
3. Class boundaries are the numbers used to separate each category in the
frequency distribution without the gaps created by the class limits. The scores of
the students are discrete. Add 0.5 to the upper limit to get the upper class
boundary and subtract 0.5 from the lower limit to get the lower class boundary of
each group or category.
Examples: LL – UL LCB - UCB
10 – 14 9.5 – 14.5
15 – 19 14.5 – 19.5
20 – 24 19.5 – 24.5
4. Class marks are the midpoints of the lower and upper class limits. The formula is
XM = (LL + UL) / 2.
Examples: LL – UL XM
10 – 14 12
15 – 19 17
20 – 24 22
1. Compute the value of the range (R). Range is the difference between the highest
score and the lowest score.
R = HS – LS
Determine the class size (c.i). The class size is the quotient when you
divide the range by the desired number of classes or categories. The desired
number of classes is usually 5, 10, or 15, depending on the number of scores
in the distribution. If the desired number of classes is not given, compute it
using k = 1 + 3.3 log n. Then

c.i = R / (desired number of classes), or c.i = R / K.
2. Set up the class limits of each class or category. Each class is defined by the lower
limit and upper limit. Use the lowest score as the lower limit of the first class.
3. Set up the class boundaries if needed. The amount to add to each upper limit
and subtract from each lower limit is

(LL of the second class – UL of the first class) / 2
4. Tally the scores in the appropriate classes.
5. Find the other parts if necessary such as class marks, among others.
17 25 30 33 25 45 23 19
27 35 45 48 20 38 39 18
44 22 46 26 36 29 15-LS 21
50-HS 47 34 26 37 25 33 49
22 33 44 38 46 41 37 32
R = HS – LS
= 50 – 15
R = 35
n = 40
Solve the value of k.
k = 1 + 3.3 log n
k = 1 + 3.3 log 40
k = 1 + 3.3 (1.602059991)
k = 1 + 5.286797971
k = 6.286797971
k=6
Find the class size.

c.i = R / K
c.i = 35/6
c.i = 5.833
c.i = 6
Construct the class limits starting with the lowest score as the lower limit of the
first category. The last category should contain the highest score in the distribution.
Each category should have a class size of 6 (X). Count the number of scores
that fall in each category (f).
15 – 20 //// 4
21 – 26 ///////// 9
27 – 32 /// 3
33 – 38 ////////// 10
39 – 44 //// 4
45 – 50 ////////// 10
n = 40
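The whole construction above can be sketched in Python; here `k` is truncated to 6 as in the text, and the class size is rounded up to 6 (per-class counts are left to the tally, since they depend on the exact score list):

```python
import math

scores = [17, 25, 30, 33, 25, 45, 23, 19, 27, 35, 45, 48, 20, 38, 39, 18,
          44, 22, 46, 26, 36, 29, 15, 21, 50, 47, 34, 26, 37, 25, 33, 49,
          22, 33, 44, 38, 46, 41, 37, 32]

R = max(scores) - min(scores)                # range: 50 - 15 = 35
k = int(1 + 3.3 * math.log10(len(scores)))   # number of classes, truncated to 6
ci = math.ceil(R / k)                        # class size: 35/6 = 5.83, rounded up to 6

lo = min(scores)                             # the lowest score starts the first class
limits = [(lo + i * ci, lo + i * ci + ci - 1) for i in range(k)]
freq = {lim: sum(1 for s in scores if lim[0] <= s <= lim[1]) for lim in limits}
```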
Find the class boundaries and class marks of the given score distribution.
X f Class Boundaries XM
A histogram is a bar graph in which a rectangle is drawn over each class, and the
height of the rectangles corresponds to the class frequencies. The histogram is best used
for the graphical representation of discrete or non-continuous data.
Frequency polygon is constructed by plotting the class marks against the class
frequencies. The x-axis corresponds to the class marks and the y-axis corresponds to the
class frequencies. Connect the points consecutively using a straight line. Frequency
polygon is best used in representing continuous data such as the scores of students in a
given test.
X frequency (f)
15 – 20 4
21 – 26 9
27 – 32 3
33 – 38 10
39 – 44 4
45 – 50 10
n = 40
There are two major concepts in describing the assessed performance of a
group: measures of central tendency and measures of variability. Measures of central
tendency are used to determine the average score of a group of scores, while measures
of variability indicate the spread of scores in the group. These two concepts are very
important and helpful in understanding the performance of the group.
1. Mean
The mean is the most commonly used measure of the center of data and is also
referred to as the “arithmetic average.”
Computation of the Population Mean

µ = ƩX / N = (x1 + x2 + x3 + … + xn) / N

Computation of the Sample Mean

X̄ = Ʃx / n = (x1 + x2 + x3 + … + xn) / n
Example 2: Find the Grade Point Average (GPA) of Ritz Glenn for the first
semester of the school year 2010 – 2011. Use the table below:
X̄ = Ʃ(wi)(xi) / Ʃwi
X̄ = 32/26
X̄ = 1.23
The Grade Point Average of Ritz Glenn for the first semester SY 2010 – 2011 is 1.23.
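Since the grade table itself is not reproduced here, a sketch with hypothetical (grade, units) pairs shows the weighted-mean computation the example uses:

```python
def weighted_mean(grades_units):
    """GPA = sum of (grade x units) divided by total units."""
    return sum(g * u for g, u in grades_units) / sum(u for _, u in grades_units)

# hypothetical grades: 1.0 in a 3-unit subject and 1.5 in a 2-unit subject
gpa = weighted_mean([(1.0, 3), (1.5, 2)])   # (3.0 + 3.0) / 5 = 1.2
```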
Grouped data are the data or scores that are arranged in a frequency
distribution. Frequency distribution is the arrangement of scores according to
category of classes including the frequency. Frequency is the number of observations
falling in a category.
For this particular lesson we shall discuss only one formula for solving the mean
for grouped data, which is called the midpoint method. The formula is:

X̄ = ƩfXm / n
where:
X̄ = mean value
f = frequency of each class
Xm = class mark or midpoint of each class
n = number of cases
ƩfXm = sum of the products of f and Xm

1. Find the midpoint or class mark (Xm) of each class or category using the
formula Xm = (LL + UL) / 2.
2. Multiply the frequency and the corresponding class mark (fXm).
3. Find the sum of the results in step 2.
4. Solve the mean using the formula X̄ = ƩfXm / n.
X F Xm 𝐟𝑿𝒎
10 – 14 5 12 60
15 – 19 2 17 34
20 – 24 3 22 66
25 – 29 5 27 135
30 – 34 2 32 64
35 – 39 9 37 333
40 – 44 6 42 252
45 – 49 3 47 141
50 – 54 5 52 260
n = 40 Ʃf𝑋𝑚 = 1 345
X̄ = ƩfXm / n
X̄ = 1 345 / 40
X̄ = 33.63
Analysis:
4. It may not be an actual score in the distribution.
5. It can be applied to interval level of measurement.
6. It is very easy to compute.
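The midpoint method above can be sketched directly from the example's frequency table (rows are lower limit, upper limit, frequency):

```python
table = [(10, 14, 5), (15, 19, 2), (20, 24, 3), (25, 29, 5), (30, 34, 2),
         (35, 39, 9), (40, 44, 6), (45, 49, 3), (50, 54, 5)]

n = sum(f for _, _, f in table)                         # 40 students
total = sum(f * (ll + ul) / 2 for ll, ul, f in table)   # sum of f*Xm = 1345
mean = total / n                                        # 1345/40 = 33.625, reported as 33.63
```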
2. Median
X (score)
19
17
16
15
10
5
2
Analysis:
The median score is 15. Fifty percent (50%) of the remaining scores, or three of them
(19, 17, 16), are above 15, and 50%, or three (10, 5, 2), are below 15.
X (score)
30
19
17
16
15
10
5
2

X̃ = (16 + 15) / 2
X̃ = 15.5
Analysis:
The median score is 15.5, which means that 50% of the scores in the distribution
are lower than 15.5 (those are 15, 10, 5, and 2) and 50% are greater than 15.5 (those
are 30, 19, 17, 16); that is, four (4) scores are below 15.5 and four (4) scores are above
15.5.
Formula:

X̃ = LB + [(n/2 – cfp) / fm] c.i

X̃ = median value
LB = lower boundary of the median class
MC = median class, the category containing n/2
cfp = cumulative frequency before the median class if the scores are
arranged from lowest to highest value
fm = frequency of the median class
c.i = size of the class interval
Example 3: Scores of 40 students in a science class consist of 60 items and they are
tabulated below. The highest score is 54 and the lowest score is 10.
X F cf<
10 – 14 5 5
15 – 19 2 7
20 – 24 3 10
25 - 29 5 15
30 – 34 2 17 (cfp)
35 – 39 9 (fm) 26
40 - 44 6 32
45 – 49 3 35
50 - 54 5 40
n = 40
Solution:

n/2 = 40/2 = 20

The category containing n/2 is 35 – 39.
MC = 35 – 39
LL of the MC = 35
LB = 34.5
cfp = 17
fm = 9
c.i = 5

X̃ = LB + [(n/2 – cfp) / fm] c.i
X̃ = 34.5 + [(20 – 17) / 9] 5
X̃ = 34.5 + (3/9) 5
X̃ = 34.5 + 15/9
X̃ = 34.5 + 1.67
X̃ = 36.17
Analysis:
The median value is 36.17, which means that 50% or 20 scores are less
than 36.17.
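A sketch of the grouped-median formula (the helper name is ours; `table` holds (lower limit, frequency) rows from the lowest class to the highest):

```python
def grouped_median(table, ci):
    """Median = LB + [(n/2 - cfp) / fm] * c.i for the class containing n/2."""
    n = sum(f for _, f in table)
    cum = 0
    for ll, f in table:
        if cum + f >= n / 2:            # this class contains n/2
            lb = ll - 0.5               # lower boundary of the median class
            return lb + ((n / 2 - cum) / f) * ci
        cum += f

table = [(10, 5), (15, 2), (20, 3), (25, 5), (30, 2),
         (35, 9), (40, 6), (45, 3), (50, 5)]
median = grouped_median(table, 5)       # 34.5 + (3/9)*5, about 36.17
```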
3. Mode
The mode is the third measure of central tendency. The mode or modal score is the
score or scores that occur most frequently in the distribution. A distribution is classified
as unimodal, bimodal, trimodal, or multimodal. Unimodal is a distribution of scores that
consists of only one mode; bimodal is a distribution of scores that consists of two modes;
trimodal is a distribution of scores that consists of three modes; and multimodal is a
distribution of scores that consists of more than two modes.
The score that appeared most in section A is 20; hence, the mode of section A is
20. There is only one mode; therefore, the score distribution is unimodal. The modes
of section B are 18 and 24, since both 18 and 24 appeared twice. There are two modes
in section B; hence, the distribution is a bimodal distribution. The modes of section C
are 18, 21, and 25. There are three modes for section C; therefore, it is called a
trimodal or multimodal distribution.
Mode for Grouped Data
In solving for the mode using grouped data, use the formula:

x̂ = LB + [d1 / (d1 + d2)] c.i

where:
x̂ = mode value
LB = lower boundary of the modal class (the class with the highest frequency)
d1 = difference between the frequency of the modal class and the frequency of the class before it
d2 = difference between the frequency of the modal class and the frequency of the class after it
c.i = size of the class interval
x f
10 – 14 5
15 – 19 2
20 – 24 3
25 – 29 5
30 – 34 2
35 – 39 9
40 – 44 6
45 – 49 3
50 – 54 5
n = 40
Modal Class = 35 – 39
LL of MC = 35
𝐿𝐵 = 34.5
d1 = 9 – 2 = 7
d2 = 9 – 6 = 3
c.i = 5

x̂ = LB + [d1 / (d1 + d2)] c.i
x̂ = 34.5 + [7 / (7 + 3)] 5
x̂ = 34.5 + 35/10
x̂ = 34.5 + 3.5
x̂ = 38

The mode of the score distribution of the 40 students is 38, which falls within the
modal class 35 – 39.
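The grouped-mode formula can be sketched the same way (the helper name is ours; `table` holds (lower limit, frequency) rows):

```python
def grouped_mode(table, ci):
    """Mode = LB + [d1 / (d1 + d2)] * c.i for the class with the highest frequency."""
    freqs = [f for _, f in table]
    i = freqs.index(max(freqs))                                   # modal class position
    d1 = freqs[i] - (freqs[i - 1] if i > 0 else 0)                # gap to the class before
    d2 = freqs[i] - (freqs[i + 1] if i + 1 < len(freqs) else 0)   # gap to the class after
    lb = table[i][0] - 0.5                                        # lower boundary of the modal class
    return lb + (d1 / (d1 + d2)) * ci

table = [(10, 5), (15, 2), (20, 3), (25, 5), (30, 2),
         (35, 9), (40, 6), (45, 3), (50, 5)]
mode = grouped_mode(table, 5)    # 34.5 + (7/10)*5 = 38.0
```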
4. Quantiles
Quantile is a score distribution where the scores are divided into different equal
parts. There are three kinds of quantile. The quartile is a score point that divided the
scores in the distribution into four (4) equal parts. Decile is a score point that divides
the scores in the distribution into hundred (100) equal parts.
[Figure: the quartile points (Q1, Q2, Q3), decile points (D1 to D9), and percentile
points (P1 to P99) shown on a scale from the lowest score (LS) to the highest score
(HS); Q1 = P25, Q2 = D5 = P50, and Q3 = P75.]
Qk = [kn/4 + (1 – k/4)]th score

Q1 = [n/4 + (1 – 1/4)]th score
Q2 = [2n/4 + (1 – 2/4)]th score
Q3 = [3n/4 + (1 – 3/4)]th score

Where,
k = 1, 2, 3
n = number of cases
Dk = [kn/10 + (1 – k/10)]th score

D1 = [n/10 + (1 – 1/10)]th score
D2 = [2n/10 + (1 – 2/10)]th score
D3 = [3n/10 + (1 – 3/10)]th score
…
D9 = [9n/10 + (1 – 9/10)]th score

Where,
Dk = the indicated decile
k = 1, 2, 3, 4, 5, 6, 7, 8, 9
n = number of cases
Example:
Using the given data 6, 8, 10, 12, 12, 14, 15, 16, 20. Find Q1, Q3, D6, D9, P65, P99.
x (score)
6
8
10
12
12
14
15
16
20
1. Solve the value of Q1.
n = 9
Q1 = [(1)(9)/4 + (1 – 1/4)]th score
Q1 = [9/4 + 3/4]th score
Q1 = [12/4]th score
Q1 = 3rd score
Q1 = 10
Hence, 25% of the scores in the distribution are less than 10.
2. Solve the value of Q3.
Q3 = [(3)(9)/4 + (1 – 3/4)]th score
Q3 = [27/4 + 1/4]th score
Q3 = [28/4]th score
Q3 = 7th score
Q3 = 15
Hence, 75% of the scores in the distribution are less than 15.
3. Solve the value of D6.
D6 = [(6)(9)/10 + (1 – 6/10)]th score
D6 = [54/10 + 4/10]th score
D6 = [58/10]th score
D6 = 5.8th score
D6 = 5.8th score
The value of D6 lies within the sum of the 5th score and 80% of the difference
between 6th and 5th scores.
D6 = 5th score + 0.80 (6th score – 5th score)
= 12 + 0.80 (14 – 12)
= 12 + 0.80 (2)
= 12 + 1.60
D6 = 13.60
Therefore, 60% of the scores in the distribution are less than 13.60.
4. Solve the value of P65.
P65 = [(65)(9)/100 + (1 – 65/100)]th score
P65 = [585/100 + 35/100]th score
P65 = [620/100]th score
P65 = 6.2th score
Therefore, P65 lies within the 6th and 7th scores. The value of P65 is the sum of
the 6th score and 20% of the difference between the 7th and the 6th scores.
P65 = 6th score + 0.20(7th score – 6th score)
= 14 + 0.20 (15 – 14)
= 14 + 0.20 (1)
= 14 + 0.20
P65 = 14.20
Therefore, 65% of the scores in the distribution are less than 14.20.
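The position formulas above share one pattern, [kn/parts + (1 – k/parts)], with linear interpolation when the position is fractional; a sketch (the function name is ours):

```python
def ungrouped_quantile(scores, k, parts):
    """k-th quantile of `parts` equal parts; position = kn/parts + (1 - k/parts)."""
    s = sorted(scores)
    pos = k * len(s) / parts + (1 - k / parts)
    i = int(pos)                       # whole part: the i-th score (1-based)
    frac = pos - i                     # fractional part: interpolate to the next score
    value = s[i - 1]
    if frac > 0:
        value += frac * (s[i] - s[i - 1])
    return value

data = [6, 8, 10, 12, 12, 14, 15, 16, 20]
```

With the example's data, `ungrouped_quantile(data, 1, 4)` gives Q1 = 10, `(data, 3, 4)` gives Q3 = 15, `(data, 6, 10)` gives D6 = 13.6, and `(data, 65, 100)` gives P65 = 14.2.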
Qk = LBk + [(kn/4 – cfp) / fq] c.i

where:
Qk = the indicated quartile
k = 1, 2, and 3
LB = lower boundary of the quartile class
cfp = cumulative frequency before the quartile class when scores are
arranged from lowest to highest
fq = frequency of the quartile class
c.i = size of the class interval

Q1 = LB1 + [(n/4 – cfp1) / fq1] c.i
Q2 = LB2 + [(2n/4 – cfp2) / fq2] c.i
Q3 = LB3 + [(3n/4 – cfp3) / fq3] c.i
Example 1: The data for the scores of fifty (50) students in a Filipino class are
given below. Solve for the value of Q1.
x f cf<
25 – 32 3 3
33 – 40 7 10
41 – 48 5 15
49 – 56 4 19
57 – 64 12 31
65 – 72 6 37
73 –80 8 45
81 – 88 3 48
89 – 97 2 50
n = 50
Solution:

n/4 = 50/4 = 12.5

Q1C = 41 – 48
LL = 41
LB = 40.5
cfp = 10
fq = 5
c.i = 8

Q1 = LB1 + [(n/4 – cfp1) / fq1] c.i
Q1 = 40.5 + [(12.5 – 10) / 5] 8
Q1 = 40.5 + (2.5/5) 8
Q1 = 40.5 + 20/5
Q1 = 40.5 + 4
Q1 = 44.50
Therefore, 25% of the scores of 50 students who participated in the test are less than
44.50.
Example 2: The data for the scores of fifty (50) students in a Filipino class are
given below. Solve for the value of Q3.
x f cf<
25 – 32 3 3
33 – 40 7 10
41 – 48 5 15
49 – 56 4 19
57 – 64 12 31
65 – 72 6 37
73 – 80 8 45
81 – 88 3 48
89 – 97 2 50
n = 50
Solution:

3n/4 = 3(50)/4 = 37.5

Q3C = 73 – 80
LL = 73
LB = 72.5
cfp = 37
fq = 8
c.i = 8

Q3 = LB3 + [(3n/4 – cfp3) / fq3] c.i
Q3 = 72.5 + [(37.5 – 37) / 8] 8
Q3 = 72.5 + (0.5/8) 8
Q3 = 72.5 + 4/8
Q3 = 72.5 + 0.5
Q3 = 73.00
Therefore, 75% of the scores in the distribution are less than 73.
where:
Dk = indicated decile
k = 1, 2, 3, 4, 5, 6, 7, 8, 9
LB = lower boundary of the indicated decile class
DC = decile class, the class or category containing n/10 for D1, 2n/10 for D2,
3n/10 for D3, …, and 9n/10 for D9
cfp = cumulative frequency before the indicated decile class when scores are
arranged from lowest to highest
fd = frequency of the indicated decile class
c.i = size of class interval
D1 = LB + [((1)n/10 – cfp) / fd] c.i
D2 = LB + [((2)n/10 – cfp) / fd] c.i
D3 = LB + [((3)n/10 – cfp) / fd] c.i
D4 = LB + [((4)n/10 – cfp) / fd] c.i
D5 = LB + [((5)n/10 – cfp) / fd] c.i
D6 = LB + [((6)n/10 – cfp) / fd] c.i
D7 = LB + [((7)n/10 – cfp) / fd] c.i
D8 = LB + [((8)n/10 – cfp) / fd] c.i
D9 = LB + [((9)n/10 – cfp) / fd] c.i
Example 3: The data for the scores of fifty (50) students in a Filipino class are
given below. Solve for the value of D5.
x f cf<
25 – 32 3 3
33 – 40 7 10
41 – 48 5 15
49 – 56 4 19
57 – 64 12 31
65 – 72 6 37
73 – 80 8 45
81 – 88 3 48
89 – 97 2 50
n = 50
Solution:

(5)n/10 = (5)(50)/10 = 250/10 = 25

D5C = 57 – 64
LL = 57
LB = 56.5
cfp = 19
fd = 12
c.i = 8

D5 = LB + [((5)n/10 – cfp) / fd] c.i
D5 = 56.5 + [(25 – 19) / 12] 8
D5 = 56.5 + (6/12) 8
D5 = 56.5 + 48/12
D5 = 56.5 + 4
D5 = 60.5
Hence, 50% of the scores of the 50 students are less than 60.5.
Example 4: The data for the scores of fifty (50) students in a Filipino class are
given below. Solve for the value of D7.
x f cf<
25 – 32 3 3
33 – 40 7 10
41 – 48 5 15
49 – 56 4 19
57 – 64 12 31
65 – 72 6 37
73 – 80 8 45
81 – 88 3 48
89 – 97 2 50
n = 50
Solution:

(7)n/10 = (7)(50)/10 = 350/10 = 35

D7C = 65 – 72
LL = 65
LB = 64.5
cfp = 31
fd = 6
c.i = 8

D7 = LB + [((7)n/10 – cfp) / fd] c.i
D7 = 64.5 + [(35 – 31) / 6] 8
D7 = 64.5 + (4/6) 8
D7 = 64.5 + 32/6
D7 = 64.5 + 5.33
D7 = 69.83
Hence, 70% of the scores of the 50 students are less than 69.83.
Where,
Pk = the indicated percentile
k = 1, 2, 3, 4, …, 97, 98, 99
LB = lower boundary of the indicated percentile class
PC = percentile class, the category containing n/100 for P1, 2n/100 for P2,
3n/100 for P3, …, 98n/100 for P98, and 99n/100 for P99
cfp = cumulative frequency before the indicated percentile class when scores are
arranged from lowest to highest
fd = frequency of the indicated percentile class
c.i = size of class interval
To derive the formula in solving the indicated percentile, just change the value of
k to the indicated percentile. There are 99 formulas in solving the percentile. Some of
the formulas for percentile are the following.
P1 = LB + [((1)n/100 – cfp) / fd] c.i
P2 = LB + [((2)n/100 – cfp) / fd] c.i
P3 = LB + [((3)n/100 – cfp) / fd] c.i
P4 = LB + [((4)n/100 – cfp) / fd] c.i
P10 = LB + [((10)n/100 – cfp) / fd] c.i
P20 = LB + [((20)n/100 – cfp) / fd] c.i
P25 = LB + [((25)n/100 – cfp) / fd] c.i
P50 = LB + [((50)n/100 – cfp) / fd] c.i
P75 = LB + [((75)n/100 – cfp) / fd] c.i
P90 = LB + [((90)n/100 – cfp) / fd] c.i
P95 = LB + [((95)n/100 – cfp) / fd] c.i
P99 = LB + [((99)n/100 – cfp) / fd] c.i
Example 6: The data for the scores of fifty (50) students in a Filipino class are
given below. Solve for the value of P91.
x f cf<
25 – 32 3 3
33 – 40 7 10
41 – 48 5 15
49 – 56 4 19
57 – 64 12 31
65 – 72 6 37
73 – 80 8 45
81 – 88 3 48
89 – 97 2 50
n = 50
Solution:

(91)n/100 = (91)(50)/100 = 4 550/100 = 45.50

P91C = 81 – 88
LL = 81
LB = 80.50
cfp = 45
fd = 3
c.i = 8

P91 = LB + [((91)n/100 – cfp) / fd] c.i
P91 = 80.50 + [(45.50 – 45) / 3] 8
P91 = 80.50 + (0.50/3) 8
P91 = 80.50 + 4/3
P91 = 80.50 + 1.33
P91 = 81.83
Hence, 91% of the scores of 50 students are less than 81.83.
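Quartiles, deciles, and percentiles for grouped data all follow one pattern, so a single helper (the name is ours) reproduces the examples above from the same frequency table:

```python
def grouped_quantile(table, ci, k, parts):
    """LB + [(kn/parts - cfp) / f] * c.i for the class containing kn/parts."""
    n = sum(f for _, f in table)
    target = k * n / parts
    cum = 0
    for ll, f in table:
        if cum + f >= target:          # this class contains kn/parts
            lb = ll - 0.5
            return lb + ((target - cum) / f) * ci
        cum += f

# the 50-score Filipino-class table: (lower limit, frequency)
table = [(25, 3), (33, 7), (41, 5), (49, 4), (57, 12),
         (65, 6), (73, 8), (81, 3), (89, 2)]
```

Calling `grouped_quantile(table, 8, 1, 4)` gives Q1 = 44.5, `(table, 8, 3, 4)` gives Q3 = 73.0, `(table, 8, 5, 10)` gives D5 = 60.5, `(table, 8, 7, 10)` gives D7 ≈ 69.83, and `(table, 8, 91, 100)` gives P91 ≈ 81.83.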
Measures of Variation
A measure of variation describes how spread out the scores in a distribution are;
variation is also referred to as dispersion. There are several ways of describing the
variation of scores: absolute measures of variation and relative measures of variation.
Section A: Mean = 18.25, S = 5.15
Section B: Mean = 18.25, S = 6.92
Section C: Mean = 18.25, S = 7.63
[Figure: the score distributions of the three sections plotted on the same scale from
12 to 30.]
What can you observe about the mean and the standard deviation of the three
groups of scores?
Which group of scores is most widespread? Less scattered?
Before answering such questions, let us first discuss the different types of
measures of variation.
There are four kinds of absolute measures of variation: the range, the inter-quartile
range and quartile deviation, the mean deviation, and the variance and standard
deviation.
1. Range
Range (R) is the difference between the highest score and the lowest score in a
distribution. It is the simplest and crudest measure of variation: simplest
because only the highest score and the lowest score are considered.
a. Range for Ungrouped Data
R = HS – LS
Where,
R = range value
HS = Highest score
LS = Lowest score
Group A Group B
10(LS) 15(LS)
12 16
15 16
17 17
25 17
26 23
28 25
30 26
35(HS) 30(HS)
RA = HS – LS
RA = 35 – 10
RA = 25
RB = HS – LS
RB = 30 - 15
RB = 15
Analysis:
The range of Group A = 25 is greater than the range of Group B = 15. The
implication of this is that scores in group A are more spread out than the scores in
group B or the scores in Group B are less scattered than the scores in group A.
b. Range for Grouped Data

R = HSUB – LSLB

Where,
R = range value
HSUB = upper boundary of the highest score
LSLB = lower boundary of the lowest score
X F
25 – 32 3
33 – 40 7
41 – 48 5
49 – 56 4
57 – 64 12
65 – 72 6
73 – 80 8
81 – 88 3
89 – 97 2
n = 50
LL of LS = 25
LSLB = 24.5
UL of the HS = 97
HSUB = 97.5
R = HSUB – LSLB
R = 97.5 – 24.5
R = 73
Properties of Range
When the range value is large, the scores in the distribution are more dispersed,
widespread or heterogeneous. On the other hand, when the range value is small the
scores in the distribution are less dispersed, less scattered, or homogeneous.
2. Inter-quartile Range (IQR) and Quartile Deviation (QD)
Inter-quartile range is the difference between the third quartile and the first
quartile.
IQR = Q3 – Q1
Quartile deviation indicates the distance we need to go above and below the
median to include the middle 50% of the scores. It is based on the range of the middle
50% of the scores, instead of the entire set.
The formula for computing the value of the quartile deviation is QD = (Q3 – Q1) / 2,
where QD is the quartile deviation value, Q1 is the value of the first quartile, and Q3 is
the value of the third quartile.
Solve for Q1.
n = 9
Q1 = [n/4 + (1 – 1/4)]th score
Q1 = [9/4 + 3/4]th score
Q1 = [12/4]th score
Q1 = 3rd score
Q1 = 10
QD = (Q3 – Q1) / 2
QD = (15 – 10) / 2
QD = 5/2
QD = 2.5
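Assuming the same nine scores as the earlier ungrouped-quantile example (the data table is not reproduced at this point in the text), the computation is:

```python
scores = sorted([6, 8, 10, 12, 12, 14, 15, 16, 20])
n = len(scores)

q1 = scores[int(n / 4 + (1 - 1 / 4)) - 1]        # position 3 -> 3rd score = 10
q3 = scores[int(3 * n / 4 + (1 - 3 / 4)) - 1]    # position 7 -> 7th score = 15

iqr = q3 - q1       # inter-quartile range = 5
qd = iqr / 2        # quartile deviation = 2.5
```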
b. Quartile Deviation of Grouped Data
𝑸𝟑 − 𝑸𝟏
𝑸𝑫 =
𝟐
Example 2: The data given below are the scores of fifty (50) students in Filipino
class. Solve for the value of quartile deviation (QD).
X F cf<
25 – 32 3 3
33 – 40 7 10
41 – 48 5 15
49 – 56 4 19
57 – 64 12 31
65 – 72 6 37
73 – 80 8 45
81 – 88 3 48
89 – 97 2 50
n = 50
Solve for the value of Q1.

n/4 = 50/4 = 12.5

Q1C = 41 – 48
LL = 41
LB = 40.5
cfp = 10
fq = 5
c.i = 8

Q1 = LB1 + [(n/4 – cfp1) / fq1] c.i
Q1 = 40.5 + [(12.5 – 10) / 5] 8
Q1 = 40.5 + (2.5/5) 8
Q1 = 40.5 + 20/5
Q1 = 40.5 + 4
Q1 = 44.5
Solve for the value of Q3.

3n/4 = 3(50)/4 = 37.5

Q3C = 73 – 80
LL = 73
LB = 72.5
cfp = 37
fq = 8
c.i = 8

Q3 = LB3 + [(3n/4 – cfp3) / fq3] c.i
Q3 = 72.5 + [(37.5 – 37) / 8] 8
Q3 = 72.5 + (0.5/8) 8
Q3 = 72.5 + 4/8
Q3 = 72.5 + 0.5
Q3 = 73.00
QD = (Q3 – Q1) / 2
QD = (73 – 44.5) / 2
QD = 28.5/2
QD = 14.25
The larger the value of the IQR or QD, the more dispersed the scores at the middle
50% of the distribution. On the other hand, if the IQR or QD is small, the scores are less
dispersed at the middle 50% of the distribution. The point of dispersion is the median
value.
When the values of the IQR and QD are small, the scores are clustered within the
middle 50% of the score distribution; when they are large, the scores are dispersed in
the middle 50% of the distribution. To determine which group of scores is more
clustered or dispersed, you should compare it with another group of scores, since
there is no standard for a small or large value of the IQR and QD.
Mean deviation measures the average deviation of the values from the arithmetic
mean. It gives equal weight to the deviation of every score in the distribution.
a. Mean Deviation for Ungrouped Data

MD = Ʃ|x – x̄| / n

Where,
MD = mean deviation value
x = individual score
x̄ = sample mean
n = number of cases
x        x – x̄      |x – x̄|
35 13.8 13.8
30 8.8 8.8
26 4.8 4.8
24 2.8 2.8
20 -1.2 1.2
18 -3.2 3.2
18 -3.2 3.2
16 -5.2 5.2
15 -6.2 6.2
10 -11.2 11.2
Ʃx = 212            Ʃ|x – x̄| = 60.4
x̄ = Ʃx / n
x̄ = 212 / 10
x̄ = 21.2
MD = Ʃ|x – x̄| / n
MD = 60.4 / 10
MD = 6.04
Analysis:
The mean deviation of the 10 scores of students is 6.04. This means that, on the
average, the scores deviate from the mean of 21.2 by 6.04.
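The same computation can be checked with a short Python sketch, using the ten scores from the table above:

```python
# Mean deviation of the ten scores from the worked example
scores = [35, 30, 26, 24, 20, 18, 18, 16, 15, 10]

mean = sum(scores) / len(scores)                    # 212 / 10 = 21.2
md = sum(abs(x - mean) for x in scores) / len(scores)
print(round(mean, 2), round(md, 2))  # 21.2 6.04
```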
b. Mean Deviation for Grouped Data

MD = Σf|Xm − x̄| / n

Where,
f = class frequency
Xm = class mark (midpoint of each class interval)
x̄ = mean value
n = number of cases

Solve first for the mean value:

x̄ = ΣfXm / n
x̄ = 1345 / 40
x̄ = 33.63
MD = Σf|Xm − x̄| / n
MD = 425.22 / 40
MD = 10.63
Analysis:
The mean deviation of the 40 scores of students is 10.63. This means that, on the
average, the scores deviate from the mean of 33.63 by 10.63.
Population Variance

σ² = Σ(x − µ)² / N

Sample Variance

s² = Σ(x − x̄)² / (n − 1)
Steps in Solving Variance of Ungrouped Data
Example 1: Using the data below, find the variance and standard deviation of the
scores of 10 students in a science quiz. Interpret the result.
    x       x − x̄     (x − x̄)²
    19       4.4       19.36
    17       2.4        5.76
    16       1.4        1.96
    16       1.4        1.96
    15       0.4        0.16
    14      −0.6        0.36
    14      −0.6        0.36
    13      −1.6        2.56
    12      −2.6        6.76
    10      −4.6       21.16
  Σx = 146           Σ(x − x̄)² = 60.40

x̄ = 14.6
Note: If the standard deviation is already solved, square the value of the standard
deviation to get the variance.
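As a quick check, the variance computation can be sketched in Python. The score list below assumes the full set of ten scores consistent with Σx = 146 (which requires two scores of 14):

```python
# Population and sample variance of the science-quiz scores
scores = [19, 17, 16, 16, 15, 14, 14, 13, 12, 10]

mean = sum(scores) / len(scores)               # 146 / 10 = 14.6
ss = sum((x - mean) ** 2 for x in scores)      # sum of squared deviations
pop_var = ss / len(scores)                     # population: divide by N
samp_var = ss / (len(scores) - 1)              # sample: divide by n - 1
print(round(pop_var, 2), round(samp_var, 2))  # 6.04 6.71
```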
Example 2: The score distribution below shows the test results of 40 students in a
Filipino class on a test consisting of 50 items. Solve for the variance and standard
deviation and interpret the result.

Population Variance

σ² = Σf(Xm − µ)² / N
σ² = 2 606.4 / 40
σ² = 65.16

Sample Variance

s² = Σf(Xm − x̄)² / (n − 1)
s² = 2 606.4 / 39
s² = 66.83
Population Standard Deviation

σ = √[Σ(x − µ)² / N]

Sample Standard Deviation

s = √[Σ(x − x̄)² / (n − 1)]

Note: If the variance is already solved, take the square root of the variance to get
the value of the standard deviation.
Example: Using the data in example 1, solve for the population and sample
standard deviation.

σ = √[Σ(x − µ)² / N]
σ = √(60.40 / 10)
σ = √6.04
σ = 2.46

s = √[Σ(x − x̄)² / (n − 1)]
s = √(60.40 / 9)
s = √6.71
s = 2.59
4. Multiply the squared difference by the corresponding class frequency.
5. Find the sum of the results in step 4.
6. Solve for the population standard deviation or sample standard
deviation using the formula for grouped data.
σ = √[Σf(Xm − µ)² / N]
σ = √(2 606.4 / 40)
σ = √65.16
σ = 8.07

s = √[Σf(Xm − x̄)² / (n − 1)]
s = √(2 606.4 / 39)
s = √66.8308
s = 8.18
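Once Σf(Xm − x̄)² is known, the grouped-data computation reduces to two square roots; a Python sketch using the value from the 40-student example:

```python
import math

ss = 2606.4  # sum of f * (Xm - mean)^2, taken from the worked example
n = 40

pop_sd = math.sqrt(ss / n)         # population: divide by N
samp_sd = math.sqrt(ss / (n - 1))  # sample: divide by n - 1
print(round(pop_sd, 2), round(samp_sd, 2))  # 8.07 8.18
```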
1. If the value of the standard deviation is large, the scores in the distribution will,
on the average, be far from the mean. Therefore, the scores are spread out around
the mean value. The distribution is also described as heterogeneous.
2. If the value of the standard deviation is small, the scores in the distribution will,
on the average, be close to the mean. Hence, the scores are less dispersed, or the
scores in the distribution are homogeneous.
Going back to the diagram presented, let us answer the questions posed using the
concepts of the mean and standard deviation with the help of the diagram below.

Section A: Mean = 18.25, s = 5.15
Section B: Mean = 18.25, s = 6.92
Section C: Mean = 18.25, s = 7.63

(The diagram plots each section's scores on a number line from 12 to 30.)
1. What can you observe about the mean and the standard deviation of the
three groups of scores?
Answer: The mean of the three groups of scores is the same which is equal to
18.25 and the standard deviation of section A = 5.15, section B = 6.92, and
section C = 7.63.
2. Which group of students performed well in the class?
Answer: In terms of performance, the three sections of students perform the
same because they have the same mean value of 18.25.
3. Which group of scores is most widespread? Less scattered?
Answer: The standard deviation of section A = 5.15, section B = 6.92 and
section C = 7.63. The scores that are most scattered are those in section C
because they have the largest value of standard deviation which is equal to
7.63. On the other hand, the less scattered group of scores is in section A
which has the smallest value of the standard deviation which is equal to 5.15.
Therefore, the smaller the value of the standard deviation, the closer, on the
average, the scores are to the mean value; and the larger the value of the standard
deviation, the more scattered the scores are from the mean value.
Using the diagram, there are more scores that are closer to the mean
value in section A than in section B and section C.
The formula for computing the coefficient of variation is:

CV = (s / x̄) × 100%
Where,
𝑥̅ = mean value
s = standard deviation
Example:

CVA = (s / x̄) × 100%
CVA = (5.15 / 18.25) × 100%
CVA = 28.22%

CVB = (s / x̄) × 100%
CVB = (6.92 / 18.25) × 100%
CVB = 37.92%

CVC = (s / x̄) × 100%
CVC = (7.63 / 18.25) × 100%
CVC = 41.81%
Analysis:
The scores in section A are less scattered than the scores in section B and section
C. In other words, the scores in section A are more homogeneous than the scores in
section B and section C. Another way to interpret this is that the scores in section C are
more spread out than the scores in section A and section B, or the scores in section C are
more heterogeneous than the scores in section A and section B.
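The three coefficients of variation can be reproduced with a short Python sketch, using the means and standard deviations given above:

```python
# Coefficient of variation for the three sections (values from the text)
mean = 18.25
for name, s in [("A", 5.15), ("B", 6.92), ("C", 7.63)]:
    cv = s / mean * 100  # CV = (s / mean) * 100%
    print(name, round(cv, 2))  # A 28.22, B 37.92, C 41.81
```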
Measures of Skewness
Positively skewed, or skewed to the right, is a distribution where the thin tail
of the graph extends to the right part of the curve. This happens when most of the scores
of the students are below the mean.

Negatively skewed, or skewed to the left, is a distribution where the thin tail of
the graph extends to the left part of the curve. This happens when most of the scores got
by the students are above the mean.
(Figure: in a positively skewed distribution the order along the scale is mode (x̂),
median (x̃), mean (x̄); in a negatively skewed distribution it is mean (x̄), median (x̃),
mode (x̂).)
A negatively skewed distribution means that the students who took the
examination performed well. Most of the scores are high and there are only a few low
scores. The shape of the score distribution indicates the performance of the students
but not the reasons why most of the students got high scores. The possible reasons why
students got high scores are: the group of students is smart, there was enough time to
finish the examination, the test was very easy, instruction was effective, and the
students prepared themselves for the examination.
Example 1: Find the coefficient of skewness of the scores of 40 grade 6 pupils in
a 100-item test in Mathematics if the mean is 82 and the median is 90 with standard
deviation of 15.
Given:
𝑥̅ = 82
𝑥̃ = 90
s = 15
SK = 3(x̄ − x̃) / s
SK = 3(82 − 90) / 15
SK = 3(−8) / 15
SK = −24 / 15
SK = −1.60
Analysis:
The coefficient of skewness is −1.60, which means that the score distribution is
negatively skewed; most of the pupils scored above the mean of 82.
Given:
𝑥̅ = 46
𝑥̃ = 40
s = 7.5
SK = 3(x̄ − x̃) / s
SK = 3(46 − 40) / 7.5
SK = 3(6) / 7.5
SK = 18 / 7.5
SK = 2.40
Analysis:
The coefficient of skewness is 2.40, which means that the score distribution is
positively skewed; most of the students scored below the mean of 46.
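Both coefficients of skewness can be reproduced with a small Python sketch; the function name `sk` is my own shorthand for the formula SK = 3(mean − median)/s used above:

```python
def sk(mean, median, s):
    """Coefficient of skewness: SK = 3(mean - median) / s."""
    return 3 * (mean - median) / s

print(round(sk(82, 90, 15), 2))   # -1.6  (negatively skewed)
print(round(sk(46, 40, 7.5), 2))  # 2.4   (positively skewed)
```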
Normal Distribution
In a normal distribution, the mean, median, and mode are equal: x̄ = x̃ = x̂.
When you add up the percentages of the baseline between three s units above
and three s units below the mean, you come up with 99.98%. Let us evaluate the area
under the normal curve between the mean and the standard deviation as indicated in
the diagram. The percentage of cases that falls between the mean value and the value of
the mean plus one standard deviation unit in a normal distribution of scores is 34.13%,
and the percentage of cases that falls between the mean value and the value of the mean
minus one standard deviation is likewise 34.13%.
(Figure: normal curve with the baseline values 58, 62, 66, 70, 74, 78, 82, 86, 90.)

From the given illustration, with a mean equal to 74 and a standard deviation of 4,
four points are added for each standard deviation unit above the mean (78, 82, 86, 90)
and four points are subtracted from the mean value for each standard deviation unit
below the mean (70, 66, 62, 58). Approximately 68.26%, or 68%, of the scores in the
distribution fall between 70 and 78 (34.13% on each side of the mean), as shown in the
figure.
Using the normal curve, about 95.44% of the students got scores from 66 to 82,
that is, within two standard deviation units of the mean (13.59% + 34.13% + 34.13% +
13.59%).

Likewise, 47.72%, or about 48%, of the students got a score from 74 to 82
(34.13% + 13.59%).
We can also use the normal curve to determine the percentage of the scores of
students below or above a certain score. For example, 15.86%, or 16%, of the students
got a score below 70 (0.13% + 2.14% + 13.59%). This can also be stated as: a score of 70
is at the 16th percentile.
(Figure: normal curve over the baseline 58, 62, 66, 70, 74, 78, 82, 86, 90, with the
area percentages 0.13%, 2.14%, 13.59%, and 34.13% marked in each region.)
About 84.12% or 84% of the scores are below 78. This can be written as P84 = 78.
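These tabled percentages can be checked against the exact normal cumulative distribution. The sketch below uses Python's `math.erf`; the exact values (15.87% and 84.13%) differ slightly from the 15.86% and 84.12% obtained by summing the rounded region percentages:

```python
import math

def pct_below(score, mean, sd):
    """Percentage of cases below a score under the normal curve (via erf)."""
    z = (score - mean) / sd
    return 0.5 * (1 + math.erf(z / math.sqrt(2))) * 100

# Mean 74, standard deviation 4, as in the running example
print(round(pct_below(70, 74, 4), 2))  # 15.87 -> about 16% score below 70
print(round(pct_below(78, 74, 4), 2))  # 84.13 -> about 84% score below 78
```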
Standard Scores
In this section, we shall discuss the different kinds of converted scores. The
procedures for converting raw scores to standard scores are presented in this section.
There are four (4) types of standard scores: z-score, t-score, standard nine (stanines),
and percentile ranks.
Scores directly obtained from the test are known as actual scores or raw scores.
Such scores cannot be interpreted as low, average, or high. Scores
must be converted or transformed so that they become meaningful and allow some kind
of interpretation and direct comparison of two scores. Consider the two figures
below:
(Figure A: a distribution over the scale 20, 50, 80. Figure B: a distribution over the
scale 65, 80, 95.)

The shape of the two score distributions above is the same; however, the means
and standard deviations are different. This happens because the range of the scores
in figure A differs from that in figure B. In this case, the scores in these figures cannot be
compared directly because they belong to two different groups.
The raw scores of all students in Business Calculus and Production
Management are very important so that we can get the information that describes both
score distributions. Based on our previous discussion, the mean value and the
standard deviation are necessary to describe a distribution of scores. Let us add the
mean values and standard deviations of the scores of students in Business Calculus and
Production Management as shown:
          Business Calculus    Production Management
    x            92                     88
    x̄            95                     80
    s             3                      4
Ritz Glenn’s score in Business Calculus is three (3) points below the class mean
performance and eight (8) points above the class mean performance in Production
Management. Using the mean value, we can say that Ritz Glenn performed better in
Production Management than in Business Calculus compared with the performance of
the rest of his classmates. How about the standard deviation? The standard deviation
enables us to know what percentage of the scores lies above or below each score in
the distribution.
Assuming that the scores in Business Calculus and Production Management are
normally distributed, let us construct a curve that represents the given data. The normal
curve model is used as a basis to compare distributions with different means and
different standard deviations.
The shaded area represent the percentage of the scores lower than the score of
Ritz Glenn. In Business Calculus the score of Ritz Glenn is one standard deviation unit
below the mean and in Production Management his score is two standard deviation
units above the mean. To determine the exact percentage of the scores below the score
of Ritz Glenn in Business Calculus and Production management use the normal curve
model.
15.86%, or approximately 16%, of the scores are below the score of Ritz Glenn in
Business Calculus, or his score is at the 16th percentile.

(Figure: normal curve with the baseline marked −4s, −3s, −2s, −1s, Mean, 1s, 2s, 3s, 4s.)
1. z-scores
To get more exact information about the performance of Ritz Glenn, collect the
raw score, mean, and standard deviation, and determine how far below or above the
mean, in standard deviation units, the obtained raw score is.
To determine the exact position of each score in the normal distribution use z-
score formula. The z-score is used to convert a raw score to standard score to
determine how far a raw score lies from the mean in standard deviation units. From this
we can also determine whether an individual student performs well in the examination
compared to the performance of the whole class.
The z-score value indicates the distance between the given raw score and the
mean value in units of the standard deviation. The z-value is positive when the raw
score is above the mean while the z is negative when the raw score is below the mean.
z = (x − x̄) / s   (sample)        z = (x − µ) / σ   (population)

Where,
z = z-value
x = raw score
x̄ = sample mean
s = sample standard deviation
µ = population mean
σ = population standard deviation
The z-score formula is very essential when we compare the performance of a
student in his subjects or the performance of two students that belong to different
groups. It can determine the exact location of a score, whether above or below the
mean, and how many standard deviation units it is from the mean.
Example: Using the data about Ritz Glenn’s scores in Business Calculus and
Production Management, solve the z-score value.
z = (x − x̄) / s

z-score of Business Calculus (BC)

zBC = (92 − 95) / 3
zBC = −3 / 3
zBC = −1
z-score of Production Management (PM)
zPM = (88 − 80) / 4
zPM = 8 / 4
zPM = +2
Analysis:
The score of Ritz Glenn in Business Calculus is one standard deviation unit below
the mean. His score in Production Management is two standard deviation units above
the mean. Therefore, we can conclude that Ritz Glenn performed better in Production
Management than in Business Calculus.
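The two z-scores can be reproduced with a minimal Python sketch (the function name `z_score` is my own):

```python
def z_score(x, mean, s):
    """Distance of a raw score from the mean, in standard deviation units."""
    return (x - mean) / s

z_bc = z_score(92, 95, 3)  # Business Calculus
z_pm = z_score(88, 80, 4)  # Production Management
print(z_bc, z_pm)  # -1.0 2.0
```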
2. T-scores
There are two possible values of z-score, positive z if the raw score is above the
mean and negative z if the raw score is below the mean. To avoid confusion between
negative and positive value, use T-score to convert raw scores. T-score is another type
of standard score where the mean is 50 and the standard deviation is 10. In z-score the
mean is 0 and the standard deviation is one (1). To convert raw score to T-score, find
first the z-score equivalent of the raw score and use the formula T-score = 10z + 50.
From the above discussion, the z-score in Business Calculus is −1 and the z-score
in Production Management is +2. Solve for the T-score equivalents:

T-scoreBC = 10z + 50
T-scoreBC = 10(−1) + 50
T-scoreBC = −10 + 50
T-scoreBC = 40

T-scorePM = 10z + 50
T-scorePM = 10(2) + 50
T-scorePM = 20 + 50
T-scorePM = 70
Analysis:
Ritz Glenn’s T-score in Business Calculus is 40 and his T-score in Production
Management is 70, which again shows that he performed better in Production
Management.

(Figure: the z-score scale from −4 to 4 aligned with the T-score scale from 10 to 90.)
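The conversion T = 10z + 50 is a one-line rescaling; a Python sketch:

```python
def t_score(z):
    """Rescale a z-score to the T scale (mean 50, standard deviation 10)."""
    return 10 * z + 50

print(t_score(-1), t_score(2))  # 40 70
```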
3. Standard Nine
The third type of standard score is the Standard Nine point scale which is also
known as stanine, the origin word is sta(ndard) + nine. A stanineis a nine-point grading
scale ranging from 1 to 9, 1 being the lowest and 9 the highest. Stanine grading is easier
to understand than the other standard score model. The descriptive interpretation of
stanine 1, 2, 3 is below average, the stanine 4, 5, 6 is interpreted as average and the
descriptive interpretation of stanine 7, 8, 9 is above average. Use this graph as a basis of
analysis stanine results.
(Figure: normal curve divided into the nine stanines at the z-values −1.75, −1.25,
−0.75, −0.25, 0.25, 0.75, 1.25, 1.75, with the 5th stanine containing 20% of the cases.)
From the given figure, the stanine scale is centered on the mean; the central (5th)
stanine extends 0.25 standard deviation on either side of the mean, and each other
interval is 0.5 standard deviation wide except for the end tails of the normal curve.
The figure below indicates the percentage of scores in each region of the normal
bell-shaped curve: 0.13%, 2.14%, 13.59%, 34.13%, 34.13%, 13.59%, 2.14%, and 0.13%.
4. Percentile Rank
Another way of converting a raw score to a standard score is the percentile rank.
A percentile rank indicates the percentage of scores that lie below a given score. For
example, a test score that is greater than 95% of the scores of the examinees is said to be
at the 95th percentile. If the scores are normally distributed, the percentile rank can be
inferred from the standard score. In solving for the percentile rank, use the formula:

PR = [(CFb + 0.5Fg) / n] × 100
Where,
PR = percentile rank
CFb = cumulative frequency below the given score
Fg = frequency of the given score
n = total number of cases

Solving the percentile rank by hand is tedious, so we can shorten the solution
using the SPSS or Excel programs, which are easier to use and cheaper than other
software.
1. Arrange the test scores (TS) from highest to lowest.
2. Make a frequency distribution of the scores and the number of students
obtaining each score (F).
3. Find the cumulative frequency (CF) by adding the frequencies from the bottom
upward.
4. Find the percentile rank (PR) of each score using the formula; the results are
indicated in the fourth column of the final table.
Example: The table below shows a summary of the scores of 40 students in a 45-
item multiple-choice test. Find the percentile rank of each score in the
distribution.
TS F
45 1
43 2
42 2
41 1
40 1
39 2
37 3
36 2
34 1
33 2
32 2
30 3
29 4
28 1
27 1
25 2
24 1
22 2
21 2
19 1
18 2
16 1
15 1
n = 40
Find the cumulative frequency of the frequency distribution. The third column
represents the cumulative frequency.
TS F CF
45 1 40
43 2 39
42 2 37
41 1 35
40 1 34
39 2 33
37 3 31
36 2 28
34 1 26
33 2 25
32 2 23
30 3 21
29 4 18
28 1 14
27 1 13
25 2 12
24 1 10
22 2 9
21 2 7
19 1 5
18 2 4
16 1 2
15 1 1
n = 40
Find the percentile rank of each score.
a. Solution:
Score = 45
CFb = 39
Fg = 1
n = 40
PR = [(CFb + 0.5Fg) / n] × 100
PR = [(39 + 0.5(1)) / 40] × 100
PR = [(39 + 0.5) / 40] × 100
PR = (39.5 / 40) × 100
PR = 0.9875 × 100
PR = 98.75
PR ≈ 99
Analysis:
A raw score of 45 is equal to percentile rank of 99. This means that 99% of the
students who took the examination had raw scores equal to or lower than 45. This can
be written as PR99 = 45.
b. Solution:
Score = 43
CFb = 37
Fg = 2
n = 40
PR = [(CFb + 0.5Fg) / n] × 100
PR = [(37 + 0.5(2)) / 40] × 100
PR = [(37 + 1) / 40] × 100
PR = (38 / 40) × 100
PR = 0.95 × 100
PR = 95
Analysis:
A raw score of 43 is equal to a percentile rank of 95. This means that 95% of the
students who took the examination had raw scores equal to or lower than 43. This can
be written also as PR95 = 43.
c. Solution:
Score = 42
CFb = 35
Fg = 2
n = 40
PR = [(CFb + 0.5Fg) / n] × 100
PR = [(35 + 0.5(2)) / 40] × 100
PR = [(35 + 1) / 40] × 100
PR = (36 / 40) × 100
PR = 0.9 × 100
PR = 90
Analysis:
A raw score of 42 is equal to a percentile rank of 90. This means that 90% of the
students who took the examination had raw scores equal to or lower than 42. This can
be written also as PR90 = 42.
Note: Continue solving the percentile ranks of each score in the distribution as an
exercise and compare your answers with the percentile rank distribution on the
succeeding page.
When converting raw scores to percentile ranks, the raw scores are put on a
scale that has the same meaning across different groups and for different lengths of
tests.
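The four steps above can be automated; the sketch below (my own helper, `percentile_ranks`) applies the PR formula to each score in a frequency table listed from highest to lowest:

```python
def percentile_ranks(freq_pairs, n):
    """freq_pairs: (score, frequency) listed from highest to lowest score.
    Applies PR = (CFb + 0.5 * Fg) / n * 100 to every score."""
    ranks = {}
    cf = n  # cumulative frequency at and below the current score
    for score, f in freq_pairs:
        cfb = cf - f  # cumulative frequency below the given score
        ranks[score] = round((cfb + 0.5 * f) / n * 100)
        cf = cfb
    return ranks

# Top three scores of the 40-student table
print(percentile_ranks([(45, 1), (43, 2), (42, 2)], 40))
# {45: 99, 43: 95, 42: 90}
```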
TS F CF PR
45 1 40 99
43 2 39 95
42 2 37 90
41 1 35 86
40 1 34 84
39 2 33 80
37 3 31 74
36 2 28 68
34 1 26 64
33 2 25 60
32 2 23 55
30 3 21 49
29 4 18 40
28 1 14 34
27 1 13 31
25 2 12 28
24 1 10 24
22 2 9 20
21 2 7 15
19 1 5 11
18 2 4 8
16 1 2 4
15 1 1 1
n = 40
(Figure: the normal curve with aligned scales — raw scores 40 to 80 (Mean = 60,
s = 5), z-scores −4 to 4, T-scores 10 to 90, and stanines 1 to 9.)
Relationship between Different Standard Scores
This figure shows the relationship between the raw scores and the converted
scores, assuming that the distribution is normally distributed. The score distribution
has a mean of 60 and a standard deviation of 5. Using these parameters, consider a raw
score of 75: this raw score lies three standard deviations above the mean, which is
equivalent to a z-score of 3, a T-score of 80, and a stanine of 9. This can be verified using
the procedures discussed in the previous sections.
DESCRIBING RELATIONSHIPS
Correlation
Another statistical method used in analyzing test results is correlation. This
is the tool that we utilize if we want to determine the relationship or
association between the scores of students in two different subjects. Is there a
relationship between the Mathematics scores and the Science scores of 15 students?
What type of linear relationship exists between the two sets of scores? Such questions
can be answered using the concepts of correlation. In this chapter, we discuss the
different ways of computing the correlation coefficient when raw scores and ordinal
levels of measurement are given. The graphical method, or scattergram, of determining
the relationship between two groups of scores is also discussed in this section, but only
limited to linear relationships.
Correlation refers to the extent to which two distributions are linearly related or
associated. The extent of correlation is indicated numerically by the correlation
coefficient (rxy), also known as the Pearson Product Moment Correlation Coefficient in
honor of Karl Pearson, who developed the formula. The correlation coefficient ranges
from −1 to +1. There are three kinds of correlation based on the correlation coefficient:
(1) positive correlation; (2) negative correlation; and (3) zero correlation. There are two
ways of identifying the correlation between two variables: (1) using the formula; and
(2) using a scatter plot or scattergram.
Kinds of Correlation
1. Positive Correlation
High scores in distribution x are associated with high scores in distribution y; low
scores in distribution x are associated with low scores in distribution y. This means that
as the value of x increases, the value of y increases too, or as the value of x decreases,
the y values also decrease. The line of best fit through the given points slopes upward to
the right, as shown in the scattergram of positive correlations. The slope of the line is
positive.
2. Negative Correlation
High scores in distribution x are associated with low scores in distribution y; low
scores in distribution x are associated with high scores in distribution y. This means
that as the values of x increase, the values of y decrease, or when the values of x
decrease, the values of y increase. The line of best fit through the given points slopes
downward to the right, as shown in the scattergram of negative correlations. The slope
of the line is negative.
3. Zero Correlation
There is no association between scores in distribution x and scores in
distribution y. No single line can be drawn that best fits all the points, as shown in the
scattergram of zero correlation. No discernible pattern can be formed.
The formula for computing the correlation coefficient using the Pearson Product
Moment Correlation is:

rxy = [n(Σxy) − (Σx)(Σy)] / √{[n(Σx²) − (Σx)²][n(Σy²) − (Σy)²]}
Scattergram of Correlation
Another way of determining the correlation of pair of scores is through the use of
graphing. The graphical representation is called scattergram. Using your knowledge in
graphing ordered pairs in the coordinate plane; graph the scores of 8 students in
mathematics and science.
2 8
3 10
4 11
5 13
6 16
7 20
8 21
Analysis:
As math scores increase, science scores decrease. Using the given points in the
coordinate plane, a straight line downward to the right can be drawn that is best fitted
to all the points. Hence, the slope of the line is negative.
Scores
3 17
4 17
6 11
7 4
7 6
8 15
10 12
10 19
14 13
16 7
17 19
Analysis:
No discernible pattern can be formed from the given set of points. No single line
can be drawn that best fits all the points in the plane.
Computation of Correlation
rxy = [n(Σxy) − (Σx)(Σy)] / √{[n(Σx²) − (Σx)²][n(Σy²) − (Σy)²]}
rxy = [(10)(10 989) − (299)(333)] / √{[(10)(10 355) − (299)²][(10)(11 961) − (333)²]}
rxy = (109 890 − 99 567) / √[(103 550 − 89 401)(119 610 − 110 889)]
rxy = 10 323 / √[(14 149)(8 721)]
rxy = 10 323 / √123 393 429
rxy = 10 323 / 11 108.25949
rxy = 0.929308503
rxy ≈ 0.93
Analysis:
The value of the correlation coefficient is rxy = 0.93, which means that there is a
very high positive correlation between the scores of 10 students in mathematics and in
science. This means that students who are good in mathematics are also good in science.
Another way of finding the correlation between two variables is the Spearman
rho correlation coefficient, denoted by the Greek letter rho (ƿ). The Spearman rho
correlation coefficient (ƿ) is a measure of correlation used when the given sets of data
are expressed in an ordinal level of measurement rather than as raw scores as in the
Pearson r. Spearman rho (ƿ) was first derived by a British psychologist named
Spearman; in honor of him, the formula was named Spearman’s rho.
ƿ = 1 − [6ΣD² / N(N² − 1)]

Where,
ƿ = Spearman rho correlation coefficient value
D = difference between a pair of ranks
N = number of students/cases
Example: Ten (10) aspirants for the Gabuyo Scholarship at YAG University were
ranked on their mathematics scores and English scores. Solve for the value of ƿ to the
nearest hundredth. The data are tabulated below:
Rank the scores in mathematics and the scores in science, find the difference of
each pair of ranks, and square the difference. Find the summation of D² and solve for
the ƿ value.
Solution:
ƿ = 1 − [6ΣD² / N(N² − 1)]
ƿ = 1 − [6(34) / 10(10² − 1)]
ƿ = 1 − [204 / 10(100 − 1)]
ƿ = 1 − [204 / 10(99)]
ƿ = 1 − (204 / 990)
ƿ = 1 − 0.21
ƿ = 0.79
Analysis:
The ƿ value is 0.79, which indicates a high positive correlation between the
mathematics scores and science scores of the ten aspirants for the Gabuyo Scholarship.
The students who are good in mathematics are also good in science.
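The Spearman computation above can be sketched in Python (the function name `spearman_rho` is my own; it assumes rank lists without ties, as in the example):

```python
def spearman_rho(rank_x, rank_y):
    """Spearman rho: 1 - 6 * sum(D^2) / (N * (N^2 - 1))."""
    n = len(rank_x)
    d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * d2 / (n * (n * n - 1))

# With the sums from the scholarship example (sum of D^2 = 34, N = 10),
# the formula evaluates to:
print(round(1 - 6 * 34 / (10 * (10 ** 2 - 1)), 2))  # 0.79
```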
CHAPTER 6
Learning Outcomes
1. Define the following terms: validity, reliability, content validity, construct validity,
criterion-related validity, predictive validity, concurrent validity, test-retest
method, equivalent/parallel form method, split-half method, Kuder-Richardson
formula, validity coefficient, and reliability coefficient;
2. Discuss the different approaches of validity;
3. Present and discuss the different methods of solving the reliability of a test;
4. Identify the different factors affecting the validity of the test;
5. Identify the factors affecting the reliability of the test;
6. Compute the validity coefficient and reliability coefficient; and
7. Interpret the reliability coefficient and validity coefficient of the test.
INTRODUCTION
Test constructors believe that every assessment tool should possess good
qualities. Most of the literature considers validity and reliability to be the most common
technical concepts in assessment. Any type of assessment, whether traditional or
authentic, should be carefully developed so that it may serve whatever purpose it is
intended for. In this chapter, we shall discuss the different ways of establishing validity
and reliability.
VALIDITY OF A TEST
learning.

This means validity is the appropriateness of the score-based inferences or
decisions made based on the students’ test results. It is the extent to which a test
measures what it is supposed to measure.
When the assessment tool provides information that is irrelevant to the learning
objectives it was intended to help assess, it makes the interpretation of the test results
invalid. Teachers must carefully select procedures, performance criteria, and settings
for all forms of assessment, especially performance-based assessment, so that fairness
to all students is maintained. Assessing a student’s performance on the basis of personal
characteristics rather than on the performance of the student lowers the validity of the
assessment.
Types of Validity
3. Construct Validity. A type of validation that refers to the measure of the extent
to which a test measures a theoretical and unobservable variable or quality, such
as intelligence, math achievement, performance anxiety, and the like, over a
period of time, on the basis of gathered evidence. It is established through
intensive study of the test or measurement instrument using
convergent/divergent validation and factor analysis.
a. Convergent validity is a type of construct validation wherein a test has a high
correlation with another test that measures the same construct.
b. Divergent validity is a type of construct validation wherein a test has a low
correlation with a test that measures a different construct. In this case, high
validity occurs only when there is a low correlation coefficient between the
tests that measure different traits.
c. Factor analysis is another method of assessing the construct validity of a
test, using complex statistical procedures conducted with different sets of data.
There are other ways of assessing construct validity, such as a test’s internal
consistency, developmental change, and experimental intervention.
1. Validity refers to the decisions we make, and not to the test itself or to the
measurement.
2. Like reliability, validity is not an all-or-nothing concept; it is never totally absent
or absolutely perfect.
3. A validity estimate, called a validity coefficient, refers to a specific type of
validity. It ranges between 0 and 1.
4. Validity can never be finally determined; it is specific to each administration of
the test.
8. Unintended clues
9. Improper arrangement of test items
VALIDITY COEFFICIENT
The validity coefficient is the computed value of rxy. In theory, the validity
coefficient, like the correlation coefficient, ranges from 0 to 1. In practice, most validity
coefficients are usually small; they range from 0.3 to 0.5, and few exceed 0.6 to 0.7.
Hence, there is a lot of room for improvement in most of our psychological
measurements.
    Teacher James    Criterion (Bejamine
    Test (x)         Test) (y)          xy        x²        y²
12 16 192 144 256
22 25 550 484 625
23 31 713 529 961
25 25 625 625 625
28 29 812 784 841
30 28 840 900 784
33 35 1 155 1 089 1 225
42 40 1 680 1 764 1 600
41 45 1 845 1 681 2 025
37 40 1 480 1 369 1 600
26 33 858 676 1 089
44 45 1 980 1 936 2 025
36 40 1 440 1 296 1 600
29 35 1 015 841 1 225
37 41 1 517 1 369 1 681
Ʃx = 465 Ʃy = 508 Ʃxy = 16 702 Ʃx2 = 15 487 Ʃy2 = 18 162
rxy = [n(Σxy) − (Σx)(Σy)] / √{[n(Σx²) − (Σx)²][n(Σy²) − (Σy)²]}
rxy = [(15)(16 702) − (465)(508)] / √{[(15)(15 487) − (465)²][(15)(18 162) − (508)²]}
rxy = (250 530 − 236 220) / √[(232 305 − 216 225)(272 430 − 258 064)]
rxy = 14 310 / √[(16 080)(14 366)]
rxy = 14 310 / √231 005 280
rxy = 14 310 / 15 198.85785
rxy = 0.941518
rxy ≈ 0.94
Interpretation:
The correlation coefficient is 0.94, which means that the validity of the test is
high, or 88.36% of the variance in the students’ performance can be attributed to the
test.
RELIABILITY OF A TEST
Reliability refers to the consistency with which a test yields the same rank for
individuals who take it more than once (Kubiszyn and Borich, 2007). That is, how
consistent test results or other assessment results are from one measurement to
another. We can say that a test is reliable when it can be used to predict practically the
same scores when the test is administered twice to the same group of students, with a
reliability index of 0.60 or above.
1. Length of the test
2. Moderate item difficulty
3. Objective scoring
4. Heterogeneity of the student group
5. Limited time
the internal consistency of a test is the KR-21 formula, which is not limited to test
items that are scored dichotomously.
RELIABILITY COEFFICIENT

Kuder-Richardson Formula 20 (KR-20):

KR20 = [k / (k − 1)] [1 − Σpq / σ²]

Where,
k = number of items
p = proportion of the students who got the item correct (index of difficulty)
q = 1 − p
σ² = variance of the test scores

Kuder-Richardson Formula 21 (KR-21):

KR21 = [k / (k − 1)] [1 − x̄(k − x̄) / (kσ²)]

Where,
k = number of items
x̄ = mean value
σ² = variance of the test scores
Interpreting Reliability Coefficients
1. The group variability will affect the size of the reliability coefficient. Higher
coefficients result from heterogeneous groups than from homogeneous groups.
As group variability increases, reliability goes up.
2. Scoring reliability limits test score reliability. If tests are scored unreliably, error
is introduced. This will limit the reliability of the test scores.
3. Test length affects test score reliability. As the length increases, the test’s
reliability tends to go up.
4. Item difficulty affects test score reliability. As test items become very easy or
very hard, the test’s reliability goes down.
    Reliability Coefficient    Interpretation
    Above 0.90                 Excellent reliability
    0.81 – 0.90                Very good for a classroom test
    0.71 – 0.80                Good for a classroom test; there are probably a few
                               items that need to be improved
    0.61 – 0.70                Somewhat low; the test needs to be supplemented by
                               other measures (more tests) for grading
    0.51 – 0.60                Suggests need for revision of the test, unless it is quite
                               short (ten or fewer items); needs to be supplemented
                               by other measures (more tests) for grading
    0.50 and below             Questionable reliability; this test should not contribute
                               heavily to the course grade, and it needs revision
Let us discuss the steps in solving the reliability coefficient using the different
methods of establishing the validity and reliability of tests, with the help of the
following examples.
Example 1: A teacher administered a test to his 10 students twice, with a one-day
interval between administrations. The scores in the first test (FT) and second test (ST)
were gathered. Using the test-retest method, is the test reliable?

Student FT ST
1 36 38
2 26 34
3 38 38
4 15 27
5 17 25
6 28 26
7 32 35
8 35 36
9 12 19
10 35 38
Using the Pearson r formula, find Σx, Σy, Σxy, Σx², and Σy².
Solution:
rxy = [n(Ʃxy) − (Ʃx)(Ʃy)] / √{[n(Ʃx²) − (Ʃx)²][n(Ʃy²) − (Ʃy)²]}

rxy = [(10)(9 192) − (274)(316)] / √{[(10)(8 332) − (274)²][(10)(10 400) − (316)²]}

rxy = 0.91
Analysis:
The reliability coefficient using the Pearson r is 0.91, which means that the test
has very high reliability. The scores of the 10 students tested twice with a one-day
interval are consistent. Hence, the test has very high reliability.
Note: Compute the reliability coefficient of the same data using the Spearman rho
formula. Is the test reliable?
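As a cross-check on Example 1, the raw-score Pearson r computation can be sketched in Python; the list and function names are mine, and the data are the FT and ST scores above:

```python
from math import sqrt

# First-test (FT) and second-test (ST) scores of the 10 students in Example 1
ft = [36, 26, 38, 15, 17, 28, 32, 35, 12, 35]
st = [38, 34, 38, 27, 25, 26, 35, 36, 19, 38]

def pearson_r(x, y):
    """Pearson product-moment correlation via the raw-score formula."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sx2 = sum(a * a for a in x)
    sy2 = sum(b * b for b in y)
    return (n * sxy - sx * sy) / sqrt((n * sx2 - sx**2) * (n * sy2 - sy**2))

print(round(pearson_r(ft, st), 2))  # 0.91
```

The intermediate sums the function produces (Ʃx = 274, Ʃy = 316, Ʃxy = 9 192, Ʃx² = 8 332, Ʃy² = 10 400) match the values substituted in the worked solution.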
Example 2: Prof. Vinci Glenn administered a test to his 10 students in his Biology
class two times with a one-week interval. The test given after one week is a parallel
form of the test conducted the first time. The scores below were gathered in the first
test (FT) and the second or parallel test (PT). Using the equivalent or parallel-form
method, is the test reliable? Show the complete solution using the Pearson r formula.
Student   FT   PT
1         12   20
2         20   22
3         19   23
4         17   20
5         25   25
6         22   20
7         15   19
8         16   18
9         23   25
10        21   24
Using the Pearson r formula, find the Ʃx, Ʃy, Ʃxy, Ʃx², and Ʃy².
rxy = [n(Ʃxy) − (Ʃx)(Ʃy)] / √{[n(Ʃx²) − (Ʃx)²][n(Ʃy²) − (Ʃy)²]}

rxy = [(10)(4 174) − (190)(216)] / √{[(10)(3 754) − (190)²][(10)(4 724) − (216)²]}

rxy = 0.76
Analysis:
The reliability coefficient using the Pearson r is 0.76, which means that the test
has high reliability. The scores of the 10 students tested twice with a one-week interval
are consistent. Hence, the test has high reliability.
Note: Compute the reliability coefficient of the same data using Spearman rho
formula. Is the test reliable?
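For the note above, Spearman rho can be sketched in Python using the Example 2 scores. Tied scores receive their average rank; with ties present, the Ʃd² formula is the common classroom approximation rather than an exact rank correlation:

```python
# Spearman rho for the parallel-form data in Example 2 (variable names are mine)
ft = [12, 20, 19, 17, 25, 22, 15, 16, 23, 21]
pt = [20, 22, 23, 20, 25, 20, 19, 18, 25, 24]

def ranks(scores):
    """Rank from lowest (1) to highest; tied scores share their average rank."""
    s = sorted(scores)
    # average of the first and last rank positions a value occupies
    return [(s.index(v) + 1 + len(s) - s[::-1].index(v)) / 2 for v in scores]

n = len(ft)
# sum of squared differences between paired ranks
d2 = sum((a - b) ** 2 for a, b in zip(ranks(ft), ranks(pt)))
rho = 1 - (6 * d2) / (n * (n ** 2 - 1))
print(round(rho, 2))  # 0.79
```

A rho of about 0.79 agrees in direction with the Pearson r of 0.76 computed above, so the parallel-form scores rank the students consistently as well.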
Example 3: Prof. Glenn Lord administered a test to his 10 students in his Chemistry
class. The test was given only once. The students’ scores on the odd items (O) and the
even items (E) were gathered as shown below. Using the split-half method, is the
test reliable? Show the complete solution.
O    E
20   23
18   22
19   25
26   24
20   18
18   17
Use the formula r_ot = 2r_oe / (1 + r_oe) to find the reliability of the whole test, and find
the Ʃx, Ʃy, Ʃxy, Ʃx², Ʃy² to solve for the correlation between the odd and even test items.
rxy = [n(Ʃxy) − (Ʃx)(Ʃy)] / √{[n(Ʃx²) − (Ʃx)²][n(Ʃy²) − (Ʃy)²]}

rxy = [(10)(4 249) − (200)(211)] / √{[(10)(4 096) − (200)²][(10)(4 533) − (211)²]}

rxy = 0.33
r_ot = 2(0.33) / (1 + 0.33)
r_ot = 0.66 / 1.33
r_ot = 0.50
Analysis:

The reliability coefficient using the Spearman-Brown formula is 0.50, which indicates
questionable reliability. Hence, the test items should be revised.
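The split-half steps in Example 3 can be sketched as follows. Only the summary sums from the worked solution are used, since the odd/even table above is reproduced only in part; the variable names are mine:

```python
from math import sqrt

# Summary values from Example 3: odd-half (x) and even-half (y) scores
n, sum_xy, sum_x, sum_y, sum_x2, sum_y2 = 10, 4249, 200, 211, 4096, 4533

# Step 1: Pearson r between the odd and even halves
r_oe = (n * sum_xy - sum_x * sum_y) / sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))

# Step 2: the Spearman-Brown correction estimates the reliability of the
# whole test from the correlation between its two halves
r_ot = 2 * r_oe / (1 + r_oe)

print(round(r_oe, 2), round(r_ot, 2))  # 0.33 0.5
```

The correction is needed because each half is only half as long as the full test, and shorter tests are less reliable; r_oe alone would understate the whole test's reliability.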
Example 4: Ms. Gauat administered a 40-item test in English to her Grade VI
pupils in Malanao Elementary School. Below are the scores of the 15 pupils. Find the
reliability using the Kuder-Richardson (KR-21) formula.
s² = [15(11 917) − (405)²] / [15(14)]
s² = (178 755 − 164 025) / 210
s² = 14 730 / 210
s² = 70.14

Mean = Ʃx/n = 405/15 = 27

KR21 = [40/(40 − 1)][1 − 27(40 − 27)/((40)(70.14))]
     = [40/39][1 − 27(13)/((40)(70.14))]
     = 1.03[1 − 351/2 805.60]
     = 1.03[1 − 0.1251]
     = 1.03[0.8749]
KR21 = 0.90
Analysis:
The reliability coefficient using the KR-21 formula is 0.90, which means that the test
has very good reliability; that is, the test is very good for a classroom test.
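Example 4 can be cross-checked with a short Python sketch; the variable names are mine, and the sums come from the worked solution above:

```python
k = 40   # number of items
n = 15   # number of pupils
sum_x, sum_x2 = 405, 11917  # sums taken from the worked solution

# Variance of the scores: s² = [n·Σx² − (Σx)²] / [n(n − 1)]
s2 = (n * sum_x2 - sum_x ** 2) / (n * (n - 1))
mean = sum_x / n

# KR-21 needs only the number of items, the mean, and the variance
kr21 = (k / (k - 1)) * (1 - (mean * (k - mean)) / (k * s2))

print(round(s2, 2), mean, round(kr21, 2))  # 70.14 27.0 0.9
```

The unrounded coefficient is about 0.897; the 0.90 in the worked solution comes from rounding 40/39 to 1.03 early, so the two agree to two decimals.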
KR20 = [k/(k − 1)][1 − Ʃpq/s²]
The first thing to do is to solve the difficulty index of each item and the variance of
the total scores.

p = n/N, where n is the number of students who answered the item correctly and N is
the total number of students
q = 1 – p
Example 5: Mr. Mark Anthony administered a 20-item true-or-false test to his
English IV class. Below are the item-analysis data for the 40 students. Find the
reliability coefficient using the KR-20 formula and interpret the computed value; solve
also for the coefficient of determination.
Item No.   x    p      q      pq         x²
1          25   0.625  0.375  0.234375   625
2          36   0.9    0.1    0.09       1 296
3          28   0.7    0.3    0.21       784
4          23   0.575  0.425  0.244375   529
5          25   0.625  0.375  0.234375   625
6          33   0.825  0.175  0.144375   1 089
7          38   0.95   0.05   0.0475     1 444
8          15   0.375  0.625  0.234375   225
9          23   0.575  0.425  0.244375   529
10         25   0.625  0.375  0.234375   625
11         36   0.9    0.1    0.09       1 296
12         35   0.875  0.125  0.109375   1 225
13         19   0.475  0.525  0.249375   361
14         39   0.975  0.025  0.024375   1 521
15         28   0.7    0.3    0.21       784
16         33   0.825  0.175  0.144375   1 089
17         19   0.475  0.525  0.249375   361
18         37   0.925  0.075  0.069375   1 369
19         36   0.9    0.1    0.09       1 296
20         25   0.625  0.375  0.234375   625
Total      578                3.38875    17 698
p of item 1 = 25/40 = 0.625
Note: Continue the same procedures up to the last item.
KR20 = [20/(20 − 1)][1 − 3.38875/52.31]
KR20 = [20/19][1 − 0.06478]
KR20 = [20/19][0.93522]
KR20 = [1.05263][0.93522]
KR20 = 0.9844
KR20 = 0.98
Interpretation:
The reliability coefficient using the KR-20 formula is 0.98, which means that the test
has very high or excellent reliability.

Coefficient of determination = r² = (0.98)² = 0.9604 = 96.04%
Interpretation:
96.04% of the variance in the students’ performance can be attributed to the test.
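The KR-20 computation in Example 5 can likewise be sketched in Python. The per-item p values follow from x/40, and the variance of the total scores (52.31) is taken as given in the example; the variable names are mine:

```python
N = 40  # number of examinees who took the 20-item test
# x = number of students who answered each of the 20 items correctly
x = [25, 36, 28, 23, 25, 33, 38, 15, 23, 25,
     36, 35, 19, 39, 28, 33, 19, 37, 36, 25]

k = len(x)   # number of items
s2 = 52.31   # variance of the total scores (given in the example)

# p = proportion who got the item right, q = 1 − p; sum pq over all items
sum_pq = sum((xi / N) * (1 - xi / N) for xi in x)

kr20 = (k / (k - 1)) * (1 - sum_pq / s2)

# Coefficient of determination, from the rounded reliability coefficient
r2 = round(kr20, 2) ** 2

print(round(sum_pq, 5), round(kr20, 2), round(r2, 4))  # 3.38875 0.98 0.9604
```

Recomputing Ʃpq directly from the item counts reproduces the table's 3.38875, which is a useful check that the per-item pq values were copied correctly.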
CHAPTER 7
INTRODUCTION
One of the alternative methods of rating the performance of students, aside
from the paper-and-pencil test, is the use of scoring rubrics. Scoring rubrics are
used when judging the quality of the work of the learners on performance assessments.
A rubric is a form of scoring guide used in evaluating the performance of students or the
products resulting from a performance task. Scoring rubrics are very important in
assessing the performance of students using performance-based assessment and
portfolio assessment. In this chapter we shall discuss scoring rubrics, performance-
based assessment, and portfolio assessment.
SCORING RUBRICS
One common use of rubrics is when teachers evaluate the quality of an
essay. When there are no criteria to be followed, the judgment of one evaluator differs
from that of another: one evaluator might put much weight on the content of the topic,
while another might give a high mark for the organization of the paper. If we are going
to evaluate the quality of an essay, the rating must combine these factors. Without them,
the evaluators judge the paper subjectively. To avoid this, the evaluator
must develop predetermined criteria for evaluation purposes so that the subjectivity
of evaluating is lessened and the process becomes more objective.
Types of Rubrics
In this section, we shall discuss the two types of rubrics: the holistic rubric and
the analytic rubric.
A holistic rubric is a type of rubric that requires the teacher to score the overall
process or product as a whole (Nitko, 2001; Mertler, 2001). In this case, the evaluator
views the final product as a set of interrelated tasks contributing to the whole. Using a
holistic rubric in scoring the performance or product of the students provides an overall
impression of the quality of any given product. Some of its advantages are quick scoring
and an overview of students’ performance. However, it does not provide detailed
feedback about the performance of the students on specific criteria.
A teacher can use a holistic rubric when he wants a quick snapshot of the
performance of the students and when a single dimension is adequate to define the
quality of their performance.
The teacher can use an analytic rubric when he wants to see the relative strengths
and weaknesses of the students’ performance in each criterion, to give detailed feedback,
to assess complicated performances, or when he wants the students to conduct
self-assessment of their understanding of their own performance.
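As a rough sketch of the distinction, an analytic rubric keeps a score per criterion and derives the overall mark from them, while a holistic rubric records a single overall level. The criteria names and scores below are hypothetical:

```python
# Hypothetical analytic rubric scores for one essay (1-4 per criterion)
analytic_scores = {
    "claim": 4,
    "reasons": 3,
    "organization": 2,
    "conventions": 3,
}

# Analytic scoring keeps each criterion visible for feedback...
for criterion, score in analytic_scores.items():
    print(f"{criterion}: {score}/4")

# ...and the overall mark is derived from them (here, a simple total)
total = sum(analytic_scores.values())
print(f"total: {total}/{4 * len(analytic_scores)}")

# A holistic rubric, by contrast, records only one overall level
holistic_score = 3
```

The per-criterion breakdown is what lets the analytic rubric show a student that, for example, the claim was strong while the organization needs work, which a single holistic level cannot convey.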
Mertler (2001), in his article “Designing Scoring Rubrics for Your Classroom,”
suggested steps in developing rubrics for classroom use in the assessment of
performances, processes, products, or both process and product. The information for
these procedures was compiled from various sources (Airasian, 2000, 2001;
Montgomery, 2001; Nitko, 2001; Tombari & Borich, 1999). The steps are
summarized and discussed below, followed by two sample scoring rubrics.
1. Reexamine the learning objectives to be addressed by the task. This allows you to
match your scoring guide with your objectives and actual instruction.
2. Identify specific observable attributes that you want to see (as well as those you
don’t want to see) your students demonstrate in their product, process, or
performance. Specify the characteristics, skills, or behaviors that you will be
looking for, as well as common mistakes you do not want to see. The teacher
must carefully identify the qualities that need to be displayed in the student’s
work to demonstrate proficient performance.
3. Brainstorm characteristics that describe each attribute. Identify ways to describe
above average, average, and below average performance for each observable
attribute identified in step 2.
For holistic rubrics, write thorough narrative descriptions for excellent work
and poor work, incorporating each attribute into the description. Describe the
highest and lowest levels of performance, combining the descriptors for all
attributes.
For analytic rubrics, write thorough narrative descriptions for excellent work
and poor work for each individual attribute. Describe the highest and lowest
levels of performance using the descriptors for each attribute separately.
For holistic rubrics, complete the rubric by describing the other levels on the
continuum that ranges from excellent to poor work for the collective attributes.
Write descriptions for all intermediate levels of performance.
For analytic rubrics, complete the rubric by describing the other levels on the
continuum that ranges from excellent to poor work for each attribute. Write
descriptions for all intermediate levels of performance for each attribute
separately.
4. Collect samples of student work that exemplify each level. These will help you
score in the future by serving as benchmarks.
5. Revise the rubric, as necessary. Be prepared to reflect on the effectiveness of the
rubric and revise it prior to its next implementation.
Mertler (2001), in his article “Designing Scoring Rubrics for Your Classroom,”
also suggested templates for holistic rubrics and analytic rubrics.
Samples of a holistic rubric and an analytic rubric, adapted from various authors and
websites, are presented below. The following examples, for assessing a persuasive essay
and an invention report, are adapted from a leading author on rubrics, Heidi Goodrich
Andrade (1997).
Criteria and quality levels (4 = highest, 1 = lowest) for the persuasive essay:

Makes a claim
4 – I make a claim and explain why it is controversial.
3 – I make a claim but don’t explain why it is controversial.
2 – I make a claim but it is buried, confused, or unclear.
1 – I do not make a claim.

Gives reasons in support of the claim
4 – I give clear and accurate reasons in support of the claim.
3 – I give reasons in support of the claim, but overlook important reasons.
2 – I give 1 or 2 reasons which don’t support the claim well, and/or confusing reasons.
1 – I do not give convincing reasons in support of the claim.

Considers reasons against the claim
4 – I thoroughly discuss reasons against the claim and explain why the claim is valid anyway.
3 – I discuss reasons against the claim, but leave out important reasons and/or don’t explain why the claim still stands.
2 – I acknowledge that there are reasons against the claim but don’t explain them.
1 – I do not discuss reasons against the claim.

Relates the claim to democracy
4 – I discuss how democratic principles and democracy can be used both in support of and against the claim.
3 – I discuss how democratic principles and democracy can be used to support the claim.
2 – I say that democracy and democratic principles are relevant but do not explain how or why clearly.
1 – I do not mention democratic principles or democracy.

Organization
4 – My writing is well organized, has a compelling opening, a strong informative body, and a satisfying conclusion. Has appropriate paragraph format.
3 – My writing has a clear beginning, middle, and end. I generally use appropriate paragraph format.
2 – My writing is usually organized but sometimes gets off topic. Has several errors in paragraph format.
1 – My writing is aimless and disorganized.

Word choice
4 – The words I use are striking but natural, varied, and vivid.
3 – I use mostly routine words.
2 – My words are dull, uninspired, or they sound like I am trying too hard to impress.
1 – I use the same words over and over…. Some words may be confusing.

Sentence fluency
4 – My sentences are clear, complete, and of different lengths.
3 – I wrote well-constructed but routine sentences.
2 – My sentences are often flat or awkward. Some run-ons and fragments.
1 – Many run-ons, fragments, and awkward phrasings make my essay hard to read.

Conventions
4 – I use first person form, and I use correct sentence structure, grammar, punctuation, and spelling.
3 – My spelling is correct on common words. Some errors in grammar and punctuation. I need to revise it again.
2 – Frequent errors are distracting to the reader but do not interfere with the meaning of my paper.
1 – Many errors in grammar, capitalization, spelling, and punctuation make my paper hard to read.
Criteria and quality levels (4 = highest, 1 = lowest) for the invention report:

Purposes
4 – The report explains the key purposes of the invention and points out less obvious ones as well.
3 – The report explains all of the key purposes of the invention.
2 – The report explains some of the purposes of the invention but misses key purposes.
1 – The report does not refer to the purposes of the invention.

Features
4 – The report details both key and hidden features of the invention and explains how they serve several purposes.
3 – The report details the key features of the invention and explains the purposes they serve.
2 – The report neglects some features of the invention or the purposes they serve.
1 – The report does not detail the features of the invention or the purposes they serve.

Critique
4 – The report discusses the strengths and weaknesses of the invention, and suggests ways in which it can be improved.
3 – The report discusses the strengths and weaknesses of the invention.
2 – The report discusses either the strengths or the weaknesses of the invention but not both.
1 – The report does not mention the strengths or the weaknesses of the invention.

Connections
4 – The report makes appropriate connections between the purposes and features of the invention and many different kinds of phenomena.
3 – The report makes appropriate connections between the purposes and features of the invention and one or two phenomena.
2 – The report makes unclear or inappropriate connections between the invention and other phenomena.
1 – The report makes no connections between the invention and other things.
PERFORMANCE-BASED ASSESSMENT
Performance-based assessment represents a set of strategies for the application of
knowledge, skills, and work habits through the performance of tasks that are meaningful
and engaging to students (Hibbard, 1996; Brualdi, 1998, “Implementing Performance
Assessment in the Classroom”). From the definitions of these two well-known authors,
students are required to perform a task rather than select an answer from a given list of
options. Performance assessment also gives the teacher information about how the
students understand and apply knowledge, and allows the teacher to integrate
performance assessment into the instructional process to provide additional learning
activities for the students in the classroom.
A paper-and-pencil test measures learning indirectly. When measuring factual
knowledge, or when solving well-structured mathematical problems, it is better to use a
paper-and-pencil test. In this case, the teacher asks questions which indicate skills that
have been learned or mastered; such tests usually assess low-level thinking skills, at or
just beyond the recall level. Performance-based assessment, on the other hand, is a
direct measure of learning or competence. It indicates that cognitively complex outcomes
and affective and psychomotor skills have been mastered. Examples of performances that
can be judged or rated directly by evaluators are preparing a microscope slide in a
laboratory class, performing gymnastics or a dance in a physical education class, a
cooking demonstration, and diving in a swimming class. In these kinds of activities, the
teacher observes and rates the students based on their performances, and the teacher
or evaluator provides immediate feedback on how the students carried out their
performance task.
PORTFOLIO ASSESSMENT
The portfolio should represent a collection of the students’ best work or best efforts,
student-selected samples of work experiences related to the outcomes being assessed,
and documents of growth and development toward mastering the identified
outcomes.
A portfolio (National Education Association, 1993) is a record of learning that
focuses on the student’s work and her/his reflection on that work. Material is collected
through a collaborative effort between the student and staff members and is indicative
of progress toward the essential outcomes.
PART II
The second part of this book is a summative assessment. The questions serve as
a reviewer in preparation for the Licensure Examination for Teachers (LET), and are all
applications of the concepts in “Assessment of Learning” or summative assessment.

Direction: Write the letter of the correct answer before the number. Write the
letter E if the correct answer is not among the options. No erasures.
____1. Teacher Marivic discovered that her students are weak in sentence construction.
Which test should Teacher Marivic administer to determine in what other skill(s)
her pupils are weak?
A. Placement Test
B. Formative Test
C. Diagnostic Test
D. Summative Test
____2. Teacher Christopher will construct a periodic exam for his Algebra subject. Which
of the following should he consider first?
____3. Which test item is most appropriate to attain Teacher Karl’s lesson
objective “multiply fractions and reduce the product to lowest term”?
c. 15/45 d. 2/6
D. The sum of 3/5 and 2/3 is ____.
a. 4/15 b. 19/15 c. 5/15 d. 5/8
____4. “Group the following items according to order” can be classified as what type of
question _____?
A. Evaluating
B. Generalizing
C. Classifying
D. Inferring
____5. Which of the following test formats does NOT belong to the group?
A. Short answer
B. Multiple choice
C. True or false
D. Matching type
____6. The results of the National Achievement Test (NAT) are interpreted against a set
mastery level. This means that the NAT is categorized as a ____ test.
I. Criterion-referenced
II. Norm-referenced
A. Criterion-referenced only
B. Norm-referenced only
C. Either criterion-referenced or norm-referenced
D. Neither criterion-referenced nor norm-referenced
____7. Using statements I to IV, which of the following is NOT true about the matching
type of test?
____9. Teacher Ace constructed a matching type test. In his column of descriptions are
combinations of presidents, senators, cabinet members, current issues, and
sports. Which rule of constructing a matching type of test was NOT followed?
____10. Which of the following statements is TRUE when the standard deviation is large?
____11. When Teacher Gerald used the table of specification in constructing his periodic
test, which of the following characteristics of a good test is assured for his students?
A. Administrability
B. Construct Validity
C. Content Validity
D. Reliability
____12. Teacher Luis wants to test his students’ ability to speak extemporaneously.
Which of the following is the most valid assessment tool?
A. Table of Skewness
B. Table of Specifics
C. Table of Species
D. Table of specification
____15. Given the scores: 94, 83, 83, 91, 94, 86, 80, 82, 81, 83, 85. What does the score 83
in the distribution represent?
A. Median
B. Mean and mode
C. Mode only
D. Median and mode
____16. Read the sample test item below and answer the question that follows:
A. Infancy
B. Preschool period
C. Before adolescence
D. During adolescence
E. After adolescence
What makes the test item poor?
____18. Which of the following can diagnose more weaknesses of the students?
A. Portfolio assessment
B. Traditional assessment
C. Performance assessment
D. Analytic rubric
A. Peter was able to get 90 items correctly out of 100 items in mathematics.
B. Fitch performed better in the test in mathematics than 88% of his classmates.
C. Fitch was able to solve **% of the problem correctly.
D. Glenn solved 9 problems out of 15 problems correctly.
____22. Scores of 8 students were: 86, 78, 89, 90, 88, 98, 95, 88. What is the mean value?
A. 87
B. 88
C. 89
D. 90
____24. Which of the following statements best describes the performance of the students
when their scores are negatively skewed?
____25. Teacher Renzel conducted an item analysis of his examination in Mathematics.
The facility index of item number 6 is 0.65. What does this imply about item number 6?
A. Moderately difficult
B. Easy
C. Difficult
D. Very difficult
____26. The discrimination index of a test item is -0.25. What does this mean?
A. More students in the lower group got the item correctly than those in the
upper group.
B. More students in the upper group got the item correctly than those in the
lower group.
C. The number of students in the lower and upper groups who got the item is
equal.
D. More students from the upper group got the item incorrectly.
____27. Teacher Jhonson gave a test in English. Most of the students got scores above the
mean. What is the graphical representation of their scores?
B. Skewed to the left
C. Mesokurtic
D. Normally distributed
____28. Teacher Dominic gave a 50-item test in English. The mean performance of the
group is 27 and the standard deviation is 5. Franz obtained a score of 31. Which
of the following best describes his performance?
A. Below average
B. Average
C. Above average
D. Outstanding
____29. The supervisor is talking about “grading on the curve” in a district meeting. What
does this expression mean?
____30. Joseph’s score is within 𝑥̅ ± 1 𝑆𝐷. To which of the following groups does he
belong?
A. Below average
B. Average
C. Needs improvement
D. Above average
____31. The computed r = 0.93 for scores in English and Math. What does this mean?
____32. Teacher Kristy conducted an item analysis for her test questions in English. She
found out that item number 10 has a difficulty index of 0.45 and a discrimination index
of 0.37. What should teacher Kristy do with item number 10?
____33. About how many percent of the scores fall between -2SD and +2SD units of its
mean?
A. 34%
B. 68%
C. 95%
D. 99%
____34. Which of the following statements best describes a skewed score distribution?
A. sd = 1.5
B. sd = 1.65
C. sd = 1.75
D. sd = 2.0
____36. Mark’s raw score in the TLE class is 93, which is equal to the 96th percentile. What
does this imply?
____37. Which type of assessment is most appropriate for assessing learning difficulties?
A. Formative assessment
B. Placement assessment
C. Summative assessment
D. Diagnostic assessment
A. Portfolio
B. Completion test
C. True or false test
D. Multiple-choice test
A. Authentic assessment
B. Numerical grading
C. Grading sheet
D. Scoring rubric
A* B C D
Upper 27% 12 5 8 10
Lower 27% 9 6 12 8
A. Option B
B. Option C
C. Option D
D. Option A
____42. The table shows that as a result of the analysis the test item ____.
____43. Based on the table in situation 1, which of the options should be revised?
A. Options B
B. Option C
C. Option D
D. Option A
A. Very easy
B. Easy
C. Moderately easy
D. Difficult
A. Lower group
B. Upper group
C. Could not be determined, data are insufficient
D. None of the above
____46. In which subject(s) did Angel perform most poorly in relation to the group’s
mean performance?
A. English
B. PE
C. Music
D. Mathematics
A. Bodily kinesthetic
B. Logical
C. Linguistic
D. Musical
A. English
B. PE
C. Music
D. Mathematics
Situation 3.For item 49-50. Read and analyze the matching type of test given
below.
Direction:Match column A with column B. Write only the letter of your answer
at the line on the left
Column A Column B
____1. December 25 A. Considered the 8th wonder of the world
____6. Banaue Rice Terraces F. The President of the Philippines who served the
longest
____52. Teacher Vinci wants to test his students’ ability to formulate ideas. Which type of
test should he develop?
B. Dependent on the batch of examinees
C. Affected by skewed distribution
D. Not affected by skewed distribution
A. The test item could not discriminate between the lower and upper groups.
B. More from the upper group got the item correctly
C. More from the lower group got the item correctly
D. The test item has low reliability
____57. While Teacher Vince was conducting a test, not one examinee approached him for
clarification on what to do. Which characteristic of a good test is applied?
A. Fairness
B. Objectivity
C. Administrability
D. Clarity
____58. Teacher Marie wanted to teach her pupils folk dancing. Her check-up test was a
written test on the steps of a folk dance. What characteristic of a good test does it lack?
A. Objectivity
B. Comprehensiveness
C. Validity
D. Reliability
____59. Teacher Mark Angelo used the table of specifications when he constructed his
periodic test, so it can be assumed that the test has ____.
A. Clarity
B. Content validity
C. Relevance
D. Reliability
____60. Which is the most reliable tool for seeing the development in a student’s ability
to sing?
A. Performance assessment
B. Self-assessment
C. Scoring rubric
D. Portfolio assessment
____61. In which competency did Teacher Grace’s students find greater ease? In the item
with a difficulty index of ____.
A. 0.31
B. 0.91
C. 0.55
D. 1.0
____62. The criterion of success in Teacher Harold’s objectives is that “the students must
be able to get 85% of the items correctly.” Ana and 24 others got 36 items correctly out
of 50. This means that Teacher Harold:
____63. The discrimination index of item #16 is -0.25. What does this imply?
A. An equal number from the lower and upper group got the item correctly.
B. More from the upper group got the item correctly.
C. More from the lower group got the item correctly.
D. More from the upper group got the item wrong.
____64. The discrimination index of item #18 is +0.35. What does this mean?
____65. The discrimination index of item #20 is 0. What does this mean?
C. More from the upper group got the item correctly.
D. More from the upper group got the item wrong.
A. It is a measure of variability.
B. It is the most stable measure of central tendency.
C. It is appropriate when there are extreme scores.
D. It is significantly affected by extreme values.
A. Median
B. Mode
C. mean
D. mode and median
____68. Here is a score distribution: 88, 85, 84, 83, 80, 75, 75, 73, 56, 55, 51, 51, 51, 34,
34, 20. Which of the following best describes the distribution?
A. Bimodal
B. Multimodal
C. Unimodal
D. Cannot be determined
____70. A test item has a difficulty index of 0.85 and discrimination index of -0.10. What
should the teacher do?
____71. Which measure(s) of central tendency is (are) most appropriate when the score
distribution is skewed?
A. Mode
B. Mean and median
C. Median
D. Mean
____72. In a one hundred-item test, what does Gil’s score of 70 mean?
A. Mean deviation
B. Median
C. Mode
D. Mean
____74. The sum of all the scores in a distribution is always equal to:
A. Mode
B. Mean
C. Median
D. Mean and median
____76. Study the table below then answer the question that follows.
A. In the interval 40 to 49
B. In between the intervals of 10-19 and 20-29
C. In the interval 30-39
D. In the interval 50-59
____77. Using data in #76, how many percent of the students got a score above 39?
A. 10%
B. 13%
C. 39%
D. 53%
____78. Robert’s raw score in the mathematics class is 45, which is equal to the 96th
percentile. What does this mean?
A. 25th percentile
B. 45th percentile
C. 70th percentile
D. 75th percentile
____82. Karla Marie obtained a NEAT percentile rank of 98. This means that:
C. He surpassed the performance of 95% of his fellow examinees.
D. He surpassed the performance of 5% of his fellow examinees.
____84. Mark Erick is 2.5 standard deviation above the mean of his group in Math and 1.5
standard deviation above in English. What does this imply?
A. The lower the standard deviation, the more spread out the scores are.
B. The higher the standard deviation, the less spread out the scores are.
C. The higher the standard deviation, the more spread out the scores are.
D. It is a measure of central tendency.
____86. Which group of scores is most varied? The group with ____.
A. sd = 1
B. sd = 2
C. sd = 3
D. sd = 4
A. Quartile deviation
B. Quartile
C. Correlation
D. Skewness
____88. Study the two sets of scores below and answer the question that follows:
A. The scores in set A are more spread out than those in set B.
B. The range for set B is 46.
C. The range for set A is 47.
D. The scores in set B are more spread out than those in set A.
____89. Skewed score distribution means:
A. Mesokurtic
B. Skewed to the right
C. Skewed to the left
D. Normally distributed
____91. Most students who took the examination got scores above the mean. The score
distribution is best described as a ____.
A. Normal curve
B. Platykurtic
C. Positively skewed
D. Negatively skewed
____93. Which of the following methods of establishing the reliability of a test is
questionable due to practice and familiarity?
A. Split-half
B. Parallel form
C. Test-retest
D. Kuder-Richardson
____94. Which assessment activity is most appropriate to measure the objective “to
explain the meaning of molecular bonding” for the group with strong interpersonal
intelligence?
A. Write down chemical formulas and show how they were derived.
B. Build several molecular structures with multicolored pop beads.
C. Draw diagram that show different bonding patterns.
D. Demonstrate molecular bonding using students as atoms.
____95. Which is the most reliable tool for detecting the development in your pupils’
ability to write?
A. Objective assessment
B. Self-assessment
C. Scoring rubric
D. Portfolio assessment
A. Objectivity
B. Scorability
C. Administrability
D. Reliability
____97. In which competency did my students find the greatest difficulty? In the item
with a difficulty index of ____.
A. 0.1
B. 0.9
C. 1.0
D. 0.5
____99. The discrimination index of test item is +.45. What does this mean?
____100. A test item has a difficulty index of .60 and a discrimination index of .40. What
should the teacher do?
C. Retain the item.
D. Make it a bonus item and reject it.
____101. When Teacher Grace conducted an item analysis, she found out that a
significantly greater number from the upper group of the class got test item number 5
correctly. This means that the test item ____.
____103. The discrimination index of a test item is 0. What does this mean?
A. The test item could not discriminate between the lower and upper.
B. More from the upper group got the item correctly.
C. More from the lower group got the item correctly.
D. The test item has low reliability.
____105. A test item has a difficulty index of 0.91 and a discrimination index of -0.20.
What should the teacher do?
____106. The computed r for scores in Math and Filipino is -0.43. What does this mean?
D. Filipino scores are slightly related to Math scores.
____107. The computed r for scores in English and Science is 0.66. What does this mean?
____108. The points on the scattergram of two variables are spread evenly in all
directions. This means that:
____109. Teacher Renzel found out that there is a negative correlation between the
scores in English and in Mathematics. What does this mean?
A. Direct evidence
B. Performing a task
C. Contrived
D. Real-life
____111. A short quiz conducted by Teacher Benjamin James to get feedback on how
much the students learned, but which will not be used for grading purposes, is classified
as a ____.
A. Diagnostic assessment
B. Placement assessment
C. Summative assessment
D. Formative assessment
____112. Teacher BJ set a 90% accuracy in a 25-item spelling test. Nike obtained a score
of 88% and this can be interpreted as ____.
B. He did not meet the set criterion by 2%.
C. He is higher than 88% of the group.
D. He is 2% short of the set percentile score.
____113. Teacher A conducted a test at the end of a lesson to find out if the objectives of
her lesson have been attained. Which of the following types of assessment must be
administered?
A. Formative assessment
B. Diagnostic assessment
C. Norm-referenced assessment
D. Criterion-referenced
A. Performance test
B. Norm-referenced test
C. Professional test
D. Criterion-referenced test
____115. A certain university wanted an entrance examination that can identify future
outcomes or differences such as who will graduate from college or who will drop out.
The test has ____.
A. Predictive validity
B. Content validity
C. Construct validity
D. Concurrent validity
I. Objectivity
II. Validity
III. Scorability
IV. Reliability
A. I, II, IV
B. II, IV
C. I, II, III
D. I, II, III, IV
____117. If one wants to establish the reliability of his test, which of the following will he
do?
____118. Which of the following is the main purpose of administering a pre-test and
post-test to the students?
____120. Teacher Benjie gave a 50-item test where the mean performance of the group is
40 and the standard deviation is 4. James obtained a score of 37. Which of the following
best describes his performance?
A. Below average
B. Average
C. Above average
D. Outstanding
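Item 120 is answered by expressing the score as a distance from the mean in standard deviation units, i.e. a z-score. A minimal sketch using the figures from the item:

```python
def z_score(raw, mean, sd):
    """Distance of a raw score from the mean, in standard deviation units."""
    return (raw - mean) / sd

# Item 120: mean = 40, SD = 4, James's score = 37
z = z_score(37, 40, 4)  # -0.75: within one SD below the mean
```

A score that falls within one standard deviation of the mean is conventionally described as average performance, even though it lies below the mean.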
____121. The discrimination index of a test item is 0.39. What does this imply?
A. More students in the lower group got the item correctly than those students
in the upper group.
B. More students in the upper group got the item correctly than those students
in the lower group.
C. The number of students in the lower group and upper group who got the
item is equal.
D. More students from the upper group got the item incorrectly.
____122. Teacher Nike constructed a matching type test. In his column of descriptions
are combinations of dates of events, current issues, and sports. Which rule of
constructing a matching type of test was NOT followed?
____123. Which of the following is/are true about the matching type of test?
____124. Teacher X discovered that his students are weak in solving age problems.
Which test should Teacher X administer to further determine in which other skill(s) his
pupils are weak?
A. Placement assessment
B. Diagnostic assessment
C. Formative assessment
D. Summative assessment
____125. Teacher May conducted an item analysis of her periodic test. She found out that
item number 6 is non-discriminating. What does this imply?
I. The item is very difficult and nobody got the correct answer.
II. The instruction is effective.
III. The item is very easy and everybody got the correct answer.
A. I only
B. I and II
C. I and III
D. III only
D. II and III
____127. Teacher JR conducted an item analysis of his periodic test. He found out that
item number 16 has a difficulty index of 0.41 and a discrimination index of 0.36. What
should Teacher JR do with item number 16?
A. 88.46
B. 88.50
C. 88.80
D. 89.00
____129. About what percent of the cases fall between -2SD and +2SD units from the
mean?
A. 68.26%
B. 95.44%
C. 99.72%
D. 99.98%
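The percentages in item 129 come from the areas under the normal curve, which can be computed from the error function: the proportion of cases within k SD units of the mean is erf(k/√2). A sketch (function name is an assumption):

```python
import math

def area_within(k):
    """Proportion of normally distributed cases within k SD units of the mean."""
    return math.erf(k / math.sqrt(2))

# +/-1 SD ~ 68.27%, +/-2 SD ~ 95.45%, +/-3 SD ~ 99.73%
within_2sd = area_within(2)  # ~0.9545, matching the familiar 95.44% figure
```

Reviewers often round these areas slightly differently (95.44% vs 95.45%), which is why printed choices can differ in the last digit.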
A. Formative assessment
B. Placement assessment
C. Summative assessment
D. Diagnostic assessment
____131. Most of the students got scores above the mean. What would be the graphical
representation of their scores?
A. Normally distributed
B. Skewed to the right
C. Negatively skewed
D. Positively skewed
A. Range
B. Quartile deviation
C. Mean deviation
D. Standard deviation
____135. Teacher A is talking about “grading on the curve” in a district meeting. What
does “grading on the curve” mean?
____136. Meryll’s raw score in the English class is 95, which is equal to the 98th percentile.
What does this mean?
____137. Which of the following statements is/are important in developing a scoring
rubric?
C. I, II, III
D. I, II, III, IV
A. It is analytical.
B. It is developmental.
C. It is holistic.
D. Neither analytical nor holistic.
____139. Which is the most reliable tool for seeing the development in your pupils’
ability to write?
A. Summative assessment
B. Performance-based assessment
C. Self-evaluation
D. Portfolio assessment
____140. The most appropriate tool to measure the performance in terms of how far a
score is above or below the mean or average.
A. Standard scores
B. Norm-referenced
C. Criterion-referenced
D. Raw scores
____141. Which of the following statements expresses the main purpose of a teacher’s
use of a standardized test?
____143. Which is the most important element in portfolio and performance-based
assessment?
A. Authentic assessment
B. Numerical grading
C. Letter grading
D. Scoring rubric
____144. Which of the following methods of establishing the reliability of a test is
questionable due to practice and familiarity effects?
A. Split-half
B. Parallel form
C. Test-retest
D. Kuder-Richardson
____146. This is the planned collection of samples of student work, assessment results,
and other outputs produced by the students.
A. Diary
B. Portfolio
C. Observation report
D. Anecdotal report
____147. Teacher Fitch Peter will construct a periodic test for his Biology subject. Which
of the following will he need to accomplish first?
____148. What type of error is committed by a researcher who commits a Type I error?
____149. Teacher Luis wants to test his students’ ability to speak extemporaneously.
Which of the following is the most valid assessment tool?
____150. Which of the following statements DOES NOT describe the present grading
system of elementary and secondary public schools?
A. The lowest possible failing grade that appears in the report card is 65%.
B. Students must master at least 75% of the competencies per subject.
C. A transmutation table is used in the computation of the percentage grade.
D. The averaging method is utilized in the computation of the final grade.