
Report by: Mendez Maricel B.

Subject: Assessment of Student Learning

Module 3: Principles of High-Quality Assessment


Teacher-Made Tests
Most of the tests students take are teacher-made tests, meaning that teachers design them.
These tests are associated with the grades on report cards. They are prepared by teachers to assess
their students' learning.
Tests should be balanced among the following:
a. Short answer/paragraph answer
b. Words/pictures/maps/diagrams/etc.
c. Easy/difficult questions
d. Factual knowledge/application of knowledge
e. Knowledge/skills
As much as possible, test questions should be given within a meaningful context.
Learning Targets
 State clearly what the child will be learning in all subject areas. These include Reading,
Language Arts, Mathematics, Science, Social Studies, Music, Physical Education, Health, Art,
and School Counseling.
2 Guiding Principles of High-Quality Assessment
1. Students are the key assessment users.
2. Clear and appropriate targets are essential.
In achieving clarity of learning targets, teachers must offer intentional teaching. Intentional teaching
means that all instruction and classroom activities are aimed at specific learning targets. Hence, teachers
need to begin with well-defined targets to be able to develop assessments that:
1. Reflect exactly what is taught.
2. Cover what students are expected to learn.
There are many benefits to building learning targets that are CLEAR and USABLE.
1. Knowledge targets begin with words like: know, list, name, identify, and recall.
Procedural knowledge targets call for knowing how to do something.
Ex: Use scientific notation to represent very large and very small numbers.
2. Reasoning targets deal with the skillful use or application of knowledge. These targets start
out with mental processes like: predict, infer, classify, hypothesize, compare, conclude,
summarize, analyze, evaluate, and generalize.
4 Types of Reasoning
a. Inductive reasoning uses specific facts or evidence to infer general conclusions.
b. Deductive reasoning begins with a general rule or principle to infer specific conclusions or
solutions.
c. Analytical reasoning requires examining components or structure of something. It is used
in almost every discipline in identifying parts and describing relationships among them.
d. Comparative reasoning describes similarities and differences between two or more items.
3. Performance skills targets require students to demonstrate their mastery of a learning
target in a way that can be observed.
4. Product targets are not used as frequently as other types but are highly valued, calling for
the creation of a product.
5. Dispositional targets rarely show up on state standards but are important because they reflect
students' attitudes about school and learning.
Establishing Learning Targets
Educational Goals
These are very general statements of what students will know and be able to do. Typically they
are written to cover large blocks of instructional time, such as a semester or a year. They provide a
starting point for more specific learning objectives.
Mathematics Examples
1. Students will learn to use mathematics to define and solve problems.
Science Examples
1. Students will learn to apply scientific research methods to investigate research
questions.
Educational Learning Objectives
These are more specific statements of what students will know and be able to do. They should
be written at the appropriate level of generality: not so general that they provide no instructional
guidance, but not so specific that they become time-consuming and confining. It is best to focus on a
unit of instruction, as opposed to a daily lesson plan, when framing the important learning you want
students to achieve in relation to your goal.
Types of Learning Targets
Marzano and Kendall identified 5 types of learning targets:
1. Knowledge and simple understanding.
2. Deep understanding and reasoning.
3. Skills.
4. Products.
5. Affective.
Sources of Learning Targets
1. Bloom’s Taxonomy
2. National, Regional and District Standards
3. Textbooks
Bloom’s Taxonomy (The 3 Types of Learning)
There is more than one type of learning. A committee of educators, led by Benjamin Bloom,
identified 3 domains of educational activities:
1. Cognitive – mental skills (knowledge)
2. Affective – growth in feelings or emotional areas (attitude)
3. Psychomotor – manual or physical skills (skills)
Learning outcomes in the cognitive domain are divided into two major classes:
1. Knowledge
2. Intellectual abilities and skills
The cognitive domain of the taxonomy is useful in planning the achievement test. It focuses on
comprehensive mental processes when identifying learning outcomes.
Stating General Learning Outcomes
The learning outcomes to be measured are useful in test construction when they are stated as
observable terminal performance. This means they should clearly indicate the student
performance to be demonstrated at the end of the learning experience.
Quellmalz (1987) presented a list of general learning outcomes, which are then defined in terms of
specific learning outcomes:
At the end of this unit in achievement test planning, the student will demonstrate
that he or she:
1. Knows the meaning of common terms;
2. Knows specific facts about test planning;
3. Knows the basic procedures for planning an achievement test;
4. Comprehends the relevant principles of testing; and
5. Applies the principles in test planning.
The next task for the teacher is to list the specific types of performance that
are to be accepted as evidence that the outcomes have been achieved. The teacher
has to ask himself what specific types of performance will show that a student knows
the meaning of common terms.
1. Knows the meaning of common terms
a. Identifies the correct definitions of terms
2. Comprehends the relevant principles of testing
a. Describes each principle in his or her own words
Cognitive Domain
The cognitive domain involves knowledge and the development of intellectual skills. This includes
the recall or recognition of specific facts, procedural patterns, and concepts that serve in the development
of intellectual abilities and skills.
6 major categories:
1. Knowledge. Recall data or information.
Ex: Recite a policy.
2. Comprehension. Understand the meaning, translation, interpolation, and interpretation of
instructions and problems. State a problem in one's own words.
Ex: Rewrite the principles of test writing.
3. Application. Use a concept in a new situation or unprompted use of an abstraction. Apply
what was learned in the classroom to novel situations in the workplace.
Ex: Apply laws of statistics to evaluate the reliability of a written test.
4. Analysis. Separate materials or concepts into component parts so that their organizational
structure may be understood. Distinguish between facts and inferences.
Ex: Gather information from a department and selects the required tasks for training.
5. Synthesis. Build a structure or pattern from diverse elements; put parts together to form a
whole, with emphasis on creating a new meaning or structure.
Ex: Write a company operations or process manual.
6. Evaluation. Make judgments about the value of ideas or materials.
Ex: Select the most effective solution.
Affective Domain
This domain includes the manner in which we deal with things emotionally, such as feelings,
values, appreciation, enthusiasms, motivations and attitudes.
5 major categories:
1. Receiving phenomena. Awareness, willingness to hear and selected attention.
Ex: Listen to others with respect.
2. Responding to phenomena. Active participation on the part of learners.
Ex: Participate in class discussions.
3. Valuing. The worth or value a person attaches to a particular object, phenomenon, or
behavior. This ranges from simple acceptance to the more complex state of commitment.
Ex: Show the ability to solve problems.
4. Organization. Organizes values into priorities by contrasting different values, resolving
conflicts between them, and creating a unique value system. The emphasis is on comparing,
relating, and synthesizing values.
Ex: Explain the role of systematic planning in solving problems.
5. Internalizing Values (characterization). The learner has a value system that controls his or her
behavior. The behavior is pervasive, consistent, predictable and, most importantly, characteristic
of the learner. Instructional objectives are concerned with the student's general patterns of
adjustment (personal, social, emotional).
Psychomotor Domain
It includes physical movement, coordination and use of the motor-skill areas. Development of
these skills requires practice and is measured in terms of speed, precision, distance, procedures or
techniques in execution.
7 major categories:
1. Perception. The ability to use sensory cues to guide motor activity. This ranges from sensory
stimulation, through cue selection to translation.
Ex: Adjust heat of stove to correct temperature by smell and taste of food.
2. Set. Readiness to act. It includes mental, physical, and emotional sets. These three sets are
dispositions that predetermine a person's response to different situations (sometimes called
mindsets).
Ex: Recognize one’s abilities and limitations.
3. Guided Response. The early stages in learning a complex skill that includes imitation, trial and
error. Adequacy of performance is achieved by practicing.
Ex: Perform a mathematical equation as demonstrated.
4. Mechanism. This is the intermediate stage in learning a complex skill. Learned responses
have become habitual and the movements can be performed with some confidence and
proficiency.
Ex: Use a personal computer.
5. Complex Overt Response. The skillful performance of motor acts that involve complex
movement patterns. Proficiency is indicated by a quick, accurate, and highly coordinated
performance, requiring a minimum of energy.
Ex: Operate a computer quickly and accurately.
6. Adaptation. Skills are well developed and the individual can modify movement patterns to fit
special requirements.
Ex: Respond effectively to unexpected experiences.
7. Origination. Creating new movement patterns to fit a particular situation or specific problem.
Learning outcomes emphasize creativity based upon highly developed skills.
Ex: Construct a new theory.
Learning Targets for Performance Assessment
3 types of standards:
1. Subject area
2. Thinking and reasoning
3. Lifelong learning
How to Select Assessment Targets?
Subject area standards focus on the content to be taught and mastered. For example, some
standards for earth and space content in science include:
1. Understand basic features of the Earth.
Thinking and reasoning standards are often embedded in content standards, but focus on
how students demonstrate their thinking processes and reasoning strategies.
Ex: Understands and applies basic principles of presenting an argument.
Lifelong learning standards focus on interpersonal and intrapersonal skills.
Interpersonal skills address how well learners communicate and work with others.
Intrapersonal standards address how one regulates one’s own attitudes, such as self-control.
Learning Targets for Reading
By the end of second grade…
1. Use skills and strategies
2. Understand what is read
3. Read fluently
4. Show effort to become a life-long reader
5. Spell correctly
Learning Targets for Math
By the end of second grade…
Content strands:
Number sense – Explore and use numbers (especially 0-100) through varied and multiple experiences,
including:
1. Number and numeration
2. Computation
3. Estimation
Measurement: attributes and dimensions
1. Identify and use appropriate measurement tools
2. Estimate and measure length, weight, capacity, time, and temperature using non-standard
units
3. Measure to the nearest whole unit
4. Make change from one dollar
What is a Test?
A test is a deliberate attempt by people to acquire information about themselves or others.
3 functions of tests:
1. They provide information that is useful for the improvement of instruction;
2. They inform administrative decisions; and
3. They serve guidance purposes.
The word “test” is usually used to describe a systematic procedure for obtaining a sample of
student behavior. It is a sample of behavior, products, answers, or performance from the
domain.
3 basic concepts in understanding what a test is:
1. A test focuses on a particular domain.
What is a domain?
A test is designed to measure a particular body of knowledge, skills, abilities, or performances
which are of interest to the test user. A test domain can represent something as simple as 4th grade
Mathematics, or a more abstract construct such as “intelligence.”
A construct is a theoretical idea developed to describe and to organize some aspect of
existing knowledge. Constructs allow teachers to describe and organize observed differences in
behavior among people.
2. A test is a sample of behavior, products, answers, or performances from the domain.
What is sampling? A test is a sample of behavior, products, or performances from a larger
domain of interest.
3. A test is made up of items.
Items sampled from a domain represent the basic building blocks of a test.
2 main types of items:
1. Selection types of items require the students to select the correct or the best answer
from the given options.
2. Supply types of test items are fill-in-the-blank or essay types.
Qualities of good test instruments:
1. Tests are better if they are relatively objective. A test is objective if, using the same scoring
key, whoever scores the test will arrive at the same score (assuming no clerical errors).
Objectivity represents the agreement of two or more competent judges, scorers, or test
administrators concerning a measurement; in essence, it is the reliability of test scores between
or among evaluators.
2. A good test should also be relatively reliable. To be reliable, the test must be relatively objective.
Reliability refers to the extent to which assessments are consistent.
Types of reliability
a. Test-retest reliability is a measure of reliability obtained by administering the same test twice
over a period of time to a group of individuals.
b. Parallel forms reliability is a measure of reliability obtained by administering
different versions of an assessment tool (both versions must contain items that probe
the same construct, skill, or knowledge base) to the same group of individuals.
c. Inter-rater reliability is a measure of reliability used to assess the degree to which
different judges or raters agree in their assessment decisions.
d. Internal consistency reliability is a measure of reliability used to evaluate the
degree to which different test items that probe the same construct produce similar results.
 Average inter-item correlation is a subtype of internal consistency reliability. It is
obtained by taking all of the items on a test that probe the same construct (e.g., reading
comprehension), determining the correlation coefficient for each pair of items, and
finally taking the average of all these correlation coefficients (a sketch of this
computation follows the list below).
 Split-half reliability is another subtype of internal consistency reliability. The
process of obtaining split-half reliability begins by splitting in half all items of a
test that are intended to probe the same area of knowledge in order to form two
sets of items.
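To make the average inter-item correlation concrete, here is a minimal Python sketch. The score matrix is hypothetical, invented purely for illustration; any statistics package would serve equally well.

    # A minimal sketch of the average inter-item correlation.
    # Rows are students, columns are items probing the same construct.
    from itertools import combinations
    from statistics import correlation  # requires Python 3.10+

    # Hypothetical item scores for five students on four items (1 = correct)
    scores = [
        [1, 1, 1, 0],
        [1, 1, 0, 0],
        [0, 1, 1, 1],
        [1, 0, 1, 1],
        [0, 0, 0, 1],
    ]

    # Transpose so each row holds one item's scores across all students
    items = list(zip(*scores))

    # Correlate every pair of items, then average the coefficients
    pairs = list(combinations(range(len(items)), 2))
    rs = [correlation(items[i], items[j]) for i, j in pairs]
    print(f"Average inter-item correlation: {sum(rs) / len(rs):.3f}")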
3. A third quality a good test should have is validity. To be valid, a test should measure what it
claims to measure. A test needs to be relatively reliable to be valid, but merely being reliable
does not mean that it will be valid.
6 Types of Validity
1. Face validity. It ascertains that the measure appears to be assessing the intended construct
under study.
2. Construct validity. This is used to ensure that the measure is actually measuring what it is
intended to measure and not other variables.
What is a construct?
Constructs are attributes that exist in the theoretical sense. Thus, they do not exist in
either the literal or physical sense.
2 methods of establishing a test’s construct validity:
a. Convergent / divergent validation. A test has convergent validity if it has a high
correlation with another test that measures the same construct.
b. Factor analysis is a complex statistical procedure which is conducted for a variety of
purposes, one of which is to assess the construct validity of a test or a number of tests.
Other methods of assessing construct validity
Item analysis. There are a variety of techniques for performing an item analysis,
which is often used, for example, to determine which items will be kept for the final
version of a test. It is used to build reliability and validity into the test from the
start.
Item difficulty. An item's difficulty level is usually measured in terms of the
percentage of examinees who answer the item correctly.
Item discrimination. It refers to the degree to which items differentiate among
examinees in terms of the characteristic being measured (e.g., between high and low
scorers). A sketch of both indices follows.
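Both indices can be computed with nothing more than proportions. Below is a minimal Python sketch; the responses, total scores, and the upper/lower-half split are hypothetical choices for illustration (another common convention uses the top and bottom 27% of examinees).

    def item_difficulty(responses):
        """Proportion of examinees answering the item correctly."""
        return sum(responses) / len(responses)

    def item_discrimination(responses, totals):
        """Rank examinees by total score, then compare the item's
        difficulty in the top half against the bottom half."""
        ranked = sorted(zip(totals, responses), reverse=True)
        half = len(ranked) // 2
        upper = [r for _, r in ranked[:half]]
        lower = [r for _, r in ranked[-half:]]
        return item_difficulty(upper) - item_difficulty(lower)

    # Hypothetical data: one item's responses (1 = correct) and total scores
    responses = [1, 1, 1, 0, 1, 0, 0, 1, 0, 0]
    totals = [48, 45, 44, 40, 39, 35, 30, 28, 25, 20]

    print(f"Difficulty index:     {item_difficulty(responses):.2f}")
    print(f"Discrimination index: {item_discrimination(responses, totals):.2f}")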
A teacher can assess the test's internal consistency. That is, if a test has construct
validity, scores on the individual test items should correlate highly with the total test
score. This is evidence that the test is measuring a single construct.
 Developmental changes. Tests measuring certain constructs can be shown
to have construct validity if the scores on the tests show predictable
developmental changes over time.
 Experiential intervention. If a test has construct validity, scores should
change following an experiential manipulation, in the direction predicted by the
theory underlying the construct.
3. Content validity. It ensures that the measure covers the full range of the content area
being measured.
4. Criterion-related validity. This is used to predict future or current performance; it correlates
test results with another criterion of interest.
5. Formative validity. When applied to outcomes assessments, it is used to assess how well a
measure is able to provide information to help improve the program under study.
6. Sampling validity. This is similar to content validity. It ensures that the measure covers a
broad range of areas within the concept under study.
Ways to improve validity
1. Make sure your goals and objectives are clearly defined and achievable.
2. Match your assessment measure to your goals and objectives.
3. Get students involved; have the students look over the assessment for troublesome
wording or other difficulties.
4. If possible, compare your measure with other measures, or data that may be
available.
Practicality and efficiency of assessment of student learning
Teachers need to be familiar with the tools of assessment. In the development and use of
classroom assessment tools, certain issues must be addressed in relation to the following
important criteria.
1. Purpose and impact.
2. Validity and fairness
3. Reliability
4. Significance
5. Efficiency
Ethics in assessment
As an educator who uses assessments, you are expected to uphold principles of
professional conduct such as:
1. Protecting the safety, health and welfare of all examinees;
2. Knowing about and behaving in compliance with laws relevant to activities;
3. Maintaining and improving competence in assessment;
4. Providing assessment services only in your area of expertise;
5. Adhering to, and promoting high standards of professional conduct within and between
educational institutions;
6. Promoting the understanding of sound assessment practices; and
7. Performing your professional responsibilities with honesty, integrity, due care and fairness.
5 categories of assessment-related activities, each having its own ethical concerns:
1. Crafting assessment procedures- craft them so they are of high quality.
2. Choosing assessment procedures – make sure they are appropriate.
3. Administering assessment procedures – ensure the administration process is fair and does
not result in uninterpretable results.
4. Scoring assessment results- evaluate responses accurately and report them in a timely manner.
5. Communicating assessment results – provide complete, useful, and correct information that will
promote positive consequences.
What are non-tests?
Good instruction involves observing and analyzing student performance, and the most valuable assessment
activities should be learning experiences as well.
Examples of non-tests:
1. Oral and written reports. Students research a topic and then present either orally or in written
form.
2. Teacher observation. The teacher observes students while they work to make certain the students
understand the assignment and are on task.
3. Journal. Students write daily on assigned or personal topics.
4. Portfolio of student’s work. The teacher collects samples of a student’s work and saves them for a
determined amount of time.
5. Slates or hand signals. Students use slates or hand signals as a means of signaling answers to the
teacher.
6. Games. Teachers utilize fun activities to have students practice and review concepts.
7. Projects. The students research a topic and present it in a creative way.
8. Debates. The students take opposing positions on a topic and defend their position.
9. Checklist. The teacher will make a list of objectives that students need to master and then check
off the skill as the student masters it.
10. Cartooning. Students will use drawings to depict situations and ideas.
11. Models. The students produce a miniature replica of a given topic.
12. Notes. Students write a summary of a lesson.
13. Daily assignments. The student completes work assigned on a daily basis to be completed at
school or home.
14. Panel. A group of students verbally present information.
15. Learning centers. Students use teacher-provided activities for hands-on learning.
16. Demonstrations. Students present a visual enactment of a particular skill or activity.
17. Problem solving. Students follow a step by step solution of a problem.
18. Discussions. Students in a group verbally interact on a given topic.
19. Organized notes sheets and study guides. Students collect information to help pass a test.
Other non-test instruments
1. Anecdotal records. It is a written record kept in a positive tone of a child’s progress based
on milestones particular to that child’s social, emotional, physical, aesthetic and cognitive
development.
2. Observation checklists. It is a listing of specific concepts, skills, processes or attitudes. It is
designed to allow the observer to quickly record the presence or absence of specific qualities
or understanding.
Observation report
The following are questions and answers about observation.
1. What is an observation?
It is an informal visual assessment of student learning.
2. What is an observation’s objective?
To help the teacher see how a student is learning in order to check on the effectiveness of instruction, to
find out if there is a need to change instruction, and to assess student learning.
3. What does a good observation accomplish?
It provides immediate feedback about student learning.
4. What is a good observation design?
A rigorous observation is a structured model for the visual assessment of every student over time so that
the student learning experience can be carefully documented.
5. Do you have to observe every student?
No, an observation can be focused on one student, one student over time, or many students over time. An
observation could use a subset, or sample, of the total number of students.
6. What are four types of survey instruments?
Self-administered questionnaires, interviews, structured record reviews and structured observations.
7. What is a valid and reliable observation?
Validity is established when the instrument measures what it is supposed to measure, e.g., content, skills,
knowledge.
Reliability is when the instrument measures that content, skills, or knowledge accurately across students,
classes, schools, etc.
8. What type of result do you get with observations?
Observations answer questions of immediate worth.
9. What is a good observation report?
A short, concise document that clearly presents the most important results.
Guidelines for Practice
Observation, like acting, directing, or writing, is a complex skill set. It takes practice, continuous
practice. Most importantly, an observer should have a sense of purpose and a question or two that she is looking
to answer in the observations. When observing, make sure to take notes as best you can during the session, and
then flesh them out immediately afterward, or as close to immediately as you can manage.
Remember: descriptions should be factual, accurate, and thorough when taking notes. Avoid judging the
participants and instead rely on what can be seen and known. Do not worry if you feel you are missing
something.
Remember: observe periods of informal interaction and unplanned activity (breaks, free time,
arrivals and departures) as well as what “does not happen.” Practice humility and non-judgment when
observing and reporting. Whenever possible, assume no malice.
Observation Process
1. Preparation. Prior to observing, meet with Teaching Staff to discuss residency and goals. Do not show
up unannounced to observe a class.
Gather as much of the demographic information as possible.
a. Name of observer
b. Name of teacher observed
c. School/center/organization where the class took place.
2. Sample observation prompts. When observing a classroom, it can be helpful to have a list of behaviors
as a reference. These prompts can be generated based on a specific residency and are often applicable
to multiple situations.
3. Inventory. An inventory of students’ learning styles can build self-esteem by helping them discover
their strengths; learn about areas in which they might need to make more effort; and appreciate the
differences among themselves.
4. Portfolio. A collection of student-produced materials gathered over an extended period of time that
allows a teacher to evaluate student growth and overall learning progress during that period.
Test Standardization
Standardization is the process of trying out the test on a group of people to see the scores which are
typically obtained. A standardized test is a test administered and scored in a consistent manner. The tests are
designed in such a way that the questions, conditions for administering, scoring procedures, and interpretations
are consistent, and they are administered and scored in a predetermined, standard manner.
Understanding norms and test scores
To understand norms and statistical assessment one first needs to understand standardization. Standardization is the
process of testing a group of people to see the scores that are typically attained.
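One common statistic for interpreting an individual score against such norms is the z-score, which expresses how far a raw score sits from the norm-group mean in standard-deviation units. The module text does not name z-scores explicitly; the following minimal Python sketch, with a hypothetical norm group, is added only to illustrate norm-referenced interpretation.

    from statistics import mean, stdev

    norm_group = [62, 70, 74, 75, 78, 80, 81, 84, 88, 93]  # hypothetical norm scores
    raw_score = 84

    # z = (raw score - norm-group mean) / norm-group standard deviation
    z = (raw_score - mean(norm_group)) / stdev(norm_group)
    print(f"z-score: {z:.2f}")  # a positive z means the student scored above the norm-group mean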
What is standardized testing?
These are tools designed to allow measurement of student performance relative to all others taking the same test.
History of standardized testing
In 1909, the Thorndike handwriting scale was the first standardized achievement test used in public schools.
Standardized tests also determine a student’s academic level. “They become the basis for early tracking then ongoing tracking,
reflecting the belief that homogeneous achievement groups facilitate more efficient and effective teaching and learning.”
2 Types of standardized testing:
1. Norm-referenced testing measures performance relative to all other students taking the same test.
This is the type of test to use if you want to know how a student compares to the rest.
2. Criterion-referenced testing measures factual knowledge against a defined standard. Standardized tests can be
divided even further into performance tests or aptitude tests.

Performance tests are assessments of what learning has already occurred in a particular subject area.
Aptitude tests are assessments of abilities or skills considered important to future success in school.
Application in classrooms and similar settings
In theory, test scores looked at over time will reveal how much progress schools have made in their efforts to maintain
or raise academic standards. They are used to assess program success or failure in connection with student learning.
Establishing test validity
According to Calmorin and Calmorin, the degree of validity is the most important attribute of a test. Validity refers to the
degree to which a test is capable of achieving certain aims. Item analysis is done after the first tryout of a test.
After the item analysis, the tester uses the following table of equivalents in interpreting the difficulty index:
.00 - .20 - very difficult
.21 - .80 - moderately difficult
.81 - 1.00 - very easy
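Applied in code, the table is a simple threshold check. A minimal Python sketch, using hypothetical difficulty indices:

    def interpret_difficulty(p):
        """Map a difficulty index (proportion answering correctly)
        onto the table of equivalents above."""
        if p <= 0.20:
            return "very difficult"
        elif p <= 0.80:
            return "moderately difficult"
        return "very easy"

    for p in [0.15, 0.55, 0.90]:  # hypothetical indices from an item analysis
        print(f"index {p:.2f}: {interpret_difficulty(p)}")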
Item Revision. On the basis of item analysis data, test items are revised for improvement. After revising the
test items that need revision, the tester conducts another tryout. The revised test must be administered to the same set
of samples.
How to establish reliability?
Test reliability is an element in test construction and test standardization and is the degree to which a measure
consistently returns the same result when repeated under similar conditions. Reliability does not imply validity; that is,
a reliable measure is measuring something consistently, but not necessarily what it is supposed to measure.
Reliability may be estimated through a variety of methods that fall into two types: single-administration and
multiple-administration. Multiple-administration methods require that two assessments are administered.
1. Test-retest reliability is estimated as the Pearson product-moment correlation coefficient
between two administrations of the same measure. This is sometimes known as the
coefficient of stability.
2. Alternate forms reliability is estimated by the Pearson product-moment correlation
coefficient of two different forms of a measure, usually administered together. This is
sometimes known as the coefficient of equivalence.
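Both coefficients reduce to an ordinary Pearson correlation between two sets of scores. A minimal Python sketch, with hypothetical score pairs:

    from statistics import correlation  # requires Python 3.10+

    # Hypothetical scores for eight students on two administrations of the same test
    first = [55, 62, 70, 74, 80, 85, 88, 91]
    second = [58, 60, 72, 71, 83, 84, 90, 94]

    r = correlation(first, second)
    print(f"Coefficient of stability (test-retest): {r:.3f}")
    # For alternate forms, the same computation is applied to scores from
    # two different forms of the test (coefficient of equivalence).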
Single-administration methods include split-half and internal consistency.
1. Split-half reliability treats the two halves of a measure as alternate forms. This “halves reliability” estimate is then
stepped up to the full test length using the Spearman-Brown prediction formula. This is sometimes referred to as the
coefficient of internal consistency.
2. The most common internal consistency measure is Cronbach’s alpha, which is usually interpreted as the mean of all
possible split-half coefficients.
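Alpha can also be computed directly from its standard variance formula: alpha = (k / (k - 1)) * (1 - sum of item variances / variance of total scores), where k is the number of items. A minimal Python sketch, with a hypothetical score matrix:

    from statistics import pvariance

    # Hypothetical ratings: rows = students, columns = four items
    scores = [
        [3, 4, 3, 4],
        [2, 2, 3, 2],
        [4, 5, 4, 5],
        [1, 2, 1, 2],
        [3, 3, 4, 3],
    ]

    k = len(scores[0])                      # number of items
    items = list(zip(*scores))              # one column per item
    item_vars = sum(pvariance(item) for item in items)
    total_var = pvariance([sum(row) for row in scores])

    alpha = (k / (k - 1)) * (1 - item_vars / total_var)
    print(f"Cronbach's alpha: {alpha:.3f}")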
These measures of reliability differ in their sensitivity to different sources of error and so need not be equal. Also,
reliability is a property of the scores of a measure rather than the measure itself, and is thus said to be sample-dependent.
Reliability Estimation Using a Split-half Methodology
The split-half design in effect creates two comparable test administrations. The items in a test are split into
two tests that are equivalent in content and difficulty. Often this is done by splitting between odd- and even-numbered
items. This assumes that the assessment is homogeneous in content.
Once the test is split, reliability is estimated as the correlation of two separate tests with an adjustment for
the test length. Other things being equal, the longer the test, the more reliable it will be when reliability concerns
internal consistency. This is because the sample of behavior is larger.
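A minimal Python sketch of the whole procedure, splitting a hypothetical score matrix into odd- and even-numbered halves and applying the Spearman-Brown step-up described above:

    from statistics import correlation  # requires Python 3.10+

    # Hypothetical scores: rows = students, columns = six items (1 = correct)
    scores = [
        [1, 1, 1, 0, 1, 1],
        [1, 0, 1, 1, 0, 1],
        [0, 1, 0, 1, 1, 0],
        [1, 1, 1, 1, 1, 1],
        [0, 0, 1, 0, 0, 1],
        [1, 0, 0, 1, 0, 0],
    ]

    # Half-test totals from odd- and even-positioned items
    odd = [sum(row[0::2]) for row in scores]
    even = [sum(row[1::2]) for row in scores]

    r_half = correlation(odd, even)
    # The Spearman-Brown prediction formula steps the half-test
    # correlation up to an estimate for the full-length test.
    r_full = (2 * r_half) / (1 + r_half)
    print(f"Half-test correlation: {r_half:.3f}")
    print(f"Full-test (Spearman-Brown) reliability: {r_full:.3f}")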
