Fontys University of Applied Sciences, FHKE, pabo Eindhoven, De Lismortel 25, 5612 AR Eindhoven, The Netherlands
Utrecht University of Applied Sciences, Faculty of Education, Research Group Vocational Education, P.O. Box 14007, 3508 SB Utrecht, The Netherlands
HAN University of Applied Sciences, Faculty of Education, Research Centre for Quality for Learning, P.O. Box 30011, 6503 HN Nijmegen, The Netherlands
ARTICLE INFO
Article history:
Received 13 March 2013
Received in revised form 24 November 2013
Accepted 25 November 2013
Available online 18 December 2013
Keywords:
Teacher evaluation
Evaluation methods
Secondary education
ABSTRACT
The focus of this article is the development and evaluation of an assessment program for measuring
senior teachers' competences in secondary schools. The goals of the developed instrument were
measuring senior teachers' competences and providing the opportunity for self-reflection for the
teachers assessed. This instrument was developed and evaluated in four steps: (1) the content of
assessment was determined, defined in senior teacher competences; (2) criteria and standards were
specified for the assessment of the competences; (3) the assessment methods were determined; and (4)
the assessment program was evaluated by means of a pilot study. The target group consisted of eight
potential senior teachers, who were assessed with the new instrument. In total, eleven teachers and 70
pupils evaluated the new assessment instrument. The assessment seems fit for purpose. Pupils are
positive about the assessment program, whereas the teachers are more sceptical about it.
© 2013 Elsevier Ltd. All rights reserved.
Introduction
For many years, the quality of education in general and of
teachers in particular has been the object of discussion and
research. Indeed, teacher quality is important because teachers
play a crucial role in realizing the quality of the learning
environment (Hattie, 2009) and determine to a great extent the
school's quality (Marzano, 2011). In this respect, Rasmussen and
Friche (2011) state that schools experience a pressure to increase
and demonstrate the quality of their education and teachers. In the
Netherlands, this pressure to increase the quality of education in
general and of teachers in particular has been addressed by the
Teaching Advisory Board of the Dutch government. As a way to
increase teacher quality, it advised creating more opportunities
for career development and differentiation within the teaching
profession. This should increase the attractiveness of the teaching
profession and prevent good teachers from leaving schools and
choosing other career paths (Teaching Advisory Board, 2007). The
Dutch Ministry of Education decided that secondary schools
should introduce integral personnel management in order to (1)
stimulate teachers' development; (2) offer opportunities for
differentiation in the teacher profession; and (3) raise the quality
of Dutch secondary education. It was assumed that the introduction of integral personnel management in secondary education
would lead to increased educational quality. It might help to assign
the best teachers to the most complex tasks and pupil groups, and
make it possible to address weak teaching practices (Borko,
Whitcomb, & Liston, 2009).
To implement an effective and fair integral personnel management system, instruments are needed that validly and reliably
assess teacher quality (van der Schaaf, Stokking, & Verloop, 2005).
At the moment, no specific standardized procedures or guidelines
for teacher assessment are available, and Dutch secondary schools
emphasize those aspects that are important for their particular
school. The common practice is that teachers receive a salary increase
each year simply for having worked another year as a teacher. To
put this into effect, a single annual dialogue between teacher and
management takes place. This can hardly be regarded as an
assessment method for teacher quality. The question then arises
whether there are possibilities to assess teacher quality validly.
Whereas the assessment and development of student teachers have
been studied quite often (e.g. Hegender, 2010; Noell & Burns,
2006), summative assessment of teachers working in schools has
been studied considerably less often. Therefore, the aim of the
current study is to develop and evaluate a summative assessment
program for senior teachers in secondary education. Besides this
summative function, the assessment program should have a
formative function to enable and stimulate teachers to reflect on
their own competence development.
[Table 1: rubric levels (Level 1 to Level 4) for the eight competences; table body not recovered.]
describe the desired level of proficiency and are mostly used for
product evaluations. Verbal descriptions or qualitative rubrics
(Scriven, 1980) describe the properties characterizing the desired
level of proficiency. These standards are context-specific and are
often the most feasible to use, especially when multiple criteria are
used (Sadler, 1998). A rubric is a scoring tool for qualitative rating
of authentic work and it includes criteria for rating important
dimensions of performance. It describes levels of performance on a
particular task and thereby denotes what is considered important
to both assessors and assessees. For assessors, it helps determine
what to look for when assessing (Jonsson & Svingby, 2007;
Tigelaar, Van Tartwijk, Janssen, Veldman, & Verloop, 2009). A
review of studies investigating the use of scoring rubrics shows
that rubrics can enhance the reliable scoring of performance
assessments, especially if they are analytic and topic-specific
(Heldsinger & Humphry, 2013; Jonsson & Svingby, 2007; Panadero
& Jonsson, 2013). Consistent with Sadler's (1998) ideas, this
review also shows the benefits of exemplars and adds the
importance of rater training. Rubrics, on the other hand, do not
automatically enhance the validity of performance assessments.
This requires not only that the content of the rubric adequately
represents the content of the construct to be assessed (in this case,
senior teachers' competences), but also that, for example, the
mental processes used during the assessment are incorporated.
According to Jonsson and Svingby (2007), very few studies on the use of
rubrics provide this kind of validity evidence, which implies that
the effect of rubrics on the validity of performance assessments is
not clear at the moment. In this study, we decided to describe a set
of rubrics for each competence, as research seems to mainly show
advantages of the use of rubrics. The teacher development team
described a set of rubrics for each competence defined in the
competence profile for senior teachers, in the same way as it had
developed the competence profile. Eventually,
based on consensus, the rubrics were fixed for each level (see
Table 1 for the eight competences and the rubrics).
Step 3: determination of the assessment program parts
The competence profile and rubrics were the starting point for
the further development of the assessment program for senior
teachers. The first two steps described the development of the
competence profile and the rubrics, defining the content (what is
assessed). The third step concerns how this content
could be assessed. The choice of assessment methods largely
determines the validity of the assessment process, as the methods
should adequately measure the construct at stake (Messick, 1995).
A single assessment would probably not be sufficient to validly
assess senior teachers' competences. A mix of methods should be
used instead (Baartman et al., 2007; van der Vleuten & Schuwirth,
2005), because it reveals additional insights in comparison with
one single assessment method, gaining input from qualitative as
Table 2
Mix of methods for the assessment of senior teachers' competences.
Method                       Target group
Observation questionnaire    Pupils
Observation questionnaire    Colleagues
Questionnaire                Senior teacher
Portfolio development        Senior teacher colleagues
Portfolio assessment         External experts
Interview                    External experts
Conditions (row assignment not recovered): standardized instruction; portfolio guidelines; independent experts; independent experts.
[The "When" column was not recovered.]
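[Table 3: QTI scales with number of items and a typical item per scale. Scales: Leadership; Helpful/friendly; Understanding; Student responsibility/freedom; Uncertain; Dissatisfied; Admonishing; Strict. Item counts as extracted: 10, 10, 10, 9, 9, 9, 9. Typical items not recovered.]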
Table 4
Twelve quality criteria for competence assessment programs (Baartman et al., 2007,
p. 261).
1. Acceptability
All stakeholders should approve of the assessment methods, criteria
and standards
2. Authenticity
The degree of resemblance of the assessment to the (future) workplace
3. Cognitive complexity
The assessment should reflect the presence of the cognitive skills needed and
should enable the judgement of thinking processes
4. Comparability
The assessment should be conducted in a consistent and responsible way.
The tasks, criteria and working conditions should be consistent with regard
to key features of interest
5. Costs and efficiency
The time and resources needed to develop and carry out the assessment,
compared to the benefits
6. Educational consequences
The degree to which the assessment yields positive effects on learning and
instruction and the degree to which negative effects are minimized
7. Fairness
Teachers should get a fair chance to demonstrate their competences, by
letting them express themselves in different ways and making sure the
assessors do not show biases
8. Fitness for purpose
The assessment methods, criteria and standards should be compatible with
the construct to be measured
9. Fitness for self-assessment
The assessment should stimulate self-regulated learning by fostering self-assessment and the formulation of learning goals
10. Meaningfulness
The assessment should be a learning opportunity and provide valuable
feedback for further learning
11. Reproducibility of decisions
Decisions made based on the results of the assessment should be based on
multiple situations and assessors. Decisions should not depend on one
assessor or specific situation
12. Transparency
The assessment, criteria and standards should be clear and understandable
to all stakeholders
Method
The study presented in this paper describes a pilot study of the
assessment program, including an evaluation of the assessment
program.
Instruments
The competence profile for senior teachers was transformed
into a questionnaire, as described above, and was used by the peer
teachers, the management and the teachers themselves. In addition, a
standardized questionnaire on teacher behaviour, the
QTI (Questionnaire on Teacher Interaction), was completed by the pupils and the
senior teachers themselves. This questionnaire is a validated and
reliable instrument that has been used in many other (international) studies
(den Brok et al., 2010; Levy et al., 2003; Telli & den Brok,
2012). The scales and numbers of items of the QTI are presented in
Table 3.
Evaluation instruments of the (perceived) quality of the assessment
program
To evaluate the quality of the entire assessment program, the 12
quality criteria of Baartman et al. (2007) were used (Table 4
presents the categories). In a previous study, these quality criteria
were specified into 4 to 6 indicators per quality criterion (Baartman
et al., 2007), which were used as questions in a questionnaire in
this study. The participating senior teachers and their colleagues
judged the quality of the assessment program on a 10-point Likert
scale. The pupils rated four of the twelve quality criteria: (1)
fitness for purpose; (2) transparency; (3) fairness; and (4) (costs
and) efficiency. These four were chosen because they are the most
visible to the pupils, for example, whether the criteria were clear to them
and whether they thought the criteria represented their opinion of a good
teacher. In addition, four questions were added to the pupils' questionnaire in order to obtain information about the pupils' perspective
on the usefulness of the assessment program.
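The aggregation of such questionnaire ratings into a mean and standard deviation per quality criterion can be sketched as follows. This is a minimal illustration, not the procedure reported in the study: it assumes that each respondent's criterion score is the mean of that respondent's item ratings, and that the reported SD is the sample standard deviation across respondents. The function name and the ratings are hypothetical.

```python
from statistics import mean, stdev


def criterion_summary(responses):
    """Summarize one quality criterion.

    responses: one list of item ratings (1-10 scale) per respondent.
    Returns (mean, sample SD) of the per-respondent criterion scores.
    Assumption: a respondent's criterion score is the mean of their items.
    """
    for ratings in responses:
        if not all(1 <= r <= 10 for r in ratings):
            raise ValueError("ratings must lie on the 1-10 scale")
    per_respondent = [mean(ratings) for ratings in responses]
    return mean(per_respondent), stdev(per_respondent)


# Hypothetical ratings from three respondents on a three-item criterion.
hypothetical_ratings = [[6, 7, 5], [8, 8, 8], [4, 5, 3]]
criterion_mean, criterion_sd = criterion_summary(hypothetical_ratings)
```

With very small groups, such as the seven teachers and four peer assessors in this pilot, a standard deviation computed in this way is sensitive to single respondents and should be read with caution.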
Participants
Eight senior teachers participated as assessees in the pilot study
of the assessment program: six men and two women. The age of
the teachers ranged from about 30 to 63 years. All
teachers taught a subject like maths or languages, and one teacher
taught physical education and had managerial tasks next to her
teaching tasks. All teachers had at least five years of
teaching experience. In the evaluative part of the pilot study, which
was not obligatory, seven out of the eight senior teachers
completed the evaluation questionnaire on the quality of the
assessment program.
For each participating senior teacher, pupils of two classes rated
their teacher. They carried out observations and filled out the QTI questionnaire. In total, 170 pupils participated. The pupils' ages
ranged from 14 to 17 years. Participation of the pupil groups
was obligatory, so the response rate was close to 100%. Of
the 170 pupils, 70 also completed the evaluation questionnaire on a
voluntary basis.
Four different colleagues observed each senior teacher;
in total 32 teachers participated in the new assessment program as
observers and 16 other teachers helped by providing written
feedback. In total, 48 teachers were involved in the assessment
of their eight colleagues. Only 4 of the 48 peer
teachers completed the evaluation of the quality of the assessment
program. This difference in participation between the pilot study
and its evaluative part might be due to the
period of the year (the end of the second semester, just before the
summer holidays) and the fact that peers and pupils were invited
to participate on a voluntary basis.
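The participation in the evaluative part can be expressed as response rates per group. The sketch below uses only the counts reported in this section; the function and variable names are illustrative.

```python
def response_rate(completed, invited):
    """Return the response rate as a percentage of those invited."""
    if invited <= 0 or not 0 <= completed <= invited:
        raise ValueError("expected 0 <= completed <= invited and invited > 0")
    return 100.0 * completed / invited


# (completed the evaluation questionnaire, took part in the pilot), as reported above.
groups = {
    "senior teachers": (7, 8),
    "pupils": (70, 170),
    "peer teachers": (4, 48),
}
rates = {name: round(response_rate(done, total), 1)
         for name, (done, total) in groups.items()}
```

This gives response rates of roughly 87.5% for the senior teachers, 41.2% for the pupils and 8.3% for the peer teachers, which shows how thin the peer teachers' evaluation data are.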
Data analyses
For the assessment of the senior teachers' competences, the
available data were (1) the pupils' results on the QTI questionnaire,
together with the senior teachers' own scores on the
QTI scales; (2) the scores on the questionnaires on the competence
profile for senior teachers, completed by the colleagues; and (3)
the portfolios of the senior teachers, including the feedback from peer
colleagues. The questionnaires from the colleagues were analysed
by computing mean scores per competence (varying between 1
and 4), which were subsequently converted into percentages,
indicating on a 100% scale how often the teacher showed a
positive ones. For the last teacher, the portfolio was insufficient,
and even if there had been an interview with additional
materials, it could not have led to a positive judgement. Therefore,
the assessment interview was cancelled and this teacher was
requested to construct a new portfolio. In an evaluation interview,
the eight teachers, even the one without a positive judgement,
stated that they recognized the advice and judgement.
Opportunity for reflection
Working with portfolios regarding professional development is
considered valuable when there is a dialogical context. This
dialogical context was created by having the senior teachers ask
their peers for written feedback. Next, an interview was held with
the senior teacher and two experts. All teachers stated that the
entire process helped them reflect on their profession, their
behaviour and the actions they had undertaken. Seven out of the eight
senior teachers told the experts that working on the portfolio was a developmental
process for them, because it involved gathering
evidence of their competence, reflecting on the competences and writing this down, asking peers for feedback and discussing
this with the experts. One senior teacher, who did not receive a
positive judgement, did not agree with the other teachers on this.
He stated that the assessment program also judged the way one
could build up a portfolio and use one's writing skills, and not only
the senior teachers' competences.
Evaluation of the assessment program
Mean scores of the evaluation of the quality of the assessment
program as judged by the teachers and peer teachers on a 1-10
scale are presented in Table 6. The criterion 'acceptability' (i.e. all
stakeholders should approve of the assessment methods, criteria
and standards) showed the lowest score. The teachers who had
been assessed, as well as the teachers who participated in the peer
assessment, did not completely support the assessment program
used (teachers M = 5.67, colleagues M = 4.75). The
teachers who participated in the peer assessment in particular reported low
scores on the acceptance of this method. The criterion 'fairness'
also showed low scores within both groups (teachers M = 5.62,
colleagues M = 5.51). This criterion comprises questions like 'do
you think the assessment is fair' and 'are the assessors
unprejudiced'. The assessed teachers also reported low scores
on the criterion 'educational consequences'. They stated that this
assessment program did not really influence their professional
behaviour (M = 5.82). The peer assessors, on the other hand, stated
that participation in the assessment program did influence the
teachers' professional behaviour (M = 8.88). Next, the assessed
teachers reported that the assessment program was suitable for
self-reflection (M = 7.43), which was part of the aim of the
Table 5
Results on different parts of the assessment of senior teachers' competences.
[Table body not recovered. Rows as extracted, by target group: Pupils; Pupils; Colleagues; Colleagues; Questionnaire (self): Senior teacher; Portfolio assessment: External experts; Interview: External experts.]
a T1 = first measurement.
b T2 = second measurement (6 months after first measurement).
Table 6
Mean scores and standard deviations (SD) on the 12 quality criteria for assessed teachers and peer assessors (1-10 scale).
Criteria                       Teachers (N = 7)    Peer assessors (N = 4)    Number of items
                               Mean    SD          Mean    SD
Acceptability                  5.67    2.61        4.75    2.99              3
Authenticity                   7.14    1.80        6.25    1.50              2
Cognitive complexity           6.69    1.79        6.73    1.72              5
Comparability                  6.92    2.50        6.88    3.01              4
Costs and efficiency           5.87    1.88        -       -                 6
Educational consequences       5.82    2.93        8.88    0.88              4
Fairness                       5.62    0.83        5.51    2.28              6
Fitness for purpose            7.30    1.06        6.65    2.29              6
Fitness for self-assessment    7.43    1.47        6.00    1.75              4
Meaningfulness                 6.32    2.50        6.25    2.54              4
Reproducibility of decisions   7.38    1.16        7.75    1.77              6
Transparency                   6.57    1.17        6.75    1.52              3
Table 7
Pupils' mean scores and standard deviations (SD) on four quality criteria (N = 70, 1-10 scale).
Criterion                                                      Mean    SD      Number of items
(criterion name not recovered)                                 7.32    2.21    5
(criterion name not recovered)                                 7.75    1.62    1
(criterion name not recovered)                                 8.16    1.48    2
(criterion name not recovered)                                 8.04    1.52    2
Additional questions
A good assessment with pupils' participation                   8.21    1.63    3
Adequate questions about my teacher                            7.18    1.61    1
Questionnaire is a way of giving feedback                      7.11    1.89    1
This assessment leads to a change in behaviour of my teacher   3.35    2.86    1
development process contained four steps. The first step concerned determining the content of the competences to be assessed.
In the second step, criteria and standards were
specified, and in the third step methods were chosen for carrying
out the assessment program. The assessment program was
implemented in a pilot study, assessing eight senior teachers.
Theoretical frameworks on good teachers do not present one
specific view on good teachers (Berliner, 2001; Fenstermacher &
Richardson, 2005). Therefore, three theoretical perspectives on
good teaching can be recognized in the final competence profile for
senior teachers' competences. The profile included aspects from (1)
perception studies of ideal teaching, including learning environment research (Allen & Fraser, 2007); (2) effectiveness research
(e.g. Seidel & Shavelson, 2007); and (3) studies on teachers'
professional knowledge (e.g. Berliner, 2004; Darling-Hammond &
Snyder, 2000; Verloop, 2005). The literature on good teachers was
presented to the development team, and the team also used its
own (literature) resources from, for example, professional development
programs. A specific aim was that the school team would recognize
the new assessment program and that there would be a strong
commitment towards using it. As a consequence, a school-specific
competence profile was developed by the school's development
team. This is a rather eclectic approach, using competences that fit
the specific school context, mostly chosen bottom up. Berliner
(2005) described the importance of taking specific demands of the
school environment into account. The school management agreed
to a rather eclectic approach in choosing the competences, in order to
create greater commitment from the team.
The assessment program had two goals: judging senior
teachers' competences and creating an opportunity for the
participating senior teachers to reflect on their competences. The
senior teachers stated that the assessment program did not really
influence their professional behaviour, but they recognized the
possible influence of the assessment program on their professional
behaviour as teachers. They mentioned the possibility to reflect on
Tamara van Schilt-Mol, PhD, is associate professor, testing and assessing at the
HAN University of Applied Sciences, the Netherlands. She is part of the Research
Centre Quality for Learning. Her research focuses both on the function of testing
and assessment regarding development of students and teachers/lecturers, and on
the function of testing and assessment regarding (improving) the quality of
education.