
Studies in Educational Evaluation 40 (2014) 50–62
Development and evaluation of a summative assessment program for senior teacher competence
Anouke Bakx a,*, Liesbeth Baartman b, Tamara van Schilt-Mol c
a Fontys University of Applied Sciences, FHKE, pabo Eindhoven, De Lismortel 25, 5612 AR Eindhoven, The Netherlands
b Utrecht University of Applied Sciences, Faculty of Education, Research Group Vocational Education, P.O. Box 14007, 3508 SB Utrecht, The Netherlands
c HAN University of Applied Sciences, Faculty of Education, Research Centre for Quality for Learning, P.O. Box 30011, 6503 HN Nijmegen, The Netherlands
ARTICLE INFO
Article history:
Received 13 March 2013
Received in revised form 24 November 2013
Accepted 25 November 2013
Available online 18 December 2013
Keywords:
Teacher evaluation
Evaluation methods
Secondary education
ABSTRACT
The focus of this article is the development and evaluation of an assessment program for measuring
senior teachers' competences in secondary schools. The goals of the developed instrument were
measuring senior teachers' competences and providing the opportunity for self-reflection for the
teachers assessed. This instrument was developed and evaluated in four steps: (1) the content of
assessment was determined, defined in senior teacher competences; (2) criteria and standards were
specified for the assessment of the competences; (3) the assessment methods were determined; and (4)
the assessment program was evaluated by means of a pilot study. The target group consisted of eight
potential senior teachers, who were assessed with the new instrument. In total, eleven teachers and 70
pupils evaluated the new assessment instrument. The assessment seems fit for the purpose. Pupils are
positive about the assessment program, whereas the teachers are more sceptical about it.
© 2013 Elsevier Ltd. All rights reserved.
Introduction
For many years, the quality of education in general and of
teachers in particular has been the object of discussion and
research. Indeed, teacher quality is important because teachers
play a crucial role in realizing the quality of the learning
environment (Hattie, 2009) and determine to a great extent the
school's quality (Marzano, 2011). In this respect, Rasmussen and
Friche (2011) state that schools experience a pressure to increase
and demonstrate the quality of their education and teachers. In the
Netherlands, this pressure to increase the quality of education in
general and of teachers in particular has been addressed by the
Teaching Advisory Board of the Dutch government. As a way to
increase teacher quality, the Board advised creating more opportunities
for career development and differentiation within the teaching
profession. This should increase the attractiveness of the teaching
profession and prevent good teachers from leaving schools and
choosing other career paths (Teaching Advisory Board, 2007). The
Dutch Ministry of Education decided that secondary schools
should introduce integral personnel management in order to (1)
stimulate teachers' development; (2) offer opportunities for
differentiation in the teacher profession; and (3) raise the quality
of Dutch secondary education. It was assumed that the introduction of integral personnel management in secondary education
would lead to increased educational quality. It might help to place
the best teachers on the most complex tasks and pupil groups, and
offer the possibility to address weak teaching practices (Borko,
Whitcomb, & Liston, 2009).
* Corresponding author. Tel.: +31 8778 75 993.
E-mail addresses: a.bakx@fontys.nl (A. Bakx), Liesbeth.baartman@hu.nl (L. Baartman), Tamara.vanschiltmol@han.nl (T. van Schilt-Mol).
0191-491X/$ – see front matter © 2013 Elsevier Ltd. All rights reserved.
http://dx.doi.org/10.1016/j.stueduc.2013.11.004
To integrate an effective and fair integral personnel management system, instruments are needed that validly and reliably
assess teacher quality (van der Schaaf, Stokking, & Verloop, 2005).
At the moment, no specific standardized procedures or guidelines
for teacher assessment are available and Dutch secondary schools
emphasize those aspects which are important for their particular
schools. The common practice is that teachers receive a salary raise
each year, simply by having worked another year as a teacher. In
order to effectuate this, one annual dialogue between teacher and
management takes place. This can hardly be looked upon as an
assessment method for teacher quality. The question then arises
whether there are possibilities to assess teacher quality validly.
Whereas assessment and development of student teachers has
quite often been studied (e.g. Hegender, 2010; Noell & Burns,
2006), summative assessment of teachers working in schools has
been studied distinctly less often. Therefore, the aim of the
current study is to develop and evaluate a summative assessment
program for senior teachers in secondary education. Besides this
summative function, the assessment program should have a
formative function to enable and stimulate teachers to reflect on
their own competence development.
Indeed, literature shows different perspectives on how teacher
competence is defined and measured, many of these focusing on
the effectiveness of teachers in accomplishing high student learning
outcomes (e.g. Chen, Mason, Staniszweski, Upton, & Valley, 2011;
Mangiante, 2011; Praslova, 2010; Seidel & Shavelson, 2007). These
studies rely on the assumption that certain teacher behaviour (den
Brok, Brekelmans, & Wubbels, 2004) and teachers' (pedagogical)
content knowledge (e.g. Baumert et al., 2010; Kleickmann et al.,
2013; Shulman, 1986) have an influence on student achievement.
Research results about teacher competences were used as input for
the assessment development team, which would construct a
school-specific assessment method. Based on this, together with input
from the teachers themselves, an assessment instrument was
developed.
The focus of this instrument was on senior teachers, because of
their important role in the school; they have the most important
(teaching) positions in schools and are responsible for coaching
starting teachers, for example. It is assumed that these senior
teachers determine the quality of the school to a large extent. In
addition, the government provided extra funding, meant for
the best teachers, in order to motivate them further and keep
them in school. For the integral personnel management of a school it is
important to be able to spot and assess these key teachers in a valid
way, preferably in a way that is accepted by the school team. In the
study described in this paper, senior teachers have already been
effective teachers for many years and for the new program to be
developed, competences were needed that would have an additional
value beyond being a very effective teacher.
Thus, the main focus of our study was to develop and test an
assessment program for distinguishing average senior teachers
from very good senior teachers. The assessment program should
offer an opportunity for self-reflection for the teachers as
well. Therefore, the central question of our study is: How can
senior teachers' competence in secondary education be assessed,
while providing the opportunity for self-reflection by the senior
teachers? The assessment program was developed in close
collaboration with a large secondary school and a pilot study
was organized in which we carried out and evaluated the
assessment program. In order to do so, the following steps were
carried out: first, literature was explored on what good teachers
are and the content of the teacher competence had to be
determined. Second, criteria and standards were defined in order
to validly assess the competences of senior teachers. Third, the
program sections of the assessment were determined. The final
step was to carry out a pilot with eight participating senior
teachers. The new assessment program was evaluated. Below,
these four steps are described in detail.
Theoretical background
Defining good teachers
The ability to distinguish average senior teachers from very
good senior teachers depends on how senior teacher competence is
defined and what assessment criteria and standards are set
(Uhlenbeck, 2002). In general, all assessments require a clear
notion of the construct to be assessed (Messick, 1995; Sadler,
1998). This is especially important for the development process
described in this article because the assessment program being
developed in this study can be considered a high-stakes
assessment. A positive assessment result would lead to a salary
raise, while negative outcomes of the assessment program would
lead to a frozen salary. Senior teachers, on whom we focus in this
study, ought to be the school's best teachers. Defining good
teachers is complex and there is no consensus on this topic yet
(e.g. Berliner, 2001; Fenstermacher & Richardson, 2005).
Contemporary educational research on good teachers is
scattered across a variety of research traditions, showing a
diversity of definitions, instruments and results related to the
issue of good teaching. These traditions can be broadly categorized
as: (1) perception studies of ideal teaching, including learning
environment research (Allen & Fraser, 2007); (2) effectiveness
research (e.g. den Brok et al., 2004; Seidel & Shavelson, 2007); (3)
studies on teachers' professional knowledge (e.g. Berliner, 2004;
Darling-Hammond & Snyder, 2000; Verloop, 2005); and (4)
research on teachers' professional identity (e.g. Beijaard, Meijer,
& Verloop, 2004; Day, Sammons, Stobart, Kingston, & Gu, 2007).
These four traditions each have their own specific perspective on
studying good teaching practices.
The first perspective, perception studies of ideal teaching,
shows, for example, that students (aged 7–16 years) consider a nice
personality and teaching ability very important (e.g. Beishuizen,
Hof, van Putten, Bouwmeester, & Asscher, 2001), as well as
competent instructing, focusing on transfer of knowledge and
skills. Kutnick and Vena (1993) mentioned physical presentation,
teachers' care for students, and trustworthiness as being important
for good teachers, whereas Hamacheck (1969) adds being helpful in
schoolwork, clear explanation and humour.
The second tradition, effectiveness research, mainly focusses on
the results of teachers' actions on students' learning processes,
achievement or attitude towards learning (Seidel & Shavelson,
2007). Seidel and Shavelson (2007) used an interesting framework
of teachers' effectiveness based on cognitive models of teaching
(and student learning) in their meta-analysis of teacher
effectiveness studies. One of their conclusions was that domain-specific
components of teaching resulted in the largest effects for
students' learning. Studies within this perspective show that the
combination of teaching skills with communicative competence
is important for gaining positive achievement by the students
(e.g. Hattie, 2009; Marzano, 2003; Ryan & Deci, 2000; Scheerens,
2007). Further, Brophy and Good (1986) stated that instruction and
classroom management techniques are very important teacher
behaviours. This is in line with findings from learning environment
research as described above. More specifically, effectiveness
studies show that in order to gain high student outcomes, teachers
should be able to realize an appropriate level of difficulty for the
instruction, continuous progress at a high success rate, effective
diagnosis of learning needs and prescription of learning activities,
and monitoring of progress and continuous practice, integrating
new learning (Brophy & Good, 1986; Marzano, 2003). This also fits
the perception perspective, in which students also state that
teaching ability is important and that they are preferably taught by
competent instructors, who can transfer knowledge and skills
(Beishuizen et al., 2001).
The third tradition concerns the (practical and
theoretical) professional knowledge required for good teaching.
Teachers' domain-specific knowledge is important for explaining
properly and asking stimulating, specific, subject-related
questions (Darling-Hammond, 1999). In order to be able to
instruct well, (professional) knowledge of teachers is considered
a requirement (Clausen, Reusser, & Klieme, 2003; Wise & Okey,
1983). More specifically, teachers' subject matter knowledge and
pedagogical content knowledge have been argued to be essential
for realizing quality of education (e.g. Hill, Rowan, & Loewenberg-Ball,
2005; Shulman, 1986). Teachers' pedagogical and subject-related
knowledge are often linked to their quality of instruction
(Elbaz, 1991; Shulman, 1987).
Finally, the fourth tradition concerns research on teachers'
professional identity, taking the teacher as a person as focus for
research, stating that the teacher's personality is omnipresent in
his or her way of teaching and professional learning (Beijaard et al.,
2004). The identity perspective claims that teachers perceive
teaching as a combination of different roles regarding the teaching
job and a certain hierarchy concerning these roles. Teachers view
themselves as subject matter experts, learning experts, and
pedagogical experts (Beijaard, Verloop, & Vermunt, 2000). The
perceived hierarchy in roles determines teachers' professional
identity and the behaviour they show. Taking results from the
above-described research traditions together, an integrated
expertise comes forward, combining professional (pedagogical
and subject-related) knowledge and instructional skills (Beijaard
et al., 2000; Darling-Hammond, 1999; Stronge, 2007). From the
perception studies and the studies on professional identity of
teachers, personality-related characteristics could be added, but
up to now little empirical evidence has been found that this leads
towards better student outcomes.
Summarizing, good teaching and good teachers have been
studied from different perspectives, leading to different foci, like
instructional quality and classroom management, professional
knowledge and teachers' personality as possible main factors for
good teaching. As Gage (1964) stated, it might be the case that
teaching and teacher quality cannot be described by a single theory
at all. Vanderlinde and Van Braak (2010, p. 303) stated that the
traditional top-down model of the development and dissemination
of educational innovations should be replaced by a model where
teachers share a primary role with educational researchers in the
development of innovative practices (Englert & Tarrant, 1995).
Vanderlinde and Van Braak (2010) refer to a research-practice gap.
With this, they emphasize the fact that the use of, and reflection
upon, academic research by teachers are usually less
than optimal. It seems that teachers and other educational
professionals do not see an additional value in educational
research, or are unable to use results from educational
research in their practice. Vanderlinde and Van Braak (2010) found
in their study a possible solution to bridge this gap, by realizing
more cooperation between researchers and educational practitioners. In our study, we have translated their insights
into a more bottom-up approach, by giving the information from
literature as input to the development team.
Instead of sticking to one perspective of teacher quality, we
chose a more eclectic approach, using multiple perspectives.
This was done in order to increase the school team's commitment,
leading towards an underlying competence profile for senior
teachers that would be recognized and accepted by the school
team, partly because this would be developed by the school team
itself instead of driven by one single theoretical perspective.
Teachers' position towards an educational innovation is more
positive when they are given ownership, agency and logical sense-making
(Ketelaar, Beijaard, Boshuizen, & den Brok, 2012). This is why we
decided to take our knowledge on good teachers with us as a
starting point for the development of the assessment instrument,
while not having these theoretical findings dominate the
discussions with the development team of the instruments (see
below). Working bottom-up instead of top-down on
the development of a high-stakes assessment program, as described
in this paper, contributes to the body of knowledge of assessment
development as well as educational innovation, when striving for
optimal commitment of the school team.
Development and evaluation of the teacher assessment program
Step 1: determining the content: teacher competence
As this study was carried out in a large secondary school in the
Netherlands, the competence profile had to be locally valid. That is,
the Dutch government has adopted the Professions in Education
Act (2005), which specifies seven teacher competences as the
minimum quality for certified teachers. Teacher-training colleges
have to use these competences to assess their student teachers, and
a logical consequence is to use these same competences to assess
working teachers in a personnel management system. These seven
teacher competences are (Snoek et al., 2009): (1) interpersonal
competence; (2) pedagogic competence; (3) subject knowledge
and methodological competence; (4) organizational competence;
(5) competence for collaboration with colleagues; (6) competence
for collaboration with the working environment; and (7) competence for reflection and development. These seven competences
refer to a basic level or a starting point for junior teachers. For this,
quick scans and internet self-assessment tools for teachers are
available. However, more than this starting level is expected from
senior teachers, who are already the more effective teachers in the
school. Furthermore, instruments to assess teachers' behaviour in
the classroom already exist, like the Questionnaire on Teacher
Interaction (den Brok et al., 2003). Unfortunately, in Dutch
secondary schools, personnel management is a rather underdeveloped area and has not received much attention from school
management so far (Seezink & Poell, 2009). There are as yet no
systematically documented experiences from other Dutch schools
that could be used to develop the intended assessment
program. Therefore, a competence profile for senior
teachers extending beyond the basic and effective level has been
developed in this study.
School development team
In this study, a school-specific competence profile was developed for senior teachers. The (school) context was explicitly taken
into account because of the specific demands of the school
environment for senior teachers (Berliner, 2005) and because of
the commitment of the school team to this new assessment
program. In addition, the described theoretical perspectives were
brought into the development team by the authors of this paper, as
well as specific trends within teacher education, for example
teacher research as a competence that teachers need for their ongoing professional development (van der Linden, Bakx, Ros, &
Beijaard, 2012).
A development team consisting of ten teachers from the
secondary school (10% of the entire school team) and the
management was brought together with the assignment to
specify the competences needed by their senior
teachers. Four out of the eight senior teachers to be assessed
participated voluntarily in the development team. A large team
was chosen in order to create a valid profile as well as
commitment to the use of the new competence profile for senior
teachers as the underlying basis for the assessment program. For
the acceptance of this new assessment program, the team of
teachers should be confident that it is a valid and fair way of
judgement (Baartman, Prins, Kirschner, & Van der Vleuten, 2007).
For one year, the development team worked on the competence profile. They started with two open brainstorm sessions and
they investigated literature on teacher quality and recent
educational developments. The teachers themselves mainly used
literature that they frequently consulted for their own professional
development. This was followed by discussion sessions on first,
second and third drafts of the competence profile for senior
teachers. Eventually, based on consensus, the competence profile
for senior teachers was accepted by all members of the
development team (see Table 1).
The competence profile for senior teachers consists of two
parts. First, the development team agreed that the seven
competences determined by the Dutch government as described
above relate to senior teachers as well as to starting teachers. These
competences mainly focus on effective classroom activities, but
also include cooperation with colleagues and stakeholders in the
educational environment, even though this is a relatively small
part of the competence profile. In order to assess the classroom
activities as well, the school management decided to use the
existing questionnaire on teacher behaviour, the QTI (den Brok
et al., 2003). Second, senior teachers take on many outside-the-classroom
activities, such as innovation projects and coaching
younger teachers. Therefore, the development team formulated a
competence profile for senior teachers with eight competences
that specifically have an added value above the seven national
competences for teachers.

Table 1
Competence profile for senior teachers, description of (level of) competences (level
1 = lowest level, level 3 = highest level).

1. Flexibility/anticipating
Flexibility/anticipating concerns the application of new alternatives:
- improvise; change methods easily when the existing method is not effective
- propose new ideas
- move quickly and effectively between one's own task and someone else's task; adjust quickly
- see the need for change, make suggestions and take initiatives
Level 1: change ideas and methods to changing circumstances
Level 2: change robust patterns
Level 3: apply new alternatives

2. Innovating
Innovating concerns the proposition and creation of alternatives:
- introduce new activities
- take risks and do not be afraid to fail
- see and use opportunities
- create a stimulating learning environment for pupils
Level 1: develop new, original methods and applications
Level 2: propose and create alternatives for existing routines

3. Learning
Learning concerns sharing learning experiences with others and acting as a role model:
- use different learning styles and apply these while learning
- show a reflective attitude in new situations
- ask questions and show one's own insecurities
- be a role model in admitting mistakes and learning from them
- talk about ideas about one's own professional development and ask for feedback
Level 1: reflect upon one's own qualities and translate this into behavioural changes
Level 2: look for/create situations to learn
Level 3: share learning experiences with others and act as a role model

4. Dealing with stress
Dealing with stress concerns guarding one's own limits and talking about this in the teachers' team:
- in times of pressure make sure the team works efficiently (priorities)
- stick to one's own ideas in times of pressure
- talk about resistance to change by analysing the process together with others
Level 1: take one's own limits into account
Level 2: take things easy
Level 3: guard one's own limits and talk about this in the teachers' team

5. Coaching
Coaching concerns the stimulation of others and increasing their self-confidence:
- stimulate others to ask questions about their drives and motives
- put aside one's own judgement to increase other people's confidence
- notice and mention individual contributions to . . .
- represent trust and security
- stimulate others towards self-reflection and self-judgement
Level 1: think in line with other people and talk about it
Level 2: motivate and stimulate others to learn
Level 3: stimulate others and increase their self-confidence

6. Problem solving
Problem solving concerns helping others to solve their problems:
- anticipate problems within or outside the team or school
- analyse problems: what is the real question?
- help others when they are not able to solve their own problems
Level 1: detect problems
Level 2: solve problems
Level 3: help others to solve their problems

7. Cooperation
Cooperation concerns the sustainment of teambuilding:
- create conditions for a cooperative (learning) environment for pupils
- create an open atmosphere
- be trustful
- take differences between people into account and talk about difficult issues in the team
- stimulate others to learn from one another and help them with this
Level 1: contribute to team goals
Level 2: be responsible for team goals
Level 3: look for cooperation with others
Level 4: sustain teambuilding

8. Results based acting
Results based acting concerns making sure plans can be realized (SMART):
- assess the progress of activities and lead the team, when necessary
- help others define SMART-goals
- assess the quality of new educational products by means of systematic evaluation
Level 1: define and realize goals
Level 2: stimulate and lead others
Level 3: make sure plans can be realized (use of SMART goals^a) (Drucker, 1954)

a SMART goals are specific, measurable, attainable, relevant and time-bound.
In total, the competence profile for senior teachers thus
consisted of the seven teacher competences developed by the
government and the eight competences developed by the school
development team. The competence profile for senior teachers
includes competences like cooperation, dealing with stress,
problem solving and coaching. These were chosen because of
the fit with the specific school situation (Berliner, 2005) and
because literature shows that these competences add to
educational quality (e.g. Brophy & Good, 1986). In addition,
competences like innovating, learning, anticipating and results-based
acting were put into the competence profile because of the
innovative developments within the educational field, like
teacher-researchers creating the opportunity to realize a critical,
reflective attitude towards their practice (Zeichner & Noffke, 2001).
Personality-related characteristics were not included in the
competence profile for senior teachers because of the speculative
relation with teaching ability (Damon, 2007). Finally, the meaning
of the eight competences was described by specifying a number of
ascending levels distinguished within each competence, in order to
describe these as specifically as possible. For senior teachers the
highest levels are relevant.
Step 2: specification of criteria and standards
For the seven basic competences for (junior) teachers, quick
scans and internet self-assessment tools for teachers are available.
The focus of our paper was on the newly developed competences
for senior teachers. Subsequently, criteria were needed in order to
validly assess the additional competences of senior teachers
(Uhlenbeck, 2002). Assessment is a comparative process which
requires a frame of reference, with unambiguous definitions of
assessment criteria and standards (Damon, 2007). Therefore,
assessment criteria and standards had to be developed, matching
the eight competences for senior teachers. Standard-setting
studies show that standards are often contingent on the local
situation (Price, 2005) and are always subjective to some extent,
for example if they are determined by a group of experts in the
domain and thus rely on human judgement (Norcini & Shea, 1997).
This is the case in our study. A way of specifying standards as
objectively as possible is the use of exemplars and verbal
descriptions (Sadler, 1998). Exemplars are key examples that
describe the desired level of proficiency and are mostly used for
product evaluations. Verbal descriptions or qualitative rubrics
(Scriven, 1980) describe the properties characterizing the desired
level of proficiency. These standards are context-specific and are
often the most feasible to use, especially when multiple criteria are
used (Sadler, 1998). A rubric is a scoring tool for qualitative rating
of authentic work and it includes criteria for rating important
dimensions of performance. It describes levels of performance on a
particular task and thereby denotes what is considered important
to both assessors and assessees. For assessors, it helps determine
what to look for when assessing (Jonsson & Svingby, 2007;
Tigelaar, Van Tartwijk, Janssen, Veldman, & Verloop, 2009). A
review of studies investigating the use of scoring rubrics shows
that rubrics can enhance the reliable scoring of performance
assessments, especially if they are analytic and topic-specific
(Heldsinger & Humphry, 2013; Jonsson & Svingby, 2007; Panadero
& Jonsson, 2013). Consistent with Sadler's (1998) ideas, this
review also shows the benefits of exemplars and adds the
importance of rater training. Rubrics, on the other hand, do not
automatically enhance the validity of performance assessments.
This requires not only that the content of the rubric adequately
represents the content of the construct to be assessed (in this case,
senior teachers' competences), but also that, for example, the
mental processes used during the assessment are incorporated.
According to Jonsson and Svingby, very few studies on the use of
rubrics provide this kind of validity evidence, which implies that
the effect of rubrics on the validity of performance assessments is
not clear at the moment. In this study, we decided to describe a set
of rubrics for each competence, as research seems to mainly show
advantages of the use of rubrics. The teacher development team
described a set of rubrics for each competence defined in the
competence profile for senior teachers, working in the same way
as it had for the competence profile itself. Eventually,
based on consensus, the rubrics were fixed for each level (see
Table 1 for the eight competences and the rubrics).
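To make the structure of such a rubric concrete, the following minimal Python sketch represents one competence from Table 1 as an ordered set of level descriptions and looks up the description belonging to a judged level. The class name, method names and validation rule are illustrative assumptions and were not part of the assessment program described in this study; only the level descriptions are taken from Table 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CompetenceRubric:
    """Hypothetical container: one competence with ascending level descriptions."""
    name: str
    levels: tuple  # index 0 holds the description for level 1 (lowest level)

    def describe(self, level: int) -> str:
        """Return the verbal description that belongs to a judged level."""
        if not 1 <= level <= len(self.levels):
            raise ValueError(f"level must be between 1 and {len(self.levels)}")
        return self.levels[level - 1]

    def is_highest(self, level: int) -> bool:
        """True when the judged level is the highest level of this competence."""
        self.describe(level)  # reuse the range check above
        return level == len(self.levels)


# Level descriptions copied from Table 1 (competence 6, problem solving).
problem_solving = CompetenceRubric(
    name="Problem solving",
    levels=(
        "detect problems",
        "solve problems",
        "help others to solve their problems",
    ),
)

print(problem_solving.describe(2))
print(problem_solving.is_highest(3))
```

Keeping the levels in an ordered tuple mirrors the ascending levels of Table 1, so that the highest level, which is the relevant one for senior teachers, is simply the last entry.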
Step 3: determination of the assessment program parts
The competence profile and rubrics were the starting point for
the further development of the assessment program for senior
teachers. The first two steps described the development of the
competence profile and the rubrics, defining the content (what is
assessed). The next, third, step is about how this content
could be assessed. The choice of assessment methods largely
determines the validity of the assessment process, as the methods
should adequately measure the construct at stake (Messick, 1995).
A single assessment would probably not be sufficient to validly
assess senior teachers' competences. A mix of methods should be
used instead (Baartman et al., 2007; van der Vleuten & Schuwirth,
2005), because it reveals additional insights in comparison with
one single assessment method, gaining input from qualitative as
well as quantitative data (e.g. Spillane, Pareja, Dorner, Barnes, &
May, 2009). Others propose a longitudinal process involving
various methods in order to gain a rich picture of teachers'
knowledge and performance (Berliner, 2005; Darling-Hammond &
Snyder, 2000). Ideally, measurements like observations, interviews
and questionnaires should be repeated over time. The reliability of
assessment can also increase when using different information
sources, like peers, the teachers themselves, management and
experts (Uhlenbeck, 2002). In this study a combination of
assessment methods was chosen in order to combine the strong
aspects of the different assessment methods. This study uses an
assessment program (Baartman et al., 2007), consisting of a mix of
methods: (1) observations and questionnaires; (2) interviews; and
(3) portfolio assessment. Different groups of stakeholders (management, pupils, colleagues, the senior teachers themselves and
two experts) were involved in the judgement of the senior
teachers' competences in a pilot study of the assessment program.
Table 2 summarizes the different methods used in the assessment
program. These assessment methods and the rationales for
choosing each method are described more in-depth below.
Observations and questionnaires: background and rationales
Observations are a powerful method to assess teachers' quality
because authentic teacher behaviours can be judged in the in vivo
context (Chen et al., 2011; Landy & Conte, 2013). However,
observations by schooled assessors are not very practical and quite
expensive (Bakx, van der Sanden, Sijtsma, & Taconis, 2002). Pupils
and colleagues can also play a role in observing their teachers.
Questionnaires can be used in order to judge competences of the
potential senior teachers. The use of questionnaires, when using
transparent, clear and uni-dimensional items, can be a rich and
standardized method for assessing teachers' competences (Landy
& Conte, 2013). Also, for the judgement of teachers' behaviour in
class, validated questionnaires are often used, like the QTI
(questionnaire on teacher interaction) (den Brok, Brekelmans, &
Mainhard, 2010; Levy, den Brok, Wubbels, & Brekelmans, 2003;
Telli & den Brok, 2012). Using questionnaires helps the observers
tune their perspectives towards certain aspects of the senior
teachers' competences. Having pupils observe their teachers using
a standardized questionnaire can lead to richer insights into
teachers' classroom behaviour and can reveal aspects which might
otherwise remain implicit (Burden, 2010). In order to use pupils'
observations as part of the assessment of teachers' competence, a
large group of pupils is needed to prevent bias (Damon, 2007). The
other competences, like cooperation within the teaching team, can
be validly judged by observations by colleagues and managers. The
observation of teachers by their colleagues can be seen as peer
assessment. Different studies show benefits of peer assessment,
with regard to professional development and reflection by the
peer-assessor as well as the assessee (Sadler & Good, 2006), even
though observations done by peers always have the problem of
Table 2
Mix of methods for the assessment of senior teachers' competences.

| Method | Target group | When | Conditions |
|---|---|---|---|
| Observation questionnaire | Pupils | Twice, during a period of half a year | Group of pupils (25); more than once (1) |
| Observation questionnaire | Colleagues | Twice, during a period of half a year | Group of colleagues (4); more than once (1) |
| Questionnaire | Senior teacher | Once, at the start of their portfolio development | Standardized instruction |
| Portfolio development | Senior teacher colleagues | During a period of half a year | Portfolio guidelines |
| Portfolio assessment | External experts | At the end of the assessment period/three weeks after receiving data and portfolio | Independent experts |
| Interview | External experts | At the end of the assessment period, after data-analysis of data and portfolio | Independent experts |

Standardized questionnaire (www.ivlos.uu.nl).

Table 3
Number of items and a typical item for the QTI-scales (den Brok et al., 2010).
Scale
Number
of items
Typical item
Leadership
Helpful/friendly
Understanding student
10
10
10
Responsibility/freedom
Uncertain
Dissatisfied
Admonishing
Strict
9
9
9
9
S/he is a good leader

S/he is someone we can depend on
If we have something to
say s/he will listen
S/he gives us a lot of free time in class
S/he seems uncertain
S/he is suspicious
S/he gets angry
S/he is strict
sympathy for the observed person. That is why at least four peers,
working together with the teacher on a regular basis, should be
involved in order to gain reliable and valid ratings (e.g. Sluijsmans,
Brand-Gruwel, & Van Merrienboer, 2002).
Practice of observations by pupils in the pilot study
For the observation by pupils, a standard questionnaire was
used, measuring pupils' perceptions of interpersonal teacher
behaviour (Wubbels, Brekelmans, & Hooymayers, 1991). This
questionnaire assesses teachers' interpersonal behaviour, using
nine aspects. Table 3 presents the scales, the number of items and a
typical item for each QTI-scale (den Brok et al., 2003). These nine
aspects are related to the seven basic
competencies (especially interpersonal competence and pedagogic
competence) as well as to newly defined competencies (especially
flexibility/anticipating, dealing with stress and problem solving).
For each senior teacher being assessed in the new assessment
program (eight teachers in total), 20 pupils carried out the
observations and filled out the QTI-questionnaire. The
QTI-questionnaire (Wubbels et al., 1991) could be filled out by the
pupils anonymously. The pupils observed their teachers for twelve
weeks. After this period they completed the
questionnaire online in the computer classroom under guidance of
an ICT-assistant for the first time. This was repeated half a year
later.
Practice of observations by colleagues and management in the pilot study
The competence profile for senior teachers, together with the
rubrics, was transformed into a questionnaire by the researchers.
This questionnaire used a 4-point rating scale, varying from "my
senior colleague does this (1) almost never" to "(4) very often". The
teacher colleagues and school management used this questionnaire
in order to rate the competences of the senior teacher whom
they observed. Four different colleagues and one manager
observed one senior teacher. The colleagues completed the
questionnaire twice, with an interval of six months.
Interviews: background and rationales
In addition to observations and questionnaires, interviews are a valid
method for the judgement of competence, especially in combination
with other methods and information (Landy & Conte, 2013;
Schmidt & Hunter, 1998). It is important that at least two experts,
not attached to the school in any other way, conduct the interviews
(van der Schaaf et al., 2005). The inclusion of two experts instead of
one increases reliability (Murphy & Davidshofer, 1994) and is
sufficient to produce acceptable levels of inter-rater agreement
(Marzano, 2003).
Practice of interviews in the pilot study
In the pilot study two experts from the teacher-training college
were hired as external assessors. Based on an analysis of all data
available, being the questionnaires of pupils, colleagues and
management and the senior teachers' portfolios (as described below),
these experts held interviews with each individual senior
teacher in order to judge their level of competences. The individual
interviews took about 1.5 hours each. The interviews were explicitly
directed at (1) the confirmation of evidence for already proven
competences from the materials analysed; and (2) the further
exploration of competences which were unclear, or not yet
proven sufficiently.
Portfolio assessment: background and rationales
By constructing a portfolio, the senior teachers themselves
could be actively involved in the assessment. In this portfolio they
described and reflected on their own strengths and weaknesses
(e.g., van der Schaaf et al., 2005). The process of working on this
portfolio asks for reflection and introspection. Research shows that
the use of rubrics leads to learning (Boud, 1995; Jonsson & Svingby,
2007) and possibly even to professional growth (Hatton &
Schmith, 1995). For this to happen, the criteria, format and
guidelines for the portfolio should be transparent and clear (Linn,
Baker, & Dunbar, 1991; van der Schaaf & Stokking, 2008). From
other studies it is known that asking colleagues for feedback on
competences can lead to learning experiences as well (Hattie &
Timperley, 2007).
Practice of portfolio assessment in the pilot study
For the pilot study, a portfolio manual was written by the
external experts, containing guidelines, fill-in factsheets, and the
competence profile for senior teachers with rubrics, these being the
criteria the teachers should judge themselves on. The teachers had
to prove their competences in at least two different situations.
These two situations should be additional to what could already be
known about the teachers from the other measurements. Next,
these situations should be described in detail, so the experts would
be able to visualize the situation and ask specific (check) questions
on this during the interview, and colleagues would be able to write
feedback (or a specific addition) regarding the situations described.
Written feedback by at least two colleagues for each
situation described was added to the portfolio. As mentioned
earlier, the entire assessment program should offer learning
opportunities for the people involved, because it is an expensive
and time-consuming process altogether. Indeed, the complete
assessment should not only result in a judgement, but it should
also have developmental possibilities for the teacher during and
after the assessment. For the portfolio, the teachers completed the
pupil questionnaire on interpersonal teacher behaviour at the
start of the pilot study, assessing the way they viewed their own
behaviour with pupils in the class. Next, the senior teachers proved
the eight competences by completing an evidence form (what is
your evidence for this competence, why is this convincing). This
evidence could be a video from a good practice, a series of lessons, a
manual they developed and so on. Together with this evidence,
written feedback from colleagues was added. For writing this
feedback, the teachers' colleague(s) used the rubrics from the
competence profile.
Step 4: pilot study
As introduced above, a pilot study was organized to evaluate
the working of the assessment program in practice and the
acceptance of the program within the school team. Validity and
reliability are the most widely used quality criteria for assessment,
but just these two criteria are not sufficient when it comes to
assessing competence (see also Baartman et al., 2007). Several
authors have proposed other or complementary quality criteria,
focusing for example on the meaningfulness of the assessment for
learning or the quality of the feedback it provides (e.g., Baartman
Table 4
Twelve quality criteria for competence assessment programs (Baartman et al., 2007,
p. 261).
1. Acceptability
All stakeholders should approve of the assessment methods, criteria
and standards
2. Authenticity
The degree of resemblance of the assessment to the (future) workplace
3. Cognitive complexity
The assessment should reflect the presence of the cognitive skills needed and
should enable the judgement of thinking processes
4. Comparability
The assessment should be conducted in a consistent and responsible way.
The tasks, criteria and working conditions should be consistent with regard
to key features of interest
5. Costs and efficiency
The time and resources needed to develop and carry out the assessment,
compared to the benefits
6. Educational consequences
The degree to which the assessment yields positive effects on learning and
instruction and the degree to which negative effects are minimized
7. Fairness
Teachers should get a fair chance to demonstrate their competences, by
letting them express themselves in different ways and making sure the
assessors do not show biases
8. Fitness for purpose
The assessment methods, criteria and standards should be compatible with
the construct to be measured
9. Fitness for self-assessment
The assessment should stimulate self-regulated learning by fostering
self-assessment and the formulation of learning goals
10. Meaningfulness
The assessment should be a learning opportunity and provide valuable
feedback for further learning
11. Reproducibility of decisions
Decisions made based on the results of the assessment should be based on
multiple situations and assessors. Decisions should not depend on one
assessor or specific situation
12. Transparency
The assessment, criteria and standards should be clear and understandable
to all stakeholders
et al., 2007; Linn et al., 1991). Quantitative measures of quality are

often not available for these kinds of assessment programs,
necessitating other operationalizations of validity and reliability
(Baartman et al., 2007). The assessment program described in this
paper was evaluated by using 12 quality criteria for the
determination of the quality of competence assessment programs.
Of course, the assessment parts should be valid, reliable and
objective (traditional criteria). The rationale behind using the 12
new criteria is that competence assessment consists of both more
traditional and new forms of assessment, and as a consequence,
both traditional and new quality criteria are needed to evaluate the
quality of the assessment. Table 4 presents a description of the
quality criteria used in this study (Baartman et al., 2007, p. 261).
Method
The study presented in this paper describes a pilot study of the
assessment program, including an evaluation of the assessment
program.
Instruments
The competence profile for senior teachers was transformed
into a questionnaire, as described above, and was used by the peer
teachers, the management and the teachers themselves. Next, a
standardized questionnaire on teacher behaviour, the
QTI (questionnaire on teacher interaction), was used by the pupils and the
senior teachers themselves. This questionnaire is a validated and
reliable instrument already used in many other (international) studies
(den Brok et al., 2010; Levy et al., 2003; Telli & den Brok,
2012). The scales and numbers of items of the QTI are presented in
Table 3.
Evaluation instruments of the (perceived) quality of the assessment program
To evaluate the quality of the entire assessment program the 12
quality criteria of Baartman et al. (2007) were used (Table 4
presents the categories). In a previous study, these quality criteria
were specified into 46 indicators per quality criterion (Baartman
et al., 2007), which were used as questions in a questionnaire in
this study. The participating senior teachers and their colleagues
judged the quality of the assessment program on a 10-point Likert
scale. The pupils could fill in four of the twelve quality criteria: (1)
fitness for purpose; (2) transparency; (3) fairness; and (4) (costs
and) efficiency. These four were chosen because they are the most
visible for the pupils, for example, whether the criteria were clear to them
and whether they thought the criteria represented their opinion of a good
teacher. Next, four questions were added to the pupils' questionnaire
in order to receive information about the pupils' perspective
on the usefulness of the assessment program.
Participants
Eight senior teachers participated as assessees in the pilot study
of the assessment program: six men and two women. The age of
the teachers varied between about 30 and 63 years. All
teachers taught a subject like maths or languages, and one teacher
taught physical education and had managerial tasks next to her
teaching tasks. All teachers had gained at least five years of
teaching experience. In the evaluative part of the pilot study, which
was not obligatory, seven out of the eight senior teachers
completed the evaluation questionnaire on the quality of the
assessment program.
For each participating senior teacher, pupils of two classes rated
their teachers. They carried out observations and filled out the
QTI-questionnaire. In total 170 pupils participated. The pupils varied in
age between 14 and 17 years old. Participation of the pupil groups
was obligatory, so the response rate was close to 100%. Seventy of
the 170 pupils also completed the evaluation questionnaire on a
voluntary basis.
Four different colleagues observed each senior teacher;
in total 32 teachers participated in the new assessment program as
observers and 16 other teachers helped by providing written
feedback. In total 48 teachers were involved in the assessment
program of their eight colleagues. Only 4 out of the 48 peer
teachers completed the evaluation of the quality of the assessment
program. This difference in participation between the pilot study
and the evaluative part of the pilot study might be due to the
period of the year (at the end of the second semester just before the
summer holidays) and the fact that peers and pupils were invited
to participate on a voluntary basis.
Data analyses
For the assessment of the senior teachers' competence,
available data were (1) results of pupils on the QTI-questionnaire,
together with the scores of the senior teachers themselves on the
QTI-scales; (2) the scores of the questionnaires on the competence
profile for senior teachers, completed by the colleagues; and (3)
portfolios of the senior teachers including the feedback by peer
colleagues. The questionnaires from the colleagues were analysed
by computing mean scores per competence (varying between 1
and 4) that subsequently were converted into percentages,
indicating on a 100% scale how often the teacher showed a
specific competence. In order to make a final judgement on the
senior teachers' competences, the two experts used the following
criteria: (1) results on the pupils' questionnaire should be positive,
being in line with the (national) norms of the QTI and showing no large
negative differences with the national average scores; (2) the
results from the peer teachers' questionnaires should show
scores of at least 60% for each competence, being a positive result; (3) the
evidence for each competence as included in the portfolio should
be valid, reliable and convincing according to the two experts. For a
positive judgement, teachers should score positively on all three
parts.
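The scoring and decision rule described above can be sketched as follows. This is only an illustrative sketch: the paper does not state how mean scores on the 1-4 scale were converted into percentages, so the linear rescaling in `to_percentage` is an assumption, and all function and parameter names are hypothetical. The expert judgements on the QTI results and the portfolio are represented simply as booleans.

```python
from statistics import mean


def to_percentage(mean_score, low=1, high=4):
    """Rescale a mean rating on the low..high scale to 0..100.

    Assumption: a linear mapping (1 -> 0%, 4 -> 100%); the paper
    does not specify the conversion that was actually used.
    """
    return (mean_score - low) / (high - low) * 100


def competence_percentages(ratings_by_competence):
    """Mean colleague rating per competence, expressed as a percentage."""
    return {
        competence: to_percentage(mean(ratings))
        for competence, ratings in ratings_by_competence.items()
    }


def final_judgement(qti_in_line_with_norms, peer_percentages,
                    portfolio_convincing, threshold=60.0):
    """Positive only if all three parts are positive.

    (1) pupils' QTI results in line with the norms (expert judgement),
    (2) every competence rated at least `threshold` percent by peers,
    (3) portfolio evidence judged convincing by the experts.
    """
    peers_ok = all(p >= threshold for p in peer_percentages.values())
    return qti_in_line_with_norms and peers_ok and portfolio_convincing


# Hypothetical ratings from four colleagues for two competences.
ratings = {"cooperation": [3, 4, 4, 3], "problem solving": [2, 3, 3, 2]}
percentages = competence_percentages(ratings)
decision = final_judgement(True, percentages, True)
```

Under the assumed linear mapping, a mean rating of 2.8 corresponds to exactly 60%, so the second competence in the hypothetical example (mean 2.5, or 50%) would lead to a negative overall judgement.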
For the assessment of the quality of the assessment program,
available data were the results of the pupils' evaluation
questionnaires and results on the evaluation questionnaire filled
in by the senior teachers themselves and their peers. Quantitative
data from the evaluation questionnaire were available from 70
pupils, seven of the eight senior teachers and four colleagues.
Means and standard deviations were calculated.
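As an illustration of this descriptive analysis, a minimal sketch of computing a mean and standard deviation per quality criterion is given here. The scores are invented for the example, the function name is hypothetical, and the use of the sample standard deviation is an assumption, since the paper does not say which variant was computed.

```python
from statistics import mean, stdev


def summarize(scores_by_criterion):
    """Return {criterion: (mean, SD)} rounded to two decimals.

    Assumption: SD is the sample standard deviation (n - 1 denominator).
    A criterion with a single score has no defined sample SD.
    """
    summary = {}
    for criterion, scores in scores_by_criterion.items():
        sd = stdev(scores) if len(scores) > 1 else float("nan")
        summary[criterion] = (round(mean(scores), 2), round(sd, 2))
    return summary


# Invented scores on the 1-10 scale from four hypothetical respondents.
example_scores = {
    "Fairness": [6, 5, 7, 6],
    "Transparency": [8, 7, 9, 8],
}
summary = summarize(example_scores)
```

With only four or seven respondents per group, as in this pilot, such standard deviations are unstable, which is one reason to read the reported means with caution.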
Evaluation of the assessment: results
The first results concern the pilot study of the assessment
program, describing the assessment of the competences as well as
the provision of the opportunity for self-reflection. Second, the
evaluation of the assessment program is presented.
Pilot study of the assessment program
Senior teachers' competences
The separate parts of the assessment program (the observations,
questionnaires, portfolios and interviews) all pointed in
the same direction: a senior teacher does or does not show the
competences as formulated in the competence profile for senior
teachers. In none of the cases did the results of the different parts of
the assessment program contradict each other. Table 5 presents
the overall results of the teachers on the separate parts of the
assessment.
Five teachers proved to be competent for the senior role as
described in the competence profile for senior teachers. The two
assessors judged this independently from each other based on the
portfolio assessment. This positive judgement was confirmed by
the interview with these teachers. In contrast, two other teachers
had not been able to prove their competences by means of their
portfolios; the judgement by the experts was doubtful. However,
these two teachers were capable of adding extra material during
the assessment interview by providing extra information on the
evidence provided and by adding critical incidents during the
interview. Eventually, the interview turned their judgements into
positive ones. For the last teacher, the portfolio was insufficient
and even if there had been an interview with additional
materials, it could not have led to a positive judgement. Therefore,
the assessment interview was cancelled and this teacher was
requested to construct a new portfolio. In an evaluation interview,
the eight teachers, even the one without a positive judgement,
stated that they recognized the advice and judgement.
Opportunity for reflection
Working with portfolios regarding professional development is
considered valuable when there is a dialogical context. This
dialogical context was created by having the senior teachers ask
their peers for written feedback. Next, an interview was held with
the senior teacher and two experts. All teachers stated that the
entire process helped them reflect on their profession, their
behaviour and the actions they had undertaken. Seven out of the eight
senior teachers told the experts that it was a developmental
process for them to work on the portfolio because of the gathering
of evidence proving the competence, reflecting on the competences
and writing this down, asking peers for feedback and discussing
this with the experts. One senior teacher, who did not receive a
positive judgement, did not agree with the other teachers on this.
He stated that the assessment program also judged the way one
could build up a portfolio and use one's writing skills, and not only
the senior teachers' competences.
Evaluation of the assessment program
Mean scores of the evaluation of the quality of the assessment
program as judged by the teachers and peer teachers on a 1-10
scale are presented in Table 6. The criterion "acceptability" (i.e. all
stakeholders should approve of the assessment methods, criteria
and standards) showed the lowest score. The teachers who had
been assessed, as well as the teachers who participated in the peer
assessment, did not completely support the assessment program
used (teachers M = 5.67, colleagues M = 4.75). Especially the
teachers who participated in the peer assessment reported low
scores on the acceptance of this method. The criterion "fairness"
also showed low scores within both groups (teachers M = 5.62,
colleagues M = 5.51). This criterion comprises questions like "do
you think the assessment is fair" and "are the assessors
unprejudiced". The assessed teachers also reported low scores
on the criterion "educational consequences". They stated that this
assessment program did not really influence their professional
behaviour (M = 5.82). The peer assessors on the other hand stated
that participation in the assessment program did influence the
teachers' professional behaviour (M = 8.88). Next, the assessed
teachers reported that the assessment program was suitable for
self-reflection (M = 7.43), which was part of the aim of the
Table 5
Results on different parts of the assessment of senior teachers' competences.

| Method | Target group | Overall results of the senior teachers (N = 8) |
|---|---|---|
| Observation questionnaire T1 (a) | Pupils | Positive results for all 8 teachers |
| Observation questionnaire T2 (b) | Pupils | Positive results for 7 teachers, showing even better results than in T1; negative results for 1 teacher, compared to T1 |
| Observation questionnaire T1 (a) | Colleagues | Positive results for 7 teachers; negative results for 1 teacher |
| Observation questionnaire T2 (b) | Colleagues | Positive results for 7 teachers, showing even better results than in T1; negative results for 1 teacher, the same as in T1 |
| Questionnaire (self) | Senior teacher | Positive judgement of self by all 8 teachers |
| Portfolio assessment | External experts | Convincing evidence for 5 teachers; additional evidence on two competences needed for two teachers; insufficient evidence for 1 teacher |
| Interview | External experts | Positive on all competences for 7 teachers; one interview was postponed, additional evidence was needed first (judgement: insufficient) |

(a) T1 = first measurement.
(b) T2 = second measurement (6 months after first measurement).
Table 6
Mean scores and standard deviations (SD) on the 12 quality criteria for assessed teachers and peer assessors (1-10 scale).

| Criteria | Teachers (N = 7) Mean | Teachers (N = 7) SD | Peer assessors (N = 4) Mean | Peer assessors (N = 4) SD | Number of items |
|---|---|---|---|---|---|
| Acceptability | 5.67 | 2.61 | 4.75 | 2.99 | 3 |
| Authenticity | 7.14 | 1.80 | 6.25 | 1.50 | 2 |
| Cognitive complexity | 6.69 | 1.79 | 6.73 | 1.72 | 5 |
| Comparability | 6.92 | 2.50 | 6.88 | 3.01 | 4 |
| Costs and efficiency | 5.87 | 1.88 | - | - | 6 |
| Educational consequences | 5.82 | 2.93 | 8.88 | 0.88 | 4 |
| Fairness | 5.62 | 0.83 | 5.51 | 2.28 | 6 |
| Fitness for purpose | 7.30 | 1.06 | 6.65 | 2.29 | 6 |
| Fitness for self-assessment | 7.43 | 1.47 | 6.00 | 1.75 | 4 |
| Meaningfulness | 6.32 | 2.50 | 6.25 | 2.54 | 4 |
| Reproducibility of decisions | 7.38 | 1.16 | 7.75 | 1.77 | 6 |
| Transparency | 6.57 | 1.17 | 6.75 | 1.52 | 3 |

Table 7
Pupils' mean scores and standard deviations (SD) on four quality criteria (N = 70, 1-10 scale).

| Criterion | Mean | SD | Number of items |
|---|---|---|---|
| Fitness for purpose | 7.32 | 2.21 | 5 |
| Transparency | 7.75 | 1.62 | 1 |
| Fairness | 8.16 | 1.48 | 2 |
| (Costs and) efficiency | 8.04 | 1.52 | 2 |
| Additional questions | | | |
| A good assessment with pupils' participation | 8.21 | 1.63 | 3 |
| Adequate questions about my teacher | 7.18 | 1.61 | 1 |
| Questionnaire is a way of giving feedback | 7.11 | 1.89 | 1 |
| This assessment leads to a change in behaviour of my teacher | 3.35 | 2.86 | 1 |

assessment program. Especially the portfolio was designed to
stimulate the teachers to reflect on their own competence. All
eleven teachers (7 senior teachers and 4 peer teachers) reported
that the assessment program led to reproducible judgements and
decisions (M = 7.38 and M = 7.75, respectively), which is a measure of
reliability. Another measure of reliability is comparability, which
is the use of comparable methods, criteria and standards for all
assessees. According to the assessed teachers and the peer teachers,
the assessment program was indeed comparable (M = 6.92 and
M = 6.88, respectively). The (peer) teachers also reported that the
assessment program was suitable for the aim set (fitness for
purpose: teachers M = 7.30, peers M = 6.65), which was judging
whether a teacher was a real senior teacher having all competences
described.
Table 7 reports the scores of the evaluation of the assessment
program by the pupils. In total, 70 pupils filled out the evaluation
questionnaire, in which a 10-point scale was used. All four criteria
measured showed high scores (see Table 7). The pupils reported
that they understood the questionnaire about their teachers'
interpersonal behaviour and that they understood the goal of it,
namely to judge the best teachers in school for senior positions,
also giving the pupils a voice in this. The assessment program (as
far as the pupils participated in it) was fair and transparent
according to the pupils.
The pupils reported that they appreciated the fact that they
could participate in the assessment and give their judgement of the
teacher (M = 8.21). They stated this is a way of giving feedback to
their teachers (M = 7.11). Pupils did not perceive changes in
teachers' behaviour (M = 3.34).
Conclusion and discussion
The study presented in this paper focused on the development
of an assessment program for senior teachers, while providing the
opportunity for self-reflection by the senior teachers. The
development process contained four steps. The first step concerned
determining the content of the competences to be assessed.
In the second step criteria and standards were
specified, and in the third step methods were chosen for carrying
out the assessment program. In the fourth step the assessment program was
implemented in a pilot study, assessing eight senior teachers.
Theoretical frameworks on good teachers do not present one
specific view on good teachers (Berliner, 2001; Fenstermacher &
Richardson, 2005). Therefore, three theoretical perspectives on
good teaching can be recognized in the final competence profile for
senior teachers' competences. The profile included aspects from (1)
perception studies of ideal teaching, including learning environment
research (Allen & Fraser, 2007); (2) effectiveness research
(e.g. Seidel & Shavelson, 2007); and (3) studies on teachers'
professional knowledge (e.g. Berliner, 2004; Darling-Hammond &
Snyder, 2000; Verloop, 2005). The literature on good teachers was
presented to the development team and the team also used their
own (literature) resources from e.g. professional development
programs. A specific aim was that the school team would recognize
the new assessment program and that there would be a strong
commitment towards using it. As a consequence, a school-specific
competence profile was developed by the school's development
team. This is a rather eclectic approach, using competences fitting
the specific school context, mostly chosen bottom up. Berliner
(2005) described the importance of taking specific demands of the
school environment into account. The school management agreed to
a rather eclectic approach in choosing the competences, in order to
create a larger commitment of the team.
The assessment program had two goals: judgement of senior
teachers' competences and creating an opportunity for reflection
on their competences by the senior teachers participating. The
senior teachers stated that the assessment program did not really
influence their professional behaviour, but they recognized the
possible influence of the assessment program on their professional
behaviour as teachers. They mentioned the possibility to reflect on
their own competence development while working on their
portfolio. This forced them to make their competences explicit.
With this reflection function, the assessment program, although
having a specific summative goal, had a formative purpose as well
(Hickey, Zuiker, Taasoobshirazi, Schafer, & Michael, 2006).
Opportunity for reflection
A portfolio can play an important role in the professional
development of teachers, and not only in case of an (external)
judgement. Working with portfolios regarding professional
development is considered valuable when there is a dialogical
context. If so, a portfolio can be considered an assessment tool
for learning. However, when there are no reflective discussions
regarding the portfolio, it is less valuable (Mittendorff, Jochems,
Meijers, & den Brok, 2008). This might be the case in our
assessment program: only one interview was conducted in
which the teacher could talk about and explain his or her
contribution in the portfolio. It is not unlikely that extrinsic
motivation played a role in the portfolio assignment.
The teachers completed their portfolios in order to gain another
position/salary in the school. The participating teachers stated
that the process of working on their portfolios is a very good
way of self-reflection (see also Boud, 1995). However, the other
colleagues were not convinced of this; they stated that working
on a portfolio might contribute to self-reflection, but that this
does not have to be the case for everyone. In order to achieve an
actual and long-lasting effect on teacher behaviour and
professionalization, teacher assessment should be integrated
into a larger personnel evaluation system. This could be done by
having all teachers work on a professional portfolio, describing
their professional development and reflecting on their professional
identity as a teacher (Beijaard et al., 2004). This portfolio
can then be the basis of the annual dialogue between the teacher and
the management.
Indeed, a pilot study of the assessment program showed that a
good view of senior teachers' competences, as described in this
study, seemed to be gained in this way. Seven out of eight senior
teachers who were subjected to the assessment indeed demonstrated
the specified senior teachers' competences. The results of
the different methods within the assessment program all pointed
in the same direction. Multiple different methods, assessors and
pieces of evidence were used to demonstrate teacher competence,
assuring triangulation of methods and assessors. The fact
that observations, questionnaires, interviews and portfolios all
pointed in the same direction (a positive or negative judgement
of the senior competence) is a first starting point for the
determination of construct validity (Murphy & Davidshofer,
1994). However, in the pilot study a rather small, selective group
of teachers participated. Half of them participated in the
development team of the competence profile as well, which
might have influenced their views on the assessment program in
a positive way. The fact that the eight best teachers were
selected by the school management for participating in the pilot
study might also have influenced the (positive) findings. In
order to validate the assessment program further, it could be
implemented in other, comparable schools that did not
participate in the development process, but that also need a
system for personnel management and high-stakes assessments as
required by the Dutch government.
It remains an interesting question whether these kinds of
assessment tools should be theoretically driven, practically driven,
or a combination of both, as in our study. We assume that
this combination works best in order to gain a valid competence
profile as well as an increased commitment of the team. Working
in a mainly theoretically driven way might lead to a 'not
invented here' problem, ending in a rejection of the
assessment program by the school team. By having the best teachers
participate in the development team, alongside the input of literature,
an arbitrary collection of competences should have been prevented,
but this remains a possible weakness of our study. This way of working
might be interesting for others to try out and could be interesting
as a subject for further research as well. It might also be interesting
for future research to compare our approach to more theoretically
driven approaches to assessing teachers, keeping in mind the acceptance
of and commitment towards the assessment program by the school
team.
Evaluation of the assessment program
The judgements of the eight senior teachers were recognized
and accepted by the teachers assessed as well as by their peer
teachers. The development process was carried out in the four
steps by the school development team, in order to create a large
commitment of the school team towards this new assessment
program. Subsequently, the pilot study with the first
implementation of the assessment program was evaluated. This
evaluation needs to be interpreted with some caution because only
four peer assessors (of the 48 peer assessors participating in the
assessment program) and 70 pupils (of 170 in total) participated in
the evaluative part of this study. This limited participation
might be due to the period of the year (at the end of the second
semester just before the summer holidays) and the fact that peers
and pupils were invited to participate on a voluntary basis. School
management reported that this time of year was too busy to achieve
a higher response.
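To make the caution about participation concrete, the response rates implied by the counts reported above can be worked out directly. The following is a minimal sketch in Python; the function name `response_rate` and the dictionary layout are illustrative assumptions and are not part of the original study.

```python
def response_rate(respondents: int, invited: int) -> float:
    """Return the response rate as a percentage of those invited."""
    if invited <= 0:
        raise ValueError("invited must be a positive number")
    return 100.0 * respondents / invited


# Counts taken from the text: 4 of 48 peer assessors and 70 of 170 pupils
# took part in the evaluative part of the study.
groups = {
    "peer assessors": (4, 48),
    "pupils": (70, 170),
}

for name, (respondents, invited) in groups.items():
    rate = response_rate(respondents, invited)
    print(f"{name}: {respondents}/{invited} = {rate:.1f}%")
```

Under these counts, roughly one in twelve peer assessors and about four in ten pupils responded, which illustrates why the peer evaluation in particular should be read with care.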
The participating teachers (teachers assessed and the peers)
reported that the assessment program was suitable for providing a
competence judgement, but that the acceptance of the assessment
methods was rather low and that the commitment towards this
way of competence assessment was not very high. This could be
partly due to the fact that the assessment methods used were quite
time-consuming. Teachers who judged their colleagues invested
time and effort, but did not perceive their participation as useful
for themselves. Indeed, peer assessment can be valuable for both
parties (teacher and peer), but dialogue and exchange are
important conditions for learning from each other in school teams
(Doppenberg, den Brok, & Bakx, 2012). However, in order to reach
this dual learning effect for teachers assessed as well as their peers,
specific goals for this should be set, explained and guided. This was
not done in the pilot study, but can be a valuable suggestion for
others who would use a comparable assessment program. Even
though the findings of the study do not support some of the
outcomes hoped for (in particular the development of a summative
assessment program for senior teachers), the study does seem to offer a
sound methodology for the development of such a program within
each school context. If all teachers in a school site utilized the
method for developing the competencies, then the
outcomes of the evaluation might be more readily accepted by the
teachers.
For additional research, it would be interesting to gain more insight
into the possible psychological rationales, in order to find out why
the teachers resist supporting the assessment program. For this
purpose, another questionnaire could be developed, investigating
the possible psychological causes of resistance. Open-ended
questions in particular could be useful here. This may
facilitate an understanding of why teachers did not quite accept
this assessment method. Identifying the underlying causes of their
resistance might help to refine the program for
the future. From other studies on educational innovation, it is
known that teachers are more positive when given ownership,
agency and logical sense-making (Ketelaar et al., 2012). In order to
create acceptance for the assessment program, a large group of
teachers and the management were brought together to
create both a valid profile and a commitment to the use of this
competence profile for senior teachers as the underlying basis for
the assessment program. However, not all teachers were involved
in the development process and the teachers had not been involved
in choosing the assessment methods. These methods in particular
(portfolio, observation, questionnaires and interview) were
time-consuming and involved many peers and pupils in order to
establish one valid judgement of a senior teacher's competences.
The low acceptance could be due to the large amount of time, effort
and money spent on the assessment of relatively few people
(Wiliam, Lee, Harrison, & Black, 2004). The pupils were more
positive in this respect. They understood and recognized their part
of the assessment program and appreciated the possibility to give
feedback to their teachers. They appreciated the fact that their
opinion was asked for and perceived the assessment program as
fair, transparent and suitable for the purpose of assessing senior
teachers' competence. The difference between teachers' and
pupils' judgements may be due to the fact that pupils' voices
were heard and that they had a serious role in the assessment of
their teachers, which is not a role that pupils commonly get.
Because of the anonymity of the methods, the pupils could be
honest about their opinions of their teachers, without fearing
negative consequences.
A question that remains difficult to answer is whether
colleagues are capable of judging each other objectively when
it concerns a summative assessment. The participating peers
were in no way dependent on each other in the teaching team, or
connected in a hierarchical relation. However, it is possible that
colleagues who like each other judge each other more positively.
Observations done by peers have the problem of sympathy for the
observed person. This is especially important when it concerns a
summative assessment with salary consequences, as in this
study. The same question can be asked about pupils: are they
capable of judging their teachers? In order to reduce the possible
bias by colleagues and pupils, three actions were undertaken: (1)
judgements were carried out anonymously; (2) relatively large
groups of pupils and peers were asked to participate in the
assessment (Damon, 2007); and (3) external assessors were
included in the assessment program. The assessed teachers were
asked to add evidence to their portfolios to prove all eight
competences. This evidence was analysed by the external assessors and,
together with the results from the pupil questionnaire, the
questionnaires of the colleagues and the portfolio, formed the
basis for the final individual interview. The expert judgement and
the peer judgement produced comparable results, which is a first
indication that peer teachers and pupils could play a role in the
summative assessment of teachers. However, more
research is needed in this respect, for example by comparing
the judgements of colleagues who like or dislike each other.
For formative purposes, these reliability concerns are less of an
issue. In the case of assessments for formative reasons, colleagues
could judge one another as a starting point for professional
development and intervision. The organization could create a
culture in which collegial consultation and reflection on
professional behaviour are generally accepted and appreciated
and in which learning from each other is a central goal
(Doppenberg et al., 2012).
Summarizing, the assessment program for senior teachers
showed that the competences of working teachers can be assessed
during their teaching career with this program. First indications of
validity and reliability were positive, but the acceptance of the
program by the school team was rather low and should be
investigated further. Future steps to improve the assessment
program should include further validation of
the competence profile, increasing acceptance of the assessment program and
finding possibilities to lower costs and time investments. A further validation
of the competence profile could be done by external experts and
teachers, combining a theoretical approach and practical input. This
could possibly improve and specify the competence profile further.
Next, creating conditions in the school for learning from each other, and using
portfolios as a means for professional development, could contribute
towards a decrease in time and effort, because a portfolio would then
be a growing document that all teachers already have. This portfolio
would then be the basis of the annual dialogue between teacher and
school management. When a teacher is selected to
participate in the assessment program, only an addition to
the portfolio with peer feedback would be needed. Indeed, this
requires a change in school culture, directed at a learning
organization. Our study provides a first direction for the
development of an assessment program that would fit in a learning
organization.
References
Allen, D., & Fraser, B. J. (2007). Parent and student perceptions of classroom learning environment and its association with student outcomes. Learning Environments Research, 10(1), 67–82.
Baartman, L. K. J., Prins, F. J., Kirschner, P. A., & Van der Vleuten, C. P. M. (2007). Determining the quality of competence assessment programs: A self-evaluation procedure. Studies in Educational Evaluation, 33, 258–281.
Bakx, A. W. E. A., van der Sanden, J. M. M., Sijtsma, K., & Taconis, R. (2002). Development and evaluation of a student-centred multimedia self-assessment instrument for social-communicative competence. Instructional Science, 30, 335–359.
Baumert, J., Kunter, M., Blum, W., Brunner, M., Voss, T., Jordan, A., et al. (2010). Teachers' mathematical knowledge, cognitive activation in the classroom, and student progress. American Educational Research Journal, 47(1), 133–180.
Beijaard, D., Verloop, N., & Vermunt, J. D. (2000). Teachers' perceptions of professional identity: An exploratory study from a personal knowledge perspective. Teaching and Teacher Education, 16(7), 749–764.
Beijaard, D., Meijer, P. C., & Verloop, N. (2004). Reconsidering research on teachers' professional identity. Teaching and Teacher Education, 20, 107–128.
Beishuizen, J. J., Hof, E., van Putten, C. M., Bouwmeester, S., & Asscher, J. J. (2001). Students' and teachers' cognitions about good teachers. British Journal of Educational Psychology, 71(2), 185–201.
Berliner, D. (2001). Learning about and learning from expert teachers. International Journal of Educational Research, 35, 463–482.
Berliner, D. (2004). Describing the behaviour and documenting the accomplishments of expert teachers. Bulletin of Science Technology and Society, 25, 1–13.
Berliner, D. (2005). The near impossibility of testing for teacher quality. Journal of Teacher Education, 56(3), 205–213.
Borko, H., Whitcomb, J., & Liston, D. (2009). Wicked problems and other thoughts on issues of technology in teacher learning. Journal of Teacher Education, 60(1), 3–7.
Boud, D. (1995). Enhancing learning through self assessment. London/New York: Routledge Falmer Taylor & Francis Group.
Brophy, J., & Good, T. (1986). Teacher behaviour and student achievement. In M. Wittrock (Ed.), Handbook of research on teaching (3rd ed., pp. 328–375). New York: Macmillan.
Burden, P. (2010). Creating confusion or creative evaluation? The use of student evaluation of teaching surveys in Japanese tertiary education. Educational Assessment, Evaluation and Accountability, 22(2), 97–117.
Chen, W., Mason, S., Staniszewski, C., Upton, A., & Valley, M. (2011). Assessing the quality of teachers' teaching practices. Educational Assessment, Evaluation and Accountability, 24(1), 25–41.
Clausen, M., Reusser, K., & Klieme, E. (2003). Unterrichtsqualität auf der Basis hochinferenter Unterrichtsbeurteilungen: Ein Vergleich zwischen Deutschland und der deutschsprachigen Schweiz [Quality of instruction based on high-inference analysis of lessons: A comparison between Germany and German-speaking Switzerland]. Unterrichtswissenschaft, 31, 122–141.
Damon, W. (2007). Dispositions and teacher assessment: The need for more rigorous definition. Journal of Teacher Education, 58(5), 365–369.
Darling-Hammond, L. (1999). Teacher quality and student achievement: A review of state policy evidence. Seattle: University of Washington, Center for Teaching and Policy.
Darling-Hammond, L., & Snyder, J. (2000). Authentic assessment in teaching in context. Teaching and Teacher Education, 16, 523–545.
Day, C., Sammons, P., Stobart, G., Kington, A., & Gu, Q. (2007). Teachers matter: Connecting lives, work and effectiveness. Berkshire: Open University Press.
den Brok, P. J., Fisher, D., Brekelmans, J. M. G., Rickards, T., Wubbels, Th., Levy, J., et al. (2003). Students' perceptions of secondary science teachers' interpersonal style in six countries: A study on the cross national validity of the Questionnaire on Teacher Interaction. NARST annual meeting 2003, March 23–26, Philadelphia. Philadelphia: NARST.
den Brok, P. J., Brekelmans, J. M. G., & Wubbels, Th. (2004). Interpersonal teacher behaviour and student outcomes. School Effectiveness and School Improvement, 15(3), 407–442.

den Brok, P. J., Brekelmans, J. M. G., & Mainhard, T. (2010). The effect of students' perceptions of their teachers' interpersonal behaviour on their educational outcomes: A meta-analysis of research with the Questionnaire on Teacher Interaction (QTI). In Th. Wubbels, P. den Brok, J. van Tartwijk, J. Levy, & B. Fraser (Eds.), International conference on interpersonal relationships in education (pp. 21). Eindhoven: TUe-UU-LU.
Doppenberg, J., Den Brok, P., & Bakx, A. (2012). Collaborative teacher learning across foci of collaboration: Perceived activities and outcomes. Teaching and Teacher Education, 28(6), 899–910.
Drucker, P. F. (1954). The Practice of Management. New York: Harper.
Elbaz, F. (1991). Research on teachers' knowledge: The evolution of a discourse. Journal of Curriculum Studies, 29(1), 1–19.
Englert, C. S., & Tarrant, K. L. (1995). Creating collaborative cultures for educational change. Remedial and Special Education, 16(6), 325–336.
Fenstermacher, G. D., & Richardson, V. (2005). On making determinations of quality in teaching. Revision of paper presented for the Board of International Comparative Studies in Education of the National Academies of Science and the National Research Council, Washington, DC. Teachers College Record, 107(1), 186–213.
Gage, N. L. (1964). Theories of teaching. In E. R. Hilgard (Ed.), Theories of learning and instruction: Sixty-third yearbook, Part I: National Society for the Study of Education (pp. 268–285). Chicago: University of Chicago Press.
Hamachek, D. (1969). Characteristics of good teachers and implications for teacher education. The Phi Delta Kappan, 50(6), 341–345.
Hattie, J. (2009). Visible learning: A synthesis of over 800 meta-analyses relating to achievement. London: Taylor & Francis.
Hattie, J., & Timperley, H. (2007). The power of feedback. Review of Educational Research, 77(1), 81–112.
Hatton, N., & Smith, D. (1995). Reflection in teacher education: Towards a definition and implementation. Teaching and Teacher Education, 11(1), 33–49.
Hegender, H. (2010). The assessment of student teachers' academic and professional knowledge in school-based teacher education. Scandinavian Journal of Educational Research, 54(2), 151–171.
Heldsinger, S. A., & Humphry, S. M. (2013). Using calibrated exemplars in the teacher-assessment of writing: An empirical study. Educational Research, 55(3), 219–235.
Hickey, D. T., Zuiker, S. J., Taasoobshirazi, G., Schafer, N. J., & Michael, M. A. (2006). Balancing varied assessment functions to attain systemic validity: Three is the magic number. Studies in Educational Evaluation, 32(3), 180–201.
Hill, H. C., Rowan, B., & Loewenberg Ball, D. (2005). Effects of teachers' mathematical knowledge for teaching on student achievement. American Educational Research Journal, 42(2), 371–406.
Jonsson, A., & Svingby, G. (2007). The use of scoring rubrics: Reliability, validity and educational consequences. Educational Research Review, 2(2), 130–144.
Ketelaar, E., Beijaard, D., Boshuizen, H., & den Brok, P. J. (2012). Teachers' positioning towards an educational innovation in the light of ownership, sense-making and agency. Teaching and Teacher Education, 28(2), 273–282.
Kleickmann, T., Richter, D., Kunter, M., Elsner, J., Besser, M., Krauss, S., et al. (2013). Teachers' content knowledge and pedagogical content knowledge. The role of structural differences in teacher education. Journal of Teacher Education, 64(1), 90–106.
Kutnick, P., & Vena, J. (1993). Students' perceptions of a good teacher: A developmental perspective from Trinidad and Tobago. British Journal of Educational Psychology, 63(3), 400–413.
Landy, F. J., & Conte, J. M. (2013). Work in the 21st century: An introduction to industrial and organizational psychology (4th ed.). Hoboken, NJ: Wiley.
Levy, J., den Brok, P., Wubbels, T., & Brekelmans, M. (2003). Students' perceptions of interpersonal aspects of the learning environment. Learning Environments Research, 6(1), 5–36.
Linn, R. L., Baker, E. L., & Dunbar, S. B. (1991). Complex, performance-based assessment: Expectations and validation criteria. Educational Researcher, 20(8), 15–21.
Mangiante, E. M. S. (2011). Teachers matter: Measures of teacher effectiveness in low-income minority schools. Educational Assessment, Evaluation and Accountability, 23(1), 41–63.
Marzano, R. J. (2003). What works in schools. Translating research into action. Alexandria, VA: Association for Supervision and Curriculum Development.
Marzano, R. J. (2011). De kunst en wetenschap van het lesgeven. Een evidence-based denkkader voor goed, opbrengstgericht onderwijs. [The art and science of teaching. An evidence-based perspective for good, data-driven education]. Vlissingen: Bazalt.
Messick, S. (1995). Validity of psychological assessment. American Psychologist, 50, 741–749.
Mittendorff, K., Jochems, W., Meijers, F., & den Brok, P. (2008). Differences and similarities in the use of the portfolio and personal development plan for career guidance in various 70 vocational schools in The Netherlands. Journal of Vocational Education and Training, 60(1), 75–91.
Murphy, K. R., & Davidshofer, C. O. (1994). Psychological testing: Principles and applications. New Jersey: Prentice-Hall.
Noell, G. H., & Burns, J. L. (2006). Value-added assessment of teacher preparation. An illustration of emerging technology. Journal of Teacher Education, 57(1), 37–50.
Norcini, J. J., & Shea, J. A. (1997). The credibility and comparability of standards. Applied Measurement in Education, 10(1), 39–59.
Panadero, E., & Jonsson, A. (2013). The use of scoring rubrics for formative assessment purposes revisited: A review. Educational Research Review, 9, 129–144.
Praslova, L. (2010). Adaptation of Kirkpatrick's four level model of training criteria to assessment of learning outcomes and program evaluation in Higher Education. Educational Assessment, Evaluation and Accountability, 22(3), 215–225.
Price, M. (2005). Assessment standards: The role of communities of practice and the scholarship of assessment. Assessment and Evaluation in Higher Education, 3, 215–230.
Rasmussen, A., & Friche, N. (2011). Roles of assessment in secondary education: Participant perspectives. Educational Assessment, Evaluation and Accountability, 23(2), 113–129.
Ryan, R. M., & Deci, E. L. (2000). Self-determination theory and the facilitation of intrinsic motivation, social development, and well-being. American Psychologist, 55, 68–78.
Sadler, D. R. (1998). Formative assessment: Revisiting the territory. Assessment in Education, 5(1), 77–84.
Sadler, P. M., & Good, E. (2006). The impact of self- and peer-grading on student learning. Educational Assessment, 11(1), 1–31.
Scheerens, J. (2007). Een overzichtsstudie naar school- en instructie-effectiviteit. [An overview research on school-effectiveness and instruction-effectiveness]. Enschede: Universiteit Twente.
Schmidt, F. L., & Hunter, J. E. (1998). The validity and utility of selection methods in personnel psychology: Practical and theoretical implications of 85 years of research findings. Psychological Bulletin, 124(2), 262–274.
Scriven, M. (1980). The logic of evaluation. Inverness, CA: Edge Press.
Seezink, A., & Poell, R. F. (2009). Teachers' individual action theories about competence-based education: The value of the cognitive apprenticeship model. Journal of Vocational Education and Training, 61(2), 203–215.
Seidel, T., & Shavelson, R. J. (2007). Teaching effectiveness research in the past decade: The role of theory and research design in disentangling meta-analysis results. Review of Educational Research, 77(4), 454–499.
Shulman, L. (1986). Those who understand: Knowledge growth in teaching. Educational Researcher, 15, 4–14.
Shulman, L. (1987). Knowledge and teaching: Foundations of the new reform. Harvard Educational Review, 57(1), 1–22.
Sluijsmans, D. M. A., Brand-Gruwel, S., & Van Merriënboer, J. J. G. (2002). Peer assessment training in teacher education: Effects on performance and perceptions. Assessment and Evaluation in Higher Education, 27(5), 443–454.
Snoek, M., Clouder, C., De Ganck, J., Klonari, K., Lorist, P., Lukasova, H., et al. (2009). Teacher quality in Europe: Comparing formal descriptions. Paper presented at the ATEE conference 2009, Mallorca.
Spillane, J. P., Pareja, A. S., Dorner, L., Barnes, C., & May, H. (2009). Mixing methods in randomized controlled trials (RCTs): Validation, contextualization, triangulation, and control. Educational Assessment, Evaluation and Accountability, 22(1), 5–28.
Stronge, J. H. (2007). Qualities of effective teachers. Alexandria, USA: Association for Supervision and Curriculum Development.
Teaching Advisory Board. (2007). [Commissie Leraren]. LeerKracht! Advies van de Commissie Leraren. [Power of teachers: Advice of the teaching advisory board]. Den Haag: Ministry of Education.
Telli, S., & den Brok, P. J. (2012). The questionnaire on teacher interaction from the primary to the higher education context in Turkey. In T. Wubbels, P. J. den Brok, J. van Tartwijk, & J. Levy (Eds.), Interpersonal relationships in education: An overview of contemporary research (pp. 187–206). Rotterdam: Sense Publishers.
Tigelaar, D. E. H., Van Tartwijk, J., Janssen, F., Veldman, I., & Verloop, N. (2009). A program for the assessment of competence in teacher education. An exploration of teacher educators' assessment activities. Paper presented at the 13th biannual conference of the European Association for Research on Learning and Instruction, August 29, 2009, Amsterdam, The Netherlands.
Uhlenbeck, A. M. (2002). The development of an assessment procedure for beginning teachers of English as foreign language. (unpublished doctoral dissertation) Leiden, the Netherlands: University of Leiden, ICLON Graduate School of Education.
van der Linden, W., Bakx, A., Ros, A., Beijaard, D., & Vermeulen, M. (2012). Students' perceived development of a positive attitude towards research and research knowledge and skills in primary teacher education. European Journal of Teacher Education, 35(4), 401–419.
van der Schaaf, M. F., & Stokking, K. M. (2008). Developing and validating a design for teacher portfolio assessment. Assessment and Evaluation in Higher Education, 33(3), 245–262.
van der Schaaf, M. F., Stokking, K. M., & Verloop, N. (2005). Cognitive representations in raters' assessment of teacher portfolios. Studies in Educational Evaluation, 31, 27–55.
van der Vleuten, C. P. M., & Schuwirth, L. W. T. (2005). Assessing professional competence: From methods to programs. Medical Education, 39, 309–317.
Vanderlinde, R., & Van Braak, J. (2010). The gap between educational research and practice: Views of teachers, school leaders, intermediaries and researchers. British Educational Research Journal, 36(2), 299–316.
Verloop, N. (2005). De leraar [The teacher]. In N. Verloop & J. Lowyck (Eds.), Onderwijskunde (pp. 195–234). Groningen: Noordhoff Uitgevers.
Wiliam, D., Lee, C., Harrison, C., & Black, P. J. (2004). Teachers developing assessment for learning: Impact on student achievement. Assessment in Education: Principles Policy and Practice, 11(1), 49–65.
Wise, K. C., & Okey, J. R. (1983). A meta-analysis of the effects of various science teaching strategies on achievement. Journal of Research in Science Teaching, 20, 419–435.
Wubbels, T., Brekelmans, M., & Hooymayers, H. (1991). Interpersonal teacher behaviour in the classroom. In B. Fraser & H. Walberg (Eds.), Educational environments. Oxford: Pergamon.
Zeichner, K. M., & Noffke, S. E. (2001). Practitioner research. In V. Richardson (Ed.), Handbook of research on teaching (pp. 298–330). Washington, DC: American Educational Research Association.
Anouke Bakx, PhD, is an associate professor and academic director of the master
program Learning and Innovation for teachers at Fontys University of Applied
Sciences, The Netherlands. Her research focuses on teacher quality in primary schools,
professional learning of teachers and outcome-based education.
Liesbeth Baartman, PhD, is a senior researcher and lecturer at the Faculty of Education
of Utrecht University of Applied Sciences, the Netherlands. She is part of the Research
Group Vocational Education. Her research focuses on assessment quality in (higher)
vocational education and students' learning processes between school and work in
vocational education.
Tamara van Schilt-Mol, PhD, is an associate professor of testing and assessment at the
HAN University of Applied Sciences, the Netherlands. She is part of the Research
Centre Quality for Learning. Her research focuses both on the function of testing
and assessment regarding the development of students and teachers/lecturers, and on
the function of testing and assessment regarding (improving) the quality of
education.

* Corresponding author. Tel.: +31 8778 75 993.
