
House Joint Resolution #6

DPAS II Sub-Committee
Final Report
March 31, 2016

A report of findings and recommendations regarding the current DPAS II Educator Evaluation System

Table of Contents

Task Force Members

Executive Summary

Position Paper

Appendixes:
A. House Joint Resolution #6
B. Proposed Summative Rating Review
C. DSEA/DASA Workgroup Final Recommendations
D. Measure B Assessments Quality Control Process (Presentation by DDOE)

DPAS II Sub-Committee
Voting Members
Jackie Kook, Delaware State Education Association (DSEA) - Chair
Dr. David Santore, DE Association for School Administrators (DASA) - Co-Chair
Sherry Antonetti, Delaware State Education Association (DSEA)
Clay Beauchamp, Delaware State Education Association (DSEA)
Rhiannon O'Neal, Delaware State Education Association (DSEA)
Kent Chase, DE Association for School Administrators (DASA)
Dr. Clifton Hayes, DE Association for School Administrators (DASA)
Dr. Charlynne Hopkins, DE Association for School Administrators (DASA)
Bill Doolittle, Parent Representative (PTA)
David Tull, Delaware School Boards Association
Dr. Lisa Ueltzhoffer, Newark Charter - Charter School Representative
Dr. Susan Bunting, School Chiefs Association (DPAS-II Advisory Committee Chairperson)
The Honorable David Sokola, Delaware State Senate
Tyler Wells, Higher Education
Non-Voting Members
Christopher Ruszkowski, Delaware Department of Education
Atnre Alleyne, Delaware Department of Education
Eric Niebrzydowski, Delaware Department of Education
Angeline Rivello, Delaware Department of Education
Donna R Johnson, Delaware State Board of Education


Executive Summary
The DPAS II Sub-Committee, composed of representatives from all major educator and
stakeholder groups, has met regularly since September 28, 2015. Over the course of those meetings, the
Committee identified the issues regarding the Teacher and Specialist portions of the Delaware
Performance Appraisal System II (DPAS II) and worked to address the major issues contributing to the
lack of confidence in the system. These issues were identified in statewide surveys and noted in
House Joint Resolution No. 6, in which at least 70% of each employee group indicated that DPAS II
should not continue in its current form.
The Committee agrees that, while there are many issues surrounding the effectiveness of the
DPAS II system, the major issue has been Component V, Student Improvement. Therefore, the
Committee's work has centered largely on modifying the current structure of Component V as well as
weighting it more equitably in relation to the other important components within the system. The
Committee did address other issues regarding DPAS II throughout the process, and those that are
pertinent are also recognized in this report.
The Committee is recommending several changes to the structure and weighting of the DPAS II
Component V. In addition, the Committee is making recommendations regarding the implementation of
those changes, as the Committee believes that hurried implementation was a major reason for the initial
lack of confidence in the system.


DPAS II Sub-Committee Position Paper


The Committee, which is composed of members of both the Delaware State Education Association
(DSEA) and the Delaware Association of School Administrators (DASA), along with representatives
from the Delaware Department of Education, the Delaware State Board of Education, the Delaware
School Boards Association, higher education, the Delaware Parent Teacher Association, and a
legislative representative, has met on several occasions to discuss improvements to the existing
evaluation system in accordance with House Joint Resolution #6. The Committee sought to identify
some of the system's shortcomings and improve upon them. Some of the deficiencies identified by the Committee
were:

The Student Improvement portion (Component V) of the evaluation system was weighted
disproportionately relative to the other components.

Many of the student growth measures, especially at the outset, were of poor quality and did not
accurately measure student growth with the fidelity necessary to build an equitable educator
accountability system.

There was not always consistent rigor and comparability between the Measure Bs and Measure Cs as
they were applied across employee groups. For example, the rigor of some pre- and post-test mathematics
assessments far exceeded the rigor of some performing arts pre- and post-tests. In addition, at the outset,
some Measure Cs were growth measures, while others were completion goals.

Initial issues with the measures, particularly the internal Measure B assessments, as well as some
of the Measure C growth goals, have caused the users of the current structure to lack confidence
in the accuracy and fairness of the system, as evidenced by the finding outlined in HJR #6 that 78% of
administrators, 70% of teachers, and 78% of specialists believe the current evaluation system
should not continue in its current form. Despite DOE's earnest attempts to rectify the issues,
user perception of the accountability tools in the current Component V is an obstacle that may
be difficult to overcome.

It should be noted that, while other issues that could be improved upon in the current system were
identified, such as the educator rubric and the redundancy of some components, the issues identified in this
paper were recognized by the Committee as the most urgent to address, as they have the most direct
impact on educator accountability.
The Committee therefore recommends the following two changes to Component V of the current system:
The first change is that Component V be weighted at 20% of the total summative evaluation, which will
give it equal weight with each of the other components. It will be divided into two parts of equal measure:
Part A
50% of Component V will consist of an individual goal agreed upon by the educator and the
administrator. The goal will be based upon each institution's school or district
improvement document, and must target student improvement and the educator's individual
efforts to foster positive change in the school in which he or she works. This measure will
closely mirror the Component V measure used in the 2008-2011 school years.
Part B
The other 50% of Component V will consist of the student improvement results from uniform
accountability measures for each content area and employee group. These measures must be
collected annually. Timelines that foster end-of-year discussion for teachers should be a strong
consideration, but should not preclude an accurate and viable measure from being utilized.
Where applicable, this includes, but is not limited to, state assessments, pre- and post-tests by
course, portfolios, end-of-course projects, AP exam results, industry-standard measures for CTE
courses, and industry-standard measures for specialists (such as the Performance Assessment of
Contributions and Effectiveness of Speech and Language Pathology). If not already
completed, the agreed-upon measures should be vetted for validity and reliability on an ongoing
basis by the Department of Education.
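Read together, the 20% weight for Component V and the equal split between Parts A and B imply the arithmetic sketched below. This is an illustration only: it assumes every component is reported on a common numeric scale and that the five equally weighted components are simply summed, which the recommendation implies but does not spell out. The resulting score would then be mapped to a rating band, as in the DSEA/DASA algorithm attached as Appendix C.

\[
\text{Component V} = 0.5\,(\text{Part A}) + 0.5\,(\text{Part B}),
\qquad
\text{Summative score} = 0.2 \sum_{i=1}^{5} \text{Component}_i
\]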
The Committee recognizes that developing acceptable uniform measures is a massive undertaking, as the
measures must be specific to numerous and varied educator groups. However, given the lack of
confidence in Component V, we believe it is important to take the time necessary to ensure that the
measures are as ready as possible when implemented. We also recognize that there must be a balance
between the validity and reliability of a measure and the practicality and flexibility of administering and
scoring the assessment. For instance, while a longer assessment covering additional standards may prove
more reliable, that gain may not outweigh the benefits of administering the test in a single sitting rather
than over three days, especially given the current number of assessments students are given.
The Committee recommends that DOE oversee the work of compiling, creating, and/or developing an
acceptable list of uniform measures for each employee group and job description with significant and
ongoing participation by practitioners.
The second recommendation is that the requirement for annual summative evaluations that will be instituted in
2017-2018 as a result of the revisions to 14 Delaware Admin Code 106A is not necessary and should be
rescinded. The Committee values the dialogue between the evaluator and the educator, and believes the
discussion of Component V during the required Measures and Targets Conference each year enables
educators and their supervisors to engage in annual, meaningful dialogue regarding teaching and learning.
We believe this conference accomplishes the same objective without additional paperwork that adds no value
to the process and, in fact, creates unnecessary redundancy. The Committee does not wish to move to required
annual summatives.
Miscellaneous Recommendations:
Professional development for Part A goals is recommended for all users, to be led by the Department of
Education but developed in conjunction with a Committee of practitioners chosen by DSEA and DASA.

This professional development should include guidelines and parameters for goal setting, such as processes
to address potential conflicts over the selection of measures, as well as procedures and examples to assist
practitioners in developing meaningful goals.
According to a recent study completed by the Department of Education, many of the current measures
that are used for Component V are accurate and reliable. The Committee recommends that these existing
measures that have been vetted for reliability and validity be widely published to foster more confidence
in the system, and that the usable measures be retained. Measures that show sub-par reliability or
validity should be revised and/or replaced.
The Committee recommends the use of the mathematical algorithm developed by the joint DSEA/DASA
workgroup in the spring of 2015 (attached as Appendix C) to determine final ratings for evaluations.
Many of the problems with the implementation of the current system were attributed to the hurried
execution of very important educator accountability measures. The Committee proposes that the full
system be put into place in the 2017-2018 school year. It is recommended that 2016-2017 be
used for measure refinement, professional development on Part A of Component V, a pilot of the
mathematical algorithm, and a pilot of the various measures in Part B.
The Committee recommends that a public school district or charter school continue to be permitted to
submit locally developed assessments to be approved by the Department of Education. Once developed
and approved by the DOE, the Committee recommends that the assessments become public domain and
available for use by all districts in Delaware.
The Committee recommends that a public school district or charter school continue to be permitted to
develop and submit alternate teacher and/or specialist evaluation systems to be approved by the
Department of Education. Once developed and approved by the DOE, the Committee recommends that
the systems become public domain and available for use by all districts in Delaware.


Quality control of the measures determining evaluation ratings of educators will be paramount in revising
this system and changing user perception of the system.


Appendixes
A. House Joint Resolution #6
B. Proposed Summative Rating Review
C. DSEA/DASA Workgroup Final Recommendations
D. Measure B Assessments Quality Control Process (Presentation by DDOE)

Appendix A: House Joint Resolution #6

SPONSOR: Rep. Jaques & Sen. Sokola

HOUSE OF REPRESENTATIVES

148th GENERAL ASSEMBLY

HOUSE JOINT RESOLUTION NO. 6


AS AMENDED BY
HOUSE AMENDMENT NO. 1

DIRECTING THE DPAS II ADVISORY COMMITTEE TO ESTABLISH A SUB-COMMITTEE TO REVIEW AND MAKE
RECOMMENDATIONS FOR CHANGES TO THE CURRENT EDUCATOR EVALUATION SYSTEM.

WHEREAS, Delaware's administrators, teachers, and specialists are evaluated under the Delaware
Performance Appraisal System II-R (DPAS II-R); and
WHEREAS, the Delaware Performance Appraisal System was created to provide credible evidence on the
performance of educators and enhance the skills and knowledge of educators through professional growth; and
WHEREAS, 99% of Delaware educators have been rated highly effective or effective; and


WHEREAS, according to the year seven report of the Delaware Performance Appraisal System II-R, 78% of
administrators, 70% of teachers, and 78% of specialists believe the current evaluation system should not continue
in its current form; and
WHEREAS, given the significance of the feedback on the Delaware Performance Appraisal System II-R, the
system needs to be reviewed; and
WHEREAS, the review of the system would benefit from the continued engagement and input of the
education stakeholders on the DPAS-II Advisory Committee and practitioners affected by the system;
NOW THEREFORE:
BE IT RESOLVED by the House of Representatives and the Senate of the 148th General Assembly of the State
of Delaware, with the approval of the Governor, that:
1. The DPAS II Advisory Committee is hereby directed to establish a sub-committee before July 30,
2015, in order to review and make recommendations for changes to the current educator evaluation
system;
2. All changes to statute and regulation regarding educator evaluations should be focused on
professional growth, continuous improvement of student outcomes, and providing quality educators in
every school building and classroom;
3. The sub-committee shall review existing state statutes and regulations relating to teacher
evaluations and make recommendations on the following:
a. Multiple measures which reflect the impact of clear teaching standards and where the
components include detailed indicators that guide teacher performance and a teacher's
contribution to student growth. Measures to be considered include, but are not limited to, student
learning objectives and other non-test options such as end-of-course projects;
b. Differentiated by years of experience;


4. In addition to the foregoing, the sub-committee shall also review existing state statutes and
regulations relating to specialist evaluations and make recommendations based on the following:
a. Multiple measures based on standards developed by specialist national organizations
in order to provide specialists with clear and actionable feedback to enhance their practice;
b. Relevance to the work of the specialist;
c. Differentiated between direct and indirect services; and
d. Flexibility for districts and charter schools to set building level goals for each specialist.
5. The sub-committee shall be composed of the following members:
a. Seven representatives of the DPAS II Advisory Committee appointed by the Chair,
including at least one teacher and one administrator;
b. Three administrators, one representing each county, appointed by the Delaware
Association of School Administrators;
c. Three educators, one representing each county and including at least one specialist,
appointed by the Delaware State Education Association;
d. One administrator and one educator appointed by the Charter School Network; and
e. The sub-committee shall designate a Chairperson and a Vice Chairperson from amongst
its membership;
6. The sub-committee shall work in collaboration with the Department, and include the Secretary
of Education or his/her designee and the President of the State Board of Education or his/her designee,
who may participate in the sub-committee as non-voting members;
7. The sub-committee shall be staffed by existing personnel at the Department of Education; and


8. Recommendations must be sent to the Secretary of Education, the State Board of Education,
and the House and Senate Education Committees by March 31, 2016.


Appendix B: Proposed Summative Rating Review

December 2, 2015
DPAS II Advisory Committee:
It is the formal recommendation of the DPAS II Advisory Sub-Committee, given authority by
House Joint Resolution #6 to review the existing evaluation system and make such
recommendations, that the current regulation directing annual summative ratings to begin in the
2017-2018 school year be rescinded, given the current evaluation system. The Sub-Committee
feels that, for a variety of reasons, implementing an annual summative rating system would be
more burdensome than useful.
A summative rating is a specific document that must be completed in a specific way, based on
the current DPAS II system of educator evaluation. It requires that evidence be collected for the
criteria in each of the five components on an annual basis, which the collective administrative
voice on the Sub-Committee felt would be difficult to accomplish. Getting into the classrooms to
provide feedback on all criteria every time would greatly increase the time spent on observations,
which would decrease the time administrators have for other required tasks. Changing the
number of criteria or implementing alternative evaluation systems would make the playing field
unequal, be it across the building, across the district, or across the state.
The collective educator voice on the Sub-Committee felt the annual summative rating would not
address the need for specific, timely feedback based on classroom observations if the evaluation
system otherwise continued in its current form. There is potential for the summative rating to be
based on a single observation and corresponding formative rating form, and this does not provide
for a system of targeted educator growth. It was agreed across the Sub-Committee that the spring
goal review conference was an appropriate time for educators and evaluators to discuss the
classroom observation and associated components for the year without the onus of completing a
summative rating form.
As it currently stands, DPAS II does not support the implementation of annual summative
ratings, and the Sub-Committee felt there is not adequate time to do the work of creating a new
evaluation system. While giving an annual assessment of an educator's abilities is not a concern
in and of itself, using the current system with its summative rating procedure is not feasible.
There was some discussion around the possibility of formalizing the spring conference to allow
for annual review, but as a whole the Sub-Committee feels this was not the charge given and
would require more direction for the intention and expected outcomes of the annual summative
rating regulation. If the goal is to help educators and administrators grow, the work to change
DPAS II would be tremendous. If the goal is to hold educators and administrators more
accountable for the off-year Component V: Student Achievement scores, there are less
cumbersome ways to formalize the system for that purpose.
The Sub-Committee looks forward to continued conversations around remaining tasks.
Sincerely,

Jackie Kook, Chair


David Santore, Co-Chair
DPAS II Advisory Sub-Committee


Appendix C: DSEA/DASA Workgroup Final Recommendations


The workgroup's finalized recommendations were submitted to the Department of Education in May of
2015 as a PowerPoint file. While the overall mathematical scoring suggested by the workgroup appears in
the table below, the entire recommendation file is attached to the email containing this document.

Criteria Ratings - Components 1-4

Ineffective | Needs Improvement | Effective | Highly Effective

Component 1-4 Ratings

Ineffective  | Needs Improvement | Effective    | Highly Effective
0 to 1.59    | 1.6 to 2.59       | 2.6 to 3.59  | 3.6 to 4

Component 5 Rating

Ineffective | Needs Improvement (Ineffective w/discretion) | Effective (satisfactory) | Highly Effective (Exceeds)

Summative Rating

Ineffective  | Needs Improvement | Effective    | Highly Effective
0 to 1.59    | 1.6 to 2.59       | 2.6 to 3.59  | 3.6 to 4

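To make the proposed bands concrete, the sketch below applies the cut points from the table to a set of component scores. It is one reading of the table, not the workgroup's actual tool; the equal averaging of the five components is an assumption consistent with the Committee's recommendation that each component carry 20% of the summative rating, and the example scores are invented.

```python
def band(score: float) -> str:
    """Map a 0-4 score to a rating using the cut points proposed in the table above."""
    if score < 1.6:
        return "Ineffective"
    if score < 2.6:
        return "Needs Improvement"
    if score < 3.6:
        return "Effective"
    return "Highly Effective"


def summative_rating(component_scores: list[float]) -> str:
    """Average the five component scores (equal weighting assumed) and band the result."""
    average = sum(component_scores) / len(component_scores)
    return band(average)


# Example: four observation components plus Component V, each scored 0-4
print(summative_rating([3.2, 2.8, 3.5, 3.0, 2.4]))  # "Effective" (average 2.98)
```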

Appendix D: Measure B Assessments Quality Control Process


On the Evidence of the Validity & Reliability of Measure B Student Growth Measures White Paper
Prepared by Shanna Ricketts, Ph.D. (Fellow, Harvard Strategic Data Project) & Pete Goldschmidt, Ph.D.
(Associate Professor, California State University Northridge)
January 2016
Delaware's educator evaluation system, the Delaware Performance Appraisal System (DPAS II), provides
measures of educator effectiveness based on five components: 1) Planning and Preparation, 2) Classroom
Environment, 3) Instruction, 4) Professional Responsibility, and 5) Student Improvement. The Student
Improvement Component (commonly referred to as Component V within Delaware) is composed of multiple
measures of student growth. These student growth measure categories include: 1) Measure A, based on
student growth on the statewide assessment in Math and ELA; 2) Measure B, the Delaware-educator-created
(internal) and vendor-created (external) assessments covering myriad subject areas; and 3) Measure C, the
growth goals that cover a large number of subject areas as well as classroom assignments.
The focus of this white paper is on the student growth measures commonly referred to, categorically
and/or individually, as Measure B (or Measure Bs). In particular, the paper focuses on the subset of
assessments that were created by and for Delaware educators (commonly referred to as internal
Measure B assessments). These 200+ assessments were created by Delaware educators, and thus are
aligned to the content that is being taught within classrooms. In addition, they are subject to a thorough
approval process by an external vendor. These assessments allow educators in Delaware to
measure student growth in almost every subject area. From some of the least used assessments such as
German Level III, Sheet Metal I, and Textiles and Clothing III to some of the most used assessments such
as Social Studies Grade 8, Science Grade 8, and ELA Grade 10, these assessments allow educators to
measure the academic growth of their students over the course of the school year.
During the first three years of implementation, there was discussion throughout Delaware (as evidenced
by the annual statewide survey results and meeting minutes from state advisory committees) about the
validity and reliability of internal Measure Bs. Yet, as this paper presents, the vast majority of these
assessments exhibit adequate to good reliability. Further, there is a great amount of evidence for the
validity of these assessments for measuring both student growth and educator effectiveness. Now that
the early research has been conducted, this paper provides evidence for the validity (and thus reliability)
of these assessments.
What is Validity?
The key aspect of validity is that the inferences we intend to make based on the results or scores of
assessments are subject to validation, not the assessments themselves. Messick (1995) describes
validity as "an overall evaluative judgment of the degree to which empirical evidence and theoretical
rationales support the adequacy and appropriateness of interpretations and actions on the basis of test
scores or other modes of assessment" (p. 741). Validity is also described as "the degree to which
evidence and theory support the interpretations of test scores for proposed uses of tests" (AERA, APA, &
NCME, 2014, p. 11). In other words, there is no one measure of validity, and no one source of
evidence for validity. Rather, validity is a compilation of evidence that takes many forms, such as expert
judgment and relevant psychometric results.
Evidence for Validity in the Construction of the Assessments
As noted by Herman, Heritage, and Goldschmidt (2011), and building on Kane's (2013) claim framework,
a key component of establishing validity is identifying the purpose of the assessment. As part of
Delaware's educator evaluation system, DPAS-II, Measure B assessments are intended, in part, to be
used in aggregate to demonstrate a teacher's contribution to his or her class's learning. Specifically, the
assessments should provide evidence of content mastery of relevant standards and, importantly, need
to be amenable to demonstrating growth, which also implies that they are sensitive to the quality,
consistency, and rigor of instruction.
These assessments (part of Delaware's library of Student Growth Measures) were designed by Delaware
educators for almost every grade and subject area statewide. Assessments are written to a set of
standards that align with the content that students are expected to learn in an academic year.
Educators, as well as content experts from the DDOE and across the country, partner to write items that
will be used on the assessment. An external vendor provides assessment-writing advice, training, and
support to educators throughout this process. The vendor then assures the quality of the assessment by
ensuring that the assessments exhibit appropriate properties, including an alignment of items to the
standards, unambiguous items, an appropriate scoring format, adequate representation of the intended
domain, and an appropriate mix of item difficulty. The Delaware Measure B process is substantively
grounded in the expertise of educators and moderated by additional independent expertise, which is a
solid basis for establishing validity.
Evidence of Fairness
Validity also relates to evidence of fairness. In Delaware's educator evaluation system, fairness relates to
two aspects: 1) the appropriateness of the assessment for students and the opportunities they have been
provided to learn the material, and 2) fairness in terms of whether performance on the
Measure Bs reflects what the standards denote (and thus what teachers teach). Analyses performed by
the DDOE show that students consistently perform better on the Measure B post-tests (after
instruction) than on the Measure B pre-tests (upon entering a class).
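A toy illustration of that pre/post comparison follows; the scores are invented for this sketch, not DDOE data, and the real analyses drew on statewide records.

```python
import numpy as np

# Hypothetical matched pre- and post-test scores for one Measure B assessment
pre = np.array([34, 41, 28, 50, 37, 45, 30, 39])
post = np.array([52, 58, 40, 66, 49, 61, 44, 55])

gain = post - pre
print(f"mean gain: {gain.mean():.1f} points")              # average pre-to-post improvement
print(f"students who improved: {(gain > 0).mean():.0%}")   # share with a positive gain
```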
Psychometric Quality of the Assessments
The post-tests from school year 2013-14 were examined in great depth to allow the DDOE to have a
fuller understanding of how well the state-approved assessments were performing psychometrically.
Measure Bs were examined for their reliability as measured by Cronbach's alpha (a measure of internal
consistency). The reliability of the majority of the assessments is acceptable (generally considered to be
a Cronbach's alpha > 0.7).
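For readers less familiar with this statistic, here is a minimal sketch of how Cronbach's alpha can be computed from an item-score matrix; the data are invented for illustration, and this is not the vendor's or DDOE's actual procedure.

```python
import numpy as np

def cronbach_alpha(item_scores: np.ndarray) -> float:
    """Cronbach's alpha for a (students x items) matrix of item scores."""
    k = item_scores.shape[1]                          # number of items
    item_vars = item_scores.var(axis=0, ddof=1)       # variance of each item
    total_var = item_scores.sum(axis=1).var(ddof=1)   # variance of students' total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical post-test: 6 students x 4 items scored 0/1
scores = np.array([
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
])
print(f"alpha = {cronbach_alpha(scores):.2f}")  # about 0.70 for this toy data
```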

There were some assessments that did not exhibit high internal consistency. However, further
examination revealed that this may have more to do with the structure of the assessment than with the
items in general. For example, assessments with a heavily weighted constructed-response section
tended to exhibit lower reliability than assessments that used entirely multiple-choice responses. The
DDOE also examined other relevant psychometric indicators, such as differential item functioning
(whether items are equally difficult for subsets of students), range of difficulty (whether each Measure B
has an adequate distribution of easy, moderate, and hard items), and discrimination and fit (whether items
were logically related to the assessment and to the skill levels of the students taking it, and whether there
were any outliers).
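Two of these indicators, item difficulty (the proportion of students answering an item correctly) and item discrimination (the point-biserial correlation of an item with the rest of the test), can be illustrated with a small sketch; the scores below are hypothetical, and the computation shown is the standard classical form rather than DDOE's specific analysis.

```python
import numpy as np

# Hypothetical 0/1 item scores (students x items); not DDOE data
scores = np.array([
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 1, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 1],
])
total = scores.sum(axis=1)

# Item difficulty: proportion of students answering each item correctly (higher = easier)
difficulty = scores.mean(axis=0)

# Item discrimination: point-biserial correlation of each item with the total
# score excluding that item (the "corrected item-total" correlation)
discrimination = np.array([
    np.corrcoef(scores[:, j], total - scores[:, j])[0, 1]
    for j in range(scores.shape[1])
])

for j, (p, r) in enumerate(zip(difficulty, discrimination)):
    print(f"item {j + 1}: difficulty = {p:.2f}, discrimination = {r:.2f}")
```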
Convergent Evidence of Validity
Another type of validity evidence of an assessment is convergent evidence. This refers to whether there
exists a relationship between a Measure B assessment and another assessment that is intended to
measure the same construct (AERA, APA, & NCME, 2014). The Measure B assessments were highly
correlated with other assessments that measured similar constructs. For example, student scores on the
Mathematics Measure B internal assessments were highly correlated with student scores on the
Mathematics DCAS assessment for that same year. If both assessments are measuring the same
construct (in this case, student growth in mathematics), then there should be a high correlation between
scores on each of these assessments. This was found to be the case, with correlations around 0.7 or
higher, thus providing convergent evidence of validity.
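A toy version of that correlation check is sketched below; the arrays are invented stand-ins for the statewide Measure B and DCAS scores used in the actual analysis.

```python
import numpy as np

# Hypothetical student scores; the real analysis used statewide records, and
# these arrays are invented purely to illustrate the correlation check.
measure_b_math = np.array([42, 55, 61, 48, 70, 66, 53, 59])
dcas_math = np.array([410, 455, 480, 430, 510, 495, 445, 470])

r = np.corrcoef(measure_b_math, dcas_math)[0, 1]
print(f"Pearson r = {r:.2f}")  # correlations around 0.7 or higher were read as convergent evidence
```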
Validity in the Use of the Assessments (Consequential Evidence)
While the vast majority of the assessments themselves exhibited average or good reliability, this in and
of itself is not sufficient to guarantee the validity of the assessment. In order to explore the validity of
the assessments, one also has to identify the way in which the assessment will be used. "Validation
logically begins with an explicit statement of the proposed interpretation of test scores, along with a
rationale for the relevance of the interpretation to the proposed use" (AERA, APA, & NCME, 2014, p.
11). The claim based on Measure B results is that the change in student scores between the pre-test and
the post-test reflects a change in student skills and knowledge related to a particular subject, and that
this change in skills and knowledge was facilitated by the student's teacher in that class. This is relevant
because DPAS-II declares that effective teacher practices are evidenced by multiple indicators, of which
demonstrated teacher effectiveness in providing opportunities for students to learn the subject matter
is one important aspect. Technically, the reliability of the assessments is sufficient to allow for
inferences about mean changes in pre/post performance. Equally important are the consequences
related to the inferences. Claims about teachers based on Measure B results are vetted by principals and
provide only one of several indicators of a teacher's effectiveness. This line of consequences is thus
appropriately matched to the nature of the evidence.
Maintaining and Improving Measure B Assessments
In addition to providing for quality control at the front-end of the assessment-creation process,
Delaware has implemented a process referred to as the "refinement cycle" (or "wash cycle"), in which
25% of the assessments come up for review by DDOE, alongside educators, each year. When an
assessment comes up for review, Delaware educators take an in-depth look at it, which allows
items that have not performed well historically to be revised or removed. In the spirit of continuous
improvement, DDOE continues to monitor these assessments extensively and to identify ways in which
they can be improved, taking into account both the technical properties that are needed for a quality
assessment and input from Delaware educators regarding the alignment of the standards and content
covered on the assessment to the standards and content that are taught within Delaware classrooms.

References
American Educational Research Association, American Psychological Association & National Council on
Measurement in Education (2014). Standards for educational and psychological testing.
Washington, DC: American Educational Research Association.
Herman, J. L., Heritage, M., & Goldschmidt, P. (2011). Developing and selecting assessments of student
growth for use in teacher evaluation systems (extended version). Los Angeles, CA: University of
California, National Center for Research on Evaluation, Standards, and Student Testing (CRESST).
Kane, M. (2013). Validating the interpretations and uses of test scores. Journal of Educational
Measurement, 50(1), 1-73.
Messick, S. (1995). Validity of psychological assessment: Validation of inferences from persons'
responses and performances as scientific inquiry into score meaning. American Psychologist,
50(9), 741-749.

