
11

Best Approaches for 21st Century Evaluations

Of the program evaluation field's variety of approaches, we chose nine for comparative analysis and evaluation. The selected approaches are, in the questions and methods category, the (relative newcomer) Success Case Method (SCM), case study, experimental and quasi-experimental design, and objectives-based approaches; in the improvement and accountability category, the context, input, process, and product (CIPP) model and consumer-oriented evaluation; in the social agenda and advocacy category, the constructivist and responsive/client-centered approaches; and in the eclectic category, utilization-focused evaluation. These approaches are applicable to program evaluations, representative of the different categories of evaluation approaches, widely referenced in the professional literature, and likely to be used extensively (advisedly or not) beyond 2012. In contrasting and evaluating these approaches, we aimed to help evaluators and their clients critically appraise them before choosing among them.

We selected these particular nine approaches, rather than some of the other legitimate evaluation approaches referenced in preceding chapters, because we sought balance in evaluating representative approaches in each category and needed to keep the number of approaches assessed manageable. Clearly, some of the approaches not assessed in this chapter are worthy of consideration by evaluators and their clients. These include especially the cost study, value-added assessment, connoisseurship, accreditation, deliberative democratic, and participatory approaches. Our selection of the nine approaches should not be construed as an exclusion of the other legitimate approaches described in previous chapters. Indeed, eclectically, it can be beneficial to incorporate aspects of these approaches in applications of the selected nine.

The ratings of the selected nine approaches are shown in Figure 11.1. They are listed in order of judged overall merit within the categories of questions- and methods-oriented, improvement and accountability, social agenda and advocacy, and eclectic evaluation approaches.


Figure 11.1 Strongest Program Evaluation Approaches within Types in Order of Compliance with The Program Evaluation Standards


Note. Each author rated each evaluation approach on each of the 30 Joint Committee on Standards for Educational Evaluation (2011) The Program Evaluation Standards by judging whether the approach endorses each of six key features of the standard. They judged each approach's adequacy on each standard as follows: [6] Excellent, [5] Very Good, [4] Good, [2-3] Fair, [0-1] Poor. A score for the approach on each of the five categories of standards (Utility, Feasibility, Propriety, Accuracy, Evaluation Accountability) was then determined by summing the following products: 4 × number of Excellent ratings, 3 × number of Very Good ratings, 2 × number of Good ratings, and 1 × number of Fair ratings, then dividing the sum by the product of the number of standards in the category × 4. The final percentage scores were obtained by multiplying the resulting decimal score for each category by 100. The five equalized percentage scores were then summed and divided by 5 to give the overall score, which was judged by comparing it to the total maximum score, 100%. Each approach's overall merit and its merit for each category of standards were judged according to the percentage of possible quality points earned, as follows: [92% to 100%] Excellent, [67% to 91.99%] Very Good, [42% to 66.99%] Good, [17% to 41.99%] Fair, [0% to 16.99%] Poor.
The first listed author's ratings were based on his knowledge of the Joint Committee on Standards for Educational Evaluation (1981, 1994, 2011) The Program Evaluation Standards; his many years of studying the various evaluation models and approaches; his personal acquaintance and collaborative evaluation work with authors and leading proponents of most of the assessed legitimate approaches (i.e., Brinkerhoff, Campbell, Eisner, Guba, House, Linn, Madaus, Millman, Owens, Rogers, Sanders, Scriven, Stake, Tyler, and Worthen); and his experience in seeing and assessing how all of these approaches worked in practice. He chaired the Joint Committee on Standards for Educational Evaluation during its first 13 years and led the development of the first editions of both The Program Evaluation Standards (Joint Committee on Standards for Educational Evaluation, 1981) and The Personnel Evaluation Standards (Joint Committee on Standards for Educational Evaluation, 1988). Nevertheless, his ratings should be viewed as only his personal set of judgments of these models and approaches. Also, his conflict of interest is acknowledged, since he developed the CIPP approach.

The second listed author's ratings were based on his teaching of the full range of approaches within the context of a doctoral-level evaluation theory course, as well as doctoral-level experimental and quasi-experimental design and meta-analysis courses, as part of the Western Michigan University Interdisciplinary Ph.D. in Evaluation program; his national and international applications of a number of the approaches; and his collaborations with such authors as Cook, Cousins, Davidson, Scriven, Patton, Stufflebeam, and Brinkerhoff. Readers may wish to consider, as a possible source of conflict of interest, that Michael Scriven, author of the consumer-oriented approach, was the major professor for the second author's doctoral studies.

After each of this book's authors rated each of the nine selected evaluation approaches following the method described above, they systematically reviewed and discussed their different sets of ratings and reached consensus judgments wherever they found discrepancies in ratings of each approach, overall and for Utility, Feasibility, Propriety, Accuracy, and Evaluation Accountability. In this process, they also sought to reach a + or − determination for those checkpoints to which one or the other of them had assigned a ?. This book's two authors ultimately reached the consensus judgments presented above. The scale ranges in the above figure are P = Poor, F = Fair, G = Good, VG = Very Good, E = Excellent.
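The scoring arithmetic described in the note to Figure 11.1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' own tool: the function names are hypothetical, a Poor rating is assumed to contribute zero points (the note implies this but does not say so), and the standard-level ratings in the second example are invented for demonstration. The CIPP category percentages are the ones reported in this chapter.

```python
# Points per standard-level rating, following the note to Figure 11.1.
# Assumption: "Poor" earns 0 points (implied, not stated, in the note).
RATING_POINTS = {"Excellent": 4, "Very Good": 3, "Good": 2, "Fair": 1, "Poor": 0}

# Lower bounds (percent) of the judgment ranges given in the note.
BANDS = [(92.0, "Excellent"), (67.0, "Very Good"), (42.0, "Good"),
         (17.0, "Fair"), (0.0, "Poor")]


def category_percentage(ratings):
    """Percent of possible quality points earned in one category of standards."""
    earned = sum(RATING_POINTS[r] for r in ratings)
    possible = 4 * len(ratings)  # every standard could earn at most 4 points
    return 100.0 * earned / possible


def band(percentage):
    """Translate a percentage into the verbal judgment used in the chapter."""
    for lower_bound, label in BANDS:
        if percentage >= lower_bound:
            return label
    raise ValueError("percentage must not be negative")


def overall_percentage(category_percentages):
    """Overall score: the unweighted mean of the category percentages."""
    return sum(category_percentages) / len(category_percentages)


# CIPP category percentages as reported in this chapter
# (Utility, Feasibility, Propriety, Accuracy, Evaluation Accountability).
cipp = [100.00, 93.75, 85.71, 90.63, 58.33]
print(round(overall_percentage(cipp), 2), band(overall_percentage(cipp)))

# Hypothetical ratings for a three-standard category.
example = ["Excellent", "Very Good", "Fair"]
print(round(category_percentage(example), 2), band(category_percentage(example)))
```

Because each category is equalized to a percentage before averaging, a category with only three standards (Evaluation Accountability) carries the same weight in the overall score as one with eight (Utility or Accuracy).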


The ratings are based on the Joint Committee on Standards for Educational Evaluation (2011) The Program Evaluation Standards. We arrived at the ratings through use of Stufflebeam's (2011b) checklist keyed to The Program Evaluation Standards, which is available both from the book's website at http://www.josseybass.com/go/evalmodels and The Evaluation Center website at http://www.wmich.edu/evalctr/checklists. Stufflebeam's (2011b) checklist essentially reflects a systematic content analysis of the Joint Committee on Standards for Educational Evaluation (2011) The Program Evaluation Standards and consists of 180 discrete checkpoints (48 for the utility standards, 24 for the feasibility standards, 42 for the propriety standards, 48 for the accuracy standards, and 18 for the evaluation accountability standards). Ratings were completed for each of The Program Evaluation Standards (Joint Committee on Standards for Educational Evaluation, 2011) domains (i.e., utility, feasibility, propriety, accuracy, and evaluation accountability) and overall. The two authors of this book first independently rated each of the nine approaches against each of the 30 standards by deciding whether or not the approach met each of the six checkpoints. Each checkpoint was assigned a + if the approach fulfilled the requirement, a − if not, or a ? if it was unclear whether or not the approach embraced or addressed the particular requirement. In using the ratings to arrive at judgments of each approach, for each category and overall, only +s were scored. Subsequently we jointly reviewed our ratings, discussed and resolved discrepancies, attempted to reach determinations for checkpoints that had been assigned a ?, and finally produced Figure 11.1.
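The checkpoint-level step can be illustrated the same way. The sketch below assumes that a standard's rating follows directly from its count of + marks (6 Excellent, 5 Very Good, 4 Good, 2 or 3 Fair, 0 or 1 Poor, as in the note to Figure 11.1) and that − and ? marks both count as not met, consistent with the statement that only +s were scored. The function names and the sample marks are hypothetical; the checkpoint counts per category are those given above.

```python
CHECKPOINTS_PER_STANDARD = 6

# Checkpoint counts per category, as reported for the checklist.
CHECKPOINTS_BY_CATEGORY = {
    "Utility": 48,
    "Feasibility": 24,
    "Propriety": 42,
    "Accuracy": 48,
    "Evaluation Accountability": 18,
}


def standards_per_category():
    """Number of standards in each category, derived from the checkpoint counts."""
    return {name: count // CHECKPOINTS_PER_STANDARD
            for name, count in CHECKPOINTS_BY_CATEGORY.items()}


def rate_standard(marks):
    """Rate one standard from its six checkpoint marks ('+', '-', or '?').

    Only '+' marks are scored; '-' and '?' are both treated as not met.
    """
    if len(marks) != CHECKPOINTS_PER_STANDARD:
        raise ValueError("each standard has exactly six checkpoints")
    plus_count = sum(1 for mark in marks if mark == "+")
    if plus_count == 6:
        return "Excellent"
    if plus_count == 5:
        return "Very Good"
    if plus_count == 4:
        return "Good"
    if plus_count >= 2:
        return "Fair"
    return "Poor"


print(standards_per_category())
print(sum(standards_per_category().values()))   # total number of standards
print(rate_standard(["+", "+", "+", "+", "+", "?"]))  # hypothetical marks
```

Dividing each category's checkpoint count by six recovers the number of standards per category (8, 4, 7, 8, and 3), which sum to the 30 standards and 180 checkpoints cited in the text.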

In general, the ratings shown in Figure 11.1 are somewhat lower than the ratings recorded in a similar figure in the 2007 edition of this book (Stufflebeam & Shinkfield, 2007). The disparity is evident in Table 11.1, which contrasts the overall 2007 and 2011 ratings given to the eight approaches that were assessed in Stufflebeam and Shinkfield's 2007 edition of this book. (Note that the recently developed SCM was not rated in the 2007 edition.)

Table 11.1 Comparison of 2007 and 2011 Ratings of Eight Evaluation Approaches

Evaluation Approach                           2007 Ratings   2011 Ratings
Case study                                    81.00%         54.91%
Experimental and quasi-experimental design    56.00%         51.34%
Objectives-based                              62.00%         43.13%
CIPP                                          92.00%         85.68%
Consumer-oriented                             84.00%         65.86%
Constructivist                                81.00%         71.28%
Responsive/Client-centered                    84.00%         66.28%
Utilization-focused                           86.00%         71.64%
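Readers who want to check the consistency between the two sets of overall ratings in Table 11.1 can compute the correlation coefficients directly. The following is a minimal sketch using only the Python standard library; the helper names are hypothetical, tied ratings are given the average of their ranks (a common convention, assumed here), and the consumer-oriented 2011 value is entered as 65.86, the figure used in this chapter's discussion of that approach.

```python
from math import sqrt

# Overall ratings (percent): approach -> (2007 rating, 2011 rating).
RATINGS = {
    "Case study": (81.00, 54.91),
    "Experimental and quasi-experimental design": (56.00, 51.34),
    "Objectives-based": (62.00, 43.13),
    "CIPP": (92.00, 85.68),
    "Consumer-oriented": (84.00, 65.86),  # 2011 value as given in the chapter text
    "Constructivist": (81.00, 71.28),
    "Responsive/Client-centered": (84.00, 66.28),
    "Utilization-focused": (86.00, 71.64),
}


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """Rank from 1 (lowest); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


ratings_2007 = [pair[0] for pair in RATINGS.values()]
ratings_2011 = [pair[1] for pair in RATINGS.values()]
print(f"Pearson:  {pearson(ratings_2007, ratings_2011):.2f}")
print(f"Spearman: {spearman(ratings_2007, ratings_2011):.2f}")
```

Run on the eight pairs above, the sketch yields values that round to .85 and .87, in line with the Pearson and Spearman coefficients reported later in this chapter.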

The lower ratings of overall merit in this edition seem partially due to changes in the Joint Committee on Standards for Educational Evaluation (2011) The Program Evaluation Standards, especially the addition of a fifth category labeled Evaluation Accountability. Figure 11.1 reveals that all nine evaluation approaches received substantially lower ratings in this category than in the other four categories of standards. Another possible reason for the lower ratings in this edition is that the newly revised standards (Joint Committee on Standards for Educational Evaluation, 2011) may be more demanding than their 1994 predecessors (Joint Committee on Standards for Educational Evaluation).


Another useful contrast to the current and former (Stufflebeam & Shinkfield, 2007) ratings is Stufflebeam's (2001a) widely cited Evaluation Models monograph, in which a similar method was employed to evaluate the relative merits of various evaluation approaches against the 2nd edition of The Program Evaluation Standards (Joint Committee on Standards for Educational Evaluation, 1994).

It seems clear that some of the standards in the new The Program Evaluation Standards (Joint Committee on Standards for Educational Evaluation, 2011) are at odds with what proponents of certain evaluation approaches consider to be best evaluation practice. For example, certain requirements in the 2011 The Program Evaluation Standards (Joint Committee on Standards for Educational Evaluation) for detailed, preordinate planning of evaluation procedures are clearly contrary to the flexibility required by the social agenda and advocacy approaches; authors of many of these approaches recommend an emergent, developing, evolving process for evaluation planning and implementation. Also, the new The Program Evaluation Standards (Joint Committee on Standards for Educational Evaluation, 2011) place a pervasive, strong emphasis on a culturally sensitive and pluralistic approach to evaluation, including, it seems, empowerment of all stakeholders to exercise substantial control over evaluation planning, operations, and reporting. While the social agenda and advocacy evaluation approaches are congenial to stakeholder involvement and influence, other approaches, especially the experimental and quasi-experimental design, CIPP, and consumer-oriented evaluation approaches, see delegation of influence and control over evaluation matters to stakeholders as a threat to an evaluation's independence, rigor, and credibility. To some extent, certain negative judgments of evaluation approaches based on the 2011 The Program Evaluation Standards (Joint Committee on Standards for Educational Evaluation) may in some quarters be viewed as unwarranted or at least questionable, with the resulting interpretation that some aspects of the new The Program Evaluation Standards (Joint Committee on Standards for Educational Evaluation) themselves may be unacceptable to certain credible experts in the evaluation field. In any case, readers should exercise circumspection in viewing ratings based on the 2011 The Program Evaluation Standards (Joint Committee on Standards for Educational Evaluation) that are decidedly lower than the similar ratings that appeared in the 2007 edition of this book (Stufflebeam & Shinkfield).

Before proceeding, two other cautionary statements are in order. First, the ratings in Figure 11.1 are based not directly on the Joint Committee on Standards for Educational Evaluation (2011) The Program Evaluation Standards, but on Stufflebeam's (2011b) Program Evaluations Metaevaluation Checklist. The Joint Committee on Standards for Educational Evaluation has neither reviewed nor sanctioned this checklist as to its correct representation of the 3rd edition of The Program Evaluation Standards (Joint Committee on Standards for Educational Evaluation, 2011). Nevertheless, we stand behind Stufflebeam's (2011b) checklist as a carefully and systematically developed assessment tool whose contents were derived from his careful content analysis of the 3rd edition of The Program Evaluation Standards (Joint Committee on Standards for Educational Evaluation, 2011) and his experience over the years in developing previous editions of The Program Evaluation Standards and helping with the development of the 2011 edition. Second, the ratings in 2007 were determined by Stufflebeam and Shinkfield, while the 2011 ratings were arrived at by Stufflebeam and Coryn. It is noteworthy that the Pearson product-moment correlation coefficient between the 2007 and 2011 overall ratings in Table 11.1 was .85 and that, despite the changes in The Program Evaluation Standards (Joint Committee on Standards for Educational Evaluation, 2011) and the differing pairs of raters, a relatively high degree of consistency was observed between the rank order of approaches in the 2007 and 2011 ratings (Spearman's rank correlation coefficient was .87).

The preceding discussion of caveats and cross-year comparison of ratings aside, we turn now to a systematic look at the selected nine evaluation approaches against the checklist interpretation of the requirements of the 2011 The Program Evaluation Standards (Joint Committee on Standards for Educational Evaluation). We do so because, in the main, we see the Joint Committee on Standards for Educational Evaluation (2011) The Program Evaluation Standards as a carefully thought through, professionally developed, widely accepted, and nationally accredited set of requirements for program evaluations that are useful, feasible, ethical, accurate, and accountable. Also, we believe the employed metaevaluation checklist (Stufflebeam, 2011b) will stand up to professional scrutiny and, indeed, we invite independent assessments of the checklist.

The CIPP, utilization-focused, and constructivist evaluation approaches earned overall ratings of Very Good, while the other six approaches were judged Good overall. These results suggest that all nine approaches are at least minimally acceptable for evaluating programs but that all of them have room for improvement. The top-rated CIPP (85.68%), utilization-focused (71.64%), and constructivist (71.28%) evaluation approaches provide strong options for evaluators who prefer, respectively, an improvement- and accountability-oriented approach, an eclectic approach, or a social agenda and advocacy approach. The objectives-based (43.13%) and experimental and quasi-experimental design (51.34%) options are least favorable for meeting the full range of The Program Evaluation Standards (Joint Committee on Standards for Educational Evaluation, 2011), because they scored low in the Good range. Also, it is worth considering that the responsive/client-centered (66.28%) and consumer-oriented (65.86%) approaches scored high in the range of Good ratings.

If one is looking to maximize an evaluation's utility, the CIPP approach is the clear choice, with its Utility rating of 100%. Other very good choices in the Utility category are the constructivist (84.38%) and utilization-focused (81.25%) approaches. Experimental and quasi-experimental design (28.13%) is the lowest rated approach in this category and scored a rating of Fair. Also, the objectives-based (50.00%) and case study (40.63%) approaches scored relatively low in the Utility category and are not recommended when one wants to maximize an evaluation's utility and impact, including provision of systematic feedback during a program's development and operation. The consumer-oriented approach's rating (68.75%) in the lower end of the Very Good range reflects this approach's emphasis on insulating summative evaluations from stakeholder involvement and influence and also its stronger emphasis on providing end-of-program, summative feedback than on providing feedback throughout a program's process. The 81.25% rating for the utilization-focused evaluation approach against the Utility standards is, in part, a reflection of its selectiveness in serving only targeted stakeholders (i.e., a select group of intended users) and potentially not serving some other right-to-know audiences.

The CIPP evaluation approach was rated Excellent (93.75%) in addressing standards of Feasibility, with the other eight approaches receiving ratings of Very Good or Good. The utilization-focused (87.50%), SCM (81.25%), and constructivist (81.25%) approaches were all rated Very Good in addressing the Feasibility standards. The objectives-based approach (62.50%) received a Feasibility rating of Good. The experimental and quasi-experimental design (50.00%) approach was judged to be the least feasible of the nine approaches, owing especially to its rigid application of separate treatments and requirements for randomization of subjects.

In the Propriety category no approach received a rating of Excellent. The CIPP (85.71%), case study (71.43%), and constructivist (67.86%) approaches were judged Very Good in this category. The rating for the objectives-based approach (25.00%) was in the lower end of the range of Fair ratings, largely because this approach focuses almost exclusively on program goals, does not validate the goals through historical analysis and needs assessment, and is inadequate in addressing such propriety considerations as advance contracting, potential conflicts of interest, and engagement of and feedback to the full range of stakeholders throughout the evaluation process. The consumer-oriented (67.86%), responsive/client-centered (57.14%), utilization-focused (57.14%), experimental and quasi-experimental design (53.57%), and SCM (50.00%) approaches were judged Good. Again, the utilization-focused approach earned a comparatively low rating in Propriety because of its dedication to engaging and serving only a selected subset of a program's full range of stakeholders.

In the Accuracy category, ratings of Very Good were assigned to the CIPP (90.63%), constructivist (81.25%), experimental and quasi-experimental design (75.00%), case study (75.00%), responsive/client-centered (71.88%), and consumer-oriented (71.88%) approaches, with the other three approaches, utilization-focused (65.63%), SCM (62.50%), and objectives-based (53.13%), receiving ratings of Good. The comparatively lower rating for the objectives-based approach is largely due to its narrow focus on main effects related to the developer's goals and its scant attention to context and side effects.

In the (new) Evaluation Accountability category none of the nine approaches received a rating of Excellent or Very Good. Ratings of Good were received by the utilization-focused (66.67%), CIPP (58.33%), consumer-oriented (58.33%), experimental and quasi-experimental design (50.00%), constructivist (41.67%), and responsive/client-centered (41.67%) approaches. The SCM (33.33%), case study (25.00%), and objectives-based (25.00%) approaches were rated as Fair. The Joint Committee on Standards for Educational Evaluation added the Evaluation Accountability group of standards to the 2011 edition of the standards because, by and large, evaluators have performed poorly in documenting, assessing, and securing independent metaevaluations of their evaluations. The relatively low ratings of the nine selected approaches in the Evaluation Accountability category reported here seem to support the need for standards, strengthened approaches, and better evaluator performance pertaining to the three standards of Evaluation Accountability: Evaluation Documentation, Internal Metaevaluation, and External Metaevaluation. However, evaluators may justifiably disagree with the External Metaevaluation standard's requirement that the evaluator engage and assure the quality of an external metaevaluation; arguably it is more appropriate for the evaluator to recommend that the client do this and then cooperate with the external metaevaluators, not by overseeing and controlling their work (which would entail a possible conflict of interest for the evaluator) but by supplying them with needed information.

In rounding out this chapter, a few comments about salient features of each approach are in order. The SCM (57.92%) and case study (54.91%) approaches both scored overall in the middle of the range of Good ratings. Of the two, the SCM was shown to be more useful (62.50% versus 40.63%) than the straight case study approach. We think this is so because the former is directly oriented to discovering particular program strengths that may be hard to detect and whose detection might help a program staff to preserve and strengthen a program that otherwise could be headed for termination. In contrast, the case study approach seeks to give a rich account of a program and not so much to issue judgments related to saving or strengthening a program. These two approaches' ratings in the Feasibility, Accuracy, and Evaluation Accountability sections of The Program Evaluation Standards (Joint Committee on Standards for Educational Evaluation, 2011) are fairly comparable. Regarding Propriety, the case study approach gets the clear edge (71.43% versus 50.00%), partially because the SCM approach starts out with a built-in bias to find and report successes and is more geared to serving a program director and staff than the full range of program stakeholders.

The experimental and quasi-experimental design (51.34%) and objectives-based (43.13%) approaches overall rated at the lower end of the range of Good ratings. We believe both approaches have limited applicability in the broad range of program evaluation assignments. The comparatively lower overall rating given to the experimental and quasi-experimental design approach resulted especially from its rating of Fair for Utility (28.13%). For many evaluation assignments, this approach would not provide program staff members with continuing feedback for program improvement; in program evaluation applications in the field, the experimental and quasi-experimental design approach often proves to be impractical, vulnerable to political problems, and not cost-effective. Its lower rating (75.00%) in the Accuracy category than one might have expected (although in the Very Good range) is due more to its narrow focus on a few dependent variables and lack of information on context and process than to the quality of the obtained focal outcomes. The overall rating of Good for the objectives-based (43.13%) approach is somewhat misleading, since it scored at the very bottom of the range of Good ratings. This poor showing reflects the approach's narrow focus on objectives, provision of only terminal information, and lack of attention to unanticipated outcomes. For most program evaluation assignments, evaluators are advised to seek a better approach than either the experimental and quasi-experimental design or objectives-based evaluation approaches. SCM and case study are our methods of choice in the questions- and methods-oriented category of evaluation approaches, although they are, nonetheless, quite weak choices relative to approaches in the other categories.

The improvement- and accountability-oriented approaches rated slightly better overall than the questions- and methods-oriented, social agenda and advocacy, and eclectic approaches. The CIPP approach scored well overall (85.68%): Excellent for Utility (100%), Excellent for Feasibility (93.75%), Very Good for Propriety (85.71%) and Accuracy (90.63%), and in the middle of the Good range for Evaluation Accountability (58.33%). This approach offers comprehensiveness in assessing all stages of program development and all aspects of a program, serving the full range of stakeholders, employing multiple quantitative and qualitative methods, providing for formative and summative uses of findings, being oriented to both program improvement and accountability, addressing all 30 of The Program Evaluation Standards (Joint Committee on Standards for Educational Evaluation, 2011), requiring at least internal metaevaluation, and, especially, being grounded in advance agreements keyed to stakeholder needs and professional standards of evaluation. The approach's relatively lower rating (58.33%) in the Evaluation Accountability category reflects the approach's disagreement with the External Metaevaluation standard, which thrusts the evaluator into a conflict of interest situation by requiring the evaluator rather than the client to select, engage, and oversee an external metaevaluator's work.

The consumer-oriented approach scored near the upper end of Good overall (65.86%) and from Good to Very Good on the Utility (68.75%), Feasibility (62.50%), Propriety (67.86%), and Accuracy (71.88%) standards. The consumer-oriented approach is a strong option for clients and evaluators who desire to emphasize independent, unimpeachable assessment of developed products and services. Although the approach is not strongly suited to internal evaluations for improvement, it complements such approaches with an outsider, expert view that becomes important when products and services are put up for dissemination. This approach depends on a highly skilled evaluator who strongly guards independence and separation from program personnel. Paradoxically, the approach depends on program personnel for much of the information needed for the evaluation, which tends to be its Achilles' heel. As a cautionary note, a high degree of evaluator independence from program personnel can discourage the extensive amount of stakeholder support that the consumer-oriented evaluator often needs. This psychological distance also can discourage program personnel from using external evaluation findings, but it can be reassuring to external audiences, especially those who pay for the program or use its products and services. This approach's relatively high rating on Utility (68.75%) is due not to strong impact on the actions of program personnel during program implementation but to the high degree of credibility that consumers external to a program place on independent evaluations.

The two social agenda and advocacy approaches generally scored well, definitely ahead of the questions- and methods-oriented approaches. The constructivist approach (71.28% overall) is a well-founded, mainly qualitative approach to evaluation that systematically engages interested parties to help conduct both the divergent and convergent stages of evaluation. It strongly advocates for the least powerful and poorest members of program stakeholders. It tends to be utopian and thus unrealistic, which is acknowledged by its creators.

Also, its provision for ongoing negotiation with a wide range of stakeholders, through hermeneutic and consensus-building processes, engenders difficulty in reaching closure under a framework of multiple values and multiple realities. Nevertheless, this approach earned quite acceptable ratings in Utility (84.38%), Feasibility (81.25%), Propriety (67.86%), and Accuracy (81.25%). Its rating of 41.67% for Evaluation Accountability was low but similar to those of the other rated approaches; it reflects the evaluation field's historic poor performance in documenting and evaluating evaluations and, we think, the External Metaevaluation standard's unrealistic expectation that evaluators be responsible for securing and managing external evaluations of their own work.

The responsive/client-centered approach received quite acceptable ratings overall (66.28%) and for Utility (81.25%), Feasibility (68.75%), Propriety (57.14%), and Accuracy (71.88%). Like the constructivist evaluation approach, this approach received a low rating (41.67%) in the area of Evaluation Accountability. (Our above commentary about the dubious requirement that an evaluator secure and control external evaluation of his own work is applicable here.) In contrast to the independence of consumer-oriented evaluation, the responsive approach engenders close collaboration between evaluator and program personnel and other stakeholders. This results in easier access to needed information and in stakeholders' better acceptance, support, and use of an evaluation. This approach has the advantage of systematically informing and assisting ongoing development and operations. It is also strong in searching for unintended consequences. Its comparatively lower rating in the Propriety category reflects its lack of provision for advance formal contracting for evaluation, lack of focus on meeting published, professional standards for sound evaluations, and quite weak approach to identifying and addressing conflicts of interest. Aside from these points, the approach is credible in the area of propriety, especially with its strong orientation to evenhandedly engaging and serving the full range of stakeholders.

Finally, the utilization-focused evaluation approach is a ubiquitous, umbrella approach to evaluation. It received an overall rating of Very Good (71.64%), ratings of Very Good in Utility (81.25%) and Feasibility (87.50%), and ratings of Good in the Evaluation Accountability (66.67%), Accuracy (65.63%), and Propriety (57.14%) categories. Its main objective is to get evaluation findings used, and it accordingly rates high on Utility. As noted earlier, the approach's lower rating on Utility than one might expect is due to its mission of serving exclusively a predefined set of stakeholders, leaving the possibility of some right-to-know audiences not being served. This approach also rates high on Feasibility, since stakeholders are invited to help assure that the study will fit well in their program's environment and to help choose elements with which they are comfortable. The approach emphasizes efficiencies, including using existing information and incorporating insider knowledge into the evaluation process. Also, it enhances ease of use by trading off the requirement to serve all stakeholders: by specifically targeting only intended evaluation uses and users, it does not have to deal with a broader range of stakeholders and can keep the study's focus where the users want it. The relatively lower but Good rating in Propriety (57.14%) is largely due to the approach's limiting of service possibly to only a subset of right-to-know stakeholders. The Good rating in Accuracy (65.63%) is largely due to the approach's sanction for not necessarily producing a printed report. Accordingly, a utilization-focused evaluation might not produce a written report that documents and justifies an evaluation's technical merit, adherence to professional standards, and chain of reasoning that led to an evaluation's conclusions. Nevertheless, the approach is viewed as strong in respect to exercising rigor in collecting and analyzing both qualitative and quantitative information and in getting evaluation findings used.

Summary
All in all, the nine approaches summarized in Figure 11.1 bode well for the future application and further development of alternative approaches to program evaluation. The last half of the 20th century saw considerable development of program evaluation approaches. Based on the ratings presented in this chapter, evaluators and their clients can choose from an array of strong, creditable evaluation approaches. Nevertheless, it is essential to stress that even the strongest of the nine approaches have considerable room for improvement. We think Figure 11.1 can be profitably used to identify and target areas in each approach for needed improvements. Clearly, all of the approaches should be strengthened in the vital area of Evaluation Accountability. However, we see a serious problem regarding the credibility of the Joint Committee on Standards for Educational Evaluation (2011) External
Metaevaluation standard. As mentioned repeatedly in this chapter, it seems that this standard imposes an unsound requirement on evaluations; to wit, evaluators are charged to select, engage, oversee, and control the quality of external metaevaluations of their own work. Clearly, such evaluator-controlled
metaevaluation practice entails conflict of interest and credibility problems for an evaluator. If an evaluator controls external evaluations of their evaluations, who would not perceive that the resulting external assessments are biased in order to put the evaluation in a favorable light? To be sure, evaluators should do all they can to foster credible, independent assessments of their evaluations. However, in contrast to the advice contained in the Joint Committee on Standards for Educational Evaluation (2011) External Metaevaluation standard, it seems much more professionally sound for an evaluator to encourage an evaluation's client or funder to select, fund, engage, oversee, and control the quality of an external metaevaluator's work. We hope the Joint Committee on Standards for Educational Evaluation will seriously consider the constructive spirit in which this commentary is offered when they next revise The Program Evaluation Standards.

In this book, 30 evaluation approaches were grouped as pseudoevaluations, questions- and methods-oriented evaluations, improvement- and accountability-oriented evaluations, social agenda and advocacy evaluations, and eclectic evaluations. Apart from pseudoevaluations, there is among the approaches an increasingly balanced quest for rigor, relevance, and justice. Clearly the approaches are showing a strong orientation to stakeholder involvement, the use of multiple methods, and impact of evaluation findings. When compared with professional standards for program evaluations, the best approaches are the CIPP, consumer-oriented, constructivist, utilization-focused, and responsive/client-centered approaches. All of these approaches are recommended for consideration in program evaluations. Typically, we believe
that better alternatives can be found to objectives-based and experimental and quasi-experimental design approaches.

A critical analysis of evaluation approaches has important implications for evaluators, those who train evaluators, theoreticians concerned with devising better concepts and methods, and those engaged in professionalizing program evaluation. Adherence by these groups to well-constituted and widely accepted principles and standards for evaluations is relevant and important.

A major consideration for the practitioner is that evaluators may encounter considerable difficulties if their perceptions of the study being undertaken differ from those of their clients and audiences. Frequently clients want a politically advantageous study performed, while the evaluator wants to conduct questions- and methods-oriented studies that allow him or her to exploit the methodologies in which he or she was trained. Moreover, audiences usually want values-oriented studies that will help them determine the relative merits and worth of competing programs or advocacy evaluations that will give them voice and control in the issues that affect them. If evaluators ignore the likely conflicts in purposes, their program evaluations are probably doomed to fail. At an evaluation's outset, evaluators must be keenly sensitive to their own agendas for a study, as well as those that are held by clients and the other right-to-know audiences. Evaluators should advise involved parties of possible conflicts in an evaluation's purposes and should, at the beginning, negotiate a common understanding of an evaluation's purpose and the appropriate approach. Evaluators also should inform participants regularly of the selected approach's logic, rationale, process, and pitfalls. This will enhance
stakeholders' cooperation and careful, constructive, appropriate use of findings. Using negotiation skills, an evaluator should accommodate stakeholder interests and concerns while maintaining the integrity of an evaluation. At the outset of an evaluation, a crucial step in this balancing activity is for the evaluator and client group to reach agreement on the principles and standards that will guide and govern the evaluation. We strongly urge evaluators to advise their clients to adopt the American Evaluation Association Guiding Principles for Evaluators (2004) and the Joint Committee on Standards for Educational Evaluation's The Program Evaluation Standards (2011) as the criteria for guiding and judging an evaluation. Moreover, at an evaluation's outset we advise evaluators and clients to reach clear, printed agreements that stipulate an evaluation's purpose and timeline and the full range of conditions required to fund, staff, conduct, report, and assess an evaluation in order to fulfill stipulated professional standards for sound evaluations. In short, evaluations should be grounded in sound, negotiated contracts.

Evaluation training programs should effectively address the ferment over and development of new program evaluation approaches. Trainers should provide their students with both instruction and field experiences in applying and assessing these approaches. When students fully understand the approaches; see how they are assessed against professional standards and guiding principles; and gain relevant, practical experience in using and assessing the approaches, they will be in a position to discern which approaches work best under which sets of circumstances.

For the theoretician, a main point is that all the approaches have inherent
strengths and weaknesses. In general, the weaknesses of pseudoevaluation studies are that they are vulnerable to conflicts of interest and may mislead an audience to develop an unfounded, perhaps erroneous judgment of a program's merit and worth. The main problem with the questions- and methods-oriented studies is that they often address questions that are too narrow to support a full assessment of merit and worth and may even address only questions of interest to a program's developer. However, it is also noteworthy that these types of studies compete favorably with improvement- and accountability-oriented evaluation studies, social agenda and advocacy studies, and eclectic studies in the efficiency and rigor of methodology employed. Improvement- and accountability-oriented studies, with their concentration on merit and worth, undertake an ambitious task, for it is virtually impossible to fully and unequivocally assess any program's ultimate worth. Such an achievement would require omniscience, infallibility, an unchanging
environment, and an unquestioned, singular value base. Nevertheless, the continuing attempt to address questions of merit and worth is essential for the advancement of societal programs. The social agenda and advocacy studies are to be applauded for their quest for equity and responsiveness in programs being studied. They model their mission by attempting to make evaluation a participatory, democratic enterprise. Unfortunately, many pitfalls attend such utopian approaches. These approaches are especially susceptible to bias, and they face practical constraints in involving, informing, and empowering targeted stakeholders, in getting evaluations done on
time and within budget, and in reaching bottom-line conclusions. Finally, the eclectic approaches are attractive because of their focus on getting findings used and their resourcefulness in engaging selected stakeholders to pragmatically select and apply conceptual frameworks, criteria, and procedures from all other relevant approaches. Yet any approach that is overly pragmatic and gives away too much authority to program stakeholders can fail to meet standards of ethics and technical rigor. Also, given its exclusive focus on intended uses and users, the utilization-focused evaluation approach can result in evaluations that provide poor service to some members of the full range of right-to-know stakeholders.

For the evaluation profession itself, the review of program evaluation approaches underscores the importance of guiding principles, professional standards, and metaevaluations. Guiding principles and standards are needed to maintain a consistently high level of integrity in uses of the various approaches. All legitimate approaches are enhanced when evaluators key their studies to accredited principles and standards for evaluation and obtain independent reviews of their evaluations. Moreover, continuing attention to the requirements of principles and standards will provide valuable direction for developing better approaches. The Joint Committee on Standards for Educational Evaluation is to be applauded for extending The Program Evaluation Standards (2011) to include new standards for evaluation documentation, internal metaevaluation, and external metaevaluation. It is now up to evaluation theoreticians, practitioners, and trainers and professional evaluation organizations to move aggressively to implement and, where needed, seek improvement of the new
evaluation accountability standards.

With this consumer report and analysis of selected evaluation approaches, we conclude Part Two of this book. In Part Three, we describe a program evaluation situation requiring selection and application of an evaluation approach. We then provide in-depth information on six of the approaches just reviewed and consider how each could be applied to address the illustrative evaluation assignment.

Review Questions
The first six review exercises are, in part, preparation for the first group discussion question that follows. Having perused Chapters 7 through 10, write a brief paragraph identifying the main intentions of each of the evaluation approaches listed in questions 1 to 6:

1. Success Case Method
2. CIPP model
3. Consumer-oriented
4. Responsive and client-centered
5. Constructivist
6. Utilization-focused
7. Why have the authors rated the objectives-based approach comparatively low on compliance with The Program Evaluation Standards with respect to the Utility and Propriety standards?
8. Are you able to support the Very Good rating of the experimental and quasi-experimental design approach in respect to compliance with the Accuracy standards but Fair rating in the Utility category? Why or why not?
9. In reference to the evaluation approaches discussed in this chapter, "apart from pseudoevaluations, there is among the approaches an increasingly balanced quest for rigor, relevance, and justice." Justify or refute this statement.
10. What are the inherent weaknesses of social agenda and advocacy evaluations, and also of improvement- and accountability-oriented studies?

Group Exercises
Exercise 1. This exercise will have greater benefit for the group if members prepare their responses in advance of the meeting. If the exercise is undertaken thoroughly, it should serve two main purposes: first, a further study of the six methods and approaches listed in Review Questions 1 through 6 and in-depth study of one of these, and second, a review of Chapter 3 (Standards for Program Evaluation) and a practical application of the standards.

This exercise focuses on the following approaches to program evaluation: SCM, CIPP, consumer-oriented, responsive/client-centered, constructivist, and utilization-focused. Each group member is allocated one of these six. If there are more than six group members, a particular approach will be allocated more than once. Refer to Figure 11.1 and study the procedures underlying the ratings used in it (at first glance, they may appear complicated, but in reality the method is quite straightforward). Using the same methodology, give a rating for your allocated approach on each of the Utility, Feasibility, Propriety, Accuracy, and Evaluation Accountability standards and an overall rating. You will need to download and copy the Program Evaluation Metaevaluation Checklist from the book's website at http://www.josseybass.com/go/evalmodels. Clearly, subjectivity and degree of experience in program development and evaluation will play a part in your ratings. However, knowledge of the approach and the exact nature (by definition and example) of each of the 30 standards will provide very useful parameters for your decision making.

As a group, discuss:
- The proximity of each member's ratings to the authors'
- Possible reasons for any wide divergences
- The benefits of knowing and using The Program Evaluation Standards and the associated Program Evaluation Metaevaluation Checklist

Exercise 2. The comment is made in this chapter that "if evaluators ignore the likely conflicts in purposes, their program evaluations are probably doomed to fail." Discuss the ramifications of this statement.

Suggested Supplementary Readings


American Evaluation Association. (2004). Guiding principles for evaluators. Available at http://www.eval.org/Publications/GuidingPrinciples.asp.

Joint Committee on Standards for Educational Evaluation. (2011). The program evaluation standards: A guide for evaluators and evaluation users (3rd ed.). Thousand Oaks, CA: Sage.

Stufflebeam, D. L. (2001a). Evaluation models. New Directions for Evaluation, 89. San Francisco, CA: Jossey-Bass.

Stufflebeam, D. L., & Shinkfield, A. J. (2007). Evaluation theory, models, & applications. San Francisco, CA: Jossey-Bass.

Stufflebeam, D. L. (2011b). Program evaluations metaevaluation checklist.