Concepts of testing, measurement, assessment and evaluation
Testing
An instrument or systematic procedure for measuring a sample of behaviour by posing a set of questions in a uniform manner. Because a test is a form of assessment, tests also answer the question, "How well does the individual perform, either in comparison with others or in comparison with a domain of performance tasks?"

Measurement
The process of obtaining a numerical description of the degree to which an individual possesses a particular characteristic. Measurement answers the question, "How much?" Example: When the teacher calculates the percentage of problems that the student Geetha has answered correctly and gives her a score of 70/100, that is measurement. A test is used to gather information; that information is presented in the form of a measurement; and that measurement is then used to make an evaluation.

Assessment
Any of a variety of procedures used to obtain information about student performance. It includes traditional paper-and-pencil tests as well as extended responses (e.g. essays) and performances of authentic tasks (e.g. laboratory experiments). Assessment answers the question, "How well does the individual perform?" Note that the term assessment is often used to mean the same as evaluation.

Concept of Evaluation
Evaluation has a wider meaning: it goes beyond measurement. When we make a judgement from useful information, including measurement, that is evaluation. Example: The teacher may evaluate that the student Geetha is doing well in mathematics, because most of the class scored 50/100. This is an example of evaluation using quantitative data (measurable information). The teacher might also make an evaluation based on qualitative data, such as her observations that Geetha works hard, has an enthusiastic attitude towards mathematics and finishes her assignments quickly. Evaluation is a science of providing information for decision making. It includes measurement, assessment and testing, and it is a process that involves information gathering, information processing, judgement forming and decision making. From the above, we can arrive at the following concept of evaluation: evaluation has emerged as a prominent process built on assessing, testing and measuring, and its main objective is qualitative improvement. Evaluation is a process of making value judgements about a level of performance or achievement; making such value judgements presupposes a set of objectives. Evaluation thus implies a critical assessment of the educative process and its outcomes in the light of those objectives.
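The Geetha example can be made concrete in a few lines of code. The sketch below is my own illustration (the helper names and the class scores are invented for the example); it separates the measurement step, which only produces a number, from the evaluation step, which turns that number into a value judgement by comparing it with the rest of the class:

```python
# A minimal sketch of the test -> measurement -> evaluation chain.
# The class scores below are invented for illustration.

def measure(correct: int, total: int) -> float:
    """Measurement: a numerical description -- answers 'How much?'."""
    return 100.0 * correct / total

def evaluate(score: float, class_scores: list[float]) -> str:
    """Evaluation: a value judgement made from the measurement,
    here by comparing the score against the class average."""
    class_average = sum(class_scores) / len(class_scores)
    return "doing well" if score > class_average else "needs support"

geetha_score = measure(correct=70, total=100)         # 70.0, i.e. 70/100
judgement = evaluate(geetha_score, [50, 55, 45, 60])  # class mostly scored ~50
print(f"Score: {geetha_score:.0f}/100 -> judgement: {judgement}")
```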
Purposes of Evaluation
Evaluation is the process of determining the extent to which objectives are achieved. It is concerned not only with the appraisal of achievement but also with its improvement. Evaluation is a continuous and dynamic process, and it helps in forming the following decisions.
Types of Decisions
Instructional
Curricular
Selection
Placement or Classification
Personal
Among the above decisions, we shall learn how evaluation assists a teacher in taking instructional decisions. Evaluation assists in taking instructional decisions such as:
1. To what extent are students ready for the learning experience?
2. To what extent can they cope with the pace of the learning experiences provided?
3. How can the individual differences within the group be tackled?
4. What are the learning problems of the students?
5. What is the intensity of such problems?
6. What modifications are needed in the instruction to suit the needs of students?
Evaluation is an integral part of the teaching and learning process.
What should teachers learn about evaluation?
1. Choosing evaluation methods appropriate for instructional decisions;
2. developing methods appropriate for instructional decisions;
3. administering, scoring and interpreting the results of both externally produced and teacher-produced evaluation methods;
4. using evaluation results when making decisions about individual students, planning teaching, developing curriculum and school improvement;
5. developing valid pupil grading procedures that use pupil assessments;
6. communicating evaluation results to students, parents, other lay audiences and other educators;
7. recognizing unethical, illegal and otherwise inappropriate evaluation methods and uses of evaluation information.
Definition and Types of Evaluation
Evaluation consists of the objective assessment of a project, programme or policy at all of its stages, i.e. planning, implementation and measurement of outcomes. It should provide reliable and useful information, allowing the knowledge thus obtained to be applied in the decision-making process. It often concerns the determination of the value or importance of a measure, policy or programme. According to Regulation 1083/2006, the aim of evaluation is to improve the quality, effectiveness and consistency of the assistance from the Funds and the strategy and implementation of operational programmes with respect to the specific structural problems affecting the Member States and regions concerned, while taking account of the objective of sustainable development and the relevant Community legislation concerning environmental impact and strategic environmental assessment. Evaluation, as a process of systematic assessment of interventions financed from structural funds, is continuously gaining importance. In the programming period 2007-2013 the results of evaluation studies will play an important role in shaping the cohesion policy of the European Union, and during the debate on the budget following the current financial perspective after 2013 they will be among the key arguments either for preserving the policy in its existing shape or for revising its assumptions.

Categories of Evaluation
According to the criterion of purpose, evaluation is classified into the following categories:
Strategic evaluation (with the purpose of assessing and analysing the evolution of the NSRF and OP with respect to national and Community priorities);
Operational evaluation (with the purpose of supporting the process of NSRF and OP monitoring).
Strategic evaluation concerns mainly the analysis and assessment of interventions at the level of strategic goals. Its object is the analysis and appraisal of the relevance of the general directions of interventions determined at the programming stage. One significant aspect of strategic evaluation is the verification of the adopted strategy against the current and anticipated social and economic situation. Operational evaluation is closely linked to the process of NSRF and OP management and monitoring. Its purpose is to support the institutions responsible for the implementation of the NSRF and OP in achieving the assumed operational objectives, by providing practically useful conclusions and recommendations. According to Regulation 1083/2006, operational evaluation should be carried out, in particular, when monitoring has revealed significant deviations from the originally assumed objectives, and when requests are submitted for the review of an operational programme or a part of it.
From the point of view of timing, evaluation is classified into the following types:
ex ante evaluation (prior to the launch of NSRF or OP implementation),
ongoing evaluation (in the course of NSRF or OP implementation),
ex post evaluation (after completion of NSRF or OP implementation).
The process of ex ante evaluation of the NSRF and OP was completed in 2006.
The results of the ex ante evaluations of the NSRF and OP performed by external evaluators were taken into account in the final version of the National Strategic Reference Framework for 2007-2013 and of the individual Operational Programmes.
Ex post evaluation is carried out by the European Commission in cooperation with the member states and the Management Bodies. Independently of the evaluation conducted by the European Commission, member states may perform ex post evaluations on their own account. Ongoing evaluation is a process whose purpose is to arrive at a better understanding of the current outcomes of an intervention and to formulate recommendations useful from the point of view of programme implementation. In the next few years ongoing evaluation will become key to the effective implementation of the cohesion policy in Poland.
Evaluation Approaches & Types
There are various types of evaluations but two main philosophical approaches: formative and summative. After a brief introduction to these two approaches, we shall share several specific types of evaluations that fall under the formative and summative approaches.

Formative evaluation is an ongoing process that allows feedback to be implemented during a program cycle. Formative evaluations (Boulmetis & Dutwin, 2005):
concentrate on examining and changing processes as they occur;
provide timely feedback about program services;
allow you to make program adjustments on the fly to help achieve program goals.

COMMON TYPES OF FORMATIVE EVALUATION
Needs assessment determines who needs the program, how great the need is, and what might work to meet the need.
Structured conceptualization helps stakeholders define the program, the target population, and the possible outcomes.
Implementation evaluation monitors the fidelity of the program delivery.
Process evaluation investigates the process of delivering the program, including alternative delivery procedures.

Summative evaluation occurs at the end of a program cycle and provides an overall description of program effectiveness. Summative evaluation examines program outcomes to determine overall program effectiveness. It is a method for answering questions such as:
Were your program objectives met?
Will you need to improve and modify the overall structure of the program?
What is the overall impact of the program?
What resources will you need to address the program's weaknesses?
Summative evaluation will enable you to make decisions regarding specific services and the future direction of the program that cannot be made during the middle of a program cycle. Summative evaluations should be provided to funders and constituents with an interest in the program.

COMMON TYPES OF SUMMATIVE EVALUATION
Goal-based evaluation determines whether the intended goals of a program were achieved. Has my program accomplished its goals?
Outcome evaluation investigates whether the program caused demonstrable effects on specifically defined target outcomes. What effect does program participation have on students?
Impact evaluation is broader and assesses the overall or net effects, intended or unintended, of the program. What impact does this program have on the larger organization (e.g., high school or college), community, or system?
Cost-effectiveness and cost-benefit analysis address questions of efficiency by standardizing outcomes in terms of their dollar costs and values. How efficient is my program with respect to cost?
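As a concrete illustration of the last pair, here is a minimal sketch (all figures are invented for the example) of how outcomes are standardized against dollar costs in cost-effectiveness and cost-benefit analysis:

```python
# A hedged sketch of the cost-effectiveness and cost-benefit ideas above;
# every number here is invented for illustration only.

program_cost = 50_000.0              # total dollar cost of the program
students_served = 200                # outcome standardized as units served
dollar_value_of_benefits = 80_000.0  # benefits expressed in dollars

# Cost-effectiveness: dollars spent per unit of outcome achieved.
cost_per_student = program_cost / students_served             # 250.0

# Cost-benefit: ratio of dollar-valued benefits to costs; > 1 favours the program.
benefit_cost_ratio = dollar_value_of_benefits / program_cost  # 1.6

print(f"Cost per student served: ${cost_per_student:,.2f}")
print(f"Benefit-cost ratio: {benefit_cost_ratio:.2f}")
```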
Introduction to Evaluation
Evaluation is a methodological area that is closely related to, but distinguishable from, more traditional social research. Evaluation utilizes many of the same methodologies used in traditional social research, but because evaluation takes place within a political and organizational context, it requires group skills, management ability, political dexterity, sensitivity to multiple stakeholders and other skills that social research in general does not rely on as much. Here we introduce the idea of evaluation and some of the major terms and issues in the field.

Definitions of Evaluation
Probably the most frequently given definition is: evaluation is the systematic assessment of the worth or merit of some object. This definition is hardly perfect. There are many types of evaluations that do not necessarily result in an assessment of worth or merit -- descriptive studies, implementation analyses, and formative evaluations, to name a few. Better perhaps is a definition that emphasizes the information-processing and feedback functions of evaluation. For instance, one might say: evaluation is the systematic acquisition and assessment of information to provide useful feedback about some object. Both definitions agree that evaluation is a systematic endeavor, and both use the deliberately ambiguous term 'object', which could refer to a program, policy, technology, person, need, activity, and so on. The latter definition emphasizes acquiring and assessing information rather than assessing worth or merit, because all evaluation work involves collecting and sifting through data and making judgements about the validity of the information and of the inferences we derive from it, whether or not an assessment of worth or merit results.

The Goals of Evaluation
The generic goal of most evaluations is to provide "useful feedback" to a variety of audiences including sponsors, donors, client groups, administrators, staff, and other relevant constituencies. Most often, feedback is perceived as "useful" if it aids in decision-making. But the relationship between an evaluation and its impact is not a simple one -- studies that seem critical sometimes fail to influence short-term decisions, and studies that initially seem to have no influence can have a delayed impact when more congenial conditions arise. Despite this, there is broad consensus that the major goal of evaluation should be to influence decision-making or policy formulation through the provision of empirically-driven feedback.

Evaluation Strategies
'Evaluation strategies' means broad, overarching perspectives on evaluation. They encompass the most general groups or "camps" of evaluators, although, at its best, evaluation work borrows eclectically from the perspectives of all these camps. Four major groups of evaluation strategies are discussed here. Scientific-experimental models are probably the most historically dominant evaluation strategies. Taking their values and methods from the sciences -- especially the social sciences -- they emphasize the desirability of impartiality, accuracy, objectivity and the validity of the information generated. Included under scientific-experimental models would be: the tradition of experimental and quasi-experimental designs; objectives-based research that comes from education; econometrically-oriented perspectives including cost-effectiveness and cost-benefit analysis; and the recent articulation of theory-driven evaluation. The second class of strategies is management-oriented systems models.
Two of the most common of these are PERT, the Program Evaluation and Review Technique, and CPM, the Critical Path Method. Both have been widely used in business and government in this country. It would also be legitimate to include the Logical Framework or "Logframe" model developed at the U.S. Agency for International Development, as well as general systems theory and operations research approaches, in this category. Two management-oriented systems models were originated by evaluators: the UTOS model, where U stands for Units, T for Treatments, O for Observing operations and S for Settings; and the CIPP model, where the C stands for Context, the I for Input, the first P for Process and the second P for Product. These management-oriented systems models emphasize comprehensiveness in evaluation, placing evaluation within a larger framework of organizational activities.

The third class of strategies is the qualitative/anthropological models. They emphasize the importance of observation, the need to retain the phenomenological quality of the evaluation context, and the value of subjective human interpretation in the evaluation process. Included in this category are the approaches known in evaluation as naturalistic or 'Fourth Generation' evaluation; the various qualitative schools; critical theory and art criticism approaches; and the 'grounded theory' approach of Glaser and Strauss, among others. Finally, a fourth class of strategies is termed participant-oriented models. As the term suggests, they emphasize the central importance of the evaluation participants, especially clients and users of the program or technology. Client-centered and stakeholder approaches are examples of participant-oriented models, as are consumer-oriented evaluation systems.

With all of these strategies to choose from, how does one decide? Debates that rage within the evaluation profession -- and they do rage -- are generally battles between these different strategists, with each claiming the superiority of their position. In reality, most good evaluators are familiar with all four categories and borrow from each as the need arises. There is no inherent incompatibility between these broad strategies -- each of them brings something valuable to the evaluation table. In fact, in recent years attention has increasingly turned to how one might integrate results from evaluations that use different strategies, carried out from different perspectives, and using different methods. Clearly, there are no simple answers here. The problems are complex and the methodologies needed will and should be varied.

Types of Evaluation
There are many different types of evaluations, depending on the object being evaluated and the purpose of the evaluation. Perhaps the most important basic distinction in evaluation types is that between formative and summative evaluation. Formative evaluations strengthen or improve the object being evaluated -- they help form it by examining the delivery of the program or technology, the quality of its implementation, and the assessment of the organizational context, personnel, procedures, inputs, and so on. Summative evaluations, in contrast, examine the effects or outcomes of some object -- they summarize it by describing what happens subsequent to delivery of the program or technology; assessing whether the object can be said to have caused the outcome; determining the overall impact of the causal factor beyond only the immediate target outcomes; and estimating the relative costs associated with the object.
Formative evaluation includes several evaluation types:
needs assessment determines who needs the program, how great the need is, and what might work to meet the need;
evaluability assessment determines whether an evaluation is feasible and how stakeholders can help shape its usefulness;
structured conceptualization helps stakeholders define the program or technology, the target population, and the possible outcomes;
implementation evaluation monitors the fidelity of the program or technology delivery;
process evaluation investigates the process of delivering the program or technology, including alternative delivery procedures.

Summative evaluation can also be subdivided:
outcome evaluations investigate whether the program or technology caused demonstrable effects on specifically defined target outcomes;
impact evaluation is broader and assesses the overall or net effects -- intended or unintended -- of the program or technology as a whole;
cost-effectiveness and cost-benefit analysis address questions of efficiency by standardizing outcomes in terms of their dollar costs and values;
secondary analysis reexamines existing data to address new questions or use methods not previously employed;
meta-analysis integrates the outcome estimates from multiple studies to arrive at an overall or summary judgement on an evaluation question.

Evaluation Questions and Methods
Evaluators ask many different kinds of questions and use a variety of methods to address them. These are considered within the framework of formative and summative evaluation as presented above.

In formative research the major questions and methodologies are:
What is the definition and scope of the problem or issue, or what's the question? Formulating and conceptualizing methods might be used, including brainstorming, focus groups, nominal group techniques, Delphi methods, brainwriting, stakeholder analysis, synectics, lateral thinking, input-output analysis, and concept mapping.
Where is the problem and how big or serious is it? The most common method used here is "needs assessment", which can include: analysis of existing data sources, and the use of sample surveys, interviews of constituent populations, qualitative research, expert testimony, and focus groups.
How should the program or technology be delivered to address the problem? Some of the methods already listed apply here, as do detailing methodologies like simulation techniques, or multivariate methods like multiattribute utility theory or exploratory causal modeling; decision-making methods; and project planning and implementation methods like flow charting, PERT/CPM, and project scheduling.
How well is the program or technology delivered? Qualitative and quantitative monitoring techniques, the use of management information systems, and implementation assessment would be appropriate methodologies here.

The questions and methods addressed under summative evaluation include:
What type of evaluation is feasible? Evaluability assessment can be used here, as well as standard approaches for selecting an appropriate evaluation design.
What was the effectiveness of the program or technology? One would choose from observational and correlational methods for demonstrating whether desired effects occurred, and quasi-experimental and experimental designs for determining whether observed effects can reasonably be attributed to the intervention and not to other sources.
What is the net impact of the program?
Econometric methods for assessing cost effectiveness and cost/benefits would apply here, along with qualitative methods that enable us to summarize the full range of intended and unintended impacts. Clearly, this introduction is not meant to be exhaustive. Each of these methods, and the many not mentioned, is supported by an extensive methodological research literature. This is a formidable set of tools. But the need to improve, update and adapt these methods to changing circumstances means that methodological research and development needs to have a major place in evaluation work.
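One of the summative types listed above, meta-analysis, lends itself to a short worked sketch. Inverse-variance weighting is one standard way (my choice of method here, not prescribed by the source) to integrate outcome estimates from multiple studies into a single summary estimate; the study figures below are invented for illustration:

```python
# A minimal sketch of meta-analysis: integrating outcome estimates from
# multiple studies into one summary estimate via inverse-variance weighting.
# The effect sizes and variances below are invented for illustration.

studies = [
    # (effect estimate, variance of that estimate)
    (0.30, 0.04),
    (0.10, 0.01),
    (0.25, 0.09),
]

# More precise studies (smaller variance) get larger weights.
weights = [1.0 / var for _, var in studies]
summary_effect = sum(w * eff for (eff, _), w in zip(studies, weights)) / sum(weights)
summary_variance = 1.0 / sum(weights)

print(f"Summary effect: {summary_effect:.3f} (variance {summary_variance:.4f})")
```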
Evaluation Models
You learned the basics of evaluation last week. This week we are going to learn about some of the different evaluation approaches or models or metaphors that different groups of evaluators tend to endorse. I generally use the terms approaches, models, and metaphors as synonyms.
The reading is titled Chapter 4: Evaluation Models, which is from a book by my (Burke Johnson's) major professor at the University of Georgia. Here is the reference:
Payne, D.A. (1994). Designing educational project and program evaluations: A practical overview based on research and experience. Boston: Kluwer Academic Publishers.
The whole book is actually quite good, but we are only using one chapter for our course.
In this chapter, Payne discusses four types of models:
1. Management Models
2. Judicial Models
3. Anthropological Models
4. Consumer Models
You might remember these four types using this mnemonic: MJAC.
Here is how Scriven defines models: "A term loosely used to refer to a conception or approach or sometimes even a method (e.g., naturalistic, goal-free) of doing evaluation. Models are to paradigms as hypotheses are to theories, which means less general but with some overlap."
Payne notes (p. 58) that his four metaphors may be helpful in leading to your theory of evaluation. In fact, this is something I want you to think about this semester: what is YOUR theory of evaluation? Note Marvin Alkin's (1969) definition of program theory on p. 58 of Payne's chapter and compare it with Will Shadish's definition of program theory on page 33 in RFL. I suggest that you memorize Shadish's definition of program theory. I am a strong advocate of evaluators being aware of their evaluation theory.
In short, you may wish to pick one model as being of most importance in your theory of evaluation. On the other hand, my theory of evaluation is a needs-based or contingency theory of evaluation. (By the way, I am probably most strongly influenced by Will Shadish's evaluation writings.) In short, I like to select the model that best fits the specific needs or situational characteristics of the program evaluation I am conducting. Payne makes some similar points in the last section of the chapter, titled "Metaphor Selection: In Praise of Eclecticism."
Now I will make some comments about each of the four approaches to evaluation discussed by David Payne. I will also add some thoughts not included by Payne.
1. Management Models
The basic idea of the management approach is that the evaluator's job is to provide information to management to help them in making decisions about programs, products, etc. The evaluator's job is to serve managers (or whoever the key decision makers are).
One very popular management model used today is Michael Patton's Utilization-Focused Evaluation. (Note that Patton's model is not discussed in Payne's chapter. You may want to examine the appendix of RFL for pages where Patton's model is briefly discussed.) Basically, Patton wants evaluators to provide information to primary intended users, and not even to conduct an evaluation if it has little or no potential for utilization. He wants evaluators to facilitate use as much as possible. Patton's motto is to focus on intended use by intended users. He recommends that evaluators work closely with primary intended users so that their needs will be met. This requires focusing on stakeholders' key questions, issues, and intended uses. It also requires involving intended users in the interpretation of the findings, and then disseminating those findings so that they can be used. One should also follow up on actual use. It is helpful to develop a utilization plan and to outline what the evaluator and primary users must do to bring about the use of the evaluation findings. Ultimately, evaluations should, according to Patton, be judged by their utility and actual use. Patton's approach is discussed in detail in the following book:
Patton, M.Q. (1997). Utilization-focused evaluation: The new century text. Thousand Oaks, CA: Sage.
The first edition of Patton's Utilization-Focused Evaluation book was published in 1978.
Another current giant in evaluation who fits into the management-oriented evaluation camp is Joseph Wholey, but I will not outline his theory here (see, for example, his 1979 book titled Evaluation: Promise and Performance, his 1983 book titled Evaluation and Effective Public Management, or his edited 1994 book titled Handbook of Practical Program Evaluation).
Now I will make a few comments on the only management model discussed by Payne (i.e., the CIPP Model).
Daniel Stufflebeam's CIPP Model has been around for many years (e.g., see Stufflebeam et al., 1971), and it has been very popular in education.
The CIPP Model is a simple systems model applied to program evaluation. A basic open system includes input, process, and output. Stufflebeam added context, kept input and process, and relabeled output as product. Hence, CIPP stands for context evaluation, input evaluation, process evaluation, and product evaluation. These types are typically viewed as separate forms of evaluation, but they can also be viewed as steps or stages in a comprehensive evaluation.
Context evaluation includes examining and describing the context of the program you are evaluating, conducting a needs and goals assessment, determining the objectives of the program, and determining whether the proposed objectives will be sufficiently responsive to the identified needs. It helps in making program planning decisions.
Input evaluation includes activities such as a description of the program inputs and resources, a comparison of how the program might perform compared to other programs, a prospective benefit/cost assessment (i.e., decide whether you think the benefits will outweigh the costs of the program, before the program is actually implemented), an evaluation of the proposed design of the program, and an examination of what alternative strategies and procedures for the program should be considered and recommended. In short, this type of evaluation examines what the program plans on doing. It helps in making program structuring decisions.
Process evaluation includes examining how a program is being implemented, monitoring how the program is performing, auditing the program to make sure it is following required legal and ethical guidelines, and identifying defects in the procedural design or in the implementation of the program. It is here that evaluators provide information about what is actually occurring in the program. Evaluators typically provide this kind of feedback to program personnel because it can be helpful in making formative evaluation decisions (i.e., decisions about how to modify or improve the program). In general, process evaluation helps in making implementing decisions.
Product evaluation includes determining and examining the general and specific outcomes of the program (i.e., which requires using impact or outcome assessment techniques), measuring anticipated outcomes, attempting to identify unanticipated outcomes, assessing the merit of the program, conducting a retrospective benefit/cost assessment (to establish the actual worth or value of the program), and/or conducting a cost effectiveness assessment (to determine if the program is cost effective compared to other similar programs). Product evaluation is very helpful in making summative evaluation decisions (e.g., What is the merit and worth of the program? Should the program be continued?)
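To pull the four components together, here is a hypothetical sketch (my own illustration, not Stufflebeam's) that organizes a CIPP evaluation plan as a simple data structure, pairing each component with the kind of decision it supports and a few guiding questions drawn from the descriptions above:

```python
# A hypothetical illustration (not Stufflebeam's own formulation) of the
# four CIPP components as a plan: each maps to the decisions it supports.

cipp_plan = {
    "context": {
        "supports": "program planning decisions",
        "questions": ["What needs does the program address?",
                      "Are the objectives responsive to those needs?"],
    },
    "input": {
        "supports": "program structuring decisions",
        "questions": ["What resources and design does the program propose?",
                      "Will the expected benefits outweigh the costs?"],
    },
    "process": {
        "supports": "implementing decisions",
        "questions": ["Is the program delivered as designed?",
                      "What defects appear in procedure or implementation?"],
    },
    "product": {
        "supports": "summative decisions",
        "questions": ["What outcomes, intended or not, occurred?",
                      "Is the program worth continuing?"],
    },
}

for stage, details in cipp_plan.items():
    print(f"{stage.upper()} evaluation -> {details['supports']}")
```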
(By the way, formative evaluation is conducted for the purpose of improving an evaluation object (evaluand), and summative evaluation is conducted for the purpose of accountability, which requires determining the overall effectiveness or merit and worth of an evaluation object. Formative evaluation information tends to be used by program administrators and staff members, whereas summative evaluation information tends to be used by high-level administrators and policy makers to assist them in making funding or program continuation decisions. As I mentioned earlier (in lecture one), the terms formative and summative evaluation were coined by Michael Scriven in the late 1960s.)
Thinking of the CIPP Model, input and process evaluation tend to be very helpful for formative evaluation and product evaluation tends to be especially helpful for summative evaluation. Note, however, that the other parts of the CIPP Model can sometimes be used for formative or summative evaluative decisions. For example, product evaluation may lead to program improvements (i.e., formative), and process evaluation may lead to documentation that the program has met delivery requirements set by law (i.e., summative).
As you can see, the CIPP Evaluation Model is quite comprehensive, and one would often not use every part of the CIPP Model in a single evaluation. On the other hand, it would be fruitful for you to think about a small program (e.g., a training program in a local organization) where you would go through all four steps or parts of the CIPP Model. (Again, there are two different ways to view the CIPP model: first as four distinct kinds of evaluation and second as steps or stages in a comprehensive evaluation model.) The CIPP Model is, in general, quite useful in helping us to focus on some very important evaluation questions and issues and to think about some different types or stages of evaluation.
Interestingly, Stufflebeam no longer talks about the CIPP Model. He now seems to refer to his approach as Decision/Accountability-Oriented Evaluation (see Stufflebeam, 2001, in the Sage book titled Evaluation Models). (By the way, I generally do not recommend Stufflebeam's recent book titled Evaluation Models because he tends to denigrate other useful approaches (in my opinion) while pushing his own approach. In contrast, I advocate an eclectic approach to evaluation, or what Will Shadish calls needs-based evaluation; needs-based evaluation rests on contingency theory because the type of evaluation needed at a particular time and place is said to be contingent upon many factors, which must be determined and considered by the evaluator.)
2. Judicial Models
Judicial or adversary-oriented evaluation is based on the judicial metaphor. It is assumed here that the potential for evaluation bias by a single evaluator cannot be ruled out, and, therefore, each side should have a separate evaluator to make their case. For example, one evaluator can examine and present the evidence for terminating a program and another evaluator can examine and present the evidence for continuing the program. A hearing of some sort is conducted where each evaluator makes his or her case regarding the evaluand. In a sense, this approach sets up a system of checks and balances, by ensuring that all sides be heard, including alternative explanations for the data. Obviously the quality of the different evaluators must be equated for fairness. The ultimate decision is made by some judge or arbiter who considers the arguments and the evidence and then renders a decision.
One example that includes multiple experts is the so-called blue-ribbon panel, where multiple experts of different backgrounds argue the merits of some policy or program. Some committees also operate, to some degree, along the lines of the judicial model.
As one set of authors put it, adversary evaluation has a built-in metaevaluation (Worthen and Sanders, 1999). A metaevaluation is simply an evaluation of an evaluation.
By showing the positive and negative aspects of a program, considering alternative interpretations of the data, and examining the strengths and weaknesses of the evaluation report (metaevaluation), the adversary or judicial approach seems to have some potential. On the other hand, it may lead to unnecessary arguing, competition, and an indictment mentality. It can also be quite expensive because of the requirement of multiple evaluators. In general, formal judicial or adversary models are not often used in program evaluation. It is, however, an interesting idea that may be useful on occasion.
3. Anthropological Models
Payne includes under this heading the qualitative approaches to program evaluation. For a review of qualitative research you can review pages 17-21 and Chapter 11 in my research methods book (Educational Research by Johnson and Christensen). (Remember that IDE 510 or a very similar course is a prerequisite for IDE 660.) Briefly, qualitative research tends to be exploratory, to collect a lot of descriptive data, and to take an inductive approach to understanding the world (i.e., looking at specifics and then trying to come up with conclusions or generalizations about what is observed). Payne points out that you may want to view the group of people involved in a program as forming a unique culture that can be systematically studied.
Payne treats several approaches as being very similar and anthropological in nature, including responsive evaluation (Robert Stake's model), goal-free evaluation (developed by Scriven as a supplement to his other evaluation approach), and naturalistic evaluation (which is somewhat attributable to Guba and Lincoln, who wrote a 1985 book titled Naturalistic Inquiry). Again, what all these approaches have in common is that they tend to rely on the qualitative research paradigm.
In all of these approaches the evaluator enters the field and observes what is going on in the program. Participant and nonparticipant observation are commonly used. Additional data are also regularly collected (e.g., focus groups, interviews, questionnaires, and secondary or extant data), especially for the purpose of triangulation.
The key to Scriven's goal-free evaluation is to have an evaluator enter the field and try to learn about a program and its results inductively, without being aware of the specific objectives of the program. Note that Scriven's approach is useful as a supplement to the more traditional goal-oriented evaluation. Goal-free evaluation is done by a separate evaluator, who collects exploratory data to supplement another evaluator's goal-oriented data.
Payne next lists several strengths of qualitative evaluation. This list is from a nice book by Michael Patton (titled How to Use Qualitative Methods in Evaluation). Qualitative methods tend to be useful for describing program implementation, studying process, studying participation, getting program participants' views or opinions about program impact, and identifying program strengths and weaknesses. Another strength is identifying unintended outcomes, which may be missed if you design a study only to measure certain specific objectives.
Next, Payne talks about Robert Stake's specific anthropological model, which is called Responsive Evaluation. (By the way, Robert (Bob) Stake also has a book on case study research which I recommend you add to your library sometime (titled The Art of Case Study Research, Sage Publications, 1995).) Stake uses the term responsive because he wants evaluators to be flexible and responsive to the concerns and issues of program stakeholders. He also believes that qualitative methods provide the way to be the most responsive. He uses a somewhat derogatory (I think) term to refer to what he sees as the traditional evaluator. In particular, he labels the traditional evaluation approach preordinate evaluation, which means evaluation that relies only on formal plans and measurement of pre-specified program objectives.
In explaining responsive evaluation, Stake says an educational evaluation is responsive evaluation "if it orients more directly to program activities than to program intents; responds to audience requirements for information; and if the different value-perspectives present are referred to in reporting the success and failure of the program" (Stake, 1975).
Payne also shows Stake's events clock, which shows the key evaluation activities and events while stressing that they do not have to be done in a predetermined or linear order. Flexibility is the key: go where the data and your emerging conclusions and opportunities lead you. Ultimately, the responsive evaluator prepares a narrative or case study report on what he or she finds, although it is also essential that the responsive evaluator present findings informally to different stakeholders during the conduct of the study to increase their input, participation, buy-in, and use of findings. As you can see, responsive evaluation is very much a participatory evaluation approach.
On page 74 Payne lists some strengths and weaknesses of the anthropological evaluation approach. He also gives a nice real-world example of an evaluation using the responsive approach.
4. Consumer Models
The last approach discussed by Payne is the consumer approach. The primary evaluation theorist behind this approach is Michael Scriven. Obviously this approach is based on the consumer product metaphor. In other words, perhaps evaluators can obtain some useful evaluation ideas from the field of consumer product evaluation (which is exemplified by the magazine Consumer Reports). As Payne mentions, the consumer approach is primarily summative. For example, when you read Consumer Reports, your goal is to learn whether a product is good, how well it stacks up against similar products, and whether you want to purchase it. In short, you are looking at the merit and worth (absolute and relative) of a particular product. Note, however, that it is much more difficult to evaluate a social or educational program than it is to evaluate, for example, an automobile or a coffee maker. With an automobile or a coffee maker, you can easily measure its specifications and performance. A social program is a much more complex package that includes many elements and that requires an impact assessment using social science research techniques to determine if the program works and how it works.
Payne includes an excellent checklist (developed by Scriven, and sometimes called an evaluation checklist) that you may want to use when you are evaluating any type of evaluand (i.e., not just consumer products).
As Payne points out, the consumer approach also holds some promise for developing lists of programs that work, which can be used by policy makers and others when developing or selecting programs for specific problems. Payne also discusses the process of how a program could get onto such a list.
Logic model
A logic model (also known as a logical framework, theory of change, or program matrix) is a tool used most often by managers and evaluators of programs to evaluate the effectiveness of a program. Logic models are usually a graphical depiction of the logical relationships between the resources, activities, outputs and outcomes of a program. [1] While there are many ways in which logic models can be presented, the underlying purpose of constructing a logic model is to assess the "if-then" (causal) relationships between the elements of the program: if the resources are available for a program, then the activities can be implemented; if the activities are implemented successfully, then certain outputs and outcomes can be expected. Logic models are most often used in the evaluation stage of a program; they can, however, also be used during planning and implementation. [2]
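Before the versions described below, the basic chain of a logic model can be sketched as a simple data structure. The sketch is my own illustration (the component contents are invented examples), making the "if-then" logic between stages explicit:

```python
# A minimal sketch of the basic logic model chain described above:
# resources -> activities -> outputs -> outcomes. Contents are invented.

logic_model = [
    ("Inputs", ["money", "staff", "equipment"]),
    ("Activities", ["develop materials", "run training workshops"]),
    ("Outputs", ["booklets produced", "workshops held", "people trained"]),
    ("Outcomes/impacts", ["increased skills and knowledge",
                          "longer-term gains such as a new job"]),
]

# The underlying logic: IF each stage is realized, THEN the next can follow.
for (stage, items), (next_stage, _) in zip(logic_model, logic_model[1:]):
    print(f"IF {stage} ({', '.join(items)}) THEN {next_stage} can follow")
```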
Versions
In its simplest form, a logic model has four components: [3]
Inputs: what resources go into a program (e.g. money, staff, equipment).
Activities: what activities the program undertakes (e.g. development of materials, training programs).
Outputs: what is produced through those activities (e.g. number of booklets produced, workshops held, people trained).
Outcomes/impacts: the changes or benefits that result from the program (e.g. increased skills/knowledge/confidence, leading in the longer term to promotion, a new job, etc.).

Following the early development of the logic model in the 1970s by Carol Weiss, Joseph Wholey and others, many refinements and variations have been added to the basic concept. Many versions of logic models set out a series of outcomes/impacts, explaining in more detail the logic of how an intervention contributes to intended or observed results. [4] This will often include distinguishing between short-term, medium-term and long-term results, and between direct and indirect results. Some logic models also include assumptions (beliefs the prospective grantees have about the program, the people involved, and the context, and the way the prospective grantees think the program will work) and external factors (the environment in which the program exists, including a variety of external factors that interact with and influence the program action). University Cooperative Extension Programs in the US have developed a more elaborate logic model, called the Program Action Logic Model, which includes six steps:
Inputs (what we invest)
Outputs: Activities (the actual tasks we do), Participation (who we serve; customers and stakeholders), Engagement (how those we serve engage with the activities)
Outcomes/Impacts: Short term (learning: awareness, knowledge, skills, motivations), Medium term (action: behavior, practice, decisions, policies), Long term (consequences: social, economic, environmental, etc.)
In front of Inputs, there is a description of a Situation and Priorities; these are the considerations that determine what Inputs will be needed. The University of Wisconsin Extension offers a series of guidance documents [5] on the use of logic models. There is also an extensive bibliography [6] of work on this program logic model.

Advantages
By describing work in this way, managers have an easier way to define the work and measure it. Performance measures can be drawn from any of the steps. One of the key insights of the logic model is the importance of measuring final outcomes or results, because it is quite possible to waste time and money (inputs), "spin the wheels" on work activities, or produce outputs without achieving desired outcomes. It is these outcomes (impacts, long-term results) that are the only justification for doing the work in the first place. For commercial organizations, outcomes relate to profit. For not-for-profit or governmental organizations, outcomes relate to successful achievement of mission or program goals.

Uses of the logic model
Program planning
One of the most important uses of the logic model is for program planning. Here it helps managers to 'plan with the end in mind' (Stephen Covey), rather than just consider inputs (e.g. budgets, employees) or just the tasks that must be done. In the past, program logic has been justified by explaining the process from the perspective of an insider. Paul McCawley (no date) outlines how this process was approached:
1. We invest this time/money so that we can generate this activity/product.
2. The activity/product is needed so people will learn how to do this.
3. People need to learn that so they can apply their knowledge to this practice.
4. When that practice is applied, the effect will be to change this condition.
5. When that condition changes, we will no longer be in this situation.
While logic models have been used in this way successfully, Millar et al. (1999) have suggested that following the above sequence, from the inputs through to the outcomes, could limit one's thinking to the existing activities, programs and research questions. Instead, by using the logic model to focus on the intended outcomes of a particular program, the questions change from "What is being done?" to "What needs to be done?" McCawley (no date) suggests that by using this new reasoning, a logic model for a program can be built by asking the following questions in sequence:
1. What is the current situation that we intend to impact?
2. What will it look like when we achieve the desired situation or outcome?
3. What behaviors need to change for that outcome to be achieved?
4. What knowledge or skills do people need before the behavior will change?
5. What activities need to be performed to cause the necessary learning?
6. What resources will be required to achieve the desired outcome?
By placing the focus on ultimate outcomes or results, planners can think backwards through the logic model to identify how best to achieve the desired results. Planners therefore need to understand the difference between the categories of the logic model.

Performance evaluation
The logic model is often used in government or not-for-profit organizations, where the mission and vision are not aimed at achieving a financial benefit. In such situations, where profit is not the intended result, it may be difficult to monitor progress toward outcomes. A program logic model provides such indicators, in terms of output and outcome measures of performance. It is therefore important in these organizations to carefully specify the desired results and to consider how to monitor them over time. Often, such as in education or social programs, the outcomes are long-term and mission success is far in the future. In these cases, intermediate or shorter-term outcomes may be identified that provide an indication of progress toward the ultimate long-term outcome. Traditionally, government programs were described only in terms of their budgets. It is easy to measure the amount of money spent on a program, but this is a poor indicator of mission success. Likewise, it is relatively easy to measure the amount of work done (e.g. number of workers or number of years spent), but the workers may have just been 'spinning their wheels' without getting very far in terms of ultimate results or outcomes. The production of outputs is a better indicator that something was delivered to customers, but it is still possible that the output did not really meet the customer's needs, was not used, etc. Therefore, the focus on results or outcomes has become a mantra in government and not-for-profit programs. The President's Management Agenda [7] is an example of the increasing emphasis on results in government management. It states: "Government likes to begin things to declare grand new programs and causes. But good beginnings are not the measure of success. What matters in the end is completion. Performance. Results." [8]
However, although outcomes are used as the primary indicators of program success or failure, they are still insufficient. Outcomes may easily be achieved through processes independent of the program, and an evaluation of those outcomes would suggest program success when in fact external factors were responsible for the outcomes (Rossi, Lipsey and Freeman, 2004). In this respect, Rossi, Lipsey and Freeman (2004) suggest that a typical evaluation study should concern itself with measuring how the process indicators (inputs and outputs) have had an effect on the outcome indicators. A program logic model would need to be assessed or designed in order for an evaluation of these standards to be possible. The logic model can, and indeed should, be used in both formative evaluations (during implementation, to offer the chance to improve the program) and summative evaluations (after the completion of the program).

A FRAMEWORK FOR PROGRAM EVALUATION
Program evaluation offers a way to understand and improve community health and development practice using methods that are useful, feasible, proper, and accurate. The framework described below is a practical, non-prescriptive tool that summarizes in a logical order the important elements of program evaluation.

THE FRAMEWORK CONTAINS TWO RELATED DIMENSIONS:
Steps in evaluation practice, and
Standards for "good" evaluation.
The six connected steps of the framework are actions that should be a part of any evaluation. Although in practice the steps may be encountered out of order, it will usually make sense to follow them in the recommended sequence, because earlier steps provide the foundation for subsequent progress. Thus, decisions about how to carry out a given step should not be finalized until prior steps have been thoroughly addressed. However, these steps are meant to be adaptable, not rigid. Sensitivity to each program's unique context (for example, the program's history and organizational climate) is essential for sound evaluation. The steps are intended to serve as starting points around which community organizations can tailor an evaluation to best meet their needs:
Engage stakeholders
Describe the program
Focus the evaluation design
Gather credible evidence
Justify conclusions
Ensure use and share lessons learned
Understanding and adhering to these basic steps will improve most evaluation efforts.

The second part of the framework is a basic set of standards to assess the quality of evaluation activities. There are 30 specific standards, organized into the following four groups:
Utility
Feasibility
Propriety
Accuracy
These standards help answer the question, "Will this evaluation be a 'good' evaluation?" They are recommended as the initial criteria by which to judge the quality of program evaluation efforts.

ENGAGE STAKEHOLDERS
Stakeholders are people or organizations that have something to gain or lose from what will be learned from an evaluation, and also from what will be done with that knowledge. Evaluation cannot be done in isolation. Almost everything done in community health and development work involves partnerships -- alliances among different organizations, board members, those affected by the problem, and others. Therefore, any serious effort to evaluate a program must consider the different values held by the partners. Stakeholders must be part of the evaluation to ensure that their unique perspectives are understood. When stakeholders are not appropriately involved, evaluation findings are likely to be ignored, criticized, or resisted. However, if they are part of the process, people are likely to feel a good deal of ownership of the evaluation process and results. They will probably want to develop it, defend it, and make sure that the evaluation really works. That's why this evaluation cycle begins by engaging stakeholders. Once involved, these people will help to carry out each of the steps that follows.

THREE PRINCIPAL GROUPS OF STAKEHOLDERS ARE IMPORTANT TO INVOLVE:
People or organizations involved in program operations may include community members, sponsors, collaborators, coalition partners, funding officials, administrators, managers, and staff.
People or organizations served or affected by the program may include clients, family members, neighborhood organizations, academic institutions, elected and appointed officials, advocacy groups, and community residents. Individuals who are openly skeptical of or antagonistic toward the program may also be important to involve. Opening an evaluation to opposing perspectives and enlisting the help of potential program opponents can strengthen the evaluation's credibility. Likewise, individuals or groups who could be adversely or inadvertently affected by changes arising from the evaluation have a right to be engaged.
For example, it is important to include those who would be affected if program services were expanded, altered, limited, or ended as a result of the evaluation.
Primary intended users of the evaluation are the specific individuals who are in a position to decide and/or do something with the results. They shouldn't be confused with primary intended users of the program, although some of those users should be involved in this group. In fact, primary intended users should be a subset of all of the stakeholders who have been identified. A successful evaluation will designate primary intended users, such as program staff and funders, early in its development and maintain frequent interaction with them to be sure that the evaluation specifically addresses their values and needs. The amount and type of stakeholder involvement will be different for each program evaluation. For instance, stakeholders can be directly involved in designing and conducting the evaluation. They can be kept informed about the progress of the evaluation through periodic meetings, reports, and other means of communication. It may be helpful, when working with a group such as this, to develop an explicit process to share power and resolve conflicts. This may help avoid overemphasis of values held by any specific stakeholder.

DESCRIBE THE PROGRAM
A program description is a summary of the intervention being evaluated. It should explain what the program is trying to accomplish and how it tries to bring about those changes. The description will also illustrate the program's core components and elements, its ability to make changes, its stage of development, and how the program fits into the larger organizational and community environment. How a program is described sets the frame of reference for all future decisions about its evaluation. For example, if a program is described as "attempting to strengthen enforcement of existing laws that discourage underage drinking," the evaluation might be very different than if it is described as "a program to reduce drunk driving by teens." Also, the description allows members of the group to compare the program to other similar efforts, and it makes it easier to figure out what parts of the program brought about what effects. Moreover, different stakeholders may have different ideas about what the program is supposed to achieve and why. For example, a program to reduce teen pregnancy may have some members who believe this means only increasing access to contraceptives, and other members who believe it means only focusing on abstinence. Evaluations done without agreement on the program definition aren't likely to be very useful. In many cases, the process of working with stakeholders to develop a clear and logical program description will bring benefits long before data are available to measure program effectiveness.

THERE ARE SEVERAL SPECIFIC ASPECTS THAT SHOULD BE INCLUDED WHEN DESCRIBING A PROGRAM.
Statement of need
A statement of need describes the problem, goal, or opportunity that the program addresses; it also begins to imply what the program will do in response. Important features to note regarding a program's need are: the nature of the problem or goal, who is affected, how big it is, and whether (and how) it is changing.
Expectations
Expectations are the program's intended results. They describe what the program has to accomplish to be considered successful. For most programs, the accomplishments exist on a continuum (first, we want to accomplish X... then, we want to do Y...).
Therefore, expectations should be organized by time, ranging from specific (and immediate) to broad (and longer-term) consequences. For example, a program's vision, mission, goals, and objectives all represent varying levels of specificity about the program's expectations.

Activities
Activities are everything the program does to bring about changes. Describing program components and elements permits specific strategies and actions to be listed in logical sequence. This also shows how different program activities, such as education and enforcement, relate to one another. Describing program activities also provides an opportunity to distinguish activities that are the direct responsibility of the program from those that are conducted by related programs or partner organizations. Things outside of the program that may affect its success, such as harsher laws punishing businesses that sell alcohol to minors, can also be noted.

Resources
Resources include the time, talent, equipment, information, money, and other assets available to conduct program activities. Reviewing a program's resources tells a lot about the amount and intensity of its services. It may also point out situations where there is a mismatch between what the group wants to do and the resources available to carry out those activities. Understanding program costs is necessary for assessing the cost-benefit ratio as part of the evaluation.

Stage of development
A program's stage of development reflects its maturity. All community health and development programs mature and change over time. People who conduct evaluations, as well as those who use their findings, need to consider the dynamic nature of programs. For example, a new program that just received its first grant may differ in many respects from one that has been running for over a decade. At least three phases of development are commonly recognized: planning, implementation, and effects or outcomes. In the planning stage, program activities are untested and the goal of evaluation is to refine plans as much as possible. In the implementation stage, program activities are being field-tested and modified; the goal of evaluation is to see what happens in the "real world" and to improve operations. In the effects stage, enough time has passed for the program's effects to emerge; the goal of evaluation is to identify and understand the program's results, including those that were unintentional.

Context
A description of the program's context considers the important features of the environment in which the program operates. This includes understanding the area's history, geography, politics, and social and economic conditions, as well as what other organizations have done. A realistic and responsive evaluation is sensitive to a broad range of potential influences on the program. An understanding of the context lets users interpret findings accurately and assess their generalizability. For example, a program to improve housing in an inner-city neighborhood might have been a tremendous success, but it would likely not work in a small town on the other side of the country without significant adaptation.

Logic model
A logic model synthesizes the main program elements into a picture of how the program is supposed to work. It makes explicit the sequence of events that are presumed to bring about change. Often this logic is displayed in a flow chart, map, or table that portrays the sequence of steps leading to program results.
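To make this concrete, a logic model can be sketched as a simple ordered data structure. The following Python sketch is illustrative only; the program, stages, and elements are hypothetical, not taken from the text above.

```python
# A minimal, hypothetical logic model for an underage-drinking prevention
# program, expressed as an ordered mapping from stage to elements.
# Real models are developed collaboratively with stakeholders.
logic_model = {
    "inputs": ["grant funding", "trained staff", "community partners"],
    "activities": ["merchant education visits", "compliance checks"],
    "outputs": ["200 merchants visited", "50 compliance checks completed"],
    "short-term outcomes": ["fewer sales of alcohol to minors"],
    "long-term outcomes": ["reduced underage drinking in the county"],
}

# Walking the model in order makes the presumed chain of events explicit.
for stage, elements in logic_model.items():
    print(f"{stage}: {', '.join(elements)}")
```

Even a sketch this small forces the question of whether each stage plausibly leads to the next, which is the point of building the model.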
Creating a logic model allows stakeholders to improve and focus program direction. It reveals assumptions about the conditions for program effectiveness and provides a frame of reference for one or more evaluations of the program. A detailed logic model can also be a basis for estimating the program's effect on endpoints that are not directly measured. For example, if prior knowledge indicates that the intervention prevents disease in a known fraction of those it reaches, multiplying that fraction by the number of persons experiencing the intervention gives a rough estimate of the reduction in disease.

The breadth and depth of a program description will vary for each program evaluation, and so many different activities may be part of developing that description. For instance, multiple sources of information could be pulled together to construct a well-rounded description. The accuracy of an existing program description could be confirmed through discussion with stakeholders. Descriptions of what's going on could be checked against direct observation of activities in the field. A narrow program description could be fleshed out by addressing contextual factors (such as staff turnover, inadequate resources, political pressures, or strong community participation) that may affect program performance.

FOCUS THE EVALUATION DESIGN

By focusing the evaluation design, we mean doing advance planning about where the evaluation is headed and what steps it will take to get there. It isn't possible or useful for an evaluation to try to answer all questions for all stakeholders; there must be a focus. A well-focused plan is a safeguard against using time and resources inefficiently. Depending on what you want to learn, some types of evaluation will be better suited than others. However, once data collection begins, it may be difficult or impossible to change what you are doing, even if it becomes obvious that other methods would work better. A thorough plan anticipates intended uses and creates an evaluation strategy with the greatest chance of being useful, feasible, proper, and accurate.

AMONG THE ISSUES TO CONSIDER WHEN FOCUSING AN EVALUATION ARE:

Purpose
Purpose refers to the general intent of the evaluation. A clear purpose serves as the basis for the design, methods, and use of the evaluation. Taking time to articulate an overall purpose will keep your organization from making uninformed decisions about how the evaluation should be conducted and used. There are at least four general purposes for which a community group might conduct an evaluation:

To gain insight. This happens, for example, when deciding whether to use a new approach (e.g., would a neighborhood watch program work for our community?). Knowledge from such an evaluation will provide information about the approach's practicality. For a developing program, information from evaluations of similar programs can provide the insight needed to clarify how its activities should be designed.

To improve how things get done. This is appropriate in the implementation stage, when an established program tries to describe what it has done. This information can be used to describe program processes, to improve how the program operates, and to fine-tune the overall strategy. Evaluations done for this purpose include efforts to improve the quality, effectiveness, or efficiency of program activities.

To determine what the effects of the program are. Evaluations done for this purpose examine the relationship between program activities and observed consequences.
For example, are more students finishing high school as a result of the program? Programs most appropriate for this type of evaluation are mature programs that can state clearly what happened and to whom. Such evaluations should provide evidence about the program's contribution to reaching longer-term goals, such as a decrease in child abuse or crime in the area. This type of evaluation helps establish the accountability, and thus the credibility, of a program to funders and to the community.

To affect those who participate in it. The logic and reflection required of evaluation participants can itself be a catalyst for self-directed change. And so one of the purposes of evaluating a program is for the process and results to have a positive influence. Such influences may:

o Empower program participants (for example, being part of an evaluation can increase community members' sense of control over the program);
o Supplement the program (for example, using a follow-up questionnaire can reinforce the main messages of the program);
o Promote staff development (for example, by teaching staff how to collect, analyze, and interpret evidence); or
o Contribute to organizational growth (for example, the evaluation may clarify how the program relates to the organization's mission).

Users
Users are the specific individuals who will receive the evaluation findings. They will directly experience the consequences of the inevitable trade-offs in the evaluation process. For example, one trade-off might be conducting a relatively modest evaluation to fit the budget, with the consequence that the results will be less certain than they would be for a full-scale evaluation. Because they will be affected by these trade-offs, intended users have a right to participate in choosing a focus for the evaluation. An evaluation designed without adequate user involvement in selecting the focus can become a misguided and irrelevant exercise. By contrast, when users are encouraged to clarify intended uses, priority questions, and preferred methods, the evaluation is more likely to focus on things that will inform (and influence) future actions.

Uses
Uses describe what will be done with what is learned from the evaluation. There is a wide range of potential uses for program evaluation. Generally speaking, the uses fall into the same four categories as the purposes listed above: to gain insight, to improve how things get done, to determine what the effects of the program are, and to affect participants. The following list gives examples of uses in each category.
SOME SPECIFIC EXAMPLES OF EVALUATION USES

TO GAIN INSIGHT:
o Assess needs and wants of community members
o Identify barriers to use of the program
o Learn how best to describe and measure program activities

TO IMPROVE HOW THINGS GET DONE:
o Refine plans for introducing a new practice
o Determine the extent to which plans were implemented
o Improve educational materials
o Enhance cultural competence
o Verify that participants' rights are protected
o Set priorities for staff training
o Make mid-course adjustments
o Clarify communication
o Determine whether client satisfaction can be improved
o Compare costs to benefits
o Find out which participants benefit most from the program
o Mobilize community support for the program

TO DETERMINE WHAT THE EFFECTS OF THE PROGRAM ARE:
o Assess skills development by program participants
o Compare changes in behavior over time
o Decide where to allocate new resources
o Document the level of success in accomplishing objectives
o Demonstrate that accountability requirements are fulfilled
o Use information from multiple evaluations to predict the likely effects of similar programs

TO AFFECT PARTICIPANTS:
o Reinforce messages of the program
o Stimulate dialogue and raise awareness about community issues
o Broaden consensus among partners about program goals
o Teach evaluation skills to staff and other stakeholders
o Gather success stories
o Support organizational change and improvement

Questions
The evaluation needs to answer specific questions. Drafting questions encourages stakeholders to reveal what they believe the evaluation should answer - that is, which questions matter most to them. The process of developing evaluation questions further refines the focus of the evaluation.

Methods
The methods available for an evaluation are drawn from behavioral science and social research and development. Three types of methods are commonly recognized: experimental, quasi-experimental, and observational or case study designs. Experimental designs use random assignment to compare the effect of an intervention between otherwise equivalent groups (for example, comparing a randomly assigned group of students who took part in an after-school reading program with those who didn't). Quasi-experimental methods make comparisons between groups that aren't equivalent (e.g., program participants vs. those on a waiting list) or comparisons within a group over time, such as an interrupted time series in which the intervention may be introduced sequentially across different individuals, groups, or contexts. Observational or case study methods use comparisons within a group to describe and explain what happens (e.g., comparative case studies with multiple communities).

No design is necessarily better than another. Evaluation methods should be selected because they provide the appropriate information to answer stakeholders' questions, not because they are familiar, easy, or popular. The choice of methods has implications for what will count as evidence, how that evidence will be gathered, and what kinds of claims can be made. Because each method option has its own biases and limitations, evaluations that mix methods are generally more robust. Over the course of an evaluation, methods may need to be revised or modified. Circumstances that make a particular approach useful can change. For example, the intended use of the evaluation could shift from discovering how to improve the program to helping decide whether the program should continue.
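To make the contrast between design types more concrete, here is a minimal sketch of the experimental case: comparing mean outcomes between randomly assigned groups. All data are invented for illustration; the example is not from the source text.

```python
import random

# Hypothetical outcome scores (e.g., reading test scores) for students
# randomly assigned to an after-school program (treatment) or not (control).
random.seed(1)
treatment = [random.gauss(75, 10) for _ in range(100)]
control = [random.gauss(70, 10) for _ in range(100)]

def mean(values):
    return sum(values) / len(values)

# With random assignment, the difference in group means estimates the
# program's average effect, since the groups are otherwise equivalent.
effect = mean(treatment) - mean(control)
print(f"Estimated program effect: {effect:.1f} points")
```

A quasi-experimental comparison would look similar in code, but because the groups were not randomly assigned, the difference in means could reflect pre-existing differences rather than the program.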
When circumstances change in this way, methods may need to be adapted or redesigned to keep the evaluation on track.

Agreements
Agreements summarize the evaluation procedures and clarify everyone's roles and responsibilities. An agreement describes how the evaluation activities will be implemented. Elements of an agreement include statements about the intended purpose, users, uses, and methods, as well as a summary of the deliverables, those responsible, a timeline, and a budget. The formality of the agreement depends on the relationships between those involved. For example, it may take the form of a legal contract, a detailed protocol, or a simple memorandum of understanding. Regardless of its formality, creating an explicit agreement provides an opportunity to verify the mutual understanding needed for a successful evaluation. It also provides a basis for modifying procedures if that turns out to be necessary.

As you can see, focusing the evaluation design may involve many activities. For instance, both supporters and skeptics of the program could be consulted to ensure that the proposed evaluation questions are politically viable. A menu of potential evaluation uses appropriate for the program's stage of development could be circulated among stakeholders to determine which is most compelling. Interviews could be held with specific intended users to better understand their information needs and timeline for action. Resource requirements could be reduced when users are willing to employ more timely but less precise evaluation methods.

GATHER CREDIBLE EVIDENCE

Credible evidence is the raw material of a good evaluation. The information learned should be seen by stakeholders as believable, trustworthy, and relevant to answering their questions. This requires thinking broadly about what counts as "evidence." Such decisions are always situational; they depend on the question being posed and the motives for asking it. For some questions, a stakeholder's standard for credibility may demand the results of a randomized experiment. For another question, a set of well-done, systematic observations, such as interactions between an outreach worker and community residents, will have high credibility. The difference depends on what kind of information the stakeholders want and the situation in which it is gathered. Context matters!

In some situations, it may be necessary to consult evaluation specialists, especially when concern for data quality is high. In other circumstances, local people may offer the deepest insights. Regardless of their expertise, however, those involved in an evaluation should strive to collect information that will convey a credible, well-rounded picture of the program and its efforts. Having credible evidence strengthens the evaluation results as well as the recommendations that follow from them. Although all types of data have limitations, it is possible to improve an evaluation's overall credibility. One way to do this is by using multiple procedures for gathering, analyzing, and interpreting data. Encouraging participation by stakeholders can also enhance perceived credibility. When stakeholders help define questions and gather data, they will be more likely to accept the evaluation's conclusions and to act on its recommendations.

THE FOLLOWING FEATURES OF EVIDENCE GATHERING TYPICALLY AFFECT HOW CREDIBLE IT IS SEEN AS BEING:

Indicators
Indicators translate general concepts about the program and its expected effects into specific, measurable parts.
Examples of indicators include:

The program's capacity to deliver services
The participation rate
The level of client satisfaction
The amount of intervention exposure (how many people were exposed to the program, and for how long)
Changes in participant behavior
Changes in community conditions or norms
Changes in the environment (e.g., new programs, policies, or practices)
Longer-term changes in population health status (e.g., the estimated teen pregnancy rate in the county)

Indicators should address the criteria that will be used to judge the program. That is, they reflect the aspects of the program that are most meaningful to monitor. Several indicators are usually needed to track the implementation and effects of a complex program or intervention.

One way to develop multiple indicators is to create a "balanced scorecard," which contains indicators that are carefully selected to complement one another. According to this strategy, program processes and effects are viewed from multiple perspectives using small groups of related indicators. For instance, a balanced scorecard for a single program might include indicators of how the program is being delivered; what participants think of the program; what effects are observed; what goals were attained; and what changes are occurring in the environment around the program.

Another approach to using multiple indicators is based on a program logic model, as discussed earlier in this section. A logic model can be used as a template to define a full spectrum of indicators along the pathway that leads from program activities to expected effects. For each step in the model, qualitative and/or quantitative indicators could be developed.

Indicators can be broad-based and don't need to focus only on a program's long-term goals. They can also address intermediary factors that influence program effectiveness, including such intangible factors as service quality, community capacity, or inter-organizational relations. Indicators for these and similar concepts can be created by systematically identifying and then tracking markers of what is said or done when the concept is expressed. In the course of an evaluation, indicators may need to be modified or new ones adopted.

Also, measuring program performance by tracking indicators is only one part of evaluation and shouldn't be mistaken for a basis for decision making in itself. There are definite perils to using performance indicators as a substitute for completing the evaluation process and reaching fully justified conclusions. For example, an indicator such as a rising rate of unemployment may be falsely assumed to reflect a failing program when it is actually due to changing environmental conditions beyond the program's control.

Sources
Sources of evidence in an evaluation may be people, documents, or observations. More than one source may be used to gather evidence for each indicator. In fact, selecting multiple sources provides an opportunity to include different perspectives about the program and enhances the evaluation's credibility. For instance, an inside perspective may be reflected by internal documents and comments from staff or program managers, whereas clients and those who do not support the program may provide different, but equally relevant, perspectives. Mixing these and other perspectives provides a more comprehensive view of the program or intervention.
The criteria used to select sources should be stated clearly so that users and other stakeholders can interpret the evidence accurately and assess whether it may be biased. In addition, some sources provide information in narrative form (for example, a person's experience of taking part in the program) and others are numerical (for example, how many people were involved in the program). The integration of qualitative and quantitative information can yield evidence that is more complete and more useful, thus meeting the needs and expectations of a wider range of stakeholders.

Quality
Quality refers to the appropriateness and integrity of the information gathered in an evaluation. High-quality data are reliable and informative, and they are easier to collect if the indicators have been well defined. Other factors that affect quality include instrument design, data collection procedures, training of those involved in data collection, source selection, coding, data management, and routine error checking. Obtaining quality data will entail trade-offs (e.g., breadth vs. depth); stakeholders should decide together what is most important to them. Because all data have limitations, the intent of a practical evaluation is to strive for a level of quality that meets the stakeholders' threshold for credibility.

Quantity
Quantity refers to the amount of evidence gathered in an evaluation. It is necessary to estimate in advance the amount of information that will be required and to establish criteria to decide when to stop collecting data - to know when enough is enough. Quantity affects the level of confidence or precision users can have - how sure we are that what we've learned is true. It also partly determines whether the evaluation will be able to detect effects. All evidence collected should have a clear, anticipated use.

Logistics
By logistics, we mean the methods, timing, and physical infrastructure for gathering and handling evidence. People and organizations have cultural preferences that dictate acceptable ways of asking questions and collecting information, including who would be perceived as an appropriate person to ask the questions. For example, some participants may be unwilling to discuss their behavior with a stranger, whereas others are more at ease with someone they don't know. Therefore, the techniques for gathering evidence in an evaluation must be in keeping with the cultural norms of the community. Data collection procedures should also ensure that confidentiality is protected.

JUSTIFY CONCLUSIONS

The process of justifying conclusions recognizes that evidence in an evaluation does not necessarily speak for itself. Evidence must be carefully considered from a number of different stakeholders' perspectives to reach conclusions that are well-substantiated and justified. Conclusions become justified when they are linked to the evidence gathered and judged against agreed-upon values set by the stakeholders. Stakeholders must agree that conclusions are justified in order to use the evaluation results with confidence.

THE PRINCIPAL ELEMENTS INVOLVED IN JUSTIFYING CONCLUSIONS BASED ON EVIDENCE ARE:

Standards
Standards reflect the values held by stakeholders about the program. They provide the basis for making program judgments. The use of explicit standards for judgment is fundamental to sound evaluation.
In practice, when stakeholders articulate and negotiate their values, these become the standards for judging whether a given program's performance will, for instance, be considered "successful," "adequate," or "unsuccessful."

Analysis and synthesis
Analysis and synthesis are methods for discovering and summarizing an evaluation's findings. They are designed to detect patterns in evidence, either by isolating important findings (analysis) or by combining different sources of information to reach a larger understanding (synthesis). Mixed-method evaluations require the separate analysis of each evidence element, as well as a synthesis of all sources, to examine the patterns that emerge. Deciphering facts from a given body of evidence involves deciding how to organize, classify, compare, and display information. These decisions are guided by the questions being asked, the types of data available, and especially by input from stakeholders and primary intended users.

Interpretation
Interpretation is the effort to figure out what the findings mean. Uncovering facts about a program's performance isn't enough to draw conclusions; the facts must be interpreted to understand their practical significance. For example, the statement "15% of the people in our area witnessed a violent act last year" may be interpreted differently depending on the situation. If 50% of community members surveyed five years ago had witnessed a violent act in the previous year, the group can suggest that, while still a problem, things are getting better in the community. However, if five years ago only 7% of those surveyed said the same thing, community organizations may see this as a sign that they should change what they are doing. In short, interpretations draw on the information and perspectives that stakeholders bring to the evaluation. They can be strengthened through active participation or interaction with the data and with preliminary explanations of what happened.

Judgments
Judgments are statements about the merit, worth, or significance of the program. They are formed by comparing the findings and their interpretations against one or more selected standards. Because multiple standards can be applied to a given program, stakeholders may reach different or even conflicting judgments. For instance, a program that increases its outreach by 10% from the previous year may be judged positively by program managers, based on standards of improved performance over time. Community members, however, may feel that despite the improvement, a minimum threshold of access to services has still not been reached. Their judgment, based on standards of social equity, would therefore be negative. Conflicting claims about a program's quality, value, or importance often indicate that stakeholders are using different standards or values in making judgments. This type of disagreement can be a catalyst for clarifying values and negotiating the appropriate basis (or bases) on which the program should be judged.

Recommendations
Recommendations are actions to consider as a result of the evaluation. Forming recommendations requires information beyond what is necessary to form judgments. For example, knowing that a program is able to increase the services available to battered women doesn't necessarily translate into a recommendation to continue the effort, particularly when there are competing priorities or other effective alternatives.
Thus, recommendations about what to do with a given intervention go beyond judgments about a specific program's effectiveness. If recommendations aren't supported by enough evidence, or if they aren't in keeping with stakeholders' values, they can undermine an evaluation's credibility. By contrast, an evaluation can be strengthened by recommendations that anticipate and respond to what users will want to know.

THREE THINGS MIGHT INCREASE THE CHANCES THAT RECOMMENDATIONS WILL BE RELEVANT AND WELL-RECEIVED:

Sharing draft recommendations
Soliciting reactions from multiple stakeholders
Presenting options instead of directive advice

Justifying conclusions in an evaluation is a process that can involve several different steps. For instance, conclusions could be strengthened by searching for alternative explanations to the ones you have chosen, and then showing why they are unsupported by the evidence. When there are different but equally well-supported conclusions, each could be presented with a summary of its strengths and weaknesses. Techniques to analyze, synthesize, and interpret findings might be agreed upon before data collection begins.

ENSURE USE AND SHARE LESSONS LEARNED

It is naive to assume that lessons learned in an evaluation will necessarily be used in decision making and subsequent action. Deliberate effort on the part of evaluators is needed to ensure that the evaluation findings will be used appropriately. Preparing for their use involves strategic thinking and continued vigilance in looking for opportunities to communicate and influence. Both of these should begin in the earliest stages of the process and continue throughout the evaluation.

THE ELEMENTS OF KEY IMPORTANCE FOR ENSURING THAT THE RECOMMENDATIONS FROM AN EVALUATION ARE USED ARE:

Design
Design refers to how the evaluation's questions, methods, and overall processes are constructed. As discussed in the third step of this framework (focusing the evaluation design), the evaluation should be organized from the start to achieve specific agreed-upon uses. Having a clear purpose that is focused on the use of what is learned helps those who will carry out the evaluation know who will do what with the findings. Furthermore, the process of creating a clear design will highlight ways that stakeholders, through their many contributions, can improve the evaluation and facilitate the use of its results.

Preparation
Preparation refers to the steps taken to get ready for the future uses of the evaluation findings. The ability to translate new knowledge into appropriate action is a skill that can be strengthened through practice; in fact, building this skill can itself be a useful benefit of the evaluation. It is possible to prepare stakeholders for future use of the results by discussing how potential findings might affect decision making. For example, primary intended users and other stakeholders could be given a set of hypothetical results and asked what decisions or actions they would make on the basis of this new knowledge. If they indicate that the evidence presented is incomplete or irrelevant and that no action would be taken, then this is an early warning sign that the planned evaluation should be modified. Preparing for use also gives stakeholders more time to explore both positive and negative implications of potential results and to identify different options for program improvement.

Feedback
Feedback is the communication that occurs among everyone involved in the evaluation.
Giving and receiving feedback creates an atmosphere of trust among stakeholders, and it keeps an evaluation on track by keeping everyone informed about how the evaluation is proceeding. Primary intended users and other stakeholders have a right to comment on evaluation decisions. From the standpoint of ensuring use, stakeholder feedback is a necessary part of every step in the evaluation. Valuable feedback can be encouraged by holding discussions during each step of the evaluation and by routinely sharing interim findings, provisional interpretations, and draft reports.

Follow-up
Follow-up refers to the support that many users need during the evaluation and after they receive the evaluation findings. Because of the amount of effort required, reaching justified conclusions in an evaluation can seem like an end in itself. It is not. Active follow-up may be necessary to remind users of the intended uses of what has been learned. Follow-up may also be required to keep lessons learned from becoming lost or ignored in the process of making complex or political decisions. To guard against such oversight, it may be helpful to have someone involved in the evaluation serve as an advocate for the evaluation's findings during the decision-making phase.

Facilitating the use of evaluation findings also carries with it the responsibility to prevent misuse. Evaluation results are always bounded by the context in which the evaluation was conducted. Some stakeholders, however, may be tempted to take results out of context or to use them for purposes other than those for which they were developed. For instance, over-generalizing the results from a single case study to make decisions that affect all sites in a national program is a misuse of a case study evaluation. Similarly, program opponents may misuse results by overemphasizing negative findings without giving proper credit for what has worked. Active follow-up can help prevent these and other forms of misuse by ensuring that evidence is applied only to the questions that were the central focus of the evaluation.

Dissemination
Dissemination is the process of communicating the procedures of, and the lessons learned from, an evaluation to relevant audiences in a timely, unbiased, and consistent fashion. Like other elements of the evaluation, the reporting strategy should be discussed in advance with intended users and other stakeholders. Planning effective communications also requires considering the timing, style, tone, message source, vehicle, and format of information products. Regardless of how communications are constructed, the goal of dissemination is full disclosure and impartial reporting.

Along with the uses of evaluation findings, there are also uses that flow from the very process of evaluating. These "process uses" should be encouraged. The people who take part in an evaluation can experience profound changes in beliefs and behavior. For instance, an evaluation challenges staff members to act differently in what they are doing and to question the assumptions that connect program activities with intended effects. Evaluation also prompts staff to clarify their understanding of the program's goals. This greater clarity, in turn, helps staff members function better as a team focused on a common end. In short, immersion in the logic, reasoning, and values of evaluation can have very positive effects, such as basing decisions on systematic judgments instead of unfounded assumptions.
Additional process uses of evaluation include:

Defining indicators clarifies what really matters to stakeholders.
Evaluation helps make outcomes matter by changing the reinforcements connected with achieving positive results. For example, a funder might offer "bonus grants" or "outcome dividends" to a program that has shown a significant amount of community change and improvement.

STANDARDS FOR "GOOD" EVALUATION

There are standards to assess whether all of the parts of an evaluation are well designed and working to their greatest potential. The Joint Committee on Standards for Educational Evaluation developed "The Program Evaluation Standards" for this purpose. These standards, designed to assess evaluations of educational programs, are also relevant for programs and interventions related to community health and development.

The program evaluation standards make it practical to conduct sound and fair evaluations. They offer well-supported principles to follow when faced with having to make trade-offs or compromises. Attending to the standards can guard against an imbalanced evaluation, such as one that is accurate and feasible but isn't very useful or sensitive to the context. Another example of an imbalanced evaluation is one that would be genuinely useful but is impossible to carry out. The following standards can be applied while developing an evaluation design and throughout the course of its implementation. Remember, the standards are written as guiding principles, not as rigid rules to be followed in all situations.

THE 30 SPECIFIC STANDARDS ARE GROUPED INTO FOUR CATEGORIES:

Utility
Feasibility
Propriety
Accuracy

UTILITY STANDARDS
The seven utility standards are:

Stakeholder Identification: People who are involved in (or will be affected by) the evaluation should be identified, so that their needs can be addressed.
Evaluator Credibility: The people conducting the evaluation should be both trustworthy and competent, so that the evaluation will be generally accepted as credible or believable.
Information Scope and Selection: Information collected should address pertinent questions about the program, and it should be responsive to the needs and interests of clients and other specified stakeholders.
Values Identification: The perspectives, procedures, and rationale used to interpret the findings should be carefully described, so that the bases for judgments about merit and value are clear.
Report Clarity: Evaluation reports should clearly describe the program being evaluated, including its context, and the purposes, procedures, and findings of the evaluation. This will help ensure that essential information is provided and easily understood.
Report Timeliness and Dissemination: Significant midcourse findings and evaluation reports should be shared with intended users so that they can be used in a timely fashion.
Evaluation Impact: Evaluations should be planned, conducted, and reported in ways that encourage follow-through by stakeholders, so that the evaluation will be used.

FEASIBILITY STANDARDS
The feasibility standards ensure that the evaluation makes sense - that the steps that are planned are both viable and pragmatic. The three feasibility standards are:

Practical Procedures: The evaluation procedures should be practical, keeping disruption of everyday activities to a minimum while needed information is obtained.
Political Viability: The evaluation should be planned and conducted with anticipation of the different positions and interests of various groups.
This should help in obtaining their cooperation, so that possible attempts by these groups to curtail evaluation operations or to misuse the results can be averted or counteracted.
Cost Effectiveness: The evaluation should be efficient and produce enough valuable information that the resources used can be justified.

PROPRIETY STANDARDS
The propriety standards ensure that the evaluation is an ethical one, conducted with regard for the rights and interests of those involved. The eight propriety standards are:

Service Orientation: Evaluations should be designed to help organizations effectively serve the needs of all of the targeted participants.
Formal Agreements: The responsibilities in an evaluation (what is to be done, how, by whom, when) should be agreed to in writing, so that those involved are obligated to follow all conditions of the agreement or to formally renegotiate it.
Rights of Human Subjects: Evaluations should be designed and conducted to respect and protect the rights and welfare of human subjects, that is, all participants in the study.
Human Interactions: Evaluators should respect basic human dignity and worth when working with other people in an evaluation, so that participants don't feel threatened or harmed.
Complete and Fair Assessment: The evaluation should be complete and fair in its examination, recording both strengths and weaknesses of the program being evaluated. This allows strengths to be built upon and problem areas addressed.
Disclosure of Findings: The people working on the evaluation should ensure that all of the evaluation findings, along with the limitations of the evaluation, are accessible to everyone affected by the evaluation and to any others with expressed legal rights to receive the results.
Conflict of Interest: Conflicts of interest should be dealt with openly and honestly, so that they do not compromise the evaluation processes and results.
Fiscal Responsibility: The evaluator's use of resources should reflect sound accountability procedures and otherwise be prudent and ethically responsible, so that expenditures are accounted for and appropriate.

ACCURACY STANDARDS
The accuracy standards ensure that the evaluation findings are considered correct. The 12 accuracy standards are:

Program Documentation: The program should be described and documented clearly and accurately, so that what is being evaluated is clearly identified.
Context Analysis: The context in which the program exists should be examined thoroughly, so that likely influences on the program can be identified.
Described Purposes and Procedures: The purposes and procedures of the evaluation should be monitored and described in enough detail that they can be identified and assessed.
Defensible Information Sources: The sources of information used in a program evaluation should be described in enough detail that the adequacy of the information can be assessed.
Valid Information: The information-gathering procedures should be chosen or developed and then implemented in such a way that they assure that the interpretation arrived at is valid.
Reliable Information: The information-gathering procedures should be chosen or developed and then implemented so that they assure that the information obtained is sufficiently reliable.
Systematic Information: The information from an evaluation should be systematically reviewed, and any errors found should be corrected.
Analysis of Quantitative Information: Quantitative information - data from observations or surveys - should be appropriately and systematically analyzed so that evaluation questions are effectively answered.
Analysis of Qualitative Information: Qualitative information - descriptive information from interviews and other sources - should be appropriately and systematically analyzed so that evaluation questions are effectively answered.
Justified Conclusions: The conclusions reached in an evaluation should be explicitly justified, so that stakeholders can assess their worth.
Impartial Reporting: Reporting procedures should guard against distortion caused by the personal feelings and biases of people involved in the evaluation, so that evaluation reports fairly reflect the evaluation findings.
Metaevaluation: The evaluation itself should be evaluated against these and other pertinent standards, so that it is appropriately guided and, on completion, stakeholders can closely examine its strengths and weaknesses.

APPLYING THE FRAMEWORK: CONDUCTING OPTIMAL EVALUATIONS

There is ever-increasing agreement on the worth of evaluation; in fact, evaluation is often required by funders and other constituents. So community health and development professionals can no longer question whether or not to evaluate their programs. Instead, the appropriate questions are: What is the best way to evaluate? What are we learning from the evaluation? How will we use what we learn to become more effective? The framework for program evaluation helps answer these questions by guiding users to select evaluation strategies that are useful, feasible, proper, and accurate.

Using this framework requires considerable skill in program evaluation. In most cases there are multiple stakeholders to consider, the political context may be divisive, steps don't always follow a logical order, and limited resources may make a preferred course of action difficult to take. An evaluator's challenge is to devise an optimal strategy given the conditions she is working under - one that accomplishes each step in the framework in a way that takes the program context into account and meets or exceeds the relevant standards.

This framework also makes it possible to respond to common concerns about program evaluation. For instance, many evaluations are not undertaken because they are seen as too expensive. The cost of an evaluation, however, is relative; it depends on the question being asked and the level of certainty desired for the answer. A simple, low-cost evaluation can deliver information that is valuable for understanding and improvement. Rather than discounting evaluation as a time-consuming sideline, the framework encourages evaluations that are timed strategically to provide necessary feedback. This makes it possible to link evaluation closely with everyday practice. Another concern centers on the perceived technical demands of designing and conducting an evaluation. However, the practical approach endorsed by this framework focuses on questions that can improve the program. Finally, the prospect of evaluation troubles many staff members because they perceive evaluation methods as punishing ("They just want to show what we're doing wrong."), exclusionary ("Why aren't we part of it?
We're the ones who know what's going on."), and adversarial ("It's us against them."). The framework instead encourages an evaluation approach that is designed to be helpful and that engages all interested stakeholders in a process that welcomes their participation.

What is program evaluation?
Program evaluation is the systematic assessment of the processes and/or outcomes of a program with the intent of furthering its development and improvement. As such, it is a collaborative process in which evaluators work closely with program staff to craft and implement an evaluation design that is responsive to the needs of the program. For example, during program implementation, evaluators can provide formative evaluation findings so that program staff can make immediate, data-based decisions about program implementation and delivery. In addition, towards the end of a program or upon its completion, evaluators can provide cumulative and summative evaluation findings, which are often required by funding agencies and used to make decisions about program continuation or expansion.

How is evaluation different from research?
Evaluators use many of the same qualitative and quantitative methodologies used by researchers in other fields. Indeed, program evaluations are as rigorous and systematic in collecting data as traditional social research. That being said, the primary purpose of evaluation is to provide timely and constructive information for decision making about particular programs, not to advance more wide-ranging knowledge or theory. Accordingly, evaluation is typically more client-focused than traditional research, in that evaluators work closely with program staff to create and carry out an evaluation plan that attends to the particular needs of their program.

How is evaluation different from assessment?
The primary difference between evaluation and assessment lies in the focus of examination. Whereas evaluation serves to facilitate a program's development, implementation, and improvement by examining its processes and/or outcomes, the purpose of an assessment is to determine an individual's or group's performance by measuring skill level on a variable of interest (e.g., reading comprehension, math, or social skills, to mention just a few). In line with this distinction (and quite commonly in evaluating educational programs, where the intended outcome is often some specified level of academic achievement), assessment data may be used in determining program impact and success.

How much does it cost?
The cost of an evaluation is entirely contingent upon the scope and nature of the evaluation activities and measures requested. The National Science Foundation's rule of thumb for evaluation budgets is 10% of the total grant amount; by that guideline, for instance, a $500,000 grant would set aside roughly $50,000 for evaluation. We at the OEA are committed to providing cost-effective evaluation plans that are both responsive to the evaluative needs of a given program and suitable to its budget. As a result, we have worked in the past, and aspire to work in the future, with programs and projects representing a wide range of financial plans.

My proposal requires an evaluation section. Can you help me with that?
Many federal agencies (e.g., NSF, NIH) require that proposals include information about how the effectiveness of the proposed program will be evaluated. This section usually contains a brief description of possible metrics for program outcomes and a plan for both formative and summative evaluation of the program.
The program evaluators at OEA will provide text for the evaluation section of your proposal free of charge, assuming that you plan to work with OEA if and when your grant is funded. Depending on time and available resources, our staff will also provide feedback on the grant as a whole and guidance on the development of your goals and outcomes. The best way to begin is to contact us as early in the proposal process as possible.

My proposal is due very soon. Can you still help me?
Although we prefer being contacted well in advance of proposal deadlines, we also understand that project timelines and planning processes may not always be ideal. For that reason, we will do our best to work with you and your program even if the time period is limited. For all clients, we respectfully request that you contact us before including our evaluation services and practices in grant proposals, even if this contact is initiated immediately before a fast-approaching deadline.
Definition of Program Evaluation
Evaluation is the systematic application of scientific methods to assess the design, implementation, improvement, or outcomes of a program (Rossi & Freeman, 1993; Short, Hennessy, & Campbell, 1996). The term "program" may include any organized action, such as media campaigns, service provision, educational services, public policies, and research projects (Centers for Disease Control and Prevention [CDC], 1999).

Purposes for Program Evaluation
Demonstrate program effectiveness to funders
Improve the implementation and effectiveness of programs
Better manage limited resources
Document program accomplishments
Justify current program funding
Support the need for increased levels of funding
Satisfy the ethical responsibility to clients to demonstrate the positive and negative effects of program participation (Short, Hennessy, & Campbell, 1996)
Document program development and activities to help ensure successful replication

Barriers
Program evaluations require funding, time, and technical skills: requirements that are often perceived as diverting limited program resources from clients. Program staff are often concerned that evaluation activities will inhibit timely access to services or compromise the safety of clients. Evaluation can necessitate alliances between historically separate community groups (e.g., academia, advocacy groups, service providers; Short, Hennessy, & Campbell, 1996).
Mutual misperceptions regarding the goals and process of evaluation can result in adverse attitudes (CDC, 1999; Chalk & King, 1998).

Overcoming Barriers
Collaboration is the key to successful program evaluation. In evaluation terminology, stakeholders are defined as entities or individuals that are affected by the program and its evaluation (Rossi & Freeman, 1993; CDC, 1999). Involvement of these stakeholders is an integral part of program evaluation. Stakeholders include, but are not limited to, program staff, program clients, decision makers, and evaluators. A participatory approach to evaluation, based on respect for one another's roles and equal partnership in the process, overcomes barriers to a mutually beneficial evaluation (Burt, Harrell, Newmark, Aron, & Jacobs, 1997; Chalk & King, 1998). Identifying an evaluator with the necessary technical skills, as well as a collaborative approach to the process, is integral. Programs have several options for identifying an evaluator: health departments, other state agencies, local universities, evaluation associations, and other programs can provide recommendations. Additionally, several companies and university departments providing these services can be located on the internet. Selecting an evaluator entails finding an individual who has an understanding of the program and of funding requirements for evaluations, demonstrated experience, and knowledge of the issue that the program is targeting (CDC, 1992).

Types of Evaluation
Various types of evaluation can be used to assess different aspects or stages of program development. As the terminology and definitions of evaluation types are not uniform, a number of types are briefly introduced here.

Context Evaluation
Investigating how the program operates or will operate in a particular social, political, physical, and economic environment. This type of evaluation could include a community needs assessment or an organizational assessment (http://www.wkkf.org/Publications/evalhdbk/default.htm). Sample question: What are the environmental barriers to accessing program services?

Formative Evaluation
Assessing needs that a new program should fulfill (Short, Hennessy, & Campbell, 1996), examining the early stages of a program's development (Rossi & Freeman, 1993), or testing a program on a small scale before broad dissemination (Coyle, Boruch, & Turner, 1991). Sample question: Who is the intended audience for the program?

Process Evaluation
Examining the implementation and operation of program components. Sample question: Was the program administered as planned?

Impact Evaluation
Investigating the magnitude of both positive and negative changes produced by a program (Rossi & Freeman, 1993). Some evaluators limit these changes to those occurring immediately (Green & Kreuter, 1991). Sample question: Did participant knowledge change after attending the program?

Outcome Evaluation
Assessing the short- and long-term results of a program. Sample question: What are the long-term positive effects of program participation?

Performance or Program Monitoring
Similar to process evaluation, differing only in that it provides regular updates of evaluation results to stakeholders rather than summarizing results at the evaluation's conclusion (Rossi & Freeman, 1993; Burt, Harrell, Newmark, Aron, & Jacobs, 1997).

Evaluation Standards and Designs
Evaluation should be incorporated during the initial stages of program development. An initial step of the evaluation process is to describe the program in detail.
This collaborative activity can create a mutual understanding of the program, the evaluation process, and program and evaluation terminology. Developing a program description also helps ensure that program activities and objectives are clearly defined and that the objectives can be measured. In general, the evaluation should be feasible, useful, culturally competent, ethical, and accurate (CDC, 1999). Data should be collected over time using multiple instruments that are valid, meaning they measure what they are supposed to measure, and reliable, meaning they produce similar results consistently (Rossi & Freeman, 1993). The use of qualitative as well as quantitative data can provide a more comprehensive picture of the program. Evaluations of programs aimed at violence prevention should be particularly sensitive to issues of safety and confidentiality.

Experimental designs are defined by the random assignment of individuals to a group participating in the program or to a control group not receiving the program. These ideal experimental conditions are not always practical or ethical within the "real world" constraints of program delivery. A possible solution that blends the need for a comparison group with feasibility is the quasi-experimental design, in which an equivalent group (i.e., individuals receiving standard services) is compared with the group participating in the target program. However, this design may introduce difficulties in attributing effects to the target program. Non-experimental designs may be the easiest to implement in a program setting and can provide a large quantity of data, but drawing conclusions about program effects from them is difficult.

Logic Models
Logic models are flowcharts that depict program components. These models can include any number of program elements, showing the development of a program from theory to activities and outcomes. Infrastructure, inputs, processes, and outputs are often included. The process of developing logic models can serve to clarify program elements and expectations for the stakeholders. By depicting the sequence and logic of inputs, processes, and outputs, logic models can help ensure that the necessary data are collected to make credible statements of causality (CDC, 1999).

Communicating Evaluation Findings
Preparation, effective communication, and timeliness ensure the utility of evaluation findings. Questions that should be answered at the evaluation's inception include: What will be communicated? To whom? By whom? And how? The target audience must be identified, and the report must be written to address its needs, including the use of non-technical language and a user-friendly format (National Committee for Injury Prevention and Control, 1989). Policy makers, current and potential funders, the media, current and potential clients, and members of the community at large should all be considered as possible audiences. Evaluation reports describe the evaluation process as well as findings based on the data.
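Returning to the reliability requirement above: one common check is test-retest reliability, the correlation between two administrations of the same instrument. The following sketch is a minimal illustration with invented scores; it is not part of the cited sources.

```python
# Scores from two administrations of the same instrument to the same
# respondents (invented data).
scores_time1 = [12, 15, 9, 20, 17, 11, 14, 18]
scores_time2 = [13, 14, 10, 19, 18, 10, 15, 17]

def pearson(x, y):
    """Pearson correlation coefficient, computed from first principles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

# Values near 1.0 suggest the instrument produces similar results
# consistently; low values flag a reliability problem.
print(f"Test-retest correlation: {pearson(scores_time1, scores_time2):.2f}")
```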
Summative Evaluation
Summative evaluation looks at the impact of an intervention on the target group. This type of evaluation is arguably what project staff and funding bodies most often think of as 'evaluation' - that is, finding out what the project achieved. Summative evaluation can take place during project implementation, but it is most often undertaken at the end of a project. As such, summative evaluation can also be referred to as ex-post evaluation (meaning after the event). Summative evaluation is often associated with more objective, quantitative methods of data collection, and it is linked to the evaluation driver of accountability. Even so, it is recommended to use a balance of both quantitative and qualitative methods in order to get a better understanding of what your project has achieved and how or why this has occurred. Using qualitative methods of data collection can also provide good insight into unintended consequences and lessons for improvement.

Summative evaluation is outcome-focused rather than process-focused, and it is important to distinguish outcomes from outputs. Summative evaluation is not about stating that three workshops were held with a total of fifty people attending (outputs), but rather about the result of these workshops, such as increased knowledge or increased uptake of rainwater tanks (outcomes).

Why undertake a summative evaluation? Here are some key reasons:

Summative evaluation provides a means to find out whether your project has reached its goals, objectives, and outcomes.
Summative evaluation allows you to quantify the changes in resource use attributable to your project, so that you can track its impact.
Summative evaluation allows you to compare the impact of different projects and make results-based decisions on future spending allocations (taking into account unintended consequences).
Summative evaluation allows you to develop a better understanding of the process of change - to find out what works, what doesn't, and why. This gives you the knowledge to learn and to improve future project designs and implementation.
Categories of summative evaluation
Outcome Evaluation
When: During project implementation and post-project
Why: To assess whether the project has met its goals, whether there were any unintended consequences, what the learnings were, and how to improve
Data types and examples:
Quantitative: metering / meter readings, audits or counts, questionnaires, deemed savings, footprint calculators
Qualitative: focus groups, storytelling / Most Significant Change, outcome hierarchy
Some types of summative evaluation require the collection of baseline data in order to provide before-and-after intervention figures. As such, it is important to factor this into the evaluation design. It is considered good evaluation practice to include both formative and summative evaluation.
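Where baseline data exist, the before-and-after comparison behind such outcome figures is simple arithmetic. The following is a minimal illustrative sketch only; the household names and meter readings are hypothetical, not drawn from any real program:

```python
# Hypothetical metered water use (litres per day) for the same three
# households, recorded before and after a rainwater-tank programme.
baseline = {"household_1": 420, "household_2": 390, "household_3": 510}
post     = {"household_1": 350, "household_2": 370, "household_3": 430}

def mean(values):
    values = list(values)
    return sum(values) / len(values)

baseline_mean = mean(baseline.values())
post_mean = mean(post.values())
change = post_mean - baseline_mean  # negative means reduced use

print(f"Baseline mean use: {baseline_mean:.0f} L/day")
print(f"Post-intervention mean use: {post_mean:.0f} L/day")
print(f"Change: {change:+.0f} L/day")
```

Note that a raw before-and-after difference is only a change in the outcome; attributing that change to the project rather than to other factors (weather, pricing, concurrent campaigns) requires the kind of comparison-group designs discussed elsewhere in this document.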
Types of evaluation
Formative evaluation
Formative evaluation is about gathering information in order to plan, refine and improve a programme. It most often takes place when a service or programme is being set up and can continue throughout the life of the project. Its intent is to assess ongoing project activities and provide information to monitor and improve the project. It is done at several points in the developmental life of a project and its activities. It is common for the evaluator to participate in decision-making during this phase, with an emphasis on quick feedback to support necessary programme changes during early stages. Types of questions answered in a formative evaluation include: What is the need for the programme (i.e. needs assessment)? What approach should the programme take? What record-keeping systems are needed to enable the programme to be evaluated?
Process evaluation
Process evaluation is about documenting the development and process of the programme, in order to assess strengths and weaknesses and determine why outcomes occur. Process evaluations describe and assess programme materials and activities. Examination of materials is likely to occur while programmes are being developed, as a check on the appropriateness of the approach and procedures that will be used in the programme. Types of questions answered in a process evaluation include: What happens in the programme? To what extent is it being implemented as planned? What changes are being made? What immediate improvements need to be made? What is the participants' experience of the programme? What is working well? What needs to be changed? Are the resources for the programme adequate? Is the programme reaching the intended audiences?
Impact evaluation and outcome evaluation
Impact evaluations look beyond the immediate results of policies, instruction, or services to identify longer-term as well as unintended programme effects. They may also examine what happens when several programmes operate in unison. Outcome evaluations study the immediate or direct effects of the programme on participants. The scope of an outcome evaluation can extend beyond knowledge or attitudes, however, to examine the immediate behavioural effects of programmes. Types of questions answered in an impact and/or outcome evaluation include: How effective is the programme? What has changed as a result of the programme? Which audiences benefit from the programme? Which audiences do not? Are there unintended outcomes of the programme? How significant are they?
National Science Foundation. (2002). User-Friendly Handbook for Project Evaluation. National Science Foundation.
Waa, A., Holibar, F., and Spinola, C. (1998). Planning and doing programme evaluation: An introductory guide for health promotion. Alcohol and Public Health Research Unit: Whariki Runanga, Wananga, Hauora me te Paekaka, University of Auckland, New Zealand.
Selecting evaluation methods
The overall goal in selecting evaluation method(s) is to get the most useful information in the most cost-effective and realistic fashion. Consider the following questions when selecting your evaluation methods: 1. What information is needed? 2. Where is the information? 3. Who has the information? 4. What is the best way to get the information? Then: 1. Will the methods get all of the needed information? 2. Will the audience find these methods non-intrusive and culturally appropriate? 3. How can the information be analysed? 4. And consider resources: how much money, time, and how many people are available?
Types of Evaluation
When you come to evaluate your project, you will need to focus on two aspects: firstly the activities, and secondly the effect your project has had. In evaluation language this is known as process and impact/outcome evaluation. Apart from process and impact evaluation, it is also useful to consider summative evaluation.
Process Evaluation
This involves judging the activities (or strategies) of your project. This often involves looking at what has been done, who has been reached, and the quality of the activities. It involves seeking answers to questions such as: Has the project reached the appropriate people? Are all the project's activities going to plan? If not, why not? Were any changes made to the intended activities? If so, why? Are materials, information and presentations of good quality? Are the participants and other key people satisfied?
Impact/Outcome Evaluation
This involves judging the extent to which your project has had an effect on the changes you were seeking; in other words, the extent to which your project has met its goal and objectives. Impact evaluation judges how well the objectives were achieved, and outcome evaluation involves judging how well the goal has been achieved. It involves seeking answers to questions such as: What progress has been made toward achieving the goal? To what extent has the project met its objectives? How effective has the project been at producing changes? Are there any factors outside of the project that have contributed to (or prevented) the desired change? Has the project resulted in any unintended change?
Summative Evaluation
This is done at the end of the project and involves considering the project as a whole, from beginning to 'end'. It is meant to summarise and inform decisions about whether to continue the project (or parts of it), or whether it would be valuable to expand it into other settings. It involves seeking answers to questions such as: Overall, what were the main benefits and disappointments? What things helped and hindered the project? In retrospect, what could have strengthened it? What would you advise others embarking on something similar? What aspects will be sustained, and how? Is it worth continuing in its current form? Why/why not? What recommendations have emerged about where to go from here?
Program evaluation
Program evaluation is a systematic method for collecting, analyzing, and using information to answer questions about projects, policies and programs, [1] particularly about their effectiveness and efficiency. In both the public and private sectors, stakeholders often want to know whether the programs they are funding, implementing, voting for, receiving or objecting to are producing the intended effect. While program evaluation first focuses on this definition, important considerations often include how much the program costs per participant, how the program could be improved, whether the program is worthwhile, whether there are better alternatives, whether there are unintended outcomes, and whether the program goals are appropriate and useful. [2] Evaluators help to answer these questions, but the best way to answer them is for the evaluation to be a joint project between evaluators and stakeholders. [3]
The process of evaluation is considered to be a relatively recent phenomenon. However, planned social evaluation has been documented as dating as far back as 2200 BC. [4] Evaluation became particularly relevant in the U.S. in the 1960s during the period of the Great Society social programs associated with the Kennedy and Johnson administrations. [5][6] Extraordinary sums were invested in social programs, but the impacts of these investments were largely unknown. Program evaluations can involve both quantitative and qualitative methods of social research. People who do program evaluation come from many different backgrounds, such as sociology, psychology, economics, and social work. Some graduate schools also have specific training programs for program evaluation.
Doing an evaluation
Program evaluation may be conducted at several stages during a program's lifetime. Each of these stages raises different questions to be answered by the evaluator, and correspondingly different evaluation approaches are needed. Rossi, Lipsey and Freeman (2004) suggest the following kinds of assessment, which may be appropriate at these different stages:
Assessment of the need for the program
Assessment of program design and logic/theory
Assessment of how the program is being implemented (i.e., is it being implemented according to plan? Are the program's processes maximizing possible outcomes?)
Assessment of the program's outcome or impact (i.e., what it has actually achieved)
Assessment of the program's cost and efficiency
Assessing needs
A needs assessment examines the population that the program intends to target, to see whether the need as conceptualized in the program actually exists in the population; whether it is, in fact, a problem; and if so, how it might best be dealt with. This includes identifying and diagnosing the actual problem the program is trying to address, who or what is affected by the problem, how widespread the problem is, and what the measurable effects caused by the problem are. For example, for a housing program aimed at mitigating homelessness, a program evaluator may want to find out how many people are homeless in a given geographic area and what their demographics are. Rossi, Lipsey and Freeman (2004) caution against undertaking an intervention without properly assessing the need for one, because this might result in a great deal of wasted funds if the need did not exist or was misconceived. Needs assessment involves the processes or methods used by evaluators to describe and diagnose social needs. [7] This is essential for evaluators, because they need to identify whether programs are effective and they cannot do this unless they have identified what the problem/need is. Programs that do not carry out a needs assessment can have the illusion that they have eradicated the problem/need when in fact there was no need in the first place. Needs assessment involves research and regular consultation with community stakeholders and with the people that will benefit from the project before the program can be developed and implemented. Hence it should be a bottom-up approach. In this way potential problems can be realized early, because the process will have involved the community in identifying the need and thereby allowed the opportunity to identify potential barriers. The important task of a program evaluator is thus to: First, construct a precise definition of what the problem is. [7] Evaluators need to first identify the problem/need.
This is most effectively done by collaboratively including all possible stakeholders, i.e., the community impacted by the potential problem, the agents/actors working to address and resolve the problem, funders, etc. Securing buy-in early on in the process reduces the potential for push-back, miscommunication, and incomplete information later on. Second, assess the extent of the problem. [7] Having clearly identified what the problem is, evaluators need to then assess the extent of the problem. They need to answer the where and how big questions: where the problem is located, and how big it is. Pointing out that a problem exists is much easier than having to specify where it is located and how rife it is. Rossi, Lipsey & Freeman (2004) give the example that identifying some battered children may be enough evidence to persuade one that child abuse exists, but indicating how many children it affects and where it is located geographically and socially would require knowledge about abused children, the characteristics of perpetrators and the impact of the problem throughout the political authority in question. This can be difficult considering that child abuse is not a public behavior; estimates of the rates of private behavior are usually not possible because of factors like unreported cases. In this case evaluators would have to use data from several sources and apply different approaches in order to estimate incidence rates. There are two more questions that need to be answered: the how and the what questions. [8] The how question requires that evaluators determine how the need will be addressed. Having identified the need and having familiarized oneself with the community, evaluators should conduct a performance analysis to identify whether the proposed plan in the program will actually be able to eliminate the need. The what question requires that evaluators conduct a task analysis to find out what the best way to perform would be; for example, whether job performance standards are set by an organization or whether some governmental rules need to be considered when undertaking the task. [8]
Third, define and identify the target of interventions and accurately describe the nature of the service needs of that population. [7] It is important to know what/who the target population is: it might be individuals, groups, communities, etc. There are three units of the population: population at risk, population in need and population in demand. [7]
Population at risk: people with a significant probability of developing the condition the program addresses, e.g. the population at risk for birth control programs is women of child-bearing age.
Population in need: people with the condition that the program seeks to address, e.g. the population in need for a program that aims to provide ARVs to HIV-positive people is people that are HIV positive.
Population in demand: that part of the population in need that acknowledges having the need and is willing to take part in what the program has to offer, e.g. not all HIV-positive people will be willing to take ARVs.
Being able to specify what/who the target is will assist in establishing appropriate boundaries, so that interventions can correctly address the target population and be feasible to apply. [7]
There are four steps in conducting a needs assessment: [9]
1. Perform a gap analysis. Evaluators need to compare the current situation to the desired or necessary situation. The difference, or the gap, between the two situations will help identify the need, purpose and aims of the program.
2. Identify priorities and importance. In the first step above, evaluators will have identified a number of interventions that could potentially address the need, e.g. training and development, organization development, etc. These must now be examined in view of their significance to the program's goals and constraints. This must be done by considering the following factors: cost-effectiveness (consider the budget of the program, assess the cost/benefit ratio), executive pressure (whether top management expects a solution) and population (whether many key people are involved).
3. Identify causes of performance problems and/or opportunities. When the needs have been prioritized, the next step is to identify specific problem areas within the need to be addressed, and also to assess the skills of the people that will be carrying out the interventions.
4. Identify possible solutions and growth opportunities. Compare the consequences of implementing each intervention with those of not implementing it.
Needs analysis is hence a very crucial step in evaluating programs, because the effectiveness of a program cannot be assessed unless we know what the problem was in the first place.
Assessing program theory
The program theory, also called a logic model or impact pathway, [10] is an assumption, implicit in the way the program is designed, about how the program's actions are supposed to achieve the outcomes it intends. This 'logic model' is often not stated explicitly by people who run programs; it is simply assumed, and so an evaluator will need to draw out from the program staff how exactly the program is supposed to achieve its aims and assess whether this logic is plausible. For example, in an HIV prevention program, it may be assumed that educating people about HIV/AIDS transmission, risk and safe sex practices will result in safer sex being practiced. However, research in South Africa increasingly shows that in spite of increased education and knowledge, people still often do not practice safe sex. [11] Therefore, the logic of a program which relies on education as a means to get people to use condoms may be faulty. This is why it is important to read research that has been done in the area. Explicating this logic can also reveal unintended or unforeseen consequences of a program, both positive and negative. The program theory drives the hypotheses to test for impact evaluation. Developing a logic model can also build common understanding amongst program staff and stakeholders about what the program is actually supposed to do and how it is supposed to do it, which is often lacking (see Participatory impact pathways analysis). Rossi, Lipsey & Freeman (2004) suggest four approaches and procedures that can be used to assess the program theory. [7] These approaches are discussed below.
Assessment in relation to social needs [7]
This entails assessing the program theory by relating it to the needs of the target population the program is intended to serve. If the program theory fails to address the needs of the target population, it will be rendered ineffective even if it is well implemented. [7]
Assessment of logic and plausibility [7]
This form of assessment involves asking a panel of expert reviewers to critically review the logic and plausibility of the assumptions and expectations inherent in the program's design. [7] The review process is unstructured and open-ended, so as to address certain issues of the program design. Rutman (1980), Smith (1989), and Wholey (1994) suggested the questions listed below to assist with the review process. [7]
Are the program goals and objectives well defined?
Are the program goals and objectives feasible?
Is the change process presumed in the program theory feasible?
Are the procedures for identifying members of the target population, delivering service to them, and sustaining that service through completion well defined and sufficient?
Are the constituent components, activities, and functions of the program well defined and sufficient?
Are the resources allocated to the program and its various activities adequate?
Assessment through comparison with research and practice [7]
This form of assessment requires gaining information from research literature and existing practices to assess various components of the program theory. The evaluator can assess whether the program theory is congruent with research evidence and practical experiences of programs with similar concepts. [7]
Assessment via preliminary observation [7]
This approach involves incorporating firsthand observations into the assessment process as it provides a reality check on the concordance between the program theory and the program itself. [7] The observations can focus on the attainability of the outcomes, circumstances of the target population, and the plausibility of the program activities and the supporting resources. [7]
These different forms of assessment of program theory can be conducted to ensure that the program theory is sound.
Assessing implementation
Process analysis looks beyond the theory of what the program is supposed to do and instead evaluates how the program is being implemented. This evaluation determines whether the components identified as critical to the success of the program are being implemented. The evaluation determines whether target populations are being reached, whether people are receiving the intended services, and whether staff are adequately qualified. Process evaluation is an ongoing process in which repeated measures may be used to evaluate whether the program is being implemented effectively.
Assessing the impact (effectiveness)
The impact evaluation determines the causal effects of the program. This involves trying to measure whether the program has achieved its intended outcomes, i.e. program outcomes.
Program Outcomes
An outcome is the state of the target population or the social conditions that a program is expected to have changed. [7] Program outcomes are the observed characteristics of the target population or social conditions, not of the program. Thus the concept of an outcome does not necessarily mean that the program targets have actually changed or that the program has caused them to change in any way. [7]
There are two kinds of outcomes, namely outcome level and outcome change; a third, closely related concept is the program effect. [7]
Outcome level refers to the status of an outcome at some point in time.
Outcome change refers to the difference between outcome levels at different points in time.
Program effect refers to that portion of an outcome change that can be attributed uniquely to a program, as opposed to the influence of some other factor.
Measuring Program Outcomes
Outcome measurement is a matter of representing the circumstances defined as the outcome by means of observable indicators that vary systematically with changes or differences in those circumstances. [7] Outcome measurement is a systematic way to assess the extent to which a program has achieved its intended outcomes. [12] According to Mouton (2009), measuring the impact of a program means demonstrating or estimating the accumulated, differentiated, proximate and emergent effects, some of which might be unintended and therefore unforeseen. [13]
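The distinction between outcome level, outcome change and program effect can be made concrete with a small worked illustration. The sketch below uses entirely hypothetical figures and a simple difference-in-differences comparison; this particular estimator is one common way, not the only way, to estimate a program effect when a comparable untreated group exists, and is offered here only as an illustration of the definitions above:

```python
# Hypothetical outcome levels (e.g. mean test scores) measured before
# and after a programme, for participants and a comparable
# non-participant group.
treated_before, treated_after = 52.0, 65.0
control_before, control_after = 51.0, 58.0

# Outcome level: the status of the outcome at a single point in time.
print("Outcome level after the programme (participants):", treated_after)

# Outcome change: the difference between outcome levels over time.
treated_change = treated_after - treated_before   # 13.0
control_change = control_after - control_before   # 7.0 (change that happened anyway)

# Program effect: the portion of the outcome change attributable to the
# programme, estimated here as the participants' change minus the change
# observed in the comparison group (a difference-in-differences estimate).
program_effect = treated_change - control_change  # 6.0
print("Estimated program effect:", program_effect)
```

Such an estimate is only as credible as the comparability of the two groups; see the discussion of determining causation below.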
Outcome measurement serves to help you understand whether the program is effective or not. It further helps you to clarify your understanding of your program. But the most important reason for undertaking the effort is to understand the impacts of your work on the people you serve. [13] With the information you collect, you can determine which activities to continue and build upon, and which you need to change in order to improve the effectiveness of the program. This can involve using sophisticated statistical techniques in order to measure the effect of the program and to find causal relationships between the program and the various outcomes. More information about impact evaluation is found under the heading 'Determining causation'.
Assessing efficiency
Finally, cost-benefit or cost-effectiveness analysis assesses the efficiency of a program. Evaluators outline the benefits and costs of the program for comparison. An efficient program has a lower cost-benefit ratio, that is, a lower cost per unit of benefit achieved.
Determining causation
Perhaps the most difficult part of evaluation is determining whether the program itself is causing the changes that are observed in the population it was aimed at. Events or processes outside of the program may be the real cause of the observed outcome (or the real prevention of the anticipated outcome). Causation is difficult to determine. One main reason for this is self-selection bias. [14] People select themselves to participate in a program. For example, in a job training program, some people decide to participate and others do not. Those who do participate may differ from those who do not in important ways. They may be more determined to find a job or have better support resources. These characteristics may actually be causing the observed outcome of increased employment, not the job training program. Evaluations conducted with random assignment are able to make stronger inferences about causation. Randomly assigning people to participate or to not participate in the program reduces or eliminates self-selection bias. Thus, the group of people who participate would likely be more comparable to the group who did not participate. However, since most programs cannot use random assignment, causation cannot be determined. Impact analysis can still provide useful information. For example, the outcomes of the program can be described. Thus the evaluation can describe that people who participated in the program were more likely to experience a given outcome than people who did not participate. If the program is fairly large, and there are enough data, statistical analysis can be used to make a reasonable case for the program by showing, for example, that other causes are unlikely.
Reliability, validity and sensitivity in program evaluation
It is important to ensure that the instruments (for example, tests, questionnaires, etc.) used in program evaluation are as reliable, valid and sensitive as possible. According to Rossi et al. (2004, p. 222), [7] 'a measure that is poorly chosen or poorly conceived can completely undermine the worth of an impact assessment by producing misleading estimates. Only if outcome measures are valid, reliable and appropriately sensitive can impact assessments be regarded as credible'.
Reliability
The reliability of a measurement instrument is the 'extent to which the measure produces the same results when used repeatedly to measure the same thing' (Rossi et al., 2004, p. 218).
[7] The more reliable a measure is, the greater its statistical power and the more credible its findings. If a measuring instrument is unreliable, it may dilute and obscure the real effects of a program, and the program will 'appear to be less effective than it actually is' (Rossi et al., 2004, p. 219). [7] Hence, it is important to ensure the evaluation is as reliable as possible.
Validity
The validity of a measurement instrument is 'the extent to which it measures what it is intended to measure' (Rossi et al., 2004, p. 219). [7] This concept can be difficult to measure accurately: in general use in evaluations, an instrument may be deemed valid if it is accepted as valid by the stakeholders (stakeholders may include, for example, funders, program administrators, et cetera).
Sensitivity
The principal purpose of the evaluation process is to measure whether the program has an effect on the social problem it seeks to redress; hence, the measurement instrument must be sensitive enough to discern these potential changes (Rossi et al., 2004). [7] A measurement instrument may be insensitive if it contains items measuring outcomes which the program couldn't possibly affect, or if the instrument was originally developed for application to individuals (for example, standardized psychological measures) rather than to a group setting (Rossi et al., 2004). [7] These factors may result in 'noise' which may obscure any effect the program may have had. Only measures which adequately achieve the benchmarks of reliability, validity and sensitivity can be said to be credible evaluations. It is the duty of evaluators to produce credible evaluations, as their findings may have far-reaching effects. A discreditable evaluation which is unable to show that a program is achieving its purpose, when it is in fact creating positive change, may cause the program to lose its funding undeservedly.
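As a rough illustration of the reliability concept discussed above, one common estimate of test-retest reliability is the correlation between two administrations of the same instrument to the same respondents. The sketch below uses invented scores; it is a minimal example of the idea, not a prescribed procedure from the source:

```python
import statistics

# Hypothetical scores from the same ten respondents on the same
# questionnaire administered two weeks apart (test-retest design).
first_administration  = [12, 15, 9, 20, 14, 11, 18, 16, 10, 13]
second_administration = [13, 14, 10, 19, 15, 11, 17, 15, 9, 14]

def pearson_r(x, y):
    """Pearson correlation, a common test-retest reliability estimate."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

r = pearson_r(first_administration, second_administration)
print(f"Test-retest reliability estimate: {r:.2f}")
```

A value near 1 suggests the instrument produces consistent results on repeated use; a low value warns that measurement noise may dilute and obscure real program effects.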
Steps to Program Evaluation Framework
According to the Centers for Disease Control and Prevention (CDC), there are six steps to a complete program evaluation: engage stakeholders, describe the program, focus the evaluation design, gather credible evidence, justify conclusions, and ensure use and share lessons learned. [15] These steps can happen in a cycle framework to represent the continuing process of evaluation.
Methodological constraints and challenges
The shoestring approach
The shoestring evaluation approach is designed to assist evaluators operating under limited budget, limited access or availability of data, and limited turnaround time to conduct effective evaluations that are methodologically rigorous (Bamberger, Rugh, Church & Fort, 2004). [16] This approach has responded to the continued need for evaluation processes that are more rapid and economical under difficult circumstances of budget, time constraints and limited availability of data. However, it is not always possible to design an evaluation to achieve the highest standards available. Many programs do not build an evaluation procedure into their design or budget. Hence, many evaluation processes do not begin until the program is already underway, which can result in time, budget or data constraints for the evaluators, which in turn can affect the reliability, validity or sensitivity of the evaluation. The shoestring approach helps to ensure that the maximum possible methodological rigor is achieved under these constraints.
Budget constraints
Frequently, programs are faced with budget constraints because most original projects do not include a budget to conduct an evaluation (Bamberger et al., 2004). Therefore, evaluations are automatically allocated smaller budgets that are inadequate for a rigorous evaluation. Due to the budget constraints it might be difficult to effectively apply the most appropriate methodological instruments. These constraints may consequently affect the time available in which to do the evaluation (Bamberger et al., 2004). [16] Budget constraints may be addressed by simplifying the evaluation design, revising the sample size, exploring economical data collection methods (such as using volunteers to collect data, shortening surveys, or using focus groups and key informants) or looking for reliable secondary data (Bamberger et al., 2004). [16]
Time constraints
The most common time constraints an evaluator can face arise when the evaluator is summoned to conduct an evaluation when a project is already underway, when they are given limited time to do the evaluation compared to the life of the study, or when they are not given enough time for adequate planning. Time constraints are particularly problematic when the evaluator is not familiar with the area or country in which the program is situated (Bamberger et al., 2004). [16] Time constraints can be addressed by the methods listed under budget constraints above, and also by careful planning to ensure effective data collection and analysis within the limited time available.
Data constraints
If the evaluation is initiated late in the program, there may be no baseline data on the conditions of the target group before the intervention began (Bamberger et al., 2004). [16] Another possible cause of data constraints is if the data have been collected by program staff and contain systematic reporting biases or poor record-keeping standards, and are subsequently of little use (Bamberger et al., 2004). [16] Another source of data constraints may arise if the target group is difficult to reach to collect data from, for example homeless people, drug addicts, migrant workers, et cetera (Bamberger et al., 2004). [16] Data constraints can be addressed by reconstructing baseline data from secondary data or through the use of multiple methods. Multiple methods, such as the combination of qualitative and quantitative data, can increase validity through triangulation and save time and money. Additionally, these constraints may be dealt with through careful planning and consultation with program stakeholders. By clearly identifying and understanding client needs ahead of the evaluation, costs and time of the evaluative process can be streamlined and reduced, while still maintaining credibility. All in all, time, monetary and data constraints can have negative implications on the validity, reliability and transferability of the evaluation. The shoestring approach has been created to assist evaluators to correct the limitations identified above by identifying ways to reduce costs and time, reconstruct baseline data and ensure maximum quality under existing constraints (Bamberger et al., 2004).
Five-tiered approach
The five-tiered approach to evaluation further develops the strategies that the shoestring approach to evaluation is based upon. [17] It was originally developed by Jacobs (1988) as an alternative way to evaluate community-based programs, and as such was applied to a statewide child and family program in Massachusetts, U.S.A. [18] The five-tiered approach is offered as a conceptual framework for matching evaluations more precisely to the characteristics of the programs themselves, and to the particular resources and constraints inherent in each evaluation context. [17] In other words, the five-tiered approach seeks to tailor the evaluation to the specific needs of each evaluation context. The earlier tiers (1-3) generate descriptive and process-oriented information, while the later tiers (4-5) determine both the short-term and the long-term effects of the program. [19] The five levels are organized as follows:
Tier 1: needs assessment (sometimes referred to as pre-implementation) [20]
Tier 2: monitoring and accountability
Tier 3: quality review and program clarification (sometimes referred to as understanding and refining) [21]
Tier 4: achieving outcomes
Tier 5: establishing impact
For each tier, purpose(s) are identified, along with corresponding tasks that enable the identified purpose of the tier to be achieved. [21] For example, the purpose of the first tier, needs assessment, would be to document a need for a program in a community. The task for that tier would be to assess the community's needs and assets by working with all relevant stakeholders. [21]
While the tiers are structured for consecutive use, meaning that information gathered in the earlier tiers is required for tasks on higher tiers, the approach acknowledges the fluid nature of evaluation. [19] Therefore, it is possible to move from later tiers back to preceding ones, or even to work in two tiers at the same time. [21] It is important for program evaluators to note, however, that a program must be evaluated at the appropriate level. [20]
The five-tiered approach is said to be useful for family support programs which emphasise community and participant empowerment. This is because it encourages a participatory approach involving all stakeholders and it is through this process of reflection that empowerment is achieved. [18]
Methodological challenges presented by language and culture
The purpose of this section is to draw attention to some of the methodological challenges and dilemmas evaluators potentially face when conducting a program evaluation in a developing country. In many developing countries the major sponsors of evaluation are donor agencies from the developed world, and these agencies require regular evaluation reports in order to maintain accountability and control of resources, as well as to generate evidence for the program's success or failure. [22] However, evaluators face many hurdles and challenges when attempting to implement an evaluation program that makes use of techniques and systems not developed within the context to which they are applied. [23] Some of the issues include differences in culture, attitudes, language and political process. [23][24]
Culture is defined by Ebbutt (1998, p. 416) as a constellation of both written and unwritten expectations, values, norms, rules, laws, artifacts, rituals and behaviors that permeate a society and influence how people behave socially. [24] Culture can influence many facets of the evaluation process, including data collection, evaluation program implementation and the analysis and understanding of the results of the evaluation. [24] In particular, instruments which are traditionally used to collect data such as questionnaires and semi-structured interviews need to be sensitive to differences in culture, if they were originally developed in a different cultural context. [25] The understanding and meaning of constructs which the evaluator is attempting to measure may not be shared between the evaluator and the sample population and thus the transference of concepts is an important notion, as this will influence the quality of the data collection carried out by evaluators as well as the analysis and results generated by the data. [25]
Language also plays an important part in the evaluation process, as language is tied closely to culture. [25] Language can be a major barrier to communicating the concepts which the evaluator is trying to access, and translation is often required. [24] There are a multitude of problems with translation, including the loss of meaning as well as the exaggeration or enhancement of meaning by translators. [24] For example, terms which are contextually specific may not translate into another language with the same weight or meaning. In particular, data collection instruments need to take meaning into account, as subject matter which may not be considered sensitive in a particular context might prove to be sensitive in the context in which the evaluation is taking place. [25] Thus, evaluators need to take into account two important concepts when administering data collection tools: lexical equivalence and conceptual equivalence. [25] Lexical equivalence asks the question: how does one phrase a question in two languages using the same words? This is a difficult task to accomplish, and use of techniques such as back-translation may aid the evaluator but may not result in perfect transference of meaning. [25] This leads to the next point, conceptual equivalence. It is not a common occurrence for concepts to transfer unambiguously from one culture to another. [25] Data collection instruments which have not undergone adequate testing and piloting may therefore render results which are not useful, as the concepts which are measured by the instrument may have taken on a different meaning and thus rendered the instrument unreliable and invalid. [25]
Thus, it can be seen that evaluators need to take into account the methodological challenges created by differences in culture and language when attempting to conduct a program evaluation in a developing country.
Utilization of results
There are three conventional uses of evaluation results: persuasive utilization, direct (instrumental) utilization, and conceptual utilization.
Persuasive utilization
Persuasive utilization is the enlistment of evaluation results in an effort to persuade an audience to either support an agenda or to oppose it. Unless the 'persuader' is the same person that ran the evaluation, this form of utilization is not of much interest to evaluators, as they often cannot foresee possible future efforts of persuasion. [7]
Direct (instrumental) utilization
Evaluators often tailor their evaluations to produce results that can have a direct influence on the improvement of the structure, or on the process, of a program. For example, the evaluation of a novel educational intervention may produce results that indicate no improvement in students' marks. This may be due to the intervention not having a sound theoretical background, or it may be that the intervention is not conducted as originally intended. The results of the evaluation would hopefully cause the creators of the intervention to go back to the drawing board to re-create the core structure of the intervention, or even change the implementation processes. [7]
Conceptual utilization
Even if evaluation results do not have a direct influence on the re-shaping of a program, they may still be used to make people aware of the issues the program is trying to address. Going back to the example of an evaluation of a novel educational intervention, the results can also be used to inform educators and students about the different barriers that may influence students' learning difficulties. A number of studies on these barriers may then be initiated by this new information. [7]
Variables affecting utilization
There are five conditions that seem to affect the utility of evaluation results, namely relevance, communication between the evaluators and the users of the results, information processing by the users, the plausibility of the results, as well as the level of involvement or advocacy of the users. [7]
Guidelines for maximizing utilization
Quoted directly from Rossi et al. (2004, p. 416): [7]
Evaluators must understand the cognitive styles of decision-makers
Evaluation results must be timely and available when needed
Evaluations must respect stakeholders' program commitments
Utilization and dissemination plans should be part of the evaluation design
Evaluations should include an assessment of utilization
Internal versus external program evaluators
The choice of the evaluator chosen to evaluate the program may be regarded as equally important as the process of the evaluation. Evaluators may be internal (persons associated with the program to be executed) or external (persons not associated with any part of the execution/implementation of the program) (Division for Oversight Services, 2004). The following provides a brief summary of the advantages and disadvantages of internal and external evaluators, adapted from the Division for Oversight Services (2004); for a more comprehensive list, see Division for Oversight Services (2004).
Internal evaluators
Advantages: May have better overall knowledge of the program and possess informal knowledge of the program; less threatening, as already familiar with staff; less costly.
Disadvantages: May be less objective; may be more preoccupied with other activities of the program and not give the evaluation complete attention; may not be adequately trained as an evaluator.
External evaluators
Advantages: More objective about the process; offers new perspectives and different angles from which to observe and critique the process; may be able to dedicate a greater amount of time and attention to the evaluation; may have greater evaluation expertise.
Disadvantages: May be more costly and require more time for the contract, monitoring, negotiations, etc.; may be unfamiliar with program staff and create anxiety about being evaluated; may be unfamiliar with organization policies and certain constraints affecting the program.
Three paradigms
Positivist
Potter (2006) [26] identifies and describes three broad paradigms within program evaluation. The first, and probably most common, is the positivist approach, in which evaluation can only occur where there are objective, observable and measurable aspects of a program, requiring predominantly quantitative evidence. The positivist approach includes evaluation dimensions such as needs assessment, assessment of program theory, assessment of program process, impact assessment and efficiency assessment (Rossi, Lipsey and Freeman, 2004). [27] A detailed example of the positivist approach is a study conducted by the Public Policy Institute of California, reported as "Evaluating Academic Programs in California's Community Colleges", in which the evaluators examine measurable activities (i.e. enrollment data) and conduct quantitative assessments like factor analysis. [28]
Interpretive
The second paradigm identified by Potter (2006) is that of interpretive approaches, where it is argued that it is essential that the evaluator develops an understanding of the perspective, experiences and expectations of all stakeholders. This would lead to a better understanding of the various meanings and needs held by stakeholders, which is crucial before one is able to make judgments about the merit or value of a program. The evaluator's contact with the program is often over an extended period of time and, although there is no standardized method, observation, interviews and focus groups are commonly used. A report commissioned by the World Bank details eight approaches in which qualitative and quantitative methods can be integrated and perhaps yield insights not achievable through only one method. [29]
Critical-emancipatory
Potter (2006) also identifies critical-emancipatory approaches to program evaluation, which are largely based on action research for the purposes of social transformation. This type of approach is much more ideological and often includes a greater degree of social activism on the part of the evaluator. This approach would be appropriate for qualitative and participative evaluations. Because of its critical focus on societal power structures and its emphasis on participation and empowerment, Potter argues this type of evaluation can be particularly useful in developing countries. Whatever paradigm is used in a program evaluation, whether positivist, interpretive or critical-emancipatory, it is essential to acknowledge that evaluation takes place in specific socio-political contexts. Evaluation does not exist in a vacuum, and all evaluations, whether they are aware of it or not, are influenced by socio-political factors. It is important to recognize that evaluations and the findings which result from this kind of evaluation process can be used in favour of or against particular ideological, social and political agendas (Weiss, 1999). [30] This is especially true in an age when resources are limited and there is competition between organizations for certain projects to be prioritised over others (Louw, 1999). [31]
Empowerment evaluation
Main article: Empowerment evaluation
Empowerment evaluation makes use of evaluation concepts, techniques, and findings to foster improvement and self-determination of a particular program aimed at a specific target population/program participants. [32] Empowerment evaluation is value-oriented towards getting program participants involved in bringing about change in the programs they are targeted for. One of the main focuses in empowerment evaluation is to incorporate the program participants in the conducting of the evaluation process. This process is then often followed by some sort of critical reflection on the program. In such cases, an external/outsider evaluator serves as a consultant/coach/facilitator to the program participants and seeks to understand the program from the perspective of the participants. Once a clear understanding of the participants' perspective has been gained, appropriate steps and strategies can be devised (with the valuable input of the participants) and implemented in order to reach desired outcomes. According to Fetterman (2002), [32] empowerment evaluation has three steps: establishing a mission, taking stock, and planning for the future.
Establishing a mission
The first step involves evaluators asking the program participants and staff members (of the program) to define the mission of the program. Evaluators may opt to carry this step out by bringing such parties together and asking them to generate and discuss the mission of the program. The logic behind this approach is to show each party that there may be divergent views of what the program mission actually is.
Taking stock
Taking stock, as the second step, consists of two important tasks. The first task is concerned with program participants and program staff generating a list of current key activities that are crucial to the functioning of the program. The second task is concerned with rating the identified key activities, also known as prioritization. For example, each party member may be asked to rate each key activity on a scale from 1 to 10, where 10 is the most important and 1 the least important. The role of the evaluator during this task is to facilitate interactive discussion amongst members in an attempt to establish some baseline of shared meaning and understanding pertaining to the key activities. In addition, relevant documentation (such as financial reports and curriculum information) may be brought into the discussion when considering some of the key activities.
Planning for the future
After prioritizing the key activities, the next step is to plan for the future. Here the evaluator asks program participants and program staff how they would like to improve the program in relation to the key activities listed. The objective is to create a thread of coherence whereby the mission generated (step 1) guides the stock take (step 2), which in turn forms the basis for the plans for the future (step 3). Thus, in planning for the future, specific goals are aligned with relevant key activities. In addition, it is important for program participants and program staff to identify possible forms of evidence (measurable indicators) which can be used to monitor progress towards specific goals. Goals must be related to the program's activities, talents, resources and scope of capability; in short, the goals formulated must be realistic.
These three steps of empowerment evaluation produce the potential for a program to run more effectively and more in touch with the needs of the target population. Empowerment evaluation as a process which is facilitated by a skilled evaluator equips as well as empowers participants by providing them with a 'new' way of critically thinking and reflecting on programs. Furthermore, it empowers program participants and staff to recognize their own capacity to bring about program change through collective action. [33]
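As a small illustration of the kind of aggregation the taking-stock step can involve, the sketch below averages importance ratings from three parties and ranks the activities to focus discussion; every activity name and rating here is invented for the example, not drawn from the source:

```python
# Hypothetical 1-10 importance ratings of key programme activities given
# by three parties (participants, staff, evaluator) in the taking-stock step.
ratings = {
    "community workshops": [9, 7, 8],
    "home visits":         [6, 9, 7],
    "newsletter":          [4, 5, 3],
}

# Average each activity's ratings and rank them, so that discussion can
# focus on the activities the group collectively rates as most important.
averages = {activity: sum(scores) / len(scores) for activity, scores in ratings.items()}
for activity, avg in sorted(averages.items(), key=lambda item: item[1], reverse=True):
    print(f"{activity}: {avg:.1f}")
```

In practice the numbers matter less than the facilitated discussion they prompt about why ratings diverge between parties.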
Transformative Paradigm
The transformative paradigm is integral in incorporating social justice in evaluation. Donna Mertens, a primary researcher in this field, states that the transformative paradigm focuses primarily on the viewpoints of marginalized groups and on interrogating systemic power structures through mixed methods to further social justice and human rights. [34] The transformative paradigm arose after marginalized groups, who have historically been pushed to the side in evaluation, began to collaborate with scholars to advocate for social justice and human rights in evaluation. The transformative paradigm introduces many different paradigms and lenses to the evaluation process, leading it to continually call the evaluation process into question. Both the American Evaluation Association and the National Association of Social Workers call attention to the ethical duty to possess cultural competence when conducting evaluations. Cultural competence in evaluation can be broadly defined as a systematic, responsive inquiry that is actively cognizant, understanding, and appreciative of the cultural context in which the evaluation takes place; that frames and articulates the epistemology of the evaluation endeavor; that employs culturally and contextually appropriate methodology; and that uses stakeholder-generated, interpretive means to arrive at the results and further use of the findings. [35] Many health and evaluation leaders are careful to point out that cultural competence cannot be determined by a simple checklist; rather, it is an attribute that develops over time. The root of cultural competence in evaluation is a genuine respect for the communities being studied and an openness to seek depth in understanding different cultural contexts, practices and paradigms of thinking. This includes being creative and flexible in capturing different cultural contexts, and a heightened awareness of power differentials that exist in an evaluation context. Important skills include the ability to build rapport across difference, gain the trust of community members, and self-reflect and recognize one's own biases. [36]
Paradigms
The paradigm's axiology, ontology, epistemology, and methodology are reflective of social justice practice in evaluation. These examples focus on addressing inequalities and injustices in society by promoting inclusion and equality in human rights.
Axiology (Values and Value Judgements)
The transformative paradigm's axiological assumption rests on four primary principles: [34]
The importance of being culturally respectful
The promotion of social justice
The furtherance of human rights
Addressing inequities
Ontology (Reality)
Differences in perspectives on what is real are determined by diverse values and life experiences. In turn, these values and life experiences are often associated with differences in access to privilege, based on such characteristics as disability, gender, sexual identity, religion, race/ethnicity, national origin, political party, income level, age, language, and immigration or refugee status. [34]
Epistemology (Knowledge)
Knowledge is constructed within the context of power and privilege with consequences attached to which version of knowledge is given privilege. [34] Knowledge is socially and historically located within a complex cultural context. [37]
Methodology (Systematic Inquiry)
Methodological decisions are aimed at determining the approach that will best facilitate use of the process and findings to enhance social justice; identify the systemic forces that support the status quo and those that will allow change to happen; and acknowledge the need for a critical and reflexive relationship between the evaluator and the stakeholders. [34]
Lenses
While operating through social justice, it is imperative to be able to view the world through the lens of those who experience injustices. Critical Race Theory, Feminist Theory, and Queer/LGBTQ Theory are frameworks for how we should think about providing justice for marginalized groups. These lenses create opportunities to give each theory priority in addressing inequality.
Critical Race Theory
Critical Race Theory (CRT) is an extension of critical theory that is focused on inequities based on race and ethnicity. Daniel Solorzano describes the role of CRT as providing a framework to investigate and make visible those systemic aspects of society that allow the discriminatory and oppressive status quo of racism to continue. [38]
Feminist Theory
The essence of feminist theories is to expose the individual and institutional practices that have denied access to women and other oppressed groups and have ignored or devalued women. [39]
Queer/LGBTQ Theory
Queer/LGBTQ theorists question the heterosexist bias that pervades society in terms of power over, and discrimination toward, sexual orientation minorities. Because of the sensitivity of issues surrounding LGBTQ status, evaluators need to be aware of safe ways to protect such individuals' identities and to ensure that discriminatory practices are brought to light in order to bring about a more just society. [34]
Government requirements
Given the Federal budget deficit, the Obama Administration moved to apply an "evidence-based approach" to government spending, including rigorous methods of program evaluation. The President's 2011 Budget earmarked funding for 19 government program evaluations for agencies such as the Department of Education and the United States Agency for International Development (USAID). An inter-agency group works toward the goal of increasing transparency and accountability by creating effective evaluation networks and drawing on best practices. [40] A six-step framework for conducting evaluation of public health programs, published by the Centers for Disease Control and Prevention (CDC), initially increased the emphasis on program evaluation of government programs in the US. The framework is as follows:
1. Engage stakeholders.
2. Describe the program.
3. Focus the evaluation.
4. Gather credible evidence.
5. Justify conclusions.
6. Ensure use and share lessons learned.
CIPP Model of evaluation
History of the CIPP model
The CIPP model of evaluation was developed by Daniel Stufflebeam and colleagues in the 1960s. CIPP is an acronym for Context, Input, Process and Product. CIPP is an evaluation model that requires the evaluation of context, input, process and product in judging a programme's value. CIPP is a decision-focused approach to evaluation and emphasises the systematic provision of information for programme management and operation. [41]
CIPP model
The CIPP framework was developed as a means of linking evaluation with programme decision-making. It aims to provide an analytic and rational basis for programme decision-making, based on a cycle of planning, structuring, implementing, reviewing and revising decisions, each examined through a different aspect of evaluation: context, input, process and product evaluation. [41]
The CIPP model is an attempt to make evaluation directly relevant to the needs of decision-makers during the phases and activities of a programme. [41] Stufflebeam's context, input, process, and product (CIPP) evaluation model is recommended as a framework to systematically guide the conception, design, implementation, and assessment of service-learning projects, and to provide feedback and judgment of the project's effectiveness for continuous improvement. [41]
Four aspects of CIPP evaluation
These aspects are context, input, process, and product. They assist a decision-maker to answer four basic questions:
What should we do? This involves collecting and analysing needs-assessment data to determine goals, priorities and objectives. For example, a context evaluation of a literacy programme might involve an analysis of the existing objectives of the programme, literacy achievement test scores, staff concerns (general and particular), literacy policies and plans, and community concerns, perceptions or attitudes and needs. [41]
How should we do it? This involves the steps and resources needed to meet the new goals and objectives and might include identifying successful external programs and materials as well as gathering information. [41]
Are we doing it as planned? This provides decision-makers with information about how well the programme is being implemented. By continuously monitoring the programme, decision-makers learn such things as how well it is following the plans and guidelines, whether conflicts are arising, the level of staff support and morale, the strengths and weaknesses of materials, and any delivery and budgeting problems. [41]
Did the programme work? By measuring the actual outcomes and comparing them to the anticipated outcomes, decision-makers are better able to decide if the program should be continued, modified, or dropped altogether. This is the essence of product evaluation. [41]
Using CIPP in the different stages of the evaluation
The CIPP model is unique as an evaluation guide in that it allows evaluators to evaluate the programme at different stages: before the programme commences, by helping evaluators to assess the need, and at the end of the programme, to assess whether or not it had an effect. The CIPP model allows you to ask formative questions at the beginning of the programme, and later gives you a guide for evaluating the programme's impact by allowing you to ask summative questions on all aspects of the programme.
Context: What needs to be done? vs. Were important needs addressed?
Input: How should it be done? vs. Was a defensible design employed?
Process: Is it being done? vs. Was the design well executed?
Product: Is it succeeding? vs. Did the effort succeed?
What is assessment?
Adrian Tennant takes a look at what is meant by assessment. Many people assume that assessment is simply another word for testing, but this article outlines its role as an important aspect of teaching and learning.
Introduction
When people see, or hear, the word assessment they normally react in a fairly negative way. It might be a deep sigh or a cry of "Oh no!", but rarely will it be a smile or a cry of joy. Why is it that people feel this way at the mention of assessment? I think the first problem is that people don't really understand what is meant (or should be meant) by assessment. A second issue could be that they have had fairly bad experiences in the past and this has an influence on them. And, thirdly, it could be that assessment is often seen as a pass-or-fail thing, and nobody likes to fail.
So, what do we mean by assessment?
assessment, noun [U]: the process of making a judgement or forming an opinion, after considering something or someone carefully
I think the most interesting thing here is the word process. The purpose of most forms of assessment in the English Language classroom should be to inform people of how much progress a student is making. Assessment can take many different forms and does not need to be limited to tests and exams. Here are two types of assessment:
1. Activity assessment
a) Did you like that activity?
b) Was that activity easy or difficult?
c) What was the hardest part of that?
d) Was the activity useful? How? Why?
2. Self-assessment
a) Now I can ...
b) I still need to work on ...
c) I've improved in ...
d) Today I learnt ...
e) In the test I got X and Y wrong. I'm going to study these for homework.
As you can see, the onus here is on the students to think about what they've done. Unlike tests, which are handed out, collected in, marked by a teacher and then handed back, these forms of assessment are about the process of learning rather than only the product.
Does this mean that tests are not a valid form of assessment?
No, not at all. But they are not the only form of assessment. If students only think of assessment in terms of a formal test or exam, then it is likely that they will have negative feelings towards the idea of assessment. It's also important to emphasize that assessment shouldn't be about how good or bad someone is at a particular point in time; it should be about the progress they have made, the work they've put in and the learning that has taken place. In other words, it should be about the process of learning and not simply the results. One thing that is quite useful to do with formal tests is to analyze the process as well as the product. Here are a couple of ideas that can be used for this purpose:
1. After collecting in the test, hand out a blank copy to each student. Ask them to look at the test and a) say how well they think they did on each particular question; b) say which questions were easy, OK or difficult; and c) say what score they think they got. Then, when you hand back the marked tests, ask them to compare their thoughts to the actual test, i.e. did they get the questions right that they thought they had?, etc.
2. After collecting in the test, hand out a blank copy to each student. Ask them to look at the test, choose two questions and tell a partner how they worked out the answer.
When should assessment take place?
The simple answer is that it should take place at every stage of the learning process and that it should be fairly frequent.
Of course, there are many different forms of assessment. So, at the start of a course some form of diagnostic assessment should take place to see how much students know. This can then be used as a benchmark later on to see how much progress has been made. Throughout a course various forms of assessment can be used, from homework, project work and in-class activities to more formal tests. If you are required to give students a certain number of tests each year, say three, then one thing you could do is give them five and tell them that only the best three will be used. This kind of flexibility not only helps students be a little less worried but also takes into account that people have bad days sometimes. In fact, we will see this idea of selection again when we look at portfolios.
Helping students become comfortable
One of our first tasks as a teacher has got to be to help our students become more comfortable with the idea of assessment. Because assessment often has a negative connotation and is equated with tests, passing, failing and scores, this can be quite a challenge. But if we can make our students understand that assessment is actually beneficial, then it will make the whole process easier. Here are a few simple ideas aimed at achieving this:
1. Talk about assessment with your students.
a) What is assessment?
b) Why do we assess students?
c) How are we going to assess them?
d) What are the criteria used? Are these criteria clear?
2. Get students involved in assessment.
a) Use self-assessment, i.e. 'Can do' statements.
b) Use peer assessment.
c) Get students to come up with assessment criteria / agree criteria with students.
d) Get students involved in picking or designing assessment tasks.
3. Make assessment part of the teaching and learning process.
a) If you can build in a form of assessment regularly, maybe even every lesson, then your students will become used to it and therefore more comfortable.
b) Make sure you include the results of any assessment in your teaching. For example, if students have a particular problem with an aspect of grammar, then go back over the grammar in a lesson, making it clear that you are doing this because it was identified as a problem in the assessment. If students can see that you actually take notice of the assessment, and not simply the score, it will become more meaningful and positive for them.
We'll give more specific ideas in some of the subsequent articles. However, the key here is to make students see assessment as part of the teaching and learning process that has a direct influence on what is taught. If students understand that assessment is about the process and not simply about a product (i.e. a score), then they will start to have a more positive attitude towards it.
And finally
In this series of articles we'll take a closer look at the following areas of assessment:
Diagnostic tests
Portfolios
'Can do' statements, self-assessment and peer assessment
Assessing skills
Assessing tasks and lessons
Preparing students for tests and exams
Assessing Young Learners
What's the difference between formative and summative evaluations?
In formative evaluation, programs or projects are typically assessed during their development or early implementation to provide information about how best to revise and modify them for improvement. This type of evaluation is often helpful for pilot projects and new programs, but it can also be used for progress monitoring of ongoing programs. In summative evaluation, programs or projects are assessed at the end of an operating cycle, and findings typically are used to help decide whether a program should be adopted, continued, or modified for improvement. Both evaluation methods are recommended for use, when possible, to provide program staff with ongoing feedback for program modifications (formative) as well as periodic review of long-term progress on major program goals and objectives (summative), and to meet regular reporting requirements (e.g., for a grantor, agency, or organizational manager).