Evaluating Training - What You Need to Know: Definitions, Best Practices, Benefits and Practical Solutions


Published by Emereo Publishing on August 18, 2011
Copyright: Traditional copyright, all rights reserved
List price: $39.95





What You Need to Know:
Definitions, Best Practices, Benefits and Practical Solutions
In-depth: the real drivers and workings
Reduces the risk of your time and resources investment decisions
Feed your understanding with the objectivity of experienced professionals
The evaluation of a training curriculum can focus on many different results in order to determine its effectiveness, such as the participant’s increased knowledge and behavior change, as well as the impact of training on client and agency outcomes. There are also a variety of different research methods and strategies that can be used to evaluate the effectiveness of a curriculum.

This book is your one-stop, ultimate resource for Evaluating Training. Here you will find the most up-to-date information, analysis, background and everything you need to know.

In easy to read chapters, with extensive references and links covering all aspects of Evaluating Training: Evaluation, 100 Best Workplaces in Europe, Academic equivalency evaluation, Accountability, Accuracy and precision, American Evaluation Association, Australian Drug Evaluation Committee, BAPCo consortium, Career portfolio, Careerscope, CESG Claims Tested Mark, Commercial Product Assurance, Common Criteria, Common Criteria Testing Laboratory, Competency evaluation, Continuous assessment, Cryptographic Module Testing Laboratory, Defence Evaluation and Research Agency, Stewart Donaldson, Ecological indicator, Educational assessment, Educational evaluation, Encomium, Evaluation approaches, Evaluation Assurance Level, Expression (mathematics), Expression (computer science), Formation evaluation, Formative assessment, Formative evaluation, General Learning Ability, Goddard College, Graphical Evaluation and Review Technique, Health technology assessment, Immanent evaluation, Impact assessment, Impact evaluation, Integrity, International Association for the Evaluation of Educational Achievement, Joint Committee on Standards for Educational Evaluation, Knowledge survey, Leadership accountability, Narrative evaluation, Natural experiment, Operations Evaluation Department, Pearson Assessment & Information, Princeton Application Repository for Shared-Memory Computers, Problem, Program evaluation, Program Evaluation and Review Technique, Quality (business), Quality assurance, Quantitative risk assessment software, Recognition (sociology), Registration, Evaluation, Authorisation and Restriction of Chemicals, Review, Risk assessment, Risk Matrix, Scale of one to ten, SDET, Self-evaluation motives, Shifting baseline, SPECpower, Standard Performance Evaluation Corporation, Summative assessment, The Sunday Times 100 Best Companies to Work For, Teaching And Learning International Survey, Technology assessment, Transferable skills analysis, Voting, World Bank’s Inspection Panel, XTS-400, Accelerated aging, Adaptive comparative judgement, Alternative assessment, Aptitude, Axiomatic design, Axiomatic product development lifecycle, Behavioral Risk Factor Surveillance System, Between-group design, British Polling Council, Business excellence, Case series, Case study, Central composite design, Challenge-dechallenge-rechallenge, Check weigher, Class rank, Clerk of Works, Clinical trial, Cohort study, Common-cause and special-cause, Component-Based Usability Testing, Computer-based assessment, Conformity assessment, Consensus decision-making, Consensus-seeking decision-making, Content analysis, Context analysis, Contingent valuation, Control limits, Cost-benefit analysis, Creative participation, Critical appraisal, Critical to quality, CTQ tree, Cytel, Design Science, Destructive testing, Digital strategy, E-assessment, Economic appraisal, Economic impact analysis, Eddy Test, Educational accreditation, Higher education accreditation, Eightfold Path (policy analysis)...and much more

This book explains in-depth the real drivers and workings of Evaluating Training. It reduces the risk of your time and resources investment decisions by enabling you to compare your understanding of Evaluating Training with the objectivity of experienced professionals.

Unique, authoritative, and wide-ranging, it offers practical and strategic advice for managers, business owners and students worldwide.

Topic-relevant selected content from the highest rated entries, typeset and printed. Combine the advantages of
up-to-date and in-depth knowledge with the convenience of printed books.
A portion of the proceeds of each book will be donated to the Wikimedia Foundation
to support their mission: to empower and engage people around the world to collect
and develop educational content under a free license or in the public domain, and to
disseminate it effectively and globally.
The content within this book was generated collaboratively by volunteers. Please be
advised that nothing found here has necessarily been reviewed by people with the
expertise required to provide you with complete, accurate or reliable information.
Some information in this book may be misleading or simply wrong. The publisher
does not guarantee the validity of the information found here. If you need specific
advice (for example, medical, legal, financial, or risk management) please seek a
professional who is licensed or knowledgeable in that area.
Sources, licenses and contributors of the articles and images are listed in the
section entitled “References”. Parts of the books may be licensed under the GNU
Free Documentation License. A copy of this license is included in the section entitled
“GNU Free Documentation License”.
All used third-party trademarks belong to their respective owners.
Evaluation 1
100 Best Workplaces in Europe 8
Academic equivalency evaluation 9
Accountability 10
Accuracy and precision 15
American Evaluation Association 19
Australian Drug Evaluation Committee 21
BAPCo consortium 22
Career portfolio 22
Careerscope 24
CESG Claims Tested Mark 25
Commercial Product Assurance 26
Common Criteria 27
Common Criteria Testing Laboratory 33
Competency evaluation 34
Continuous assessment 35
Cryptographic Module Testing Laboratory 35
Defence Evaluation and Research Agency 36
Stewart Donaldson 37
Ecological indicator 39
Educational assessment 41
Educational evaluation 49
Encomium 52
Evaluation approaches 53
Evaluation Assurance Level 58
Expression (mathematics) 62
Expression (computer science) 64
Formation evaluation 65
Formative assessment 70
Formative evaluation 77
General Learning Ability 78
Goddard College 79
Graphical Evaluation and Review Technique 85
Health technology assessment 85
Immanent evaluation 86
Impact assessment 87
Impact evaluation 87
Integrity 97
International Association for the Evaluation of Educational Achievement 103
Joint Committee on Standards for Educational Evaluation 105
Knowledge survey 108
Leadership accountability 109
Narrative evaluation 110
Natural experiment 112
Operations Evaluation Department 114
Pearson Assessment & Information 115
Princeton Application Repository for Shared-Memory Computers 118
Problem 120
Program evaluation 121
Program Evaluation and Review Technique 130
Quality (business) 136
Quality assurance 140
Quantitative risk assessment software 146
Recognition (sociology) 147
Registration, Evaluation, Authorisation and Restriction of Chemicals 149
Review 153
Risk assessment 155
Risk Matrix 161
Scale of one to ten 163
SDET 164
Self-evaluation motives 165
Shifting baseline 167
SPECpower 169
Standard Performance Evaluation Corporation 170
Summative assessment 173
The Sunday Times 100 Best Companies to Work For 174
Teaching And Learning International Survey 174
Technology assessment 175
Transferable skills analysis 177
Voting 180
World Bank's Inspection Panel 183
XTS-400 186
Accelerated aging 190
Adaptive comparative judgement 193
Alternative assessment 196
Aptitude 197
Axiomatic design 198
Axiomatic product development lifecycle 199
Behavioral Risk Factor Surveillance System 201
Between-group design 202
British Polling Council 204
Business excellence 206
Case series 208
Case study 208
Central composite design 213
Challenge-dechallenge-rechallenge 214
Check weigher 216
Class rank 220
Clerk of Works 221
Clinical trial 223
Cohort study 239
Common-cause and special-cause 242
Component-Based Usability Testing 248
Computer-based assessment 250
Conformity assessment 251
Consensus decision-making 252
Consensus-seeking decision-making 266
Content analysis 267
Context analysis 272
Contingent valuation 279
Control limits 281
Cost-benefit analysis 282
Creative participation 287
Critical appraisal 288
Critical to quality 288
CTQ tree 289
Cytel 290
Design Science 293
Destructive testing 296
Digital strategy 297
E-assessment 301
Economic appraisal 304
Economic impact analysis 305
Eddy Test 306
Educational accreditation 308
Higher education accreditation 309
Eightfold Path (policy analysis) 315
Electronic patient-reported outcome 315
Embedded case study 318
Ethnography 319
Event correlation 326
Experiment 329
Experimental research design 337
Expertise finding 339
Factorial experiment 343
Feasibility study 346
Field experiment 348
Field research 350
Field work 351
First pass yield 352
First-in-man study 353
Fixtureless in-circuit test 353
Force field analysis 354
Gender Evaluation Methodology 355
Grade (education) 357
Grading on a curve 386
Harlan Hanson 388
Hoshin Kanri 391
In-basket test 393
Inquiry 393
Institutional Learning and Change Initiative 403
International Baccalaureate 404
Interview 409
Lean Integration 411
Least cost planning methodology 413
Lexis ratio 414
Logic model 415
Meta-analysis 422
Multiple mini interview 428
Naturalistic observation 430
Non-response bias 431
Nondestructive testing 432
Observational techniques 442
Paid survey 443
Participant observation 444
Personal management interview 446
Physical test 447
Pick chart 448
Pilot experiment 449
Placebo-controlled study 450
Policy analysis 458
Poll average 464
Process Optimization, Standardization and Innovation Technique 465
Position-specific scoring matrix 467
Process improvement 469
Proof of concept 469
Provocation test 471
Qualitative research 471
Quality audit 476
Quantitative research 477
Quasi-experiment 480
Question focused dataset 481
Questionnaire 482
Questionnaire construction 485
Random digit dialing 489
Rating (pharmaceutical industry) 490
Reference class forecasting 490
Repeated measures design 492
Risk-benefit analysis 494
Rolled throughput yield 495
Rsl testing 495
Rubric (academic) 496
SAT Subject Tests 499
Segal–Cover score 502
Self- and Peer-Assessment 503
Self-assessment 507
Separation test 509
Single-subject design 510
Single-subject research 513
Six Sigma 514
Spelling test 523
Statistical process control 524
Statistical survey 528
Statistics 533
Strengths and Difficulties Questionnaire 542
Stress testing 543
Structured interview 546
Student Achievement and School Accountability Programs 547
Terahertz nondestructive evaluation 549
Test management 551
Test method 553
Test-retest 555
Trained panel 556
Transformative assessment 556
Triangulation (social science) 558
U3 tool 559
Usability testing 561
Video ethnography 565
Article Sources and Contributors 567
Image Sources, Licenses and Contributors 579
Article Licenses
License 581
Evaluation
Evaluation is the systematic determination of the merit, worth, and significance of something or someone, using criteria
against a set of standards.
Evaluation often is used to characterize and appraise subjects of interest in a wide range of human enterprises,
including the arts, criminal justice, foundations and non-profit organizations, government, health care, and other
human services.
Evaluation is the comparison of actual impacts against strategic plans. It looks at original objectives, at what was
accomplished and how it was accomplished. It can be formative, that is taking place during the life of a project or
organisation, with the intention of improving the strategy or way of functioning of the project or organisation. It can
also be summative, drawing lessons from a completed project or an organisation that is no longer functioning.
Evaluation is inherently a theoretically informed approach (whether explicitly or not), and consequently a definition
of evaluation would have to be tailored to the theory, approach, needs, purpose and methodology of the evaluation
itself. Having said this, evaluation has been defined as:
• A systematic, rigorous, and meticulous application of scientific methods to assess the design, implementation,
improvement or outcomes of a program. It is a resource-intensive process, frequently requiring resources such as
evaluator expertise, labour, time and a sizeable budget.
• 'The critical assessment, in as objective a manner as possible, of the degree to which a service or its component
parts fulfils stated goals' (St Leger and Walsworth-Bell).
The focus of this definition is on attaining objective
knowledge, and scientifically or quantitatively measuring predetermined and external concepts.
• 'A study designed to assist some audience to assess an object’s merit and worth' (Stufflebeam). In this
definition the focus is on facts as well as value-laden judgements of the program’s outcomes and worth.
The main purpose of a program evaluation can be to "determine the quality of a program by formulating a
judgment" Stake and Schwandt (2006).
An alternative view is that "projects, evaluators and other stakeholders (including funders) will all have potentially
different ideas about how best to evaluate a project since each may have a different definition of ‘merit’. The core of
the problem is thus about defining what is of value."
From this perspective, evaluation "is a contested term", as
"evaluators" use the term evaluation to describe an assessment, or investigation of a program whilst others simply
understand evaluation as being synonymous with applied research.
Not all evaluations serve the same purpose: some evaluations serve a monitoring function rather than focusing
solely on measurable program outcomes or evaluation findings, and a full list of types of evaluations would be
difficult to compile.
This is because evaluation is not part of a unified theoretical framework,
drawing on a
number of disciplines, which include management and organisational theory, policy analysis, education, sociology,
social anthropology, and social change.
Within the last three decades there have been tremendous theoretical and methodological developments within the
field of evaluation.
Despite its progress, there are still many fundamental problems faced by this field as "unlike
medicine, evaluation is not a discipline that has been developed by practicing professionals over thousands of years,
so we are not yet at the stage where we have huge encyclopaedias that will walk us through any evaluation
step-by-step", or provide a clear definition of what evaluation entails (Davidson, 2005).
It could therefore be
argued that a key problem that evaluators face is the lack of a clear definition of evaluation, which may "underline
why program evaluation is periodically called into question as an original process, whose primary function is the
production of legitimate and justified judgments which serve as the bases for relevant recommendations."
However, the strict adherence to a set of methodological assumptions may make the field of evaluation more
acceptable to a mainstream audience, but this adherence will work towards preventing evaluators from developing
new strategies for dealing with the myriad problems that programs face.
It is claimed that only a minority of evaluation reports are used by the evaluand (client) (Datta, 2006). One
justification of this is that "when evaluation findings are challenged or utilization has failed, it was because
stakeholders and clients found the inferences weak or the warrants unconvincing" (Fournier and Smith, 1993).
Some reasons for this situation may be the failure of the evaluator to establish a set of shared aims with the evaluand,
or creating overly ambitious aims, as well as failing to compromise and incorporate the cultural differences of
individuals and programs within the evaluation aims and process.
None of these problems are due to a lack of a definition of evaluation but are rather due to evaluators attempting to
impose predisposed notions and definitions of evaluations on clients. The central reason for the poor utilization of
evaluations is arguably the lack of tailoring of evaluations to suit the needs of the client, owing to a predefined
idea (or definition) of what an evaluation is rather than what the client’s needs are (House, 1980).
Depending on the topic of interest, there are professional groups which look to the quality and rigor of the evaluation
process. The Joint Committee on Standards for Educational Evaluation has developed standards for program, personnel,
and student evaluation. The Joint Committee standards are broken into four sections: Utility, Feasibility, Propriety,
and Accuracy. Various European institutions have also prepared their own standards, more or less related to those
produced by the Joint Committee. They provide guidelines about basing value judgments on systematic inquiry,
evaluator competence and integrity, respect for people, and regard for the general and public welfare.
The American Evaluation Association has created a set of Guiding Principles
for evaluators. The order of these
principles does not imply priority among them; priority will vary by situation and evaluator role. The principles run
as follows:
• Systematic Inquiry: Evaluators conduct systematic, data-based inquiries about whatever is being evaluated.
• Competence: Evaluators provide competent performance to stakeholders.
• Integrity / Honesty: Evaluators ensure the honesty and integrity of the entire evaluation process.
• Respect for People: Evaluators respect the security, dignity and self-worth of the respondents, program
participants, clients, and other stakeholders with whom they interact.
• Responsibilities for General and Public Welfare: Evaluators articulate and take into account the diversity of
interests and values that may be related to the general and public welfare.
Furthermore, international organizations such as the I.M.F. and the World Bank have independent evaluation
functions. The various funds, programmes, and agencies of the United Nations have a mix of independent,
semi-independent and self-evaluation functions, which have organized themselves as a system-wide UN Evaluation
Group (UNEG), which works together to strengthen the function and to establish UN norms and standards for
evaluation. There is also an evaluation group within the OECD-DAC, which endeavors to improve development
evaluation standards.
Evaluation approaches are conceptually distinct ways of thinking about, designing and conducting evaluation efforts.
Many of the evaluation approaches in use today make truly unique contributions to solving important problems,
while others refine existing approaches in some way.
Classification of approaches
Two classifications of evaluation approaches by House
and Stufflebeam & Webster
can be combined into a
manageable number of approaches in terms of their unique and important underlying principles.
House considers all major evaluation approaches to be based on a common ideology, liberal democracy. Important
principles of this ideology include freedom of choice, the uniqueness of the individual, and empirical inquiry
grounded in objectivity. He also contends they are all based on subjectivist ethics, in which ethical conduct is based
on the subjective or intuitive experience of an individual or group. One form of subjectivist ethics is utilitarian, in
which “the good” is determined by what maximizes some single, explicit interpretation of happiness for society as a
whole. Another form of subjectivist ethics is intuitionist / pluralist, in which no single interpretation of “the good” is
assumed and these interpretations need not be explicitly stated nor justified.
These ethical positions have corresponding epistemologies—philosophies of obtaining knowledge. The objectivist
epistemology is associated with the utilitarian ethic. In general, it is used to acquire knowledge capable of external
verification (intersubjective agreement) through publicly inspectable methods and data. The subjectivist
epistemology is associated with the intuitionist/pluralist ethic. It is used to acquire new knowledge based on existing
personal knowledge and experiences that are (explicit) or are not (tacit) available for public inspection.
House further divides each epistemological approach by two main political perspectives. Approaches can take an
elite perspective, focusing on the interests of managers and professionals. They also can take a mass perspective,
focusing on consumers and participatory approaches.
Stufflebeam and Webster place approaches into one of three groups according to their orientation toward the role of
values, an ethical consideration. The political orientation promotes a positive or negative view of an object
regardless of what its value actually is and might be. They call this pseudo-evaluation. The questions orientation
includes approaches that might or might not provide answers specifically related to the value of an object. They call
this quasi-evaluation. The values orientation includes approaches primarily intended to determine the value of some
object. They call this true evaluation.
When the above concepts are considered simultaneously, fifteen evaluation approaches can be identified in terms of
epistemology, major perspective (from House), and orientation.
Two pseudo-evaluation approaches, politically
controlled and public relations studies, are represented. They are based on an objectivist epistemology from an elite
perspective. Six quasi-evaluation approaches use an objectivist epistemology. Five of them—experimental research,
management information systems, testing programs, objectives-based studies, and content analysis—take an elite
perspective. Accountability takes a mass perspective. Seven true evaluation approaches are included. Two
approaches, decision-oriented and policy studies, are based on an objectivist epistemology from an elite perspective.
Consumer-oriented studies are based on an objectivist epistemology from a mass perspective. Two
approaches—accreditation/certification and connoisseur studies—are based on a subjectivist epistemology from an
elite perspective. Finally, adversary and client-centered studies are based on a subjectivist epistemology from a mass
perspective.
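The classification above can be captured in a small data structure. The following short Python sketch is a minimal illustration only, not part of the cited classifications: the dictionary layout and the helper function approaches_with are assumed for the example. It encodes each approach with the epistemology, perspective and orientation assigned to it in this section, and recovers the groupings discussed below.

# Illustrative sketch only (Python): the layout of this dictionary and the helper
# function are assumptions for illustration, not part of House (1978) or
# Stufflebeam & Webster (1980). Each approach is keyed to the (epistemology,
# perspective, orientation) assigned to it in the text above.

APPROACHES = {
    "Politically controlled": ("objectivist", "elite", "pseudo-evaluation"),
    "Public relations": ("objectivist", "elite", "pseudo-evaluation"),
    "Experimental research": ("objectivist", "elite", "quasi-evaluation"),
    "Management information systems": ("objectivist", "elite", "quasi-evaluation"),
    "Testing programs": ("objectivist", "elite", "quasi-evaluation"),
    "Objectives-based": ("objectivist", "elite", "quasi-evaluation"),
    "Content analysis": ("objectivist", "elite", "quasi-evaluation"),
    "Accountability": ("objectivist", "mass", "quasi-evaluation"),
    "Decision-oriented": ("objectivist", "elite", "true evaluation"),
    "Policy studies": ("objectivist", "elite", "true evaluation"),
    "Consumer-oriented": ("objectivist", "mass", "true evaluation"),
    "Accreditation / certification": ("subjectivist", "elite", "true evaluation"),
    "Connoisseur": ("subjectivist", "elite", "true evaluation"),
    "Adversary": ("subjectivist", "mass", "true evaluation"),
    "Client-centered": ("subjectivist", "mass", "true evaluation"),
}

def approaches_with(epistemology=None, perspective=None, orientation=None):
    """Return the approaches matching the given facets (None matches anything)."""
    return [
        name
        for name, (epi, persp, orient) in APPROACHES.items()
        if (epistemology is None or epi == epistemology)
        and (perspective is None or persp == perspective)
        and (orientation is None or orient == orientation)
    ]

# Example: the seven "true evaluation" approaches discussed below.
print(approaches_with(orientation="true evaluation"))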
Summary of approaches
The following table is used to summarize each approach in terms of four attributes—organizer, purpose, strengths,
and weaknesses. The organizer represents the main considerations or cues practitioners use to organize a study. The
purpose represents the desired outcome for a study at a very general level. Strengths and weaknesses represent other
attributes that should be considered when deciding whether to use the approach for a particular study. The following
narrative highlights differences between approaches grouped together.
Summary of approaches for conducting evaluations

Politically controlled. Organizer: Threats. Purpose: Get, keep or increase influence, power or money. Key strengths: Secure evidence advantageous to the client in a conflict. Key weaknesses: Violates the principle of full & frank disclosure.

Public relations. Organizer: Propaganda needs. Purpose: Create positive public image. Key strengths: Secure evidence most likely to bolster public support. Key weaknesses: Violates the principles of balanced reporting, justified conclusions, & objectivity.

Experimental research. Purpose: Determine causal relationships between variables. Key strengths: Strongest paradigm for determining causal relationships. Key weaknesses: Requires controlled setting, limits range of evidence, focuses primarily on results.

Management information systems. Purpose: Continuously supply evidence needed to fund, direct, & control programs. Key strengths: Gives managers detailed evidence about complex programs. Key weaknesses: Human service variables are rarely amenable to the narrow, quantitative definitions needed.

Testing programs. Organizer: Individual differences. Purpose: Compare test scores of individuals & groups to selected norms. Key strengths: Produces valid & reliable evidence in many performance areas. Very familiar to public. Key weaknesses: Data usually only on testee performance, overemphasizes test-taking skills, can be poor sample of what is taught or expected.

Objectives-based. Organizer: Objectives. Purpose: Relates outcomes to objectives. Key strengths: Common sense appeal, widely used, uses behavioral objectives & testing. Key weaknesses: Leads to terminal evidence often too narrow to provide basis for judging the value of a program.

Content analysis. Organizer: Content of a communication. Purpose: Describe & draw conclusions about a communication. Key strengths: Allows for unobtrusive analysis of large volumes of unstructured, symbolic materials. Key weaknesses: Sample may be unrepresentative yet overwhelming in volume. Analysis design often overly simplistic for question.

Accountability. Organizer: Performance expectations. Purpose: Provide constituents with an accurate accounting of results. Key strengths: Popular with constituents. Aimed at improving quality of products and services. Key weaknesses: Creates unrest between practitioners & consumers. Politics often forces premature studies.

Decision-oriented. Organizer: Decisions. Purpose: Provide a knowledge & value base for making & defending decisions. Key strengths: Encourages use of evaluation to plan & implement needed programs. Helps justify decisions about plans & actions. Key weaknesses: Necessary collaboration between evaluator & decision-maker provides opportunity to bias results.

Policy studies. Organizer: Broad issues. Purpose: Identify and assess potential costs & benefits of competing policies. Key strengths: Provide general direction for broadly focused actions. Key weaknesses: Often corrupted or subverted by politically motivated actions of participants.

Consumer-oriented. Organizer: Generalized needs & values, effects. Purpose: Judge the relative merits of alternative goods & services. Key strengths: Independent appraisal to protect practitioners & consumers from shoddy products & services. High public credibility. Key weaknesses: Might not help practitioners do a better job. Requires credible & competent evaluators.

Accreditation / certification. Organizer: Standards & guidelines. Purpose: Determine if institutions, programs, & personnel should be approved to perform specified functions. Key strengths: Helps public make informed decisions about quality of organizations & qualifications of personnel. Key weaknesses: Standards & guidelines typically emphasize intrinsic criteria to the exclusion of outcome measures.

Connoisseur. Organizer: Critical guideposts. Purpose: Critically describe, appraise, & illuminate an object. Key strengths: Exploits highly developed expertise on subject of interest. Can inspire others to more insightful efforts. Key weaknesses: Dependent on small number of experts, making evaluation susceptible to subjectivity, bias, and corruption.

Adversary. Organizer: “Hot” issues. Purpose: Present the pros & cons of an issue. Key strengths: Ensures balanced presentations of represented perspectives. Key weaknesses: Can discourage cooperation, heighten animosities.

Client-centered. Organizer: Specific concerns & issues. Purpose: Foster understanding of activities & how they are valued in a given setting & from a variety of perspectives. Key strengths: Practitioners are helped to conduct their own evaluation. Key weaknesses: Low external credibility, susceptible to bias in favor of participants.

Note. Adapted and condensed primarily from House (1978) and Stufflebeam & Webster (1980).
Objectivist, elite, pseudo-evaluation
Politically controlled and public relations studies are based on an objectivist epistemology from an elite perspective.
Although both of these approaches seek to misrepresent value interpretations about some object, they go about it a
bit differently. Information obtained through politically controlled studies is released or withheld to meet the special
interests of the holder.
Public relations information is used to paint a positive image of an object regardless of the actual situation. Neither
of these approaches is acceptable evaluation practice, although the seasoned reader can surely think of a few
examples where they have been used.
Objectivist, elite, quasi-evaluation
As a group, these five approaches represent a highly respected collection of disciplined inquiry approaches. They are
considered quasi-evaluation approaches because particular studies legitimately can focus only on questions of
knowledge without addressing any questions of value. Such studies are, by definition, not evaluations. These
approaches can produce characterizations without producing appraisals, although specific studies can produce both.
Each of these approaches serves its intended purpose well. They are discussed roughly in order of the extent to
which they approach the objectivist ideal.
Experimental research is the best approach for determining causal relationships between variables. The potential
problem with using this as an evaluation approach is that its highly controlled and stylized methodology may not be
sufficiently responsive to the dynamically changing needs of most human service programs.
Management information systems (MISs) can give detailed information about the dynamic operations of complex
programs. However, this information is restricted to readily quantifiable data usually available at regular intervals.
Testing programs are familiar to just about anyone who has attended school, served in the military, or worked for a
large company. These programs are good at comparing individuals or groups to selected norms in a number of
subject areas or to a set of standards of performance. However, they only focus on testee performance and they might
not adequately sample what is taught or expected.
Objectives-based approaches relate outcomes to prespecified objectives, allowing judgments to be made about their
level of attainment. Unfortunately, the objectives are often not proven to be important or they focus on outcomes too
narrow to provide the basis for determining the value of an object.
Content analysis is a quasi-evaluation approach because content analysis judgments need not be based on value
statements. Instead, they can be based on knowledge. Such content analyses are not evaluations. On the other hand,
when content analysis judgments are based on values, such studies are evaluations.
Objectivist, mass, quasi-evaluation
Accountability is popular with constituents because it is intended to provide an accurate accounting of results that
can improve the quality of products and services. However, this approach quickly can turn practitioners and
consumers into adversaries when implemented in a heavy-handed fashion.
Objectivist, elite, true evaluation
Decision-oriented studies are designed to provide a knowledge base for making and defending decisions. This
approach usually requires the close collaboration between an evaluator and decision-maker, allowing it to be
susceptible to corruption and bias.
Policy studies provide general guidance and direction on broad issues by identifying and assessing potential costs
and benefits of competing policies. The drawback is these studies can be corrupted or subverted by the politically
motivated actions of the participants.
Objectivist, mass, true evaluation
Consumer-oriented studies are used to judge the relative merits of goods and services based on generalized needs
and values, along with a comprehensive range of effects. However, this approach does not necessarily help
practitioners improve their work, and it requires a very good and credible evaluator to do it well.
Subjectivist, elite, true evaluation
Accreditation / certification programs are based on self-study and peer review of organizations, programs, and
personnel. They draw on the insights, experience, and expertise of qualified individuals who use established
guidelines to determine if the applicant should be approved to perform specified functions. However, unless
performance-based standards are used, attributes of applicants and the processes they perform often are
overemphasized in relation to measures of outcomes or effects.
Connoisseur studies use the highly refined skills of individuals intimately familiar with the subject of the evaluation
to critically characterize and appraise it. This approach can help others see programs in a new light, but it is difficult
to find a qualified and unbiased connoisseur.
Subjectivist, mass, true evaluation
The adversary approach focuses on drawing out the pros and cons of controversial issues through quasi-legal
proceedings. This helps ensure a balanced presentation of different perspectives on the issues, but it is also likely to
discourage later cooperation and heighten animosities between contesting parties if “winners” and “losers” emerge.
Client-centered studies address specific concerns and issues of practitioners and other clients of the study in a
particular setting. These studies help people understand the activities and values involved from a variety of
perspectives. However, this responsive approach can lead to low external credibility and a favorable bias toward
those who participated in the study.
Methods and techniques
Evaluation is methodologically diverse, using both qualitative and quantitative methods, including case
studies, survey research, statistical analysis, and model building, among others. A more detailed list of methods,
techniques and approaches for conducting evaluations would include the following:
• Accelerated aging
• Action research
• Advanced Product Quality Planning
• Alternative assessment
• Appreciative Inquiry
• Assessment
• Axiomatic design
• Benchmarking
• Case study
• Change management
• Clinical trial
• Cohort study
• Competitor analysis
• Consensus decision-making
• Consensus-seeking decision-making
• Content analysis
• Conversation analysis
• Cost-benefit analysis
• Data mining
• Delphi Technique
• Design-focused evaluation
• Discourse analysis
• Educational accreditation
• Electronic portfolio
• Environmental scanning
• Ethnography
• Experiment
• Experimental techniques
• Factor analysis
• Factorial experiment
• Feasibility study
• Field experiment
• Fixtureless in-circuit test
• Focus group
• Force field analysis
• Game theory
• Grading
• Historical method
• Inquiry
• Interview
• Marketing research
• Meta-analysis
• Metrics
• Multivariate statistics
• Naturalistic observation
• Observational techniques
• Opinion polling
• Organizational learning
• Outcome mapping
• Participant observation
• Participatory Impact Pathways Analysis
• Policy analysis
• Post occupancy evaluation
• Process improvement
• Project management
• Qualitative research
• Quality audit
• Quality circle
• Quality control
• Quality management
• Quantitative research
• Questionnaire
• Questionnaire construction
• Root cause analysis
• Rubrics
• Sampling
• Self-assessment
• Six Sigma
• Standardized testing
• Statistical process control
• Statistical survey
• Statistics
• Strategic planning
• Structured interviewing
• Student testing
• Systems theory
• Total Quality Management
• Triangulation
• Wizard of Oz experiment
Notes and references
[1] Rossi, P.H.; Lipsey, M.W., & Freeman, H.E. (2004). Evaluation: A systematic approach (7th ed.). Thousand Oaks: Sage.
ISBN 978-0761908944.
[2] Reeve, J; Peerbhoy, D. (2007). "Evaluating the evaluation: Understanding the utility and limitations of evaluation as a tool for organizational
learning". Health Education Journal 66 (2): 120–131. doi:10.1177/0017896907076750.
[3] Hurteau, M.; Houle, S., & Mongiat, S. (2009). "How Legitimate and Justified are Judgments in Program Evaluation?" (http://evi.sagepub.com/content/15/3/307.abstract). Evaluation 15 (3): 307–319. doi:10.1177/1356389009105883.
[4] Alkin; Ellett (1990). not given. p. 454.
[5] Potter, C. (2006). "Psychology and the art of program evaluation". South African journal of psychology 36 (1): 82–102.
[6] Joint Committee on Standards for Educational Evaluation (http://www.wmich.edu/evalctr/jc/)
[7] American Evaluation Association Guiding Principles for Evaluators (http://www.eval.org/Publications/GuidingPrinciples.asp)
[8] UNEG (http://www.uneval.org)
[9] DAC Network on Development Evaluation Home Page (http://www.oecd.org/site/0,2865,en_21571361_34047972_1_1_1_1_1,00.html)
[10] House, E. R. (1978). Assumptions underlying evaluation models. Educational Researcher, 7(3), 4–12.
[11] Stufflebeam, D. L., & Webster, W. J. (1980). "An analysis of alternative approaches to evaluation" (http://www.jstor.org/stable/1163593). Educational Evaluation and Policy Analysis, 2(3), 5–19. OCLC 482457112.
External links
• Links to Assessment and Evaluation Resources (http://www.education.purdue.edu/assessmentcouncil/Links/Index.htm) - List of links to resources on several topics
• Glossaries (http://ec.wmich.edu/glossary/index.htm)
• Evaluation Portal Link Collection (http://www.evaluation.lars-balzer.name/links/) Evaluation link collection with information about evaluation journals, dissemination, projects, societies, how-to texts, books, and much more
• Free Resources for Methods in Evaluation and Social Research (http://gsociology.icaap.org/methods/)
100 Best Workplaces in Europe
100 Best Workplaces in Europe is a ranking of the 100 best workplaces in Europe compiled each year by the Financial
Times in partnership with the Great Place to Work Institute. The list is based on employee surveys and a review of the
company's culture. Two thirds of the total score comes from employee responses to a 57-question survey on the culture of
the company. The rest of the score is based on demographics, pay, benefits, culture, and community involvement.
The 2007 survey winner was the car manufacturer Ferrari of Italy.
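A minimal sketch of the scoring weights just described follows; the function name, the 0-100 scale and the example figures are assumptions for illustration and do not come from the published methodology.

# Minimal sketch (Python) of the weighting described above: two thirds of the
# total comes from the 57-question employee survey, one third from the review
# of demographics, pay, benefits, culture, and community involvement.
# The 0-100 scale and the example numbers are assumed for illustration.

def total_score(survey_score, culture_audit_score):
    """Weight the employee survey at 2/3 and the culture audit at 1/3."""
    return (2 * survey_score + culture_audit_score) / 3

# Example: a survey score of 90 and a culture audit score of 75 yield 85.0.
print(total_score(90.0, 75.0))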
External links
• Great Place to Work Institute
• 100 Best Workplaces in Europe
[1] http://www.greatplacetowork.com
[2] http://www.ft.com/reports/bestwork2007/
Academic equivalency evaluation
An academic equivalency evaluation is an analytical report which determines the equivalency in the United States
educational system of a potential US immigrant's foreign academic and professional credentials. This evaluation
determines the level of education and number of years completed, as well as the field of specialization. Academic
evaluations consider the educational system of the foreign country, the quality of the university attended by the
candidate, the credit hours and number of years of coursework, the nature of the courses, and the grades attained in
the courses.
An academic equivalency evaluation is primarily required for H-1B visa applicants who have not earned an
academic degree at a university or college in the United States, but have acquired a degree from another country.
H-1B visas require a bachelor's degree or its equivalent as a minimum.
Academic equivalency evaluations provide the evidence of equivalency that the United States Citizenship and
Immigration Services accepts in an application for an H-1B
visa. Academic equivalency evaluations can also be used towards other visas such as TN status, E-3, L-1B, Green
Card, and I-140.
Documents Required
Companies that provide academic equivalency evaluations typically require copies of any diplomas, transcripts, and
post-graduate degrees that a candidate may have. Academic degrees that can be evaluated may include, but are not
limited to, bachelor's degrees, master's degrees, and Ph.D.s.
[1] 8 U.S.C. 1184(i)(1)(B) (http://www.law.cornell.edu/uscode/html/uscode08/usc_sec_08_00001184----000-.html)
[2] Requirements for an academic equivalency evaluation (http://trustfortecorp.com/academic.html)
Accountability
Accountability is a concept in ethics and governance with several meanings. It is often used synonymously with
such concepts as responsibility,
answerability, blameworthiness, liability, and other terms associated with the
expectation of account-giving. As an aspect of governance, it has been central to discussions related to problems in
the public sector, nonprofit and private (corporate) worlds. In leadership roles,
accountability is the
acknowledgment and assumption of responsibility for actions, products, decisions, and policies including the
administration, governance, and implementation within the scope of the role or employment position and
encompassing the obligation to report, explain and be answerable for resulting consequences.
As a term related to governance, accountability has been difficult to define.

It is frequently described as an
account-giving relationship between individuals, e.g. "A is accountable to B when A is obliged to inform B about A’s
(past or future) actions and decisions, to justify them, and to suffer punishment in the case of eventual
misconduct". Accountability cannot exist without proper accounting practices; in other words, an absence of
accounting means an absence of accountability.
History and etymology
"Accountability" stems from late Latin accomptare (to account), a prefixed form of computare (to calculate), which
in turn derived from putare (to reckon).
While the word itself does not appear in English until its use in
13th-century Norman England, the concept of account-giving has ancient roots in record-keeping activities related to
governance and money-lending systems that first developed in Ancient Israel and, later, Rome.
Bruce Stone, O.P. Dwivedi, and Joseph G. Jabbra list 8 types of accountability, namely: moral, administrative,
political, managerial, market, legal/judicial, constituency relation, and professional.
Leadership accountability cuts across many of these distinctions.
Political accountability
Political accountability is the accountability of the government, civil servants and politicians to the public and to
legislative bodies such as a congress or a parliament.
In a few cases, recall elections can be used to revoke the office of an elected official. Generally, however, voters do
not have any direct way of holding elected representatives to account during the term for which they have been
elected. Additionally, some officials and legislators may be appointed rather than elected. A constitution or statute
can empower a legislative body to hold its own members, the government, and government bodies to account. This
can be through holding an internal or independent inquiry. Inquiries are usually held in response to an allegation of
misconduct or corruption. The powers, procedures and sanctions vary from country to country. The legislature may
have the power to impeach the individual, remove them, or suspend them from office for a period of time. The
accused person might also decide to resign before trial. Impeachment in the United States has been used both for
elected representatives and other civil offices, such as district court judges.
In parliamentary systems, the government relies on the support of parliament, which gives parliament the power to hold
the government to account. For example, some parliaments can pass a vote of no confidence in the government.
Ethical accountability
Ethical accountability is the practice of improving overall personal and organizational performance by developing
and promoting responsible tools and professional expertise, and by advocating an effective enabling environment for
people and organizations to embrace a culture of sustainable development. Ethical accountability may include the
individual, as well as small and large businesses, not-for-profit organizations, research institutions and academics,
and government. One scholarly paper has posited that "it is unethical to plan an action for social change without
excavating the knowledge and wisdom of the people who are responsible for implementing the plans of action and
the people whose lives will be affected."
Debates around the practice of ethical accountability on the part of
researchers in the social field - whether professional or others - have been thoroughly explored by Norma Romm in
her work on Accountability in Social Research [16], including her book on New Racism: Revisiting Researcher
Accountabilities, reviewed by Carole Truman in the journal Sociological Research Online [17]. Here it is suggested
that researcher accountability implies that researchers are cognisant of, and take some responsibility for, the potential
impact of their ways of doing research - and of writing it up - on the social fields of which the research is part. That
is, accountability is linked to considering carefully, and being open to challenge in relation to, one's choices
concerning how research agendas are framed and the styles in which write-ups of research "results" are created.
Administrative accountability
Internal rules and norms, as well as some independent commissions, are mechanisms for holding civil servants within the
administration of government accountable. Within a department or ministry, behavior is bounded, first, by rules and
regulations and, second, by the fact that civil servants are subordinates in a hierarchy and accountable to superiors. In addition, there
are independent “watchdog” units to scrutinize and hold departments accountable; the legitimacy of these commissions is
built upon their independence, which avoids any conflict of interest. Apart from internal checks, some “watchdog”
units accept complaints from citizens, bridging government and society to hold civil servants accountable to citizens
and not merely to their departments.
Market accountability
Amid calls for decentralization and privatization of government, services provided are nowadays more
“customer-driven” and should aim to provide convenience and various choices to citizens; from this perspective, there
are comparisons and competition between public and private services, and this, ideally, improves quality of service.
As mentioned by Bruce Stone, the standard of assessment for accountability is therefore the “responsiveness of service
providers to a body of ‘sovereign’ customers” and the production of quality service. Outsourcing services is one means of adopting
market accountability. Government can choose among a shortlist of companies for an outsourced service; within the
contracting period, government can hold the company accountable by rewriting the contract or by choosing another company.
Constituency relations
Within this perspective, a particular agency or the government is accountable if voices from agencies, groups or
institutions outside the public sector that represent citizens’ interests in a particular constituency or field
are heard. Moreover, the government is obliged to empower members of such agencies with political rights to run for
election and be elected, or to appoint them into the public sector, as a way of keeping the government representative and
ensuring that voices from all constituencies are included in the policy-making process.
Public/private overlap
With the increase over the last several decades in public service provision by private entities, especially in Britain
and the United States, some have called for increased political accountability mechanisms to be applied to otherwise
non-political entities. Legal scholar Anne Davies, for instance, argues that the line between public institutions and
private entities like corporations is becoming blurred in certain areas of public service provision in the United
Kingdom and that this can compromise political accountability in those areas. She and others argue that some
administrative law reforms are necessary to address this accountability gap.
With respect to the public/private overlap in the United States, public concern over the contracting out of
government (including military) services and the resulting accountability gap has been highlighted recently
following the shooting incident involving the Blackwater security firm in Iraq.
Contemporary evolution
Accountability involves either the expectation or assumption of account-giving behavior. The study of account
giving as a sociological act was articulated in a 1968 article on "Accounts" by Marvin Scott and Stanford Lyman,
although it can be traced as well to J. L. Austin's 1956 essay "A Plea for Excuses," in which
he used excuse-making as an example of speech acts.
Communications scholars have extended this work through the examination of strategic uses of excuses,
justifications, rationalizations, apologies and other forms of account giving behavior by individuals and corporations,
and Philip Tetlock and his colleagues have applied experimental design techniques to explore how individuals
behave under various scenarios and situations that demand accountability.
Recently, accountability has become an important topic in the discussion about the legitimacy of international
institutions. Because there is no global, democratically elected body to which organizations must account, global
organizations from all sectors are often criticized as having large accountability gaps. The Charter 99 for
Global Democracy,
spearheaded by the One World Trust, first proposed that cross-sector principles of
accountability be researched and observed by institutions that affect people, independent of their legal status. One
paradigmatic problem arising in the global context is that of institutions such as the World Bank and the
International Monetary Fund, which are founded and supported by wealthy nations and provide aid, in the form of
grants and loans, to developing nations. Should those institutions be accountable to their founders and investors or to
the persons and nations they help? In the debate over global justice and its distributional consequences,
Cosmopolitans tend to advocate greater accountability to the disregarded interests of traditionally marginalized
populations and developing nations. On the other hand, those in the Nationalism and Society of States traditions
deny the tenets of moral universalism and argue that beneficiaries of global development initiatives have no
substantive entitlement to call international institutions to account. The One World Trust Global Accountability
Report, published in a first full cycle from 2006 to 2008,
is one attempt to measure the capability of global
organizations to be accountable to their stakeholders.
Accountability is becoming an increasingly important issue for the non-profit world. Several NGOs signed the
"accountability charter" in 2005. In the humanitarian field, initiatives such as HAPI (the Humanitarian
Accountability Partnership International) have appeared. Individual NGOs have set up their own accountability systems (for
example, ALPS, the Accountability, Learning and Planning System of ActionAid).
Accountability in education
Sudbury schools maintain that students are personally responsible for their acts, in opposition to virtually
all schools today, which deny it. The denial is threefold: schools do not permit students to choose their course of action
fully; they do not permit students to embark on the course, once chosen; and they do not permit students to suffer the
consequences of the course, once taken. Freedom of choice, freedom of action, freedom to bear the results of
action—these are the three great freedoms that constitute personal responsibility. Sudbury schools claim that
"Ethics" is a course taught by life experience. They adduce that the absolutely essential ingredient for acquiring
values—and for moral action is personal responsibility, that schools will become involved in the teaching of morals
when they become communities of people who fully respect each others' right to make choices, and that the only
way the schools can become meaningful purveyors of ethical values is if they provide students and adults with
real-life experiences that are bearers of moral import. Students are given complete responsibility for their own
education and the school is run by a direct democracy in which students and staff are equals.





Proposed symbolism
Viktor Frankl, neurologist and psychiatrist, founder of logotherapy and one of the key figures in existential therapy,
in his book Man's Search for Meaning recommended "that the Statue of Liberty on the East Coast (that has become a
symbol of Liberty and Freedom) should be supplemented by a Statue of Responsibility on the West Coast." Frankl
stated: "Freedom, however, is not the last word. Freedom is only part of the story and half of the truth. Freedom is
but the negative aspect of the whole phenomenon whose positive aspect is responsibleness. In fact, freedom is in
danger of degenerating into mere arbitrariness unless it is lived in terms of responsibleness."

[1] Dykstra, Clarence A. (February 1939). "The Quest for Responsibility". American Political Science Review (The American Political Science
Review, Vol. 33, No. 1) 33 (1): 1–25. doi:10.2307/1949761. JSTOR 1949761.
[2] Williams, Christopher (2006) Leadership accountability in a globalizing world. London: Palgrave Macmillan.
[3] Mulgan, Richard (2000). "'Accountability': An Ever-Expanding Concept?". Public Administration 78 (3): 555–573.
[4] Sinclair, Amanda (1995). "The Chameleon of Accountability: Forms and Discourses". Accounting, Organizations and Society 20 (2/3):
219–237. doi:10.1016/0361-3682(93)E0003-Y.
[5] Schedler, Andreas (1999). "Conceptualizing Accountability". In Andreas Schedler, Larry Diamond, Marc F. Plattner. The Self-Restraining
State: Power and Accountability in New Democracies. London: Lynne Rienner Publishers. pp. 13–28. ISBN 1-55587-773-7.
[6] Oxford English Dictionary 2nd Ed.
[7] Dubnick, Melvin (1998). "Clarifying Accountability: An Ethical Theory Framework". In Charles Sampford, Noel Preston and C. A. Bois.
Public Sector Ethics: Finding And Implementing Values. Leichhardt, NSW, Australia: The Federation Press/Routledge. pp. 68–81.
[8] Seidman, Gary I (Winter 2005). "The Origins of Accountability: Everything I Know About the Sovereign's Immunity, I Learned from King
Henry III". St. Louis University Law Journal 49 (2): 393–480.
[9] Walzer, Michael (1994). "The Legal Codes of Ancient Israel". In Ian Shapiro. The Rule of Law. NY: New York University Press.
pp. 101–119.
[10] Urch, Edwin J. (July 1929). "The Law Code of Hammurabi". American Bar Association Journal 15 (7): 437–441.
[11] Ezzamel, Mahmoud (December 1997). "Accounting, Control and Accountability: Preliminary Evidence from Ancient Egypt". Critical
Perspectives on Accounting 8 (6): 563–601. doi:10.1006/cpac.1997.0123.
[12] Roberts, Jennifer T. (1982). Accountability in Athenian Government. Madison, WI: University of Wisconsin Press.
[13] Plescia, Joseph (January 2001). "Judicial Accountability and Immunity in Roman Law". American Journal of Legal History (The American
Journal of Legal History, Vol. 45, No. 1) 45 (1): 51–70. doi:10.2307/3185349. JSTOR 3185349.
[14] Jabbra, J. G. and Dwivedi, O. P. (eds.), Public Service Accountability: A Comparative Perspective, Kumarian Press, Hartford, CT, 1989,
ISBN 0-7837-7581-4
[15] Communication praxis for ethical accountability (http:// www3.interscience.wiley. com/ journal/121356758/ abstract?CRETRY=1&
SRETRY=0) Laouris, Y., R. Laouri, and Aleco Christakis; Cyprus Neuroscience and Technology Institute, Cyprus; July 2008
[16] http:/ / www. springer.com/ social+ sciences/ book/ 978-0-306-46564-2?detailsPage=reviews
[17] http:// www. socresonline. org.uk/ 16/ 2/ reviews/ 3. html
[18] "oxford law - the faculty and its members : anne davies" (http:/ / www.competition-law.ox. ac.uk/ members/ profile.
phtml?lecturer_code=daviesa). Competition-law.ox.ac.uk. . Retrieved 2009-08-26.
[19] Harriman, Ed (2007-09-28). "Blackwater poisons the well" (http:// commentisfree.guardian.co. uk/ ed_harriman/2007/ 09/
blackwater_poisons_the_well. html). London: Commentisfree.guardian.co.uk. . Retrieved 2009-08-26.
[20] Scott, Marvin B.; Lyman, Stanford M. (February 1968). "Accounts". American Sociological Review (American Sociological Review, Vol.
33, No. 1) 33 (1): 46–62. doi:10.2307/2092239. JSTOR 2092239. PMID 5644339.
[21] Austin, J.L. 1956-7. A plea for excuses. Proceedings of the Aristotelian Society. Reprinted in J. O. Urmson & G. J. Warnock, eds., 1979, J.
L. Austin: Philosophical Papers, 3rd edition. Oxford: Clarendon Press, 175-204.
[22] Grant, Ruth W.; Keohane, Robert O. (2005). "Accountability and Abuses of Power in World Politics". American Political Science Review
99 (1): 29–43. doi:10.1017/S0003055405051476.
[23] http:// www. oneworldtrust.org/ index. php?option=com_docman&task=doc_download& gid=14&Itemid=55
[24] http:// www. oneworldtrust.org/ index. php?option=com_content&view=article&id=73& Itemid=60
[25] Greenberg, D. (1992), Education in America - A View from Sudbury Valley, "'Ethics' is a Course Taught By Life Experience." (http://
books.google. com/ books?id=YQn_BA76TF4C& pg=PA60& lpg=PA60& dq= Ethics +is+ a+Course+ Taught+By+ Life+
Experience,+ DANIEL+GREENBERG,+ + EDUCATION+IN+AMERICA,+ A+ View+ From+Sudbury+Valley& source=bl&
ots=Mg-gISVCwd& sig=k0nRX2sR8yRek3fp3ymUI_JRGTo& hl=en& ei=XVbKSf_uNNKrjAee57TPAw&sa=X& oi=book_result&
resnum=1&ct=result) Retrieved, 24 October 2009.
[26] Greenberg, D. (1987) The Sudbury Valley School Experience "Back to Basics - Moral basics." (http:// www.sudval. com/
05_underlyingideas. html#09) Retrieved, 24 October 2009.
[27] Feldman, J. (2001) "The Moral Behavior of Children and Adolescents at a Democratic School." Pdf. (http:// eric. ed. gov/ ERICWebPortal/
contentdelivery/servlet/ERICServlet?accno=ED453128) This study examined moral discourse, reflection, and development in a school
community with a process similar to that described by Lawrence Kohlberg. Data were drawn from an extensive set of field notes made in an
ethnographic study at Sudbury Valley School (an ungraded, democratically structured school in Framingham, MA), where students, ranging in
age from 4 to 19, are free to choose their own activities and companions. Vignettes were analyzed using grounded theory approach to
qualitative analysis, and themes were developed from an analysis of observations of meetings. Each theme describes a participation level that
students assume in the process and that provide opportunities for them to develop and deepen understanding of the balance of personal rights
and responsibilities within a community. The study adds to the understanding of education and child development by describing a school that
differs significantly in its practice from the wider educational community and by validating Kohlberg's thesis about developing moral
reasoning. Retrieved, 24 October 2009.
[28] The Sudbury Valley School (1970), "Law and Order: Foundations of Discipline" (http:// books.google.com/ books?id=MAqxzEss8k4C&
pg=PA49&dq=The+Crisis+ in+ American+Education+ +An+ Analysis+ and+a+ Proposal,+ The+Sudbury+Valley+School+ (1970),+
Law+ and+ Order:+Foundations+ of+Discipline) The Crisis in American Education — An Analysis and a Proposal. (http:/ / books.google.
com/ books?id=MAqxzEss8k4C& dq=The+Sudbury+Valley+School+ The+Crisis+ in+ American+Education+ +An+ Analysis+ and+
a+Proposal& printsec=frontcover&source=bl&ots=SxfbDj7JFo& sig=uIMBDP2jZ4zZQMRSh6L9a06Mm-I& hl=en&
ei=oiVVStLHJ4WInQP1zf0F&sa=X& oi=book_result&ct=result& resnum=1)(p. 49-55). Retrieved, 24 October 2009.
[29] Greenberg, D. (1992) "Democracy Must be Experienced to be Learned !" (http:// books. google.com/ books?id=YQn_BA76TF4C&
pg=PA103&lpg=PA103&dq=Democracy+Must+ be+Experienced+to+ be+Learned+!+Education+in+America+--+A+ View+from+
Sudbury+ Valley,+ Daniel+Greenberg&source=bl& ots=Mg0bzPQBuk& sig=0_8jBKqYW0PK8d66SZXYtJIX4ZQ& hl=en&
ei=1VwISsCIAYGlsAbd4birCA& sa=X& oi=book_result&ct=result& resnum=1#PPA103,M1) Education in America — A View from
Sudbury Valley. Retrieved, 24 October 2009.
[30] Reiss, S. (2010), Whatever Happened to Personal Responsibility? (http:/ / www.psychologytoday. com/ blog/ who-we-are/201006/
whatever-happened-personal-responsibility). Retrieved August 18, 2010.
[31] Frankl, Viktor Emil (1956) Man's Search for Meaning, p. 209-210.
[32] Warnock, C. (2005) "Statue of Responsibility," (http:/ / www. heraldextra.com/ news/ article_21e93a1f-94db-5533-9e5c-0560fff08972.
html) DAILY HERALD. Retrieved 24 October 2009.
• Hunt, G. ‘The Principle of Complementarity: Freedom of Information, Public Accountability & Whistleblowing’,
chap 5 in R A Chapman & M Hunt (eds) Open Government in a Theoretical and Practical Context. Ashgate,
Aldershot, 2006.
• Hunt, G. (ed) Whistleblowing in the Social Services: Public Accountability & Professional Practice, Arnold
(Hodder), 1998.
Further reading
• Sterling Harwood, "Accountability," in John K. Roth, ed., Ethics: Ready Reference (Salem Press, 1994), reprinted
in Sterling Harwood, ed., Business as Ethical and Business as Usual (Wadsworth Publishing Co., 1996).
• Romm, Norma RA (2001) Accountability in Social Research. New York: Springer. (http:// www. springer.com/
social+ sciences/ book/ 978-0-306-46564-2)
• Williams, Christopher (2006) Leadership accountability in a globalizing world. London: Palgrave Macmillan.
External links
• Citizens' Circle for Accountability (http:// www. accountabilitycircle.org)
Accuracy and precision
In the fields of science, engineering, industry and statistics, the accuracy
of a measurement system is the degree of
closeness of measurements of a quantity to that quantity's actual (true) value. The precision
of a measurement
system, also called reproducibility or repeatability, is the degree to which repeated measurements under unchanged
conditions show the same results.
Although the two words can be synonymous in colloquial use, they are
deliberately contrasted in the context of the scientific method.
Accuracy indicates the proximity of measurement results to the true value; precision indicates
the repeatability or reproducibility of the measurement.
A measurement system can be accurate but
not precise, precise but not accurate, neither,
or both. For example, if an experiment
contains a systematic error, then increasing
the sample size generally increases precision
but does not improve accuracy. The end
result would be a consistent yet inaccurate
string of results from the flawed experiment.
Eliminating the systematic error improves
accuracy but does not change precision.
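As a rough illustration, the following minimal Python sketch uses hypothetical numbers not drawn from the text above (a true value of 10.0, a constant systematic bias of 0.7 and random noise with standard deviation 0.05). Averaging many readings from such a flawed process leaves the bias untouched, while the small spread reflects high precision:

    import random
    import statistics

    TRUE_VALUE = 10.0  # hypothetical quantity being measured

    def measure(n, noise_sd, bias):
        """Simulate n readings with random noise (which limits precision)
        and a constant systematic offset (which limits accuracy)."""
        random.seed(0)
        return [TRUE_VALUE + bias + random.gauss(0, noise_sd) for _ in range(n)]

    readings = measure(n=1000, noise_sd=0.05, bias=0.7)
    print(statistics.mean(readings) - TRUE_VALUE)  # about 0.7: the bias remains (inaccurate)
    print(statistics.stdev(readings))              # about 0.05: small spread (precise)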
A measurement system is designated valid if
it is both accurate and precise. Related
terms include bias (non-random or directed
effects caused by a factor or factors unrelated to the independent variable) and error (random variability).
The terminology is also applied to indirect measurements--that is, values obtained by a computational procedure
from observed data.
In addition to accuracy and precision, measurements may also have a measurement resolution, which is the smallest
change in the underlying physical quantity that produces a response in the measurement.
In the case of full reproducibility, such as when rounding a number to a representable floating point number, the
word precision has a meaning not related to reproducibility. For example, in the IEEE 754-2008 standard it means
the number of bits in the significand, so it is used as a measure for the relative accuracy with which an arbitrary
number can be represented.
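As a small aside, this IEEE 754 sense of precision can be read off directly in Python, whose floats are binary64 ("double") values; a minimal sketch:

    import sys

    # "Precision" in the IEEE 754-2008 sense: the number of bits in the significand.
    print(sys.float_info.mant_dig)  # 53 for binary64 ("double precision") floats
    # The relative spacing of representable numbers implied by that precision:
    print(sys.float_info.epsilon)   # 2**-52, about 2.22e-16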
Accuracy versus precision: the target analogy
[Target diagrams: high accuracy but low precision; high precision but low accuracy.]
Accuracy is the degree of veracity while in some contexts precision may mean the
degree of reproducibility.
The analogy used here to explain the difference between accuracy and precision is the
target comparison. In this analogy, repeated measurements are compared to arrows that
are shot at a target. Accuracy describes the closeness of arrows to the bullseye at the
target center. Arrows that strike closer to the bullseye are considered more accurate. The
closer a system's measurements are to the accepted value, the more accurate the system is
considered to be.
To continue the analogy, if a large number of arrows are shot, precision would be the
size of the arrow cluster. (When only one arrow is shot, precision is the size of the
cluster one would expect if this were repeated many times under the same conditions.)
When all arrows are grouped tightly together, the cluster is considered precise since they
all struck close to the same spot, even if not necessarily near the bullseye. The
measurements are precise, though not necessarily accurate.
However, it is not possible to reliably achieve accuracy in individual measurements
without precision—if the arrows are not grouped close to one another, they cannot all be
close to the bullseye. (Their average position might be an accurate estimation of the
bullseye, but the individual arrows are inaccurate.) See also circular error probable for
application of precision to the science of ballistics.
Quantifying accuracy and precision
Ideally a measurement device is both accurate and precise, with measurements all close to and tightly clustered
around the known value. The accuracy and precision of a measurement process is usually established by repeatedly
measuring some traceable reference standard. Such standards are defined in the International System of Units and
maintained by national standards organizations such as the National Institute of Standards and Technology.
This also applies when measurements are repeated and averaged. In that case, the term standard error is properly
applied: the precision of the average is equal to the known standard deviation of the process divided by the square
root of the number of measurements averaged. Further, the central limit theorem shows that the probability
distribution of the averaged measurements will be closer to a normal distribution than that of individual measurements.
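A minimal Python sketch of this relationship, assuming a hypothetical process whose standard deviation is known to be 0.8 in the unit being measured:

    import math

    def precision_of_average(process_sd, n):
        """Standard error of the mean: the known standard deviation of the
        process divided by the square root of the number of measurements averaged."""
        return process_sd / math.sqrt(n)

    print(precision_of_average(0.8, 1))   # 0.8  -> a single reading
    print(precision_of_average(0.8, 25))  # 0.16 -> averaging 25 readings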
With regard to accuracy we can distinguish:
• the difference between the mean of the measurements and the reference value, the bias. Establishing and
correcting for bias is necessary for calibration.
• the combined effect of that and precision.
A common convention in science and engineering is to express accuracy and/or precision implicitly by means of
significant figures. Here, when not explicitly stated, the margin of error is understood to be one-half the value of the
last significant place. For instance, a recording of 843.6 m, or 843.0 m, or 800.0 m would imply a margin of 0.05 m
(the last significant place is the tenths place), while a recording of 8,436 m would imply a margin of error of 0.5 m
(the last significant digits are the units).
A reading of 8,000 m, with trailing zeroes and no decimal point, is ambiguous; the trailing zeroes may or may not be
intended as significant figures. To avoid this ambiguity, the number could be represented in scientific notation:
8.0 × 10³ m indicates that the first zero is significant (hence a margin of 50 m), while 8.000 × 10³ m indicates that all
three zeroes are significant, giving a margin of 0.5 m. Similarly, it is possible to use a multiple of the basic
measurement unit: 8.0 km is equivalent to 8.0 × 10³ m. In fact, it indicates a margin of 0.05 km (50 m). However,
reliance on this convention can lead to false precision errors when accepting data from sources that do not obey it.
Looking at this in another way, a value of 8 would mean that the measurement has been made with a precision of 1
(the measuring instrument was able to measure only down to the ones place) whereas a value of 8.0 (though
mathematically equal to 8) would mean that the value at the first decimal place was measured and was found to be
zero. (The measuring instrument was able to measure the first decimal place.) The second value is more precise.
Neither of the measured values may be accurate (the actual value could be 9.5 but measured inaccurately as 8 in both
instances). Thus, accuracy can be said to be the 'correctness' of a measurement, while precision could be identified as
the ability to resolve smaller differences.
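The significant-figures convention can also be applied mechanically. The sketch below is a hypothetical Python helper (not part of any standard library) that assumes every written digit of a reading is significant and returns half the value of the last significant place:

    from decimal import Decimal

    def implied_margin(reading):
        """Half the value of the last significant decimal place of a reading,
        assuming every digit written in the reading is significant."""
        exponent = Decimal(reading).as_tuple().exponent  # e.g. -1 for "843.6"
        return Decimal(5) * Decimal(10) ** (exponent - 1)

    print(implied_margin("843.6"))    # 0.05 (last significant place is the tenths)
    print(implied_margin("8436"))     # 0.5  (last significant place is the units)
    print(implied_margin("8.0E3"))    # 50   (the first zero is significant)
    print(implied_margin("8.000E3"))  # 0.5  (all three zeroes are significant)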
Precision is sometimes stratified into:
• Repeatability — the variation arising when all efforts are made to keep conditions constant by using the same
instrument and operator, and repeating during a short time period; and
• Reproducibility — the variation arising using the same measurement process among different instruments and
operators, and over longer time periods.
Accuracy and precision in binary classification
Accuracy is also used as a statistical measure of how well a binary classification test correctly identifies or excludes
a condition.
                          Condition as determined by Gold standard
                          True                 False
Test outcome Positive     True positive        False positive     → Positive predictive value
Test outcome Negative     False negative       True negative      → Negative predictive value
That is, the accuracy is the proportion of true results (both true positives and true negatives) in the population. It is a
parameter of the test.
On the other hand, precision is defined as the proportion of the true positives against all the positive results (both true
positives and false positives).
An accuracy of 100% means that the measured values are exactly the same as the given values.
Also see Sensitivity and specificity.
Accuracy may be determined from Sensitivity and Specificity, provided Prevalence is known, using the equation:
accuracy = (sensitivity × prevalence) + (specificity × (1 − prevalence))
The accuracy paradox for predictive analytics states that predictive models with a given level of accuracy may have
greater predictive power than models with higher accuracy. It may be better to avoid the accuracy metric in favor of
other metrics such as precision and recall. In situations where the minority class is more important, the F-measure may
be more appropriate, especially in situations with very skewed class imbalance. An alternative performance measure
that treats both classes with equal importance is "balanced accuracy", the mean of sensitivity and specificity:
balanced accuracy = (sensitivity + specificity) / 2
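The definitions in this section can be pulled together in a short Python sketch (the confusion-matrix counts below are hypothetical): accuracy and precision from the raw counts, accuracy recovered from sensitivity, specificity and prevalence, and balanced accuracy as the mean of sensitivity and specificity.

    def classification_metrics(tp, fp, fn, tn):
        """Test-accuracy measures computed from confusion-matrix counts."""
        total = tp + fp + fn + tn
        sensitivity = tp / (tp + fn)   # true positive rate
        specificity = tn / (tn + fp)   # true negative rate
        prevalence = (tp + fn) / total
        return {
            "accuracy": (tp + tn) / total,    # proportion of true results
            "precision": tp / (tp + fp),      # positive predictive value
            "accuracy_from_rates": sensitivity * prevalence + specificity * (1 - prevalence),
            "balanced_accuracy": (sensitivity + specificity) / 2,
        }

    # A skewed example: only 10 of 100 cases actually have the condition.
    print(classification_metrics(tp=6, fp=3, fn=4, tn=87))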
Accuracy and precision in psychometrics and psychophysics
In psychometrics and psychophysics, the term accuracy is interchangeably used with validity and constant error.
Precision is a synonym for reliability and variable error. The validity of a measurement instrument or psychological
test is established through experiment or correlation with behavior. Reliability is established with a variety of
statistical techniques, classically through an internal consistency test like Cronbach's alpha to ensure sets of related
questions have related responses, and then comparison of those related questions between reference and target populations.
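As an illustration of the internal-consistency idea, here is a minimal Python sketch of Cronbach's alpha; the item scores are hypothetical, and a real analysis would normally use a dedicated statistics package:

    import statistics

    def cronbach_alpha(items):
        """Cronbach's alpha for item-score columns (one list of scores per question):
        alpha = k/(k-1) * (1 - sum(item variances) / variance of total scores)."""
        k = len(items)
        totals = [sum(scores) for scores in zip(*items)]
        item_variance = sum(statistics.pvariance(column) for column in items)
        return k / (k - 1) * (1 - item_variance / statistics.pvariance(totals))

    # Three related questions answered by five respondents (made-up data).
    q1 = [4, 5, 3, 4, 2]
    q2 = [4, 4, 3, 5, 2]
    q3 = [5, 5, 2, 4, 3]
    print(round(cronbach_alpha([q1, q2, q3]), 2))  # about 0.89: related responses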
Accuracy and precision in logic simulation
In logic simulation, a common mistake in evaluation of accurate models is to compare a logic simulation model to a
transistor circuit simulation model. This is a comparison of differences in precision, not accuracy. Precision is
measured with respect to detail and accuracy is measured with respect to reality.

Accuracy and precision in information systems
The concepts of accuracy and precision have also been studied in the context of databases, information systems and
their sociotechnical context. The necessary extension of these two concepts on the basis of the theory of science suggests
that they (as well as data quality and information quality) should be centered on accuracy, defined as the closeness to
the true value, seen as the degree of agreement of readings or of calculated values of the same conceived entity,
measured or calculated by different methods, in the context of maximum possible disagreement.
[1] JCGM 200:2008 International vocabulary of metrology (http:/ / www.bipm. org/utils/ common/ documents/ jcgm/ JCGM_200_2008. pdf)
— Basic and general concepts and associated terms (VIM)
[2] John Robert Taylor (1999). An Introduction to Error Analysis: The Study of Uncertainties in Physical Measurements (http:/ / books. google.
com/ books?id=giFQcZub80oC& pg=PA128). University Science Books. pp. 128–129. ISBN 0-935702-75-X. .
[3] John M. Acken, Encyclopedia of Computer Science and Technology, Vol 36, 1997, page 281-306
[4] 1990 Workshop on Logic-Level Modelling for ASICS, Mark Glasser, Rob Mathews, and John M. Acken, SIGDA Newsletter, Vol 20.
Number 1, June 1990
[5] Ivanov, K. (1972). "Quality-control of information: On the concept of accuracy of information in data banks and in management information
systems" (http:/ / www.informatik.umu. se/ ~kivanov/ diss-avh. html). The University of Stockholm and The Royal Institute of Technology.
Doctoral dissertation. Further details are found in Ivanov, K. (1995). A subsystem in the design of informatics: Recalling an archetypal
engineer. In B. Dahlbom (Ed.), The infological equation: Essays in honor of Börje Langefors (http:// www.informatik.umu. se/ ~kivanov/
BLang80.html), (pp. 287-301). Gothenburg: Gothenburg University, Dept. of Informatics (ISSN 1101-7422).
External links
• BIPM - Guides in metrology (http:// www. bipm. org/en/ publications/ guides/ ) - Guide to the Expression of
Uncertainty in Measurement (GUM) and International Vocabulary of Metrology (VIM)
• "Beyond NIST Traceability: What really creates accuracy" (http:// img. en25.com/ Web/ Vaisala/ NIST-article.
pdf) - Controlled Environments magazine
• Precision and Accuracy with Three Psychophysical Methods (http:/ / www.yorku.ca/ psycho)
• Guidelines for Evaluating and Expressing the Uncertainty of NIST Measurement Results, Appendix D.1:
Terminology (http:// physics. nist. gov/ Pubs/ guidelines/ appd. 1. html)
• Accuracy and Precision (http:// digipac. ca/ chemical/ sigfigs/ contents. htm)
American Evaluation Association
The American Evaluation Association (AEA) is a professional association for evaluators and those with a
professional interest in the field of evaluation, including practitioners, faculty, students, funders, managers, and
government decision-makers. As of 2009, AEA has approximately 5700 members from all 50 US states and over 60
other countries.
The American Evaluation Association's mission is to:
• Improve evaluation practices and methods
• Increase evaluation use
• Promote evaluation as a profession and
• Support the contribution of evaluation to the generation of theory and knowledge about effective human action.
Guiding Principles for Evaluators
AEA publishes the AEA Guiding Principles for Evaluators, which set expectations for evaluators in the areas of:
(a) systematic inquiry, (b) competence, (c) integrity/honesty, (d) respect for people, and (e) responsibilities for
general and public welfare.
AEA sponsors two journals. The American Journal of Evaluation is published quarterly through SAGE Publications
and includes individually peer-reviewed articles on a range of topics in the field.
New Directions for Evaluation
is a peer-reviewed thematic sourcebook published quarterly through Jossey-Bass/Wiley, with each issue focusing on
a different topic or aspect of evaluation.
Topical Interest Groups
AEA has 41 topically-focused subgroups.
Each subgroup develops a strand of content for the association’s annual
conference, and works to build a community of practice through various means.
• Advocacy and Policy Change
• Alcohol, Drug Abuse, and Mental Health
• Assessment in Higher Education
• Business and Industry
• Cluster, Multi-Site and Multi-Level Evaluation
• Collaborative, Participatory & Empowerment Eval
• College Access Programs
• Costs, Effectiveness, Benefits, and Economics
• Crime and Justice
• Disaster and Emergency Management Evaluation
• Distance Education and Other Educational Tech
• Environmental Program Evaluation
• Evaluating the Arts and Culture
• Evaluation Managers and Supervisors
• Evaluation Use
• Extension Education Evaluation
• Feminist Issues in Evaluation
• Government Evaluation
• Graduate Student and New Evaluators
• Health Evaluation
• Human Services Evaluation
• Independent Consulting
• Indigenous Peoples in Evaluation
• Integrating Technology into Evaluation
• International and Cross-Cultural Evaluation
• Lesbian, Gay, Bisexual & Transgender Issues
• Multiethnic Issues in Evaluation
• Needs Assessment
• Non-Profits and Foundations Evaluation
• Organizational Learning and Evaluation Capacity Building
• Prek-12 Educational Evaluation
• Program Theory and Theory Driven Evaluation
• Qualitative Methods
• Quantitative Methods: Theory and Design
• Research on Evaluation
• Research, Technology, and Development Eval
• Social Work
• Special Needs Populations
• Systems in Evaluation
• Teaching of Evaluation
• Theories of Evaluation
Merger of ERS and ENet
In 1986, the Evaluation Research Society and the Evaluation Network merged to become the American Evaluation Association.
[1] AEA: About Us http:/ / www.eval. org/aboutus/ organization/aboutus. asp. Retrieved on 2009-05-08
[2] http:// www. eval. org/Publications/ GuidingPrinciples. asp
[3] SAGE: American Journal of Evaluation http:/ / www.sagepub. com/ journalsProdDesc.nav?prodId=Journal201729.Retrieved 2009-05-01
[4] http:/ / www. josseybass. com/ WileyCDA/ Section/ id-161092.html
[5] AEA: About Us http:// www.eval. org/aboutus/ organization/tigs. asp. Retrieved on 2009-05-08.
[6] Evaluation Practice, 1986 (7), 107-110) http:// aje. sagepub. com/ cgi/ reprint/7/ 1/ 107.Retrieved 2009-05-01
External links
• American Evaluation Association (http:// www. eval. org/ )
Australian Drug Evaluation Committee
The Australian Drug Evaluation Committee (ADEC) was a committee that provided independent scientific
advice to the Australian Government regarding therapeutic drugs. The committee was originally formed in 1963 and
more recently authorised under the Therapeutic Goods Act 1989 (Cth) as part of the Therapeutic Goods
Administration (TGA). In 2010, ADEC was replaced by the Advisory Committee on Prescription Medicines (ACPM).
ADEC provided advice to the Minister for Health and Ageing and the Secretary of the Department of Health on:
• quality, risk-benefit, effectiveness and accessibility of drugs referred to ADEC for evaluation
• medical and scientific evaluations of applications for registration of new drugs
An important role of ADEC was the classification of drugs in Australia into pregnancy categories.
The two main subcommittees of ADEC, which were responsible for specific aspects of drug regulation in Australia, were:
• the Adverse Drug Reactions Advisory Committee (ADRAC) (replaced in 2010 by the separate Advisory
Committee on the Safety of Medicines, ACSOM);
• the Pharmaceutical Subcommittee – which made recommendations to ADEC on the pharmaceutical aspects
(chemistry, quality control, pharmacokinetics, etc) of drugs proposed for registration (replaced by the
pharmaceutical subcommittee of the ACPM).
External links
• ADEC website
[1] http:/ / www. tga. gov. au/ committee/ acpm. htm
[2] http:/ / www. tga. gov. au/ docs/ html/ adec/ adec. htm
BAPCo consortium
BAPCo, Business Applications Performance Corporation, is a non-profit consortium with a charter to develop and
distribute a set of objective performance benchmarks for personal computers based on popular software applications
and operating systems.
BAPCo's current membership includes ARCintuition, Atheros Communications, CNET, Compal Electronics, Dell,
Hewlett-Packard, Intel, Lenovo, Microsoft, SAMSUNG, SanDisk, Seagate, Sony, Toshiba, VNU Business
Publications Limited (UK), ZDNet, and Ziff Davis.
On June 21, 2011, AMD announced it had resigned from the BAPCo organization after declining to endorse the
SYSmark 2012 benchmark. Nvidia and VIA quit as well.
External links
• BAPCo.com
[1] "AMD Will Not Endorse SYSmark 2012 Benchmark" (http:/ / www.amd. com/ us/ press-releases/ Pages/ amd-will-not-endorse-2011june21.
aspx). AMD. . Retrieved 2011-06-22.
[2] "Nvidia, AMD, and VIA quit BAPCO over SYSmark 2012" (http:// semiaccurate.com/ 2011/ 06/ 20/
nvidia-amd-and-via-quit-bapco-over-sysmark-2012/). SemiAccurate. . Retrieved 2011-06-22.
[3] http:// www. bapco. com/
Career portfolio
Career portfolios are used to plan, organize and document education, work samples and skills. People use career
portfolios to apply to jobs, apply to college or training programs, get a higher salary, show transferable skills, and to
track personal development. They are more in-depth than a resume, which is used to summarize the above in one or
two pages. Career portfolios serve as proof of one's skills, abilities, and potential in the future. Career portfolios are
becoming common in high schools, college, and workforce development. Many school programs will have students
create, update, and use a career portfolio before moving on to the next level in life.
Career portfolios help with a job or acceptance into higher education institutes. A career portfolio should be personal
and contain critical information. Items that should be included include (but are not limited to) personal information,
evaluations, sample work, and awards and acknowledgments. Career portfolios are often kept in a simple three-ring
binder or online as an Electronic portfolio and updated often. A career portfolio is used as a marketing tool in selling
oneself for personal advancement. In some industries, employers or admission offices commonly request a career
portfolio, so it is a wise idea to have an updated one on hand.
Online portfolio
In the 21st century, web technology has made its way into portfolios, especially in the digital workplace job
market. While traditional CV-style portfolios still dominate, it is common to back them up with a
website containing personal statements, contact details and experience.
Social portfolio web sites such as LinkedIn have become popular, as have services from websites which offer to host
portfolios for clients.
Résumé reels and demo tapes are a type of portfolio. They are used by many in the arts, such as musicians, actors,
artists and even journalists.
Creative professionals also look to portfolio websites for an exclusive online presence that presents their
work more professionally and elegantly.
A typical type of a portfolio is one used by artists. An artist's portfolio consists of artwork that the artist can take to
job interviews, conferences, galleries, and other networking opportunities to showcase his or her work, and give
others an idea of what type of genre the artist works in. Art Portfolios, sometimes called "artfolios", can be a variety
of sizes, and usually consist of approximately ten to twenty photographs of the artist's best works. Some artists create
multiple portfolios to showcase different styles and to apply for different types of jobs. For instance, one portfolio
may be mainly for doing technical illustrations and another may be for surreal painting or sculpture.
Models and actors often use electronic portfolios to showcase their career with a digital display of their photos,
biographies and skills. Talent portfolios can also include video.
Animators typically use demo reels, also known as "demo tapes", in place of or in addition to portfolios to
demonstrate their skills to potential employers. Demo tapes have historically been short VHS tapes, but most are
now DVDs. Demo reels are normally less than two minutes in length and showcase some of the animator's best work.
In industries that do not commonly use portfolios, a portfolio can be a way to stand out from the competition during
job interviews. For example, programmers can use portfolios in addition to a resume in order to showcase their best
work and highlight challenging projects.
[1] Baena, Carlos. "Top Five Things NOT to Do On Your Demo Reel" (http:/ / www. animationarena. com/ demo-reel.html). .
[2] "The Power of a Programming Portfolio" (http:// grokcode. com/ 58/ the-power-of-a-programming-portfolio/). .
CareerScope
CareerScope is a standardized and timed interest and aptitude assessment for career guidance. The system is widely
used in schools, job training programs and in rehabilitation agencies and has been validated against widely
recognized criteria. CareerScope delivers an objective assessment (as opposed to subjective self-assessment) that is
written at a fourth-grade reading level. The process is student or client self-administered and takes one hour
(self-timed - interest & aptitude assessments can be split into shorter sessions). The system generates counselor and
client/student report versions. Career recommendations can be generated that are consistent with the Guide for
Occupational Exploration, the Dictionary of Occupational Titles, O*NET as well as the U.S. DOE Career Clusters
and Pathways. It was designed and developed by the nonprofit Vocational Research Institute.
Specific aptitudes assessed by CareerScope
• General Learning Ability
• Verbal Aptitude
• Numerical Aptitude
• Spatial Aptitude
• Form Perception
• Clerical Perception
Specific interest categories assessed by CareerScope
Artistic . . . Plants/Animals . . . Mechanical . . . Business Detail . . . Accommodating . . . Lead/Influence . . .
Scientific . . . Protective . . . Industrial . . . Selling . . . Humanitarian . . . Physical Performing
External links
• Vocational Research Institute
• CareerScope Training Web Site
• Background and Structure of Career Clusters
[1] http:/ / www. vri.org
[2] http:/ / www. theworksuite. com/ careerscopetraining.html
[3] http:// www. theworksuite. com/ id30. html
CESG Claims Tested Mark
The CESG Claims Tested Mark (abbreviated as CCT Mark), formerly the CSIA Claims Tested Mark, is a UK
Government Standard for computer security.
The CCT Mark is based upon a framework in which vendors can make claims about the security attributes of their
products and/or services, and independent testing laboratories can evaluate the products/services to determine if they
actually meet the claims. In other words, the CCT Mark provides a quality assurance approach to validate whether the
implementation of a computer security product or service has been performed in an appropriate manner.
The CCT Mark was developed under the auspices of the UK Government's Central Sponsor for Information
Assurance (CSIA), which is part of the Cabinet Office's Intelligence, Security and Resilience (ISR) function. The
role of providing specialist input to the CCT Mark fell to CESG as the UK National Technical Authority (NTA) for
Information Security, which assumed responsibility for the scheme as a whole on 7 April 2008.
All Testing Laboratories must comply with ISO 17025, with the United Kingdom Accreditation Service (UKAS)
carrying out the accreditation.
The CCT Mark is often compared to the international Common Criteria (CC), which is simultaneously both correct
and incorrect:
• Both provide methods for achieving a measure of assurance of computer security products and systems
• Neither can provide a guarantee that approval means that no exploitable flaws exist, but rather both reduce the
likelihood of such flaws being present
• The Common Criteria is constructed in a layered manner, with multiple Evaluation Assurance Level (EAL)
specifications being available with increasing complexity, timescale and costs as the EAL number rises
• Common Criteria is supported by a Mutual Recognition Agreement (MRA), which, at the lower EAL numbers at
least, means that products tested in one country will normally be accepted in other markets
• The CCT Mark is aimed at the same market as the lower CC EAL numbers (currently EAL1/2), and has been
specifically designed for timescale and cost efficiency
CESG Claims Tested Mark
As of September 2010, CESG have announced that the product assurance element of the CCT Mark will be superseded by
the new Commercial Product Assurance (CPA) approach. It is unclear as yet whether the CCT Mark will remain in
existence for the assurance of Information Security services.
External links
• The official website of the CESG Claims Tested Mark
[1] FAQs About CCTM (http:/ / www.cctmark. gov. uk/ FAQs/ tabid/ 56/ Default.aspx)
[2] Central Sponsor for Information Assurance (CSIA) (http:/ / www. csia. gov.uk)
[3] http:// www. cctmark.gov. uk/
Commercial Product Assurance
Commercial Product Assurance (abbreviated as CPA) is (as of September 2010) an emergent UK Government
Standard for computer security.
It is intended to supplant other approaches such as Common Criteria (CC) and CCT Mark for UK government use.
CPA is being developed under the auspices of the UK Government's CESG
as the UK National Technical
Authority (NTA) for Information Security.
In comparison to other schemes:
• Unlike Common Criteria, there is no Mutual Recognition Agreement (MRA) for CPA, which means that products
tested in the UK will not normally be accepted in other markets
• Unlike the CCT Mark, the coverage of CPA is limited to Information Security products, and therefore excludes
services. The target audience for CPA also appears to be focused on Central Government ("I'm protecting
Government data")
rather than including the Wider Public Sector (WPS) and Critical National Infrastructure
(CNI) segments that were target customers for the CCT Mark.
[1] CESG Home Page (http:// www. cesg. gov. uk/ )
[2] CESG CPA Home Page (http:// www.cesg. gov. uk/ products_services/ iacs/ cpa/ index. shtml)
Common Criteria
The Common Criteria for Information Technology Security Evaluation (abbreviated as Common Criteria or
CC) is an international standard (ISO/IEC 15408) for computer security certification. It is currently in version 3.1.
Common Criteria is a framework in which computer system users can specify their security functional and assurance
requirements, vendors can then implement and/or make claims about the security attributes of their products, and
testing laboratories can evaluate the products to determine if they actually meet the claims. In other words, Common
Criteria provides assurance that the process of specification, implementation and evaluation of a computer security
product has been conducted in a rigorous and standard manner.
Key concepts
Common Criteria evaluations are performed on computer security products and systems.
• Target Of Evaluation (TOE) - the product or system that is the subject of the evaluation.
The evaluation serves to validate claims made about the target. To be of practical use, the evaluation must verify the
target's security features. This is done through the following:
• Protection Profile (PP) - a document, typically created by a user or user community, which identifies security
requirements for a class of security devices (for example, smart cards used to provide digital signatures, or
network firewalls) relevant to that user for a particular purpose. Product vendors can choose to implement
products that comply with one or more PPs, and have their products evaluated against those PPs. In such a case, a
PP may serve as a template for the product's ST (Security Target, as defined below), or the authors of the ST will
at least ensure that all requirements in relevant PPs also appear in the target's ST document. Customers looking
for particular types of products can focus on those certified against the PP that meets their requirements.
• Security Target (ST) - the document that identifies the security properties of the target of evaluation. It may
refer to one or more PPs. The TOE is evaluated against the SFRs (see below) established in its ST, no more and
no less. This allows vendors to tailor the evaluation to accurately match the intended capabilities of their product.
This means that a network firewall does not have to meet the same functional requirements as a database
management system, and that different firewalls may in fact be evaluated against completely different lists of
requirements. The ST is usually published so that potential customers may determine the specific security features
that have been certified by the evaluation.
• Security Functional Requirements (SFRs) - specify individual security functions which may be provided by a
product. The Common Criteria presents a standard catalogue of such functions. For example, an SFR may state
how a user acting in a particular role might be authenticated. The list of SFRs can vary from one evaluation to the
next, even if two targets are the same type of product. Although Common Criteria does not prescribe any SFRs to
be included in an ST, it identifies dependencies where the correct operation of one function (such as the ability to
limit access according to roles) is dependent on another (such as the ability to identify individual roles).
The evaluation process also tries to establish the level of confidence that may be placed in the product's security
features through quality assurance processes:
• Security Assurance Requirements (SARs) - descriptions of the measures taken during development and
evaluation of the product to assure compliance with the claimed security functionality. For example, an evaluation
may require that all source code is kept in a change management system, or that full functional testing is
performed. The Common Criteria provides a catalogue of these, and the requirements may vary from one
evaluation to the next. The requirements for particular targets or types of products are documented in the ST and
PP, respectively.
• Evaluation Assurance Level (EAL) - the numerical rating describing the depth and rigor of an evaluation. Each
EAL corresponds to a package of security assurance requirements (SARs, see above) which covers the complete
development of a product, with a given level of strictness. Common Criteria lists seven levels, with EAL 1 being
the most basic (and therefore cheapest to implement and evaluate) and EAL 7 being the most stringent (and most
expensive). Normally, an ST or PP author will not select assurance requirements individually but choose one of
these packages, possibly 'augmenting' requirements in a few areas with requirements from a higher level. Higher
EALs do not necessarily imply "better security", they only mean that the claimed security assurance of the TOE
has been more extensively verified.
So far, most PPs and most evaluated STs/certified products have been for IT components (e.g., firewalls, operating
systems, smart cards). Common Criteria certification is sometimes specified for IT procurement. Other standards
covering, e.g., interoperation, system management and user training supplement the CC and other product standards.
Examples include ISO/IEC 17799 (or, more properly, BS 7799-1, which is now ISO/IEC 27002) and the German IT-Grundschutz.
Details of cryptographic implementation within the TOE are outside the scope of the CC. Instead, national standards,
like FIPS 140-2 give the specifications for cryptographic modules, and various standards specify the cryptographic
algorithms in use.
CC originated out of three standards:
• ITSEC - The European standard, developed in the early 1990s by France, Germany, the Netherlands and the UK.
It too was a unification of earlier work, such as the two UK approaches (the CESG UK Evaluation Scheme aimed
at the defence/intelligence market and the DTI Green Book aimed at commercial use), and was adopted by some
other countries, e.g. Australia.
• CTCPEC - The Canadian standard followed from the US DoD standard, but avoided several problems and was
used jointly by evaluators from both the U.S. and Canada. The CTCPEC standard was first published in May 1993.
• TCSEC - The United States Department of Defense DoD 5200.28 Std, called the Orange Book and parts of the
Rainbow Series. The Orange Book originated from Computer Security work including the Ware Report, done by
the National Security Agency and the National Bureau of Standards (the NBS eventually became NIST) in the
late 1970s and early 1980s. The central thesis of the Orange Book follows from the work done by Dave Bell and
Len LaPadula for a set of protection mechanisms.
CC was produced by unifying these pre-existing standards, predominantly so that companies selling computer
products for the government market (mainly for Defence or Intelligence use) would only need to have them
evaluated against one set of standards. The CC was developed by the governments of Canada, France, Germany, the
Netherlands, the UK, and the U.S.
Testing organizations
All testing laboratories must comply with ISO 17025, and certification bodies will normally be approved against
either ISO/IEC Guide 65 or BS EN 45011.
The compliance with ISO 17025 is typically demonstrated to a National approval authority:
• In Canada, the Standards Council of Canada (SCC) accredits Common Criteria Evaluation Facilities
• In France, the comité français d’accréditation (COFRAC) accredits Common Criteria evaluation facilities,
commonly called Centres d’Evaluation de la Sécurité des Technologies de l’Information (CESTI). Evaluations are
done according to norms and standards specified by the Agence nationale de la sécurité des systemes
d’information (ANSSI).
• In the UK the United Kingdom Accreditation Service (UKAS) accredits Commercial Evaluation Facilities
• In the US, the National Institute of Standards and Technology (NIST) National Voluntary Laboratory
Accreditation Program (NVLAP) accredits Common Criteria Testing Laboratories (CCTL)
• In Germany, the Bundesamt für Sicherheit in der Informationstechnik (BSI)
Characteristics of these organizations were examined and presented at ICCC 10.
Mutual recognition arrangement
As well as the Common Criteria standard, there is also a sub-treaty level Common Criteria MRA (Mutual
Recognition Arrangement), whereby each party thereto recognizes evaluations against the Common Criteria standard
done by other parties. Originally signed in 1998 by Canada, France, Germany, the United Kingdom and the United
States, Australia and New Zealand joined 1999, followed by Finland, Greece, Israel, Italy, the Netherlands, Norway
and Spain in 2000. The Arrangement has since been renamed Common Criteria Recognition Arrangement
(CCRA) and membership continues to expand. Within the CCRA only evaluations up to EAL 4 are mutually
recognized (including augmentation with flaw remediation). The European countries within the former ITSEC
agreement typically recognize higher EALs as well. Evaluations at EAL5 and above tend to involve the security
requirements of the host nation's government.
List of Abbreviations
• CC: Common Criteria
• EAL: Evaluation Assurance Level
• IT: Information Technology
• PP: Protection Profile
• SF: Security Function
• SFP: Security Function Policy
• SOF: Strength of Function
• ST: Security Target
• TOE: Target of Evaluation
• TSP: TOE Security Policy
• TSF: TOE Security Functions
• TSC: TSF Scope of Control
• TSFI: TSF Interface
Common Criteria is very generic; it does not directly provide a list of product security requirements or features for
specific (classes of) products: this follows the approach taken by ITSEC, but has been a source of debate to those
used to the more prescriptive approach of other earlier standards such as TCSEC and FIPS 140-2.
Value of certification
If a product is Common Criteria certified, it does not necessarily mean it is completely secure. For example, various
Microsoft Windows versions, including Windows Server 2003 and Windows XP, have been certified at EAL4+
but regular security patches for security vulnerabilities are still published by Microsoft for these Windows systems.
This is possible because the process of obtaining a Common Criteria certification allows a vendor to restrict the
analysis to certain security features and to make certain assumptions about the operating environment and the
strength of threats, if any, faced by the product in that environment. In this case, the assumptions include A.PEER:
Any other systems with which the TOE communicates are assumed to
be under the same management control and operate under the same
security policy constraints. The TOE is applicable to networked or
distributed environments only if the entire network operates under the
same constraints and resides within a single management domain. There
are no security requirements that address the need to trust external
systems or the communications links to such systems.
as contained in the Controlled Access Protection Profile (CAPP)
to which their STs refer. Based on this and other
assumptions, which are not realistic for the common use of general-purpose operating systems, the claimed security
functions of the Windows products are evaluated. Thus they should only be considered secure in the assumed,
specified circumstances, also known as the evaluated configuration, specified by Microsoft.
Whether you run Microsoft Windows in the precise evaluated configuration or not, you should apply Microsoft's
security patches for the vulnerabilities in Windows as they continue to appear. If any of these security vulnerabilities
are exploitable in the product's evaluated configuration, the product's Common Criteria certification should be
voluntarily withdrawn by the vendor. Alternatively, the vendor should re-evaluate the product to include application
of patches to fix the security vulnerabilities within the evaluated configuration. Failure by the vendor to take either of
these steps would result in involuntary withdrawal of the product's certification by the certification body of the
country in which the product was evaluated.
The certified Microsoft Windows versions remain at EAL4+ without including the application of any Microsoft
security vulnerability patches in their evaluated configuration. This shows both the limitation and strength of an
evaluated configuration.
In August 2007, Government Computing News (GCN) columnist William Jackson critically examined Common
Criteria methodology and its US implementation by the Common Criteria Evaluation and Validation Scheme (CCEVS).
In the column, executives from the security industry, researchers, and representatives from the National
Information Assurance Partnership (NIAP) were interviewed. Objections outlined in the article include:
• Evaluation is a costly process (often measured in hundreds of thousands of US dollars) -- and the vendor's return
on that investment is not necessarily a more secure product
• Evaluation focuses primarily on assessing the evaluation documentation, not on the actual security, technical
correctness or merits of the product itself. For U.S. evaluations, only at EAL5 and higher do experts from the
National Security Agency participate in the analysis; and only at EAL7 is full source code analysis required.
• The effort and time necessary to prepare evaluation evidence and other evaluation-related documentation is so
cumbersome that by the time the work is completed, the product in evaluation is generally obsolete
• Industry input, including that from organizations such as the Common Criteria Vendor's Forum, generally has
little impact on the process as a whole
In a 2006 research paper, computer specialist David A. Wheeler suggested that the Common Criteria process
discriminates against Free and Open Source Software (FOSS)-centric organizations and development models.
Common Criteria assurance requirements tend to be inspired by the traditional waterfall software development
methodology. In contrast, much FOSS software is produced using modern agile paradigms. Although some have
argued that both paradigms do not align well,
others have attempted to reconcile both paradigms.
Alternative approaches
Throughout the lifetime of CC, it has not been universally adopted even by the creator nations, with, in particular,
cryptographic approvals being handled separately, such as by the Canadian / US implementation of FIPS-140, and
the CESG Assisted Products Scheme (CAPS)
in the UK.
The UK has also produced a number of alternative schemes when the timescales, costs and overheads of mutual
recognition have been found to be impeding the operation of the market:
• The CESG System Evaluation (SYSn) and Fast Track Approach (FTA) schemes for assurance of government
systems rather than generic products and services, which have now been merged into the CESG Tailored
Assurance Service (CTAS)
• The CESG Claims Tested Mark (CCT Mark), which is aimed at handling less exhaustive assurance requirements
for products and services in a cost and time efficient manner
In early 2011, NSA/CSS published a paper by Chris Salter, which proposed a Protection Profile oriented approach
towards evaluation. In this approach, communities of interest form around technology types which in turn develop
protection profiles that define the evaluation methodology for the technology type.
The objective is a more robust
evaluation. There is some concern that this may have a negative impact on mutual recognition.
[1] "The Common Criteria" (http:/ / www.commoncriteriaportal.org/thecc.html). .
[2] "Common Criteria Schemes Around the World" (http:// www.yourcreativesolutions.nl/ ICCC10/ proceedings/ doc/ pp/ Eve_Pierre.pdf). .
[3] http:// www. commoncriteriaportal.org/ members. html
[4] http:// www. nist. org/news. php?extend. 37
[5] http:/ / www. niap-ccevs. org/ cc-scheme/pp/ PP_OS_CA_V1. d.pdf
[6] Under Attack: Common Criteria has loads of critics, but is it getting a bum rap (http:// www.gcn.com/ print/ 26_21/ 44857-1.html)
Government Computer News, retrieved 2007-12-14
[7] Free-Libre / Open Source Software (FLOSS) and Software Assurance (http:// www.dwheeler.com/ essays/ oss_software_assurance. pdf)
[8] Wäyrynen, J., Bodén, M., and Boström, G., Security Engineering and eXtreme Programming: An Impossible Marriage?
[9] Beznosov, Konstantin (http:/ / konstantin. beznosov. net/ professional/ ) and Kruchten, Philippe, Towards Agile Security Assurance (http:/ /
lersse-dl. ece.ubc. ca/ record/87), retrieved 2007-12-14
[10] CAPS: CESG Assisted Products Scheme (http:// www.cesg. gov.uk/ site/ caps/ index.cfm)
[11] Infosec Assurance and Certification Services (IACS) (http:// www. cesg. gov.uk/ site/ iacs/ index.cfm?menuSelected=3&displayPage=3)
[12] "Common Criteria Reforms: Better Security Products Through Increased Cooperation with Industry" (http:// www. niap-ccevs.org/
cc_docs/ CC_Community_Paper_10_Jan_2011. pdf). .
[13] "Common Criteria "Reforms"—Sink or Swim-- How should Industry Handle the Revolution Brewing with Common Criteria?" (http:/ /
community.ca.com/ blogs/ iam/ archive/2011/ 03/ 11/
common-criteria-reforms-sink-or-swim-how-should-industry-handle-the-revolution-brewing-with-common-criteria.aspx). .
External links
• The official website of the Common Criteria Project (http://www.commoncriteriaportal.org/)
• The Common Criteria standard documents (http://www.niap-ccevs.org/cc-scheme/cc_docs/)
• Compliance evaluation in the United States (http://www.niap-ccevs.org/cc-scheme/)
• List of Common Criteria evaluated products (http://www.commoncriteriaportal.org/products.html)
• Towards Agile Security Assurance (http://lersse-dl.ece.ubc.ca/record/87)
• ISO/IEC 15408 (http://isotc.iso.org/livelink/livelink/fetch/2000/2489/Ittf_Home/PubliclyAvailableStandards.htm), available to download for free as a publicly available standard
• Important Common Criteria Acronyms (http://www.corsec.com/index.php?option=com_content&task=blogcategory&id=20&Itemid=65)
• Common Criteria Vendors Forum (http://www.ccvendorforum.org)
• Additional Common Criteria Information on Google Knol (http://knol.google.com/k/hussain-shah/common-criteria-iso-15408/3qj8qq7sspvdc/1)
Common Criteria Testing Laboratory
A Common Criteria Testing Laboratory (CCTL) is an information technology (IT) computer security testing laboratory that is accredited to conduct IT security evaluations for conformance to the Common Criteria international standard.
In the United States, the National Institute of Standards and Technology (NIST) National Voluntary Laboratory Accreditation Program (NVLAP) accredits CCTLs to meet National Information Assurance Partnership (NIAP) Common Criteria Evaluation and Validation Scheme (CCEVS) requirements and to conduct IT security evaluations for conformance to the Common Criteria.
CCTL requirements
These laboratories must meet the following requirements:
• NIST Handbook 150, NVLAP Procedures and General Requirements
• NIST Handbook 150-20, NVLAP Information Technology Security Testing — Common Criteria
• NIAP specific criteria for IT security evaluations and other NIAP defined requirements
CCTLs enter into contractual agreements with sponsors to conduct security evaluations of IT products and Protection Profiles using the CCEVS, other NIAP-approved test methods derived from the Common Criteria, the Common Methodology, and other technology-based sources. CCTLs must observe the highest standards of impartiality, integrity and commercial confidentiality, and must operate within the guidelines established by the CCEVS.
To become a CCTL, a testing laboratory must go through a series of steps that involve both the NIAP Validation
Body and NVLAP. NVLAP accreditation is the primary requirement for achieving CCTL status. Some scheme
requirements that cannot be satisfied by NVLAP accreditation are addressed by the NIAP Validation Body. At
present, there are only three scheme-specific requirements imposed by the Validation Body.
NIAP-approved CCTLs must agree to the following:
• Be located in the U.S. and be a legal entity, duly organized and incorporated, validly existing and in good standing under the laws of the state where the laboratory intends to do business
• Accept U.S. Government technical oversight and validation of evaluation-related activities in accordance with the
policies and procedures established by the CCEVS
• Accept U.S. Government participants in selected Common Criteria evaluations.
CCTL accreditation
A testing laboratory becomes a CCTL when it is approved by the NIAP Validation Body and listed on the Approved Laboratories List.
To avoid unnecessary expense and delay in becoming a NIAP-approved testing laboratory, prospective CCTLs are strongly advised to ensure that they can satisfy the scheme-specific requirements before seeking accreditation from NVLAP. This can be accomplished by sending a letter of intent to the NIAP prior to entering the NVLAP process.
Additional laboratory-related information can be found in CCEVS publications:
• #1 Common Criteria Evaluation and Validation Scheme for Information Technology Security — Organization,
Management, and Concept of Operations and Scheme Publication
• #4 Common Criteria Evaluation and Validation Scheme for Information Technology Security — Guidance to
Common Criteria Testing Laboratories
External links
• NIAP Common Criteria Evaluation and Validation Scheme
• Common Criteria Testing Laboratories
• The Common Criteria standard documents
• Common Criteria Recognition Agreement
• List of Common Criteria evaluated products
• ISO/IEC 15408, available free as a publicly available standard
[1] http://www.niap-ccevs.org/cctls/
[2] http://www.niap-ccevs.org/forms/ltr-of-intent.cfm
[3] http://www.niap-ccevs.org
[4] http://www.niap-ccevs.org/cctls
[5] http://www.niap-ccevs.org/cc_docs
[6] http://www.commoncriteriaportal.org
[7] http://www.commoncriteriaportal.org/products.html
[8] http://isotc.iso.org/livelink/livelink/fetch/2000/2489/Ittf_Home/PubliclyAvailableStandards.htm
Competency evaluation
In applied linguistics and educational psychology, competency evaluation is a means for teachers to determine the ability of their students in ways other than the standardized test.
Usually this includes portfolio assessment. In language testing, it may also include student interviews and checklists (e.g., on a scale from 1 to 5 the student or teacher rates his or her ability to perform such tasks as introducing oneself).
While various governments in Europe and Canada often employ competency evaluation to assess the bilingual abilities of public employees, it has only recently begun to receive attention in the United States, where large testing corporations have dominated language evaluation with financially lucrative performance tests such as the TOEFL, MCAT and SAT.
In psychology, competency can be evaluated to cover civil matters, such as competency to handle personal finances and competency to handle personal affairs (such as signing contracts).
With regard to psychology and the law, a competency evaluation is an assessment of whether a defendant is of sound mind to stand trial. The Dusky standard is the current U.S. standard of competence; it states that a person must be able to work with their defense counsel and have a rational as well as factual understanding of the trial and the charges brought against them.
[1] BehaveNet Clinical Capsule (http://behavenet.com/capsules/forensic/Duskystandard.htm)
Continuous assessment
Continuous assessment is an educational policy in which students are examined continuously over most of the duration of their education, and the results are taken into account after leaving school. It is often proposed or used as an alternative to a final examination system.
There are several types of continuous assessment, including daily in-class work, course-related projects and papers, and practical work.
[1] Types of Continuous Assessment (http://www.cdtl.nus.edu.sg/handbook/assess/types-cont.htm)
Cryptographic Module Testing Laboratory
A Cryptographic Module Testing Laboratory (CMTL) is an information technology (IT) computer security
testing laboratory that is accredited to conduct cryptographic module evaluations for conformance to the FIPS 140-2
U.S. Government standard.
The National Institute of Standards and Technology (NIST) National Voluntary Laboratory Accreditation Program
(NVLAP) accredits CMTLs to meet Cryptographic Module Validation Program (CMVP) standards and procedures.
CMTL requirements
These laboratories must meet the following requirements:
• NIST Handbook 150, NVLAP Procedures and General Requirements
• NIST Handbook 150-17 Information Technology Security Testing - Cryptographic Module Testing
• NVLAP Specific Operations Checklist for Cryptographic Module Testing
FIPS 140-2 in relation to the Common Criteria
A CMTL can also be a Common Criteria (CC) Testing Laboratory (CCTL). The CC and FIPS 140-2 differ in the abstractness and focus of their tests. FIPS 140-2 testing is against a defined cryptographic module and provides a suite of conformance tests to four FIPS 140 security levels. FIPS 140-2 describes the requirements for cryptographic modules and includes such areas as physical security, key management, self-tests, roles and services, etc. The standard was initially developed in 1994, prior to the development of the CC. The CC is an evaluation against a Protection Profile (PP), usually created by the user, or a Security Target (ST). Typically, a PP covers a broad range of products.
• A CC evaluation does not supersede or replace a validation to either FIPS 140-1 or FIPS 140-2. The four security
levels in FIPS 140-1 and FIPS 140-2 do not map directly to specific CC EALs or to CC functional requirements.
A CC certificate cannot be a substitute for a FIPS 140-1 or FIPS 140-2 certificate.
• If the operational environment is a modifiable operational environment, the operating system requirements of the Common Criteria are applicable at FIPS Security Levels 2 and above.
• FIPS 140-1 required evaluated operating systems that referenced the Trusted Computer System Evaluation
Criteria (TCSEC) classes C2, B1 and B2. However, TCSEC is no longer in use and has been replaced by the
Common Criteria. Consequently, FIPS 140-2 now references the Common Criteria.
External links
• List of CMTLs from NIST
[1] http://csrc.nist.gov/cryptval/
Defence Evaluation and Research Agency
The Defence Evaluation and Research Agency (normally known as DERA) was a part of the UK Ministry of
Defence (MoD) until July 2, 2001. At the time it was the United Kingdom's largest science and technology
organisation. DERA was split into two organisations: a commercial firm, QinetiQ, and the Defence Science and
Technology Laboratory (Dstl).
At the split, QinetiQ was formed from the majority (about 3/4 of the staff and most of the facilities) of DERA, with
Dstl assuming responsibility for those aspects which were best done in government. A few examples of the work
undertaken by Dstl include nuclear, chemical, and biological research. In the time since the split both organisations
have undergone significant change programmes. QinetiQ has increased its focus on overseas research with a number
of US and other foreign acquisitions, whereas Dstl has a major rationalisation programme aimed at changing many
aspects of its operations.
DERA was formed in April 1995 as an amalgamation of the following organisations:
• Defence Research Agency (DRA) which was set up in April 1991 and comprised the Royal Aerospace
Establishment (RAE); Admiralty Research Establishment (ARE); Royal Armament Research and Development
Establishment (RARDE); and, Royal Signals and Radar Establishment (RSRE)
• Defence Test and Evaluation Organisation (DTEO)
• Chemical and Biological Defence Establishment (CBDE at Porton Down), which became part of the Protection
and Life Sciences Division (PLSD)
• Centre for Defence Analysis (CDA).
The chief executive throughout DERA's existence was John Chisholm. DERA's staffing level was around 9000
scientists, technologists and support staff.
External links
• The former DERA website (Internet Archive link)
• QinetiQ website
• Dstl website
[1] http://www.dera.gov.uk
[2] http://web.archive.org/web/*/http://www.dera.gov.uk/
[3] http://www.QinetiQ.com
[4] http://www.dstl.gov.uk
Stewart Donaldson
Stewart I. Donaldson is a psychologist, specializing in evaluation science and optimal human and organizational
functioning. He holds appointments as Professor and Chair of Psychology, Director of the Institute of Organizational
and Program Evaluation Research,
and Dean of the School of Behavioral and Organizational Sciences, Claremont
Graduate University in Claremont, California.
He was born in West Bromwich, England on February 28, 1961 but was raised and educated in California. He
received a B.A. in Behavioral Science with a minor in Marketing Management from California State Polytechnic
University, Pomona, M.A. in General Experimental Psychology from California State University, Fullerton, and his
Ph.D. in Psychology specializing in Organizational Behavior and Evaluation Research from Claremont Graduate
University. Before joining the faculty at Claremont Graduate University in 1995, he was on the faculty at the
University of Southern California (USC), 1990-1995.
He works with a wide range of colleagues and graduate students on applied research and evaluation projects focused
on promoting optimal human, program, community, and organizational functioning. His work has been funded by
The National Institute of Mental Health; The National Institute on Alcohol Abuse and Alcoholism; National Science
Foundation; U.S. Department of Education; National Office of Justice Programs; Office of Juvenile Justice Planning;
Center for Substance Abuse Prevention; National Institute of Allergy and Infectious Diseases; The Rockefeller
Foundation; The California Wellness Foundation; The Howard Hughes Foundation; The David and Lucile Packard Foundation; The Hillcrest Foundation; The Weingart Foundation; The Robert Ellis Simon Foundation; The Irvine
Foundation; The Fletcher Jones Foundation; The John Randolph Haynes and Dora Haynes Foundation; Riverside
County Department of Mental Health; State of California Tobacco-Related Disease Research Program; and First 5
Los Angeles among many others.
As Director of the Institute of Organizational and Program Evaluation Research
he has provided organizational
consulting, research, and evaluation services to more than 100 different organizations during the past decade.
Professional activities
Professor Donaldson has been chair or member of more than 40 doctoral dissertation committees
at Claremont,
and he also works with professionals enrolled in the Non-residential Certificate of Advanced Study in Evaluation
program there. He is currently serving a 3 year term on the Board of the American Evaluation Association.  He also
serves on the Editorial Boards of the American Journal of Evaluation, New Directions for Evaluation, Journal of
Multidisciplinary Evaluation, and SAGE Research Methods Online, is co-founder and leads the Southern California
Evaluation Association,  and served as Co-Chair of the Theory-Driven Evaluation and Program Theory Topical
Interest Group of the American Evaluation Association (AEA) from 1994-2002. Dr. Donaldson was a 1996 recipient
of the AEA's Marcia Guttentag Early Career Achievement Award, in recognition of his work on theory and method
and for accomplishments in teaching and practice of program evaluation. In 2001, he was honored with Western
Psychological Association's Outstanding Research Award.
His work has appeared in a broad range of peer reviewed journals and books.
Selected Books
• Donaldson, Stewart I., Csikszentmihalyi, Mihaly, & Nakamura, Jeanne (2011).  Applied Positive Psychology:
Improving Everyday Life, Health, Schools, Work and Society.  Routledge Academic.
• Mark, Melvin, Donaldson, Stewart I., & Campbell, Bernadette (2011).  Social Psychology and Evaluation.
• Donaldson, Stewart I., Christie, Christina A., & Mark, Melvin M. (2008). What Counts as Credible Evidence in
Applied Research and Evaluation Practice? Newbury Park, CA: Sage.
ISBN 1-4129-5707-9
• Donaldson, Stewart I. (2007). Program Theory-Driven Evaluation Science: Strategies and Applications. Mahwah,
NJ: Erlbaum. ISBN 0-8058-4671-9
• Donaldson, Stewart I., Berger, Dale E., & Pezdek, Kathy (2006). Applied Psychology: New Frontiers and
Rewarding Careers. Mahwah, NJ: Erlbaum. ISBN 0-8058-5349-9
• Donaldson, Stewart I. & Scriven, Michael (2003). Evaluating Social Programs and Problems: Visions for the
New Millennium. Mahwah, NJ: Erlbaum. ISBN 0-8058-4185-7
Selected peer-reviewed journal articles
• Donaldson, Stewart I. & Ko, Ia. (2010).  Positive organizational psychology, behavior, and scholarship: A review
of the emerging literature and evidence base.  Journal of Positive Psychology, 5 (3), 177-191.
• LaVelle, John & Donaldson, Stewart I. (2010). University-based evaluation training programs in the United States
1980-2008: An empirical examination.  American Journal of Evaluation, 31 (1), 9-23.
• Preskill, Hallie, & Donaldson, Stewart I. (2008). Improving the evidence base for career development programs:
Making use of the evaluation profession and positive psychology movement. Advances in Developing Human
Resources, 10(1), 104-121.
• Donaldson, Stewart I. (2005). Using program theory-driven evaluation science to crack the Da Vinci Code. New
Directions for Evaluation, 106, 65-84.
• Donaldson, Stewart I., & Gooler, Laura E. (2003). Theory-driven evaluation in action: Lessons from a $20
million statewide work and health initiative. Evaluation and Program Planning, 26, 355-366.
• Donaldson, Stewart  I., Gooler, Laura E., & Scriven, Michael (2002).  Strategies for managing evaluation anxiety:
Toward a psychology of program evaluation.  American Journal of Evaluation, 23(3), 261-273.
• Donaldson, S.I., & Grant-Vallone, Elisa J. (2002). Understanding self-report bias in organizational behavior
research.  Journal of Business and Psychology, 17(2), 245-262.
[1] http://www.cgu.edu/pages/506.asp
[2] http://www.cgu.edu/pages/154.asp
[3] http://www.cgu.edu/pages/506.asp
[4] http://www.cgu.edu/pages/3238.asp
[5] WorldCat (http://www.worldcat.org/title/what-counts-as-credible-evidence-in-applied-research-and-evaluation-practice/oclc/226038201&referer=brief_results)
External links
• Faculty page at Claremont Graduate University (http://www.cgu.edu/pages/904.asp)
Ecological indicator
Ecological indicators are used to communicate information about ecosystems and the impact human activity has on
ecosystems to groups such as the public or government policy makers. Ecosystems are complex and ecological
indicators can help describe them in simpler terms that can be understood and used by non-scientists to make
management decisions. For example, the number of different beetle taxa found in a field can be used as an indicator
of biodiversity. 


Many different types of indicators have been developed. They can be used to reflect a variety of aspects of
ecosystems, including biological, chemical and physical. Due to this variety, the development and selection of
ecological indicators is a complex process.  
Using ecological indicators is a pragmatic approach, since directly documenting changes in ecosystems in relation to management measures is costly and time-intensive.

For example, it would be expensive and time consuming
to count every bird, plant and animal in a newly restored wetland to see if the restoration was a success. Instead a
few indicator species can be monitored to determine success of the restoration.
“It is difficult and often even impossible to characterize the functioning of a complex system, such as an
eco-agrosystem, by means of direct measurements. The size of the system, the complexity of the interactions
involved, or the difficulty and cost of the measurements needed are often crippling” 
The terms ecological indicator and environmental indicator are often used interchangeably. However, ecological
indicators are actually a sub-set of environmental indicators. Generally, environmental indicators provide
information on pressures on the environment, environmental conditions and societal responses. Ecological indicators
refer only to ecological processes.
Policy evaluation
Ecological indicators play an important role in evaluating policy regarding the environment.
Indicators contribute to evaluation of policy development by:
• Providing decision-makers and the general public with relevant information on the current state of, and trends in, the environment.
• Helping decision-makers better understand the cause-and-effect relationships between the choices and practices of businesses and policy-makers, on the one hand, and the environment, on the other.
• Assisting in monitoring and assessing the effectiveness of measures taken to increase and enhance ecological goods and services.
Based on the United Nations Convention to Combat Desertification and the Convention on Biological Diversity, indicators are planned in order to evaluate the evolution of the relevant factors. For instance, for the CCD, the UNESCO-funded Observatoire du Sahara et du Sahel (OSS) has created the Réseau d'Observatoires du Sahara et du Sahel (ROSELT) (website [1]) as a network of cross-Saharan observatories to establish ecological indicators.
There are limitations and challenges to using indicators for evaluating policy programs.
For indicators to be useful for policy analysis, it is necessary to be able to use and compare indicator results on
different scales (local, regional, national and international). Currently, indicators face the following spatial
limitations and challenges:
1. Variable availability of data and information on local, regional and national scales.
2. Lack of methodological standards on an international scale.
3. Different ranking of indicators on an international scale which can result in different legal treatment.
4. Averaged values across a national level may hide regional and local trends.
5. When compiled, local indicators may be too diverse to provide a national result.
Indicators also face other limitations and challenges, such as:
1. Lack of reference levels, so it is unknown whether observed trends in environmental change are strong or weak.
2. Indicator measures can overlap, causing overestimation of single parameters.
3. Long-term monitoring is necessary to identify long-term environmental changes.
4. Attention to more easily measured indicators distracts from less quantifiable indicators, such as aesthetics, ethics or cultural values.
1. Bertollo, P. (1998). "Assessing ecosystem health in governed landscapes: A framework for developing core
indicators". Ecosystem Health 4: 33–51. doi:10.1046/j.1526-0992.1998.00069.x.
2. Girardin, P., Bockstaller, C. & Van der Werf, H. (1999). "Indicators: Tools to evaluate the environmental impacts
of farming systems". Journal of Sustainable Agriculture 13 (4): 6–21. doi:10.1300/J064v13n04_03.
3. Kurtz, J.C., Jackson, L.E. & Fisher, W.S. (2001). "Strategies for evaluating indicators based on guidelines from the Environmental Protection Agency’s Office of Research and Development". Ecological Indicators 1: 49–60.
4. Niemeijer, D. (2002). "Developing indicators for environmental policy: data-driven and theory-driven approaches
examined by example". Environmental Science and Policy 5 (2): 91–103. doi:10.1016/S1462-9011(02)00026-6.
5. Osinski, E., Meier, U., Büchs, W., Weickel, J., & Matzdorf, B. (2003). "Application of biotic indicators for
evaluation of sustainable land use – current procedures and future developments". Agriculture, Ecosystems and
Environment 98: 407–421. doi:10.1016/S0167-8809(03)00100-2.
6. Piorr, H.P. (2003). "Environmental policy, agri-environmental indicators and landscape indicators". Agriculture,
Ecosystems and Environment 98: 17–33. doi:10.1016/S0167-8809(03)00069-0.
External links
• Journal of Political Ecology
• Journals of the British Ecological Society
• Institute of Ecology and Environmental Management
• Ecology and Society
• Ecology Consultants at the Open Directory Project
• U.S. EPA's Report on the Environment
[1] http://www.roselt-oss.teledection.fr
[2] http://dizzy.library.arizona.edu/ej/jpe/
[3] http://www.britishecologicalsociety.org/articles/publications/journals/
[4] http://www.ieem.org.uk/
[5] http://www.ecologyandsociety.org/
[6] http://www.dmoz.org/Science/Biology/Ecology/Consultants/
[7] http://www.epa.gov/roe/
Educational assessment
Educational assessment is the process of documenting, usually in measurable terms, knowledge, skills, attitudes
and beliefs. Assessment can focus on the individual learner, the learning community (class, workshop, or other
organized group of learners), the institution, or the educational system as a whole. According to the Academic
Exchange Quarterly: "Studies of a theoretical or empirical nature (including case studies, portfolio studies,
exploratory, or experimental work) addressing the assessment of learner aptitude and preparation, motivation and
learning styles, learning outcomes in achievement and satisfaction in different educational contexts are all welcome,
as are studies addressing issues of measurable standards and benchmarks".
It is important to note that the final purposes and assessment practices in education depend on the theoretical framework of the practitioners and researchers, their assumptions and beliefs about the nature of the human mind, the origin of knowledge, and the process of learning.
Alternate meanings
According to the Merriam-Webster online dictionary the word assessment comes from the root word assess which is
defined as:
1. to determine the rate or amount of (as a tax)
2. to impose (as a tax) according to an established rate, or to subject to a tax, charge, or levy
3. to make an official valuation of (property) for the purposes of taxation
4. to determine the importance, size, or value of (assess a problem)
5. to charge (a player or team) with a foul or penalty
Assessment in education is best described as an action "to determine the importance, size, or value of."
The term assessment is generally used to refer to all activities teachers use to help students learn and to gauge
student progress.
Though the notion of assessment is generally more complicated than the following categories
suggest, assessment is often divided for the sake of convenience using the following distinctions:
1. formative and summative
2. objective and subjective
3. referencing (criterion-referenced, norm-referenced, and ipsative)
4. informal and formal.
Formative and summative
Assessment is often divided into formative and summative categories for the purpose of considering different
objectives for assessment practices.
• Summative assessment - Summative assessment is generally carried out at the end of a course or project. In an
educational setting, summative assessments are typically used to assign students a course grade. Summative
assessments are evaluative.
• Formative assessment - Formative assessment is generally carried out throughout a course or project. Formative
assessment, also referred to as "educative assessment," is used to aid learning. In an educational setting, formative
assessment might be a teacher (or peer) or the learner, providing feedback on a student's work, and would not
necessarily be used for grading purposes. Formative assessments are diagnostic.
Educational researcher Robert Stake
explains the difference between formative and summative assessment with
the following analogy:
When the cook tastes the soup, that's formative. When the guests taste the soup, that's summative.
Summative and formative assessment are often referred to in a learning context as assessment of learning and
assessment for learning respectively. Assessment of learning is generally summative in nature and intended to
measure learning outcomes and report those outcomes to students, parents, and administrators. Assessment of
learning generally occurs at the conclusion of a class, course, semester, or academic year. Assessment for learning is
generally formative in nature and is used by teachers to consider approaches to teaching and next steps for individual
learners and the class.
A common form of formative assessment is diagnostic assessment. Diagnostic assessment measures a student's
current knowledge and skills for the purpose of identifying a suitable program of learning. Self-assessment is a form
of diagnostic assessment which involves students assessing themselves. Forward-looking assessment asks those
being assessed to consider themselves in hypothetical future situations.
Performance-based assessment is similar to summative assessment, as it focuses on achievement. It is often aligned
with the standards-based education reform and outcomes-based education movement. Though ideally they are
significantly different from a traditional multiple choice test, they are most commonly associated with
standards-based assessment which use free-form responses to standard questions scored by human scorers on a
standards-based scale, meeting, falling below, or exceeding a performance standard rather than being ranked on a
curve. A well-defined task is identified and students are asked to create, produce, or do something, often in settings
that involve real-world application of knowledge and skills. Proficiency is demonstrated by providing an extended
response. Performance formats are further differentiated into products and performances. The performance may
result in a product, such as a painting, portfolio, paper, or exhibition, or it may consist of a performance, such as a
speech, athletic skill, musical recital, or reading.
Objective and subjective
Assessment (either summative or formative) is often categorized as either objective or subjective. Objective
assessment is a form of questioning which has a single correct answer. Subjective assessment is a form of
questioning which may have more than one correct answer (or more than one way of expressing the correct answer).
There are various types of objective and subjective questions. Objective question types include true/false answers,
multiple choice, multiple-response and matching questions. Subjective questions include extended-response
questions and essays. Objective assessment is well suited to the increasingly popular computerized or online
assessment format.
Some have argued that the distinction between objective and subjective assessments is neither useful nor accurate because, in reality, there is no such thing as "objective" assessment. In fact, all assessments are created with inherent biases built into decisions about relevant subject matter and content, as well as cultural (class, ethnic, and gender) biases.
Basis of comparison
Test results can be compared against an established criterion, or against the performance of other students, or against
previous performance:
Criterion-referenced assessment, typically using a criterion-referenced test, as the name implies, occurs when
candidates are measured against defined (and objective) criteria. Criterion-referenced assessment is often, but not
always, used to establish a person's competence (whether s/he can do something). The best known example of
criterion-referenced assessment is the driving test, when learner drivers are measured against a range of explicit
criteria (such as "Not endangering other road users").
Norm-referenced assessment (colloquially known as "grading on the curve"), typically using a norm-referenced test,
is not measured against defined criteria. This type of assessment is relative to the student body undertaking the
assessment. It is effectively a way of comparing students. The IQ test is the best known example of norm-referenced
assessment. Many entrance tests (to prestigious schools or universities) are norm-referenced, permitting a fixed
proportion of students to pass ("passing" in this context means being accepted into the school or university rather
than an explicit level of ability). This means that standards may vary from year to year, depending on the quality of
the cohort; criterion-referenced assessment does not vary from year to year (unless the criteria change).
Ipsative assessment is self-comparison, either within the same domain over time or across other domains within the same student.
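As a purely illustrative aside (a minimal sketch, not part of the source article; the student names, scores, passing mark, and earlier attempt are all invented), the following Python snippet contrasts the three bases of comparison for a single observed score:

```python
# Purely illustrative sketch: one hypothetical cohort of test scores.
cohort = {"Ana": 72, "Ben": 88, "Cho": 65, "Dev": 91, "Eli": 70}

# Criterion-referenced: compare Ana's score against a fixed, explicit standard.
passing_mark = 70                      # invented criterion
criterion_result = "pass" if cohort["Ana"] >= passing_mark else "fail"

# Norm-referenced: compare Ana against the rest of the cohort ("grading on the curve").
ranked = sorted(cohort, key=cohort.get, reverse=True)
norm_result = f"ranked {ranked.index('Ana') + 1} of {len(ranked)}"

# Ipsative: compare Ana against her own earlier performance.
previous_score = 64                    # invented earlier attempt
ipsative_result = f"{cohort['Ana'] - previous_score:+d} points vs. her previous attempt"

print(criterion_result, "|", norm_result, "|", ipsative_result)
# pass | ranked 3 of 5 | +8 points vs. her previous attempt
```

The same observed score of 72 yields three different judgments depending on the basis of comparison, which is the point of distinguishing them.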
Informal and formal
Assessment can be either formal or informal. Formal assessment usually implies a written document, such as a test, quiz, or paper. A formal assessment is given a numerical score or grade based on student performance, whereas an informal assessment does not contribute to a student's final grade. An informal assessment usually occurs in a more casual manner and may include observation, inventories, checklists, rating scales, rubrics, performance and portfolio assessments, participation, peer and self-evaluation, and discussion.
Internal and external
Internal assessment is set and marked by the school (i.e. teachers). Students get the mark and feedback regarding the
assessment. External assessment is set by the governing body and is marked by non-biased personnel. With external assessment, students only receive a mark; therefore, they have no idea how they actually performed (i.e., which questions they answered correctly).
Standards of quality
In general, high-quality assessments are considered those with a high level of reliability and validity. Approaches to
reliability and validity vary, however.
Reliability relates to the consistency of an assessment. A reliable assessment is one which consistently achieves the
same results with the same (or similar) cohort of students. Various factors affect reliability—including ambiguous
questions, too many options within a question paper, vague marking instructions and poorly trained markers.
Traditionally, the reliability of an assessment is based on the following:
1. Temporal stability: Performance on a test is comparable on two or more separate occasions.
2. Form equivalence: Performance among examinees is equivalent on different forms of a test based on the same content.
3. Internal consistency: Responses on a test are consistent across questions. For example, in a survey that asks respondents to rate attitudes toward technology, consistency would be expected in responses to the following two statements:
• "I feel very negative about computers in general."
• "I enjoy using computers."
Reliability can also be expressed in mathematical terms as Rx = Vt/Vx, where Rx is the reliability of the observed (test) score X, and Vt and Vx are the variances of the 'true' score (i.e., the candidate's innate performance) and the measured test score respectively. Rx can range from 0 (completely unreliable) to 1 (completely reliable). An Rx of 1 is rarely achieved, and an Rx of 0.8 is generally considered reliable.
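To make that ratio concrete, here is a minimal hypothetical sketch (not from the source; the score scale, cohort size, and error spread are invented) that simulates an observed score as a true score plus independent error and then computes Rx:

```python
# Purely illustrative sketch of Rx = Vt / Vx under the classical model X = T + E.
import random

random.seed(0)

true_scores = [random.gauss(70, 10) for _ in range(10_000)]   # T: candidates' "true" performance
observed = [t + random.gauss(0, 5) for t in true_scores]      # X: observed test scores (T plus error)

def variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

v_t = variance(true_scores)   # Vt
v_x = variance(observed)      # Vx
reliability = v_t / v_x       # Rx: 0 = completely unreliable, 1 = completely reliable

print(f"Rx = {reliability:.2f}")   # about 10**2 / (10**2 + 5**2) = 0.80
```

With a true-score spread of 10 and an error spread of 5, the ratio comes out near 100/125 = 0.8, the level the text describes as generally reliable.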
A valid assessment is one which measures what it is intended to measure. For example, it would not be valid to
assess driving skills through a written test alone. A more valid way of assessing driving skills would be through a
combination of tests that help determine what a driver knows, such as through a written test of driving knowledge,
and what a driver is able to do, such as through a performance assessment of actual driving. Teachers frequently
complain that some examinations do not properly assess the syllabus upon which the examination is based; they are,
effectively, questioning the validity of the exam.
Validity of an assessment is generally gauged through examination of evidence in the following categories:
1. Content – Does the content of the test measure stated objectives?
2. Criterion – Do scores correlate to an outside reference? (ex: Do high scores on a 4th grade reading test accurately
predict reading skill in future grades?)
3. Construct – Does the assessment correspond to other significant variables? (ex: Do ESL students consistently
perform differently on a writing exam than native English speakers?)
4. Face – Does the item or theory make sense, and is it seemingly correct to the expert reader?
A good assessment has both validity and reliability, plus the other quality attributes noted above for a specific
context and purpose. In practice, an assessment is rarely totally valid or totally reliable. A ruler which is marked
wrong will always give the same (wrong) measurements. It is very reliable, but not very valid. Asking random
individuals to tell the time without looking at a clock or watch is sometimes used as an example of an assessment
which is valid, but not reliable. The answers will vary between individuals, but the average answer is probably close
to the actual time. In many fields, such as medical research, educational testing, and psychology, there will often be a
trade-off between reliability and validity. A history test written for high validity will have many essay and
fill-in-the-blank questions. It will be a good measure of mastery of the subject, but difficult to score completely
accurately. A history test written for high reliability will be entirely multiple choice. It isn't as good at measuring
knowledge of history, but can easily be scored with great precision. We may generalize from this. The more reliable
our estimate is of what we purport to measure, the less certain we are that we are actually measuring that aspect of
attainment. It is also important to note that there are at least thirteen sources of invalidity which can, in principle, be estimated for individual students in test situations; in practice they never are, perhaps because the social purpose of these assessments demands the absence of any error, and because validity errors are usually so high that acknowledging them would destabilize the whole assessment process.
It is well to distinguish between "subject-matter" validity and "predictive" validity. The former, used widely in
education, predicts the score a student would get on a similar test but with different questions. The latter, used
widely in the workplace, predicts performance. Thus, a subject-matter-valid test of knowledge of driving rules is
appropriate while a predictively-valid test would assess whether the potential driver could follow those rules.
Testing standards
In the field of psychometrics, the Standards for Educational and Psychological Testing place standards about validity and reliability, along with errors of measurement and related considerations, under the general topic of test construction, evaluation and documentation. The second major topic covers standards related to fairness in testing,
including fairness in testing and test use, the rights and responsibilities of test takers, testing individuals of diverse
linguistic backgrounds, and testing individuals with disabilities. The third and final major topic covers standards
related to testing applications, including the responsibilities of test users, psychological testing and assessment,
educational testing and assessment, testing in employment and credentialing, plus testing in program evaluation and
public policy.
Evaluation standards
In the field of evaluation, and in particular educational evaluation, the Joint Committee on Standards for Educational Evaluation has published three sets of standards for evaluations. The Personnel Evaluation Standards was published in 1988, The Program Evaluation Standards (2nd edition) was published in 1994, and The Student Evaluation Standards was published in 2003.
Each publication presents and elaborates a set of standards for use in a variety of educational settings. The standards
provide guidelines for designing, implementing, assessing and improving the identified form of evaluation. Each of
the standards has been placed in one of four fundamental categories to promote educational evaluations that are
proper, useful, feasible, and accurate. In these sets of standards, validity and reliability considerations are covered
under the accuracy topic. For example, the student accuracy standards help ensure that student evaluations will
provide sound, accurate, and credible information about student learning and performance.
Summary table of the main theoretical frameworks
The following summarizes the main theoretical frameworks behind almost all the theoretical and research work, and the instructional practices, in education (one of them being, of course, the practice of assessment). These different frameworks have given rise to interesting debates among scholars.

Empiricist (behaviorist) framework
• Philosophical roots: Hume (British empiricism)
• Guiding metaphor: mechanistic; the operation of a machine or computer
• Representative theorists: B. F. Skinner (behaviorism); Herb Simon, John Anderson, Robert Gagné
• Nature of mind: initially a blank device that detects patterns in the world and operates on them; qualitatively identical to lower animals, but quantitatively superior.
• Nature of knowledge: hierarchically organized associations that present an accurate but incomplete representation of the world. Assumes that the sum of the components of knowledge is the same as the whole. Because knowledge is accurately represented by components, one who demonstrates those components is presumed to know.
• Nature of learning (the process by which knowledge is increased or modified): forming and strengthening cognitive or S-R associations; generation of knowledge by (1) exposure to a pattern, (2) efficiently recognizing and responding to the pattern, and (3) recognizing the pattern in other contexts.
• Features of assessment: assess knowledge components; focus on mastery of many components and on fluency; use psychometrics to standardize.

Rationalist (cognitivist) framework
• Philosophical roots: Kant, Descartes (Continental rationalism)
• Guiding metaphor: organismic; the growth of a plant
• Representative theorists: Jean Piaget, Robbie Case
• Nature of mind: an organ that evolved to acquire knowledge by making sense of the world; uniquely human, qualitatively different from lower animals.
• Nature of knowledge: general and/or specific cognitive and conceptual structures, constructed by the mind according to rational criteria; essentially the higher-level structures that are constructed to assimilate new information to existing structures and that accommodate as more new information arrives. Knowledge is represented by the ability to solve new problems.
• Nature of learning: engaging in an active process of making sense of ("rationalizing") the environment; the mind applies existing structure to new experience in order to rationalize it. One does not really learn the components, only the structures needed to deal with those components later.
• Features of assessment: assess extended performance on new problems; credit varieties of excellence.

Sociocultural (situative) framework
• Philosophical roots: Hegel, Marx (cultural dialectic)
• Guiding metaphor: contextualist; the examination of a historical event
• Representative theorists: Lev Vygotsky, Luria, Bruner; Alan Collins, Jim Greeno, Ann Brown, John Bransford
• Nature of mind: unique among species for developing language, tools, and education.
• Nature of knowledge: distributed across people, communities, and the physical environment; it represents the culture of the community that continues to create it. To know means to be attuned to the constraints and affordances of the systems in which activity occurs; knowledge is represented in the regularities of successful activity.
• Nature of learning: increasing ability to participate in a particular community of practice; initiation into the life of a group, strengthening the ability to participate by becoming attuned to its constraints and affordances.
• Features of assessment: assess participation in inquiry and in the social practices of learning (e.g. portfolios, observations); students should participate in the assessment process, and assessments should be integrated into the larger environment.
Concerns over how best to apply assessment practices across public school systems have largely focused on
questions about the use of high stakes testing and standardized tests, often used to gauge student progress, teacher
quality, and school-, district-, or state-wide educational success.
No Child Left Behind
For most researchers and practitioners, the question is not whether tests should be administered at all—there is a
general consensus that, when administered in useful ways, tests can offer useful information about student progress
and curriculum implementation, as well as offering formative uses for learners.
The real issue, then, is whether
testing practices as currently implemented can provide these services for educators and students.
In the U.S., the No Child Left Behind Act mandates standardized testing nationwide. These tests align with state
curriculum and link teacher, student, district, and state accountability to the results of these tests. Proponents of
NCLB argue that it offers a tangible method of gauging educational success, holding teachers and schools
accountable for failing scores, and closing the achievement gap across class and ethnicity.
Opponents of standardized testing dispute these claims, arguing that holding educators accountable for test results
leads to the practice of "teaching to the test." Additionally, many argue that the focus on standardized testing
encourages teachers to equip students with a narrow set of skills that enhance test performance without actually
fostering a deeper understanding of subject matter or key principles within a knowledge domain.
High-stakes testing
The assessments which have caused the most controversy in the U.S. are the use of high school graduation
examinations, which are used to deny diplomas to students who have attended high school for four years, but cannot
demonstrate that they have learned the required material. Opponents say that no student who has put in four years of
seat time should be denied a high school diploma merely for repeatedly failing a test, or even for not knowing the
required material.
High-stakes tests have been blamed for causing sickness and test anxiety in students and teachers, and for teachers
choosing to narrow the curriculum towards what the teacher believes will be tested. In an exercise designed to make
children comfortable about testing, a Spokane, Washington newspaper published a picture of a monster that feeds on fear. The published image is purportedly the response of a student who was asked to draw a picture of what she
thought of the state assessment.
Other critics, such as Washington State University's Don Orlich, question the use of test items far beyond standard
cognitive levels for students' age.
Compared to portfolio assessments, simple multiple-choice tests are much less expensive, less prone to disagreement
between scorers, and can be scored quickly enough to be returned before the end of the school year. Standardized
tests (all students take the same test under the same conditions) often use multiple-choice tests for these reasons.
Orlich criticizes the use of expensive, holistically graded tests, rather than inexpensive multiple-choice "bubble tests", to measure the quality of both the system and individuals for very large numbers of students. Other prominent critics of high-stakes testing include FairTest and Alfie Kohn.
The use of IQ tests has been banned in some states for educational decisions, and norm-referenced tests, which rank
students from "best" to "worst", have been criticized for bias against minorities. Most education officials support
criterion-referenced tests (each individual student's score depends solely on whether he answered the questions
correctly, regardless of whether his neighbors did better or worse) for making high-stakes decisions.
21st century assessment
It has been widely noted that with the emergence of social media and Web 2.0 technologies and mindsets, learning is
increasingly collaborative and knowledge increasingly distributed across many members of a learning community.
Traditional assessment practices, however, focus in large part on the individual and fail to account for
knowledge-building and learning in context. As researchers in the field of assessment consider the cultural shifts that
arise from the emergence of a more participatory culture, they will need to find new methods of applying
assessments to learners.
Assessment in a democratic school
Sudbury model of democratic education schools do not perform and do not offer assessments, evaluations,
transcripts, or recommendations, asserting that they do not rate people, and that school is not a judge; comparing
students to each other, or to some standard that has been set is for them a violation of the student's right to privacy
and to self-determination. Students decide for themselves how to measure their progress as self-starting learners as a
process of self-evaluation: real lifelong learning and the proper educational assessment for the 21st century, they claim.
According to Sudbury schools, this policy does not cause harm to their students as they move on to life outside the school. However, they admit that it makes the process more difficult, but argue that such hardship is part of the students' learning to make their own way, set their own standards and meet their own goals.
The no-grading and no-rating policy helps to create an atmosphere free of competition among students or battles for
adult approval, and encourages a positive cooperative environment amongst the student body.
The final stage of a Sudbury education, should the student choose to take it, is the graduation thesis. Each student
writes on the topic of how they have prepared themselves for adulthood and entering the community at large. This
thesis is submitted to the Assembly, who reviews it. The final stage of the thesis process is an oral defense given by
the student in which they open the floor for questions, challenges and comments from all Assembly members. At the
end, the Assembly votes by secret ballot on whether or not to award a diploma.
[1] "Educational Assessment". Academic Exchange Quarterly, available at Rapidintellect.com (http:/ / rapidintellect.com/ AEQweb/ ontass).
Retrieved January 28, 2009.
[2] Merriam-Webster Dictionary (2005). Available at Dictionary.reference.com (http:// dictionary. reference.com/ browse/ assess). Retrieved on
[3] Black, Paul, & William, Dylan (October 1998). "Inside the Black Box: Raising Standards Through Classroom Assessment."Phi Beta Kappan.
Available at PDKintl.org (http:// www. pdkintl. org/kappan/ kbla9810. htm). Retrieved January 28, 2009.
[4] http:/ / www. ed. uiuc. edu/ circe/ Robert_Stake.html
[5] Scriven, M. (1991). Evaluation thesaurus. 4th ed. Newbury Park, CA:Sage Publications. ISBN 0-8039-4364-4.
[6] Earl, Lorna (2003). Assessment as Learning: Using Classroom Assessment to Maximise Student Learning. Thousand Oaks, CA, Corwin
Press. ISBN 0-7619-4626-8. Available at (http:// www. wyoaac.org/ Lit/assessment for learning of learning as learning - Earl.
pdfWYOAAC.org), Accessed January 23, 2009.
[7] Reed, Daniel. "Diagnostic Assessment in Language Teaching and Learning." Center for Language Education and Research, available at
Google.com (http:// www. google. com/ url?sa=t& source=web& ct=res&cd=2& url=http:// clear.msu. edu/ clear/ newsletter/ files/
fall2006. pdf& ei=HNKBSeOuHYH8tgfS7rwZ&usg=AFQjCNFPkla4C_1Uyr1EOvg-nCLX0I9Pgw&sig2=_f3pOANBQc1cO6s7ZPexBg).
Retrieved January 28, 2009.
[8] Joint Information Systems Committee (JISC). "What Do We Mean by e-Assessment?" JISC InfoNet, available at (http:/ / www.jiscinfonet.
ac. uk/ InfoKits/ effective-use-of-VLEs/e-assessment/ assess-overviewJISCinfonet. ac.uk). Retrieved January 29, 2009.
[9] Educational Technologies at Virginia Tech. "Assessment Purposes." VirginiaTech DesignShop: Lessons in Effective Teaching, available at
Edtech.vt.edu (http:/ / www.edtech. vt. edu/ edtech/ id/ assess/ purposes. html). Retrieved January 29, 2009.
[10] Valencia, Sheila W. "What Are the Different Forms of Authentic Assessment?" Understanding Authentic Classroom-Based Literacy
Assessment (1997), available at Eduplace.com (http:/ / www. eduplace.com/ rdg/ res/ litass/ forms.html). Retrieved January 29, 2009.
[11] Yu, Chong Ho (2005). "Reliability and Validity." Educational Assessment. Available at Creative-wisdom.com (http:// www.
creative-wisdom. com/ teaching/ assessment/ reliability.html). Retrieved January 29, 2009.
[12] Vergis A, Hardy K (2010). "Principles of Assessment: A Primer for Medical Educators in the Clinical Years" (http:// www. ispub.com/
journal/ the_internet_journal_of_medical_education/volume_1_number_1_74/article_printable/
principles-of-assessment-a-primer-for-medical-educators-in-the-clinical-years-4. html). The Internet Journal of Medical Education 1 (1). .
[13] Moskal, Barbara M., & Leydens, Jon A (2000). "Scoring Rubric Development: Validity and Reliability." Practical Assessment, Research &
Evaluation, 7(10). Retrieved January 30, 2009 from (http:// PAREonline.net/ getvn.asp?v=7& n=10PAREonline.net)
[14] Vergis A, Hardy K (2010). "Principles of Assessment: A Primer for Medical Educators in the Clinical Years" (http:/ / www. ispub.com/
journal/ the_internet_journal_of_medical_education/volume_1_number_1_74/article_printable/
principles-of-assessment-a-primer-for-medical-educators-in-the-clinical-years-4. html). The Internet Journal of Medical Education 1 (1). .
[15] The Standards for Educational and Psychological Testing (http:// www.apa.org/ science/ standards. html#overview)
[16] Joint Committee on Standards for Educational Evaluation (http:/ / www.wmich.edu/ evalctr/ jc/ )
[17] Joint Committee on Standards for Educational Evaluation. (1988). " The Personnel Evaluation Standards: How to Assess Systems for
Evaluating Educators. (http:/ / www.wmich. edu/ evalctr/ jc/ PERSTNDS-SUM.htm)" Newbury Park, CA: Sage Publications.
[18] Joint Committee on Standards for Educational Evaluation. (1994). The Program Evaluation Standards, 2nd Edition. (http:/ / www.wmich.
edu/ evalctr/ jc/ PGMSTNDS-SUM.htm) Newbury Park, CA: Sage Publications.
[19] Committee on Standards for Educational Evaluation. (2003). The Student Evaluation Standards: How to Improve Evaluations of Students.
(http:/ / www. wmich. edu/ evalctr/ jc/ briefing/ses/ ) Newbury Park, CA: Corwin Press.
[20] American Psychological Association. "Appropriate Use of High-Stakes Testing in Our Nation's Schools." APA Online, available at APA.org
(http:// www. apa. org/ pubs/ info/ brochures/testing. aspx), Retrieved January 24, 2010
[21] (nd) Reauthorization of NCLB (http:// www.ed. gov/ nclb/ landing.jhtml). Department of Education. Retrieved 1/29/09.
[22] (nd) What's Wrong With Standardized Testing? (http:/ / www.fairtest. org/facts/ whatwron.htm) FairTest.org. Retrieved January 29, 2009.
[23] Dang, Nick (18 March 2003). "Reform education, not exit exams" (http:/ / dailybruin.ucla. edu/ stories/ 2003/ mar/18/
reform-education-not-exit-exam/). Daily Bruin. . "One common complaint from failed test-takers is that they weren't taught the tested
material in school. Here, inadequate schooling, not the test, is at fault. Blaming the test for one's failure is like blaming the service station for a
failed smog check; it ignores the underlying problems within the 'schooling vehicle.'"
[24] Weinkopf, Chris (2002). "Blame the test: LAUSD denies responsibility for low scores" (http:// www. thefreelibrary.com/ BLAME+ THE+
TEST+LAUSD+DENIES+RESPONSIBILITY+FOR+ LOW+SCORES-a086659557). Daily News. . "The blame belongs to 'high-stakes
tests' like the Stanford 9 and California's High School Exit Exam. Reliance on such tests, the board grumbles, 'unfairly penalizes students that
have not been provided with the academic tools to perform to their highest potential on these tests'."
[25] "Blaming The Test" (http:// old. investors. com/ editorial/editorialcontent.asp?secid=1501& status=article& id=155734&secure=3598).
Investor's Business Daily. 11 May 2006. . "A judge in California is set to strike down that state's high school exit exam. Why? Because it's
working. It's telling students they need to learn more. We call that useful information. To the plaintiffs who are suing to stop the use of the test
as a graduation requirement, it's something else: Evidence of unequal treatment... the exit exam was deemed unfair because too many students
who failed the test had too few credentialed teachers. Well, maybe they did, but granting them a diploma when they lack the required
knowledge only compounds the injustice by leaving them with a worthless piece of paper.""
Educational assessment
[26] ASD.wednet.edu (http:// www2. asd. wednet. edu/ Pioneer/barnard/projects/ 04-05/ art/WhatsaWASL/ index. html)
[27] Bach, Deborah, & Blanchard, Jessica (April 19, 2005). "WASL worries stress kids, schools." Seattle Post-Intelligencer. Retrieved January
30, 2009 from Seattlepi.nwsource.com (http:// www.seattlepi. com/ local/ 220713_wasl19. html).
[28] Fadel, Charles, Honey, Margaret, & Pasnik, Shelley (May 18, 2007). "Assessment in the Age of Innovation." Education Week. Retrieved
January 29, 2009 from (http:/ / www.edweek. org/ login. html?source=http:// www.edweek.org/ ew/ articles/ 2007/ 05/ 23/ 38fadel.h26.
html&destination=http:/ / www.edweek. org/ ew/ articles/ 2007/ 05/ 23/ 38fadel.h26. html& levelId=2100Edweek.org).
[29] Greenberg, D. (2000). 21st Century Schools, (http:/ / sudburyvalleyschool. org/essays/ 102008. shtml) edited transcript of a talk delivered at
the April 2000 International Conference on Learning in the 21st Century.
[30] Greenberg, D. (1987). Chapter 20,Evaluation, Free at Last — The Sudbury Valley School.
[31] Graduation Thesis Procedure (http:/ / mountainlaurelsudbury. org/thesis-procedure.asp), Mountain Laurel Sudbury School.
External links
• Edutopia: Assessment Overview (http://edutopia.org/php/keyword.php?id=005), a collection of media and articles on the topic of assessment from The George Lucas Educational Foundation
• The Standards for Educational and Psychological Testing (http://www.apa.org/science/standards.html)
• Joint Committee on Standards for Educational Evaluation (http://www.wmich.edu/evalctr/jc/)
• Creating Good MCQs (http://focalworks.in/resources/white_papers/creating_assessments/1-1.html), a whitepaper by Focalworks
• Assessment 2.0 (http://www.scribd.com/doc/461041/Assessment-20), modernizing assessment
Educational evaluation
Educational evaluation is the evaluation process of characterizing and appraising some aspect/s of an educational process.
There are two common purposes in educational evaluation which are, at times, in conflict with one another.
Educational institutions usually require evaluation data to demonstrate effectiveness to funders and other
stakeholders, and to provide a measure of performance for marketing purposes. Educational evaluation is also a
professional activity that individual educators need to undertake if they intend to continuously review and enhance
the learning they are endeavoring to facilitate.
Standards for educational evaluation
The Joint Committee on Standards for Educational Evaluation published three sets of standards for educational
evaluations. The Personnel Evaluation Standards was published in 1988, The Program Evaluation Standards (2nd
edition) was published in 1994, and The Student Evaluation Standards was published in 2003.
Each publication presents and elaborates a set of standards for use in a variety of educational settings. The standards
provide guidelines for designing, implementing, assessing and improving the identified form of evaluation. Each of
the standards has been placed in one of four fundamental categories to promote evaluations that are proper, useful,
feasible, and accurate.
The Personnel Evaluation Standards
• The propriety standards require that evaluations be conducted legally, ethically, and with due regard for the
welfare of evaluatees and clients involved in the evaluation.
• The utility standards are intended to guide evaluations so that they will be informative, timely, and influential.
• The feasibility standards call for evaluation systems that are as easy to implement as possible, efficient in their
use of time and resources, adequately funded, and viable from a number of other standpoints.
• The accuracy standards require that the obtained information be technically accurate and that conclusions be
linked logically to the data.
The Program Evaluation Standards
• The utility standards are intended to ensure that an evaluation will serve the information needs of intended users.
• The feasibility standards are intended to ensure that an evaluation will be realistic, prudent, diplomatic, and frugal.
• The propriety standards are intended to ensure that an evaluation will be conducted legally, ethically, and with
due regard for the welfare of those involved in the evaluation, as well as those affected by its results.
• The accuracy standards are intended to ensure that an evaluation will reveal and convey technically adequate
information about the features that determine worth or merit of the program being evaluated.
The Student Evaluation Standards
• The Propriety standards help ensure that student evaluations are conducted lawfully, ethically, and with regard to
the rights of students and other persons affected by student evaluation.
• The Utility standards promote the design and implementation of informative, timely, and useful student evaluations.
• The Feasibility standards help ensure that student evaluations are practical; viable; cost-effective; and culturally,
socially, and politically appropriate.
• The Accuracy standards help ensure that student evaluations will provide sound, accurate, and credible
information about student learning and performance.
Criticism of educational evaluation
Evaluation in a democratic school
Sudbury model of democratic education schools do not perform and do not offer evaluations, assessments,
transcripts, or recommendations, asserting that they do not rate people, and that school is not a judge; comparing
students to each other, or to some standard that has been set is for them a violation of the student's right to privacy
and to self-determination. Students decide for themselves how to measure their progress as self-starting learners as a
process of self-evaluation: real life-long learning and, they maintain, the proper educational evaluation for the 21st Century.
According to Sudbury schools, this policy does not cause harm to their students as they move on to life outside the
school. However, they admit that it makes the process more difficult, but hold that such hardship is part of the
students' learning to make their own way, set their own standards and meet their own goals.
The no-grading and no-rating policy helps to create an atmosphere free of competition among students or battles for
adult approval, and encourages a positive co-operative environment amongst the student body.
The final stage of a Sudbury education, should the student choose to take it, is the graduation thesis. Each student
writes on the topic of how they have prepared themselves for adulthood and entering the community at large. This
thesis is submitted to the Assembly, which reviews it. The final stage of the thesis process is an oral defense given by
the student in which they open the floor for questions, challenges and comments from all Assembly members. At the
end, the Assembly votes by secret ballot on whether or not to award a diploma.
Notes and references
1. Joint Committee on Standards for Educational Evaluation. (1988). The Personnel Evaluation Standards: How to Assess Systems for Evaluating Educators. Newbury Park, CA: Sage Publications.
2. Joint Committee on Standards for Educational Evaluation. (1994). The Program Evaluation Standards, 2nd Edition. Newbury Park, CA: Sage Publications.
3. Joint Committee on Standards for Educational Evaluation. (2003). The Student Evaluation Standards: How to Improve Evaluations of Students. Newbury Park, CA: Corwin Press.
[1] Greenberg, D. (2000). 21st Century Schools (http://sudburyvalleyschool.org/essays/102008.shtml), edited transcript of a talk delivered at the April 2000 International Conference on Learning in the 21st Century.
[2] Greenberg, D. (1987). Chapter 20, Evaluation (http://books.google.com/books?id=es2nOuZE0rAC&pg=PA95&dq=Greenberg+Evaluation,+Free+at+Last++The+Sudbury+Valley+School#v=onepage&q=&f=false), Free at Last — The Sudbury Valley School.
[3] Graduation Thesis Procedure (http://mountainlaurelsudbury.org/thesis-procedure.asp), Mountain Laurel Sudbury School.
[4] http://www.wmich.edu/evalctr/jc/PERSTNDS-SUM.htm
[5] http://www.wmich.edu/evalctr/jc/PGMSTNDS-SUM.htm
[6] http://www.wmich.edu/evalctr/jc/briefing/ses/
External links
• American Evaluation Association (http://www.eval.org)
• Topical interest groups (TIGs) (http://www.eval.org/TIGs/tig.html)
• Assessment in Higher Education (http://www.tamu.edu/marshome/Forum.html)
• Distance Education and Other Educational Technologies (http://www.courses.dsu.edu/TIG/)
• Extension Education Evaluation (http://danr.ucop.edu/eee-aea/)
• Graduate Student and New Evaluators (http://evaluation.wmich.edu/evalgrad/)
• PreK-12 Educational Evaluation (http://www.evaluand.com/aeaprek12/)
• Teaching of Evaluation (http://home.okstate.edu/homepages.nsf/toc/tigtoe2)
• American Educational Research Association (http://www.aera.net)
• Division H School Evaluation & Program Development (http://www.aera.net/divisions/?id=73)
• Standards for Educational and Psychological Testing (http://www.aera.net/AERAShopper/ProductDetails.
• Assessment in Higher Education (http://ahe.cqu.edu.au) web site.
• Joint Committee on Standards for Educational Evaluation (http://www.wmich.edu/evalctr/jc/)
• The EvaluationWiki (http://www.evaluationwiki.org) - The mission of EvaluationWiki is to make freely available a compendium of up-to-date information and resources to everyone involved in the science and practice of evaluation. The EvaluationWiki is presented by the non-profit Evaluation Resource Institute (http://www.evaluationwiki.org/wiki/index.php/Evaluation_Resource_Institute).
• Wisconsin Center for Education Research (http://www.wcer.wisc.edu)
Encomium
Encomium is a Latin word deriving from the Classical Greek ἐγκώμιον (encomion) meaning the praise of a person
or thing. "Encomium" also refers to several distinct aspects of rhetoric:
• A general category of oratory
• A method within rhetorical pedagogy
• A figure of speech. As a figure, encomium means praising a person or thing, but occurring on a smaller scale than
an entire speech.
• The eighth exercise in the progymnasmata series
• A literary genre that included five elements: prologue, birth and upbringing, acts of the person's life, comparisons
used to praise the subject, and an epilogue.
Examples
• Gorgias's Encomium of Helen is one of the most famous historical encomia. In it, Gorgias offers several
justifications for excusing Helen of Troy's adultery—notably, that she was persuaded by speech, which is a
"powerful lord" or "powerful drug" depending on the translation.
• In Erasmus's Praise of Folly, Folly composes an encomium to herself. It is an ironic encomium because being
praised by Folly is backwards praise; therefore, Folly praising herself is an ironic conundrum.
• De Pippine regis Victoria Avarica, a medieval encomium of victory of Pepin of Italy over the Avars
• Encomium Emmae, a medieval encomium of Queen Emma of Normandy
• Versum de Mediolano civitate, a medieval encomium of Milan
• Versus de Verona, a medieval encomium of Verona
• Polychronion, chanted in the liturgy of Churches which follow the Byzantine Rite
• A kind of encomium is used by the Christian writer Paul in his praise of love in 1 Corinthians 13. The prologue is
verses 1-3, acts are v. 4-7, comparison is v. 8-12, and epilogue is 13:13-14:1.
[1] David E. Garland, Baker Exegetical Commentary, 1 Corinthians, 606, based on the work of Sigountos.
Evaluation approaches
Evaluation approaches are conceptually distinct ways of thinking about, designing and conducting evaluation
efforts. Many of the evaluation approaches in use today make unique contributions to solving important problems,
while others refine existing approaches in some way. Classification systems intended to sort out unique approaches
from variations on a theme are presented here to help identify some basic schools of thought for conducting an
evaluation. After these approaches are identified, they are summarized in terms of a few important attributes.
Since the mid 1960s, the number of alternative approaches to conducting evaluation efforts has increased
dramatically. Factors such as the United States Elementary and Secondary Education Act of 1965 that required
educators to evaluate their efforts and results, and the growing public concern for accountability of human service
programs contributed to this growth. In addition, over this period of time there has been an international movement
towards encouraging evidence based practice in all professions and in all sectors. Evidence Based Practice (EBP)
requires evaluations to deliver the information needed to determine what is the best way of achieving results.
Classification of approaches
Two classifications of evaluation approaches, by House and by Stufflebeam & Webster, were combined by Frisbie
into a manageable number of approaches in terms of their unique and important underlying principles. The general
structures of these classification systems are discussed first. The structures are then combined to present a more
detailed classification of fifteen evaluation approaches.
House considers all major evaluation approaches to be based on a common ideology, liberal democracy. Important
principles of this ideology include freedom of choice, the uniqueness of the individual, and empirical inquiry
grounded in objectivity. He also contends they all are based on subjectivist ethics, in which ethical conduct is based
on the subjective or intuitive experience of an individual or group. One form of subjectivist ethics is utilitarian, in
which “the good” is determined by what maximizes some single, explicit interpretation of happiness for society as a
whole. Another form of subjectivist ethics is intuitionist / pluralist, in which no single interpretation of “the good” is
assumed and these interpretations need not be explicitly stated nor justified.
These ethical positions have corresponding epistemologies—philosophies of obtaining knowledge. The objectivist
epistemology is associated with the utilitarian ethic. In general, it is used to acquire knowledge capable of external
verification (intersubjective agreement) through publicly inspectable methods and data. The subjectivist
epistemology is associated with the intuitionist/pluralist ethic. It is used to acquire new knowledge based on existing
personal knowledge and experiences that are (explicit) or are not (tacit) available for public inspection.
House further divides each epistemological approach by two main political perspectives. Approaches can take an
elite perspective, focusing on the interests of managers and professionals. They also can take a mass perspective,
focusing on consumers and participatory approaches.
Stufflebeam and Webster place approaches into one of three groups according to their orientation toward the role of
values, an ethical consideration. The political orientation promotes a positive or negative view of an object
regardless of what its value might actually be. They call this pseudo-evaluation. The questions orientation includes
approaches that might or might not provide answers specifically related to the value of an object. They call this
quasi-evaluation. The values orientation includes approaches primarily intended to determine the value of some
object. They call this true evaluation.
Table 1 is used to classify fifteen evaluation approaches in terms of epistemology, major perspective (from House),
and orientation (from Stufflebeam & Webster). When considered simultaneously, these three dimensions (two
epistemologies, two perspectives, and three orientations) produce twelve cells. Only seven of the cells contain
approaches, although all four true evaluation cells contain at least one approach.
Table 1
Classification of approaches for conducting evaluations based on epistemology, major perspective, and orientation

Objectivist epistemology, elite perspective
• Pseudo-evaluation: Politically controlled; Public relations
• Quasi-evaluation: Experimental research; Management information systems; Testing programs; Objectives-based; Content analysis
• True evaluation: Decision-oriented; Policy studies
Objectivist epistemology, mass perspective
• Quasi-evaluation: Accountability
• True evaluation: Consumer-oriented
Subjectivist epistemology, elite perspective
• True evaluation: Accreditation / certification; Connoisseur
Subjectivist epistemology, mass perspective
• True evaluation: Adversary; Client-centered
Note. Epistemology and major perspective from House (1978). Orientation from Stufflebeam & Webster (1980).
Two pseudo-evaluation approaches, politically controlled and public relations studies, are represented. They are
based on an objectivist epistemology from an elite perspective.
Six quasi-evaluation approaches use an objectivist epistemology. Five of them—experimental research, management
information systems, testing programs, objectives-based studies, and content analysis—take an elite perspective.
Accountability takes a mass perspective.
Seven true evaluation approaches are included. Two approaches, decision-oriented and policy studies, are based on
an objectivist epistemology from an elite perspective. Consumer-oriented studies are based on an objectivist
epistemology from a mass perspective. Two approaches—accreditation/certification and connoisseur studies—are
based on a subjectivist epistemology from an elite perspective. Finally, adversary and client-centered studies are
based on a subjectivist epistemology from a mass perspective.
Summary of approaches
The preceding section was used to distinguish between fifteen evaluation approaches in terms of their epistemology,
major perspective, and orientation to values. This section is used to summarize each of the fifteen approaches in
enough detail so that those placed in the same cell of Table 1 can be distinguished from each other.
Table 2 is used to summarize each approach in terms of four attributes—organizer, purpose, strengths, and
weaknesses. The organizer represents the main considerations or cues practitioners use to organize a study. The
purpose represents the desired outcome for a study at a very general level. Strengths and weaknesses represent other
attributes that should be considered when deciding whether to use the approach for a particular study. The following
narrative highlights differences between approaches grouped into the same cell of Table 1.
Table 2
Summary of approaches for conducting evaluations
Each approach is summarized in terms of four attributes: organizer, purpose, key strengths, and key weaknesses.

Politically controlled
• Organizer: Threats
• Purpose: Get, keep or increase influence, power or money.
• Key strengths: Secures evidence advantageous to the client in a conflict.
• Key weaknesses: Violates the principle of full & frank disclosure.

Public relations
• Organizer: Propaganda needs
• Purpose: Create a positive public image.
• Key strengths: Secures evidence most likely to bolster public support.
• Key weaknesses: Violates the principles of balanced reporting, justified conclusions, & objectivity.

Experimental research
• Purpose: Determine causal relationships between variables.
• Key strengths: Strongest paradigm for determining causal relationships.
• Key weaknesses: Requires a controlled setting, limits the range of evidence, focuses primarily on results.

Management information systems
• Purpose: Continuously supply the evidence needed to fund, direct, & control programs.
• Key strengths: Gives managers detailed evidence about complex programs.
• Key weaknesses: Human service variables are rarely amenable to the narrow, quantitative definitions needed.

Testing programs
• Organizer: Individual differences
• Purpose: Compare test scores of individuals & groups to selected norms.
• Key strengths: Produces valid & reliable evidence in many performance areas. Very familiar to the public.
• Key weaknesses: Data usually only on testee performance, overemphasizes test-taking skills, can be a poor sample of what is taught or expected.

Objectives-based
• Organizer: Objectives
• Purpose: Relate outcomes to objectives.
• Key strengths: Common-sense appeal; widely used; uses behavioral objectives & testing.
• Key weaknesses: Leads to terminal evidence often too narrow to provide a basis for judging the value of a program.

Content analysis
• Organizer: Content of a communication
• Purpose: Describe & draw conclusions about a communication.
• Key strengths: Allows unobtrusive analysis of large volumes of unstructured, symbolic material.
• Key weaknesses: Sample may be unrepresentative yet overwhelming in volume. Analysis design often overly simplistic for the question.

Accountability
• Organizer: Performance expectations
• Purpose: Provide constituents with an accurate accounting of results.
• Key strengths: Popular with constituents. Aimed at improving the quality of products and services.
• Key weaknesses: Creates unrest between practitioners & consumers. Politics often forces premature studies.

Decision-oriented
• Organizer: Decisions
• Purpose: Provide a knowledge & value base for making & defending decisions.
• Key strengths: Encourages use of evaluation to plan & implement needed programs. Helps justify decisions about plans & programs.
• Key weaknesses: Necessary collaboration between evaluator & decision-maker provides opportunity to bias results.

Policy studies
• Organizer: Broad issues
• Purpose: Identify and assess the potential costs & benefits of competing policies.
• Key strengths: Provide general direction for broadly focused actions.
• Key weaknesses: Often corrupted or subverted by politically motivated actions of participants.

Consumer-oriented
• Organizer: Generalized needs & values, effects
• Purpose: Judge the relative merits of alternative goods & services.
• Key strengths: Independent appraisal to protect practitioners & consumers from shoddy products & services. High public credibility.
• Key weaknesses: Might not help practitioners do a better job. Requires credible & competent evaluators.

Accreditation / certification
• Organizer: Standards & guidelines
• Purpose: Determine if institutions, programs, & personnel should be approved to perform specified functions.
• Key strengths: Helps the public make informed decisions about the quality of organizations & the qualifications of personnel.
• Key weaknesses: Standards & guidelines typically emphasize intrinsic criteria to the exclusion of outcome measures.

Connoisseur
• Organizer: Critical guideposts
• Purpose: Critically describe, appraise, & illuminate an object.
• Key strengths: Exploits highly developed expertise on the subject of interest. Can inspire others to more insightful efforts.
• Key weaknesses: Dependent on a small number of experts, making the evaluation susceptible to subjectivity, bias, and corruption.

Adversary
• Organizer: "Hot" issues
• Purpose: Present the pros & cons of an issue.
• Key strengths: Ensures balanced presentation of the represented perspectives.
• Key weaknesses: Can discourage cooperation, heighten animosities.

Client-centered
• Organizer: Specific concerns & issues
• Purpose: Foster understanding of activities & how they are valued in a given setting & from a variety of perspectives.
• Key strengths: Practitioners are helped to conduct their own evaluation.
• Key weaknesses: Low external credibility; susceptible to bias in favor of participants.

Note. Adapted and condensed primarily from House (1978) and Stufflebeam & Webster (1980).
Objectivist, elite, pseudo-evaluation
Politically controlled and public relations studies are based on an objectivist epistemology from an elite perspective.
Although both of these approaches seek to misrepresent value interpretations about some object, they go about it a
bit differently. Information obtained through politically controlled studies is released or withheld to meet the special
interests of the holder.
Public relations information is used to paint a positive image of an object regardless of the actual situation. Neither
of these approaches is acceptable evaluation practice, although the seasoned reader can surely think of a few
examples where they have been used.
Objectivist, elite, quasi-evaluation
As a group, these five approaches represent a highly respected collection of disciplined inquiry approaches. They are
considered quasi-evaluation approaches because particular studies can legitimately focus only on questions of
knowledge without addressing any questions of value. Such studies are, by definition, not evaluations. These
approaches can produce characterizations without producing appraisals, although specific studies can produce both.
Each of these approaches serves its intended purpose well. They are discussed roughly in order of the extent to
which they approach the objectivist ideal.
Experimental research is the best approach for determining causal relationships between variables. The potential
problem with using this as an evaluation approach is that its highly controlled and stylized methodology may not be
sufficiently responsive to the dynamically changing needs of most human service programs.
Management information systems (MISs) can give detailed information about the dynamic operations of complex
programs. However, this information is restricted to readily quantifiable data usually available at regular intervals.
Testing programs are familiar to just about anyone who has attended school, served in the military, or worked for a
large company. These programs are good at comparing individuals or groups to selected norms in a number of
subject areas or to a set of standards of performance. However, they only focus on testee performance and they might
not adequately sample what is taught or expected.
Objectives-based approaches relate outcomes to prespecified objectives, allowing judgments to be made about their
level of attainment. Unfortunately, the objectives are often not proven to be important or they focus on outcomes too
narrow to provide the basis for determining the value of an object.
Content analysis is a quasi-evaluation approach because content analysis judgments need not be based on value
statements. Instead, they can be based on knowledge. Such content analyses are not evaluations. On the other hand,
when content analysis judgments are based on values, such studies are evaluations.
Objectivist, mass, quasi-evaluation
Accountability is popular with constituents because it is intended to provide an accurate accounting of results that
can improve the quality of products and services. However, this approach quickly can turn practitioners and
consumers into adversaries when implemented in a heavy-handed fashion.
Objectivist, elite, true evaluation
Decision-oriented studies are designed to provide a knowledge base for making and defending decisions. This
approach usually requires the close collaboration between an evaluator and decision-maker, allowing it to be
susceptible to corruption and bias.
Policy studies provide general guidance and direction on broad issues by identifying and assessing potential costs
and benefits of competing policies. The drawback is these studies can be corrupted or subverted by the politically
motivated actions of the participants.
Objectivist, mass, true evaluation
Consumer-oriented studies are used to judge the relative merits of goods and services based on generalized needs
and values, along with a comprehensive range of effects. However, this approach does not necessarily help
practitioners improve their work, and it requires a very good and credible evaluator to do it well.
Subjectivist, elite, true evaluation
Accreditation / certification programs are based on self-study and peer review of organizations, programs, and
personnel. They draw on the insights, experience, and expertise of qualified individuals who use established
guidelines to determine if the applicant should be approved to perform specified functions. However, unless
performance-based standards are used, attributes of applicants and the processes they perform often are
overemphasized in relation to measures of outcomes or effects.
Connoisseur studies rely on the highly developed expertise of individuals who use critical guideposts to describe,
appraise, and illuminate an object. This approach can inspire others to more insightful efforts, but its dependence on
a small number of experts makes the evaluation susceptible to subjectivity, bias, and corruption.
Subjectivist, mass, true evaluation
The adversary approach focuses on drawing out the pros and cons of controversial issues through quasi-legal
proceedings. This helps ensure a balanced presentation of different perspectives on the issues, but it is also likely to
discourage later cooperation and heighten animosities between contesting parties if “winners” and “losers” emerge.
Client-centered studies address specific concerns and issues of practitioners and other clients of the study in a
particular setting. These studies help people understand the activities and values involved from a variety of
perspectives. However, this responsive approach can lead to low external credibility and a favorable bias toward
those who participated in the study.
Notes and references
1. House, E. R. (1978). Assumptions underlying evaluation models. Educational Researcher, 7(3), 4-12.
2. Stufflebeam, D. L., & Webster, W. J. (1980). An analysis of alternative approaches to evaluation. Educational Evaluation and Policy Analysis, 2(3), 5-19.
3. Frisbie, R. D. (1986). The use of microcomputer programs to improve the reliability and validity of content analysis in evaluation. Doctoral dissertation, Western Michigan University.
[1] http://web.ics.purdue.edu/~rfrisbie/Professional/index.htm#ContentAnalysisEvaluation
Evaluation Assurance Level
The Evaluation Assurance Level (EAL1 through EAL7) of an IT product or system is a numerical grade assigned
following the completion of a Common Criteria security evaluation, an international standard in effect since 1999.
The increasing assurance levels reflect added assurance requirements that must be met to achieve Common Criteria
certification. The intent of the higher levels is to provide higher confidence that the system's principal security
features are reliably implemented. The EAL level does not measure the security of the system itself; it simply states
at what level the system was tested.
To achieve a particular EAL, the computer system must meet specific assurance requirements. Most of these
requirements involve design documentation, design analysis, functional testing, or penetration testing. The higher
EALs involve more detailed documentation, analysis, and testing than the lower ones. Achieving a higher EAL
certification generally costs more money and takes more time than achieving a lower one. The EAL number assigned
to a certified system indicates that the system completed all requirements for that level.
Although every product and system must fulfill the same assurance requirements to achieve a particular level, they
do not have to fulfill the same functional requirements. The functional features for each certified product are
established in the Security Target document tailored for that product's evaluation. Therefore, a product with a higher
EAL is not necessarily "more secure" in a particular application than one with a lower EAL, since they may have
very different lists of functional features in their Security Targets. A product's fitness for a particular security
application depends on how well the features listed in the product's Security Target fulfill the application's security
requirements. If the Security Targets for two products both contain the necessary security features, then the higher
EAL should indicate the more trustworthy product for that application.
Assurance levels
EAL1: Functionally Tested
EAL1 is applicable where some confidence in correct operation is required, but the threats to security are not viewed
as serious. It will be of value where independent assurance is required to support the contention that due care has
been exercised with respect to the protection of personal or similar information. EAL1 provides an evaluation of the
TOE (Target of Evaluation) as made available to the customer, including independent testing against a specification,
and an examination of the guidance documentation provided. It is intended that an EAL1 evaluation could be
successfully conducted without assistance from the developer of the TOE, and for minimal cost. An evaluation at
this level should provide evidence that the TOE functions in a manner consistent with its documentation, and that it
provides useful protection against identified threats.
EAL2: Structurally Tested
EAL2 requires the cooperation of the developer in terms of the delivery of design information and test results, but
should not demand more effort on the part of the developer than is consistent with good commercial practice. As
such it should not require a substantially increased investment of cost or time. EAL2 is therefore applicable in those
circumstances where developers or users require a low to moderate level of independently assured security in the
absence of ready availability of the complete development record. Such a situation may arise when securing legacy systems.
EAL3: Methodically Tested and Checked
EAL3 permits a conscientious developer to gain maximum assurance from positive security engineering at the
design stage without substantial alteration of existing sound development practices. EAL3 is applicable in those
circumstances where developers or users require a moderate level of independently assured security, and require a
thorough investigation of the TOE and its development without substantial re-engineering.
EAL4: Methodically Designed, Tested, and Reviewed
EAL4 permits a developer to gain maximum assurance from positive security engineering based on good
commercial development practices which, though rigorous, do not require substantial specialist knowledge, skills,
and other resources. EAL4 is the highest level at which it is likely to be economically feasible to retrofit to an
existing product line. EAL4 is therefore applicable in those circumstances where developers or users require a
moderate to high level of independently assured security in conventional commodity TOEs and are prepared to incur
additional security-specific engineering costs.
Commercial operating systems that provide conventional, user-based security features are typically evaluated at
EAL4. Examples of such operating systems are AIX,
FreeBSD, Novell NetWare, Solaris, SUSE Linux Enterprise Server 9, SUSE Linux Enterprise Server 10, Red Hat
Enterprise Linux 5, Windows 2000 Service Pack 3, Windows 2003, Windows XP, Windows 7, and Windows Server
2008 R2.
Operating systems that provide multilevel security are evaluated at a minimum of EAL4. Examples include Trusted
Solaris, Solaris 10 Release 11/06 Trusted Extensions,
an early version of the XTS-400, and VMware ESXi version
3.5 and 4.0 (EAL 4+).
EAL5: Semiformally Designed and Tested
EAL5 permits a developer to gain maximum assurance from security engineering based upon rigorous commercial
development practices supported by moderate application of specialist security engineering techniques. Such a TOE
will probably be designed and developed with the intent of achieving EAL5 assurance. It is likely that the additional
costs attributable to the EAL5 requirements, relative to rigorous development without the application of specialized
techniques, will not be large. EAL5 is therefore applicable in those circumstances where developers or users require
a high level of independently assured security in a planned development and require a rigorous development
approach without incurring unreasonable costs attributable to specialist security engineering techniques.
Numerous smart card devices have been evaluated at EAL5, as have multilevel secure devices such as the Tenix
Interactive Link. XTS-400 (STOP 6) is a general-purpose operating system which has been evaluated at EAL5.
LPAR on IBM System z is EAL5 certified.
EAL6: Semiformally Verified Design and Tested
EAL6 permits developers to gain high assurance from application of security engineering techniques to a rigorous
development environment in order to produce a premium TOE for protecting high value assets against significant
risks. EAL6 is therefore applicable to the development of security TOEs for application in high risk situations where
the value of the protected assets justifies the additional costs.
Green Hills Software's INTEGRITY-178B RTOS has been certified to EAL6 augmented.
EAL7: Formally Verified Design and Tested
EAL7 is applicable to the development of security TOEs for application in extremely high risk situations and/or
where the high value of the assets justifies the higher costs. Practical application of EAL7 is currently limited to
TOEs with tightly focused security functionality that is amenable to extensive formal analysis. The Tenix Interactive
Link Data Diode Device and the Fox Data Diode
have been evaluated at EAL7 augmented.
Open Kernel Labs has also performed formal verification of their seL4 microkernel OS,
allowing devices running
seL4 to achieve EAL7.
Implications of assurance levels
Technically speaking, a higher EAL means nothing more, or less, than that the evaluation completed a more
stringent set of quality assurance requirements. It is often assumed that a system that achieves a higher EAL will
provide its security features more reliably (and the required third-party analysis and testing performed by security
experts is reasonable evidence in this direction), but there is little or no published evidence to support that assumption.
Impact on cost and schedule
In 2006, the US Government Accountability Office published a report on Common Criteria evaluations that
summarized a range of costs and schedules reported for evaluations performed at levels EAL2 through EAL4.
(Figure: range of completion times and costs for Common Criteria evaluations at EAL2 through EAL4, from the GAO report.)
In the mid to late 1990s, vendors reported spending US$1 million and even US$2.5 million on evaluations
comparable to EAL4. There have been no published reports of the cost of the various Microsoft Windows security evaluations.
Augmentation of EAL requirements
In some cases, the evaluation may be augmented to include assurance requirements beyond the minimum required
for a particular EAL. Officially this is indicated by following the EAL number with the word augmented and
usually with a list of codes to indicate the additional requirements. As shorthand, vendors will often simply add a
"plus" sign (as in EAL4+) to indicate the augmented requirements.
EAL notation
The Common Criteria standards denote EALs as shown in this article: the prefix "EAL" concatenated with a digit 1
through 7 (Examples: EAL1, EAL3, EAL5). In practice, some countries place a space between the prefix and the
digit (EAL 1, EAL 3, EAL 5). The use of a plus sign to indicate augmentation is an informal shorthand used by
product vendors (EAL4+ or EAL 4+).
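As a purely illustrative sketch (a hypothetical helper, not part of the Common Criteria or of any vendor tooling), the following C fragment normalizes the notation variants just described — "EAL4", "EAL 4", "EAL4+" — into a level number and a flag for the informal "augmented" shorthand:

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper: parse strings such as "EAL4", "EAL 4" or "EAL4+".
   Returns true on success, filling in the level (1-7) and whether the
   informal "+" (augmented) shorthand was present. */
static bool parse_eal(const char *s, int *level, bool *augmented)
{
    if (toupper((unsigned char)s[0]) != 'E' ||
        toupper((unsigned char)s[1]) != 'A' ||
        toupper((unsigned char)s[2]) != 'L')
        return false;
    s += 3;
    while (*s == ' ')            /* some countries write "EAL 4" */
        s++;
    if (*s < '1' || *s > '7')    /* levels run from EAL1 to EAL7 */
        return false;
    *level = *s - '0';
    s++;
    *augmented = (*s == '+');    /* vendor shorthand for "augmented" */
    return true;
}

int main(void)
{
    const char *samples[] = { "EAL4", "EAL 4", "EAL4+", "eal 7" };
    for (int i = 0; i < 4; i++) {
        int level;
        bool aug;
        if (parse_eal(samples[i], &level, &aug))
            printf("%s -> level %d%s\n", samples[i], level,
                   aug ? " (augmented)" : "");
    }
    return 0;
}
```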
[1] Common Criteria certified product list (http://www.commoncriteriaportal.org/products_OS.html#OS)
[2] Certification Report for SUSE Linux Enterprise Server 9 (http://www.commoncriteriaportal.org/files/epfiles/0256a.pdf)
[3] SUSE Linux Enterprise Server 10 EAL4 Certificate (http://www.niap-ccevs.org/cc-scheme/st/?vid=10271)
[4] Red Hat Enterprise Linux Version 5 EAL4 Certificate (http://www.niap-ccevs.org/cc-scheme/st/?vid=10125)
[5] Windows Platform Products Awarded Common Criteria EAL 4 Certification (http://www.microsoft.com/presspass/press/2005/dec05/12-14CommonCriteriaPR.mspx#Microsoft)
[6] Microsoft Windows 7, Windows Server 2008 R2 and SQL Server 2008 SP2 Now Certified as Common Criteria Validated Products (http://technet.microsoft.com/en-us/library/dd229319.aspx)
[7] Solaris 10 Release 11/06 Trusted Extensions EAL 4+ Certification Report (http://www.sun.com/software/security/securitycert/docs/Solaris_10_TX_CR_v1.0_11_june_PDF.pdf)
[8] VMware Infrastructure Earns Security Certification for Stringent Government Standards (http://www.vmware.com/company/news/
[9] IBM System z Security (http://www-03.ibm.com/systems/z/security/ccs_certification.html); IBM System z partitioning achieves highest certification (http://www-03.ibm.com/systems/z/security/certification.html)
[10] Fox Data Diode Certifications (http://www.datadiode.eu/technology/certifications)
[11] http://www.ok-labs.com/whitepapers/sample/sel4-formal-verification-of-an-os-kernel
[12] http://www.ok-labs.com/releases/release/ok-labs-and-galois-partner-in-original-research-for-ultra-secure-systems
External links
• GAO (March 2006) (PDF). INFORMATION ASSURANCE: National Partnership Offers Benefits, but Faces Considerable Challenges (http://www.gao.gov/new.items/d06392.pdf). Report GAO-06-392. United States Government Accountability Office. Retrieved 2006-07-10.
• Smith, Richard (October 2000). "Trends in Government Endorsed Security Product Evaluations" (http://www.csrc.nist.gov/nissc/2000/proceedings/papers/032.pdf) (PDF). Proc. 20th National Information Systems Security Conference. Retrieved 2006-07-10.
• CCEVS Validated Products List (http://www.niap-ccevs.org/vpl/)
• Common Criteria Assurance Level information from IACS (http://www.cesg.gov.uk/site/iacs/index.cfm?menuSelected=1&displayPage=13)
• IBM AIX operating system certifications (http://www-03.ibm.com/servers/aix/products/aixos/certifications/
• Microsoft Windows and the Common Criteria Certification (http://www.windowsecurity.com/articles/
• SUSE Linux awarded government security cert (http://www.linuxsecurity.com/content/view/118374/65/)
• XTS-400 information (http://www.baesystems.com/ProductsServices/bae_prod_csit_xts400.html)
• Understanding the Windows EAL4 Evaluation (http://web.archive.org/web/20060527063317/http://eros.cs.jhu.edu/~shap/NT-EAL4.html)
• Charu Chaubal (February 2007) (PDF). Security Design of the VMware Infrastructure 3 Architecture (http://www.vmware.com/pdf/vi3_security_architecture_wp.pdf). 20070215 Item: WP-013-PRD-01-01. VMware, Inc. Retrieved 2008-11-19.
Expression (mathematics)
In mathematics, an expression is a finite combination of symbols that are well-formed according to the rules
applicable in the context at hand. Symbols can designate values (constants), variables, operations, relations, or can
constitute punctuation or other syntactic entities. The use of expressions can range from simple arithmetic operations
to more complicated constructs that can include variables, functions, factorials, summations, derivatives and
integrals. However, a construction that violates the syntactic rules of the context (for example, * 2 + in ordinary
arithmetic notation) is not well-formed, and therefore not an expression.
In algebra an expression may be used to designate a value, which value might depend on values assigned to variables
occurring in the expression; the determination of this value depends on the semantics attached to the symbols of the
expression. These semantic rules may declare that certain expressions do not designate any value; such expressions
are said to have an undefined value, but they are well-formed expressions nonetheless. In general the meaning of
expressions is not limited to designating values; for instance, an expression might designate a condition, or an
equation that is to be solved, or it can be viewed as an object in its own right that can be manipulated according to
certain rules. Certain expressions that designate a value simultaneously express a condition that is assumed to hold,
for instance those involving the operator ⊕ to designate an internal direct sum.
Being an expression is a syntactic concept; although different mathematical fields have different notions of valid
expressions, the values associated to variables do not play a role. See formal language for general considerations
on how expressions are constructed, and formal semantics for questions concerning attaching meaning (values) to expressions.
Many mathematical expressions include letters called variables. Any variable can be classified as being either a free
variable or a bound variable.
For a given combination of values for the free variables, an expression may be evaluated, although for some
combinations of values of the free variables, the value of the expression may be undefined. Thus an expression
represents a function whose inputs are the values assigned to the free variables and whose output is the resulting value of
the expression.
For example, the expression x/y, evaluated for x = 10 and y = 5, gives 2, but is undefined for y = 0.
The evaluation of an expression is dependent on the definition of the mathematical operators and on the system of
values that is its context.
Two expressions are said to be equivalent if, for each combination of values for the free variables, they have the
same output, i.e., they represent the same function. Example:
The expression ∑_{n=1}^{3} 2nx (the sum of 2nx for n from 1 to 3) has free variable x, bound variable n, constants 1,
2, and 3, two occurrences of an implicit multiplication operator, and a summation operator. The expression is
equivalent to the simpler expression 12x. The value for x = 3 is 36.
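Written out term by term (using the summation as reconstructed above), the simplification and the quoted value at x = 3 are:

```latex
\[
\sum_{n=1}^{3} 2nx \;=\; 2\cdot 1\cdot x + 2\cdot 2\cdot x + 2\cdot 3\cdot x
\;=\; 2x + 4x + 6x \;=\; 12x,
\qquad 12x\,\big|_{x=3} \;=\; 36 .
\]
```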
The '+' and '−' (addition and subtraction) symbols have their usual meanings. Division can be expressed either with
the '/' or with a horizontal dash; either way of writing a quotient such as x/y is perfectly valid. Also, for
multiplication one can use the symbols '×' or a '·' (mid dot), or else simply omit it (multiplication is implicit); so
x × y, x · y and xy are all acceptable. However, notice how the '×' symbol resembles the letter 'x' and how the '·'
symbol resembles a decimal point, so to avoid confusion it is best to use one of the latter two forms.
An expression must be well-formed. That is, the operators must have the correct number of inputs, in the correct
places. The expression 2 + 3 is well formed; the expression * 2 + is not, at least, not in the usual notation of arithmetic.
Expressions and their evaluation were formalised by Alonzo Church and Stephen Kleene
in the 1930s in their
lambda calculus. The lambda calculus has been a major influence in the development of modern mathematics and
computer programming languages.
One of the more interesting results of the lambda calculus is that the equivalence of two expressions in the lambda
calculus is in some cases undecidable. This is also true of any expression in any system that has power equivalent to
the lambda calculus.
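As a small, standard illustration (not drawn from the references below) of how evaluation works in the lambda calculus, applying a function expression to an argument proceeds by substitution, known as beta reduction:

```latex
\[
(\lambda x.\; x + 1)\ 2 \;\longrightarrow_{\beta}\; 2 + 1 \;=\; 3 .
\]
```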
[1] Introduction to Algebra (http://www.mathleague.com/help/algebra/algebra.htm)
[2] TalkTalk Reference Encyclopedia (http://www.talktalk.co.uk/reference/encyclopaedia/hutchinson/m0006748.html)
[3] Biographical Memoir of Stephen Kleene (http://www.nap.edu/html/biomems/skleene.html)
[4] Programming Languages and Lambda Calculi (http://www.cs.utah.edu/plt/publications/pllc.pdf)
Expression (computer science)
An expression in a programming language is a combination of explicit values, constants, variables, operators, and
functions that are interpreted according to the particular rules of precedence and of association for a particular
programming language, which computes and then produces (returns, in a stateful environment) another value. This
process, like for mathematical expressions, is called evaluation. The value can be of various types, such as
numerical, string, and logical.
For example, 2+3 is an arithmetic and programming expression which evaluates to 5. A variable is an expression
because it denotes a value in memory, so y+6 is an expression. An example of a relational expression is 4==4,
which evaluates to true.
In C and most C-derived languages, a call to a function with a void return type is a valid expression, of type void.
Values of type void cannot be used, so the value of such an expression is always thrown away.
A function, and hence an expression containing a function, may have side effects. An expression with side effects
does not normally have the property of referential transparency. In many languages (e.g. C++) statements may be
ended with a semicolon ';' to turn the expression into an expression statement. This asks the implementation to
evaluate the expression for its side-effects only, and disregard the result of the expression.
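A short C sketch of the points above (illustrative only): an arithmetic expression, a relational expression, an expression with a side effect turned into an expression statement by the trailing ';', and a call to a void function whose non-existent value is discarded:

```c
#include <stdio.h>

static void log_message(void)   /* void return type: a call to this function   */
{                               /* is an expression of type void, whose value  */
    puts("called");             /* cannot be used                              */
}

int main(void)
{
    int y = 4;

    int sum = 2 + 3;            /* arithmetic expression, evaluates to 5        */
    int rel = (4 == 4);         /* relational expression, evaluates to 1 (true) */

    y++;                        /* expression with a side effect; the ';' makes */
                                /* it an expression statement and the resulting */
                                /* value (4) is discarded                       */

    log_message();              /* valid expression of type void; its value     */
                                /* is always thrown away                        */

    printf("%d %d %d\n", sum, rel, y);   /* prints: 5 1 5 */
    return 0;
}
```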
[1] Javascript expressions, Mozilla (https://developer.mozilla.org/en/Core_JavaScript_1.5_Guide/Expressions) Accessed July 6, 2009
[2] Programming in C (https://www.cs.drexel.edu/~rweaver/COURSES/ISTC-2/TOPICS/expr.html) Accessed July 6, 2009
[3] ISO/IEC 9899:1999 (http://www.open-std.org/JTC1/SC22/WG14/www/docs/n1256.pdf) section, accessed August 31, 2009
External links
• Expression (http://foldoc.org/index.cgi?Expression), from the Free On-line Dictionary of Computing
Formation evaluation
In petroleum exploration and development, formation evaluation is used to determine the ability of a borehole to
produce petroleum. Essentially, it is the process of "recognizing a commercial well when you drill one".
Modern rotary drilling usually uses a heavy mud as a lubricant and as a means of producing a confining pressure
against the formation face in the borehole, preventing blowouts. Only in rare, catastrophic cases and in Hollywood
movies, do oil and gas wells come in with a fountain of gushing oil. In real life, that is a blowout—and usually also a
financial and environmental disaster. But controlling blowouts has drawbacks—mud filtrate soaks into the formation
around the borehole and a mud cake plasters the sides of the hole. These factors obscure the possible presence of oil
or gas in even very porous formations. Further complicating the problem is the widespread occurrence of small
amounts of petroleum in the rocks of many sedimentary provinces. In fact, if a sedimentary province is absolutely
barren of traces of petroleum, one is probably foolish to continue drilling there.
The formation evaluation problem is a matter of answering two questions:
1. What are the lower limits for porosity, permeability and upper limits for water saturation that permit profitable
production from a particular formation or pay zone; in a particular geographic area; in a particular economic climate?
2. Do any of the formations in the well under consideration exceed these lower limits?
It is complicated by the impossibility of directly examining the formation. It is, in short, the problem of looking at
the formation indirectly.
Formation Evaluation Definition
"What is Formation Evaluation?
Formation Evaluation (FE) is the process of interpreting a combination of measurements taken inside a
wellbore to detect and quantify oil and gas reserves in the rock adjacent to the well. FE data can be
gathered with wireline logging instruments [...] or logging-while-drilling tools [...]. Data are organized
and interpreted by depth and represented on a graph called a log."
Formation evaluation tools
Tools to detect oil and gas have been evolving for over a century. The simplest and most direct tool is well cuttings
examination. Some older oilmen ground the cuttings between their teeth and tasted to see if crude oil was present.
Today, a wellsite geologist or mudlogger uses a low powered stereoscopic microscope to determine the lithology of
the formation being drilled and to estimate porosity and possible oil staining. A portable ultraviolet light chamber or
"Spook Box" is used to examine the cuttings for fluorescence. Fluorescence can be an indication of crude oil
staining, or of the presence of fluorescent minerals. They can be differentiated by placing the cuttings in a solvent
filled watchglass or dimple dish. The solvent is usually carbon tetrachlorethane. Crude oil dissolves and then
redeposits as a fluorescent ring when the solvent evaporates. The written strip chart recording of these examinations
is called a sample log or mudlog.
Well cuttings examination is a learned skill. During drilling, chips of rock, usually less than about 1/8 inch (6 mm)
across, are cut from the bottom of the hole by the bit. Mud, jetting out of holes in the bit under high pressure, washes
the cuttings away and up the hole. During their trip to the surface they may circulate around the turning drillpipe,
mix with cuttings falling back down the hole, mix with fragments caving from the hole walls and mix with cuttings
travelling faster and slower in the same upward direction. They then are screened out of the mudstream by the shale
shaker and fall on a pile at its base. Determining the type of rock being drilled at any one time is a matter of knowing
the 'lag time' between a chip being cut by the bit and the time it reaches the surface where it is then examined by the
wellsite geologist (or mudlogger as they are sometimes called). A sample of the cuttings taken at the proper time will
contain the current cuttings in a mixture of previously drilled material. Recognizing them can be very difficult at
times, for example after a "bit trip" when a couple of miles of drill pipe has been extracted and returned to the hole in
order to replace a dull bit. At such a time there is a flood of foreign material knocked from the borehole walls
(cavings), making the mudloggers task all the more difficult.
One way to get more detailed samples of a formation is by coring. Two techniques are commonly used at present. The
first is the "whole core", a cylinder of rock, usually about 3" to 4" in diameter and up to 50 feet (15 m) to 60 feet
(18 m) long. It is cut with a "core barrel", a hollow pipe tipped with a ring-shaped diamond chip-studded bit that can
cut a plug and bring it to the surface. Often the plug breaks while drilling, usually in shales or fractures and the core
barrel jams, slowly grinding the rocks in front of it to powder. This signals the driller to give up on getting a full
length core and to pull up the pipe.
Taking a full core is an expensive operation that usually stops or slows drilling for at least the better part of a day. A
full core can be invaluable for later reservoir evaluation. Once a section of well has been drilled, there is, of course,
no way to core it without drilling another well.
The other, cheaper, technique for obtaining samples of the formation is "Sidewall Coring". In this method, a steel
cylinder—a coring gun—has hollow-point steel bullets mounted along its sides and moored to the gun by short steel
cables. The coring gun is lowered to the bottom of the interval of interest and the bullets are fired individually as the
gun is pulled up the hole. The mooring cables ideally pull the hollow bullets and the enclosed plug of formation
loose and the gun carries them to the surface. Advantages of this technique are low cost and the ability to sample the
formation after it has been drilled. Disadvantages are possible non-recovery because of lost or misfired bullets and a
slight uncertainty about the sample depth. Sidewall cores are often shot "on the run" without stopping at each core
point because of the danger of differential sticking. Most service company personnel are skilled enough to minimize
this problem, but it can be significant if depth accuracy is important.
A serious problem with cores is the change they undergo as they are brought to the surface. It might seem that
cuttings and cores are very direct samples but the problem is whether the formation at depth will produce oil or gas.
Sidewall cores are deformed and compacted and fractured by the bullet impact. Most full cores from any significant
depth expand and fracture as they are brought to the surface and removed from the core barrel. Both types of core
can be invaded or even flushed by mud, making the evaluation of formation fluids difficult. The formation analyst
has to remember that all tools give indirect data.
Mud logging
Mud logging (or Wellsite Geology) is a well logging process in which drilling mud and drill bit cuttings from the
formation are evaluated during drilling and their properties recorded on a strip chart as a visual analytical tool and
stratigraphic cross sectional representation of the well. The drilling mud which is analyzed for hydrocarbon gases, by
use of a gas chromatograph, contains drill bit cuttings which are visually evaluated by a mudlogger and then
described in the mud log. The total gas, chromatograph record, lithological sample, pore pressure, shale
density, D-exponent, etc. (all lagged parameters, because they are circulated up to the surface from the bit) are plotted
along with surface parameters such as rate of penetration (ROP), weight on bit (WOB), rotations per minute, etc. on
the mud log, which serves as a tool for the mudlogger, drilling engineers, mud engineers, and other service personnel
charged with drilling and producing the well.
• Also See Mudlogger and Wellsite Geologist
Electric logs
In 1928, the Schlumberger brothers in France developed the workhorse of all formation evaluation tools: the electric
log. Electric logs have been improved to a high degree of precision and sophistication since that time, but the basic
principle has not changed. Most underground formations contain water, often salt water, in their pores. The
resistance to electric current of the total formation—rock and fluids—around the borehole is determined by the
volumetric proportions of resistive mineral grains and conductive water-filled pore space.
gas or oil, which are resistant to the passage of electrical current, the bulk formation resistance is higher than for
water filled pores. For the sake of a convenient comparison from measurement to measurement, the electrical
logging tools measure the resistance of a cubic meter of formation. This measurement is called resistivity.
Modern resistivity logging tools fall into two categories, Laterolog and Induction, with various commercial names,
depending on the company providing the logging services.
Laterolog tools send an electric current from an electrode on the sonde directly into the formation. The return
electrodes are located either on surface or on the sonde itself. Complex arrays of electrodes on the sonde (guard
electrodes) focus the current into the formation and prevent current lines from fanning out or flowing directly to the
return electrode through the borehole fluid. Most tools vary the voltage at the main electrode in order to maintain a
constant current intensity. This voltage is therefore proportional to the resistivity of the formation. Because current
must flow from the sonde to the formation, these tools only work with conductive borehole fluid. Actually, since the
resistivity of the mud is measured in series with the resistivity of the formation, laterolog tools give best results when
mud resistivity is low with respect to formation resistivity, i.e., in salty mud.
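In idealized form, the constant-current scheme just described reduces to Ohm's law scaled by a tool-dependent geometric factor; the factor k below is an assumption introduced here for illustration, not a quantity named in the text:

```latex
\[
R \;\approx\; k\,\frac{V}{I_0}
\]
```

With the survey current I0 held constant, the measured voltage V at the main electrode is directly proportional to the formation resistivity R.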
Induction logs use an electric coil in the sonde to generate an alternating current loop in the formation by induction.
This is the same physical principle as is used in electric transformers. The alternating current loop, in turn, induces a
current in a receiving coil located elsewhere on the sonde. The amount of current in the receiving coil is proportional
to the intensity of current loop, hence to the conductivity (reciprocal of resistivity) of the formation. Multiple
transmitting and receiving coils are used to focus formation current loops both radially (depth of investigation) and
axially (vertical resolution). Until the late 1980s, the workhorse of induction logging was the 6FF40 sonde which
is made up of six coils with a nominal spacing of 40 inches (1000 mm). Since the 1990s, all major logging companies
use so-called array induction tools. These comprise a single transmitting coil and a large number of receiving coils.
Radial and axial focusing is performed by software rather than by the physical layout of coils. Since the formation
current flows in circular loops around the logging tool, mud resistivity is measured in parallel with formation
resistivity. Induction tools therefore give best results when mud resistivity is high with respect to formation
resistivity, i.e., fresh mud or non-conductive fluid. In oil-base mud, which is non conductive, induction logging is the
only option available.
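A minimal numerical sketch (in Python) of the series-versus-parallel contrast described above; the 10% mud weighting and the specific resistivity values are illustrative stand-ins, not actual tool geometric factors:

# Illustrative only: why a laterolog favours salty (low-resistivity) mud and an
# induction tool favours fresh or oil-based (high-resistivity) mud.
# Rm = mud resistivity, Rt = true formation resistivity, in ohm-m; mud_fraction
# is an arbitrary stand-in for the tool's geometric weighting of the mud signal.

def laterolog_apparent(Rm, Rt, mud_fraction=0.1):
    # Current crosses the mud, then the formation: resistances add (series).
    return mud_fraction * Rm + (1 - mud_fraction) * Rt

def induction_apparent(Rm, Rt, mud_fraction=0.1):
    # Current loops flow in mud and formation side by side: conductivities add (parallel).
    return 1.0 / (mud_fraction / Rm + (1 - mud_fraction) / Rt)

Rt = 20.0                                   # formation resistivity
for label, Rm in (("salty mud", 0.05), ("fresh/oil-base mud", 100.0)):
    print(f"{label:>18}: laterolog reads {laterolog_apparent(Rm, Rt):5.1f}, "
          f"induction reads {induction_apparent(Rm, Rt):5.1f} (true Rt = {Rt})")

With these toy numbers the laterolog stays close to the true value only when the mud is conductive, while the induction reading is close only when the mud is resistive, which is the point made in the text.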
Until the late 1950s electric logs, mud logs and sample logs comprised most of the oilman's armamentarium.
Logging tools to measure porosity and permeability began to be used at that time. The first was the microlog. This
was a miniature electric log with two sets of electrodes. One measured the formation resistivity about 1/2" deep and
the other about 1"-2" deep. The purpose of this seemingly pointless measurement was to detect permeability.
Permeable sections of a borehole wall develop a thick layer of mudcake during drilling. Mud liquids, called filtrate, soak into the formation, leaving the mud solids behind to seal the wall (ideally) and stop the filtrate "invasion", or soaking. The shallow-reading electrode of the microlog sees mudcake in permeable sections, while the deeper electrode sees the filtrate-invaded formation. In non-permeable sections both measurements read alike and their traces fall on top of each other on the strip-chart log; in permeable sections they separate.
Also in the late 1950s porosity measuring logs were being developed. The two main types are: nuclear porosity logs
and sonic logs.
Porosity logs
The two main nuclear porosity logs are the Density and the Neutron log.
Density logging tools contain a Caesium-137 gamma ray source which irradiates the formation with 662 keV gamma
rays. These gamma rays interact with electrons in the formation through Compton scattering and lose energy. Once
the energy of a gamma ray has fallen below about 100 keV, photoelectric absorption dominates and the gamma rays are eventually absorbed by the formation. The amount of energy lost by Compton scattering is related to the number of electrons per unit volume of formation. Since for most elements of interest (below Z = 20) the ratio of atomic weight, A, to atomic number, Z, is close to 2, gamma ray energy loss is related to the amount of matter per unit volume, i.e., formation density.
A gamma ray detector located some distance from the source, detects surviving gamma rays and sorts them into
several energy windows. The number of high-energy gamma rays is controlled by Compton scattering, hence by
formation density. The number of low-energy gamma rays is controlled by photoelectric absorption, which is
directly related to the average atomic number, Z, of the formation, hence to lithology. Modern density logging tools
include two or three detectors, which allow compensation for some borehole effects, in particular for the presence of
mud cake between the tool and the formation.
Since there is a large contrast between the density of the minerals in the formation and the density of pore fluids,
porosity can easily be derived from measured formation bulk density if both mineral and fluid densities are known.
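In the usual textbook form of this relationship (stated here as a general sketch, with assumed densities such as about 2.65 g/cm^3 for a quartz matrix and about 1.0 g/cm^3 for fresh-water mud filtrate), the density-derived porosity is

\phi_D = \frac{\rho_{ma} - \rho_b}{\rho_{ma} - \rho_{fl}}

where \rho_b is the measured bulk density, \rho_{ma} the matrix (mineral) density and \rho_{fl} the pore-fluid density.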
Neutron porosity logging tools contain an Americium-Beryllium neutron source, which irradiates the formation with
neutrons. These neutrons lose energy through elastic collisions with nuclei in the formation. Once their energy has
decreased to thermal level, they diffuse randomly away from the source and are ultimately absorbed by a nucleus.
Hydrogen atoms have essentially the same mass as the neutron; therefore hydrogen is the main contributor to the
slowing down of neutrons. A detector at some distance from the source records the number of neutrons reaching this
point. Neutrons that have been slowed down to thermal level have a high probability of being absorbed by the
formation before reaching the detector. The neutron counting rate is therefore inversely related to the amount of
hydrogen in the formation. Since hydrogen is mostly present in pore fluids (water, hydrocarbons) the count rate can
be converted into apparent porosity. Modern neutron logging tools usually include two detectors to compensate for
some borehole effects. Porosity is derived from the ratio of count rates at these two detectors rather than from count
rates at a single detector.
The combination of neutron and density logs takes advantage of the fact that lithology has opposite effects on these
two porosity measurements. The average of neutron and density porosity values is usually close to the true porosity,
regardless of lithology. Another advantage of this combination is the "gas effect." Gas, being less dense than liquids,
translates into a density-derived porosity that is too high. Gas, on the other hand, has much less hydrogen per unit
volume than liquids: neutron-derived porosity, which is based on the amount of hydrogen, is too low. If both logs are
displayed on compatible scales, they overlay each other in liquid-filled clean formations and are widely separated in
gas-filled formations.
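A hedged sketch of the neutron-density combination just described; the function name and the 0.04 separation threshold used to flag possible gas are illustrative choices, not an industry standard:

# Combine density- and neutron-derived porosities (both as fractions of bulk volume).
# Their simple average approximates true porosity in clean formations; a large
# positive separation (density porosity well above neutron porosity) suggests gas.

def combine_porosity(phi_density, phi_neutron, gas_threshold=0.04):
    phi_avg = (phi_density + phi_neutron) / 2.0
    possible_gas = (phi_density - phi_neutron) > gas_threshold
    return phi_avg, possible_gas

print(combine_porosity(0.21, 0.19))   # liquid-filled example: average 0.20, no gas flag
print(combine_porosity(0.28, 0.10))   # gas-effect example: average about 0.19, gas flag raised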
Sonic logs use a pinger-and-microphone arrangement to measure the velocity of sound in the formation from one end of the sonde to the other. For a given type of rock, acoustic velocity varies inversely with porosity. If the velocity of sound through solid rock is taken to represent 0% porosity, a slower velocity indicates higher porosity, because the pores are usually filled with formation water, through which sound travels more slowly.
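The classical way to turn this observation into a porosity value is the Wyllie time-average relation, quoted here in its general textbook form rather than as the algorithm of any particular tool. With \Delta t the measured interval transit time (the reciprocal of velocity),

\phi_S = \frac{\Delta t_{log} - \Delta t_{ma}}{\Delta t_{fl} - \Delta t_{ma}}

where \Delta t_{ma} and \Delta t_{fl} are the transit times of the rock matrix and of the pore fluid respectively.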
Both sonic and density-neutron logs give porosity as their primary information. Sonic logs read farther away from
the borehole so they are more useful where sections of the borehole are caved. Because they read deeper, they also
tend to average more formation than the density-neutron logs do. Modern sonic configurations, with pingers and microphones at both ends of the tool combined with computer analysis, minimize the averaging somewhat.
Averaging is an advantage when the formation is being evaluated for seismic parameters, a different area of
formation evaluation. A special log, the Long Spaced Sonic, is sometimes used for this purpose. Seismic signals (a
single undulation of a sound wave in the earth) average together tens to hundreds of feet of formation, so an
averaged sonic log is more directly comparable to a seismic waveform.
Density-neutron logs read the formation within about four to seven inches (roughly 100 to 180 mm) of the borehole wall. This is an
advantage in resolving thin beds. It is a disadvantage when the hole is badly caved. Corrections can be made
automatically if the cave is no more than a few inches deep. A caliper arm on the sonde measures the profile of the
borehole and a correction is calculated and incorporated in the porosity reading. However if the cave is much more
than four inches deep, the density-neutron log is reading little more than drilling mud.
Lithology logs - SP and Gamma Ray
There are two other tools, the SP log and the Gamma Ray log, one or both of which are almost always used in
wireline logging. Their output is usually presented along with the electric and porosity logs described above. They
are indispensable as additional guides to the nature of the rock around the borehole.
The SP log, known variously as a "Spontaneous Potential", "Self Potential" or "Shale Potential" log is a voltmeter
measurement of the voltage or electrical potential difference between the mud in the hole at a particular depth and a
copper ground stake driven into the surface of the earth a short distance from the borehole. A salinity difference
between the drilling mud and the formation water acts as a natural battery and will cause several voltage effects. This
"battery" causes a movement of charged ions between the hole and the formation water where there is enough
permeability in the rock. The most important voltage is set up as a permeable formation permits ion movement,
reducing the voltage between the formation water and the mud. Sections of the borehole where this occurs then have
a voltage difference with other nonpermeable sections where ion movement is restricted. Vertical ion movement in
the mud column occurs much more slowly because the mud is not circulating while the drill pipe is out of the hole.
The copper surface stake provides a reference point against which the SP voltage is measured for each part of the
borehole. There can also be several other minor voltages, due for example to mud filtrate streaming into the
formation under the effect of an overbalanced mud system. This flow carries ions and is a voltage generating current.
These other voltages are secondary in importance to the voltage resulting from the salinity contrast between mud and
formation water.
The nuances of the SP log are still being researched. In theory, almost all porous rocks contain water. Some pores are
completely filled with water. Others have a thin layer of water molecules wetting the surface of the rock, with gas or
oil filling the rest of the pore. In sandstones and porous limestones there is a continuous layer of water throughout
the formation. If there is even a little permeability to water, ions can move through the rock and decrease the voltage
difference with the mud nearby. Shales do not allow water or ion movement. Although they may have a large water
content, it is bound to the surface of the flat clay crystals comprising the shale. Thus mud opposite shale sections
maintains its voltage difference with the surrounding rock. As the SP logging tool is drawn up the hole it measures
the voltage difference between the reference stake and the mud opposite shale and sandstone or limestone sections.
The resulting log curve reflects the permeability of the rocks and, indirectly, their lithology. SP curves degrade over time as the ions diffuse up and down the mud column. The SP can also suffer from stray voltages caused by other logging tools that are run with it; older, simpler logs often have better SP curves than more modern logs for this reason. With
experience in an area, a good SP curve can even allow a skilled interpreter to infer sedimentary environments such as
deltas, point bars or offshore tidal deposits.
The gamma ray log is a measurement of naturally occurring gamma radiation from the borehole walls. Sandstones
are usually nonradioactive quartz and limestones are nonradioactive calcite. Shales however, are naturally
radioactive due to potassium isotopes in clays, and adsorbed uranium and thorium. Thus the presence or absence of
gamma rays in a borehole is an indication of the amount of shale or clay in the surrounding formation. The gamma
ray log is useful in holes drilled with air or with oil based muds, as these wells have no SP voltage. Even in
water-based muds, the gamma ray and SP logs are often run together. They serve as a check on each other and can indicate unusual shale sections which may either not be radioactive or may have an abnormal ionic chemistry. The
gamma ray log is also useful to detect coal beds, which, depending on the local geology, can have either low
radiation levels, or high radiation levels due to adsorption of uranium. In addition, the gamma ray log will work
inside a steel casing, making it essential when a cased well must be evaluated.
[1] Baker Hughes Solutions (http://www.bakerhughesdirect.com/cgi-bin/bhi/resources/ExternalFileHandler.jsp?bookmarkable=Yes&path=private/BHI/public/bakerhughes/integratedfe/fevaluation2.html&target=_blank)
Interpreting the tools
The immediate questions that have to be answered in deciding whether to complete a well or to plug and abandon (P&A) it are:
• Do any zones in the well contain producible hydrocarbons?
• How much?
• How much, if any, water will be produced with them?
The elementary approach to answering these questions uses the Archie equation, shown below.
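In its usual form the Archie equation relates the water saturation S_w of the pore space to the deep resistivity R_t read from the resistivity logs, the porosity \phi from the porosity logs, and the formation-water resistivity R_w:

S_w = \left( \frac{a \, R_w}{\phi^m \, R_t} \right)^{1/n}

where a, m and n are empirical constants, commonly taken as roughly a = 1, m = 2 and n = 2 in clean formations. A low computed S_w means that part of the pore space holds hydrocarbons; combined with porosity and net reservoir thickness, this gives a first estimate of how much is producible and of the likely water cut.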
Formative assessment
Formative assessment is a reflective process that intends to promote student attainment. Cowie and Bell define it as the bidirectional process between teacher and student to enhance, recognize and respond to the learning. Black and Wiliam consider an assessment ‘formative’ when the feedback from learning activities is actually used to adapt the teaching to meet the learner's needs. Nicol and Macfarlane-Dick have re-interpreted research on formative assessment and feedback and shown how these processes can help students take control of their own learning (self-regulated learning).
In the training field, formative assessment is described as assessing the formation of the student. Facilitators do this
by observing students as they:
• Respond to questions
• Ask questions
• Interact with other students during activities, etc.
This enables the facilitator to evaluate their own delivery, the readability (fog index) of the material, and the relevance of the content.
Formative Assessments — Chronology and Intent
Michael Scriven (1967) coined the terms formative and summative evaluation and emphasized their differences both
in terms of the goals of the information they seek and how the information is used. Benjamin Bloom (1968) just a
year later made formative assessments a keystone of Learning for Mastery. He, along with Thomas Hastings and George Madaus (1971), produced the Handbook of Formative and Summative Evaluation and showed how formative assessments could be linked to instructional units in a variety of content areas. In the early 1990s, the Kentucky high-stakes assessment (Kifer, 1994) initially included a major emphasis on instructionally embedded tests; i.e.,
formative assessments.
Formative assessment has a long history. Formative assessments have evolved as a means to adapt to student needs.
Historically formative assessments were of instructional units and diagnostic assessments were used for placement
purposes. Formative assessments are part of instruction designed to provide crucial feedback for teachers and
students. Assessment results inform the teacher of what has been taught well and not so well. They inform students
of what they have learned well and not learned so well. As opposed to a summative assessment designed to make
judgments about student performance and produce grades, the role of a formative assessment is to improve learning.
As opposed to benchmark tests that are used to predict student performance on other tests (most often state
assessments), formative assessments are intimately connected to instruction.
Formative assessments are:
• For Learning — The purpose of formative assessment is to enhance learning not to allocate grades. Summative
assessments are designed to allocate grades. The goal of formative assessment is to improve; summative
assessment to prove.
• Embedded in Instruction — Formative assessments are considered a part of instruction and the instructional
sequence. What students are taught is reflected in what they are assessed on.
They produce:
• Non-threatening Results — Formative assessments are scored but not graded. Students mark their own work
and are encouraged to raise questions about the assessment and the material covered by the assessment.
• Direct and Immediate Feedback — Results of formative assessments are produced “on the spot;” teachers and
students get them immediately. Teachers get a view of both individual and class performances while students
learn how well they have done.
• Structured Information — Teachers can judge success and plan improvements based on the formative results.
Students can see progress and experience success. Both teachers and students learn from the assessment results.
• Ways to Improve — Summarized formative results provide a basis for the teacher to re-visit topics in the unit if
necessary. Individual student responses provide a basis for giving students additional experiences in areas where
they performed less well.
Formative v. Summative
In 1967, Michael Scriven identified the differences between formative and summative assessments (as cited by many authors, including Marzano, 2006). The difference between the two types of assessment is that classroom formative assessments occur while content is being taught and learned, should continue throughout the period of learning, and are not meant to assign grades; their primary objective is to inform the teacher of what his or her students know or do not know. More importantly, classroom formative assessments allow teachers to make decisions and monitor their instruction based on student performance, while summative assessment occurs at the end of a learning unit and determines whether the content taught was retained (Ainsworth, 2006, p. 23). A common formative assessment is slightly different in that it is designed by a group of teachers who teach the same content areas or grades, and it is directly aligned to the prioritized content standards. In addition, a common formative assessment includes both a pre- and a post-unit assessment. Again, teachers are given the opportunity to determine what students need to know and, at the end, what they have learned from the learning unit (Ainsworth, 2006, pp. 23–24). Marzano states:
"Recall the finding from Black and Wiliam’s (1998) synthesis of more than 250 studies that formative
assessments, as opposed to summative ones, produce the more powerful effect on student learning. In his
review of the research, Terrance Crooks (1988) reports that effect sizes for summative assessments are
consistently lower than effect sizes for formative assessments. In short, it is formative assessment that has a
strong research base supporting its impact on learning." (Marzano, 2006, p. 9)
Researchers have concluded that standards-based assessments are an effective way to “prescribe instruction and to ensure that no child is left behind” (Marzano, 2006, p. 13). Because the administration of formative assessments is ongoing, the implication is that standards-based assessments should be given frequently.
Evaluation done to improve or change a program while it is in progress is termed 'formative evaluation'. When
evaluation focuses on the results or outcomes of a program, it is called 'summative evaluation'.
Formative Assessment in K–12
Formative assessment is more valuable for day-to-day teaching when it is used to adapt the teaching to meet
students’ needs. Formative assessment helps teachers to monitor their students’ progress and to modify the
instruction accordingly. It also helps students to monitor their own progress as they get feedback from their peers and
the teacher. Students also find opportunity to revise and refine their thinking by means of formative assessment.
Formative assessment is also called educative assessment or classroom assessment.
Methods of Formative Assessment: There are many ways to integrate formative assessment into K-12 classrooms.
Although the key concepts of formative assessment such as constant feedback, modifying the instruction, and
information about students' progress do not vary among different disciplines or levels, the methods or strategies may
differ. For example, researchers have developed generative activities (Stroup et al., 2004) and model-eliciting activities (Lesh et al., 2000) that can be used as formative assessment tools in mathematics and science classrooms. Others have developed strategies for computer-supported collaborative learning environments (Wang et al., 2004b). More information about the application of formative assessment in specific areas is given below.
Purpose of Formative Assessment: The following are examples of the application of formative assessment to specific content areas.
Formative Assessment in Math Education:
In math education, it is important for teachers to see how their students approach problems and what mathematical knowledge, and at what level, they use when solving them. That is, knowing how students think in the process of learning or problem solving makes it possible for teachers to help them overcome conceptual difficulties and, in turn, improve learning. In that sense, formative assessment is diagnostic. To employ formative assessment in the classroom, a teacher has to make sure that each student participates in the learning process by expressing their ideas; that there is a trustful environment in which students can provide each other with feedback; that she or he (the teacher) provides students with feedback; and that the instruction is modified according to students' needs. In math classes, thought-revealing activities such as model-eliciting activities (MEAs) and generative activities provide good opportunities for covering these aspects of formative assessment.
Formative Assessment in Second/Foreign Language Education:
As an ongoing assessment, it focuses on the process. It helps teachers check the current status of their students' language ability; that is, they can know what the students know and what they do not know. It also gives students chances to participate in modifying or re-planning the upcoming classes (Bachman & Palmer, 1996). Participating in their learning increases students' motivation to learn the target language. It also raises students' awareness of their target languages, which results in them resetting their own goals. In consequence, it helps students achieve their goals successfully, and it helps teachers act as facilitators who foster students' target-language ability. In the classroom, short quizzes, reflective journals, or portfolios can be used as formative assessments (Cohen, 1994).
Formative Assessment in Elementary Education:
In primary schools, formative assessment is used to inform the next steps of learning. Teachers and students both use formative assessments as a tool to make decisions based on data. Formative assessment occurs when teachers feed information
back to students in ways that enable the student to learn better, or when students can engage in a similar, self-
reflective process. The evidence shows that high quality formative assessment does have a powerful impact on
student learning. Black and Wiliam (1998) report that studies of formative assessment show an effect size on
Standardized Tests of between 0.4 and 0.7, larger than most known educational interventions. (The effect size is the
ratio of the average improvement in test scores in the innovation to the range of scores of typical groups of pupils on
the same tests; Black and Wiliam recognize that standardized tests are very limited measures of learning.) Formative
assessment is particularly effective for students who have not done well in school, thus narrowing the gap between
low and high achievers while raising overall achievement. Research examined by Black and Wiliam supports the
conclusion that summative assessments tend to have a negative effect on student learning.
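Expressed as a formula, the effect size described in the parenthetical note above is

\text{effect size} = \frac{\text{average improvement in test scores in the innovation group}}{\text{spread (range) of scores of typical groups of pupils on the same tests}}

so a value of 0.4 to 0.7 means the typical gain from formative assessment is a substantial fraction of the spread normally seen between pupils on the same test.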
Example of Formative Assessment in an Elementary Classroom
Activities that can be used as Formative Assessment Tools in Mathematics and Science Classrooms
Model-eliciting Activities (MEAs):
Model-eliciting activities are based on real-life situations where students, working in small groups, present a mathematical model as a solution to a client's need (Zawojewski & Carmona, 2001). The problem design enables students to evaluate their solutions according to the needs of the client identified in the problem situation and to sustain themselves in productive, progressively effective cycles of conceptualizing and problem solving. Model-eliciting
activities (MEAs) are ideally structured to help students build their real-world sense of problem solving towards
increasingly powerful mathematical constructs. What is especially useful for mathematics educators and researchers
is the capacity of MEAs to make students’ thinking visible through their models and modeling cycles. Teachers do
not prompt the use of particular mathematical concepts or their representational counterparts when presenting the
problems. Instead, they choose activities that maximize the potential for students to develop the concepts that are the
focal point in the curriculum by building on their early and intuitive ideas. The mathematical models emerge from
the students’ interactions with the problem situation and learning is assessed via these emergent behaviors.
Generative Activities:
In a generative activity, students are asked to come up with outcomes that are mathematically the same. Students can arrive at the responses, or build responses, from this sameness in a wide range of ways. The sameness gives coherence to the task and allows it to be an "organizational unit for performing a specific function" (Stroup et al., 2004).
Other activities can also be used as means of formative assessment, as long as they ensure the participation of every student, make students' thoughts visible to each other and to the teacher, and promote feedback that helps students revise and refine their thinking. Complementary to all of these, instruction should be modified and adapted based on the information gathered through those activities.
Formative Assessment in Computer Supported Learning
Six strategies for web-based formative assessment
Many academics are seeking to diversify assessment tasks, broaden the range of skills assessed and provide students
with more timely and informative feedback on their progress. Others are wishing to meet student expectations for
more flexible delivery and to generate efficiencies in assessment that can ease academic staff workloads. The move
to on-line and computer based assessment is a natural outcome of the increasing use of information and
communication technologies to enhance learning. As more students seek flexibility in their courses, it seems
inevitable there will be growing expectations for flexible assessment as well.
Wang et al. (2004b) developed the Formative Assessment Module of the Web-based Assessment and Test Analysis System (FAM-WATA) to help address this problem. This research not only applied FAM-WATA to assist teachers
in giving feedback and interacting with students in an e-learning environment but also explored the effectiveness of
FAM-WATA in facilitating student e-learning effectiveness. FAM-WATA offers six main strategies:
Strategy 1–3: ‘Repeat the test’, ‘correct answers are not given’, and ‘ask questions’ strategies
The combination of two strategies, ‘repeat the test’ and ‘correct answers are not given’, in web-based formative assessment will increase e-learning effectiveness (Buchanan, 2000). The major purpose of these strategies is to provide students with opportunities to revise the mistakes they have made. In addition to these two strategies, the
FAM-WATA tries to stimulate student interest and desire for new challenges through the design of the Web
environment, as explained next.
When learners log in and perform a self-assessment, FAM-WATA automatically chooses some questions at random from the database. The order of the questions and of their options is randomized, to prevent learner boredom with repeated tests. A given test item no longer appears on subsequent tests once a learner has answered it correctly three times in a row, so the number of test items gradually decreases with each iteration of the test. At some point all questions will have been answered correctly, and the system tags the successful learner with a ‘pass the test’ mark. By the same token, if a learner answers an item incorrectly before completing three consecutive correct answers, the count for that item is reset to zero and begins again. Three consecutive correct answers are required because otherwise the system judges that the learner may have answered the question correctly simply by guessing. The purpose of this design is for learners to actively take on the challenge of learning, not passively guess their way through.
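A minimal sketch of the item-retirement bookkeeping described above, written as a simplified reconstruction rather than the actual FAM-WATA implementation; the class and method names are invented for illustration:

# Track consecutive correct answers per item; retire an item after three in a
# row, reset its count to zero on a wrong answer, and report a 'pass the test'
# state once every item has been retired.

class RepeatTheTest:
    def __init__(self, item_ids, required_streak=3):
        self.required = required_streak
        self.streak = {item: 0 for item in item_ids}

    def record(self, item, correct):
        if correct:
            self.streak[item] += 1
        else:
            self.streak[item] = 0          # guessing suspected: start this item over

    def remaining_items(self):
        # Items still shown on the next iteration of the test.
        return [i for i, s in self.streak.items() if s < self.required]

    def passed(self):
        return not self.remaining_items()

test = RepeatTheTest(["q1", "q2"])
for item, correct in [("q1", True), ("q1", True), ("q1", True), ("q2", True), ("q2", False)]:
    test.record(item, correct)
print(test.remaining_items(), test.passed())   # ['q2'] False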
In the above design, ‘timely feedback’ is combined with the strategy of ‘correct answers are not given’. After learners submit their test papers, FAM-WATA immediately gives scores and presents references to learners without directly giving the correct answers to the questions. Meanwhile, learners may also interact asynchronously with teachers by asking questions online. As its form of ‘timely feedback’, the system offers learners reference materials to help them find the correct answers.
Strategy 4: ‘Monitor answering history’ strategy
FAM-WATA provides an interface to check the answering history of the user and others who have taken the test,
available to learners after they pass the test. Through understanding their own progress, learners are expected to take
the initiative in monitoring their learning.
Strategy 5: ‘Query scores’ strategy
FAM-WATA provides an interface for learners to look up peer scores and see the progress of others, to encourage
the learner to learn from peers, and motivate learning. Students may find out whether others have passed the test and
how many tries are required for others to answer and to pass the test. Students can query the answering history of
other students. The main purpose of these designs is to add the stimulus of competition. Those who perform well or
pass the test will be marked by special signs, increasing their sense of achievement. The special signs can also be
regarded as a form of encouragement.
Strategy 6: ‘All pass and then reward’ strategy
FAM-WATA generates a Flash (Adobe Systems Inc., CA, USA) animation to congratulate learners on passing the test. Animation effects can stimulate learner interest (Mayer & Moreno, 2002). This type of positive feedback provides encouragement for learners who pass a task, creating a sense of achievement.
Formative Assessment in UK education
In the UK education system, formative assessment (or assessment for learning) has been a key aspect of the agenda
for personalised learning. The Working Group on 14–19 Reform led by Sir Mike Tomlinson, recommended that
assessment of learners be refocused to be more teacher-led and less reliant on external assessment, putting learners at
the heart of the assessment process.
The UK government has stated that personalised learning depends on teachers knowing the strengths and weaknesses of individual learners, and that a key means of achieving this is through formative assessment, involving high quality feedback to learners included within every teaching session.
The Assessment Reform Group has set out 10 principles for formative assessment. These are that assessment for learning should:
• be part of effective planning of teaching and learning
• focus on how students learn
• be recognised as central to classroom practice
• be regarded as a key professional skill for teachers
• be sensitive and constructive because any assessment has an emotional impact
• take account of the importance of learner motivation
• promote commitment to learning goals and a shared understanding of the criteria by which they are assessed
• enable learners to receive constructive guidance about how to improve
• develop learners’ capacity for self-assessment so that they can become reflective and self-managing
• recognise the full range of achievements of all learners
Benefits of Formative Assessments for Teachers (Boston, 2002)
• Teachers are able to determine what standards students already know and to what degree.
• Teachers can decide what minor modifications or major changes in instruction they need to make so that all
students can succeed in upcoming instruction and on subsequent assessments.
• Teachers can create appropriate lessons and activities for groups of learners or individual students.
• Teachers can inform students about their current progress in order to help them set goals for improvement.
• In 2008, Katy Bainbridge began work on Align Assess Achieve, a method of teaching formative assessment
to administrators and teachers.
Benefits of Formative Assessments for Students
• Students are more motivated to learn.
• Students take responsibility for their own learning.
• Students can become users of assessment alongside the teacher.
• Students learn valuable lifelong skills such as self-evaluation, self-assessment, and goal setting.
• Student achievement can improve by 21 to 41 percentile points.
[1] Crooks, T. (2001). The Validity of Formative Assessments. Paper presented to the British Educational Research Association Annual Conference, University of Leeds, 13-15 September (http://www.leeds.ac.uk/educol/documents/00001862.htm)
[2] Cowie, B., & Bell, B. (1999), A model of formative assessment in science education, Assessment in Education, 6: 101-116
[3] Black, P., & Wiliam, D. (1998), Inside the black box: Raising standards through classroom assessment. Phi Delta Kappan, 80(2): 139-149
[4] Nicol, D.J. & Macfarlane-Dick, D. (2006). Formative assessment and self-regulated learning: A model and seven principles of good feedback
practice. Studies in Higher Education, Vol 31(2), pp.199-218
[5] Bloom, B. S. (1968) Learning for mastery. Evaluation Comment. University of California, Los Angeles. Bloom, B.S., Hastings,T. and
Madaus, G. (1971) Handbook of formative and summative evaluation of student learning. New York: McGraw-Hill Book Company Kifer,
Edward (1994) The Kentucky instructional results information system, in Guskey, Thomas R. (Ed.) High Stakes Performance Assessment:
Perspectives on Kentucky's Educational Reform. Thousand Oaks, CA: Corwin Press Scriven, M. (1967) The methodology of evaluation. In R.
E. Stake (Ed.), Curriculum evaluation. American Educational Research Association monograph series on evaluation, no. 1, Chicago: Rand
McNally. Reprinted with revisions in B. R. Worthen & J. R. Sanders (Eds.) (1973), Educational evaluation: Theory and practice. Worthington,
OH: Charles A. Jones.
[6] Marzano, R. J. (2006). Classroom assessments and grading that work. Alexandria, VA: Association for Supervision and Curriculum Development.
[7] Ainsworth, L., & Viegut, D. (2006). Common formative assessments. Thousand Oaks, CA: Corwin Press.
[8] Stroup, W. M., Ares, N., & Hurford, A. C. (2004). A taxonomy of generative activity design supported by next generation classroom
networks. Paper presented at the Proceedings of the twenty-sixth annual meeting of the North American Chapter of the International Group for
the Psychology of Mathematics Education, Toronto, Ontario, Canada.
[9] Lesh, R., Hoover, M., Hole, B., Kelly, E., & Post, T. (2000). Principles for developing thought-revealing activities for students and teachers.
In A. E. Kelly & R. A. Lesh (Eds.), Handbook of research design in mathematics and science education (pp. 591-645). Mahwah, NJ:
Lawrence Erlbaum.
[10] Wang, T.H. (2007). What strategies are effective for formative assessment in an e-learning environment? Journal of Computer Assisted
Learning. 23(3), 171–186.
[11] Bachman. L.F. & Palmer A.S. (1996). Language Testing in Practice. Oxford University Press.
[12] Cohen. A. (1994). Assessing Language Ability in the Classroom. Heinle & Heinle Publishers.
[13] Zawojewski, J., & Carmona, G. (2001). A developmental and social perspective on problem solving strategies. In R. Speiser & C. Walter (Eds.), Proceedings of the twenty-third annual meeting of the North American chapter of the international group for the psychology of mathematics education. Columbus, OH: ERIC Clearinghouse for Science, Mathematics, and Environmental Education.
[14] Buchanan, T. (1998) Using the World Wide Web for formative assessment. Journal of Educational Technology Systems 27, 71–79.
[15] Mayer, R. E. & Moreno, R. (2002). Aids to computer-based multimedia learning. Learning and Instruction. 12. 107–119.
[16] Jones, Dr Cheryl A, Assessment for Learning, Learning and Skills Development Agency (now the Learning and Skills Network) (2005), p.1
[17] A national conversation about personalised learning – a summary of the DfES discussion pamphlet, Department for Education and Skills
(2005), p.8
[18] Duckett, Ian and Brooke, Di, Learning and Skills Network (2007), p.1
[19] Assessment for Learning: 10 research-based principles to guide classroom practice, Assessment Reform Group (2002), p.2
[20] Boston, Carol (2002). The concept of formative assessment. Practical Assessment, Research & Evaluation, 8(9).
[21] http://www.qualityinstruction.org
[22] Marzano, Robert J. (2003). What works in schools: Translating research into action. Alexandria, VA: ASCD.
[23] Stiggins, R.J., Arter, J.A., Chappius, J. & Chappius, S. (2006). Classroom assessment for student learning: Doing it right-using it well.
Portland, OR: Educational Testing Service.
External links
• The Concept of Formative Assessment. ERIC Digest. (http://www.ericdigests.org/2003-3/concept.htm)
• Qualifications and Curriculum Authority: assessment (http://www.qca.org.uk/qca_13581.aspx)
• Qualifications and Curriculum Authority: assessment for learning documents (http://www.qca.org.uk/qca_13440.aspx)
• Assessment for Learning (Learning and Skills Development Agency, now the Learning and Skills Network) (PDF) (https://www.lsneducation.org.uk/user/order.aspx?code=041723&src=XOWEB)
• Learning and Skills Network website (http://www.lsneducation.org.uk/)
• Assessment Reform Group website (http://www.assessment-reform-group.org/)
• The EvaluationWiki (http://www.evaluationwiki.org) - The mission of EvaluationWiki is to make freely available a compendium of up-to-date information and resources to everyone involved in the science and practice of evaluation. The EvaluationWiki is presented by the non-profit Evaluation Resource Institute.
• Formative-Assessment.com - Comprehensive Site on Formative Assessment (http://www.formative-assessment.com)
Formative evaluation
Formative evaluation is a type of evaluation whose purpose is to improve programs. It also goes under other names, such as developmental evaluation and implementation evaluation, and it can be contrasted with types of evaluation that have other purposes, in particular process evaluation and outcome evaluation. An example of this
is its use in instructional design to assess ongoing projects during their construction to implement improvements.
Formative evaluation can use any of the techniques which are used in other types of evaluation: surveys, interviews,
data collection and experiments (where these are used to examine the outcomes of pilot projects).
Formative evaluation developed relatively late in the course of evaluation's emergence as a discipline as a result of
growing frustration with an exclusive emphasis on outcome evaluation as the only purpose for evaluation activity.
Outcome evaluation looks at the intended or unintended positive or negative consequences of a program, policy or
organization. While outcome evaluation is useful where it can be done, it is not always the best type of evaluation to
undertake. For instance, in many cases it is difficult or even impossible to undertake an outcome evaluation because
of either feasibility or cost. In other cases, even where outcome evaluation is feasible and affordable, it may be a
number of years before the results of an outcome evaluation become available. As a consequence, attention has
turned to using evaluation techniques to maximize the chances that a program will be successful instead of waiting
till the final results of a program are available to assess its usefulness. Formative evaluation therefore complements
outcome evaluation rather than being an alternative to it.
Formative evaluation is done with a small group of people to "test run" various aspects of instructional materials. For
example, you might ask a friend to look over your web pages to see if they are graphically pleasing, if there are
errors you've missed, or if it has navigational problems. It's like having someone look over your shoulder during the development phase to catch the things that you would miss but that a fresh set of eyes will spot. At times, you might
need to have this help from a target audience. For example, if you're designing learning materials for third graders,
you should have a third grader as part of your Formative Evaluation.
The terms formative and summative evaluation were coined by Michael Scriven (1967).
Formative Evaluation has also recently become the recommended method of evaluation in U.S. education. In this
context, an educator would analyze the performance of a student during the teaching/intervention process and
compare these data to the baseline data. There are four visual criteria that can be applied (a sketch of how they might be computed follows the list):
1. Change in mean,
2. Change in level or discontinuity of performance,
3. Change in trend or rate of change,
4. Latency of change
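A hedged sketch of how the first three of these visual criteria might be computed from baseline and intervention data points; it is a simplified reconstruction (latency of change is omitted because it requires session timing information), not a standard single-case analysis package:

# Compare a baseline phase with an intervention phase on three of the four
# visual criteria: change in mean, change in level (last baseline point versus
# first intervention point), and change in trend (slope of each phase).

def slope(values):
    n = len(values)
    xbar, ybar = (n - 1) / 2.0, sum(values) / n
    num = sum((i - xbar) * (y - ybar) for i, y in enumerate(values))
    den = sum((i - xbar) ** 2 for i in range(n))
    return num / den

def visual_criteria(baseline, intervention):
    return {
        "change_in_mean": sum(intervention) / len(intervention) - sum(baseline) / len(baseline),
        "change_in_level": intervention[0] - baseline[-1],
        "change_in_trend": slope(intervention) - slope(baseline),
    }

print(visual_criteria([10, 11, 10, 12], [15, 17, 18, 20]))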
Another method of monitoring progress in formative evaluation is use of the number-point rule, sketched below. In this method, if a certain pre-specified number of data points collected during the intervention are above the goal, then the educators need to consider raising the goal or discontinuing the intervention. If data points vary highly, educators can discuss how to motivate a student to achieve more consistently.
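A minimal sketch of the number-point rule as just described; the goal value and the required number of points are hypothetical parameters that an educator would choose:

# Number-point rule: if at least `required` of the intervention data points
# exceed the goal, consider raising the goal or discontinuing the intervention.

def number_point_rule(data_points, goal, required):
    above = sum(1 for p in data_points if p > goal)
    return above >= required

scores = [42, 55, 61, 58, 64, 67]
print(number_point_rule(scores, goal=60, required=3))   # True: 61, 64 and 67 exceed the goal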
At the Faculty of Psychology of the University of Vienna, Twitter was used for formative course evaluation.
[1] Scriven, Michael (1967). Gredler, M. E. (ed.). "The methodology of evaluation". Program Evaluation (New Jersey: Prentice Hall, 1996): 16.
[2] Kazdin, A. E. (1982). Single-case research designs: Methods for clinical and applied settings. New York: Oxford University Press.
[3] Thomas, T.; Grimes, J. (2008). Best practices in school psychology V (Bethesda, MD: National Association of School Psychologists (NASP)) 2: 218.
[4] Burger, Christoph; Stieger, Stefan (2009-02-18). "Using Web 2.0 application Twitter for formative course evaluation: a case study." (http://www.mobileresearch09.com/index.php/page/the-schedule#postertitles). 1st Mobile phone conference, London (UK).
[5] Burger, Christoph; Stieger, Stefan (2009-04-08). "Let's go formative: Continuous student ratings with Web 2.0 application Twitter" (http://wp1101040.wp137.webpack.hosteurope.de/conftool09/index.php?page=browseSessions&abstracts=show&presentations=show&form_session=44). GOR09. Retrieved 2009-01-07.
General Learning Ability
General Learning Ability is an Aptitude, commonly referred to as the "G" score, defined as the ability to "catch on"
or understand instructions and underlying principles; ability to reason and make judgements. Closely related to doing
well in school, one's "G" score is often used by Vocational Rehabilitation evaluators and counselors as a factor for
determining a person's likelihood of success in a two- or four-year college program. In assessments such as the U.S.
Department of Labor's General Aptitude Test Battery (G.A.T.B.) and the Vocational Research Institute's
Careerscope, a person's General Learning Ability score is determined by a combination of subtests covering Pattern Visualization, Word Meanings and Numerical Reasoning.
The aptitude General Learning Ability is related to the use of logic or scientific facts to define problems and draw
conclusions; make decisions and judgments; or plan and supervise the work of others. Specific job tasks that might
require high General Learning Ability include: diagnose and treat illnesses or injuries; use facts to solve a crime;
plan the layout of a computer network; inspect and test engine parts.
General Learning Ability is most commonly measured through testing of general knowledge and reasoning. It underlies everyday activities such as speech, schoolwork, and other tasks, and is sometimes described informally as "brain power." Five broad areas are commonly associated with it:
1. Thinking 2. Speaking 3. Work (educational) 4. Emotions 5. Knowledge
Information a person hears from others (for example, a teacher) and later recalls becomes part of their general knowledge, and a larger store of general knowledge is generally associated with stronger measured ability.
Goddard College
• Established: 1863
• Type: Private coeducational, low-residency
• President: Dr. Barbara Vacarr
• Academic staff: 104
• Students: 804
• Location: Plainfield, Vermont, United States
• Former names: Green Mountain Central Institute and Goddard Seminary
• Website: http://www.goddard.edu/
The shingle style clock house on the Greatwood Campus
appears on the college seal.
Goddard College is a private, liberal arts college located in
Plainfield, Vermont, offering undergraduate and graduate
degree programs. Goddard College currently operates on an
intensive low-residency model. Goddard is nationally and
internationally recognized for its leadership in educational
innovation, its deep commitment to the ideal of democracy
and for its active efforts to live consciously as stewards of
the earth. Students design their own curriculum; the college
currently uses a student self-directed, mentored system.
Residencies require the student's attendance every six
months for approximately eight days, during which time the
student engages in a variety of activities and lectures from
early morning until late in the evening, and creates a
detailed study plan outlining what learning they will engage
in once back in their home communities. During the semester students study independently, sending in "packets" to
their faculty mentor every three weeks. The content of the packets varies with each individual, but focuses on
research, writing, and reflection related to their study plan.
In the mid-2000s Goddard expanded to the west coast, creating Goddard West in Port Townsend, Washington, and in July 2011 it will offer its education program (non-licensure only) in Seattle, Washington, as well as in Vermont.
The college's stated mission is to advance cultures of rigorous inquiry, collaboration, and life-long learning, where individuals take imaginative and responsible action in the world.
Early History
Goddard College began in 1863. Goddard as an institution has traversed three centuries, and has consistently
transformed itself as a reflection of the times and of the needs of students. Royce S. "Tim" Pitkin, the first Goddard president, was a progressive educator and a follower of John Dewey and other proponents of educational democracy; he earned a doctorate at Columbia and returned to Vermont, where the seeds of Goddard College sprouted. Pitkin conceived of the college as a place for "plain living and hard thinking." [1]
With narrative transcripts instead of traditional letter grades, as well as learner-designed curricula, Goddard was a member of the Union for Experimenting Colleges and Universities, which also included Franconia, Nasson, Antioch, and several other educational institutions.
Goddard College advocates innovation in higher education as its expressed objective; in 1963, Goddard introduced
the first Adult Degree Program for working adults.
In 2002, after fifty-four years, the college terminated its traditional-age, on-site experimental bachelor's degree
program. Today its more than six hundred adult students attend residencies in either Plainfield, VT or Port
Townsend, WA. Only two programs are available at the Port Townsend site: the MFA in Creative Writing and the
MFA in Interdisciplinary Arts, which was new to Port Townsend in the fall of 2007. Also new for the fall of 2007
was the first low-residency Bachelor of Fine Arts program in creative writing.
The History of the Goddard Experiment Exhibit, 1949-1959
The History of Goddard College Exhibit, 1960-1969: An Era of Growth, Expansion, and Transitions
Programs of Study
Goddard offers a Bachelor of Arts (BA), Bachelor of Fine Arts (BFA), Master of Arts (MA), and Master of Fine Arts (MFA), along with several concentrations and licensures.
Bachelor of Arts
• BA in Education and Licensure (EDU)
• BA in Health Arts and Sciences (HAS)
• BA in Individualized Studies (IBA)
• BA in Sustainability (BAS)
Bachelor of Fine Arts
• BFA in Creative Writing (BFA)
Master of Arts
• MA in Individualized Studies (IMA)
• MA in Education and Licensure (EDU)
• MA in Health Arts and Sciences (HAS)
• MA in Psychology and Counseling (PSY)
• MA in Sustainable Business and Communities Program (SBC)
Master of Fine Arts
• MFA in Creative Writing (MFA)
• MFA in Interdisciplinary Arts (MFAIA)
Concentrations
• Community Education (EDU)
• Consciousness Studies (IMA)
• Environmental Studies (IMA)
• School Counseling (EDU)
• Sexual Orientation (PSY)
• Transformative Language Arts (IMA)
• Goddard College, Greatwood: Plainfield, Vermont
The campus in Plainfield was initially formed into a school in 1938 from the various shingle style buildings on a late
19th century model farm: The Greatwood Estate. Eleven new dormitory buildings were constructed adjacent to the
ensemble of renovated farm buildings in 1963 to accommodate an increasing student population. The Pratt Learning
Center, sited to be at the heart of a larger campus, was constructed in 1968. No other significant new construction
has been added to the campus since that time. Currently Goddard College offers residencies for:
• BA and MA in Education & Licensure Program
• BA and MA in Health Arts & Sciences Program
• BA in Individualized Studies Program
• BA in Sustainability Program
• BFA in Creative Writing Program
• MA in Individualized Studies Program
• MA in Psychology & Counseling Program
• MA in Sustainable Business & Communities Program
• MFA in Creative Writing Program
• MFA in Interdisciplinary Arts Program
• Goddard West: Fort Worden State Park, Port Townsend, Washington
A former nineteenth-century army base, much of the fort has been renovated and turned into a year-round, multi-use facility that houses the several organizations making up The Fort Worden Collaborative. The fort sits on a bluff overlooking the Strait of Juan de Fuca and Admiralty Inlet. Currently Goddard West offers residencies for:
• MFA in Creative Writing
• MFA in Interdisciplinary Arts
• Individual Bachelor of Arts (expected 2012)
• Campus area: 175 acres (0.7 km²)
• Enrollment: approximately 800
• School type: Private
• Accreditation: The Commission on Institutions of Higher Education of the New England Association of Schools
and Colleges
• Year founded: 1863, as Green Mountain Central Institute; in 1870 became Goddard Seminary
• Address: 123 Pitkin Road, Plainfield, Vermont 05667
• Goddard College was host to the Bread and Puppet Theater in 1970-1971
Notable alumni
See also: Goddard College Alumni
• Mumia Abu-Jamal (activist)
• Ed Allen (writer)
• Trey Anastasio - rock band member (Phish)
• Piers Anthony - author
• Howard Ashman - actor, playwright (Little Shop of Horrors), lyricist (The Little Mermaid, Beauty and the Beast)
• Judith Arcana (writer)
• Daniel Boyarin - professor (Jewish Studies)
• Barry Bradford - teacher, author, public speaker, famous for helping reopen the Mississippi Burning case and the Clyde Kennard case; National Teacher of the Year, Illinois Teacher of the Year, winner of the Golden Apple Award for Excellence in Teaching, and recipient of a Presidential Citation for Civilian Service
• Jared Carter - poet
• Mayme Agnew Clayton - Librarian, and the Founder of the Western States Black Research and Education Center
• Tim Costello (1945–2009), labor and anti-globalization advocate and author
• Jay Craven - Vermont Film Director, Screenwriter, and Professor
• Tony Curtis - Welsh poet
• Mark Doty, Poet, National Book Award winner, 2008
• Norman Dubie, Poet
• Larry Feign - cartoonist (The World of Lily Wong)
• Caroline Finkelstein - Poet
• Robert M. Fisher - Abstract artist
• Jon Fishman - rock band member (Phish)
• Oliver Foot - British Actor, Philanthropist, Charity Worker
• James Gahagan - Abstract artist
• David Gallaher - writer (High Moon)
• Ann Gillespie - Actress (Beverly Hills, 90210)
• Bradford Graves - sculptor, musician, professor (fine arts, sculpture)
• Peter Hannan - Artist, Writer, producer (CatDog)
• David Helvarg - Journalist and Environmental Activist
• Conrad Herwig - Jazz Trombonist
• Cara Hoffman - novelist So Much Pretty, Simon & Schuster, March 2011
• Susie Ibarra - Contemporary Composer and Percussionist
• Linnea Johnson - Poet.
• Wayne Karlin - Author
• Mary Karr - Author
• John Kasiewicz - Guitarist
• Jonathan Katz - Writer, Actor, Producer (Dr. Katz)
• Neil Landau - screenwriter, playwright, television producer
• Michael Lent - Visual Artist and Curator
• Geraldine Clinton Little - Poet
• William H. Macy - actor
• David Mamet - writer, director, Pulitzer prize winner in drama ("Glengarry Glen Ross")
• Linda McCarriston - Poet and Professor
• Page McConnell - rock band member (Phish)
• Laura McCullough - Poet and Writer
• Walter Mosley - Author
• Lisel Mueller - Poet
• Frances Olsen - Professor of Law at UCLA
• Jared Pappas-Kelley - Curator, Writer, and Artist
• Russell Potter - Arctic historian, Author
• Tobias Schneebaum - artist, anthropologist, AIDS activist
• Archie Shepp - saxophonist
• Stephen C. Smith - economist, Professor, Author, Poverty Activist
• Jane Shore - (Poet)
• Pamela Stewart - Poet
• Elaine Terranova - Poet
• Kenneth R. Timmerman - correspondent, Author, Activist
• Donald Kofi Tucker - Politician
• Ellen Bryant Voigt - Poet
• Esther Wertheimer - Sculptor
• William L. White - Addiction Studies
• Suzi Wizowaty - Author and Politician
• Thomas Yamamoto - Art Instructor, not technically an alumnus
• Paul Zaloom - puppeteer Bread & Puppet Theater
Goddard is also home to a community radio station, WGDR: a non-commercial, listener-supported educational station with nearly 70 volunteer programmers who live and work in central and northern Vermont and who range in age from 12 to 78 years. In 2009 WGDR received the My Source Community Impact Award for Engagement for the work it has done in the Central Vermont community. Kris Gruen was appointed station manager in November 2010.
The station began as a student-driven project in the late 1960s and was intended to be heard only on campus; it was known as WGOD, or "the voice of God." It was realized shortly after the launch that the station reached into town, and the call sign was soon changed to WGDR.
In June 1970 Goddard College and WGDR hosted the first Alternative Media Conference.
WGDR, 91.1 FM, and its sister station WGDH, 91.7, are part of the Corporation for Public Broadcasting.
• Mission Statement
WGDR — Community Radio for Central Vermont at Goddard College — strives to inform, educate, entertain,
involve, motivate, and connect its diverse communities through independent non-commercial radio programming. As
a hybrid college–community radio station, WGDR is committed to education and training in the art and science of
community radio and to in-depth involvement in many forms by its geographic community and its communities of
interest. WGDR accomplishes its mission by: engaging robust support from the station's communities; integrating WGDR programming and Goddard's low-residency academic programs; and experimenting with opportunities and technology to expand beyond Central Vermont.
Related resources
• List of colleges and universities in the United States
• List of colleges and universities in Vermont
[1] http://catalog.vermonthistory.org/vhsweb2/tramp2.exe/do_ccl_search/guest?SETTING_KEY=vhs&servers=1home&index=)&
[2] http://lits.goddard.edu/2011/01/20/new-goddard-college-history-exhibit-on-display/
[3] http://lits.goddard.edu/wp-content/uploads/2011/06/The_History_of_Goddard_College_Exhibit_1960-1969_Exhibition_Catalog-1.pdf
[4] http://www.goddard.edu/content52382.html
[5] http://www.goddard.edu/content52383.html
[6] http://www.goddard.edu/content52384.html
[7] http://www.goddard.edu/content52385.html
[8] http://www.goddard.edu/concentrations
[9] http://www.goddard.edu
[10] http://www.fwcollaborative.org/fwcollaborative/about-fwc.html
[11] http://cihe.neasc.org/about_our_institutions/roster_of_institutions/details/14124
[12] www.oah.org/awards/awards.tachau.winners.html
[13] www.goldenapple.org/pages/academy_directory/26.php
[14] http://www.barrybradford.com/2.html
[15] Greenhouse, Steve. "Tim Costello, Trucker-Author Who Fought Globalization, Dies at 64" (http://www.nytimes.com/2009/12/26/us/26costello.html), The New York Times, December 26, 2009. Accessed December 28, 2009.
• Cappel, Constance, Utopian Colleges, New York: Peter Lang, 1999.
External links
• Official website (http:// www. goddard.edu/ ) - www.goddard.edu
• Stories from Goddard - Goddard College's blog (http:// stories. goddard.edu/ )
• 2008 Campus Master Plan (http:/ / www. goddard.edu/ stuff/contentmgr/files/
7bbc68e91542318e4d7ed112681e6435/ miscdocs/ goddard_master_plan.pdf) (PDF)
• Find Goddard on Facebook at www.facebook.com/GoddardCollege
Graphical Evaluation and Review Technique
Graphical Evaluation and Review Technique, commonly known as GERT, is a network analysis technique used in project management that allows probabilistic treatment of both network logic and activity duration estimates. The technique was first described in 1966 by Dr. Alan B. Pritsker of Purdue University and W. W. Happ.
Compared to other techniques, GERT is only rarely used in complex systems. Nevertheless, the GERT approach addresses the majority of the limitations associated with the PERT/CPM technique; in particular, GERT allows loops between tasks. The fundamental drawback of the GERT technique is the complex programme (Monte Carlo simulation) required to model a GERT system. Developments of GERT include Q-GERT, which allows the user to consider queuing within the system.
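To make the Monte Carlo requirement concrete, the sketch below simulates a small, hypothetical GERT-style network in which one activity can fail and loop back for rework. The three activities, the 0.3 failure probability and the triangular durations are assumptions invented for this illustration, not part of Pritsker's original description.

import random

# Hypothetical GERT-style network: A -> B -> C, where B loops back on failure.
# Durations are drawn from triangular(low, high, mode) distributions.
def simulate_once():
    duration = random.triangular(2, 6, 4)          # activity A
    while True:
        duration += random.triangular(3, 9, 5)     # activity B (repeated on failure)
        if random.random() > 0.3:                  # probabilistic branch out of the loop
            break
    duration += random.triangular(1, 3, 2)         # activity C
    return duration

runs = [simulate_once() for _ in range(10000)]
print("expected project duration: %.2f" % (sum(runs) / len(runs)))

Averaging over many such runs is what the complex programme mentioned above amounts to: the network logic (including loops) and the duration distributions are sampled together until the distribution of completion times stabilizes.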
[1] Pritsker, A. A. B. (April 1966) (PDF). GERT: Graphical Evaluation and Review Technique (http:// ntrs.nasa. gov/ archive/nasa/ casi.ntrs.
nasa. gov/ 19670022025_1967022025. pdf). RM-4973-NASA. National Aeronautics and Space Administration under Contract No. NASr-21. .
Retrieved 2006-12-05.
Health technology assessment
Health Technology Assessment is a multi-disciplinary field of policy analysis that examines the medical, economic,
social and ethical implications of the incremental value, diffusion and use of a medical technology in health care.
It is intended to provide a bridge between the world of research and the world of decision-making. Health technology assessment is an active field internationally and has seen continued growth fostered by the need to support management, clinical, and policy decisions. It has also been advanced by the evolution of evaluative methods in the social and applied sciences, including clinical epidemiology and health economics. Health policy decisions are becoming increasingly important as the opportunity costs of making wrong decisions continue to grow.
The growth of HTA internationally can be seen in the expanding membership of the International Network of
Agencies for Health Technology Assessment (INAHTA), a non-profit umbrella organization established in 1993.
Organizations and individuals involved in HTA research are also affiliated with societies such as the international
societies HTAI and the International Society for Pharmacoeconomics and Outcomes Research (ISPOR). An
international masters program in health technology assessment and management, ULYSSES, is also offered.
In the UK, the Multidisciplinary Assessment of Technology Centre for Healthcare (MATCH) carries out HTA in collaboration with the National Health Service (NHS) and various industrial partners. MATCH is organised into four themes addressing key HTA topics: Health Economics, Tools for Industry, User Needs, and Procurement and Supply Chain.
Health technology can be defined broadly as:
Any intervention that may be used to promote health, to prevent, diagnose or treat disease or for rehabilitation
or long-term care. This includes the pharmaceuticals, devices, procedures and organizational systems used in
health care.
[1] INAHTA (International Network of Agencies for Health Technology Assessment). (May 15, 2009). "HTA resources." (http:// inahta.
episerverhotell.net/HTA/ ). INAHTA. .
[2] Battista, RN: The scientific basis of health services. BMJ Publishing Group, 1996.
[3] Menon D.; Marshall, D (1996). "The internationalization of heath technology assessment.". IJTAHC 12 (1): 45–51. PMID 8690561.
[4] INAHTA (International Network of Agencies for Health Technology Assessment). (June 8, 2009). "HTA glossary." (http:/ / www.inahta.
org/ HTA/ Glossary/ #_Health_technology). INAHTA. .
External links
• Multidisciplinary Assessment of Technology Centre for Healthcare (http:// www.match. ac. uk/ )
• Mapi Values (http:/ / www. mapivalues. com/ health_technology_assessment. asp)
• Hayes, Inc. (http:// www. hayesinc. com)
• International Network of Agencies for Health Technology Assessment (INAHTA) (http:// www.inahta. org)
• HTAi (http:/ /www. htai. org)
• International Society for Pharmacoeconomics and Outcomes Research (ISPOR) (http:// www.ispor. org)
• The Ulysses Program (http:/ / www. ulyssesprogram. net)
• Canadian Agency for Drugs and Technologies in Health (CADTH) (http:// www.cadth. ca)
• Cedar Associates LLC - Clinical Effectiveness Decision Analytical Research (http:/ / www.cedarecon.com)
Immanent evaluation
Immanent evaluation is a philosophical concept used by Gilles Deleuze in Nietzsche and Philosophy (1962),
opposed to transcendent judgment.
Friedrich Nietzsche had argued, in On the Genealogy of Morals, that moral philosophy was nihilist in its judgment of
the world based on transcendent values: such philosophy rejected life, a stance that Arthur Schopenhauer pushed to its extreme, to the profit of non-existent other worlds. Deleuze would start from this argumentation, linking
it with Antonin Artaud's Pour en finir avec le jugement de dieu ("To finish with god's judgment" - the absence of
capitals is purposeful).
Immanent evaluation, as opposed to transcendent judgment, evaluates forces according to two Nietzschean
categories: active and reactive. Apart from Nietzsche, a similar example of immanent evaluation can be found in
Benedict Spinoza's anomaly (Antonio Negri), where affects constitute the only form of evaluation.
• Gilles Deleuze, Nietzsche and Philosophy (1962)
Impact assessment
Impact assessment (IA) is "a process aimed at structuring and supporting the development of policies. It identifies
and assesses the problem at stake and the objectives pursued. It identifies the main options for achieving the
objective and analyses their likely impacts in the economic, environmental and social fields. It outlines advantages
and disadvantages of each option and examines possible synergies and trade-offs".
External links
• International Association for Impact Assessment (main source)
• Impact Assessment and Project Appraisal (a journal)
• Impact Assessment Page at the European Commission
• IAIA Wiki
[1] http:/ / ec.europa. eu/ governance/impact/ index_en. htm, Source: European commission
[2] http:/ / www. iaia.org/
[3] http:/ / www. scipol. co. uk/ iapa. htm
[4] http:/ / ec.europa. eu/ governance/impact/ index_en. htm
[5] http:/ / www. iaia.org/iaiawiki
Impact evaluation
Impact evaluation assesses the changes that can be attributed to a particular intervention, such as a project, program or policy, both the intended ones and, ideally, the unintended ones. In contrast to outcome monitoring, which examines whether targets have been achieved, impact evaluation is structured to answer the question: how would outcomes such as participants’ well-being have changed if the intervention had not been undertaken? This involves counterfactual analysis, that is, “a comparison between what actually happened and what would have happened in the absence of the intervention.” Impact evaluations seek to answer cause-and-effect questions. In other words, they look for the changes in outcome that are directly attributable to a program.
Impact Evaluation helps to answer key questions for evidence-based policy making: what works, what doesn’t, where, why and for how much? It has received increasing attention in policy making in recent years in both Western and developing country contexts. It is an important component of the armory of evaluation tools and approaches and integral to global efforts to improve the effectiveness of aid delivery and public spending more generally in improving living standards. Originally more oriented towards evaluation of social sector programs in developing countries, notably conditional cash transfers, impact evaluation is now being increasingly applied in other areas such as agriculture, energy and transport.
Counterfactual Evaluation designs
Counterfactual analysis enables evaluators to attribute cause and effect between interventions and outcomes. The
‘counterfactual’ measures what would have happened to beneficiaries in the absence of the intervention, and impact
is estimated by comparing counterfactual outcomes to those observed under the intervention. The key challenge in
Impact Evaluation is that the counterfactual cannot be directly observed, but must be approximated with reference to
a comparison group. There are a range of accepted approaches to determining an appropriate comparison group for
counterfactual analysis, using either prospective (ex ante) or retrospective (ex post) evaluation design. Prospective
evaluations begin during the design phase of the intervention, involving collection of baseline and end-line data from
intervention beneficiaries (the ‘treatment group’) and non-beneficiaries (the ‘comparison group’), and may also
involve selection of individuals or communities into treatment and comparison groups. Retrospective evaluations are
usually conducted after the implementation phase, and may exploit existing survey data, although the best
evaluations will collect data as close to baseline as possible, to ensure comparability of intervention and comparison groups.
There are five key principles relating to internal validity (study design) and external validity (generalizability) which
rigorous Impact Evaluations should address: confounding factors, selection bias, spillover effects, contamination,
and impact heterogeneity.
Confounding occurs where certain factors, typically relating to socio-economic status, are correlated with both
exposure to the intervention and, independent of exposure, are causally related to the outcome of interest.
Confounding factors are therefore alternate explanations for an observed (possibly spurious) relationship between
intervention and outcome.
Selection bias, a special case of confounding, occurs where intervention participants are non-randomly drawn from
the beneficiary population, and the criteria determining selection are correlated with outcomes. Unobserved factors,
which are associated with access to or participation in the intervention, and are causally related to the outcome of
interest, may lead to a spurious relationship between intervention and outcome if unaccounted for. Self-selection
occurs where, for example, more able or organized individuals or communities, who are more likely to have better
outcomes of interest, are also more likely to participate in the intervention. Endogenous program selection occurs
where individuals or communities are chosen to participate because they are seen to be more likely to benefit from
the intervention. Ignoring confounding factors can lead to a problem of omitted variable bias. In the special case of
selection bias, the endogeneity of the selection variables can cause simultaneity bias.
Spillover (referred to as contagion in the case of experimental evaluations) occurs when members of the comparison
(control) group are affected by the intervention. Contamination occurs when members of treatment and/or
comparison groups have access to another intervention which also affects the outcome of interest.
Impact heterogeneity refers to differences in impact due to beneficiary type and context. High quality Impact
Evaluations will assess both the extent to which different groups (e.g. the disadvantaged) benefit from an
intervention as well as the potential effect of context on impact. The degree that results are generalizable will
determine the applicability of lessons learned for interventions in other contexts.
Impact evaluation designs are identified by the type of methods used to generate the counterfactual and can be
broadly classified into three categories – experimental, quasi-experimental and non-experimental designs – that vary
in feasibility, cost, involvement during design or after implementation phase of the intervention, and degree of
selection bias. White (2006) and Ravallion (2008) discuss alternative Impact Evaluation approaches.
• Experimental design
Under experimental evaluations the treatment and comparison groups are selected randomly and isolated both from
the intervention and from any other interventions which may affect the outcome of interest. These evaluation designs are
referred to as randomized control trials (RCTs). In experimental evaluations the comparison group is called a control
group. When randomization is implemented over a sufficiently large sample with no contagion by the intervention,
the only difference between treatment and control groups on average is that the latter does not receive the
intervention. Random sample surveys, in which the sample for the evaluation is chosen on a random basis, should
not be confused with experimental evaluation designs, which require the random assignment of the treatment.
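As a minimal illustration of the distinction drawn above, the sketch below randomly assigns every unit in a simulated evaluation sample to treatment or control and estimates impact as the difference in mean outcomes; the sample size, outcome model and true effect of 5 are assumptions made only for this example.

import numpy as np

# Random assignment (not random sampling): each unit in the evaluation sample
# is allocated to the treatment or control group by chance.
rng = np.random.default_rng(42)
n = 2000
assignment = rng.permutation(np.repeat([1, 0], n // 2))        # half treated, half control
baseline = rng.normal(50, 10, n)                               # pre-existing differences
outcome = baseline + 5.0 * assignment + rng.normal(0, 5, n)    # true impact = 5

estimate = outcome[assignment == 1].mean() - outcome[assignment == 0].mean()
print("difference-in-means estimate of impact: %.2f" % estimate)

Because assignment is random, baseline differences average out across the two groups, which is why the simple difference in means recovers the true effect here.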
The experimental approach is often held up as the ‘gold standard’ of evaluation, and it is the only evaluation design
which can conclusively account for selection bias in demonstrating a causal relationship between intervention and
outcomes. Randomization and isolation from interventions might not be practicable in the realm of social policy, and
may also be ethically difficult to defend,
although there may be opportunities to utilize natural experiments.
Bamberger and White (2007)
highlight some of the limitations to applying RCTs to development interventions.
Methodological critiques have been made by Scriven (2008)
on account of the biases introduced since social
interventions cannot be triple blinded, and Deaton (2009)
has pointed out that in practice the analysis of RCTs falls back on the regression-based approaches it seeks to avoid, and so is subject to the same potential biases. Other
problems include the often heterogeneous and changing contexts of interventions, logistical and practical challenges,
difficulties with monitoring service delivery, access to the intervention by the comparison group and changes in
selection criteria and/or intervention over time. Thus, it is estimated that RCTs are only applicable to 5 per cent of
development finance.
• Quasi-experimental design
Quasi-experimental approaches can remove bias arising from selection on observables and, where panel data are
available, time invariant unobservables. Quasi-experimental methods include matching, differencing, instrumental
variables and the pipeline approach, and are usually carried out by multivariate regression analysis.
If selection characteristics are known and observed then they can be controlled for to remove the bias. Matching
involves comparing program participants with non-participants based on observed selection characteristics.
Propensity score matching (PSM) uses a statistical model to calculate the probability of participating on the basis of
a set of observable characteristics, and matches participants and non-participants with similar probability scores.
Regression discontinuity design exploits a decision rule as to who does and does not get the intervention to compare
outcomes for those just either side of this cut-off.
Difference-in-differences or double differences, which use data collected at baseline and end-line for intervention
and comparison groups, can be used to account for selection bias under the assumption that unobservable
factors determining selection are fixed over time (time invariant).
Instrumental variables estimation accounts for selection bias by modelling participation using factors (‘instruments’)
that are correlated with selection but not the outcome, thus isolating the aspects of program participation which can
be treated as exogenous.
The pipeline approach (stepped-wedge design) uses beneficiaries already chosen to participate in a project at a later
stage as the comparison group. The assumption is that as they have been selected to receive the intervention in the
future they are similar to the treatment group, and therefore comparable in terms of outcome variables of interest.
However, in practice, it cannot be guaranteed that treatment and comparison groups are comparable and some
method of matching will need to be applied to verify comparability.
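A rough sketch of propensity score matching on simulated data is given below; it assumes scikit-learn and NumPy are available, and the data-generating process, variable names and simple nearest-neighbour matching rule are illustrative assumptions rather than a prescribed implementation.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 1000
x = rng.normal(size=(n, 2))                          # observed selection characteristics
d = rng.binomial(1, 1 / (1 + np.exp(-x[:, 0])))      # participation depends on x
y = 2.0 * d + x[:, 0] + rng.normal(size=n)           # outcome with a true effect of 2

# 1. Model the probability of participation (the propensity score).
pscore = LogisticRegression().fit(x, d).predict_proba(x)[:, 1]

# 2. Match each participant to the non-participant with the closest score.
treated = np.where(d == 1)[0]
control = np.where(d == 0)[0]
matches = control[np.abs(pscore[control][None, :] - pscore[treated][:, None]).argmin(axis=1)]

# 3. The mean outcome difference across matched pairs estimates the effect on the treated.
att = (y[treated] - y[matches]).mean()
print("matched estimate of the effect on the treated: %.2f" % att)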
• Non-experimental design
Non-experimental Impact Evaluations are so-called because they do not involve a comparison group which does not
have access to the intervention. The method used in non-experimental evaluation is to compare intervention groups
before and after implementation of the intervention. Interrupted time-series (ITS) evaluations require
multiple data points on treated individuals both before and after the intervention, while before versus after (or
pre-test post-test) designs simply require a single data point before and after. Post-test analyses include data after the
intervention from the intervention group only. Non-experimental designs are the weakest evaluation design, because
in order to show a causal relationship between intervention and outcomes convincingly, the evaluation must
demonstrate that any likely alternate explanations for the outcomes are irrelevant. However, there remain
applications to which this design is relevant, for example in calculating time-savings from an intervention which
improves access to amenities. In addition, there may be cases where non-experimental designs are the only feasible
impact evaluation design, such as universally-implemented programmes or national policy reforms in which no
isolated comparison groups are likely to exist.
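One common way to operationalise an interrupted time-series evaluation is a segmented regression with a level-change term, sketched below on simulated monthly data; the intervention month (24) and the size of the level shift are assumptions for illustration only.

import numpy as np

rng = np.random.default_rng(1)
t = np.arange(48)                                       # 48 months of observations
post = (t >= 24).astype(float)                          # 1 after the intervention starts
y = 10 + 0.1 * t + 3.0 * post + rng.normal(0, 1, 48)    # true level shift = 3

# Design matrix: intercept, pre-existing trend, level change, post-intervention slope change.
X = np.column_stack([np.ones_like(t), t, post, post * (t - 24)])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print("estimated level change at the intervention: %.2f" % coef[2])

The pre-intervention trend term is what distinguishes this from a simple before-versus-after comparison: without it, any secular drift in the series would be attributed to the intervention.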
Biases in estimating programme effects
Randomized field experiments are the strongest research designs for assessing program impact and are generally the design of choice when feasible, since they allow a fair and accurate estimate of the program’s actual effects (Rossi, Lipsey & Freeman, 2004). Randomized field experiments are not always feasible to carry out, however, and in such situations evaluators must turn to alternative research designs. The main problem is that, regardless of which design an evaluator chooses and however well it is thought through and implemented, every design is subject to yielding biased estimates of program effects. These biases either exaggerate or diminish program effects, and the direction of the bias usually cannot be known in advance (Rossi et al, 2004). Such biases affect the interests of stakeholders. Program participants may be disadvantaged if the bias makes an ineffective or harmful program seem effective; conversely, a bias can make an effective program seem ineffective or even harmful, making its accomplishments appear small or insignificant and leading personnel and sponsors to reduce or eliminate its funding (Rossi et al, 2004). If an inadequate design yields bias, the stakeholders who fund the program are among those most concerned, because the evaluation results inform their decision on whether to continue funding, and the final decision lies with the funders and sponsors. Those taking part in the program, or those it is intended to benefit, are also affected by the design chosen and the outcome it renders. The evaluator’s concern is therefore to minimize the amount of bias in the estimation of program effects (Rossi et al, 2004).
Bias arises when either the measurement of the outcome with program exposure, or the estimate of what the outcome would have been without program exposure, is higher or lower than the corresponding “true” value (Rossi et al, 2004, p267). Unfortunately, not all forms of bias that may compromise impact assessment are obvious (Rossi et al, 2004). The most common form of impact assessment design compares two groups of individuals or other units: an intervention group that receives the program and a control group that does not. The estimate of program effect is then based on the difference between the groups on a suitable outcome measure (Rossi et al, 2004). Random assignment of individuals to program and control groups allows the assumption of continuing equivalence; group comparisons that have not been formed through randomization are known as non-equivalent comparison designs (Rossi et al, 2004).
• Selection bias
When the assumption of equivalence is absent, differences in outcome between the groups that would have occurred regardless of the program create a form of bias in the estimate of program effects, known as selection bias (Rossi et al, 2004). This bias threatens the validity of the program effect estimate in any impact assessment using a non-equivalent group comparison design. It appears whenever some process whose influences are not fully known selects which individuals will be in which group, instead of group assignment being determined by pure chance (Rossi et al, 2004). Selection bias can also occur through natural or deliberate processes that cause a loss of outcome data for members of intervention and control groups that have already been formed; this is known as attrition, and it can come about in two ways (Rossi et al, 2004): (1) targets who drop out of the intervention or control group cannot be reached, or (2) targets refuse to co-operate in outcome measurement. Differential attrition is assumed when attrition occurs as a result of something other than an explicit chance process (Rossi et al, 2004). This means that “those individuals that were from the intervention group whose outcome data are missing cannot be assumed to have the same outcome-relevant characteristics as those from the control group whose outcome data are missing” (Rossi et al, 2004, p271). Even random assignment designs are not safe from selection bias induced by attrition (Rossi et al, 2004).
• Other forms of bias
There are other factors that can be responsible for bias in the results of an impact assessment. These generally have
to do with events or experiences other than receiving the program that occur during the period of the intervention.
These biases include secular trends, interfering events and maturation (Rossi et al, 2004).
• Secular trends or Secular drift
Secular trends are relatively long-term trends in the community, region or country. Also termed secular drift, they may produce changes that enhance or mask the apparent effects of a program (Rossi et al, 2004). For example, in a period when a community’s birth rate is declining, a program to reduce fertility may appear effective because of bias stemming from that downward trend (Rossi et al, 2004, p273).
• Interfering events
Interfering events are similar to secular trends, except that here short-term events produce the changes that may introduce bias into estimates of program effect. For example, a power outage that disrupts communications and hampers the delivery of food supplements may interfere with a nutritional program (Rossi et al, 2004, p273).
• Maturation
Impact evaluation needs to be able to cope with the fact that natural maturational and developmental processes can produce considerable change independently of the program. Including these changes in the estimates of program effects would result in biased estimates. For example, a program to improve preventative health practices among adults may seem ineffective because health generally declines with age (Rossi et al, 2004).
“Careful maintenance of comparable circumstances for program and control groups between random assignment and
outcome measurement should prevent bias from the influence of other differential experiences or events on the
groups. If either of these conditions is absent from the design, there is potential for bias in the estimates of program
effect” (Rossi et al, 2004, p274).
Estimation methods
Estimation methods broadly follow evaluation designs. Different designs require different estimation methods to
measure changes in well-being from the counterfactual. In experimental and quasi-experimental evaluation, the
estimated impact of the intervention is calculated as the difference in mean outcomes between the treatment group
(those receiving the intervention) and the control or comparison group (those who don’t). The single difference
estimator compares mean outcomes at end-line and is valid where treatment and control groups have the same
outcome values at baseline. The difference-in-difference (or double difference) estimator calculates the difference in
the change in the outcome over time for treatment and comparison groups, thus utilizing data collected at baseline
for both groups and a second round of data collected at end-line, after implementation of the intervention, which may
be years later.
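In notation introduced here only for illustration, with \bar{Y} a group mean outcome, superscripts T and C the treatment and comparison groups, and subscripts 0 and 1 the baseline and end-line rounds, the two estimators just described can be written as:

\hat{\Delta}_{SD} = \bar{Y}^{T}_{1} - \bar{Y}^{C}_{1},
\qquad
\hat{\Delta}_{DD} = \left(\bar{Y}^{T}_{1} - \bar{Y}^{T}_{0}\right) - \left(\bar{Y}^{C}_{1} - \bar{Y}^{C}_{0}\right).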
Impact Evaluations which compare average outcomes in the treatment group, irrespective of beneficiary
participation (also referred to as ‘compliance’ or ‘adherence’), to outcomes in the comparison group are referred to as
intention-to-treat (ITT) analyses. Impact Evaluations which compare outcomes among beneficiaries who comply or
adhere to the intervention in the treatment group to outcomes in the control group are referred to as
treatment-on-the-treated (TOT) analyses. ITT therefore provides a lower-bound estimate of impact, but is arguably
of greater policy relevance than TOT in the analysis of voluntary programs.
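Under the additional assumptions of one-sided non-compliance and no effect of assignment on those who do not take up the intervention (the adjustment often associated with Bloom), the two estimates are commonly related by scaling ITT by the compliance rate; the notation here is ours:

\hat{\Delta}_{TOT} = \frac{\hat{\Delta}_{ITT}}{c}, \qquad c = \text{share of the treatment group that actually received the intervention}.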
Debates in Impact Evaluation
While there is agreement on the importance of Impact Evaluation, and a consensus is emerging around the use of
counterfactual evaluation methods, there has also been widespread debate in recent years on both the definition of
Impact Evaluation and the use of appropriate methods (see White 2009
for an overview).
Definitions of Impact Evaluation
The International Initiative for Impact Evaluation (3ie) defines rigorous Impact Evaluations as: ”analyses that
measure the net change in outcomes for a particular group of people that can be attributed to a specific program
using the best methodology available, feasible and appropriate to the evaluation question that is being investigated
and to the specific context”.
According to the World Bank’s DIME Initiative, “Impact evaluations compare the outcomes of a program against a
counterfactual that shows what would have happened to beneficiaries without the program. Unlike other forms of
evaluation, they permit the attribution of observed changes in outcomes to the program being evaluated by following
experimental and quasi-experimental designs”.
Similarly, according to the US Environmental Protection Agency impact evaluation is a form of evaluation that
assesses the net effect of a program by comparing program outcomes with an estimate of what would have happened
in the absence of a program.
According to the World Bank's Independent Evaluation Group (IEG), impact evaluation is the systematic identification of the effects, positive or negative, intended or not, on individual households, institutions, and the environment caused by a given development activity such as a program or project.
Impact Evaluation has been defined differently over the past few decades.
Other interpretations of Impact
Evaluation include:
• An evaluation which looks at the impact of an intervention on final welfare outcomes, rather than only at project
outputs, or a process evaluation which focuses on implementation;
• An evaluation carried out some time (five to ten years) after the intervention has been completed so as to allow
time for impact to appear; and
• An evaluation considering all interventions within a given sector or geographical area.
Common definitions of ‘impact’ used in evaluation generally refer to the totality of longer-term consequences
associated with an intervention on quality-of-life outcomes. For example, the Organization for Economic
Cooperation and Development’s Development Assistance Committee (OECD-DAC) defines impact as the “positive
and negative, primary and secondary long-term effects produced by a development intervention, directly or
indirectly, intended or unintended”.
A number of international agencies have also adopted this definition of
impact. For example, UNICEF defines impact as “The longer term results of a program – technical, economic,
socio-cultural, institutional, environmental or other – whether intended or unintended. The intended impact should
correspond to the program goal.”
Similarly, Evaluationwiki.org defines impact evaluation as an evaluation that
looks beyond the immediate results of policies, instruction, or services to identify longer-term as well as unintended
program effects.
Technically, an evaluation could be conducted to assess ‘impact’ as defined here without reference to a
counterfactual. However, this would be more appropriately referred to as outcome monitoring. The NONIE
Guidelines on Impact Evaluation
adopt the OECD-DAC definition of impact while referring to the techniques
used to attribute impact to an intervention as necessarily based on counterfactual analysis.
Methodological debates
There is intensive debate in academic circles around the appropriate methodologies for Impact Evaluation, between
proponents of experimental methods on the one hand and proponents of more general methodologies on the other.
William Easterly has referred to this as ‘The Civil War in Development economics’
. Proponents of experimental
designs, sometimes referred to as ‘randomistas’,
argue randomization is the only means to ensure unobservable
selection bias is accounted for, and that building up the currently flimsy experimental evidence base should be a matter of priority.
In contrast, others argue that randomized assignment is seldom appropriate to development
interventions and even when it is, experiments provide us with information on the results of a specific intervention
applied to a specific context, and little of external relevance.
There has been criticism from evaluation bodies and
others that some donors and academics over-emphasize favoured methods for Impact Evaluation,
and that this
may in fact hinder learning and accountability.
Theory-Based Impact Evaluation
While knowledge of effectiveness is vital, it is also important to understand the reasons for effectiveness and the
circumstances under which results are likely to be replicated. In contrast with ‘black box’ Impact Evaluation
approaches, which only report mean differences in outcomes between treatment and comparison groups,
Theory-Based Impact Evaluation involves mapping out the causal chain from inputs to outcomes and impact and
testing the underlying assumptions.

Most interventions within the realm of public policy are of a voluntary,
rather than coercive (legally required) nature. In addition, interventions are often active rather than passive, requiring
a greater rather than lesser degree of participation among beneficiaries and therefore behavior change as a
pre-requisite for effectiveness. Public policy will therefore be successful to the extent that people are incentivized to
change their behaviour favourably. A Theory-Based approach enables policy-makers to understand the reasons for
differing levels of program participation (referred to as ‘compliance’ or ‘adherence’) and the processes determining
behavior change. Theory-Based approaches use both quantitative and qualitative data collection, and the latter can be
particularly useful in understanding the reasons for compliance and therefore whether and how the intervention may
be replicated in other settings. Methods of qualitative data collection include focus groups, in-depth interviews,
participatory rural appraisal (PRA) and field visits, as well as reading of anthropological and political literature.
White (2009b)
advocates more widespread application of a theory-based approach to impact evaluation as a
means to improve policy relevance of Impact Evaluations, outlining six key principles of the theory-based approach:
1. Map out the causal chain (program theory) which explains how the intervention is expected to lead to the intended outcomes, and collect data to test the underlying assumptions of the causal links.
2. Understand context, including the social, political and economic setting of the intervention.
3. Anticipate heterogeneity to help in identifying sub-groups and adjusting the sample size to account for the levels of disaggregation to be used in the analysis.
4. Rigorous evaluation of impact using a credible counterfactual (as discussed above).
5. Rigorous factual analysis of links in the causal chain.
6. Use mixed methods (a combination of quantitative and qualitative methods).
Examples of impact evaluations
While experimental Impact Evaluation methodologies have been used to assess nutrition and water and sanitation
interventions in developing countries since the 1980s, the first, and best known, application of experimental methods
to a large-scale development program is the evaluation of the Conditional Cash Transfer (CCT) program Progresa
(now called Oportunidades) in Mexico, which examined a range of development outcomes, including schooling,
immunization rates and child work.

CCT programs have since been implemented by a number of
governments in Latin America and elsewhere, and a report released by the World Bank in February 2009 examines
the impact of CCTs across twenty countries.
More recently, Impact Evaluation has been applied to a range of interventions across social and productive sectors.
3ie has launched an online database of impact evaluations
covering studies conducted in low- and middle-income countries. Other organisations publishing Impact Evaluations include Innovations for Poverty Action and the World Bank's DIME Initiative. The IEG of the World Bank has systematically assessed and summarized the experience of ten impact evaluations of development programs in various sectors carried out over the past 20 years.
Organizations promoting Impact Evaluation of Development Interventions
In 2006, the Evaluation Gap Working Group argued that there was a major gap in the evidence on development interventions, and in particular called for an independent body to be set up to plug the gap by funding and advocating for
rigorous Impact Evaluation in low- and middle-income countries. The International Initiative for Impact Evaluation
was set up in response to this report. 3ie seeks to improve the lives of poor people in low- and
middle-income countries by providing, and summarizing, evidence of what works, when, why and for how much. 3ie
operates a grant program, financing impact studies in low- and middle-income countries and synthetic reviews of
existing evidence updated as new evidence appears, and supports quality impact evaluation through its quality
assurance services.
Another initiative devoted to the evaluation of impacts is the Committee on Sustainability Assessment (COSA).
COSA is a non-profit global consortium of institutions, sustained in partnership with the International Institute for
Sustainable Development (IISD) Sustainable Commodity Initiative, the United Nations Conference on Trade and
Development (UNCTAD), and the United Nations International Trade Centre (ITC). COSA is developing and
applying an independent measurement tool to analyze the distinct social, environmental and economic impacts of
agricultural practices, and in particular those associated with the implementation of specific sustainability programs
(Organic, Fairtrade etc.). The focus of the initiative is to establish global indicators and measurement tools which
farmers, policy-makers, and industry can use to understand and improve their sustainability with different crops or
agricultural sectors. COSA facilitates this by enabling them to accurately calculate the relative costs and benefits of
becoming involved in any given sustainability initiative. As of 2010 COSA is being applied in the coffee sector in:
Colombia, Guatemala, Tanzania, Honduras, Kenya, Peru, Costa Rica, and Vietnam, as well as in cocoa in: Ghana
and Côte d'Ivoire.
A number of additional organizations have been established to promote impact evaluation globally, including Innovations for Poverty Action, the World Bank’s Development Impact Evaluation (DIME) Initiative, the Institutional Learning and Change (ILAC) Initiative of the CGIAR, and the Network of Networks on Impact Evaluation (NONIE).
Systematic reviews of Impact evidence
A range of organizations are working to coordinate the production of systematic reviews. Systematic reviews aim to
bridge the research-policy divide by assessing the range of existing evidence on a particular topic, and presenting the
information in an accessible format. Like rigorous Impact Evaluations, they are developed from a study Protocol
which sets out a priori the criteria for study inclusion, search and methods of synthesis. Systematic reviews involve
five key steps: determination of interventions, populations, outcomes and study designs to be included; searches to
identify published and unpublished literature, and application of study inclusion criteria (relating to interventions,
populations, outcomes and study design), as set out in study Protocol; coding of information from studies;
presentation of quantitative estimates on intervention effectiveness using forest plots and, where interventions are
determined as appropriately homogeneous, calculation of a pooled summary estimate using meta-analysis; finally,
systematic reviews should be updated periodically as new evidence emerges. Systematic reviews may also involve
the synthesis of qualitative information, for example relating to the barriers to, or facilitators of, the intervention.
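As a sketch of the pooling step referred to above, a fixed-effect meta-analysis combines study-level estimates by inverse-variance weighting; in notation introduced here, with \hat{\theta}_i the effect estimate from study i and v_i its variance:

\hat{\theta}_{\text{pooled}} = \frac{\sum_i w_i \hat{\theta}_i}{\sum_i w_i}, \qquad w_i = \frac{1}{v_i}.

Random-effects models follow the same pattern but widen the weights to allow for heterogeneity between studies.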
Organizations supporting the production of systematic reviews include the Cochrane Collaboration, which has been coordinating systematic reviews in the medical and public health fields since 1993 and publishes the Cochrane Handbook, the definitive systematic review methodology guide. In addition, the Campbell Collaboration
has coordinated the production of systematic reviews of social interventions since 2000, and the International
Initiative for Impact Evaluation (in partnership with the Campbell Collaboration) is funding systematic reviews of
social programs in developing countries. Other organizations supporting systematic reviews include the Institute of
Education’s EPPI-Centre and the University of York’s Centre for Reviews and Dissemination.
The body of evidence from systematic reviews is large and available through various online portals including the Cochrane Library, the Campbell Library, and the Centre for Reviews and Dissemination. The available evidence from reviews of development interventions in low- and middle-income countries is being built up by organisations such as the International Initiative for Impact Evaluation's synthetic reviews programme.
Sources and external links
• Gertler, Martinez, Premand, Rawlings and Vermeersch (2011) Impact Evaluation in Practice, Washington,
DC:The World Bank
• World Bank Poverty Group
• World Bank Independent Evaluation Group
• Baker, Judy. 2000. Evaluating the Impact of Development Projects on Poverty: A Handbook for Practitioners.
Directions in Development, World Bank, Washington, D.C.
• International Initiative for Impact Evaluation
• Innovations for Poverty Action
• Cochrane Collaboration
• Campbell Collaboration
• Committee on Sustainability Assessment (COSA)
• International Institute for Sustainable Development (IISD)
• UN International Trade Centre (ITC)
[1] World Bank Poverty Group on Impact Evaluation (http:// web.worldbank.org/WBSITE/EXTERNAL/TOPICS/EXTPOVERTY/
EXTISPMA/ 0,,menuPK:384336~pagePK:149018~piPK:149093~theSitePK:384329,00. html), accessed on January 6, 2008
[2] White, H. (2006) Impact Evaluation: The Experience of the Independent Evaluation Group of the World Bank, World Bank, Washington,
D.C., p. 3 (http:// lnweb90.worldbank.org/ oed/ oeddoclib. nsf/ DocUNIDViewForJavaSearch/35BC420995BF58F8852571E00068C6BD/
$file/impact_evaluation. pdf)
[3] Gertler, Martinez, Premand, Rawlings and Vermeersch (2011) Impact Evaluation in Practice, Washington, DC:The World Bank (http://
publications.worldbank.org/index. php?main_page=product_info&cPath=1& products_id=23915)
[4] Briceno, B. and Gaarder, M. (2009) Institutionalizing Evaluation: A review of international experience, DFID/3ie, New Delhi (http:// www.
3ieimpact.org/ India_Report_DFID.pdf)
[5] International Initiative for Impact Evaluation (3ie), Principles for Impact Evaluation (http:/ / www.3ieimpact. org/doc/ principles for impact
[6] White, H. (2006) Impact Evaluation: The Experience of the Independent Evaluation Group of the World Bank, World Bank, Washington,
D.C. (http:// lnweb90. worldbank.org/ oed/ oeddoclib. nsf/ DocUNIDViewForJavaSearch/35BC420995BF58F8852571E00068C6BD/ $file/
impact_evaluation. pdf)
[7] Ravallion, M. (2008) Evaluating Anti-Poverty Programs (http:// siteresources. worldbank.org/INTISPMA/Resources/
[8] Ravallion, M. (2009) Should the Randomistas Rule? The Economists' Voice, Volume 6, Number 2 (http:// ideas. repec.org/a/ bpj/ evoice/
v6y2009i2n6. html)
[9] Bamberger, M. and White, H. (2007) Using Strong Evaluation Designs in Developing Countries: Experience and Challenges, Journal of
MultiDisciplinary Evaluation, Volume 4, Number 8, 58-73 (http:// www.eric. ed. gov/ ERICWebPortal/custom/ portlets/ recordDetails/
detailmini.jsp?_nfpb=true&_& ERICExtSearch_SearchValue_0=EJ800319&ERICExtSearch_SearchType_0=no&accno=EJ800319)
[10] Scriven (2008) A Summative Evaluation of RCT Methodology: & An Alternative Approach to Causal Research, Journal of
MultiDisciplinary Evaluation, Volume 5, Number 9, 11-24
[11] Deaton (2009) Instruments of Development: Randomization in the Tropics, and the Search for the Elusive Keys to Economic Development,
NBER Working Paper No. w14690 (http:// papers. ssrn. com/ sol3/ papers.cfm?abstract_id=1335715)
[12] Bamberger, M. and White, H. (2007) Using Strong Evaluation Designs in Developing Countries: Experience and Challenges, Journal of
MultiDisciplinary Evaluation, Volume 4, Number 8, 58-73 (http:// www.eric. ed. gov/ ERICWebPortal/custom/ portlets/ recordDetails/
detailmini.jsp?_nfpb=true&_& ERICExtSearch_SearchValue_0=EJ800319&ERICExtSearch_SearchType_0=no&accno=EJ800319)
[13] Bloom, H. (2006) The core analytics of randomized experiments for social research. MDRC Working Papers on Research Methodology.
MDRC, New York (http:// www.eric.ed. gov/ ERICDocs/ data/ ericdocs2sql/ content_storage_01/ 0000019b/ 80/ 1b/ e9/ b8.pdf)
[14] White, H. (2009) Some reflections on current debates in impact evaluation, Working paper 1, International Initiative for Impact Evaluation,
New Delhi (http:/ / www.3ieimpact. org/ admin/ pdfs_papers/ 11. pdf)
[15] International Initiative for Impact Evaluation (3ie) (2008) Principles for Impact Evaluation. 3ie, New Delhi (http:// www.3ieimpact. org/
doc/ principles for impact evaluation.pdf)
[16] World Bank (n.d.) The Development IMpact Evaluation (DIME) Initiative, Project Document, World Bank, Washington, D.C. (http://
siteresources.worldbank.org/ INTDEVIMPEVAINI/Resources/ DIME_project_document-rev.pdf)
[17] US Environmental Protection Agency Program Evaluation Glossary (http:// www.epa.gov/ evaluate/ glossary/ i-esd. htm), accessed on
January 6, 2008
[18] World Bank Independent Evaluation Group (http:/ / www.worldbank.org/ ieg/ie/ ), accessed on January 6, 2008
[19] White, H. (2006) Impact Evaluation: The Experience of the Independent Evaluation Group of the World Bank, World Bank, Washington,
D.C. (http:/ / lnweb90. worldbank.org/ oed/ oeddoclib. nsf/ DocUNIDViewForJavaSearch/35BC420995BF58F8852571E00068C6BD/ $file/
impact_evaluation. pdf)
[20] OECD-DAC (2002) Glossary of Key Terms in Evaluation and Results-Based Management Proposed Harmonized Terminology, OECD,
Paris (http:// www. oecd. org/ dataoecd/ 8/ 43/ 40501129. pdf)
[21] UNICEF (2004) UNICEF Evaluation Report Standards, Evaluation Office, UNICEF NYHQ, New York (http:// www. unicef. org/
evaldatabase/ files/ UNICEF_Eval_Report_Standards.pdf)
[22] Evaluation Definition: What is Evaluation? - EvaluationWiki (http:/ / www.evaluationwiki.org/index. php/
[23] http:// www. worldbank.org/ ieg/ nonie/ guidance. html
[24] Leeuw, F. and Vaessen, J. (2009) Impact Evaluations and Development: NONIE Guidance on Impact Evaluation, World Bank, Washington,
D.C. (http:/ / www. worldbank.org/ ieg/ nonie/ guidance. html)
[25] http:/ / aidwatchers.com/ 2009/ 12/ the-civil-war-in-development-economics/
[26] Ravallion, M. (2009) Should the Randomistas Rule? The Economists' Voice, Volume 6, Number 2 (http:/ / ideas. repec.org/a/ bpj/ evoice/
v6y2009i2n6. html)
[27] Banerjee, A. V. (2007) ‘Making Aid Work’ Cambridge, Boston Review Book, MIT Press, MA (http:// www.mdgoals.net/ wp-content/
[28] Bamberger, M. and White, H. (2007) Using Strong Evaluation Designs in Developing Countries: Experience and Challenges, Journal of
MultiDisciplinary Evaluation, Volume 4, Number 8, 58-73 (http:/ / www.eric. ed. gov/ ERICWebPortal/custom/ portlets/ recordDetails/
detailmini.jsp?_nfpb=true& _& ERICExtSearch_SearchValue_0=EJ800319&ERICExtSearch_SearchType_0=no&accno=EJ800319)
[29] http:// www. europeanevaluation. org/download/ ?noGzip=1&id=1969403 EES Statement on the importance of a methodologically diverse
approach to impact evaluation
[30] http:/ / www. odi. org.uk/ resources/ odi-publications/opinions/ 127-impact-evaluation.pdf The 'gold standard' is not a silver bullet for
[31] White, H. (2009b) Theory-based impact evaluation: Principles and practice, Working Paper 3, International Initiative for Impact Evaluation,
New Delhi (http:// www.3ieimpact. org/ admin/ pdfs_papers/ 51. pdf)
[32] Leeuw, F. and Vaessen, J. (2009) Impact Evaluations and Development: NONIE Guidance on Impact Evaluation, World Bank, Washington,
D.C. (http:/ / www. worldbank.org/ ieg/ nonie/ guidance. html)
[33] White, H. (2009b) Theory-based impact evaluation: Principles and practice, Working Paper 3, International Initiative for Impact Evaluation,
New Delhi (http:/ / www.3ieimpact. org/ admin/ pdfs_papers/ 51. pdf)
[34] Gertler, P. (2000) Final Report: The Impact of PROGRESA on Health. International Food Policy Research Institute, Washington, D.C.
(http:/ / www. ifpri.org/ sites/ default/files/ publications/ gertler_health.pdf)
[35] Behrman, J., Sengupta, P. and Todd, P. (2002) Progressing Through Progresa: An Impact assessment of a school subsidy experiment in
Mexico. (http:// athena. sas. upenn. edu/ ~petra/ papers/ trans18. pdf)
[36] Fiszbein, A. and Schady, N. (2009) Conditional Cash Transfers: Reducing present and future poverty: A World Bank Policy Research
Report, World Bank, Washington, D.C.
[37] http:// www. 3ieimpact. org/database_of_impact_evaluations. html
[38] http:// poverty-action.org/work/ publications
[39] http:// www. worldbank.org/ dime
[40] http:// www. worldbank.org/ ieg/ nonie/ papers. html
[41] Impact Evaluation: The Experience of the Independent Evaluation Group of the World Bank, 2006 (http:// lnweb18. worldbank.org/oed/
oeddoclib.nsf/ DocUNIDViewForJavaSearch/35BC420995BF58F8852571E00068C6BD/ $file/ impact_evaluation.pdf)
[42] Savedoff, W., Levine, R. and Birdsall, N. (2006), ‘When will we ever learn: Improving lives through impact evaluation’, Report of the
Evaluation Gap Working Group, May, Center for Global Development, Washington, D.C. (http:/ / www. cgdev.org/ content/ publications/
[43] http:/ / www. 3ieimpact. org
[44] http:/ / sustainablecommodities. org/cosa
[45] http:// poverty-action.org/
[46] http:/ / www. cgiar-ilac.org
[47] http:/ / www. worldbank.org/ ieg/ nonie/
[48] http:// www. cochrane. org/
[49] http:/ / www. cochrane-handbook.org
[50] http:/ / www. campbellcollaboration.org/
[51] http:/ / eppi.ioe.ac. uk/ cms/ Default.aspx
[52] http:/ / www. york.ac. uk/ inst/ crd/
[53] http:/ / www. thecochranelibrary.com/
[54] http:/ / www. campbellcollaboration.org/ library.php
[55] http:// www. crd.york. ac. uk/ crdweb/
[56] http:/ / www. 3ieimpact. org/syntheticreviews/
[57] http:// www. worldbank.org/ ieinpractice
[59] http:// www. worldbank.org/ ieg/ ie/
[61] http:// www. sustainablecommodities. org/cosa
[62] http:// www. iisd. org
[63] http:// www. intracen.org
Rossi, P. H., Lipsey, M. W., & Freeman, H. E. (2004). Evaluation: A systematic approach (7th Ed.). Thousand Oaks,
CA: Sage
Integrity
Integrity is a concept of consistency of actions, values, methods, measures, principles, expectations, and outcomes.
In ethics, integrity is regarded as the honesty and truthfulness or accuracy of one's actions. Integrity can
be regarded as the opposite of hypocrisy,
in that it regards internal consistency as a virtue, and suggests that parties
holding apparently conflicting values should account for the discrepancy or alter their beliefs.
The word "integrity" stems from the Latin adjective integer (whole, complete).
In this context, integrity is the
inner sense of "wholeness" deriving from qualities such as honesty and consistency of character. As such, one may
judge that others "have integrity" to the extent that they act according to the values, beliefs and principles they claim
to hold.
A value system's abstraction depth and range of applicable interaction may also function as significant factors in
identifying integrity due to their congruence or lack of congruence with observation. A value system may evolve
over time
while retaining integrity if those who espouse the values account for and resolve inconsistencies.
Testing of integrity
One can test a value system's integrity either:
1. subjectively, by human constructs of accountability and internal consistency, or
2. objectively, via the Scientific Method
Integrity in Relation to Value Systems
The actions of an entity (person or group) may be measured for consistency against that entity's espoused value
system to determine integrity. This type of measurement is subjective because its measures rely on the values of the
party doing the testing.
Where the measures of the test are consensual only to the party being measured, the test is created by the same value
system as the action in question and can result only in a positive proof. Thus, a neutral point of view requires testing
measures consensual to anyone expected to believe the results.
Subjective testing measures integrity in relationship to human constructs. While some constructs, such as
Mathematics, are considered to be very reliable, all human constructs are subject to humanity's assumptions of cause
and effect. To add causal testing of the greater universe, we employ the Scientific Method.
Testing Integrity via the Scientific Method
The Scientific Method assumes that a system with perfect integrity yields a singular extrapolation within its domain
that one can test against observed results. Newtonian physics, general relativity and quantum mechanics, for example, are value systems of high integrity, but all three produce different extrapolated values when applied to real-world situations. None of them claim to be absolute truth, but merely best
value systems for certain scenarios. Newtonian Physics demonstrates sufficiency for most activities on Earth, but
produced a calculation more than ten feet in error when applied to NASA's moon landings, whereas General
Relativity calculations were precise for that application. General Relativity, however, incorrectly predicts the results
of a broad body of scientific experiments where quantum mechanics proves its sufficiency. Thus the integrity of each of the three theories applies only within its own domain.
Integrity in ethics
Ethical meanings of integrity used in medicine and law refer to a quality of "wholeness" that must be present in the
human body and in the body of law, respectively. Such wholeness is defined by "sacred" axioms such as unity,
consistency, purity, unspoiledness and uncorruptedness.
In discussions on behavior and morality, one view of the property of integrity sees it as the virtue of basing actions
on an internally-consistent framework of principles. This scenario may emphasize depth of principles and adherence
of each level of postulates or axioms to those it logically relies upon. One can describe a person as having ethical
integrity to the extent that everything that that person does or believes: actions, methods, measures and principles —
all of these derive from a single core group of values.
One essential aspect of a consistent framework is its avoidance of any unwarranted (arbitrary) exceptions for a
particular person or group — especially the person or group that holds the framework. In law, this principle of
universal application requires that even those in positions of official power be subject to the same laws as pertain to
their fellow citizens. In personal ethics, this principle requires that one should not act according to any rule that one
would not wish to see universally followed. For example, one should not steal unless one would want to live in a
world in which everyone was a thief. This was formally described by the philosopher Immanuel Kant in his
categorical imperative.
In the context of accountability, integrity serves as a measure of willingness to adjust a value system to maintain or
improve its consistency, when an expected result appears incongruent with observed outcome. Some regard integrity
as a virtue in that they see accountability and moral responsibility as necessary tools for maintaining such consistency.
In the context of value theory, integrity provides the expected causation from a base value to its extrapolated
implementation or other values. A value system emerges as a set of values and measures that one can observe as
consistent with expectations.
Some commentators stress the idea of integrity as personal honesty: acting according to one's beliefs and values at all
times. Speaking about integrity can emphasize the "wholeness" or "intactness" of a moral stance or attitude. Some
views of wholeness may also emphasize commitment and authenticity. Ayn Rand considered that integrity "does not
consist of loyalty to one's subjective whims, but of loyalty to rational principles".
Subjective interpretations
In common public usage, people sometimes use the word "integrity" in reference to a single "absolute" morality
rather than in reference to the assumptions of the value system in question. In an absolute context, the word
"integrity" conveys no meaning between people with differing definitions of absolute morality, and becomes nothing
more than a vague assertion of perceived political correctness or popularity, similar to using terms such as "good" or
"ethical" in a moralistic context.
One can also speak of "integrity" outside of its prescriptive meaning, in reference to a person or group of people of
which the speaker subjectively approves or disapproves. Thus a favored person can be described as "having
integrity", while an enemy can be regarded as "completely lacking in integrity". Such labeling, in the absence of
measures of independent testing, renders the accusation itself baseless and (ironically) others may call the integrity
of the assertion into question.
Integrity in modern ethics
In a formal study of the term "integrity" and its meaning in modern ethics, law professor Stephen L. Carter sees
integrity not only as a refusal to engage in behavior that evades responsibility, but also as an understanding of
different modes or styles in which discourse attempts to uncover a particular truth.
Carter writes that integrity requires three steps: "discerning what is right and what is wrong; acting on what you have
discerned, even at personal cost; and saying openly that you are acting on your understanding of right from wrong."
He regards integrity as being distinct from honesty.
Christian integrity
Strong's Concordance records 16 uses of words translated as "integrity" in the KJV Old Testament, and none in the
KJV New Testament.
One view of integrity in a Christian context states: "The Christian vision of integrity suggests that personal authenticity entails living in accordance with personal convictions that are based on an understanding of God's purposes for creation, humankind and the person as a liver of real life."
Integrity is a necessary foundation of any system based on the supremacy and objectivity of laws. Such systems are
distinct from those where personal autocracy governs. The latter systems are often lacking in integrity because they
elevate the subjective whims and needs of a single individual or narrow class of individuals above not only the
majority, but also the law's supremacy. Such systems also frequently rely on strict controls over public participation
in government and freedom of information. To the extent these behaviors involve dishonesty, turpitude, corruption
or deceit, they lack integrity. Facially "open" or "democratic" systems can behave in the same way and thereby lack
integrity in their legal processes.
In Anglo-American legal traditions, the adversarial process is generally, though not universally, viewed as the most
appropriate means of arriving at the truth in a given dispute. This process assumes a given set of substantive and
procedural rules that both sides in the dispute agree to respect. The process further assumes that both sides
demonstrate willingness to share evidence, follow guidelines of debate, and accept rulings from the fact-finder in a
good-faith effort to arrive at an equitable outcome. Whenever these assumptions are incorrect, the adversarial system
is rendered inequitable and any given case is weakened. More importantly, when these assumptions are violated, truth is no longer the goal, justice is denied to the parties involved, and the overall integrity of the legal system is called into question. If the integrity of any legal system is called into question often or seriously enough, the society served
by that system is likely to experience some degree of disruption or even chaos in its operations as the legal system
demonstrates inability to function.
Psychological/work-selection tests
The procedures known as "integrity tests" or (more confrontationally) as "honesty tests"
aim to identify
prospective employees who may hide perceived negative or derogatory aspects of their past, such as a criminal
conviction, psychiatric treatment or drug abuse. Identifying unsuitable candidates can save the employer from
problems that might otherwise arise during their term of employment. Integrity tests make certain assumptions:
• that persons who have "low integrity" report more dishonest behaviour
• that persons who have "low integrity" try to find reasons in order to justify such behaviour
• that persons who have "low integrity" think that others are more likely to commit crimes such as theft. (Since people seldom sincerely declare their past deviance to a prospective employer, the "integrity" testers adopt an indirect approach: the written test asks candidates what they think about the deviance of other people in general.)
• that persons who have "low integrity" exhibit impulsive behaviour
• that persons who have "low integrity" tend to think that society should severely punish deviant behaviour
(Specifically, "integrity tests" assume that people who have a history of deviance report within such tests that they
support harsher measures applied to the deviance exhibited by other people.)
A crucial role in detecting people with low integrity is played by such tests' claim to be able to detect "fake" answers. Naive respondents believe this pretense and behave accordingly, reporting some of their past deviance and their thoughts about the deviance of others, fearing that untruthful answers would reveal their "low integrity". These respondents believe that the more candid they are in their answers, the higher their "integrity score" will be.
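Purely as an illustration of how the assumptions listed above could be turned into a crude scoring scheme, here is a minimal Python sketch; the item names, weights, rating scale, and cutoff are hypothetical and do not reproduce any actual commercial integrity test.

```python
# Illustrative only: a toy aggregation of "overt integrity test" style items.
# Item content, weights, and the cutoff are hypothetical, not from any real instrument.

ITEMS = {
    "admits_past_dishonesty": 1.0,      # self-reported dishonest behaviour
    "excuses_dishonesty": 1.0,          # finds reasons to justify such behaviour
    "thinks_most_people_steal": 1.0,    # projects deviance onto others
    "acts_impulsively": 0.5,            # impulsive behaviour
    "wants_harsh_punishment": 0.5,      # endorses severe punishment of deviance
}

def risk_score(responses: dict[str, int]) -> float:
    """Combine 1-5 agreement ratings into a weighted 'low integrity' risk score.

    responses maps item name -> rating (1 = strongly disagree ... 5 = strongly agree).
    Higher scores indicate more of the behaviours the test assumes correlate
    with low integrity.
    """
    return sum(ITEMS[item] * responses.get(item, 3) for item in ITEMS)

if __name__ == "__main__":
    candidate = {
        "admits_past_dishonesty": 2,
        "excuses_dishonesty": 1,
        "thinks_most_people_steal": 4,
        "acts_impulsively": 3,
        "wants_harsh_punishment": 5,
    }
    score = risk_score(candidate)
    print(f"risk score: {score:.1f} (flag for review above 14.0, a hypothetical cutoff)")
```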
Other integrities
Disciplines and fields with an interest in integrity include philosophy of action, philosophy of medicine,
mathematics, the mind, cognition, consciousness, materials science, structural engineering, and politics. Popular
psychology identifies personal integrity, professional integrity, artistic integrity, and intellectual integrity.
The concept of integrity may also feature in business contexts beyond the issues of employee/employer honesty and
ethical behavior, notably in marketing or branding contexts. The "integrity" of a brand is regarded by some as a
desirable outcome for companies seeking to maintain a consistent, unambiguous position in the mind of their
audience. This integrity of brand includes consistent messaging and often includes using a set of graphics standards
to maintain visual integrity in marketing communications.
Another use of the term "integrity" is found in the work of Michael C. Jensen and Werner Erhard in their
academic paper, "Integrity: A Positive Model that Incorporates the Normative Phenomenon of Morality, Ethics, and
Legality". In this paper the authors explore a new model of integrity as the state of being whole and complete,
unbroken, unimpaired, sound, and in perfect condition. They posit a new model of integrity that provides access to
increased performance for individuals, groups, organizations, and societies. Their model "reveals the causal link
between integrity and increased performance, quality of life, and value-creation for all entities, and provides access
to that causal link."


Electronic signals are said to have integrity when there is no corruption of information between one domain and another, such as from a disk drive to a computer display. Such integrity is a fundamental principle of information assurance: corrupted information is untrustworthy, while uncorrupted information has value.
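As a simple illustration of this sense of integrity, the following minimal Python sketch verifies that data read back from one domain (a file on disk, in this made-up example) matches what was originally written, using a SHA-256 digest; the file name and payload are arbitrary.

```python
import hashlib
from pathlib import Path

def sha256_digest(data: bytes) -> str:
    """Return the hex SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()

# Write some data in one "domain" (here, a file on disk)...
payload = b"evaluation records, batch 42"
path = Path("payload.bin")
path.write_bytes(payload)
expected = sha256_digest(payload)

# ...then read it back in another domain and confirm nothing was corrupted.
received = path.read_bytes()
if sha256_digest(received) == expected:
    print("integrity check passed: data is unchanged")
else:
    print("integrity check failed: data was corrupted in transit or storage")
```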
[1] John Louis Lucaites; Celeste Michelle Condit; Sally Caudill (1999). Contemporary rhetorical theory: a reader. Guilford Press. p. 92. ISBN 1572304014.
[2] "integrity" (http://www.bartleby.com/61/70/I0177000.html). The American Heritage Dictionary of the English Language (4th ed.). Houghton Mifflin. 2000. Retrieved 2009-05-13. "... from integer, whole, complete".
[3] See for example Wiener, Yoash (October 1988). "Forms of Value Systems: A Focus on Organizational Effectiveness and Cultural Change and Maintenance" (http://www.jstor.org/stable/258373). The Academy of Management Review (Academy of Management) 13 (4): 534–545. Retrieved 2010-05-28. "An organizational value system may change and evolve. The typology offered above can be useful in analyzing such developments. Initial phases of culture development most frequently are characterized by a charismatic value system, either elitist or functional."
[4] Compare Allee, Verna (2000). "The value evolution: Addressing larger implications of an intellectual capital and intangibles perspective" (http://www.openvaluenetworks.com/Articles/The_Value_Evolution-VA.pdf) (PDF). Journal of Intellectual Capital (MCB University Press Ltd) 1 (1): 17–32. doi:10.1108/14691930010371627. ISSN 1469-1930. Retrieved 2010-05-28. "We must begin to evolve our frameworks to an expanded view of potential value domains. [...] Can we bring coherence and integrity to our business models in the light of the higher values that we hold dear? Can we expand our intangible value models to integrate the good work that has gone on in view of social responsibility and sustainable enterprise fields for decades?"
[5] Ayn Rand, “Doesn’t Life Require Compromise?” (The Virtue of Selfishness, page 69).
[6] Carter, Stephen L. (1996). Integrity. New York: BasicBooks/HarperCollins. pp. 7, 10. ISBN 0-06-092807-7. On page 242 Carter credits influence "to some extent by the fine discussion of integrity in Martin Benjamin's book Splitting the Difference: Compromise and Integrity in Ethics and Politics (Lawrence: University Press of Kansas, 1990)."
[7] "KJV Concordance for -integrity-" (http://www.blueletterbible.org/search/translationResults.cfm?Criteria=integrity&t=KJV). Blue Letter Bible. Retrieved 2009-11-19.
[8] Grenz, Stanley J.; Smith, Jay T. (2003). Pocket dictionary of ethics (http://books.google.com/books?id=2dLuym2H4PQC). IVP Pocket Reference Series. InterVarsity Press. p. 61. ISBN 9780830814688. Retrieved 2009-11-18.
[9] van Minden (2005:206–208): [...] deze 'integriteitstests' (dat klinkt prettiger dan eerlijkheids- of leugentests) [...] [Translation: ... these 'integrity tests' (that sounds nicer than honesty or lie tests)]
[10] van Minden, Jack J.R. (2005). Alles over psychologische tests (http://www.bol.com/nl/p/boeken/alles-over-psychologische-tests/1001004001667648/index.html) (in Dutch). Business Contact. p. 207. ISBN 978-90-254-0415-4. "De schriftelijke integriteitstests zijn gemakkelijk af te nemen. Ze zijn gebaseerd op enkele aannamen, die er duidelijk in zijn terug te vinden: Minder eerlijke personen: 1 rapporteren een grotere mate van oneerlijk gedrag 2 zijn geneigd eerder oneerlijk gedrag te verontschuldigen 3 zijn geneigd meer excuses of redenen voor diefstal aan te voeren 4 denken vaker over diefstal 5 zien vaker oneerlijk gedrag als acceptabel 6 zijn vaker impulsief 7 zijn geneigd zichzelf en anderen zwaarder te straffen." [Translation: The written integrity tests are easy to administer. They are based on a few assumptions, which can clearly be recognized in them: Less honest persons: 1 report a greater degree of dishonest behavior; 2 are more inclined to excuse dishonest behavior; 3 are more inclined to offer excuses or reasons for theft; 4 think about theft more often; 5 more often regard dishonest behavior as acceptable; 6 are more often impulsive; 7 are inclined to punish themselves and others more severely.]
[11] Van Minden (2005:207) writes: "TIP: Dit type vragenlijsten melden koelbloedig dat zij kunnen ontdekken wanneer u een misleidend antwoord geeft of de zaak bedondert. U weet langzamerhand dat geen enkele test zo'n claim waar kan maken, zelfs niet een die gespecialiseerd is in het opsporen van bedriegers." Translated: "TIP: This type of questionnaire coolly claims to be able to detect when you give a misleading answer or try to cheat. By now you know that no test can make good on such a claim, not even one that specializes in detecting cheaters."
[12] See abstract of Harvard Business School NOM Research Paper No. 06-11 and Barbados Group Working Paper No. 06-03 at: Erhard, Werner; Michael C. Jensen; Steve Zaffron (2007). "Integrity: A Positive Model that Incorporates the Normative Phenomena of Morality, Ethics and Legality" (http://papers.ssrn.com/sol3/papers.cfm?abstract_id=920625). Social Science Research Network. Retrieved 2008-12-03. "Integrity exists in a positive realm devoid of normative content. Integrity is thus not about good or bad, or right or wrong, or what should or should not be. [...] We assert that integrity (the condition of being whole and complete) is a necessary condition for workability, and that the resultant level of workability determines the available opportunity for performance."
[13] Erhard, Werner; Michael C. Jensen; Steve Zaffron (2010). "Integrity: A Positive Model that Incorporates the Normative Phenomena of Morality, Ethics, and Legality - Abridged" (http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1542759). Social Science Research Network.
[14] Jensen, Michael C.; Karen Christensen (Interviewer) (January 14). "Integrity: Without it Nothing Works" (http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1511274). Rotman Magazine: The Magazine of the Rotman School of Management, pp. 16–20, Fall 2009 (Social Science Research Network).
External links
• Stanford Encyclopedia of Philosophy entry (http://plato.stanford.edu/entries/integrity/)
• Werner Erhard, New Model of Integrity (http://www.wernererhard.com/integrity_paper.html)
• van Minden, Jack J.R. (2005). Alles over psychologische tests (http://www.nl.bol.com/is-bin/INTERSHOP.enfinity/eCS/Store/nl/-/EUR/BOL_DisplayProductInformation-Start?Section=BOOK&BOL_OWNER_ID=1001004001667648) (in Dutch). Business Contact. pp. 206–208. ISBN 978-90-254-0415-4.
International Association for the Evaluation of Educational Achievement
The International Association for the Evaluation of Educational Achievement (IEA) is an independent, international cooperative of national research institutions and governmental research agencies.
Secretariat: Amsterdam, Netherlands
Executive Director: Dr. Hans Wagemaker
IEA Chair: Dr. Seamus Hegarty
Website: http://www.iea.nl
The International Association for the Evaluation of Educational Achievement (IEA) is an association of
national research institutions and government research agencies related to education. The IEA is an independent
organization. It was founded in 1958 and is headquartered in Amsterdam. Many policy-making decisions made in
the field of education are influenced by IEA studies.
The focus of the IEA is to conduct research studies of student performance in basic subjects such as math, science, and reading. IEA studies compare the performance of students across countries and examine whether particular policies in an educational system have positive or negative effects on learning.
Through its comparative research and assessment projects, IEA aims to:
1. Provide international benchmarks that may assist policy-makers in identifying the comparative strengths and weaknesses of their educational systems
2. Provide high-quality data that will increase policy-makers’ understanding of key school- and non-school-based
factors that influence teaching and learning
3. Provide high-quality data which will serve as a resource for identifying areas of concern and action, and for
preparing and evaluating educational reforms
4. Develop and improve educational systems’ capacity to engage in national strategies for educational monitoring
and improvement
5. Contribute to development of the world-wide community of researchers in educational evaluation
Since its inception in 1958, the IEA has conducted more than 23 research studies of cross-national achievement. The
regular cycle of studies encompasses learning in basic school subjects. Examples are
• the Trends in International Mathematics and Science Study (TIMSS 1995, TIMSS 1999, TIMSS 2003, TIMSS
2007), and
• the Progress in International Reading Literacy Study (PIRLS 2001, PIRLS 2006).
IEA projects also include studies of particular interest to IEA members, such as
• the TIMSS-R Video Study of Classroom Practices,
• the International Civic and Citizenship Education Study (ICCS),
• the Second Information Technology in Education Study (SITES-M1, SITES-M2, SITES 2006),
• a pre-primary education study (PPP).
In 2005 the IEA also initiated its first study in tertiary education:
• the Teacher Education and Development Study in Mathematics (TEDS-M).
IEA studies are an important data source for those working to enhance students’ learning at the international, national
and local levels. By reporting on a wide range of topics and subject matters, the studies contribute to a deep
understanding of educational processes within individual countries, and across a broad international context. In
addition, the cycle of studies provides countries with an opportunity to measure progress in educational achievement
in mathematics, science and reading comprehension. The cycle of studies also enables monitoring of changes in the
implementation of educational policy and identification of new issues relevant to reform efforts.
Aims, methodology, and interpretation of IEA studies have often been criticised, most famously by Hans Freudenthal, a Dutch mathematician and researcher in mathematics education. He pointed to problems with enrollment rates, the unsolved translation problem, the lack of curricular validity, the overinterpretation of numerical outcomes, "Kafkaesque" confusion in the documentation and in the underlying decisions, and dogmatic rejection of criticism.
[1] H. Freudenthal: Pupils achievements internationally compared — the IEA. Educational Studies in Mathematics 6, 127–186. Summarized according to J. Wuttke: PISA & Co, A Critical Online Bibliography, http://www.messen-und-deuten.de/Pisa/biblio.htm.
External links
• The International Association for the Evaluation of Educational Achievement (http://www.iea.nl/)
• The International Association for the Evaluation of Educational Achievement. ERIC Digest (http://www.ericdigests.org/pre-9218/international.htm)
• Trends in International Mathematics and Science Study (http://timss.bc.edu/)
• Progress in International Reading Literacy Study (http://timss.bc.edu/)
• International Civic and Citizenship Education Study (http://iccs.acer.edu.au/)
• Teacher Education and Development Study in Mathematics (http://teds.educ.msu.edu/default.asp/)
• Second Information Technology in Education Study 2006 (http://www.sites2006.net/exponent/index.php?section=1/)
Joint Committee on Standards for Educational Evaluation
The Joint Committee on Standards for Educational Evaluation is an American/Canadian-based Standards Developer Organization (SDO). The Joint Committee represents a coalition of major professional associations formed in 1975 to help improve the quality of standardized evaluation. The Committee has thus far published three sets of standards for evaluations: The Personnel Evaluation Standards (first published in 1988 and updated in 2008), The Program Evaluation Standards (second edition published in 1994, with the third edition in draft form as of 2008), and The Student Evaluation Standards (published in 2003).
The Joint Committee is a private nonprofit organization. It is accredited by the American National Standards Institute (ANSI). Standards approved by ANSI become American National Standards. In
addition to setting standards in evaluation, it also is involved in reviewing and updating its published standards
(every five years); training policymakers, evaluators, and educators in the use of the standards; and serving as a
clearinghouse on evaluation standards literature.
Each publication presents and elaborates a set of standards for use in a variety of educational settings. The standards
provide guidelines for designing, implementing, assessing and improving the identified form of evaluation. Each of
the standards has been placed in one of four fundamental categories to promote educational evaluations that are
proper, useful, feasible, and accurate.
The Personnel Evaluation Standards
The second edition of the Personnel Evaluation Standards (2008) is based on knowledge about personnel evaluation
gained from the professional literature and research/development since 1988. In this edition, six new standards were
added to the original 21 of the first edition. The Joint Committee on Standards for Educational Evaluation requires
that personnel evaluations be ethical, fair, useful, feasible, and accurate. The standards also provide special
consideration to issues of diversity.
It is not the intent of these standards to design or promote specific systems of evaluation, but rather to ensure that whatever system is in place provides a sound process most likely to produce the desired results.
The four attributes of sound educational evaluation practices are:
• The propriety standards require that evaluations be conducted legally, ethically, and with due regard for the
welfare of evaluatees and clients involved in the evaluation. There are seven standards under this attribute, which include service
orientation, appropriate policies and procedures, access to evaluation information, interactions with evaluatees,
comprehensive evaluation, conflict of interest, and legal viability.
• The utility standards are intended to guide evaluations so that they will be informative, timely, and influential.
There are six standards under this attribute which include constructive orientation, defined uses, evaluator
qualifications, explicit criteria, functional reporting, and follow-up/professional development.
• The feasibility standards call for evaluation systems that are as easy to implement as possible, efficient in their
use of time and resources, adequately funded, and viable from a number of other standpoints. There are three
standards under this attribute including practical procedures, political viability, and fiscal viability.
• The accuracy standards require that the obtained information be technically accurate and that conclusions be
linked logically to the data. There are eleven standards under this attribute including validity orientation, defined
expectations, analysis of context, documented purposes and procedures, defensible information, systemic data
control, bias identification and management, analysis of information, justified conclusions, and metaevaluation.
The Program Evaluation Standards
• The utility standards are intended to ensure that an evaluation will serve the information needs of intended users.
• The feasibility standards are intended to ensure that an evaluation will be realistic, prudent, diplomatic, and frugal.
• The propriety standards are intended to ensure that an evaluation will be conducted legally, ethically, and with
due regard for the welfare of those involved in the evaluation, as well as those affected by its results.
• The accuracy standards are intended to ensure that an evaluation will reveal and convey technically adequate
information about the features that determine worth or merit of the program being evaluated.
The Student Evaluation Standards
• The Propriety standards help ensure that student evaluations are conducted lawfully, ethically, and with regard to
the rights of students and other persons affected by student evaluation.
• The Utility standards promote the design and implementation of informative, timely, and useful student evaluations.
• The Feasibility standards help ensure that student evaluations are practical; viable; cost-effective; and culturally,
socially, and politically appropriate.
• The Accuracy standards help ensure that student evaluations will provide sound, accurate, and credible
information about student learning and performance.
Sponsoring Organizations
The Joint Committee includes sixteen Sponsoring Organizations that reflect a balance of primarily client practitioner
and evaluation technical specialist perspectives. These organizations appoint and sponsor a member of the Joint
Committee. Each Sponsoring Organization is kept informed of the work of the Joint Committee and is afforded an
opportunity to contribute to the standard-setting process. Sponsoring Organizations include the following:
• American Association of School Administrators (AASA)
• American Counseling Association (ACA)
• American Educational Research Association (AERA)
• American Evaluation Association (AEA)
• American Indian Higher Education Consortium (AIHEC)
• American Psychological Association (APA)
• Canadian Evaluation Society (CES)
• Canadian Society for the Study of Education (CSSE)
• Consortium for Research on Educational Accountability and Teacher Evaluation (CREATE)
• Council of Chief State School Officers (CCSSO)
• National Association of Elementary School Principals (NAESP)
• National Association of School Psychologists (NASP)
• National Association of Secondary School Principals (NASSP)
• National Council on Measurement in Education (NCME)
• National Education Association (NEA)
• National Legislative Program Evaluation Society (NLPES)
• National Rural Education Association (NREA)
Notes and references
1. Joint Committee on Standards for Educational Evaluation
2. Joint Committee on Standards for Educational Evaluation. (1988). The Personnel Evaluation Standards: How to
Assess Systems for Evaluating Educators. Newbury Park, CA: Sage Publications.
3. Joint Committee on Standards for Educational Evaluation. (2011). The Program Evaluation Standards. Newbury
Park, CA: Sage Publications.
4. Joint Committee on Standards for Educational Evaluation. (2003). The Student Evaluation Standards: How to
Improve Evaluations of Students. Newbury Park, CA: Corwin Press.
5. ANSI Membership Directory. ANSI Member Organizations
6. ANSI Online Library - JCSEE Directory. Publication Listing
External links
• American National Standards Institute
[1] http://www.aasa.org/
[2] http://www.counseling.org/
[3] http://www.aera.net/
[4] http://www.eval.org/
[5] http://www.aihec.org/
[6] http://www.apa.org/
[7] http://www.evaluationcanada.ca/
[8] http://www.csse.ca/
[9] http://www.wmich.edu/evalctr/create/
[10] http://www.ccsso.org/
[11] http://www.naesp.org/
[12] http://www.nasponline.org/
[13] http://www.infolit.org/members/nassp.htm
[14] http://ncme.org/
[15] http://www.nea.org/index.html
[16] http://www.ncsl.org/programs/nlpes/
[17] http://www.nrea.net/
[18] http://www.jcsee.org/
[19] http://eseries.ansi.org/Source/directory/
[20] http://webstore.ansi.org/FindStandards.aspx?Action=displaydept&DeptID=3165&Acro=JCSEE
[21] http://www.ansi.org/
Knowledge survey
A knowledge survey is a method of evaluating the delivery of a course by gathering feedback from learners on the level of knowledge they acquired after completing the instruction. It usually consists of a series of questions that cover the full content of the course. The surveys evaluate student learning and content mastery at all levels, from basic knowledge and comprehension through higher levels of thinking. Knowledge surveys can serve as both formative and summative assessment tools. They are effective in helping
• students learn,
• instructors improve their delivery, and
• departments explore new curricula and pedagogies.
Structure of the Survey
A standard knowledge survey consists of many questions that cover the entire content of a course, addressing all levels of Bloom's taxonomy of thinking. A typical survey may include as many as 200 questions. The key feature of knowledge surveys is that students do NOT answer the questions. Instead, they state whether they COULD answer each question and with what degree of confidence, so students complete the surveys relatively quickly.
For easy assessment, the questions might follow a multiple-choice format, with choices that address the level of knowledge and confidence in a certain topic. For example, a typical set of answer choices could take the following form:
(A) I know the topic quite well. (B) I know at least 50% of the topic, and I know where I can find more information about it; within 20 minutes, I am confident I could find the complete answer. (C) I am not confident I can answer the question.
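Because students only report confidence rather than answer each item, the results are easy to tally. The following minimal Python sketch, with hypothetical item names and a made-up 0–2 encoding of the confidence choices above, shows one way such self-ratings might be aggregated by Bloom level; it is an illustration, not part of any standard knowledge-survey tool.

```python
from dataclasses import dataclass
from statistics import mean

# Hypothetical confidence encoding of the A/B/C choices:
# 2 = "I could answer this fully", 1 = "I could answer it partially / find the answer",
# 0 = "I could not answer it".

@dataclass
class SurveyItem:
    question: str
    bloom_level: str        # e.g. "knowledge", "comprehension", "synthesis"
    confidence: int         # student's self-reported confidence, 0-2

def mean_confidence_by_level(items: list[SurveyItem]) -> dict[str, float]:
    """Average self-reported confidence for each Bloom level in the survey."""
    levels: dict[str, list[int]] = {}
    for item in items:
        levels.setdefault(item.bloom_level, []).append(item.confidence)
    return {level: mean(scores) for level, scores in levels.items()}

if __name__ == "__main__":
    responses = [
        SurveyItem("Define formative assessment.", "knowledge", 2),
        SurveyItem("Explain how summative assessment differs.", "comprehension", 1),
        SurveyItem("Design a rubric for a group project.", "synthesis", 0),
    ]
    for level, avg in mean_confidence_by_level(responses).items():
        print(f"{level}: average confidence {avg:.2f} / 2")
```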
External links
• Wiggins, G., McTighe, J., 2001, Understanding by Design: Prentice Hall, Upper Saddle River, NJ, 201 p.
• Building a knowledge survey online.
• Knowledge Surveys: The ultimate course design and assessment tool for faculty and students (http://www.macalester.edu/geology/wirth/CourseMaterials.html)
• Knowledge Surveys: Being Clear, Organized, and Able to Prove It - ITL Newsletter - Vol. 2, No. 2
• Usefulness of the tool - Video
• Preparing the Knowledge Survey - Video
• Student Initial Reaction - Video
• Student Feel at the end - Video
• ELIXR-MERLOT Learning Object
[1] What are knowledge surveys (http://serc.carleton.edu/NAGTWorkshops/assess/knowledgesurvey/index.html)
[2] Excerpts from a Knowledge Survey (http://www.isu.edu/ctl/facultydev/KnowS_files/KnowS.htm)
[3] http://www.pollograph.com/kb/docs/designing-a-survey/
[4] http://www.calstate.edu/ITL/newsletter/
[5] http://profcamp.tripod.com/stevemp4stream.html
[6] http://profcamp.tripod.com/steve03mp4stream.html
[7] http://profcamp.tripod.com/students1mp4stream.html
[8] http://profcamp.tripod.com/students2mp4stream.html
[9] http://elixr.merlot.org/assessment-evaluation/knowledge-surveys
• Feldman, K.A., 1998, Identifying exemplary teachers and teaching: Evidence from student ratings: in Teaching
and Learning in the College Classroom: in Feldman, K.A., and Paulsen, M.B., editors, 2nd edition, Simon and
Schuster, Needham Heights, MA, p. 391-414.
• Fink, L.D., 2003, Creating Significant Learning Experiences: An Integrated Approach to Designing College
Courses: Jossey-Bass, 295 p.
Leadership accountability
Leadership accountability describes the personalization of protest and questioning concerning "up system"
responsibility for political violence, corruption, and environmental and other harm. There is a similar "second track" movement challenging local power elites in public service, the workplace, and religious organizations. This is evidenced by new institutions such as the International Criminal Court (ICC) (est. 2002); laws such as the United Nations Convention against Corruption (2003); and individual accountability for environmental victimization, e.g., U.S. Environmental Protection Agency action against executives of the asbestos company Grace (2005). Global civil society,
making innovatory use of modern information technology, has been central to this social movement. Examples are
the protests at the meetings of the G8 leaders and against the American and British leaders responsible for the
invasion and occupation of Iraq.
Historical context
Traditionally, leaders and other power elites have not seen themselves as accountable as individuals. They were either above the law, as sovereign -- rex non potest peccare ("the King can do no wrong") -- or they had immunity simply because they were leaders (immunity ratione materiae). Alternatively, they were considered mere representatives of
a state or organization which, it was believed, carried the responsibility for any wrongdoings. Writing in 1915,
historian R. Michels was not optimistic about change: "Historical evolution mocks all the prophylactic measures that
have been adopted for the prevention of oligarchy. If laws are passed to control the dominion of the leaders, it is the
laws which gradually weaken, and not the leaders."
But the globalization of personal accountability is now catching up with the globalization of personal power. Names
such as Milošević, Estrada, Cheng, Pinochet, Fujimori, Berlusconi, Enron, Union Carbide, and Grace have been
brought into the accountability frame, as were Osama bin Laden, Saddam Hussein, George W. Bush, and Tony Blair.
Violence surrounding the 9/11 attacks on America represented "retributive accountability" by all parties; but this
"global feuding" does not follow the traditional retributive ethic of an "eye for an eye" and is, therefore, uniquely
Implications of the movement
It is likely that "direct democratic accountability" -- ongoing daily questioning through media, correspondence,
courts, and peer networks -- will soon parallel voting systems as a means to address the abuse of power by elites.
A Global Leadership Responsibility Index (GLRI) can assess leadership conduct by using indicators such as
ratification of international agreements, aggressive intervention in other countries, perceptions of corruption, and
ecological footprint. The United States ranks below China, Japan, and South Korea, and the Index proposes that leadership in smaller countries is more responsible than in large states.
Further reading
• Leadership accountability in a globalizing world, London: Palgrave Macmillan, 2006, Williams, Christopher.
• Leaders of integrity: ethics and a code for global leadership, Amman: UN University Leadership Academy,
2001, Williams, Christopher.
• The prosecution of former military leaders in newly democratic nations, London: McFarland & Co., Roehrig, T.
• Declaration of Basic Principles of Justice for Victims of Crime and Abuse of Power - Adopted by General
Assembly resolution 40/34 of 29 November 1985. See UNHCHR home page.
• The Allure of Toxic Leaders: Why We Follow Destructive Bosses and Corrupt Politicians -- and How We Can Survive Them, Oxford University Press, 2004, Lipman-Blumen, Jean.
• Understanding Ethical Failures in Leadership (Cambridge Studies in Philosophy and Public Policy), Cambridge
University Press, 2005, Price, Terry L.
Narrative evaluation
In education, narrative evaluation is a form of performance measurement and feedback which can be used as an
alternative or supplement to grading. Narrative evaluations generally consist of several paragraphs of written text
about a student's individual performance and course work. The style and form of narrative evaluations vary
significantly among the educational institutions using them, and they are sometimes combined with other
performance metrics, including letter and number grades and pass/fail designations.
Colleges and universities that use narrative evaluations
This list is incomplete.
• Alverno College
• Antioch College (Letter grades are provided to student upon request)
• Bennington College (Letter grades are available in addition to narrative evaluations upon request on a per-course basis)
• Bard College (Students are given both letter grades and written comments via "criteria sheets" given mid-term
and end-of-term)
• Brown University (Narrative course performance report optionally given in addition to letter grade)
• Burlington College (Students are provided an option for traditional transcripts.)
• College of the Atlantic (Students may opt out of receiving letter grades)
• The Evergreen State College (Letter/number grades are never used)
• Fairhaven College
• Goddard College (Letter/number grades are never used)
• Hampshire College (Letter/number grades are never used for Hampshire students; students in the Five College
interchange can get letter grades when their home institution requires it)
• Johnston Center for Integrative Studies, University of Redlands
• Marlboro College
• New College of Florida (Letter/number grades are never used)
• New Saint Andrews College (Short evaluations in addition to a system of Latin letter grades)
• Northeastern University School of Law (School of Law only, letter/number grades are never used)
• Oxford University (Short evaluations in addition to letter grades)
• Prescott College (Letter grades are available in addition to narrative evaluations upon request on a per-course basis)
• Residential College, University of Michigan (Letter/number grades are assigned by request, evaluations by default)
• St John's College (Known as the Don Rag; letter grades are recorded and available by request)
• Sarah Lawrence College (Letter grades are provided to student upon request)
• Soka University of America (Narrative evaluations and P/NP grade for up to 5 courses)
• University of California, Santa Cruz (Narrative evaluations are given in addition to letter grades. Recently,
narrative evaluations were made optional.)
• University of Washington: Community, Environment, and Planning (CEP major only; narrative transcripts complement the Pass/Fail on the UW transcript)
• Yale Law School (Letter/number grades are never used)
High schools that use narrative evaluations
This list is incomplete.
• The Academy at Charlemont, Charlemont, MA (Narratives in addition to letter grade)
• Conservatory Prep Senior High, Davie, FL (Narratives in addition to letter grade)
• Lehman Alternative Community School (Grades are never used)
• The Urban School of San Francisco (Extensive narratives; GPA is provided at end of year and trimesterly from
11th grade onward)
• Hamden Hall Country Day School (Short narratives in addition to number grade)
• The Oakwood School in Los Angeles (Narratives in addition to letter grade)
• Hopkins School (Short narratives in addition to letter/number grade)
• The Met in Rhode Island: [3] (Narratives are converted to grades for college admissions purposes)
• Wildwood Secondary School in Los Angeles: [4] (Narratives are converted to grades for college admissions purposes)
• The Madeira School (Short narratives in addition to letter grade)
• San Roque High School in Santa Barbara, CA (Narratives in addition to letter grade)
• Pacific Crest Community School in Portland, OR (Grades are never used)
• Jefferson County Open School in Lakewood, CO (Letter/number grades are never used)
• StoneSoup School, Fl (Narrative GPA constructed at end of 12th grade)
• Youth Initiative High School: [5] (Grades are never used)
• Trinity School at Greenlawn: South Bend, IN [6] (Narratives in addition to letter grade)
• Trinity School at River Ridge: Twin Cities, MN [6] (Narratives in addition to letter grade)
• Trinity School at Meadow View: Falls Church, VA [6] (Narratives in addition to letter grade)
• Sagesse High School: Ain-Saadeh, Lebanon [7] (Narratives in addition to grade scores)
• Allendale Columbia School in Rochester, NY (Narratives in addition to letter grade)
External links
• History and explanation of narrative evaluation system at Santa Cruz
[1] http://www.caup.washington.edu/cep/
[2] http://www.conservatoryprep.org
[3] http://www.themetschool.org/?q=home
[4] http://wildwood.org/
[5] http://www.yihs.net/
[6] http://www.trinityschools.org
[7] http://www.sagessehs.edu.lb
[8] http://planning.ucsc.edu/irps/Stratpln/WASC94/a/sec2.htm
Natural experiment
A natural experiment is an observational study in which the assignment of treatments to subjects has been
haphazard: That is, the assignment of treatments has been made "by nature", but not by experimenters. Thus, a
natural experiment is not a controlled experiment. Natural experiments are most useful when there has been a clearly
defined and large change in the treatment (or exposure) to a clearly defined subpopulation, so that changes in
responses may be plausibly attributed to the change in treatments (or exposure).
Natural experiments are considered for study designs whenever controlled experimentation is difficult, such as in
many problems in epidemiology and economics.
Snow on cholera
Figure: Original map by John Snow showing the clusters of cholera cases in the London epidemic of 1854.
One of the most famous natural experiments was the 1854 Broad Street cholera outbreak in London, England. On 31 August 1854, a major outbreak of cholera struck Soho. Over the next three days, 127 people near Broad Street died. By the end of the outbreak, 616 people had died. The physician John Snow identified the source of the outbreak as the nearest public water pump, which he located using a map of deaths and illness.
In this example, Snow discovered a strong association between the use of the water and deaths and illnesses due to cholera. Snow found that the water company (the Southwark and Vauxhall Company) that supplied water to districts with high attack rates obtained the water from the Thames downstream from where raw sewage was discharged into the river. By contrast, districts that were supplied water by the Lambeth Company, which obtained water upstream from the points of sewage discharge, had low attack rates. Given the near-haphazard patchwork development of the water supply in mid-nineteenth-century London, Snow viewed the developments as "an experiment...on the grandest scale." Of course, the exposure to the polluted water was not under the control of any scientist; therefore, this exposure has been recognized as being a natural experiment.
Smoking ban
An example of a natural experiment occurred in Helena, Montana during the six-month period from June 2002 to
December 2002 when a smoking ban was in effect in all public spaces in Helena including bars and restaurants.
Helena is geographically isolated and served by only one hospital. It was observed that the rate of heart attacks
dropped by 60% while the smoking ban was in effect. Opponents of the law prevailed in getting the enforcement of
the law suspended after six months, after which the rate of heart attacks went back up.
Note, however, that while
this may have been a good example of a natural experiment (called a case-crossover experiment, where the exposure
is removed for a time period and then returned), it is also a good example of how confounding variables can result in
faulty conclusions being made. For instance, many smoking ban-heart attack studies fail to indicate that heart attack
rates were already on the decline before the smoking ban was in place, or fail to take into account seasonal fluxes in
heart attacks (highest in the winter months and lowest in the summer). For the Helena study in particular, the claim
that 40% of pre-ban heart attacks were caused by passive smoking is not believable, considering that only 10-15% of
coronary heart disease cases are thought to be caused by active smoking.
Recent examples in economics
An example of a natural experiment was discussed in Angrist and Evans (1998).
The authors wish to estimate the
effect of family size on the labor market outcomes of the mother. The correlations between family size and various
outcomes do not tell us how family size causally affects labor market outcomes, because both may be affected by unobserved variables such as preferences, and because labor market outcomes may themselves affect family size (so-called "reverse causality": for example, a woman may defer having a child if she gets a raise at work). The study notes that two-child families with either two boys or two girls are substantially more likely to have a third child than two-child families with one boy and one girl. The sex of the first two children,
then, forms a natural experiment: it is as if an experimenter has randomly assigned some families to have two
children and others to have three or more. The authors are then able to credibly estimate the causal effect of having a
third child on labor market outcomes.
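To make the logic of such an instrumental-variable design concrete, here is a minimal Python sketch on simulated data: a same-sex indicator plays the role of the instrument, and the Wald estimator compares outcomes and family size across the two instrument groups. The variable names and the simulated effect size are invented for illustration and are not taken from the Angrist and Evans study.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Simulated data: "same_sex" (first two children share a sex) nudges some families
# toward a third child; the third child in turn reduces weeks worked by the mother.
same_sex = rng.integers(0, 2, n)                       # the instrument
ability = rng.normal(0, 1, n)                          # unobserved confounder
third_child = (rng.random(n) < 0.3 + 0.07 * same_sex - 0.05 * (ability > 0)).astype(int)
weeks_worked = 30 + 5 * ability - 8 * third_child + rng.normal(0, 4, n)

# Naive comparison of mothers with and without a third child is confounded by "ability".
naive = weeks_worked[third_child == 1].mean() - weeks_worked[third_child == 0].mean()

# Wald (instrumental-variable) estimate: difference in outcomes across instrument
# groups divided by the difference in treatment take-up across instrument groups.
dy = weeks_worked[same_sex == 1].mean() - weeks_worked[same_sex == 0].mean()
dd = third_child[same_sex == 1].mean() - third_child[same_sex == 0].mean()
wald = dy / dd

print(f"naive difference in weeks worked: {naive:.2f}")
print(f"Wald/IV estimate of the effect:   {wald:.2f}  (true simulated effect is -8)")
```

The naive difference is biased by the unobserved confounder, while the Wald ratio approximately recovers the simulated causal effect, which is the point of using the instrument.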
Within economics, game shows are a frequently studied form of natural experiment. While game shows might seem artificial contexts, they can be considered natural experiments because the context arises without interference from the researcher. Game shows have been used to study a wide range of different types of economic behavior, such as decision making under risk and cooperative behavior.
[1] DiNardo, J. (2008). "Natural experiments and quasi-natural experiments" (http://www.dictionaryofeconomics.com/article?id=pde2008_N000142). In Durlauf, Steven N.; Blume, Lawrence E. The New Palgrave Dictionary of Economics (Second ed.). Palgrave Macmillan. doi:10.1057/9780230226203.1162.
[2] Snow, J. (1855). On the Mode of Communication of Cholera (2nd ed.). London: Churchill. Excerpted in MacMahon, B. & Pugh, T.F. (1970). Epidemiology. Boston: Little Brown.
[3] The 1854 cholera outbreak is the example of a natural experiment discussed often by David A. Freedman, e.g. in Statistical Models: Theory and Practice (Cambridge University Press) (http://www.cambridge.org/catalogue/catalogue.asp?isbn=9780521671057), chapter 1.3 (pages 6–9).
[4] Snow's studies of the pattern of the disease were convincing enough to persuade the local council to disable the well pump by removing its handle. After the handle of the well-pump was removed, the incidence of new cases dropped. In stopping the use of water from the well-pump, the authorities did an uncontrolled experiment (without a control group) and without randomization.
[5] Sargent RP, Shepard RM, Glantz SA. Reduced incidence of admissions for myocardial infarction associated with public smoking ban: before and after study. British Medical Journal, vol. 328, pp. 977–980, 2004 (http://www.bmj.com/cgi/content/full/328/7446/977)
[6] Snowdon, Chris. "The Myth of the Smoking Ban Miracle" (http://www.spiked-online.com/index.php/site/article/7451/). Spiked, September 24, 2009.
[7] Angrist, J. & W. Evans (1998). "Children and their parents' labor supply: Evidence from exogenous variation in family size," American Economic Review, 88(3), 450–77.
[8] Post, Van den Assem, Baltussen, and Thaler (March 2008). Deal or No Deal? Decision Making Under Risk in a Large-payoff Game Show (http://ssrn.com/abstract=636508).
[9] van den Assem, van Dolder, and Thaler (April 2010). Split or Steal? Cooperative Behavior when the Stakes are Large (http://ssrn.com/abstract=1592456).
Operations Evaluation Department
The Independent Evaluation Group (IEG) (previously known as the Operations Evaluation Department (OED)) is
an independent unit within the World Bank that reports directly to the Bank's Board of Executive Directors. IEG
assesses what works, and what does not; how a borrower plans to run and maintain a project; and the lasting
contribution of the Bank to a country's overall development. The goals of evaluation are to learn from experience, to
provide an objective basis for assessing the results of the Bank's work, and to provide accountability in the
achievement of its objectives. It also improves Bank work by identifying and disseminating the lessons learned from
experience and by framing recommendations drawn from evaluation findings.
• Patrick G. Grasso, Sulaiman S. Wasty, Rachel V. Weaving (2003). World Bank Operations Evaluation
Department - The First 30 Years. The International Bank for Reconstruction and Development / The World Bank.
ISBN 0-8213-5550-3.
• Operations Evaluation Department (2005). Annual Review of Development Effectiveness. The International Bank for Reconstruction and Development / The World Bank.
External links
• World Bank Independent Evaluation Group (formerly known as the Operations Evaluation Department)
• The Bretton Woods Project, monitoring the World Bank and IMF
• George Monbiot
[1] http://www.worldbank.org/ieg
[2] http://www.brettonwoodsproject.org/
[3] http://www.monbiot.com/archives/2005/04/05/im-with-wolfowitz/
Pearson Assessment & Information
The Assessment & Information group of Pearson is a division of Pearson Education, a business of Pearson PLC. The
group is a provider of assessment and education data management services.
Pearson’s U.S. Assessments & Testing Group was renamed the Assessment & Information group in 2007. As of July
2011, the Assessment & Information group was organized around eight businesses:
• National Services: Serving the U.S. federal government and national non-profit organizations involved in
educational assessment and education reform (cf. National Assessment of Educational Progress)
• State Services: Serving state education agencies in the fifty states, the District of Columbia, and Puerto Rico with
outsourced K–12 educational assessment and data management services
• Educational Assessment: Serving U.S. K–12 parents, educators, teachers and students with educational
assessment and data management services for schools and local education agencies (cf. Stanford Achievement
Test Series)
• Evaluation Systems: Serving teacher preparation and credentialing agencies with licensing examinations and
related services. Pearson entered the teacher certification licensure testing market in 2006 with the acquisition of
National Evaluation Systems.
• Clinical Assessment: Serving the global market for psychological, speech and language, and special needs
assessments, for early childhood, for Response to Intervention, for talent assessment, in educational, clinical,
government and corporate markets under the PsychCorp brand (formerly known as The Psychological Corporation)
• School Systems: Student information systems and data analysis solutions that impact school and district
• Edustructures: Enterprise solutions for managing and synchronizing education data systems with solutions based
on the standard for interoperability in education, the Schools Interoperability Framework (SIF). Acquired by
Pearson in 2007.
• Knowledge Technologies: Innovative technologies for automatically evaluating oral and written languages (cf.
Latent semantic analysis). Acquired by Pearson in 2004.
School Systems
In 2006, Pearson acquired PowerSchool and Chancery SMS. Both are web-based student information systems for
K-12 schools. These two products, along with state reporting systems, legacy products like SASI, and others, make
up School Systems, a part of Pearson's Assessment & Information group.
PowerSchool, a company of 160 employees,
was acquired in March 2001 by Apple Inc. Apple then sold it to
Pearson Education in April 2006.
Chancery SMS was acquired a month later, with the deal closing in May 2006.
Harcourt Assessment acquisition
Pearson announced the acquisition of Harcourt Assessment in May 2007.
The Harcourt Assessment business was
merged into the Assessment & Information group in January 2008 after a review by the U.S. Department of Justice
was completed.
Leadership
Doug Kubach Group President & CEO, Assessment & Information (Nov. 2003–present)
KJ Singh Senior Vice President & Chief Technology Officer, Software and Technology Services
Robin Brophy Vice President, Marketing Services
Mike Carlson Senior Vice President, Education Technology Services
Steve Curtis President, Data Solutions
M. (Margaret) Darlene Feldick Vice President, Human Resources
Paul D. Fletcher President, School Systems
William P. Gorth President, Evaluation Systems
Jim Hummer Senior Vice President, Organizational Quality
Darice Keating President, State Services
Kathleen A. Minette Senior Vice President, Operations and Scoring
Shilpi Niyogi Executive Vice President of Strategy and Business Development, National Services
Aurelio Prifitera President & CEO, Clinical Assessment/Worldwide
Lynn Streeter President, Knowledge Technologies
Bhadresh A. Sutaria Senior Vice President & Chief Financial Officer
Jon S. Twing Executive Vice President, Test, Measurement & Research Services
[1] "Our History" (http:/ / www. pearsonassessments. com/ pai/ ai/ about/ history/ history.htm). Assessment & Information group of Pearson. .
Retrieved July 16, 2011.
[2] "Pearson enters teacher certification market by acquiring National Evaluation Systems" (http:/ / www.pearson.com/ about-us/ education/
announcements/?i=488). Pearson. . Retrieved July 16, 2011.
[3] "Apple to Buy PowerSchool" (http:// www.wired. com/ techbiz/ media/ news/ 2001/ 03/ 42412). Wired. . Retrieved July 16, 2011.
[4] "Apple sells PowerSchool to Pearson" (http:/ / www.macworld.com/ news/ 2006/ 05/ 25/ powerschool/ index.php). Macworld. . Retrieved
July 16, 2011.
[5] "Pearson to acquire Chancery Software Ltd" (http:/ / www.pearson.com/ about-us/ education/ announcements/ ?i=473). Pearson. . Retrieved
July 16, 2011.
[6] "Pearson acquires Harcourt Assessment and Harcourt Education International from Reed Elsevier" (http:// www. pearson.com/ about-us/
education/announcements/ ?i=352). Pearson. . Retrieved July 16, 2011.
[7] "Pearson Completes Acquisition of Harcourt Assessment" (http:/ / www.pearsonassessments. com/ haiweb/ Cultures/ en-US/ Site/
Community/PostSecondary/ NewsEvents/ PressReleases/ NewsRelease013008. htm). Assessment & Information group of Pearson. .
Retrieved July 16, 2011.
[8] "Leadership" (http:// www.pearsonassessments. com/ pai/ ai/ about/ leadership.htm). Assessment & Information group of Pearson. .
Retrieved July 16, 2011.
External links
• Pearson Assessment & Information group (http://www.pearsonassessments.com) (official site)
• Pearson School Systems (http://www.pearsonschoolsystems.com)
• Edustructures (http://www.edustructures.com)
• Pearson's Knowledge Technologies (http://www.pearsonkt.com)
Princeton Application Repository for Shared-Memory Computers
PARSEC Benchmark Suite
Original author(s): Princeton University and Intel
Developer(s): Christian Bienia
Stable release: 2.1 / August 13, 2009
Development status: Active
Written in: C/C++
Operating system: Linux, OpenSolaris
Type: Benchmark
License: open-source
Website: http://parsec.cs.princeton.edu/ [1]
The Princeton Application Repository for Shared-Memory Computers (PARSEC) is a benchmark suite composed of
multithreaded emerging workloads that is used to evaluate and develop next-generation chip-multiprocessors. It was
collaboratively created by Intel and Princeton University to drive research efforts on future computer systems.

Since its inception the benchmark suite has become a community project that continues to be improved by a broad range of research institutions. PARSEC is freely available and is used for both academic and non-academic research.
With the emergence of chip-multiprocessors, computer manufacturers were faced with a problem: the new technology caused a disruptive change. For the first time in computer history, software would have to be rewritten in order to take advantage of the parallel nature of those processors, which meant that existing programs could not be used effectively to test and develop those new types of computer systems. At that time parallel software only existed in very specialized areas. However, before chip-multiprocessors became commonly available, software developers were not willing to rewrite any mainstream programs, which meant hardware manufacturers did not have access to any programs for test and development purposes that represented expected real-world program behavior accurately. This posed a chicken-and-egg problem that motivated a new type of benchmark suite with parallel programs that could take full advantage of chip-multiprocessors.
PARSEC was created to break this circular dependency. It was designed to fulfill the following five objectives:
1. Focuses on multithreaded applications
2. Includes emerging workloads
3. Has a diverse selection of programs
4. Workloads employ state-of-art techniques
5. The suite supports research
Traditional benchmarks that were publicly available before PARSEC were generally limited in the scope of their included application domains or were typically only available in an unparallelized, serial version. Parallel programs were only prevalent in the domain of high-performance computing and, on a much smaller scale, in business environments. Chip-multiprocessors, however, were expected to be heavily used in all areas of computing, such as in parallelized consumer applications.
The PARSEC Benchmark Suite is available in version 2.1, which includes the following workloads:
• Blackscholes
• Bodytrack
• Canneal
• Dedup
• Facesim
• Ferret
• Fluidanimate
• Freqmine
• Raytrace
• Streamcluster
• Swaptions
• Vips
• X264
[1] http://parsec.cs.princeton.edu/
[2] "Intel Teams with Universities on Multicore Software Suite" (http://www.edn.com/article/CA6364657.html). EDN.
[3] "Designing future computers with future workloads" (http://blogs.intel.com/research/2008/02/designing_future_computers_wit.php). Research@Intel. Retrieved 2008-02-26.
[4] "Intel CTO looks into the future: Measuring the value and need for multi-core" (http://www.gabeoneda.com/node/39). Gabe on EDA. Retrieved 2006-08-31.
[5] "The PARSEC Benchmark Suite" (http://parsec.cs.princeton.edu/). Princeton University. Retrieved 2008-01-05.
[6] Bhadauria, Major; Weaver, Vincent M.; McKee, Sally A. (October 2009), "Understanding PARSEC Performance on Contemporary CMPs" (http://www.iiswc.org/iiswc2009/), Proceedings of the 2009 IEEE International Symposium on Workload Characterization, IEEE.
[7] Barrow-Williams, Nick; Fensch, Christian; Moore, Simon (October 2009), "A Communication Characterization of SPLASH-2 and PARSEC" (http://www.iiswc.org/iiswc2009/), Proceedings of the 2009 IEEE International Symposium on Workload Characterization, IEEE.
[8] Rabaey, Jan M.; Burke, Daniel; Lutz, Ken; Wawrzynek, John (July/August 2008), "Workloads of the Future" (http://www2.computer.org/cms/Computer.org/ComputingNow/homepage/0908/WorkloadsoftheFuture.pdf), IEEE Design & Test of Computers, IEEE.
[9] Bienia, Christian; Kumar, Sanjeev; Singh, Jaswinder Pal; Li, Kai (October 2008), "The PARSEC Benchmark Suite: Characterization and Architectural Implications" (http://portal.acm.org/citation.cfm?id=1454128), Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques, Association for Computing Machinery, New York, NY, USA.
[10] Bienia, Christian; Li, Kai (June 2009), "PARSEC 2.0: A New Benchmark Suite for Chip-Multiprocessors" (http://www-mount.ece.umn.edu/~jjyi/MoBS/2009/MoBS_2009_Advance_Program.html), Proceedings of the 5th Annual Workshop on Modeling, Benchmarking and Simulation, Association for Computing Machinery, New York, NY, USA.
External links
• The PARSEC Benchmark Suite (http://parsec.cs.princeton.edu/)
• The PARSEC Wiki (http://wiki.cs.princeton.edu/index.php/PARSEC)
Problem
A problem is an obstacle, impediment, difficulty or challenge, or any situation that invites resolution, the resolution
of which is recognized as a solution or contribution toward a known purpose or goal. A problem implies a desired
outcome coupled with an apparent deficiency, doubt or inconsistency that prevents the outcome from taking place.
Problem solving
Every theoretical problem asks for an answer or solution. Trying to find a solution to a problem is known as problem
solving. There are many standard techniques for problem solving, such as Proof by Contradiction, or Proof by
Exhaustion, the latter famously being used in the solution to the Thirty-Six Officers Problem posed by Leonhard
Euler. A problem is a gap between an actual and desired situation. The time it takes to solve a problem is a way of
measuring complexity.
Many problems have no discovered solution and are therefore classified as open problems.
From the mid-20th century, the field of theoretical computer science has explored the use of computers to solve such
problems. The term takes on more specific senses in particular fields:
• A mathematical problem is a question about mathematical objects and structures that may require a distinct answer
or explanation or proof. Examples include word problems at school level or deeper problems such as shading a
map with only four colours.
• In society, a problem can refer to particular social issues, which if solved would yield social benefits, such as
increased harmony or productivity, and conversely diminished hostility and disruption.
• In business and engineering, a problem is a difference between actual conditions and those that are required or
desired. Often, the causes of a problem are not known, in which case root cause analysis is employed to find the
causes and identify corrective actions.
• In chess, a problem is a puzzle set by somebody using chess pieces on a chess board, for others to get instruction
or intellectual satisfaction from determining the solution.
• In theology, there is what is referred to as the Synoptic Problem, regarding the Gospels' relationship to each other.
• In academic discourse a problem is a challenge to an assumption, an apparent conflict that requires synthesis and
reconciliation. It is a normal part of systematic thinking, the address of which adds to or detracts from the veracity
of a conclusion or idea.
• An optimization problem is the problem of finding the best solution from all feasible solutions. A good example of
this type of problem is the travelling salesperson problem, which is based on calculating the most efficient route
between many places (a brute-force sketch appears at the end of this article).
• In computability theory a decision problem requires a simple yes-or-no answer.
• In rock climbing, and especially in bouldering, a problem is a short route whose sequence of moves the climber must work out and complete.
• In reading, a problem is a combination of a series of words with the overall plotline, which the reader must
attempt to decipher.
• In walking, a mobility problem is presented: motion is achieved via mechanical interaction of the legs and a supporting surface.
[1] The Puzzle Master. Alexandria, Virginia, USA: Time-Life Books. 1989. p. 32. ISBN 0809709287.
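As noted in the optimization example above, the travelling salesperson problem can, for very small instances, be solved by simply enumerating every feasible tour. The sketch below does exactly that in Python; the four city names and the distance table are invented purely for illustration and are not drawn from any source cited here.

from itertools import permutations

# Hypothetical symmetric distances (e.g., in km) between four invented cities.
distances = {
    ("A", "B"): 10, ("A", "C"): 15, ("A", "D"): 20,
    ("B", "C"): 35, ("B", "D"): 25,
    ("C", "D"): 30,
}

def dist(a, b):
    # Look up a distance regardless of the order in which the two cities are given.
    return distances[(a, b)] if (a, b) in distances else distances[(b, a)]

def tour_length(tour):
    # Total length of a closed tour that returns to its starting city.
    return sum(dist(tour[i], tour[(i + 1) % len(tour)]) for i in range(len(tour)))

cities = ["A", "B", "C", "D"]
# Fix the first city and permute the rest; this enumerates every closed tour.
best = min((("A",) + rest for rest in permutations(cities[1:])), key=tour_length)
print(best, tour_length(best))   # e.g. ('A', 'B', 'D', 'C') 80

Because the number of tours grows factorially with the number of cities, this brute-force approach only works for toy instances, which is precisely why the travelling salesperson problem is a standard example of a hard optimization problem.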
Program evaluation
Program evaluation is a systematic method for collecting, analyzing, and using information to answer questions
about projects, policies and programs, particularly about their effectiveness and efficiency. In both the public and
private sectors, stakeholders want to know whether the programs they are funding, implementing, voting for, receiving
or objecting to are actually having the intended effect, and answering this question is the job of an evaluator.
The process of evaluation is considered to be a relatively recent phenomenon. However, planned social evaluation
has been documented as dating as far back as 2200 BC (Shadish, Cook & Leviton, 1991). Evaluation became
particularly relevant in the U.S. in the 1960s during the period of the Great Society social programs associated with
the Kennedy and Johnson administrations. Extraordinary sums were invested in social programs, but the
impacts of these investments were largely unknown.
Program evaluations can involve both quantitative and qualitative methods of social research. People who do
program evaluation come from many different backgrounds, such as sociology, psychology, economics, and social
work. Some graduate schools also have specific training programs for program evaluation.
Doing an evaluation
Program evaluation may be conducted at several stages during a program's lifetime. Each of these stages raises
different questions to be answered by the evaluator, and correspondingly different evaluation approaches are needed.
Rossi, Lipsey and Freeman (2004) suggest the following kinds of assessment, which may be appropriate at different stages:
• Assessment of the program's cost and efficiency
• Assessment of the program's outcome or impact (i.e., what it has actually achieved)
• Assessment of how the program is being implemented (i.e., is it being implemented according to plan?)
• Assessment of program design and logic/theory
• Assessment of the need for the program
Assessing needs
A needs assessment examines the population that the program intends to target, to see whether the need as
conceptualised in the program actually exists in the population; whether it is, in fact, a problem; and if so, how it
might best be dealt with. This includes identifying and diagnosing the actual problem the program is trying to
address, who or what is affected by the problem, how widespread the problem is, and what are the measurable effects
that are caused by the problem. For example, for a housing program aimed at mitigating homelessness, a program
evaluator may want to find out how many people are homeless in a given geographic area and what their
demographics are. Rossi, Lipsey and Freeman (2004) caution against doing an intervention without properly
assessing the need for one, because this might result in a great deal of wasted funds if the need did not exist or was misconceived.
Assessing program theory
The program theory, also called a logic model or impact pathway,
is an assumption, implicit in the way the
program is designed, about how the program's actions are supposed to achieve the outcomes it intends. This 'logic
model' is often not stated explicitly by the people who run programs; it is simply assumed, and so an evaluator will need
to draw out from the program staff how exactly the program is supposed to achieve its aims and assess whether this
logic is plausible. For example, in an HIV prevention program, it may be assumed that educating people about
HIV/AIDS transmission, risk and safe sex practices will result in safer sex being practiced. However, research in
South Africa increasingly shows that in spite of increased education and knowledge, people still often do not practice
safe sex.
Therefore, the logic of a program which relies on education as a means to get people to use condoms may
be faulty. This is why it is important to read research that has been done in the area. Explicating this logic can also
reveal unintended or unforeseen consequences of a program, both positive and negative. The program theory drives
the hypotheses to test for impact evaluation. Developing a logic model can also build common understanding
amongst program staff and stakeholders about what the program is actually supposed to do and how it is supposed to
do it, which is often lacking (see Participatory Impact Pathways Analysis).
Assessing implementation
Process analysis looks beyond the theory of what the program is supposed to do and instead evaluates how the
program is being implemented. This evaluation determines whether the components identified as critical to the
success of the program are being implemented. The evaluation determines whether target populations are being
reached, people are receiving the intended services, staff are adequately qualified, etc. Process evaluation is an
ongoing process in which repeated measures may be used to evaluate whether the program is being implemented as intended.
Assessing the impact (effectiveness)
The impact evaluation determines the causal effects of the program. This involves trying to measure if the program
has achieved its intended outcomes. This can involve using sophisticated statistical techniques in order to measure
the effect of the program and to find a causal relationship between the program and the various outcomes. More
information about impact evaluation is found under the heading 'Determining Causation'.
Assessing efficiency
Finally, cost-benefit or cost-effectiveness analysis assesses the efficiency of a program. Evaluators outline the
benefits and costs of the program for comparison; an efficient program delivers its outcomes at a lower cost per unit of benefit, i.e. it has a lower cost-to-benefit ratio.
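As a minimal sketch of how such a comparison might be tallied, the snippet below computes a simple cost-effectiveness ratio for two hypothetical programs; the program names and figures are invented for illustration and are not taken from the sources cited in this article.

# Hypothetical programs: total cost and number of participants who achieved
# the outcome of interest (e.g., stable employment).
programs = {
    "Program A": {"cost": 250_000, "outcomes": 125},
    "Program B": {"cost": 400_000, "outcomes": 160},
}

for name, p in programs.items():
    # Cost-effectiveness ratio: dollars spent per successful outcome.
    ratio = p["cost"] / p["outcomes"]
    print(f"{name}: ${ratio:,.0f} per successful outcome")

On this deliberately simplified measure, Program A costs $2,000 per outcome and Program B $2,500, so Program A would be judged the more efficient of the two; a full cost-benefit analysis would, of course, also have to place a value on the outcomes themselves.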
Determining causation
Perhaps the most difficult part of evaluation is determining whether the program itself is causing the changes that are
observed in the population it was aimed at. Events or processes outside of the program may be the real cause of the
observed outcome (or the real prevention of the anticipated outcome).
Causation is difficult to determine. One main reason for this is self-selection bias.
People select themselves to
participate in a program. For example, in a job training program, some people decide to participate and others do not.
Those who do participate may differ from those who do not in important ways. They may be more determined to
find a job or have better support resources. These characteristics may actually be causing the observed outcome of
increased employment, not the job training program.
Correlation alone cannot prove causation. If a program could use random assignment, however, it could eliminate
self-selection bias: people would be randomly assigned either to participate or not to participate, so that the group of
people who participate would be equivalent, on average, to the group who did not participate.
However, since most programs cannot use random assignment, causation cannot be determined. Impact analysis can
still provide useful information. For example, the outcomes of the program can be described. Thus the evaluation can
describe that people who participated in the program were more likely to experience a given outcome than people
who did not participate.
If the program is fairly large, and there are enough data, statistical analysis can be used to make a reasonable case for
the program by showing, for example, that other causes are unlikely.
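To make the random-assignment idea concrete, the sketch below simulates outcomes for randomly assigned participant and non-participant groups and applies a standard two-sample t-test; all the numbers are simulated for illustration, and the sketch is not a substitute for the careful designs discussed above.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Simulated outcome (e.g., weekly earnings) for two randomly assigned groups;
# the treatment group is given a modest true effect of +20 for illustration.
control = rng.normal(loc=400, scale=50, size=200)
treatment = rng.normal(loc=420, scale=50, size=200)

# Two-sample t-test: is the observed difference in means plausibly due to chance?
t_stat, p_value = stats.ttest_ind(treatment, control)
print(f"mean difference: {treatment.mean() - control.mean():.1f}")
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")

Because assignment was random, a small p-value supports attributing the observed difference to the program rather than to self-selection; without random assignment, the same arithmetic can only describe an association.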
Reliability, Validity and Sensitivity in Program Evaluation
It is important to ensure that the instruments (for example, tests, questionnaires, etc.) used in program evaluation are
as reliable, valid and sensitive as possible. According to Rossi et al. (2004, p. 222),
'a measure that is poorly
chosen or poorly conceived can completely undermine the worth of an impact assessment by producing misleading
estimates. Only if outcome measures are valid, reliable and appropriately sensitive can impact assessments be
regarded as credible'.
The reliability of a measurement instrument is the 'extent to which the measure produces the same results when used
repeatedly to measure the same thing' (Rossi et al., 2004, p. 218).
The more reliable a measure is, the greater its
statistical power and the more credible its findings. If a measuring instrument is unreliable, it may dilute and obscure
the real effects of a program, and the program will 'appear to be less effective than it actually is' (Rossi et al., 2004,
p. 219).
Hence, it is important to ensure the evaluation is as reliable as possible.
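As an informal illustration of the reliability idea, the snippet below estimates test-retest reliability as the correlation between two administrations of the same simulated instrument; the data and noise levels are invented, and the interpretation is only a common rule of thumb rather than a standard from the literature cited here.

import numpy as np

rng = np.random.default_rng(1)

# Simulated "true" scores for 150 respondents, plus independent measurement
# error on each of two administrations of the same instrument.
true_score = rng.normal(50, 10, size=150)
test = true_score + rng.normal(0, 4, size=150)
retest = true_score + rng.normal(0, 4, size=150)

# Test-retest reliability estimated as the Pearson correlation between
# the two administrations.
reliability = np.corrcoef(test, retest)[0, 1]
print(f"estimated test-retest reliability: {reliability:.2f}")

Values closer to 1 indicate a more reliable measure; as the text notes, a noisier instrument dilutes and obscures whatever real effect the program may have.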
The validity of a measurement instrument is 'the extent to which it measures what it is intended to measure' (Rossi et
al., 2004, p. 219).
This concept can be difficult to accurately measure: in general use in evaluations, an instrument
may be deemed valid if accepted as valid by the stakeholders (stakeholders may include, for example, funders,
program administrators, et cetera).
The principal purpose of the evaluation process is to measure whether the program has an effect on the social
problem it seeks to redress; hence, the measurement instrument must be sensitive enough to discern these potential
changes (Rossi et al., 2004).
A measurement instrument may be insensitive if it contains items measuring
outcomes which the program couldn't possibly affect, or if the instrument was originally developed for applications
to individuals (for example standardised psychological measures) rather than to a group setting (Rossi et al., 2004).
These factors may result in 'noise' which may obscure any effect the program may have had.
To conclude, only measures which adequately achieve the benchmarks of reliability, validity and sensitivity can be
said to be credible evaluations. It is the duty of evaluators to produce credible evaluations, as their findings may have
far-reaching effects. An evaluation that lacks credibility, and is therefore unable to show that a program is achieving its purpose when
it is in fact creating positive change, may cause the program to lose its funding undeservedly.
The Shoestring Approach
The "Shoestring evaluation approach" is designed to assist evaluators operating under limited budget, limited access
or availability of data, and limited turnaround time to conduct effective evaluations that are methodologically
rigorous (Bamberger, Rugh, Church & Fort, 2004). This approach has responded to the continued need for
evaluation processes that are more rapid and economical under difficult circumstances of budget, time constraints
and limited availability of data. However, it is not always possible to design an evaluation to achieve the highest
standards available. Many programs do not build an evaluation procedure into their design or budget. Hence, many
evaluation processes do not begin until the program is already underway, which can result in time, budget or data
constraints for the evaluators, which in turn can affect the reliability, validity or sensitivity of the evaluation. The
shoestring approach helps to ensure that the maximum possible methodological rigour is achieved under these constraints.
Budget Constraints
Frequently, programs are faced with budget constraints because most original projects do not include a budget to
conduct an evaluation (Bamberger et al., 2004). Therefore, this automatically results in evaluations being allocated
smaller budgets that are inadequate for a rigorous evaluation. Due to the budget constraints it might be difficult to
effectively apply the most appropriate methodological instruments. These constraints may consequently affect the
time available in which to do the evaluation (Bamberger et al., 2004).
Budget constraints may be addressed by
simplifying the evaluation design, revising the sample size, exploring economical data collection methods (such as
using volunteers to collect data, shortening surveys, or using focus groups and key informants) or looking for reliable
secondary data (Bamberger et al., 2004).
Time Constraints
The most common time constraints faced by an evaluator arise when the evaluator is summoned to conduct an
evaluation when a project is already underway, when they are given limited time to do the evaluation compared to the life
of the study, or when they are not given enough time for adequate planning. Time constraints are particularly
problematic when the evaluator is not familiar with the area or country in which the program is situated (Bamberger
et al., 2004).
Time constraints can be addressed by the methods listed under budget constraints above, and also
by careful planning to ensure effective data collection and analysis within the limited time available.
Data Constraints
If the evaluation is initiated late in the program, there may be no baseline data on the conditions of the target group
before the intervention began (Bamberger et al., 2004).
Another possible cause of data constraints is that the data
have been collected by program staff and contain systematic reporting biases or reflect poor record-keeping standards, and are
subsequently of little use (Bamberger et al., 2004).
Another source of data constraints may result if the target
group are difficult to reach to collect data from - for example homeless people, drug addicts, migrant workers, et
cetera (Bamberger et al., 2004).
Data constraints can be addressed by reconstructing baseline data from secondary
data or through the use of multiple methods. Multiple methods, such as the combination of qualitative and
quantitative data can increase validity through triangulation and save time and money. Additionally, these constraints
may be dealt with through careful planning and consultation with program stakeholders. By clearly identifying
and understanding client needs ahead of the evaluation, costs and time of the evaluative process can be streamlined
and reduced, while still maintaining credibility.
All in all, time, monetary and data constraints can have negative implications on the validity, reliability and
transferability of the evaluation. The shoestring approach has been created to assist evaluators to correct the
limitations identified above by identifying ways to reduce costs and time, reconstruct baseline data and to ensure
maximum quality under existing constraints (Bamberger et al., 2004).
Methodological challenges presented by language and culture
The purpose of this section is to draw attention to some of the methodological challenges and dilemmas evaluators
are potentially faced with when conducting a program evaluation in a developing country. In many developing
countries the major sponsors of evaluation are donor agencies from the developed world, and these agencies require
regular evaluation reports in order to maintain accountability and control of resources, as well as generate evidence
for the program’s success or failure (Bamberger, 2000).
However, there are many hurdles and challenges which
evaluators face when attempting to implement an evaluation program which attempts to make use of techniques and
systems which are not developed within the context to which they are applied (Smith, 1990).
Some of the issues
include differences in culture, attitudes, language and political process (Ebbutt, 1998, Smith, 1990).
Culture is defined by Ebbutt (1998, p. 416) as a “constellation of both written and unwritten expectations, values,
norms, rules, laws, artifacts, rituals and behaviours that permeate a society and influence how people behave
socially”. Culture can influence many facets of the evaluation process, including data collection, evaluation program
implementation and the analysis and understanding of the results of the evaluation (Ebbutt, 1998). In particular,
instruments which are traditionally used to collect data such as questionnaires and semi-structured interviews need to
be sensitive to differences in culture, if they were originally developed in a different cultural context (Bulmer &
Warwick, 1993).
The understanding and meaning of constructs which the evaluator is attempting to measure may
not be shared between the evaluator and the sample population and thus the transference of concepts is an important
notion, as this will influence the quality of the data collection carried out by evaluators as well as the analysis and
results generated by the data (ibid).
Language also plays an important part in the evaluation process, as language is tied closely to culture (ibid).
Language can be a major barrier to communicating concepts which the evaluator is trying to access, and translation
is often required (Ebbutt, 1998). There are a multitude of problems with translation, including the loss of meaning as
well as the exaggeration or enhancement of meaning by translators (ibid). For example, terms which are contextually
specific may not translate into another language with the same weight or meaning. In particular, data collection
instruments need to take meaning into account, as subject matter that may not be considered sensitive in a particular
context might prove to be sensitive in the context in which the evaluation is taking place (Bulmer & Warwick, 1993).
Thus, evaluators need to take into account two important concepts when administering data collection tools: lexical
equivalence and conceptual equivalence (ibid). Lexical equivalence asks the question: how does one phrase a
question in two languages using the same words? This is a difficult task to accomplish, and uses of techniques such
as back-translation may aid the evaluator but may not result in perfect transference of meaning (ibid). This leads to
the next point, conceptual equivalence. It is not a common occurrence for concepts to transfer unambiguously from
one culture to another (ibid). Data collection instruments which have not undergone adequate testing and piloting
may therefore render results which are not useful as the concepts which are measured by the instrument may have
taken on a different meaning and thus rendered the instrument unreliable and invalid (ibid).
Thus, it can be seen that evaluators need to take into account the methodological challenges created by differences in
culture and language when attempting to conduct a program evaluation in a developing country.
Utilization of Evaluation Results
There are three conventional uses of evaluation results: persuasive utilization, direct (instrumental) utilization,
and conceptual utilization. Persuasive utilization is the enlistment of evaluation results in an effort to persuade an
audience to either support an agenda or to oppose it. Unless the 'persuader' is the same person that ran the evaluation,
this form of utilization is not of much interest to evaluators, as they often cannot foresee possible future efforts of persuasion.
Direct (instrumental) Utilization
Evaluators often tailor their evaluations to produce results that can have a direct influence in the improvement of the
structure, or on the process, of a program. For example, the evaluation of a novel educational intervention may
produce results that indicate no improvement in students' marks. This may be due to the intervention not having a
sound theoretical background, or it may be that the intervention is not run according to the way it was created to run.
The results of the evaluation would hopefully lead to the creators of the intervention going back to the drawing board
and re-creating the core structure of the intervention, or even changing the implementation processes.
Conceptual Utilization
But even if evaluation results do not have a direct influence in the re-shaping of a program, they may still be used to
conscientize people with regards to the issues that form part of the concerns of the program. Going back to the
example of an evaluation of a novel educational intervention, the results can also be used to inform educators and
students about the different barriers that may influence students' learning difficulties. A number of studies on these
barriers may then be initiated by this new information.
Variables Affecting Utilization
There are five conditions that seem to affect the utility of evaluation results, namely relevance, communication
between the evaluators and the users of the results, information processing by the users, the plausibility of the
results, as well as the level of involvement or advocacy of the users.
Guidelines for Maximizing Utilization
Quoted directly from Rossi et al. (2004, p. 416).
• Evaluators must understand the cognitive styles of decisionmakers
• Evaluation results must be timely and available when needed
• Evaluations must respect stakeholders' program commitments
• Utilization and dissemination plans should be part of the evaluation design
• Evaluations should include an assessment of utilization
Internal Versus External program evaluators
The choice of evaluator may be regarded as equally important as the process of the evaluation itself. Evaluators may
be internal (persons associated with the execution of the program) or external (persons not associated with any part of
the execution or implementation of the program) (Division for Oversight Services, 2004).
The following provides a brief summary of the advantages and disadvantages of internal and external evaluators,
adapted from the Division for Oversight Services (2004); for a more comprehensive list, see Division for Oversight
Services (2004).
Internal evaluators
• May have better overall knowledge of the program and possess informal knowledge of the program
• Less threatening as already familiar with staff
• Less costly
• May be less objective
• May be more preoccupied with other activities of the program and not give the evaluation complete attention
• May not be adequately trained as an evaluator.
External evaluators
• More objective about the process; offers new perspectives and different angles from which to observe and critique the process
• May be able to dedicate greater amount of time and attention to the evaluation
• May have greater expertise and evaluation experience
• May be more costly and require more time for the contract, monitoring, negotiations etc.
• May be unfamiliar with program staff and create anxiety about being evaluated
• May be unfamiliar with organization policies, certain constraints affecting the program.
Paradigms in program evaluation
Potter (2006) identifies and describes three broad paradigms within program evaluation. The first, and probably
most common, is the positivist approach, in which evaluation can only occur where there are “objective”, observable
and measurable aspects of a program, requiring predominantly quantitative evidence. The positivist approach
includes evaluation dimensions such as needs assessment, assessment of program theory, assessment of program
process, impact assessment and efficiency assessment (Rossi, Lipsey and Freeman, 2004).
The second paradigm identified by Potter (2006) is that of interpretive approaches, where it is argued that it is
essential that the evaluator develops an understanding of the perspective, experiences and expectations of all
stakeholders. This would lead to a better understanding of the various meanings and needs held by stakeholders,
which is crucial before one is able to make judgments about the merit or value of a program. The evaluator’s contact
with the program is often over an extended period of time and, although there is no standardized method,
observation, interviews and focus groups are commonly used.
Potter (2006) also identifies critical-emancipatory approaches to program evaluation, which are largely based on
action research for the purposes of social transformation. This type of approach is much more ideological and often
includes a greater degree of social activism on the part of the evaluator. Because of its critical focus on societal
power structures and its emphasis on participation and empowerment, Potter argues this type of evaluation can be
particularly useful in developing countries.
Regardless of the paradigm used in any program evaluation, whether it be positivist, interpretive or
critical-emancipatory, it is essential to acknowledge that evaluation takes place in specific socio-political contexts.
Evaluation does not exist in a vacuum and all evaluations, whether they are aware of it or not, are influenced by
socio-political factors. It is important to recognize that evaluations, and the findings which result from this kind of
evaluation process, can be used in favour of or against particular ideological, social and political agendas (Weiss,
1999). This is especially true in an age when resources are limited and there is competition between organizations
for certain projects to be prioritised over others (Louw, 1999).
CDC framework
In 1999, the Centers for Disease Control and Prevention (CDC) published a six-step framework for conducting
evaluation of public health programs. The publication of the framework is a result of the increased emphasis on
program evaluation of government programs in the US. The six steps are:
1. Engage stakeholders
2. Describe the program.
3. Focus the evaluation.
4. Gather credible evidence.
5. Justify conclusions.
6. Ensure use and share lessons learned.
Further Reading
• Suchman, Edward A. Evaluative Research: Principles and Practice in Public Service & Social Action Programs
• Rivlin, Alice M. Systematic Thinking for Social Action (1971)
• Weiss, Carol H. Evaluative Research: Methods of Assessing Program Effectiveness (1972)
• Cook, Thomas D. and Campbell, Donald T. Quasi-Experimentation: Design & Analysis for Field Settings (1979)
• Boulmetis, John and Dutwin, Phyllis. The ABCs of Evaluation (2005)
[1] Administration for Children and Families (2010) The Program Manager's Guide to Evaluation (http://www.acf.hhs.gov/programs/opre/other_resrch/pm_guide_eval/index.html). Chapter 2: What is program evaluation?
[2] US Department of Labor, History of the DOL (no date). Chapter 6: Eras of the New Frontier and the Great Society, 1961-1969. http://www.dol.gov/oasam/programs/history/dolchp06.htm. The 21st century has marked a period of technological advancement across many spheres such as health and education. However, there are still a large number of people whose living conditions have not changed (Lusthaus, Adrien & Perstinger, 1999). Many government-aided and privately funded programs are being executed on different scales, some with little or no impact on the phenomenon, others with greater influence affecting broader variables. Evaluation provides the basis for monitoring and reviewing programs and projects, predicting possible consequences of the program, identifying the different levels that the program may impact, and documenting strengths and weaknesses of the program to improve future programs (Rossi, Lipsey & Freeman, 2004).
[3] National Archives, Records of the Office of Management and Budget (1995) 51.8.8 Records of the Office of Program Evaluation. http://www.archives.gov/research/guide-fed-records/groups/051.html.
[4] Centers for Disease Control and Prevention. Framework for Program Evaluation in Public Health. MMWR 1999;48(No. RR-11).
[5] Van der Riet, M. (2009). 'The production of context: using activity theory to understand behaviour change in response to HIV and AIDS.' Unpublished doctoral dissertation. University of KwaZulu-Natal, Pietermaritzburg.
[6] Delbert Charles Miller, Neil J. Salkind (2002) Handbook of Research Design & Social Measurement. Edition: 6, revised. Published by SAGE.
[7] Rossi, P., Lipsey, M. W., & Freeman, H. E. (2004). Evaluation: A systematic approach (7th ed.). Thousand Oaks, CA: Sage.
[8] Bamberger, M., Rugh, J., Church, M., & Fort, L. (2004). Shoestring evaluation: Designing impact evaluations under budget, time and data constraints. American Journal of Evaluation, 25, 5-37.
[9] Bamberger, M. (2000). The Evaluation of International Development Programs: A View from the Front. American Journal of Evaluation, 21,
pp. 95-102.
[10] Smith, T. (1990) Policy evaluation in third world countries: some issues and problems. The Asian Journal of Public Administration, 12, pp.
[11] Ebbutt, D. (1998). Evaluation of projects in the developing world: some cultural and methodological issues. International Journal of
Educational Development, 18, pp. 415-424.
[12] Bulmer, M. and Warwick, D. (1993). Social research in developing countries: surveys and censuses in the Third World. London: Routledge.
[13] Potter, C. (2006). Program Evaluation. In M. Terre Blanche, K. Durrheim & D. Painter (Eds.), Research in practice: Applied methods for the
social sciences (2nd ed.) (pp. 410-428). Cape Town: UCT Press.
[14] Rossi, P., Lipsey, M.W., & Freeman, H.E. (2004). Evaluation: a systematic approach (7th ed.). Thousand Oaks: Sage.
[15] Weiss, C.H. (1999). Research-policy linkages: How much influence does social science research have? World Social Science Report, pp.
[16] Louw, J. (1999). Improving practice through evaluation. In D. Donald, A. Dawes & J. Louw (Eds.), Addressing childhood adversity (pp.
60-73). Cape Town: David Philip.
External links
• Administration for Children and Families (http://www.acf.hhs.gov/programs/opre/other_resrch/pm_guide_eval/reports/pmguide/pmguide_toc.html) The Program Manager's Guide to Evaluation. Discussion of evaluation, includes chapters on Why evaluate and What is evaluation.
• American Evaluation Association (http://www.eval.org/) Includes a link to Evaluation resources (http://www.eval.org/resources.asp) such as organizations, link sites, email discussion lists, consultants and more.
• Canadian Evaluation Society (http://www.evaluationcanada.ca/) Includes a link to Evaluation information (http://evaluationcanada.ca/site.cgi?section=1&ssection=1&_lang=an) such as services, professional development, resources, organizations and regional chapters.
• CDC six-step framework (http://www.cdc.gov/mmwr/preview/mmwrhtml/rr4811a1.htm). Also available here: Centers for Disease Control and Prevention. Framework for Program Evaluation in Public Health. MMWR 1999;48(No. RR-11) (http://www.cdc.gov/eval/framework.htm). Includes a description of logic models in the Steps section.
• Handbook of Research Design & Social Measurement (http://books.google.com/books?id=sgoHv5ZP6dcC&pg=PA83&lpg=PA83&dq=self+selection++experimental+design&source=web&ots=VlCu1pcpbr&sig=VjRJgw5ASJUmRpAVi-sa_t2P5_w&hl=en&sa=X&oi=book_result&resnum=4&ct=result#PPA82,M1). Delbert Charles Miller, Neil J. Salkind (2002) Edition: 6, revised. Published by SAGE.
• The EvaluationWiki (http://www.evaluationwiki.org) - The mission of EvaluationWiki is to make freely available a compendium of up-to-date information and resources to everyone involved in the science and practice of evaluation. The EvaluationWiki is presented by the non-profit Evaluation Resource Institute (http://www.evaluationwiki.org/wiki/index.php/Evaluation_Resource_Institute).
• Free Resources for Program Evaluation and Social Research Methods (http://gsociology.icaap.org/methods/) A gateway to resources on program evaluation: how-to materials, online guides, manuals, books on methods of evaluation and free software related to evaluation.
• Innovation Network (http://www.innonet.org) A nonprofit organization working to share planning and evaluation tools and know-how. The organization provides online tools, consulting, and training for nonprofits and funders.
• Links to Assessment and Evaluation Resources (http://www.education.purdue.edu/AssessmentCouncil/Links/Index.htm) List of links to resources on several topics, including: centers, community building, education and training in evaluation; foundations; Indiana government & organizations; links collected by...; logic models; performance assessment & electronic portfolios; political & private groups or companies; professional associations, organizations & publications; Purdue University; United States Government; web searches for publications by author & topic; and Vivisimo topical meta searches.
• Maine Legislature's Office of Program Evaluation & Government Accountability (http://www.maine.gov/legis/opega/) An excellent example of a governmental program evaluation office, with links to several detailed reports which include methodology, evaluation results, recommendations and action plans.
• National Legislative Program Evaluation Society (http://www.ncsl.org/nlpes/) Includes links to state offices of program evaluation and/or performance auditing in the US.
Program Evaluation and Review Technique
PERT network chart for a seven-month project with five milestones (10 through 50)
and six activities (A through F).
The Program (or Project) Evaluation and Review Technique, commonly abbreviated PERT, is a model for project
management designed to analyze and represent the tasks involved in completing a given project. It is commonly used
in conjunction with the critical path method or CPM.
PERT is a method for analyzing the tasks involved in completing a given project, especially the time needed to
complete each task, and for identifying the minimum time needed to complete the total project.
PERT was developed primarily to simplify the planning and scheduling of large and complex projects. It was
developed for the U.S. Navy Special Projects Office in 1957 to support the U.S. Navy's Polaris nuclear submarine
project. It was able to incorporate uncertainty by making it possible to schedule a project while not knowing
precisely the details and durations of all the activities. It is more of an event-oriented technique rather than start- and
completion-oriented, and is used more in projects where time, rather than cost, is the major factor. It is applied to
very large-scale, one-time, complex, non-routine infrastructure and Research and Development projects. An example
of this was for the 1968 Winter Olympics in Grenoble, which applied PERT from 1965 until the opening of the 1968
Games. This project model was the first of its kind, a revival for scientific management, founded by Frederick Taylor
(Taylorism) and later refined by Henry Ford (Fordism). DuPont corporation's critical path method was invented at
roughly the same time as PERT.
• A PERT chart is a tool that facilitates decision making. The first draft of a PERT chart will number its events
sequentially in 10s (10, 20, 30, etc.) to allow the later insertion of additional events.
• Two consecutive events in a PERT chart are linked by activities, which are conventionally represented as arrows
(see the diagram above).
• The events are presented in a logical sequence and no activity can commence until its immediately preceding
event is completed.
• The planner decides which milestones should be PERT events and also decides their “proper” sequence.
• A PERT chart may have multiple pages with many sub-tasks.
PERT is valuable for managing projects in which multiple tasks occur simultaneously, helping to reduce redundancy.
• PERT event: a point that marks the start or completion of one or more activities. It consumes no time and uses no
resources. When it marks the completion of one or more tasks, it is not “reached” (does not occur) until all of the
activities leading to that event have been completed.
• predecessor event: an event that immediately precedes some other event without any other events intervening. An
event can have multiple predecessor events and can be the predecessor of multiple events.
• successor event: an event that immediately follows some other event without any other intervening events. An
event can have multiple successor events and can be the successor of multiple events.
• PERT activity: the actual performance of a task which consumes time and requires resources (such as labor,
materials, space, machinery). It can be understood as representing the time, effort, and resources required to move
from one event to another. A PERT activity cannot be performed until the predecessor event has occurred.
• Optimistic time (O): the minimum possible time required to accomplish a task, assuming everything proceeds
better than is normally expected
• Pessimistic time (P): the maximum possible time required to accomplish a task, assuming everything goes wrong
(but excluding major catastrophes).
• Most likely time (M): the best estimate of the time required to accomplish a task, assuming everything proceeds as normal.
• Expected time (TE): the best estimate of the time required to accomplish a task, accounting for the fact that things
don't always proceed as normal (the implication being that the expected time is the average time the task would
require if the task were repeated on a number of occasions over an extended period of time).
TE = (O + 4M + P) ÷ 6
• Float or slack: the amount of time that a task in a project network can be delayed without causing a delay to
subsequent tasks (free float) or to project completion (total float).
• Critical Path: the longest possible continuous pathway taken from the initial event to the terminal event. It
determines the total calendar time required for the project; and, therefore, any time delays along the critical path
will delay the reaching of the terminal event by at least the same amount.
• Critical activity: an activity that has total float equal to zero. An activity with zero float is not necessarily on the
critical path, since its path may not be the longest.
• Lead time: the time by which a predecessor event must be completed in order to allow sufficient time for the
activities that must elapse before a specific PERT event reaches completion.
• Lag time: the earliest time by which a successor event can follow a specific PERT event.
• Slack: the slack of an event is a measure of the excess time and resources available in achieving this event.
Positive slack would indicate ahead of schedule; negative slack would indicate behind schedule; and zero slack
would indicate on schedule.
• Fast tracking: performing more critical activities in parallel
• Crashing critical path: Shortening duration of critical activities
The first step to scheduling the project is to determine the tasks that the project requires and the order in which they
must be completed. The order may be easy to record for some tasks (e.g. When building a house, the land must be
graded before the foundation can be laid) while difficult for others (There are two areas that need to be graded, but
there are only enough bulldozers to do one). Additionally, the time estimates usually reflect the normal, non-rushed
time. Many times, the time required to execute the task can be reduced for an additional cost or a reduction in quality.
In the following example there are seven tasks, labeled A through G. Some tasks can be done concurrently (A and B)
while others cannot be done until their predecessor task is complete (C cannot begin until A is complete).
Additionally, each task has three time estimates: the optimistic time estimate (O), the most likely or normal time
estimate (M), and the pessimistic time estimate (P). The expected time (TE) is computed using the formula
TE = (O + 4M + P) ÷ 6; a short computational check of this step follows the table below.
Activity  Predecessor  Opt. (O)  Normal (M)  Pess. (P)  Expected time (TE)
A         —            2         4           6          4.00
B         —            3         5           9          5.33
C         A            4         5           7          5.17
D         A            4         6           10         6.33
E         B, C         4         5           7          5.17
F         D            3         4           8          4.50
G         E            3         5           8          5.17
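As flagged above, the expected-time column of the table can be reproduced with a few lines of code; the activity data are taken directly from the table, and the function name is simply a convenient label.

# (Optimistic O, most likely M, pessimistic P) estimates from the table above.
estimates = {
    "A": (2, 4, 6), "B": (3, 5, 9), "C": (4, 5, 7), "D": (4, 6, 10),
    "E": (4, 5, 7), "F": (3, 4, 8), "G": (3, 5, 8),
}

def expected_time(o, m, p):
    # PERT expected time: TE = (O + 4M + P) / 6.
    return (o + 4 * m + p) / 6

for name, (o, m, p) in estimates.items():
    print(f"{name}: TE = {expected_time(o, m, p):.2f}")
# Prints 4.00, 5.33, 5.17, 6.33, 5.17, 4.50 and 5.17, matching the table.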
Once this step is complete, one can draw a Gantt chart or a network diagram.
A Gantt chart created using Microsoft Project (MSP). Note (1) the critical path is in red, (2) the slack is the black lines connected to non-critical activities, (3) since Saturday and Sunday are not work days and are thus excluded from the schedule, some bars on the Gantt chart are longer if they cut through a weekend.
A Gantt chart created using OmniPlan. Note (1) the critical path is highlighted, (2) the slack is not specifically indicated on task 5 (d), though it can be observed on tasks 3 and 7 (b and f), (3) since weekends are indicated by a thin vertical line, and take up no additional space on the work calendar, bars on the Gantt chart are not longer or shorter when they do or don't carry over a weekend.
A network diagram can be created by hand or by using diagram software. There are two types of network diagrams,
activity on arrow (AOA) and activity on node (AON). Activity on node diagrams are generally easier to create and
interpret. To create an AON diagram, it is recommended (but not required) to start with a node named start. This
"activity" has a duration of zero (0). Then you draw each activity that does not have a predecessor activity (a and b
in this example) and connect them with an arrow from start to each node. Next, since both c and d list a as a
predecessor activity, their nodes are drawn with arrows coming from a. Activity e is listed with b and c as
predecessor activities, so node e is drawn with arrows coming from both b and c, signifying that e cannot begin until
both b and c have been completed. Activity f has d as a predecessor activity, so an arrow is drawn connecting the
activities. Likewise, an arrow is drawn from e to g. Since there are no activities that come after f or g, it is
recommended (but again not required) to connect them to a node labeled finish.
A network diagram created using Microsoft Project (MSP). Note the critical path is in red.
A node like this one (from Microsoft Visio) can be used to display the activity name, duration, ES, EF, LS, LF, and slack.
By itself, the network diagram pictured above does not give much
more information than a Gantt chart; however, it can be expanded to
display more information. The most common information shown is:
1. The activity name
2. The normal duration time
3. The early start time (ES)
4. The early finish time (EF)
5. The late start time (LS)
6. The late finish time (LF)
7. The slack
In order to determine this information it is assumed that the activities
and normal duration times are given. The first step is to determine the
ES and EF. The ES is defined as the maximum EF of all predecessor activities, unless the activity in question is the
first activity, for which the ES is zero (0). The EF is the ES plus the task duration (EF = ES + duration).
• The ES for start is zero since it is the first activity. Since the duration is zero, the EF is also zero. This EF is used
as the ES for a and b.
• The ES for a is zero. The duration (4 work days) is added to the ES to get an EF of four. This EF is used as the ES
for c and d.
• The ES for b is zero. The duration (5.33 work days) is added to the ES to get an EF of 5.33.
• The ES for c is four. The duration (5.17 work days) is added to the ES to get an EF of 9.17.
• The ES for d is four. The duration (6.33 work days) is added to the ES to get an EF of 10.33. This EF is used as
the ES for f.
• The ES for e is the greatest EF of its predecessor activities (b and c). Since b has an EF of 5.33 and c has an EF of
9.17, the ES of e is 9.17. The duration (5.17 work days) is added to the ES to get an EF of 14.34. This EF is used
as the ES for g.
• The ES for f is 10.33. The duration (4.5 work days) is added to the ES to get an EF of 14.83.
• The ES for g is 14.34. The duration (5.17 work days) is added to the ES to get an EF of 19.51.
• The ES for finish is the greatest EF of its predecessor activities (f and g). Since f has an EF of 14.83 and g has an
EF of 19.51, the ES of finish is 19.51. Finish is a milestone (and therefore has a duration of zero), so the EF is
also 19.51.
Barring any unforeseen events, the project should take 19.51 work days to complete. The next step is to determine
the late start (LS) and late finish (LF) of each activity. This will eventually show if there are activities that have
slack. The LF is defined as the minimum LS of all successor activities, unless the activity is the last activity, for
which the LF equals the EF. The LS is the LF minus the task duration (LS = LF - duration).
• The LF for finish is equal to the EF (19.51 work days) since it is the last activity in the project. Since the duration
is zero, the LS is also 19.51 work days. This will be used as the LF for f and g.
• The LF for g is 19.51 work days. The duration (5.17 work days) is subtracted from the LF to get an LS of 14.34
work days. This will be used as the LF for e.
• The LF for f is 19.51 work days. The duration (4.5 work days) is subtracted from the LF to get an LS of 15.01
work days. This will be used as the LF for d.
• The LF for e is 14.34 work days. The duration (5.17 work days) is subtracted from the LF to get an LS of 9.17
work days. This will be used as the LF for b and c.
• The LF for d is 15.01 work days. The duration (6.33 work days) is subtracted from the LF to get an LS of 8.68
work days.
• The LF for c is 9.17 work days. The duration (5.17 work days) is subtracted from the LF to get an LS of 4 work days.
• The LF for b is 9.17 work days. The duration (5.33 work days) is subtracted from the LF to get an LS of 3.84
work days.
• The LF for a is the minimum LS of its successor activities. Since c has an LS of 4 work days and d has an LS of
8.68 work days, the LF for a is 4 work days. The duration (4 work days) is subtracted from the LF to get an LS of
0 work days.
• The LF for start is the minimum LS of its successor activities. Since a has an LS of 0 work days and b has an LS
of 3.84 work days, the LS is 0 work days.
The next step is to determine the critical path and if any activities have slack. The critical path is the path that takes
the longest to complete. To determine the path times, add the task durations for all available paths. Activities that
have slack can be delayed without changing the overall time of the project. Slack is computed in one of two ways,
slack = LF - EF or slack = LS - ES. Activities that are on the critical path have a slack of zero (0).
• The duration of path adf is 14.83 work days.
• The duration of path aceg is 19.51 work days.
• The duration of path beg is 15.67 work days.
The critical path is aceg and the critical time is 19.51 work days. It is important to note that there can be more than
one critical path (in a project more complex than this example) or that the critical path can change. For example, let's
say that activities d and f take their pessimistic (P) times to complete instead of their expected (TE) times. The critical
path is now adf and the critical time is 22 work days. On the other hand, if activity c can be reduced to one work day,
the path time for aceg is reduced to 15.34 work days, which is slightly less than the time of the new critical path, beg
(15.67 work days).
Assuming these scenarios do not happen, the slack for each activity can now be determined.
• Start and finish are milestones and by definition have no duration, therefore they can have no slack (0 work days).
• The activities on the critical path by definition have a slack of zero; however, it is always a good idea to check the
math anyway when drawing by hand.
• LF - EF for a: 4 - 4 = 0
• LF - EF for c: 9.17 - 9.17 = 0
• LF - EF for e: 14.34 - 14.34 = 0
• LF - EF for g: 19.51 - 19.51 = 0
• Activity b has an LF of 9.17 and an EF of 5.33, so the slack is 3.84 work days.
• Activity d has an LF of 15.01 and an EF of 10.33, so the slack is 4.68 work days.
• Activity f has an LF of 19.51 and an EF of 14.83, so the slack is 4.68 work days.
Therefore, activity b can be delayed almost 4 work days without delaying the project. Likewise, activity d or activity
f can be delayed 4.68 work days without delaying the project (alternatively, d and f can be delayed 2.34 work days
each). A compact computational sketch of the full forward pass, backward pass and slack calculation follows.
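The sketch below reproduces the forward pass (ES/EF), the backward pass (LS/LF), the slack values and the critical path for this example network, using the expected times from the table; the dictionary layout and variable names are illustrative conveniences rather than part of any standard library.

# Each activity: (expected duration TE, list of predecessors), per the example.
activities = {
    "start": (0.0, []),
    "a": (4.00, ["start"]), "b": (5.33, ["start"]),
    "c": (5.17, ["a"]),      "d": (6.33, ["a"]),
    "e": (5.17, ["b", "c"]), "f": (4.50, ["d"]),
    "g": (5.17, ["e"]),
    "finish": (0.0, ["f", "g"]),
}

# Forward pass: ES is the maximum EF of all predecessors; EF = ES + duration.
es, ef = {}, {}
for name, (dur, preds) in activities.items():  # dictionary order is already topological here
    es[name] = max((ef[p] for p in preds), default=0.0)
    ef[name] = es[name] + dur

# Backward pass: LF is the minimum LS of all successors; LS = LF - duration.
successors = {n: [m for m, (_, ps) in activities.items() if n in ps] for n in activities}
ls, lf = {}, {}
for name in reversed(list(activities)):
    dur = activities[name][0]
    lf[name] = min((ls[s] for s in successors[name]), default=ef[name])
    ls[name] = lf[name] - dur

# Slack = LF - EF (equivalently LS - ES); zero slack marks the critical path.
for name in activities:
    print(f"{name}: ES={es[name]:.2f} EF={ef[name]:.2f} "
          f"LS={ls[name]:.2f} LF={lf[name]:.2f} slack={lf[name] - ef[name]:.2f}")

critical = [n for n in activities
            if abs(lf[n] - ef[n]) < 1e-9 and n not in ("start", "finish")]
print("critical path:", " -> ".join(critical), f"({ef['finish']:.2f} work days)")

Run as written, the script reports zero slack for a, c, e and g, slack of 3.84 work days for b and 4.68 work days for d and f, and a project duration of 19.51 work days, matching the hand calculation above.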
A completed network diagram created using Microsoft Visio. Note the critical path is in red.
Advantages
• A PERT chart explicitly defines, and makes visible, the dependencies (precedence relationships) between the WBS elements.
• PERT facilitates identification of the critical path and makes this visible.
• PERT facilitates identification of early start, late start, and slack for each activity.
• PERT provides for potentially reduced project duration due to better understanding of dependencies, leading to improved overlapping of activities and tasks where feasible.
• The large amount of project data can be organized and presented in diagram form for use in decision making.
Disadvantages
• There can be potentially hundreds or thousands of activities and individual dependency relationships.
• The network charts tend to be large and unwieldy, requiring several pages to print and requiring special-size paper.
• The lack of a timeframe on most PERT/CPM charts makes it harder to show status, although colours can help (e.g., a specific colour for completed nodes).
• When the PERT/CPM charts become unwieldy, they are no longer used to manage the project.
Uncertainty in project scheduling
During project execution, however, a real-life project will never execute exactly as it was planned, due to uncertainty.
This uncertainty can be ambiguity resulting from subjective estimates that are prone to human error, or variability
arising from unexpected events or risks. The main reason that the Project Evaluation and Review Technique (PERT)
may provide inaccurate information about the project completion time is this schedule uncertainty, and the inaccuracy
can be large enough to render such estimates unhelpful.
One way to maximize solution robustness is to include safety in the baseline schedule in order to absorb the
anticipated disruptions; this is called proactive scheduling. Pure proactive scheduling is a utopia, however:
incorporating enough safety in a baseline schedule to cope with every possible disruption would lead to a baseline
schedule with a very large make-span. A second approach, reactive scheduling, consists of defining a procedure for
reacting to disruptions that cannot be absorbed by the baseline schedule.
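One common way of making this schedule uncertainty explicit, offered here as an illustrative extension rather than something described in the text above, is to simulate the schedule: sample a duration for each activity from a distribution spanning its optimistic, most likely and pessimistic estimates and observe how the completion time varies. The sketch below does this for the example network using triangular distributions; the choice of distribution and the sample size are assumptions made only for the sake of the example.

import numpy as np

rng = np.random.default_rng(42)

# (O, M, P) estimates from the example table earlier in this article.
estimates = {
    "a": (2, 4, 6), "b": (3, 5, 9), "c": (4, 5, 7), "d": (4, 6, 10),
    "e": (4, 5, 7), "f": (3, 4, 8), "g": (3, 5, 8),
}
# The three start-to-finish paths in the example network.
paths = [["a", "d", "f"], ["a", "c", "e", "g"], ["b", "e", "g"]]

n = 10_000
# Sample a triangular duration for every activity in every simulated trial.
samples = {k: rng.triangular(o, m, p, size=n) for k, (o, m, p) in estimates.items()}

# The project duration in each trial is the duration of the longest path in that trial.
path_durations = np.array([sum(samples[k] for k in path) for path in paths])
completion = path_durations.max(axis=0)

print(f"mean completion time: {completion.mean():.2f} work days")
print(f"90th percentile:      {np.percentile(completion, 90):.2f} work days")

The spread of the simulated completion times around the single-point estimate of 19.51 work days illustrates why a deterministic critical-path figure can be misleading when the underlying estimates are uncertain.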
[1] Malcolm, D. G., J. H. Roseboom, C. E. Clark, W. Fazar, "Application of a Technique for Research and Development Program Evaluation", Operations Research, Vol. 7, No. 5, September-October 1959, pp. 646-669.
[2] 1968 Winter Olympics official report (http://www.la84foundation.org/6oic/OfficialReports/1968/or1968.pdf), p. 49. Accessed 1 November 2010. (English) & (French)
[3] http://en.wiktionary.org/wiki/lead#Verb_2
Further reading
• Project Management Institute (2003). A Guide To The Project Management Body Of Knowledge (3rd ed.). Project Management Institute. ISBN 1-930699-45-X.
• Klastorin, Ted (2003). Project Management: Tools and Trade-offs (3rd ed.). Wiley. ISBN 978-0471413844.
• Kerzner, Harold (2003). Project Management: A Systems Approach to Planning, Scheduling, and Controlling (8th ed.). Wiley. ISBN 0-471-22577-0.
• Milosevic, Dragan Z. (2003). Project Management ToolBox: Tools and Techniques for the Practicing Project
Manager. Wiley. ISBN 978-0471208228.
External links
• More explanation of PERT (http://www.netmba.com/operations/project/pert)
• 3 Point Estimating Tutorial on VisionaryTools.com (http://www.visionarytools.com/decision-making/3-point-estimating.htm)
Quality (business)
Quality in business, engineering and manufacturing has a pragmatic interpretation as the non-inferiority or
superiority of something. Quality is a perceptual, conditional and somewhat subjective attribute and may be
understood differently by different people. Consumers may focus on the specification quality of a product/service,
or how it compares to competitors in the marketplace. Producers might measure the conformance quality, or degree
to which the product/service was produced correctly.
Numerous definitions and methodologies have been created to assist in managing the quality-affecting aspects of
business operations. Many different techniques and concepts have evolved to improve product or service quality.
There are two common quality-related functions within a business. One is quality assurance which is the prevention
of defects, such as by the deployment of a quality management system and preventative activities like failure mode
and effects analysis (FMEA). The other is quality control which is the detection of defects, most commonly
associated with testing which takes place within a quality management system typically referred to as verification
and validation.
The common element of the business definitions is that the quality of a product or service refers to the perception of
the degree to which the product or service meets the customer's expectations. Quality has no specific meaning unless
related to a specific function and/or object. Quality is a perceptual, conditional and somewhat subjective attribute.
The business meanings of quality have developed over time. Various interpretations are given below:
1. ISO 9000: "Degree to which a set of inherent characteristics fulfills requirements." The standard defines requirement as need or expectation.
2. Six Sigma: "Number of defects per million opportunities."
3. Subir Chowdhury: "Quality combines people power and process power."
4. Philip B. Crosby: "Conformance to requirements." The requirements may not fully represent customer expectations; Crosby treats this as a separate problem.
5. Joseph M. Juran: "Fitness for use." Fitness is defined by the customer.
6. Noriaki Kano and others present a two-dimensional model of quality: "must-be quality" and "attractive quality." The former is near to "fitness for use" and the latter is what the customer would love, but has not yet thought about. Supporters characterize this model more succinctly as: "Products and services that meet or exceed customers' expectations."
7. Robert Pirsig: "The result of care."
8. Genichi Taguchi, with two definitions:
a. "Uniformity around a target value." The idea is to lower the standard deviation in outcomes, and to keep the range of outcomes to a certain number of standard deviations, with rare exceptions (a small numeric sketch of the associated quadratic loss function follows this list).
b. "The loss a product imposes on society after it is shipped." This definition of quality is based on a more comprehensive view of the production system.
9. American Society for Quality: "A subjective term for which each person has his or her own definition. In
technical usage, quality can have two meanings:
a. The characteristics of a product or service that bear on its ability to satisfy stated or implied needs;
b. A product or service free of deficiencies."
10. Peter Drucker: "Quality in a product or service is not what the supplier puts in. It is what the customer gets out
and is willing to pay for."
11. W. Edwards Deming: concentrating on "the efficient production of the quality that the market expects,"
he linked quality and management: "Costs go down and productivity goes up as improvement of quality is
accomplished by better management of design, engineering, testing and by improvement of processes."
12. Gerald M. Weinberg: "Value to some person".
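Taguchi's "uniformity around a target value" (definition 8a above) is commonly formalised as a quadratic loss
function, L(y) = k(y - T)^2, where T is the target value and k a cost constant. The sketch below only illustrates
that formula; the target and the constant k are invented numbers.

# Minimal sketch of Taguchi's quadratic loss function L(y) = k * (y - T)^2,
# where T is the target value and k a cost constant. All numbers are invented.

def taguchi_loss(measured: float, target: float, k: float) -> float:
    """Loss attributed to a unit whose quality characteristic deviates from the target."""
    return k * (measured - target) ** 2

target = 10.0      # nominal dimension, e.g. millimetres
k = 2.5            # cost constant (currency units per mm^2), assumed for illustration

for y in (10.0, 10.2, 10.5, 11.0):
    print(f"y = {y:4.1f} mm -> loss = {taguchi_loss(y, target, k):.2f}")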
Market Sector Perspectives
Operations Management
The dimensions of quality refer to the attributes that quality achieves in operations management:
• Quality supports dependability
• Dependability supports speed
• Speed supports flexibility
• Flexibility supports cost
Quality <-> Dependability <-> Speed <-> Flexibility <-> Cost
In the manufacturing industry it is commonly stated that "quality drives productivity." Improved productivity is a
source of greater revenues, employment opportunities and technological advances. However, this has not always been the
case historically: in the early 19th century it was recognised that some markets, such as those in Asia, preferred
cheaper products to those of higher quality.
Most discussions of quality refer to a finished part, wherever it is in the process. Inspection, which is what quality
assurance usually means in this context, is historical, since the work is already done. The best way to think about
quality is in terms of process control: if the process is under control, inspection is not necessary.
However, there is one characteristic of modern quality that is universal. In the past, when we tried to improve
quality, typically defined as producing fewer defective parts, we did so at the expense of increased cost, increased
task time, longer cycle time, etc. We could not get fewer defective parts and lower cost and shorter cycle times, and
so on. However, when modern quality techniques are applied correctly to business, engineering, manufacturing or
assembly processes, all aspects of quality - customer satisfaction and fewer defects/errors and cycle time and task
time/productivity and total cost, etc.- must all improve or, if one of these aspects does not improve, it must at least
stay stable and not decline. So modern quality has the characteristic that it creates AND-based benefits, not
OR-based benefits.
One view of quality is that it is defined entirely by the customer or end user, and is based upon that person's
evaluation of his or her entire customer experience. The customer experience is defined as the aggregate of all the
interactions that customers have with the company's products and services. For example, any time one buys a
product, one forms an impression based on how it was sold, how it was delivered, how it performed, how well it was
supported etc.
Quality Management Techniques
• Quality Management Systems
• Continuous improvement
• Theory of Constraints (TOC)
• Total Quality Management (TQM)
• Six Sigma
• Business Process Management (BPM)
• Design of experiments
• Fractional factorial design
• Optimal design
• Response surface methodology
• Statistical Process Control (SPC)
• Business process re-engineering
• Quality circles
• Capability Maturity Models
• Requirements analysis
• Verification and Validation
• Zero Defects
Quality Awards
• Malcolm Baldrige National Quality Award
• Deming Prize
[1] TC 176/SC (2005). ISO 9000:2005, Quality management systems -- Fundamentals and vocabulary. International Organization for Standardization.
[2] Motorola University. "What is Six Sigma?" (http://www.motorola.com/content.jsp?globalObjectId=3088). Motorola, Inc.
[3] Chowdhury, Subir (2005). The Ice Cream Maker: An Inspiring Tale About Making Quality The Key Ingredient in Everything You Do. New York: Doubleday, Random House. ISBN 978-0385514781.
[4] Crosby, Philip (1979). Quality is Free. New York: McGraw-Hill. ISBN 0070145121.
[5] American Society for Quality, Glossary - Entry: Quality (http://www.asq.org/glossary/q.html), retrieved 2008-07-20.
[6] Kano, Noriaki (1984-04-01). "Attractive quality and must-be quality". The Journal of the Japanese Society for Quality Control: 39-48.
[7] Pirsig, Robert M. (1974). Zen and the Art of Motorcycle Maintenance: An Inquiry into Values. New York, N.Y.: Morrow. ISBN 0688002307. Cited by: Jones, D.R. (September 1989). "Exploring quality: what Robert Pirsig's 'Zen and the Art of Motorcycle Maintenance' can teach us about technical communication". IEEE Transactions on Professional Communication (IEEE) 32 (3): 154-158.
[8] Taguchi, G. (1992). Taguchi on Robust Technology Development. ASME Press. ISBN 978-9992910269.
[9] Ealey, Lance A. (1988). Quality by Design: Taguchi Methods and U.S. Industry. Dearborn, Mich.: ASI Press. ISBN 9781556239700. Cited by: Sriraman, Vedaraman, A primer on the Taguchi system of quality engineering (http://scholar.lib.vt.edu/ejournals/JOTS/Summer-Fall-1996/PDF/9-2-Sriraman-article.pdf), retrieved 2008-07-20.
[10] Drucker, Peter (1985). Innovation and Entrepreneurship. Harper & Row. ISBN 9780060913601.
[11] Edwards Deming, W. (1986). Out of the Crisis. Cambridge, Mass.: Massachusetts Institute of Technology, Center for Advanced Engineering Study. ISBN 0-911379-01-0.
[12] Walton, Mary; W. Edwards Deming (1988). The Deming Management Method. Perigee. p. 88. ISBN 0399550003.
[13] p. 169, Rochfort Scott, Hamerton
• Boone, Louis E. & Kurtz, David L., Contemporary Business 2006, Thomson South-Western, 2006
• Rochfort Scott, Charles & Hamerton, Robert Jacob, Rambles in Egypt and Candia: With Details of the Military Power and Resources of Those Countries, and Observations on the Government, Policy, and Commercial System of Mohammed Ali, Volume I, H. Colburn, London, 1837
External links
• A proposed universal definition of 'quality' (http://www.bin.co.uk/qw_SOAPBOX_0805.pdf)
• DocQuality - Document references on quality theme (http://www.docquality.info)
• Quality Management links (http://www.compulegal.eu/files/qm.htm)
Quality assurance
Quality assurance, or QA (a term in use since 1973), is the systematic monitoring and evaluation of the various
aspects of a project, service or facility to maximize the probability that minimum standards of quality are being
attained by the production process. QA cannot absolutely guarantee the production of quality products.
Two principles included in QA are: "Fit for purpose" - the product should be suitable for the intended purpose; and
"Right first time" - mistakes should be eliminated. QA includes regulation of the quality of raw materials,
assemblies, products and components; services related to production; and management, production and inspection processes.
Quality is determined by the product users, clients or customers, not by society in general. It is not the same as
'expensive' or 'high quality'. Low-priced products can be considered as having high quality if the product users
determine them as such.
Initial efforts to control the quality of production
During the Middle Ages, guilds adopted responsibility for quality control of their members, setting and maintaining
certain standards for guild membership.
Royal governments purchasing material were interested in quality control as customers. For this reason, King John
of England appointed William Wrotham to report about the construction and repair of ships. Centuries later, Samuel
Pepys, Secretary to the British Admiralty, appointed multiple such overseers.
Prior to the extensive division of labor and mechanization resulting from the Industrial Revolution, it was possible
for workers to control the quality of their own products. The Industrial Revolution led to a system in which large
groups of people performing a similar type of work were grouped together under the supervision of a foreman who
was appointed to control the quality of work manufactured.
Wartime production
At the time of the First World War, manufacturing processes typically became more complex with larger numbers of
workers being supervised. This period saw the widespread introduction of mass production and piecework, which
created problems as workmen could now earn more money by the production of extra products, which in turn
occasionally led to poor quality workmanship being passed on to the assembly lines. To counter bad workmanship,
full time inspectors were introduced to identify, quarantine and ideally correct product quality failures. Quality
control by inspection in the 1920s and 1930s led to the growth of quality inspection functions, separately organised
from production and large enough to be headed by superintendents.
The systematic approach to quality started in industrial manufacturing during the 1930s, mostly in the USA, when
some attention was given to the cost of scrap and rework. The scale of mass production required during the
Second World War made it necessary to introduce an improved form of quality control known as Statistical Quality
Control, or SQC. Some of the initial work for SQC is credited to Walter A. Shewhart of Bell Labs, starting with his
famous one-page memorandum of 1924.
SQC includes the concept that every production piece cannot be fully inspected into acceptable and non-acceptable
batches. By extending the inspection phase and making inspection organizations more efficient, it provides
inspectors with control tools such as sampling and control charts, even where 100 per cent inspection is not
practicable. Standard statistical techniques allow the producer to sample and test a certain proportion of the products
for quality in order to achieve the desired level of confidence in the quality of the entire batch or production run.
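As a rough illustration of the sampling idea just described, the sketch below estimates the chance that a fixed-size
random sample contains no defective items for a few assumed defect rates. The sample size and defect rates are
invented, and the simple binomial model ignores the finite batch size.

# Minimal sketch: probability that a random sample from a large batch contains
# zero defectives, for an assumed defect rate. Values are invented for illustration.

def prob_sample_all_good(defect_rate: float, sample_size: int) -> float:
    """Binomial approximation: chance that none of the sampled items is defective."""
    return (1.0 - defect_rate) ** sample_size

for rate in (0.001, 0.01, 0.05):
    p = prob_sample_all_good(rate, sample_size=50)
    print(f"defect rate {rate:.3f}: P(50-item sample passes) = {p:.3f}")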
In the period following World War II, many countries' manufacturing capabilities that had been destroyed during the
war were rebuilt. General Douglas MacArthur oversaw the re-building of Japan. During this time, General
MacArthur involved two key individuals in the development of modern quality concepts: W. Edwards Deming and
Joseph Juran. Both individuals promoted the collaborative concepts of quality to Japanese business and technical
groups, and these groups utilized these concepts in the redevelopment of the Japanese economy.
Although there were many individuals trying to lead United States industries towards a more comprehensive
approach to quality, the U.S. continued to apply the Quality Control (QC) concepts of inspection and sampling to
remove defective product from production lines, essentially ignoring advances in QA for decades.
Steps for a typical quality assurance process
There are many forms of QA processes, of varying scope and depth. The application of a particular process is often
customized to the production process.
A typical process may include:
• test of previous articles
• plan to improve
• design to include improvements and requirements
• manufacture with improvements
• review new item and improvements
• test of the new item
Failure testing
A valuable process to perform on a whole consumer product is failure testing or stress testing. In mechanical terms
this is the operation of a product until it fails, often under stresses such as increasing vibration, temperature, and
humidity. This exposes many unanticipated weaknesses in a product, and the data are used to drive engineering and
manufacturing process improvements. Often quite simple changes can dramatically improve product service, such as
changing to mold-resistant paint or adding lock-washer placement to the training for new assembly personnel.
Statistical control
Many organizations use statistical process control to bring the organization to Six Sigma levels of quality, in other
words, so that the likelihood of an unexpected failure is confined to six standard deviations on the normal
distribution. This probability is less than four one-millionths. Items controlled often include clerical tasks such as
order-entry as well as conventional manufacturing tasks.
Traditional statistical process controls in manufacturing operations usually proceed by randomly sampling and
testing a fraction of the output. Variances in critical tolerances are continuously tracked and where necessary
corrected before bad parts are produced.
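The "six standard deviations" figure quoted above is usually computed with the conventional 1.5-sigma long-term
shift, which gives roughly 3.4 defects per million opportunities. The sketch below reproduces that calculation; the
1.5-sigma shift is the customary Six Sigma assumption, not something prescribed by this text.

# Minimal sketch: defect probability at a given sigma level, using the conventional
# 1.5-sigma long-term shift assumed in Six Sigma. Standard library only.
from math import erfc, sqrt

def defects_per_million(sigma_level: float, shift: float = 1.5) -> float:
    """One-sided tail probability beyond (sigma_level - shift), expressed per million."""
    z = sigma_level - shift
    tail = 0.5 * erfc(z / sqrt(2.0))   # P(Z > z) for a standard normal variable
    return tail * 1_000_000

for level in (3, 4, 5, 6):
    print(f"{level} sigma -> {defects_per_million(level):,.1f} defects per million")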
Total quality management
The quality of products is dependent upon that of the participating constituents, some of which are sustainable and
effectively controlled while others are not. The processes which are managed with QA pertain to Total Quality
Management. If the specification does not reflect the true quality requirements, the product's quality cannot be
guaranteed. For instance, the parameters for a pressure vessel should cover not only the material and dimensions but
also operating, environmental, safety, reliability and maintainability requirements.
QA in software development
The following are examples of QA models relating to the software development process.
Models and standards
ISO 17025 is an international standard that specifies the general requirements for the competence to carry out tests
and/or calibrations. There are 15 management requirements and 10 technical requirements. These requirements
outline what a laboratory must do to become accredited. Management system refers to the organization's structure for
managing its processes or activities that transform inputs of resources into a product or service which meets the
organization's objectives, such as satisfying the customer's quality requirements, complying with regulations, or
meeting environmental objectives.
The CMMI (Capability Maturity Model Integration) model is widely used to implement Process and Product Quality
Assurance (PPQA) in an organization. The CMMI maturity levels can be divided into five steps, which a company can
achieve by performing specific activities within the organization. (CMMI QA processes are well suited to organizations
such as NASA, and may even be adapted for an agile development style.)
Company quality
During the 1980s, the concept of “company quality” with the focus on management and people came to the fore. It
was realized that, if all departments approached quality with an open mind, success was possible if the management
led the quality improvement process.
The company-wide quality approach places an emphasis on four aspects:
1. Elements such as controls, job management, adequate processes, performance and integrity criteria, and identification of records
2. Competence, such as knowledge, skills, experience, and qualifications
3. Soft elements, such as personnel integrity, confidence, organizational culture, motivation, team spirit, and quality relationships
4. Infrastructure (as it enhances or limits functionality)
The quality of the outputs is at risk if any of these aspects is deficient.
QA is not limited to manufacturing, and can be applied to any business or non-business activity, including:
• Design work
• Administrative services
• Consulting
• Banking
• Insurance
• Computer software development
• Retailing
• Transportation
• Education
• Translation
It comprises a quality improvement process, which is generic in the sense that it can be applied to any of these
activities, and it establishes a behavior pattern which supports the achievement of quality.
This in turn is supported by quality management practices, which can include a number of business systems and
which are usually specific to the activities of the business unit concerned.
In manufacturing and construction activities, these business practices can be equated to the models for quality
assurance defined by the international standards of the ISO 9000 series and the accompanying specifications for
quality systems.
Before the emergence of company quality, quality work consisted largely of shop-floor inspection, which did not reveal
the major quality problems. This led to quality assurance, or total quality control, which has come into being
relatively recently.
Using contractors and/or consultants
Consultants and contractors are sometimes employed when introducing new quality practices and methods,
particularly where the relevant skills and expertise are not available within the organization or where internal
resources cannot be freed up. Consultants and contractors will often employ Quality Management Systems (QMS),
auditing and procedural documentation writing, CMMI, Six Sigma, Measurement Systems Analysis (MSA), Quality
Function Deployment (QFD), Failure Mode and Effects Analysis (FMEA), and Advanced Product Quality Planning (APQP).
Quality assurance in European vocational education & training
With the formulation of a joint quality strategy, the European Union seeks to foster the overall attractiveness of
vocational education & training (VET) in Europe. In order to promote this process, a set of new policy instruments
was implemented, such as the CQAF (Common Quality Assurance Framework) and its successor EQARF (European Quality
Assurance Reference Framework), which are intended to allow EU-wide comparison of QA in VET and to build the
capacities for a common quality assurance policy and quality culture in VET throughout Europe. Furthermore, the new
policy instruments are intended to increase transparency and mutual trust between national VET systems.
In line with the European quality strategy, the member states have subsequently implemented national structures
(QANRPs: national reference points for quality assurance in VET), which collaborate closely with national stakeholders
in order to meet the requirements and priorities of the national VET systems, and which support training providers in
order to guarantee implementation and commitment at all levels. At European level, the cooperation between QANRPs is
ensured through the EQAVET network.
Over the past few years, with financial support from the European Union as well as the EU member states, numerous
pilot initiatives have been developed, most of which are concerned with the promotion and development of quality in
VET throughout Europe. Examples can be found in the project database ADAM, which keeps comprehensive information
about innovation & transfer projects sponsored by the EU.
A practical example is the BEQUAL project, which has developed a benchmarking tool with which training providers can
benchmark their quality performance in line with the CQAF quality process model. Furthermore, the project offers a
database of European good practice on quality assurance in the field of vocational education & training.
• Online Benchmarking Tool For Vocational Training Institutes
A different approach was developed by the European VETWORKS project. The project builds on the observation that,
over the past years, VET networks have grown rapidly throughout Europe, with a strong tendency towards interlocking
educational activities across organisations and sectors. It is argued that the vast majority of instruments and
methods of quality assurance available for educational planning, monitoring and evaluation at provider level do not
meet the new requirements. They are designed for managing the quality of either individual organisations or discrete
training processes and structures, and in this way systematically leave out collaborative quality processes within
newly emerging learning networks. The VETWORKS approach therefore allows local
networks to examine their strengths and weaknesses in the area of vocational education & training. When local
networks understand the factors that contribute to their success and those that pose challenges, they can better
undertake strategies to maximize their strengths and effectively address their weaknesses.
Recent experience in place-based learning strategies shows that learning communities often deploy three key
success areas (Faris, 2007):
• Partnership - learning to build links between all sectors and mobilize their shared resources;
• Participation - learning to involve the public in the policy process as well as learning opportunities;
• Performance - learning to assess progress and benchmark good practice.
The SPEAK tool, adopted under the VETWORKS initiative, deploys indicative descriptors for each of these areas, which
local networks can use to determine achievements in their activities towards building local VET quality. Following
the EQARF process model, each indicative descriptor can be assigned to a certain stage of the P-D-C-A cycle. This not
only allows for compliance with the EQARF principles, but extends the original model by deploying a separate network
level, bridging between system and institute level. The quality process cycle employs four key areas of activity
typically found in quality management systems: quality planning, control, assurance and improvement. Each area of
activity is focused on a specific quality question, such as: what do we want to achieve? which concrete operations are
required to ensure achievement? what have we achieved? what needs to be improved? Together, the indicative descriptors
and the process cycle define the core elements of quality management in VET networks. SPEAK aims at applying quality
assurance on a new scale by systematically taking advantage of the EQARF and combining it with state-of-the-art
methodologies of self-evaluation. By using SPEAK, stakeholders and managers of educational networks and programmes
will be able to link progress indicators available at provider, network and system level:
• provider level: SPEAK helps to systematically gain knowledge about VET providers' "performance" in their working environment (in the market, or in the network);
• network level: SPEAK helps to evaluate progress indices and measure the total operating performance of educational networks and organizations by connecting relevant data at all actors' levels: collaborators, volunteers, management, etc.;
• system level: SPEAK helps to reflect on and concretize descriptors available at the system level in the light of local VET strategies and programmes.
Finally, on the basis of different analysis options, SPEAK can also help to gain essential insights into the long-term
effects of educational programmes and requirements for change.
• European Quality Assurance in Vocational Education & Training
• The Quality Assurance Journal, ISSN 1087-8378, John Wiley & Sons
• QP - Quality Progress magazine, published by the American Society for Quality
• Quality Assurance in Education, ISSN 0968-4883, Emerald Publishing Group
• Accreditation and Quality Assurance: Journal for Quality, Comparability and Reliability in Chemical Measurement, ISSN 0949-1775 (print), ISSN 1432-0517 (online)
• Food Quality and Preference, ISSN 0950-3293
• Almeida, E., Alvaro, A., Meria, S. (2007, September 3-4). A component quality assurance process. Foundations of Software Engineering. doi: http://doi.acm.org/10.1145/1295074.1295093
• Feldman, S. (2005, February). Quality assurance: much more than testing. Queue, 3(1). doi: http://doi.acm.org/10.1145/1046931.1046943
• Meisinger, M., Wagner, S. (2006, November 6). Integrating a Model of Analytical Quality Assurance into the V-Modell XT. Foundations of Software Engineering, 38-45. doi: http://doi.acm.org/10.1145/1188895.
• Majcen, N., Taylor, P. (Editors): Practical examples on traceability, measurement uncertainty and validation in chemistry, Vol 1; ISBN 978-92-79-12021-3, 2010.
• Pyzdek, T., "Quality Engineering Handbook", 2003, ISBN 0-8247-4614-7
• Godfrey, A. B., "Juran's Quality Handbook", 1999, ISBN 0-07-034003-X
[1] Definition of "Quality assurance" (http://www.merriam-webster.com/dictionary/quality+assurance) in Merriam-Webster Dictionary
[2] Thareja, Mannu; Thareja, Priyavrat (February 2007). "The Quality Brilliance Through Brilliant People" (http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1498550). Quality World 4 (2). Retrieved 2010-01-11.
[3] http://ec.europa.eu/education/lifelong-learning-policy/doc1134_en.htm
[4] http://www.eqavet.eu
[5] http://www.adam-europe.eu
[6] http://www.bequal.info
[7] http://www.bequal.info/index.php
[8] http://www.vetworks.pl
[9] http://eqavet.eu/gns/home.aspx
[10] http://www3.interscience.wiley.com/journal/15634/home
[11] http://www.asq.org/qualityprogress/index.html
[12] http://www.emeraldinsight.com/products/journals/journals.htm?id=qae
[13] http://www.springerlink.com/content/q922ehvpaq49pw6q/
[14] http://www.elsevier.com/wps/find/journaldescription.cws_home/405859/description
External links
• Graduate Certificate in Quality Assurance, LH Martin Institute for Higher Education Leadership & Management, The University of Melbourne (http://www.lhmartininstitute.edu.au/)
• Measurement Science in Chemistry (http://www.msc-euromaster.eu/)
• Quality Assurance Criteria for Nuclear Power Plants and Fuel Reprocessing Plants (http://www.nrc.gov/reading-rm/doc-collections/cfr/part050/part050-appb.html)
• Training in metrology in chemistry (http://www.trainmic.org)
Quantitative risk assessment software
Quantitative risk assessment software is software that helps to calculate the single loss expectancy (SLE) of an asset
(a minimal numeric sketch of this calculation follows the list below). Packages of this kind include:
• RBM II (Risk Based Management II)
• Riscwise RBI
• Quest Reliability IDMS
• Safeti
• RAM (Reliability, Availability and Maintainability)
• Shepherd
• Riskcurves (non-integrated)
• Effects (non-integrated)
• Damage (non-integrated)
• Riskplot (non-integrated)
• The Asset Partnership qra-toolkit
• Offshore Hazard and Risk Analysis (OHRA)
• Neptune
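As a rough illustration of the quantity named in the opening sentence, the sketch below applies the generic
quantitative risk formulas for single loss expectancy and annualised loss expectancy. The formulas are the textbook
ones (SLE = asset value x exposure factor; ALE = SLE x annualised rate of occurrence), not taken from any of the
packages listed above, and all figures are invented.

# Minimal sketch of the generic quantitative risk formulas:
#   SLE = asset value * exposure factor
#   ALE = SLE * annualised rate of occurrence
# All figures are invented for illustration.

def single_loss_expectancy(asset_value: float, exposure_factor: float) -> float:
    return asset_value * exposure_factor

def annualised_loss_expectancy(sle: float, annual_rate_of_occurrence: float) -> float:
    return sle * annual_rate_of_occurrence

asset_value = 250_000.0   # e.g. replacement cost of a storage tank (assumed)
exposure_factor = 0.4     # fraction of the asset lost in a single incident (assumed)
aro = 0.1                 # expected incidents per year (assumed)

sle = single_loss_expectancy(asset_value, exposure_factor)
ale = annualised_loss_expectancy(sle, aro)
print(f"SLE = {sle:,.0f}, ALE = {ale:,.0f} per year")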
Some QRA software (such as Safety-nl) has not yet made the transition to 3D. For making certain assessments, this
has proven problematic: the contours of the landscape or the structure of buildings or machinery may influence the
path of fluids or gases which can pose a risk to public safety if accidentally released.
Some software also does not allow certain parameters to be entered. For example, Safety-nl does not allow wind speeds
of 0 to 0.5 m/s.
Also, some software does not accurately predict the behavior of some fluids or gases. For example, heavier-than-air
gases have a tendency to form a layer on the surface rather than blending spontaneously with the air; lighter-than-air
gases do the opposite, forming a layer higher up in the air.
[1] http://www.twisoftware.com/riskwise
[2] http://www.questreliability.com/Default.aspx?tabid=182
[3] QRA software (http://plein66.nl/documents/1024/artikel_CO2-opslag.pdf)
[4] http://www.processint.com
[5] Risktec QRA-software list (http://www.risktec.co.uk/GetBlob.aspx?TableName=Downloads&ColumnName=item&
[6] http://www.assetpartnership.com/html/Software.htm
[7] Offshore software (http://www.engr.mun.ca/~fkhan/EN-6601/QRA.doc)
[8] Natuurwetenschap & Techniek; April 2009; CO2 behavior not implemented in Safety-nl (http://plein66.nl/documents/1024/
Recognition (sociology)
Recognition in sociology is the public acknowledgement of a person's status or merits (achievements, virtues, service,
etc.). When some person is recognized, he or she is accorded some special status, such as a name, title, or
classification. Recognition can take many forms, such as mention in the mass media.
Historical examples
The Qianlong Emperor of China used large circular logos the size of a dinner plate to distinguish members of his
family from his Han subjects. Their symbol of privilege was a Mandarin square on their clothing.
It becomes easier for people to be accepted into some social process if they allow themselves to fit into a social
identity, as a signal that they implicitly accept some social norm. Thus the use of uniform dress is a signal for both
group inclusion and acceptance. Gangs use signals and dress for this purpose.
Dress codes and norms also occur for religious groups.
In employment
Recognition is a tool used by many successful organizations to increase productivity, communication, and satisfaction
in the workplace. Recognition can be used in multiple models, including manager-to-employee, employee-to-manager, and
peer-to-peer. In terms of employment, individuals within an organization can acknowledge each other for great
attitudes, individual efforts and team contributions that help build a great culture and positive work environment.
Recognition in the workplace can be a monetary activity, a complimentary activity, or both. In terms of monetary
activities, organizations will recognize employees with additional compensation (bonuses) or items that have a
monetary value (tickets, trips, etc.). In terms of complimentary activity, organizations will recognize employees
through avenues such as broadcasting (notice to fellow employees) or public recognition with a "thank you", "kudos",
or "congratulations".
Recognition Resources
A set of tools was developed in 2008 to help individuals better understand their Recognition Style. These tools can
be found at RecognizeAnother.com and provide visual representations of a team's overall Recognition Nature, an
indicator useful for leaders in determining healthy recognition team dynamics.
[1] http://www.RecognizeAnother.com/
Registration, Evaluation, Authorisation and Restriction of Chemicals
European Union regulation
Title: Regulation (EC) No 1907/2006 concerning the Registration, Evaluation, Authorisation and Restriction of Chemicals (REACH), establishing a European Chemicals Agency
Made by: European Parliament and Council
Made under: Art. 95 (EC)
Journal reference: L396, 30.12.2006, pp. 1–849
Made: 18 December 2006
Came into force: 1 June 2007
Preparative texts:
• Commission proposal: COM 2003/0644 Final
• EESC opinion: C112, 30.4.2004, p. 92; C294, 25.11.2005, pp. 38–44
• CR opinion: C164, 2005, p. 78
• EP opinion: 17 November 2005; 13 December 2006
Other legislation: Reg. (EEC) No 793/93; Reg. (EC) No 1488/94; Dir. 76/769/EEC; Dir. 91/155/EEC; Dir. 93/67/EEC; Dir. 93/105/EEC; Dir. 2000/21/EC; Dir. 1999/45/EC
Amended by: Reg. (EC) No 1272/2008
Status: Current legislation
Registration, Evaluation, Authorisation and Restriction of Chemicals (REACH) is a European Union Regulation of
18 December 2006. REACH addresses the production and use of chemical substances, and their potential impacts on both
human health and the environment. Its 849 pages took seven years to pass, and it has been described as the most
complex legislation in the Union's history and the most important in 20 years. It is the strictest law to date
regulating chemical substances and will affect industries throughout the world. REACH entered into force on
1 June 2007, with a phased implementation over the next decade.
European Chemicals Agency headquarters in Annankatu, Helsinki.
When REACH is fully in force, it will require all companies
manufacturing or importing chemical substances into the European
Union in quantities of one tonne or more per year to register these
substances with a new European Chemicals Agency (ECHA) in
Helsinki, Finland. Because REACH applies to some substances that are
contained in objects ('articles' in REACH terminology), any company
importing goods into Europe could be affected.
About 143,000 chemical substances marketed in the European Union were pre-registered by the 1 December 2008
deadline. Although pre-registering was not mandatory, it allows potential registrants much more time before they have
to fully register. Supply of substances to the European market which have not been pre-registered or registered is
illegal (known in REACH as "no data, no market").
REACH also addresses the continued use of chemical 'substances of very high concern' (SVHC) because of their
potential negative impacts on human health or the environment. From 1 June 2011, the European Chemicals Agency
must be notified of the presence of SVHCs in articles if the total quantity used is more than one tonne per year and
the SVHC is present at more than 0.1% of the mass of the object. Some uses of SVHCs may be subject to prior
authorisation from the European Chemicals Agency, and applicants for authorisation will have to include plans to
replace the use of the SVHC with a safer alternative (or, if no safer alternative exists, the applicant must work to find
one) - known as 'substitution'. As of March 2009, there are fifteen SVHCs.
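The notification rule described above combines two thresholds: more than one tonne per year of the SVHC in total, and
a concentration above 0.1% of the object's mass. The sketch below only illustrates how those two thresholds combine;
it is not an implementation of the regulation, which contains further conditions.

# Minimal sketch of the two SVHC notification thresholds described above:
# total quantity used > 1 tonne/year AND concentration > 0.1 % of the article's mass.
# Illustrative only; the actual legal test involves further conditions.

def svhc_notification_required(tonnes_per_year: float, mass_fraction_percent: float) -> bool:
    return tonnes_per_year > 1.0 and mass_fraction_percent > 0.1

print(svhc_notification_required(2.5, 0.3))   # True: both thresholds exceeded
print(svhc_notification_required(0.4, 0.3))   # False: below the 1 tonne/year threshold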
REACH applies to all chemicals imported or produced in the EU. The European Chemicals Agency will manage the
technical, scientific and administrative aspects of the REACH system.
To somewhat simplify the registration of the 143,000 substances and to limit vertebrate animal testing as far as
possible, Substance Information Exchange Forums (SIEFs) are formed amongst legal entities (such as manufacturers,
importers, and data holders) who are dealing with the same substance. This allows them to join forces and finances to
create one registration dossier. However, this creates a series of new problems, as a SIEF is a cooperation between
sometimes a thousand legal entities which did not know each other at all before but must suddenly:
• find each other and start communicating openly and honestly
• start sharing data
• start sharing costs in a fair and transparent way
• democratically and in full consensus take the most complex decisions
in order to complete a dossier covering several thousand endpoints in a limited time.
The European Commission supports businesses affected by REACH by handing out – free of charge – a software
application (IUCLID), which simplifies capturing, managing and submitting of data on chemical properties and
effects. Such submission is a mandatory part of the registration process. Under certain circumstances the
performance of a Chemical Safety Assessment (CSA) is mandatory and a Chemical Safety Report (CSR) assuring
the safe use of the substance has to be submitted with the dossier. Dossier submission is done using the web-based
software REACH-IT.
Registration, Evaluation, Authorisation and Restriction of Chemicals
REACH is the product of a wide-ranging overhaul of EU chemical policy. It passed the first reading in the European
Parliament on 17 November 2005, and the Council of Ministers reached a political agreement for a common position
on 13 December 2005. The European Parliament approved REACH on 13 December 2006 and the Council of
Ministers formally adopted it on 18 December 2006. Weighing expenditure against benefit has always been a significant
issue, with the estimated cost of compliance being around 5 billion euro over 11 years and the assumed health benefits
being billions of euro saved in healthcare costs. However, different studies of the estimated cost vary considerably
in their outcomes.
A separate regulation – the CLP Regulation (for "Classification, Labelling, Packaging") – implements the United
Nations Globally Harmonized System of Classification and Labelling of Chemicals (GHS) and will steadily replace
the previous Dangerous Substances Directive and Dangerous Preparations Directive. It came into force on 20
January 2009, and will be fully implemented by 2015.
Reason behind REACH
The legislation was proposed under dual reasoning: protection of human health and protection of the environment.
Using potentially toxic substances (such as phthalates or brominated flame retardants) is deemed undesirable and
REACH will force certain of these substances to be phased out. Using potentially toxic substances in
products other than those ingested by humans (such as electronic devices) may seem to be safe, but there are several
ways in which chemicals can enter the human body and the environment. Substances can leave articles during
consumer use, for example into the air where they can be inhaled or ingested. Even where they might not do direct
harm to humans, they can contaminate the air or water, and can enter the food chain through plants, fish or other
animals. According to the European Commission, little safety information exists for 99 percent of the tens of
thousands of chemicals placed on the market before 1981.
There were 100,106 chemicals in use in the EU in 1981, when the last survey was performed. Of these, only 3,000 have
been tested and over 800 are known to be carcinogenic, mutagenic or toxic to reproduction. These are listed in
Annex 1 of the Dangerous Substances Directive (now Annex 3 of the CLP Regulation).
Continued use of many toxic chemicals is sometimes justified because 'at very low levels they are not a concern to
health'. However, many of these substances may bioaccumulate in the human body, thus reaching dangerous
concentrations, and they may also chemically react with one another, producing new substances with new risks.
Apart from the potential costs to industry and the complexity of the new law, REACH has also attracted concern
because of the potential for a very significant increase in animal testing under the proposal.
Animal tests on vertebrates are allowed only once per substance and only where suitable alternatives cannot be used.
If a company pays for these tests, it must sell the rights to the results for a "reasonable" price (although this is
not defined). There are
additional concerns that access to the necessary information may prove very costly for potential registrants needing
to purchase this.
An opinion in Nature in 2009 by Thomas Hartung and Constanza Rovida estimated that 54 million vertebrate
animals would be used under REACH and that the costs would amount to 9.5 billion Euros.
Hartung is the former
head of European Centre for the Validation of Alternative Methods (ECVAM). ECHA responded by criticising the
assumptions made in Hartung and Rovida's calculations, causing them to overestimate the number of animals used
by a factor of 6.
On 8 June 2006, the REACH proposal came under criticism from a group of nations including the United States, India
and Brazil, which claimed that the bill would hamper global trade.
Only Representative
Non-EU consultancies offer "Only Representative" services, although under REACH it is not possible to register a
substance through an "Only Representative" consultancy that is not based in the EU, unless the work is subcontracted
to an EU-based registrant.
Only Representatives are EU-based entities that must comply with REACH (Article 8) and should operate standard,
transparent working practices.
The SIEFs will bring new challenges. An article in the business news service Chemical Watch described how some
‘pre-registrants’ may simply be consultants hoping for work (“gold diggers”) while others may be aiming to charge
exorbitant rates for the data they have to offer (“jackals”).
[1] http://eur-lex.europa.eu/LexUriServ/site/en/oj/2006/l_396/l_39620061230en00010849.pdf
[2] http://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=OJ:C:2004:112:0092:0099:EN:PDF
[3] http://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=OJ:C:2005:294:0038:0044:EN:PDF
[4] http://ecb.jrc.ec.europa.eu/legislation/1993R0793EC.pdf
[5] http://ecb.jrc.ec.europa.eu/legislation/1994R1488EC.pdf
[6] http://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=CELEX:31976L0769:EN:HTML
[7] http://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=CELEX:31991L0155:EN:HTML
[8] http://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=CELEX:31993L0067:EN:HTML
[9] http://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=CELEX:31967R0093:EN:HTML
[10] http://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=OJ:L:2000:103:0070:0071:EN:PDF
[11] http://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=OJ:L:1999:200:0001:0068:EN:PDF
[12] Full title: Regulation (EC) No 1907/2006 of the European Parliament and of the Council of 18 December 2006 concerning the Registration, Evaluation, Authorisation and Restriction of Chemicals (REACH), establishing a European Chemicals Agency.
[13] "EU's REACH chemicals law begins life in Helsinki" (http://euobserver.com/9/24169). EUobserver.com. 31 May 2007.
[14] "Q&A: Reach chemicals legislation" (http://news.bbc.co.uk/1/hi/world/europe/4437304.stm). BBC News. 28 November 2005.
[15] Cone, Marla (14 December 2006). "European Parliament OKs world's toughest law on toxic chemicals" (http://www.sfgate.com/cgi-bin/article.cgi?f=/c/a/2006/12/14/MNGR2MV8UT1.DTL&hw=toxic+chemicals&sn=001&sc=1000). San Francisco Chronicle.
[16] "ECHA Website - Candidate List" (http://echa.europa.eu/chem_data/candidate_list_en.asp). Retrieved 2009-01-25.
[17] http://echa.europa.eu/sief_en.asp
[18] "EU backs landmark chemicals law" (http://news.bbc.co.uk/2/hi/europe/4524772.stm). BBC News. 13 December 2005.
[19] "Pesticides 'in a third of foods'" (http://news.bbc.co.uk/1/hi/health/5384138.stm). BBC News. 27 September 2006.
[20] "Food chemicals 'may harm humans'" (http://news.bbc.co.uk/1/hi/uk/5366028.stm). BBC News. 21 September 2006.
[21] "REACH - EU Chemicals Testing" (http://www.buav.org/campaigns/chemicals/). British Union for the Abolition of Vivisection. Retrieved 2006-06-09.
[22] T. Hartung & C. Rovida: Chemical regulators have overreached. Opinion in Nature, vol. 460, 27 August 2009.
[23] ECHA - New study inaccurate on the number of test animals for REACH (http://echa.europa.eu/doc/press/pr_09_11_animal_testing_20090828.pdf). Helsinki, 28 August 2009.
[24] Beunderman, Mark (June 9, 2006). "EU chemicals bill under fire from US-led coalition" (http://euobserver.com/9/21813). EUobserver.com.
[25] "'Gold-diggers', 'jackals' and other issues for REACH SIEFs" (http://chemicalwatch.com/1486). Chemical Watch. December 2008.
External links
• List of REACH (Pre-)Registered substances (http://www.reachteam.eu/reg/) - online search helper
• European Chemicals Agency (http://echa.europa.eu/) - the organization responsible for implementing REACH
• European Commission: "What is REACH" (http://ec.europa.eu/environment/chemicals/reach.htm)
• European Commission REACH & GHS overview (http://ec.europa.eu/enterprise/reach/index_en.htm) - for enterprise and industry
• Database of REACH consortia (http://chemicalwatch.com/REACH_consortia) - Chemical Watch
• REACH Romania (http://www.reach-romania.com) - consulting for producers and importers of chemical substances
Review
A review is an evaluation of a publication, a product or a service, such as a movie (a movie review), video game,
musical composition (music review of a composition or recording), book (book review); a piece of hardware like a
car, home appliance, or computer; or an event or performance, such as a live music concert, a play, musical theater
show or dance show. In addition to a critical evaluation, the review's author may assign the work a rating to indicate
its relative merit. More loosely, an author may review current events, trends, or items in the news. A compilation of
reviews may itself be called a review. The New York Review of Books, for instance, is a collection of essays on
literature, culture, and current affairs. National Review, founded by William F. Buckley, Jr., is an influential
conservative magazine, and Monthly Review is a long-running socialist periodical.
In the scientific literature, review articles are a category of scientific paper, which provides a synthesis of research on
a topic at that moment in time. A compilation of these reviews forms the core content of a 'secondary' scientific
journal, with examples including Annual Reviews, the Nature Reviews series of journals and Trends. A peer review
is the process by which scientists assess the work of their colleagues that has been submitted for publication in the
scientific literature. A software review is also a form of peer review, by the co-workers.
A consumer review refers to a review written by the owner of a product or the user of a service who has sufficient
experience to comment on reliability and whether or not the product or service delivers on its promises, otherwise
known as a product review. An expert review usually refers to a review written by someone who has tested several
peer products or services to identify which offers the best value for money or the best set of features. A bought
review is the system where the creator (usually a company) of a new product pays a reviewer to review the new product.
Book review
A book review (or book report) is a form of literary criticism in which a book is analyzed based on content, style,
and merit. It is often carried out in periodicals, as school work, or online. Its length may vary from a single paragraph
to a substantial essay. Such a review often contains evaluations of the book on the basis of personal taste. Reviewers,
in literary periodicals, often use the occasion of a book review for a display of learning or to promulgate their own
ideas on the topic of a fiction or non-fiction work. At the other end of the spectrum, some book reviews resemble
simple plot summaries.
Music reviews
Performance reviews
Reviews of live music performances are typically short articles that tell readers about the performers or group(s) that
were involved and the pieces or songs that were performed. The comments made by reviewers fall roughly into two
categories: technical comments and subjective/artistic comments. The elements in the "technical" category include
rhythmic "togetherness", intonation, errors or slip-ups, and so on. These elements are fairly "black and white"; a
pianist playing a concerto either played the right notes on a climactic scale run, or she missed it. The subjective
comments refer to elements which are a matter of taste. The balance between the different elements in a review
(information about the performer or group; information about the pieces/songs; commentary about the technical and
subjective elements of the performance) depends on the audience that a music critic is writing for. Music reviewers
writing in local newspapers or general-interest magazines may not be able to assume that the readers will be familiar
with music performers and pieces/songs, so they may decide to include a great deal of "background" information.
Recording reviews
Music critics and music writers also review recordings of music, including individual songs or pieces or entire
albums. In the case of a review of an entire album, the reviewer will not only judge the individual songs or pieces;
they will also judge how well all of the songs or pieces work together or go together.
The age of digital downloads may considerably change the album review. Where previously albums were purchased
as collections of songs, often with a common theme, the rise of individual song downloads may have significant
impact on consumers' exposure to an artist's music. Die-hard fans will most likely continue to explore an artist's
complete work; but individuals will most likely make significantly different choices and "cherry-pick" songs they
have been exposed to. The concept of "singles" or individual hits marketed for retail has been around for a long
time; however, the price of a single in the days of CDs or 45s was much closer to the complete album price.
Considering that each song on an artist's album is often priced at the same amount, the odds of the average consumer
purchasing the entire album instead of selecting the "hit" songs decrease significantly.
Composition reviews
In Classical music, music critics may also do reviews of compositions, even if the piece or song has never been
performed and it only exists on manuscript paper in a score. To review a composition in this fashion, the critic will
use music theory skills such as harmonic analysis and thematic analysis, along with their knowledge of idioms and
compositional practices.
Bought review
A bought review is the system where the creator (usually a company) of a new product pays a reviewer to review his
new product. Primarily used in the car, movie and game industries, this system creates a kind of undercover
advertising. Bought reviews tend to be biased due to the financial relationship the reviewer has with the makers of
the product or item that is being evaluated, although exceptions occur. Nielsen (2009) proposes a framework for
scholarly or academic reviews of dictionaries, but this framework may also be extended to other types of reviews; in
particular, it requires that reviews provide useful information to the intended audience and sets requirements for the
informative value of reviews. In some cases, a bought review may be independent, if the person
that is hired to do the review has a strong reputation for independence and integrity. Even if a "bought review" from
a respected critic is actually independent, the perception of potential bias will remain, due to the financial
relationship between the company and the critic.
A similar type of review that may be biased is the so-called "puff piece", a review of a product, film, or event that is
written by a sympathetic reviewer or by an individual who has a connection to the product or event in question,
either in terms of an employment relationship or other links. For example, a major media conglomerate that owns
both print media and record companies may instruct one of its employees in one of its newspapers to do a review of
an album which is being released by the conglomerate's record company. Although some journalists may assert their
professional independence and integrity, and insist on producing an unbiased review, in other cases, writers may
succumb to the pressure and pen a biased "puff piece" which praises the product or event while omitting any
discussion of any shortcomings. In some cases, "puff pieces" purport to provide a review of the product or event, but
instead merely provide "peacock words" ("An amazing recording"); "weasel words" ("probably one of the most
important albums of the 2000s") and tabloid-style filler which is peripheral or irrelevant to assessing the qualities of
the product or event ("During the filming, there were rumours that romantic sparks flew between the two co-leads,
who were often seen talking together on the set").
Nielsen, S. (2009), “Reviewing printed and electronic dictionaries: A theoretical and practical framework”, in S.
Nielsen/S. Tarp (eds.): Lexicography in the 21st Century. Amsterdam/Philadelphia: John Benjamins 2009, 23-41.
Risk assessment
Risk assessment is a step in a risk management procedure. Risk assessment is the determination of the quantitative or
qualitative value of risk related to a concrete situation and a recognized threat (also called a hazard). Quantitative
risk assessment requires calculation of the two components of risk (R): the magnitude of the potential loss (L), and
the probability (p) that the loss will occur.
In all types of engineering of complex systems, sophisticated risk assessments are often made within safety
engineering and reliability engineering when they concern threats to life, the environment or machine functioning.
The nuclear, aerospace, oil, rail and military industries in particular have a long history of dealing with risk
assessment. Medical industries, hospitals and food industries also control risks and perform risk assessments on a
continuous basis. Methods for the assessment of risk may differ between industries and according to whether they
concern general financial decisions or environmental, ecological, or public health risk assessment.
Risk assessment consists of an objective evaluation of risk in which assumptions and uncertainties are clearly
considered and presented. Part of the difficulty of risk management is that both of the quantities with which risk
assessment is concerned - potential loss and probability of occurrence - can be very difficult to measure, and the
chance of error in measuring them is large. A risk with a large potential loss and a low probability of occurring is
often treated differently from one with a low potential loss and a high likelihood of occurring. In theory, both are
of nearly equal priority, but in practice it can be very difficult to manage them when faced with the scarcity of
resources, especially time, in which to conduct the risk management process.
Expressed mathematically, risk is the product of the probability that the loss will occur and the magnitude of the
potential loss: R = p × L.
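The probabilities and losses in the sketch below are invented; they only illustrate how the product p × L lets a rare,
high-loss risk and a frequent, low-loss risk be compared on a single expected-loss scale.

# Minimal sketch: expected loss R = p * L for two hypothetical risks,
# illustrating why very different risk profiles can have similar expected values.

def expected_loss(probability: float, loss: float) -> float:
    return probability * loss

rare_catastrophe = expected_loss(probability=0.001, loss=5_000_000)   # 5,000
frequent_nuisance = expected_loss(probability=0.5, loss=10_000)       # 5,000

print(rare_catastrophe, frequent_nuisance)  # equal expected losses, very different risks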
Risk assessment from a financial point of view.
Financial decisions, such as insurance,
express loss in terms of dollar amounts.
When risk assessment is used for public
health or environmental decisions, loss can
be quantified in a common metric, such as a
country's currency, or some numerical
measure of a location's quality of life. For
public health and environmental decisions, loss may simply be a verbal description of the outcome, such as increased
cancer incidence or incidence of birth defects. In that case, the "risk" is expressed as the probability that the
outcome will occur.
If the risk estimate takes into account
information on the number of individuals
exposed, it is termed a "population risk" and
is in units of expected increased cases per a
time period. If the risk estimate does not
take into account the number of individuals
exposed, it is termed an "individual risk" and is in units of incidence rate per time period. Population risks are of
more use for cost/benefit analysis; individual risks are of more use for evaluating whether risks to individuals are
"acceptable".
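The distinction between individual and population risk comes down to multiplying an individual incidence rate by the
number of people exposed. The figures in the sketch below are invented.

# Minimal sketch: converting an individual risk (incidence rate per person per year)
# into a population risk (expected cases per year). Figures are invented.

individual_risk_per_year = 1e-6     # assumed excess risk per person per year
people_exposed = 2_000_000          # assumed size of the exposed population

expected_cases_per_year = individual_risk_per_year * people_exposed
print(f"Expected additional cases per year: {expected_cases_per_year:.1f}")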
Risk assessment in public health
In the context of public health, risk assessment is the process of quantifying the probability of a harmful effect to
individuals or populations from certain human activities. In most countries, the use of specific chemicals or the
operation of specific facilities (e.g. power plants, manufacturing plants) is not allowed unless it can be shown that
they do not increase the risk of death or illness above a specific threshold. For example, the American Food and
Drug Administration (FDA) regulates food safety through risk assessment.
The FDA required in 1973 that
cancer-causing compounds must not be present in meat at concentrations that would cause a cancer risk greater than
1 in a million lifetimes. The US Environmental Protection Agency provides basic information about environmental
risk assessments for the public via its risk assessment portal.
How the risk is determined
In the estimation of the risks, three or more steps are involved, requiring the inputs of different disciplines:
1. Hazard Identification aims to determine the qualitative nature of the potential adverse consequences of the
contaminant (chemical, radiation, noise, etc.) and the strength of the evidence that it can have that effect. This is done,
for chemical hazards, by drawing from the results of the sciences of toxicology and epidemiology. For other kinds
of hazard, engineering or other disciplines are involved.
2. Dose-Response Analysis determines the relationship between dose and the probability or the incidence of
effect (dose-response assessment). The complexity of this step in many contexts derives mainly from the need to
extrapolate results from experimental animals (e.g. mouse, rat) to humans, and/or from high to lower doses. In
addition, the differences between individuals due to genetics or other factors mean that the hazard may be higher
for particular groups, called susceptible populations. An alternative to dose-response estimation is to determine a
dose unlikely to yield observable effects, that is, a no-effect concentration. In developing such a dose, to account
for the largely unknown effects of animal-to-human extrapolation, increased variability in humans, or missing
data, a prudent approach is often adopted by including safety factors in the estimate of the "safe" dose, typically a
factor of 10 for each unknown step (a minimal numeric sketch of this calculation follows this list).
3. Exposure Quantification aims to determine the amount of a contaminant (dose) that individuals and populations
will receive. This is done by examining the results of the discipline of exposure assessment. As different locations,
lifestyles and other factors likely influence the amount of contaminant that is received, a range or distribution of
possible values is generated in this step. Particular care is taken to determine the exposure of the susceptible
population(s).
Finally, the results of the three steps above are then combined to produce an estimate of risk. Because of the different
susceptibilities and exposures, this risk will vary within a population.
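A minimal sketch of how these pieces might be combined numerically is shown below; the dose levels, safety factors and exposure figure are invented for illustration and do not come from any regulatory guidance.

```python
# Toy reference-dose and risk-screening calculation (illustrative numbers only).

noael_animal = 50.0        # mg/kg/day: no-observed-adverse-effect level from an animal study (assumed)
safety_factors = [10, 10]  # one factor of 10 per unknown step: animal-to-human, human variability

reference_dose = noael_animal
for factor in safety_factors:
    reference_dose /= factor          # apply each safety factor in turn -> 0.5 mg/kg/day

estimated_exposure = 0.2              # mg/kg/day received by the population of interest (assumed)

# Simple screening comparison: an exposure below the reference dose is treated
# in this toy example as indicating no appreciable risk.
hazard_quotient = estimated_exposure / reference_dose
print(f"Reference dose: {reference_dose} mg/kg/day, hazard quotient: {hazard_quotient}")
```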
Small subpopulations
When risks apply mainly to small subpopulations, there is uncertainty about the point at which intervention becomes
necessary. What if a risk is very low for everyone except 0.1% of the population? It makes a difference whether this
0.1% consists of all infants younger than X days or of recreational users of a particular product. If the risk is higher
for a particular sub-population because of abnormal exposure rather than susceptibility, there is a potential to
consider strategies to further reduce the exposure of that subgroup. If an identifiable sub-population is more
susceptible due to inherent genetic or other factors, there is a policy choice: whether to set policies for protecting
the general population that are also protective of such groups (as is currently done for children when data exist, or
under the Clean Air Act for populations such as asthmatics), or whether to conclude that separate protection is not
warranted because the group is too small or the costs too high.
Acceptable risk increase
The idea of not increasing lifetime risk by more than one in a million has become commonplace in public health
discourse and policy. How consensus settled on this particular figure is unclear. In some respects, this figure has the
characteristics of a mythical number. In another sense, the figure provides a numerical basis for what to consider a
negligible increase in risk. Some current environmental decision making allows some discretion to deem individual
risks potentially "acceptable" if below one in ten thousand increased lifetime risk. Low risk criteria such as these do
provide some protection for the case that individuals may be exposed to multiple chemicals (whether pollutants or
food additives, or other chemicals). But both of these benchmarks are clearly small relative to the typical one in four
lifetime risk of death by cancer (due to all causes combined) in developed countries. On the other hand, adoption of a
zero-risk policy could be motivated by the fact that the 1 in a million policy still would cause the death of hundreds
or thousands of people in a large enough population. In practice however, a true zero-risk is possible only with the
suppression of the risk-causing activity.
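To make the scale of that point concrete, the back-of-the-envelope sketch below uses an assumed population size; it is purely illustrative.

```python
# Back-of-the-envelope: expected deaths under a 1-in-a-million lifetime risk threshold.
acceptable_lifetime_risk = 1e-6   # policy threshold
population = 300_000_000          # assumed population of a large country

expected_deaths = acceptable_lifetime_risk * population
print(f"Expected lifetime deaths at the threshold: {expected_deaths:.0f}")  # -> 300
```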
More stringent requirements, or even the 1 in a million one, may not be technologically feasible at a given time, or
so expensive as to render the risk-causing activity unsustainable, resulting in the optimal degree of intervention being
a balance between risks vs. benefit. For example, it might well be that the emissions from hospital incinerators result
in a certain number of deaths per year. However, this risk must be balanced against the available alternatives. In
some unusual cases, there are significant public health risks, as well as economic costs, associated with all options.
For example, there are risks associated with no incineration (with the potential risk for spread of infectious diseases)
or even no hospitals. But, often further investigation identifies further options, such as separating noninfectious from
infectious wastes, or air pollution controls on a medical incinerator, that provide a broad range of options of
acceptable risk - though with varying practical implications and varying economic costs. Intelligent thought about a
reasonably full set of options is essential. Thus, it is not unusual for there to be an iterative process between analysis,
consideration of options, and then further analysis.
Risk assessment in auditing
In auditing, risk assessment is a crucial stage before accepting an audit engagement. According to ISA 315,
Understanding the Entity and its Environment and Assessing the Risks of Material Misstatement, "the auditor should
perform risk assessment procedures to obtain an understanding of the entity and its environment, including its
internal control." These procedures provide evidence relating to the auditor's risk assessment of a material
misstatement in the client's financial statements. The auditor then obtains initial evidence regarding the classes of
transactions at the client and the operating effectiveness of the client's internal controls. In auditing, audit risk
includes inherent risk, control risk and detection risk.
Risk assessment and human health
There are many resources that provide health risk information. The National Library of Medicine provides risk
assessment and regulation information tools for a varied audience.
These include TOXNET (databases on
hazardous chemicals, environmental health, and toxic releases),
the Household Products Database (potential health
effects of chemicals in over 10,000 common household products),
and TOXMAP (maps of US Environmental Protection Agency Superfund and Toxics Release Inventory data). The
United States Environmental Protection Agency
provides basic information about environmental risk assessments for the public.
Risk assessment in information security
IT risk assessment can be performed by a qualitative or quantitative approach, following different methodologies.
Risk assessment in project management
In project management, risk assessment is an integral part of the risk management plan, studying the probability, the
impact, and the effect of every known risk on the project, as well as the corrective action to take should that risk
occur.
Risk assessment for megaprojects
Megaprojects (sometimes also called "major programs") are extremely large-scale investment projects, typically
costing more than US$1 billion per project. Megaprojects include bridges, tunnels, highways, railways, airports,
seaports, power plants, dams, wastewater projects, coastal flood protection, oil and natural gas extraction projects,
public buildings, information technology systems, aerospace projects, and defence systems. Megaprojects have been
shown to be particularly risky in terms of finance, safety, and social and environmental impacts. Risk assessment is
therefore particularly pertinent for megaprojects and special methods and special education have been developed for
such risk assessment.

Quantitative risk assessment
Further information: Quantitative Risk Assessment software
Quantitative risk assessments include a calculation of the single loss expectancy (SLE) of an asset. The single loss
expectancy can be defined as the loss of value to an asset resulting from a single security incident. The team then
calculates the annualized rate of occurrence (ARO) of the threat to the asset: an estimate, based on historical data, of
how often a threat would be successful in exploiting a vulnerability. From this information, the annualized loss
expectancy (ALE) can be calculated. The annualized loss expectancy is the single loss expectancy multiplied by the
annualized rate of occurrence, i.e. how much an organization could expect to lose from the asset per year given the
risks, threats, and vulnerabilities. It then becomes possible from a financial perspective to justify expenditures to
implement countermeasures to protect the asset.
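A minimal sketch of this calculation is given below; the asset value, exposure factor and occurrence rate are assumptions for illustration, not figures from any standard.

```python
# Toy annualized loss expectancy (ALE) calculation with illustrative inputs.

asset_value = 100_000.0   # value of the asset in dollars (assumed)
exposure_factor = 0.25    # fraction of the asset's value lost in one incident (assumed)

single_loss_expectancy = asset_value * exposure_factor   # SLE
annualized_rate_of_occurrence = 0.5                      # ARO: one incident expected every two years (assumed)

annualized_loss_expectancy = single_loss_expectancy * annualized_rate_of_occurrence  # ALE = SLE * ARO
print(f"SLE = ${single_loss_expectancy:,.0f}, ALE = ${annualized_loss_expectancy:,.0f}")
# A countermeasure costing less per year than the ALE reduction it buys can be justified financially.
```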
Risk assessment in software evolution
Further information: ACM A Formal Risk Assessment Model for Software Evolution
Studies have shown that early parts of the system development cycle such as requirements and design specifications
are especially prone to error. This effect is particularly notorious in projects involving multiple stakeholders with
different points of view. Evolutionary software processes offer an iterative approach to requirement engineering to
alleviate the problems of uncertainty, ambiguity and inconsistency inherent in software developments.
Criticisms of quantitative risk assessment
Barry Commoner, Brian Wynne and other critics have expressed concerns that risk assessment tends to be overly
quantitative and reductive. For example, they argue that risk assessments ignore qualitative differences among risks.
Some charge that assessments may drop out important non-quantifiable or inaccessible information, such as
variations among the classes of people exposed to hazards. Furthermore, Commoner and O'Brien claim that
quantitative approaches divert attention from precautionary or preventative measures.
Others, like Nassim Nicholas Taleb, consider risk managers little more than "blind users" of statistical tools and
methods.
[1] Merrill, Richard A. "Food Safety Regulation: Reforming the Delaney Clause" in Annual Review of Public Health, 1997, 18:313-40. This
source includes a useful historical survey of prior food safety regulation.
[2] EPA.gov (http://www.epa.gov/risk/)
[3] SIS.nlm.nih.gov (http://sis.nlm.nih.gov/enviro/riskinformation.html)
[4] Toxnet.nlm.nih.gov (http://toxnet.nlm.nih.gov)
[5] HPD.nlm.nih.gov (http://hpd.nlm.nih.gov/)
[6] EPA.gov (http://www.epa.gov/risk/)
[7] Managing Project Risks (http://www.pmhut.com/managing-project-risks) - Retrieved May 20th, 2010
[8] Bent Flyvbjerg, Nils Bruzelius, and Werner Rothengatter, 2003, Megaprojects and Risk: An Anatomy of Ambition (Cambridge University Press).
[9] Oxford BT Centre for Major Programme Management
[10] Commoner, Barry. O'Brien, Mary. Shrader-Frechette and Westra 1997.
[11] The fourth quadrant: a map of the limits of statistics [9.15.08] Nassim Nicholas Taleb, An Edge Original Essay
General references
• Committee on Risk Assessment of Hazardous Air Pollutants, Board on Environmental Studies and Toxicology, Commission on Life Sciences, National Research Council (1994), Science and judgment in risk assessment (http://books.google.com/books?id=k9mKUyfHakcC), Washington, D.C: National Academy Press, ISBN 0-309-04894-X, retrieved 27 September 2010
• Barry Commoner. "Comparing apples to oranges: Risk of cost/benefit analysis" from Contemporary moral controversies in technology, A. P. Iannone, ed., pp. 64–65.
• Flyvbjerg, Bent, "From Nobel Prize to Project Management: Getting Risks Right." Project Management Journal, vol. 37, no. 3, August 2006, pp. 5-15. (http://flyvbjerg.plan.aau.dk/Publications2006/Nobel-PMJ2006.pdf)
• Hallenbeck, William H. Quantitative risk assessment for environmental and occupational health. Chelsea, Mich.: Lewis Publishers, 1986
• Harremoës, Poul, ed. Late lessons from early warnings: the precautionary principle 1896–2000.
• John M. Lachin. Biostatistical methods: the assessment of relative risks.
• Lerche, Ian; Glaesser, Walter (2006), Environmental risk assessment: quantitative measures, anthropogenic influences, human impact (http://books.google.com/books?id=qB54qgpA_fEC), Berlin: Springer, ISBN 3-540-26249-0, retrieved 27 September 2010
• Kluger, Jeffrey (November 26, 2006), "How Americans Are Living Dangerously" (http://www.time.com/time/magazine/article/0,9171,1562978,00.html), Time, retrieved 27 September 2010. Also published as December 4 cover title: "Why We Worry About the Wrong Things: The Psychology of Risk" (http://www.time.com/time/magazine/0,9263,7601061204,00.html)
• Library of Congress. Congressional Research Service. & United States. Congress. House. Committee on Science and Technology. Subcommittee on Science, Research, and Technology (1983), A Review of risk assessment methodologies, Washington: U.S: report / prepared by the Congressional Research Service, Library of Congress for the Subcommittee on Science, Research, and Technology; transmitted to the Committee on Science and Technology, U.S. House of Representatives, Ninety-eighth Congress, first session
• Deborah G. Mayo. "Sociological versus metascientific views of technological risk assessment" in Shrader-Frechette and Westra.
• Nyholm, J, 2009 "Persistency, bioaccumulation and toxicity assessment of selected brominated flame retardants (http://umu.diva-portal.org/smash/get/diva2:216812/FULLTEXT01)"
• O'Brien, Mary (2002), Making better environmental decisions: an alternative to risk assessment (http://books.google.com/books?id=LtCOEN9HWIcC), Cambridge, Massachusetts: MIT Press, ISBN 0-262-15051-4, retrieved 27 September 2010. Paperback ISBN 0-262-65053-3
• Shrader-Frechette, Kristin; Westra, Laura, eds. (1997), Technology and values (http://books.google.com/books?id=y5BfvU6uMQMC), Lanham, Maryland: Rowman & Littlefield, ISBN 0-8476-8631-0, retrieved 27 September 2010
External links
• Risk Assessment Worksheet and Management Plan (http://www.pmhut.com/wp-content/uploads/2008/01/risk_management.pdf) A comprehensive guide to risk assessment in project management, includes template - By John Filicetti
Risk Matrix
A Risk Matrix is used in the risk assessment process; it allows the severity of the risk of an event occurring to be
determined by weighing the likelihood of the event against the severity of its consequence.
A risk is the total of each of the hazards that contribute to it. The risk of any particular hazard, H, can be defined as
its probability, p, multiplied by its consequence, c. In layman's terms: how likely it is to happen and how bad it
would be if it happened.
Therefore the total risk, R, of an event, e, is the sum over the n potential hazards that would result in that event:
R(e) = p(H1) × c(H1) + p(H2) × c(H2) + ... + p(Hn) × c(Hn)
The Consequences can be defined as:
• Catastrophic - Multiple Deaths
• Critical - One Death or Multiple Severe Injuries
• Marginal - One Severe Injury or Multiple Minor Injuries
• Negligible - One Minor Injury
The Probability is identified as 'Certain', 'Likely', 'Possible', 'Unlikely' and 'Rare'. However, it must be borne in
mind that estimates of very low probabilities may not be very reliable.
An example Risk Matrix would be as follows:
Negligible Marginal Critical Catastrophic
Certain High High Extreme Extreme
Likely Moderate High High Extreme
Possible Low Moderate High Extreme
Unlikely Low Low Moderate Extreme
Rare Low Low Moderate High
The company or organization then would decide what level of risk it can accept for different events. This would be
done by weighing the risk of an event occurring against the cost of implementing safety measures and the benefit
gained from them.
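The sketch below illustrates one way such a qualitative lookup might be coded; the category names and ratings mirror the example matrix above, while the data structure itself is only an illustrative assumption.

```python
# Illustrative risk-matrix lookup mirroring the example table above.

RISK_MATRIX = {
    "Certain":  {"Negligible": "High",     "Marginal": "High",     "Critical": "Extreme",  "Catastrophic": "Extreme"},
    "Likely":   {"Negligible": "Moderate", "Marginal": "High",     "Critical": "High",     "Catastrophic": "Extreme"},
    "Possible": {"Negligible": "Low",      "Marginal": "Moderate", "Critical": "High",     "Catastrophic": "Extreme"},
    "Unlikely": {"Negligible": "Low",      "Marginal": "Low",      "Critical": "Moderate", "Catastrophic": "Extreme"},
    "Rare":     {"Negligible": "Low",      "Marginal": "Low",      "Critical": "Moderate", "Catastrophic": "High"},
}

def rate_risk(probability: str, consequence: str) -> str:
    """Return the qualitative risk rating for a probability/consequence pair."""
    return RISK_MATRIX[probability][consequence]

print(rate_risk("Possible", "Critical"))  # -> "High"
```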
An Example
The risk of being crushed may be made up of the hazard of being crushed by a car hitting you, the hazard of having a
piano dropped on you and the hazard of being in the path of a stampede. Each hazard has a probability and a
consequence. In this example the probability of being hit by a car is much greater than that of being hit by a piano or
a stampede. However, the consequence of being hit by a car is less than that of finding yourself under a piano.
These are shown in the table below with a few others:
Negligible Marginal Critical Catastrophic
Certain Busy Street
Likely Limo
Possible Piano
Unlikely Stampede
Rare Locust Swarm Burst Dam
Problems with Risk Matrix
In his article 'What's Wrong with Risk Matrices?', Tony Cox argues that risk matrices exhibit several problematic
mathematical features that make it harder to assess risks. These are:
• Poor Resolution. Typical risk matrices can correctly and unambiguously compare only a small fraction (e.g., less
than 10%) of randomly selected pairs of hazards. They can assign identical ratings to quantitatively very different
risks (“range compression”).
• Errors. Risk matrices can mistakenly assign higher qualitative ratings to quantitatively smaller risks. For risks
with negatively correlated frequencies and severities, they can be “worse than useless,” leading to
worse-than-random decisions.
• Suboptimal Resource Allocation. Effective allocation of resources to risk-reducing countermeasures cannot be
based on the categories provided by risk matrices.
• Ambiguous Inputs and Outputs. Categorizations of severity cannot be made objectively for uncertain
consequences. Inputs to risk matrices (e.g., frequency and severity categorizations) and resulting outputs (i.e., risk
ratings) require subjective interpretation, and different users may obtain opposite ratings of the same quantitative
risks. These limitations suggest that risk matrices should be used with caution, and only with careful explanations
of embedded judgments.
[1] Cox, L.A. Jr., 'What's Wrong with Risk Matrices?', Risk Analysis, Vol. 28, No. 2, 2008, DOI: 10.1111/j.1539-6924.2008.01030.x
Scale of one to ten
A scale of one to ten or scale from one to ten is a general and largely vernacular concept used for rating things,
people, places, ideas, and so on. It is the most popular choice of scale in ordinary speech, followed by scales of one
to five and one to four. Scales to four or five are more likely to be represented by conceptual (or pictorial) "stars",
especially when used to rate books, films, music albums, concerts, etc., and especially by the media. These can also
include scores in between integers to give a more precise rating. Scales from one to five are commonly used to rate
hotels, and in this use it would be rare not to refer to them as "stars". From this we derive the idiomatic adjective
"five-star", meaning "first-rate". The choices of scales to five and to ten seem to be influenced by the number of
fingers on a hand, and by the usual use of ten as a numerical base, which in turn derives from the number of fingers
on two hands.
Significance of 1
The lower end of any scale of this kind is normally represented by 1 and is usually held to mean "awful" in some
sense of the word. This vaguely implies that 5 represents neutrality, an average state, or some degree of indifference.
Depending on what it is being rated, it could be all of these, it could be only one, or conceivably the scale could be
distorted so that 5 had very little specific meaning. Furthermore, the middle of the scale is usually conceived in this
mould, because the scale is most importantly set by its extremities—the concept of the one-to-ten scale being used as
might any other word or expression as a tool of the language. That is, the fact that a scale of one to ten is used so
much is what makes it desirable to re-use again and again, because it is the very function of language to provide us
with signs and symbols that can be used to mean something, without having to explain what that meaning is every
time we do so.
It is worth noting that people use 1, and not 0, as the lower extremity of virtually any scale of this kind, even though
0 could easily be used instead and would be equally intuitive. It is also common for people to extend the scale
beyond its nominal range, in the same way that the Burj al-Arab hotel in Dubai is commonly described as the
world's only "seven-star" hotel.
Significance of 10
The upper end of the scale from one to ten is normally represented by 10 and is usually held to mean "excellent" or
"perfection". This is because 10 (out of a possible 10) implies that there is no fault at all with what might merit such
a mark.
Contextual usage
The numbers one through ten are colloquially used as nouns, as in the example: "The performance was at best a two"
or "What a beauty—she's a ten!" The extreme ends of 1 and 10 are used the most in this way. One can also say, for
example, "I'd give that salesman a five", meaning a mediocre rating.
External links
• Onetoten.org - a site where users rate and post reviews on a scale of one to ten
[1] http://www.onetoten.org
SDET
SDET is a benchmark used in the systems software research community for measuring the throughput of a
multi-user computer operating system.
Its name stands for SPEC Software Development Environment Throughput (SDET), and is packaged along with
Kenbus in the SPEC SDM91 benchmark.
A more modern benchmark related to SDET is the reaim package, which is itself an up-to-date implementation of
the venerable AIM Multiuser Benchmark.
Sources and external links
• SDM91
• Perspectives on the SPEC SDET Benchmark
[1] http://www.spec.org/sdm91/
[2] http://www.spec.org/sdm91/sdet/SDETPerspectives.html
Self-evaluation motives
The self-enhancement motive states that people engage in self-evaluation not only to improve the positivity of their
self-conceptions, but also to protect the self from negative information (they search for positivity and avoid
negativity). In order to do this, people process information important to the self in a selective manner (for instance,
by focusing on information that has favourable implications for the self and discarding information with
unfavourable implications for the self). People also choose to compare themselves socially to others so as to be
placed in a favourable position. By doing this, people seek to boost the positivity of the self or decrease its
negativity, aiming to make others see them as socially desirable, hence increasing their levels of self-esteem.
The self-assessment motive is based on the assumption that people want to have an accurate and objective evaluation
of the self. To achieve this goal, they work to reduce any uncertainty about their abilities or personality traits.
Feedback is sought to increase the accuracy and objectivity of previously formed self-conceptions. This is regardless
of whether the new information confirms or challenges the previously existing self-conceptions.
The self-verification motive asserts that what motivates people to engage in the self-evaluation process is the desire
to verify their pre-existing self-conceptions, maintaining consistency between their previously formed
self-conceptions and any new information that could be important to the self (feedback). By doing this, people gain
a sense of control and predictability in the social world.

The self-enhancement motive states that people want to see themselves favourably. It follows that people should
choose tasks with a positive valence, regardless of task diagnosticity (this motive is more active in the presence of
tasks high in diagnosticity of success than in the presence of tasks high in diagnosticity of failure). Tasks that
disclose a failure and negative feedback are considered less important than tasks with an outcome of success or
positive feedback. As a result, the latter are processed faster and more thoroughly, and remembered better, than the
former.
Each motive gives rise to a different type of response (cognitive, affective or behavioural). The self-enhancement
motive creates both affective and cognitive responses. In terms of affective responses, negative feedback leads to
less positive affect than positive feedback does. This is moderated by trait modifiability, in the sense that the effect
is especially pronounced for unmodifiable traits. In terms of cognitive responses, favourable feedback is judged to
be more accurate than unfavourable feedback, but only in the case of modifiable traits.
The self-assessment motive postulates that people want to have an accurate view of their abilities and personality
traits. Hence, when evaluating the self people tend to preferably choose tasks that are high in diagnosticity (people
want to find out about their uncertain self-conceptions). This is found even when the diagnosis leads to a disclosure
of failure (i.e., regardless of task valence).
The responses generated by the self-assessment motive are behavioural responses, which becomes evident in the
fact that people choose to receive feedback on their performance (they prefer tasks for which feedback is available,
as opposed to tasks with unavailable feedback). This pattern is emphasized when the trait in question is considered
to be modifiable.
The self-verification motive asserts that people want to verify their previously existing beliefs about the self. No
preference regarding task valence is apparent. Regarding task diagnosticity, people seek knowledge about their
certain self-conceptions to a greater extent than they do for their uncertain self-conceptions.
Cognitive responses guide the self-verification motive partially depending on their previously formed self-concept.
That is, when a certain trait is present, positive feedback regarding this trait is judged to be more accurate than
unfavourable feedback; but when in the presence of the alternative trait, there isn’t any difference in the judgement of
the feedback accuracy. However, this pattern is conditional on perceived trait modifiability.
The self-verification motive resulted in cognitive responses to traits considered to be unmodifiable, but not to traits
considered modifiable. In the former, positive feedback is considered more accurate than negative feedback, when in
the presence of the trait. On the other hand, negative feedback is viewed as more accurate than positive feedback in
the presence of the alternative trait.
[1] Dauenheimer, D. G., Stahlberg, D., Spreemann, S., and Sedikides, C. (2002). Self-enhancement, self-verification, or self-assessment: the
intricate role of trait modifiability in the self-evaluation process. Revue internationale de psychologie sociale, 15, (3-4), 89-112.
[2] Sedikides, C. and Strube, M. J. (1995). The multiply motivated self. Personality and Social Psychology Bulletin, 21, 1330–1335.
[3] Sedikides, C.; Strube, M. (1997). Self-Evaluation: To Thine Own Self Be Good, To Thine Own Self Be Sure, To Thine Own Self Be True, and
To Thine Own Self Be Better. Advances in Experimental Social Psychology, 29, pp. 209–269. doi:10.1016/S0065-2601(08)60018-0. ISSN 0065-2601.
[4] Sedikides, C. (1993). Assessment, enhancement, and verification determinants of the self-evaluation process. Journal of Personality and
Social Psychology, 65, (2), 327–338.
[5] In some of the literature, other motives appear, namely the self-improvement motive, but they are not mentioned in this article due to a
lack of consensus about their existence.
[6] Baumeister, R. F. (ed.). (1999). The self in social psychology. Philadelphia: Psychology Press.
Shifting baseline
Shifting baseline (also known as sliding baseline) is a term used to describe the way significant changes to a system
are measured against previous baselines, which themselves may represent significant changes from the original state
of the system.
The term was first used by the fisheries scientist Daniel Pauly in his paper "Anecdotes and the shifting baseline
syndrome of fisheries".
Pauly developed the term in reference to fisheries management where fisheries scientists
sometimes fail to identify the correct "baseline" population size (e.g. how abundant a fish species population was
before human exploitation) and thus work with a shifted baseline. He describes the way that radically depleted
fisheries were evaluated by experts who used the state of the fishery at the start of their careers as the baseline, rather
than the fishery in its untouched state. Areas that swarmed with a particular species hundreds of years ago may have
experienced long-term decline, but it is the level of decades previously that is considered the appropriate reference
point for current populations. In this way large declines in ecosystems or species over long periods of time were, and
are, masked. There is a loss of perception of change that occurs when each generation redefines what is "natural".
Most modern fisheries stock assessments do not ignore historical fishing; they account for it either by including the
historical catch or by using other techniques to reconstruct the depletion level of the population at the start of the
period for which adequate data are available. Anecdotes about historical population levels can be highly unreliable
and result in severe mismanagement of the fishery.
The concept was further refined and applied to the ecology of kelp forests by Paul Dayton and others from the
Scripps Institution of Oceanography. They used a slightly different version of the term in their paper, "Sliding
baselines, ghosts, and reduced expectations in kelp forest communities".
The term has become widely used to describe the shift over time in the expectation of what a healthy ecosystem
baseline looks like.
Broadened definition
In 2002, filmmaker and former marine biologist Randy Olson broadened the definition of shifting baselines with an
op-ed in the Los Angeles Times. He explained the relevance of the term to all aspects of change, and the failure to
notice change in the world today. He and coral reef ecologist Jeremy Jackson (of Scripps Institution of
Oceanography) co-founded The Shifting Baselines Ocean Media Project [3] in 2003 to help promote a wider
understanding and use of the term in discussions of general conservation.
A conceptual metaphor for a shifting baseline is the price of coffee. A cup of coffee may have cost only $0.05 in the
1950s, but by the 1980s the cost had shifted to $1.00 (ignoring inflation). Current (21st-century) coffee prices are
judged against the 1980s reference point, rather than the 1950s one. The point of reference has moved.
The Shifting Baselines Ocean Media Project grew from its three founding partners (Scripps Institution of
Oceanography, The Ocean Conservancy, and Surfrider Foundation) to over twenty conservation groups and science
organizations. The project has produced dozens of short films, public service announcements, and Flash videos along
with photography, video, and stand-up comedy contests, all intended to promote the term to a broader audience. The
Shifting Baselines Blog, "the cure for planetary amnesia"[4] is run by the Shifting Baselines Ocean Media Project on
the Seed (magazine) Science Blogs.
[1] Pauly (1995)
[2] Dayton (1998)
[3] http://www.shiftingbaselines.org
[4] http://www.scienceblogs.com/shiftingbaselines
• Dayton PK, Tegner MJ, Edwards PB and Riser KL (1998) "Sliding baselines, ghosts, and reduced expectations in kelp forest communities." (http://www.esajournals.org/doi/abs/10.1890/1051-0761(1998)008[0309:SBGARE]2.0.CO;2) Ecological Applications, 8(2):309-322.
• Papworth SK, Rist J, Coad L and Milner-Gulland EJ (2008) "Evidence for shifting baseline syndrome in conservation" (http://www3.interscience.wiley.com/journal/122200416/abstract) Conservation Letters, 2(2):93-100.
• Pauly, Daniel (1995) "Anecdotes and the shifting baseline syndrome of fisheries." (http://www.sciencedirect.com/science?_ob=ArticleURL&_udi=B6VJ1-40W0T2R-7Y) Trends in Ecology and Evolution, 10(10):430.
• Pauly, Daniel (2001) "Importance of historical dimension policy management in natural resource systems." (http://www.fisheries.ubc.ca/members/dpauly/chaptersInBooksReports/2001/ImportanceHistoricalDimensionPolicyMngtNaturalResourceSystems.pdf) ACP-EU Fisheries: Research Report No 8.
External links
• http://www.shiftingbaselines.org
• http://www.scienceblogs.com/shiftingbaselines
• Shifting baseline (http://www.conservationinstitute.org/ocean_change/Fisheries/shiftingbaselines.htm) - Conservation Science Institute
• Shifting Baselines (http://seaaroundus.org/magazines/2006/ShiftingBaselines_Canright.pdf) by Anne Canright, California Coast & Ocean, Volume 22, No.3, 2006.
• Puget Sound Partnership (http://www.psp.wa.gov/shiftingbaselines.php) - A 10-minute clip on the effect of shifting baselines on the health of the Puget Sound.
• Proving the 'shifting baselines' theory: how humans consistently misperceive nature (http://news.mongabay.com/2009/0623-hance_shiftingbaselines.html) Mongabay.com, June 24, 2009.
SPECpower
SPECpower_ssj2008 is the first industry-standard benchmark that evaluates the power and performance
characteristics of volume server class computers. It is available from the Standard Performance Evaluation
Corporation (SPEC). SPECpower_ssj2008 is SPEC's first attempt at defining a server power measurement standard.
It was introduced in December 2007.
Several SPEC member companies contributed to the development of the new power-performance measurement
standard, including AMD, Dell, Fujitsu Siemens Computers, HP, Intel, IBM, and Sun Microsystems.
[1] Official SPECpower website (http://www.spec.org/power_ssj2008/)
[2] SPEC Press Release (http://www.spec.org/power_ssj2008/press/release.html)
External links
• Official SPEC website (http://www.spec.org/)
Standard Performance Evaluation Corporation
Formation 1988
Type Not-for-profit
Headquarters Gainesville, Virginia
Membership Hardware & Software Vendors, Universities, Research Centers
Staff 5
Website http:/ / www.spec.org
The Standard Performance Evaluation Corporation (SPEC) is a non-profit organization that aims to "produce,
establish, maintain and endorse a standardized set" of performance benchmarks for computers.
SPEC was founded in 1988.

SPEC benchmarks are widely used to evaluate the performance of computer
systems; the test results are published on the SPEC website. Results are sometimes informally referred to as
"SPECmarks" or just "SPEC".
Membership allows:
• Participation in benchmark development
• Participation in review of results
• Benchmark licenses
The list of members is available on SPEC's membership page.
• Sustaining Membership requires dues payment and typically includes hardware or software companies (e.g. Dell,
HP, IBM, Oracle, Red Hat).
• SPEC "Associates" pay a reduced fee and typically include universities.
• SPEC "Supporting Contributors" are invited to participate in the development of a single benchmark, and do not
pay dues.
The benchmarks aim to test "real-life" situations. There are several benchmarks testing Java scenarios, from simple
computation (SPECjbb) to a full system with Java EE, database, disk, and network (SPECjEnterprise). The
SPECweb benchmarks test web server performance by performing various types of parallel HTTP requests.
The SPEC CPU suites test CPU performance by measuring the run time of several programs such as the compiler
gcc, the chemistry program gamess, and the weather program WRF. The various tasks are equally weighted; no
attempt is made to weight them based on their perceived importance. An overall score is based on a geometric mean.
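As a small illustration of how a geometric-mean composite differs from a simple average, the sketch below combines a few made-up per-benchmark performance ratios; the numbers are not real SPEC results.

```python
import math

# Hypothetical per-benchmark performance ratios (reference time / measured time).
ratios = [12.0, 8.5, 15.2, 9.8]

# Geometric mean: the nth root of the product of the n ratios.
geometric_mean = math.prod(ratios) ** (1 / len(ratios))
arithmetic_mean = sum(ratios) / len(ratios)

print(f"geometric mean  = {geometric_mean:.2f}")   # less sensitive to a single outlying ratio
print(f"arithmetic mean = {arithmetic_mean:.2f}")
```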
SPEC benchmarks are written in a platform neutral programming language (usually C, Java or Fortran), and the
interested parties may compile the code using whatever compiler they prefer for their platform, but may not change
the code. Manufacturers have been known to optimize their compilers to improve performance of the various SPEC
benchmarks. SPEC has rules that attempt to limit such optimizations.
In order to use a benchmark, a license has to be purchased from SPEC; the costs vary from test to test with a typical
range from several hundred to several thousand dollars. This pay-for-license model might seem to be in violation of
the GPL as the benchmarks include software such as GCC that is licensed by the GPL. However, the GPL does not
require software to be distributed for free, only that recipients be allowed to redistribute any GPLed software that
they receive; the license agreement for SPEC specifically exempts items that are under "licenses that require free
distribution", and the files themselves are placed in a separate part of the overall software package.
• SPEC CPU2006, combined performance of CPU, memory and compiler
• CINT2006 ("SPECint"), testing integer arithmetic, with programs such as compilers, interpreters, word
processors, chess programs etc.
• CFP2006 ("SPECfp"), testing floating point performance, with physical simulations, 3D graphics, image
processing, computational chemistry etc.
• SPECjms2007, Java Message Service performance
• SPECweb2005, PHP and/or JSP performance.
• SPECviewperf, performance of an OpenGL 3D graphics system, tested with various rendering tasks from real applications
• SPECapc, performance of several 3D-intensive popular applications on a given system
• SPEC OMP2001 V3.2, for evaluating performance of parallel systems using OpenMP (http:// www.openmp.
org) applications.
• SPEC MPI2007, for evaluating performance of parallel systems using MPI (Message Passing Interface)
• SPECjvm2008, measuring basic Java performance of a Java Runtime Environment on a wide variety of both
client and server systems.
• SPECjEnterprise2010, a multi-tier benchmark for measuring the performance of Java 2 Enterprise Edition (J2EE)
technology-based application servers.
• SPECjbb2005, evaluates the performance of server side Java by emulating a three-tier client/server system (with
emphasis on the middle tier).
• SPEC MAIL2001, performance of a mail server, testing SMTP and POP protocols
• SPECpower_ssj2008, evaluates the energy efficiency of server systems.
• SPECsfs2008, File server throughput and response time supporting both NFS and CIFS protocol access
• SPECvirt_sc2010 ("SPECvirt"), evaluates the performance of datacenter servers used in virtualized server
consolidation environments
• SOA: according to SPEC's web site in late 2010, a subcommittee is investigating benchmarks for Service
Oriented Architecture (SOA).
Retired benchmarks include:
• SPEC CPU2000
• SPEC HPC2002 (no longer available)
• SPECjAppServer2001
• SPECjAppServer2002
• SPECjAppServer2004
• SPECjbb2000
• SPECweb96
• SPECweb99
• SPECweb99_SSL
SPEC attempts to create an environment where arguments are settled by appeal to notions of technical credibility,
representativeness, or the "level playing field". SPEC representatives are typically engineers with expertise in the
areas being benchmarked. Benchmarks include "run rules", which describe the conditions of measurement and
documentation requirements. Results that are published on SPEC's website undergo a peer review by members'
performance engineers.
[1] "SPEC Frequently Asked Questions" (http:/ / www.spec. org/ spec/ faq/ #01SPEC.General.10whatis). . Retrieved 15 March 2010.
[2] "The SPEC Organization" (http:// www.spec. org/spec/ ). . Retrieved 15 March 2010.
[3] "SPEC Membership" (http:// www. spec. org/ spec/ membership. html). . Retrieved 15 March 2010.
[4] http:/ / www. spec. org/spec/ membership. html
• Kant, Krishna (1992). Introduction to Computer System Performance Evaluation. New York: McGraw-Hill Inc..
pp. 16–17. ISBN 0070335869.
External links
• Official website (http:// www. spec. org)
• Official List of SPEC Benchmarks (http:// www.spec. org/benchmarks. html)
Summative assessment
Summative assessment (or summative evaluation) refers to the assessment of learning and summarizes the
development of learners at a particular time. After a period of work, e.g. a unit lasting two weeks, the learner sits a
test and the teacher marks the test and assigns a score. The test aims to summarize learning up to that point. The test
may also be used for diagnostic assessment, to identify any weaknesses and then build on them using formative
assessment.
Summative assessment is commonly used to refer to assessment of educational faculty by their respective supervisor.
It is imposed onto the faculty member, and uniformly applied, with the object of measuring all teachers on the same
criteria to determine the level of their performance. It is meant to meet the school or district's needs for teacher
accountability and looks to provide remediation for sub-standard performance and also provides grounds for
dismissal if necessary. The evaluation usually takes the shape of a form, and consists of check lists and occasionally
narratives. Areas evaluated include classroom climate, instruction, professionalism, and planning and preparation.
Summative assessment is characterized as assessment of learning and is contrasted with formative assessment, which
is assessment for learning.
It provides information on the product's efficacy (its ability to do what it was designed to do). For example, did the
learners learn what they were supposed to learn after using the instructional module? In a sense, it is less concerned
with assessing "how they did" than with using how the learners performed to indicate whether the product teaches
what it is supposed to teach.
1. It is the procedure used to assess or grade learners' level of learning over a certain period of time.
2. It tends to use well-defined evaluation designs [i.e. fixed time and content].
3. It provides descriptive analysis [i.e. in order to give a grade, all the activities done throughout the year are taken
into account].
4. It tends to stress local effects.
5. It is unobtrusive and non-reactive as far as possible.
6. It is positive, tending to stress what students can do rather than what they cannot.
[1] Glickman, C.D., Gordon, S.P., & Ross-Gordon, J.M. (2009). Supervision and instructional leadership: a developmental approach. Allyn and
Bacon, Boston, MA.
External links
• Summative Assessment tool: ClassComm (http://www.geneeworld.com)
The Sunday Times 100 Best Companies to Work For
Since 2001, the Sunday Times newspaper has published annual lists of the best companies to work for in the UK.
The award is highly valued by its winners. The list ranks Britain's best companies to work for based on a number of
criteria; it is mainly based on surveys of staff.
[1] For example, Don't call me boss (http://www.guardian.co.uk/money/2005/jun/25/workandcareers.jobsandmoney), The Guardian, 25 June 2005
[2] How to get a stress-free workplace (http://news.bbc.co.uk/1/hi/uk/2996224.stm), BBC News, 17 June 2003
External links
• Times Online: Best 100 Companies to work for (http://www.timesonline.co.uk/tol/life_and_style/career_and_jobs/best_100_companies/best_100_tables/) – current year
• Best Companies (http://www.bestcompanies.co.uk/list_intro.aspx) – the organisation that runs the survey and lists for 2001 to date
Teaching And Learning International Survey
The Teaching And Learning International Survey (TALIS) is a worldwide evaluation of the conditions of teaching
and learning, first performed in 2008. It is coordinated by the Organisation for Economic Co-operation and
Development (OECD), with a view to improving educational policies and outcomes.
Further reading
Official websites and reports
• OECD/TALIS website
[1] http://www.oecd.org/TALIS
Technology assessment
Technology assessment (TA, German Technikfolgenabschätzung, French évaluation des choix scientifiques et
technologiques) is a scientific, interactive, and communicative process that aims to contribute to the formation of
public and political opinion on societal aspects of science and technology.
General description
TA is the study and evaluation of new technologies. It is based on the conviction that new developments within, and
discoveries by, the scientific community are relevant for the world at large rather than just for the scientific experts
themselves, and that technological progress can never be free of ethical implications. Also, technology assessment
recognizes the fact that scientists normally are not trained ethicists themselves and accordingly ought to be very
careful when passing ethical judgement on their own, or their colleagues', new findings, projects, or work in progress.
Technology assessment assumes a global perspective and is future-oriented rather than backward-looking or
anti-technological. ("Scientific research and science-based technological innovation is an indispensable prerequisite
of modern life and civilization. There is no alternative. For six or eight billion people there is no way back to a less
sophisticated life style."
). TA considers its task to be an interdisciplinary approach to solving already existing problems and preventing
potential damage caused by the uncritical application and commercialization of new technologies. Therefore, any
results of technology assessment studies must be published, and particular consideration must be given to
communication with political decision-makers.
An important problem that TA has to deal with is the so-called Collingridge dilemma: on the one hand, impacts of
new technologies cannot be easily predicted until the technology is extensively developed and widely used; on the
other hand, control or change of a technology is difficult as soon as it is widely used.
Some of the major fields of TA are: information technology, hydrogen technologies, nuclear technology, molecular
nanotechnology, pharmacology, organ transplants, gene technology, artificial intelligence, the Internet and many
more. Health technology assessment is related, but profoundly different, despite the similarity in the name.
Forms and concepts of technology assessment
The following types of concepts of TA are those that are most visible and practiced. There are, however, a number of
further TA forms that are only proposed as concepts in the literature or are the label used by a particular TA
institution.
performed directly by members of those parliaments (e.g. in France and Finland) or on their behalf by related TA
institutions (such as in the UK, in Germany and Denmark) or by organisations not directly linked to a Parliament
(such as in the Netherlands and Switzerland).
• Expert TA (often also referred to as the classical TA or traditional TA concept): TA activities carried out by (a
team of) TA and technical experts. Input from stakeholders and other actors is included only via written
statements, documents and interviews, but not as in participatory TA.
• Participatory TA (pTA): TA activities which actively, systematically and methodologically involve various
kinds of social actors as assessors and discussants, such as different kinds of civil society organisations,
representatives of the state systems, but characteristically also individual stakeholders and citizens (lay persons),
technical scientists and technical experts. Standard pTA methods include consensus conferences, focus groups,
scenario workshops etc.
Sometimes pTA is further divided into expert-stakeholder pTA and public pTA
(including lay persons).
Technology assessment
• Constructive TA (CTA): This concept of TA, developed in the Netherlands but also applied and discussed
elsewhere, attempts to broaden the design of new technology through feedback of TA activities into the actual
construction of technology. Contrary to other forms of TA, CTA is not directed toward influencing regulatory
practices by assessing the impacts of technology. Instead, CTA wants to address social issues around technology
by influencing design practices.
• Discursive TA or Argumentative TA: This type of TA wants to deepen the political and normative debate about
science, technology and society. It is inspired by ethics, policy discourse analysis and the sociology of
expectations in science and technology. This mode of TA aims to clarify and bring under public and political
scrutiny the normative assumptions and visions that drive the actors who are socially shaping science and
technology. Accordingly, argumentative TA not only addresses the side effects of technological change, but deals
with both broader impacts of science and technology and the fundamental normative question of why developing
a certain technology is legitimate and desirable.
• Health TA (HTA): A specialised type of expert TA informing policy makers about efficacy, safety and cost
effectiveness issues of pharmaceuticals and medical treatments, see article on Health Technology Assessment.
Technology assessment institutions around the world
Many TA institutions are members of the European Parliamentary Technology Assessment (EPTA) network; some
are working for the STOA panel of the European Parliament and formed the European Technology Assessment
Group (ETAG).
• Centre for Technology Assessment (TA-SWISS), Bern, Switzerland.
• Institute of Technology Assessment (ITA) of the Austrian Academy of Sciences, Vienna
• (former) Office of Technology Assessment (OTA)
• Institute Society and Technology (IST) Brussels
• Norwegian Board of Technology, Oslo
• Parliamentary Office of Science and Technology (POST), London
• Rathenau Institute, The Hague
• Science and Technology Options Assessment (STOA) panel of the European Parliament, Brussels
• Science and Technology Policy Research (SPRU), Sussex
External links
• Scientific Technology Options Assessment (STOA), European Parliament
• European Technology Assessment Group for STOA
• Institute for Technology Assessment and Systems Analysis (ITAS), Research Centre Karlsruhe, Germany
• Office of Technology Assessment at the German Parliament (TAB)
• TA-SWISS Centre for Technology Assessment
• Institute of Technology Assessment (ITA), Austrian Academy of Sciences, Vienna, Austria
• The Danish Board of Technology
• Rathenau Institute
• The Norwegian Board of Technology
[1] Cf. the commonly used definition given in the report of the EU-funded project TAMI (Technology Assessment – Methods and Impacts) in 2004: ta-swiss.ch (http://www.ta-swiss.ch/?uid=45)
[2] Hans Mohr: "Technology Assessment in Theory and Practice", Techné: Journal of the Society for Philosophy and Technology, Vol. 4, No. 4 (Summer, 1999).
[3] Among those concepts one finds, for instance, Interactive TA ITAS.fzk.de (http://www.itas.fzk.de/deu/tadn/tadn298/rath298a.htm), Rational TA EA-AW.com (http://www.ea-aw.com/the-europaeische-akademie/aimstasks.html), Real-time TA (cp. Guston/Sarewitz (2002) Real-time technology assessment, in: Technology in Society 24, 93-109), Innovation-oriented TA Innovationsanalysen (http://www.
[4] Those TA institutions that perform PTA are organised in the European Parliamentary Technology Assessment (EPTA) network; see EPTAnetwork.org (http://eptanetwork.org).
[5] Cp. the 2000 EUROpTA (European Participatory Technology Assessment – Participatory Methods in Technology Assessment and Technology Decision-Making) project report TEKNO.dk (http://www.tekno.dk/pdf/projekter/europta_Report.pdf).
[6] Van Eijndhoven (1997) Technology assessment: Product or process? in: Technological Forecasting and Social Change 54 (1997) 269-286.
[7] Schot/Rip (1997), The Past and Future of Constructive Technology Assessment in: Technological Forecasting & Social Change 54, 251-268.
[8] van Est/Brom (2010) Technology assessment as an analytic and democratic practice, in: Encyclopedia of Applied Ethics.
[9] http://www.europarl.europa.eu/stoa/default_en.htm
[10] http://www.itas.fzk.de/eng/etag/etag.htm
[11] http://www.itas.fzk.de/home_e.htm
[12] http://www.tab.fzk.de/home_en.htm
[13] http://www.ta-swiss.ch
[14] http://www.oeaw.ac.at/ita/welcome.htm
[15] http://www.tekno.dk/subpage.php3?page=statisk/uk_profile.php3&toppic=aboutus&language=uk
[16] http://www.rathenau.org
[17] http://www.teknologiradet.no/default1.aspx?m=3
Transferable skills analysis
Transferable skills analysis is a set of tests or logic to determine what positions a person may fill if their previous
position(s) no longer exists in the local job market, or they can no longer perform their last position(s) (e.g., because
of an injury). An informal transferable skills analysis can be performed with the help of a career counselor, career
portfolio or a career planning article or book. Transferable skills are determined by analyzing past accomplishments
or experience. For instance, a stay-at-home parent and homemaker might find they have skills in budgeting, child
development, food services, property management, and so on.
Formal TSA process as per U.S. Department of Labor
The formal transferable skills analysis (TSA) process vocational evaluators use consists of compiling occupations
from the U.S. Department of Labor's Dictionary of Occupational Titles (DOT) to represent a person's work history.
They analyze the work activities (work fields) a person has performed in previous jobs, along with the objects the
work activities were performed on (materials, products, subject matter, and services, or MPSMS). They use these
data to identify a set of occupations a worker should be able to perform. Assessment results for reasoning, math, and
language skills, as well as aptitude test results, can be used to increase or decrease vocational options. If the worker
has been injured or otherwise disabled, their residual functional capacities can also be compared against the worker
traits associated with their DOT work history.
Care must be taken to select the DOT occupations that best represent the jobs the client has performed successfully
in past work.
The method most often used to perform skills transfer operations is based upon the federal definition of skills
transferability as shown below. That definition utilizes the technology described in The Revised Handbook for
Analyzing Jobs (HAJ, 1991). The HAJ describes and explains the variables used in TSA.
Work Fields
Work fields (WFs) are categories of technologies that reflect how work gets done and what gets done as a result of
the work activities - the purpose of the job. DOT occupations may contain one, two, or three work field codes.
Materials, Products, Subject Matter, and Services
MPSMS are the end products upon which the work activities are performed. MPSMS is derived from the Standard
Industrial Classification (SIC) codes, which identify employers by type of business. DOT occupations may contain
one, two, or three MPSMS codes.
Specific Vocational Preparation
Specific Vocational Preparation (SVP) is defined as the amount of time required to learn the duties and acquire the
information needed for a specific occupation. This training may be acquired in a school, work, military, institutional,
or vocational environment.
Worker Traits
Worker Traits required to successfully perform a given job are also utilized in TSA process. These variables include
training time (SVP), general educational development, aptitudes, temperaments, physical demands, environmental
conditions, and relationships to data, people, and things. Job counselors often search for job possibilities that best
reflect a person's work experience, then eliminate those that require capabilities significantly above or below the
person's capabilities as expressed by worker traits, in order to determine transferable skills.
TSA software programs
There are several TSA software programs, which may or may not follow the Federal Definition of Transferable
Skills, including the MVQS (McCroskey Vocational Quotient System), Skilltran, and OASYS. Some TSA software
programs, such as OASYS Job Match, use Worker Traits as secondary skills transfer variables. The work fields,
MPSMS, Specific Vocational Preparation, and Combination Work Field variables from the person's work history
provide the first filter through which all DOT jobs are passed. Then, only after the resulting subset of DOT
occupations is placed in a TSA table, are the Worker Traits used as a second filter.
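The two-filter search can be illustrated with a short sketch. The occupation records, codes, and thresholds below are invented for illustration and do not come from the DOT database or from any of the TSA products named above; only the filtering logic, matching work fields, MPSMS, and SVP first and worker traits second, follows the process described here.

# Hypothetical sketch of the two-filter transferable-skills search described
# above; the occupation records and codes are illustrative, not taken from
# any real TSA product or from the DOT database itself.
from dataclasses import dataclass, field

@dataclass
class Occupation:
    title: str
    svp: int                                         # Specific Vocational Preparation level
    work_fields: set = field(default_factory=set)    # work field codes
    mpsms: set = field(default_factory=set)          # MPSMS codes
    traits: dict = field(default_factory=dict)       # e.g. {"strength": 3, "reasoning": 4}

def first_filter(history, candidates):
    """Keep DOT occupations that share a work field and an MPSMS code with past
    work at the same or a lesser SVP level (the transferability criteria)."""
    results = []
    for past in history:
        for occ in candidates:
            if (occ.svp <= past.svp
                    and occ.work_fields & past.work_fields
                    and occ.mpsms & past.mpsms):
                results.append(occ)
    return results

def second_filter(matches, residual_capacities):
    """Drop occupations whose worker-trait demands exceed the person's
    residual functional capacities."""
    return [occ for occ in matches
            if all(occ.traits.get(t, 0) <= cap
                   for t, cap in residual_capacities.items())]

# Illustrative use
history = [Occupation("bookkeeper", svp=6, work_fields={"232"}, mpsms={"893"},
                      traits={"strength": 1, "reasoning": 4})]
candidates = [
    Occupation("payroll clerk", svp=5, work_fields={"232"}, mpsms={"893"},
               traits={"strength": 1, "reasoning": 3}),
    Occupation("warehouse loader", svp=2, work_fields={"011"}, mpsms={"880"},
               traits={"strength": 4, "reasoning": 2}),
]
shortlist = second_filter(first_filter(history, candidates),
                          residual_capacities={"strength": 2, "reasoning": 4})
print([o.title for o in shortlist])   # -> ['payroll clerk']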
Federal Definition of Transferable Skills Analysis
The Code of Federal Regulations (20 CFR 404.1568) definition of skills transfer reads, in part:
(A person is considered) to have skills that can be used in other jobs, when the skilled or semiskilled work activities
(that person) did in past work can be used to meet the requirements of skilled or semi-skilled work activities of other
jobs or kinds of work. This depends largely on the similarity of occupationally significant work activities among
different jobs.
The transferability of a person's skills is most probable and meaningful among jobs in which:
• The same or a lesser degree of skill is required (Specific Vocational Preparation), and
• The same or similar tools and machines are used (work fields), and
• The same or similar raw materials, products, processes or services are involved (Materials, Products, Subject Matter, and Services).
The CFR citation is taken from the Social Security Administration's (SSA) regulations. It is useful because it
provides a good operational definition of transferability of skills. In the context of TSA, it merely describes how the
SSA applies transferable skills analysis to claims.
Further reading
• Social Security Disability Advocate's Handbook, by David Traver, James Publishing, 2008, ISBN
• Selected Characteristics of Occupations, United States Department of Labor, Germania Publishing, 2008.
• O*Net versus DOT for Transferable Skills Analysis
• Dictionary of Occupational Titles [4]
• Reservist Transferable Skills
[1] http://www.jamespublishing.com/books/ssr.htm
[2] http://germaniapublishing.com
[3] http://www.theworksuite.com/id13.html
[4] http://www.occupationalinfo.com/index.html
[5] http://www.sabre.mod.uk/Employers/What-Reservists-offer/Transferable-Skills.aspx

Voting
Voting is a method for a group, such as a meeting or an electorate, to make a decision or express an opinion, often
following discussions, debates, or election campaigns. It is often found in democracies and republics. The minimum
age for voting in most countries is 18.
Reasons for voting
In a representative government, voting commonly implies election: a way for an electorate to select among
candidates for office. In politics voting is the method by which the electorate of a democracy appoints
representatives in its government.
A vote is an individual's act of voting, by which he or she expresses support or preference for a certain motion (for
example, a proposed resolution), a certain candidate, a selection of candidates, or a political party. With a secret
ballot to protect voters' political privacy, voting generally takes place at a polling station. The act of voting is
voluntary in some countries, whereas others, such as Argentina, Australia, Belgium and Brazil, have compulsory
voting systems.
Types of votes
Different voting systems use different types of vote. Suppose that the options in some election are Alice, Bob,
Charlie, Dan, and Emily and they are all vying for the same position:
In a voting system that uses a single vote, the voter selects his or her most preferred candidate. "Plurality voting
systems" use single votes.
A development of the single-vote system is the two-round election, or repeated first-past-the-post, in which the
winner must win by 50% plus one, called a simple majority. If a subsequent round is needed, a candidate, typically the
one with the fewest votes or anyone who chooses to move their support to another candidate, is removed from the
ballot.
An alternative to the two-round voting system is the single-round preferential voting system (also referred to as the
alternative vote or instant run-off), as used in Australia, Ireland and some states in the USA. Voters rank each
candidate in order of preference (1, 2, 3, etc.). Votes are distributed to each candidate according to the preferences
allocated. If no single candidate has 50% or more of the votes, the candidate with the fewest votes is excluded and
their votes are redistributed according to the voters' nominated order of preference. The process repeats until a
candidate has 50% or more of the votes. The system is designed to produce the same result as an exhaustive ballot but
using only a single round of voting.
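The counting loop of a preferential (instant run-off) election can be sketched as follows. The ballots are assumed to be lists of candidate names in preference order, and tie-breaking among the lowest-placed candidates is left arbitrary; this is a minimal illustration of the redistribution process described above, not a complete election-law implementation.

# Minimal instant-runoff (preferential) count. Ballots are lists of candidates
# in preference order; tie-breaking among lowest-placed candidates is arbitrary.
from collections import Counter

def instant_runoff(ballots):
    remaining = {c for ballot in ballots for c in ballot}
    while True:
        # Count each ballot for its highest-ranked remaining candidate.
        tallies = Counter()
        for ballot in ballots:
            for choice in ballot:
                if choice in remaining:
                    tallies[choice] += 1
                    break
        total = sum(tallies.values())
        leader, leader_votes = tallies.most_common(1)[0]
        if leader_votes * 2 >= total or len(remaining) == 1:
            return leader           # 50% or more of the continuing ballots
        # Exclude the candidate with the fewest votes and redistribute.
        remaining.remove(min(tallies, key=tallies.get))

ballots = [["Alice", "Bob"], ["Bob", "Charlie"], ["Charlie", "Bob"],
           ["Bob"], ["Alice", "Charlie"]]
print(instant_runoff(ballots))   # -> 'Bob'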
In a voting system that uses a multiple vote, the voter can vote for any subset of the alternatives. So, a voter might
vote for Alice, Bob, and Charlie, rejecting Dan and Emily. Approval voting uses such multiple votes.
In a voting system that uses a ranked vote, the voter has to rank the alternatives in order of preference. For example,
they might vote for Bob in first place, then Emily, then Alice, then Dan, and finally Charlie. Preferential voting
systems, such as those famously used in Australia, use a ranked vote.
In a voting system that uses a scored vote (or range vote), the voter gives each alternative a number between one and
ten (the upper and lower bounds may vary). See range voting.
Some "multiple-winner" systems may have a single vote or one vote per elector per available position. In such a case
the elector could vote for Bob and Charlie on a ballot with two votes. These types of systems can use ranked or
unranked voting, and are often used for at-large positions such as on some city councils.
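For comparison, the multiple-vote (approval) and scored-vote (range) tallies described above reduce to very simple counts. The ballots below are invented for illustration.

# Sketches of the multiple-vote (approval) and scored-vote (range) tallies
# described above; ballots and the score bounds are illustrative only.
from collections import Counter

def approval_winner(ballots):
    """Each ballot is the set of candidates the voter approves of."""
    counts = Counter(c for ballot in ballots for c in ballot)
    return counts.most_common(1)[0][0]

def range_winner(ballots):
    """Each ballot maps candidates to a score (e.g. 1-10); highest total wins."""
    totals = Counter()
    for ballot in ballots:
        totals.update(ballot)
    return totals.most_common(1)[0][0]

print(approval_winner([{"Alice", "Bob"}, {"Bob", "Charlie"}, {"Bob"}]))   # -> Bob
print(range_winner([{"Alice": 7, "Bob": 9}, {"Alice": 8, "Bob": 5}]))     # -> Alice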
Fair voting
Disputed results may lead at best to confusion, and at worst to violence or even civil war, between political rivals.
Many alternatives may fall in the latitude of indifference: they are neither accepted nor rejected. Avoiding the choice
that the most people strongly reject may sometimes be at least as important as choosing the one that they most favor.
There are social choice theory definitions of seemingly reasonable criteria that are a measure of the fairness of
certain aspects of voting, including non-dictatorship, unrestricted domain, non-imposition, Pareto efficiency, and
independence of irrelevant alternatives, but Arrow's impossibility theorem states that no voting system can meet all
of these standards simultaneously. Nevertheless, proposals for fairer voting systems continue to be put forward and
debated in developed countries such as the UK and the US.
In South Africa, there is a strong presence of anti-voting campaigns by poor citizens. They make the structural
argument that no political party truly represents them. For instance, this resulted in the "No Land! No House! No
Vote!" Campaign which becomes very prominent each time the country holds elections.

The campaign is
prominent among three of South Africa's largest social movements: the Western Cape Anti-Eviction Campaign,
Abahlali baseMjondolo, and the Landless Peoples Movement. Other social movements in other parts of the world
also have similar campaigns or non-voting preferences. These include the Zapatista Army of National Liberation and
various Anarchist oriented movements.
Voting and information
Modern political science has questioned whether average citizens have sufficient political information to cast
meaningful votes. A series of studies coming out of the University of Michigan in the 1950s and 1960s argued that
voters lack a basic understanding of current issues, the liberal–conservative ideological dimension, and the relative
ideological positions of the parties.
Religious view
Jehovah's Witnesses, Old Order Amish, Christadelphians, Rastafarians and some other religious groups share a
religious tradition of not participating in politics through voting.
[1] "The 'No Land, No House, No Vote' campaign still on for 2009" (http:// libcom.org/library/
the-no-land-no-house-no-vote-campaign-still-2009). Abahlali baseMjondolo. 5 May 2005. .
[2] "IndyMedia Presents: No Land! No House! No Vote!" (http:/ / antieviction. org.za/ 2005/ 12/ 12/
indymedia-presents-no-land-no-house-no-vote/ ). Anti-Eviction Campaign. 2005-12-12. .
[3] Cambridge: Cambridge University Press. ( Summary (http:// wikisum. com/ w/Lupia_and_McCubbins:_The_Democratic_Dilemma))
[4] Leibenluft, Jacob (2008-06-28). "Why Don't Jehovah's Witnesses Vote? Because they're representatives of God's heavenly kingdom." (http://
www.slate. com/ id/ 2194321/ ). Slate. .
External links
• A history of voting in the United States (http://americanhistory.si.edu/vote/) from the Smithsonian Institution.
• A New Nation Votes: American Elections Returns 1787-1825 (http://dca.tufts.edu/features/aas)
• Can I Vote? (http://www.canivote.org/), a nonpartisan US resource for registering to vote and finding your
polling place, from the National Association of Secretaries of State (http://www.nass.org/).
• The Canadian Museum of Civilization: A History of the Vote in Canada (http://www.civilization.ca/cmc/exhibitions/hist/elections/el_000_e.shtml)
• Chisholm, Hugh, ed. (1911). "Vote". Encyclopædia Britannica (11th ed.). Cambridge University Press.
World Bank's Inspection Panel
The Inspection Panel
Abbreviation: IPN
Formation: 22 September 1993
Type: Accountability mechanism
Location: Washington, DC
Chairman: Roberto Lenton
Parent organization: IDA, IBRD
Website: [1]
The Inspection Panel is an independent accountability mechanism of the World Bank. It was established in
September 1993 by the World Bank Board of Directors, and started operations on August 1, 1994. The Panel
provides a forum for people who believe that they may be adversely affected by Bank-financed operations to bring
their concerns to the highest decision-making levels of the World Bank. The Panel determines whether the Bank is
complying with its own policies and procedures, which are designed to ensure that Bank-financed operations provide
social and environmental benefits and avoid harm to people and the environment. The Inspection Panel was the first
body to promote accountability among international financial institutions through this community-led, or
“bottom-up”, approach which is complementary to the “top-down” forms of accountability, such as evaluation
initiated by the World Bank itself. Building on this example, other multilateral and regional financial institutions
have established similar accountability mechanisms as part of broader efforts at sustainable and equitable development.
The Inspection Panel's mandate allows it to investigate projects funded by the IBRD and the IDA, both part of the
World Bank Group, and to determine whether they are complying with their policies and procedures in the design,
appraisal, and implementation of a project. These policies and procedures are not limited to the Bank’s social and
environmental safeguard policies, but include other Operational Policies, Bank Procedures, and Operational
Directives, as well as other Bank procedural documents.
The Inspection Panel's Operation
The World Bank & Inspection Panel's
headquarters in Washington, D.C.
Organization of the Inspection Panel
The Inspection Panel consists of three members who are appointed by
the Board of Directors for non-renewable periods of five years. In
addition to the three Panel members, an Executive Secretariat was
established to assist and support all Panel activities. The Panel is
independent of Bank Management and is provided with independent
resources to discharge its functions.
The Inspection Panel Process
The Panel process begins when it receives a Request for Inspection
from a party of two or more Requesters, claiming that the Bank has
violated its policies and procedures. Most of the Requests submitted have concerned some of the Bank's safeguard
policies, such as the policies on environmental assessment, involuntary resettlement, or indigenous peoples. Once the
Panel has received and registered a Request for Inspection, the Eligibility Phase of the Inspection Process
commences. Beginning on the day of registration, the World Bank's Management has 21 days to provide the Panel
with evidence that it complied, or intended to comply, with the Bank's relevant policies and procedures. After
receiving Management's response, the Panel has 21 business days to determine the eligibility of the Request.
Once it has been determined that the eligibility criteria have been met, and after having reviewed the Management
Response to the Request, the Panel may, taking other factors it may have discovered during a field visit into
consideration, make a recommendation to investigate. In some cases, the Panel has promoted problem solving
between Management and the Requesters to help mediate less contentious cases and lead to an earlier resolution of
community concerns or policy compliance problems. An investigation is not automatic, and can only be authorized
by the Board of Executive Directors. If the Board approves, the next step is the substantive phase of the inspection
process when the Panel evaluates the merits of the Request. In the investigation phase, the Panel is focused on fact
finding and verification. It visits the borrowing country and meets with the Requesters and other affected people, as
well as with a broad array of people from whom it can learn in detail about the issues, concerns, the project’s status,
and potential harmful effects. The investigation phase may take a few months, or more in complex cases.
Once the investigation phase is complete, the Panel submits its Investigation Report to Bank Management. The
Board meets to consider both the Panel’s Investigation Report and Management’s recommendations, before deciding
whether to approve the recommendations. The Board may ask the Panel to check whether Management has made
appropriate consultations about the remedial measures with the affected peoples.
History of the Inspection Panel
Events Leading to the Creation of the Inspection Panel
During the 1980s the Bank had begun developing and committing itself to operational policies and procedures,
including policies on involuntary resettlement (1980), tribal peoples (1982), and environmental assessment (1988).
In the late 1980s and early 1990s however, widespread voices of concern and protest from civil society and
project-affected communities questioned the social and environmental impacts of Bank-financed operations. A
central element of this critique was that the Bank was not complying with its policy commitments which it had
adopted to prevent these very types of adverse social and environmental impacts. Serious debates on these issues
also took place within the Bank’s member governments and the Bank itself. In June 1992, the international
community gathered at the United Nations Conference on Environment and Development in Rio de Janeiro to chart a
new cooperative approach to addressing interrelated issues of social development, economic development, and
environmental protection. The blueprint for the creation of the Inspection Panel was developed in this larger context
as a result of efforts from civil society, governments, and members of the Bank’s Board to establish a new and
independent mechanism for greater accountability, participation, and transparency at the World Bank.
Creation of the Inspection Panel
The Panel was officially created by two similar resolutions of the International Bank for Reconstruction and
Development (IBRD) and the International Development Association (IDA) signed by the Board of Executive Directors
on September 1, 1993 (Resolution IBRD 93–10 and Resolution IDA 93–6). The Resolution specifies that the Panel
has jurisdiction with respect to operations supported by the IBRD and the IDA. In 1996 and 1999 Clarifications were
added to the Resolution.
[1] http://www.inspectionpanel.org/
• The Inspection Panel at 15 Years (http://siteresources.worldbank.org/EXTINSPECTIONPANEL/Resources/380793-1254158345788/InspectionPanel2009.pdf)
• Inspection Panel Homepage (http://www.inspectionpanel.org).
• Shihata, Ibrahim (2000), The World Bank Inspection Panel: In Practice, Oxford University Press.
External links
• Official website (http://www.inspectionpanel.org/)
• BIC Website (http://www.bicusa.org/en/Issue.25.aspx/)
• CIEL Website (http://www.ciel.org/)
XTS-400
Company / developer: BAE Systems
Working state: Current
Source model: Closed source
Latest stable release: 6.5 / August 2008
Supported platforms: x86
Kernel type: Monolithic kernel
Official website: [1]
The XTS-400 is a multi-level secure computer operating system. It is multi-user and multitasking. It works in
networked environments and supports Gigabit Ethernet and both IPv4 and IPv6.
The XTS-400 is a combination of Intel x86 hardware and the STOP (Secure Trusted Operating Program) operating
system. XTS-400 was developed by BAE Systems, and was originally released as version 6.0 in December 2003.
STOP provides "high-assurance" security and was the first general-purpose operating system with a Common
Criteria assurance level rating of EAL5 or above. The XTS-400 can host, and be trusted to separate, multiple,
concurrent data sets, users, and networks at different sensitivity levels.
The XTS-400 provides both an "untrusted" environment for normal work and a "trusted" environment for
administrative work and for privileged applications. The untrusted environment is similar to traditional Unix
environments. It provides binary compatibility with Linux, running most Linux commands and tools as
well as most Linux applications without the need for recompiling. This untrusted environment includes an X
Window System GUI, though all windows on a screen must be at the same sensitivity level.
To support the trusted environment and various security features, STOP provides a set of proprietary APIs to
applications. In order to develop programs that use these proprietary APIs, a special software development
environment (SDE) is needed. The SDE is also needed in order to port some complicated Linux/Unix applications to
the XTS-400.
A new version of the STOP operating system, STOP 7, has since been introduced, with claims of improved
performance and new features such as role-based access control (RBAC).
As a high-assurance, MLS system, XTS-400 can be used in "cross-domain" solutions. A cross-domain solution will
typically require a piece of privileged software to be developed which can temporarily circumvent one or more
security features in a controlled manner. Such pieces are outside the CC evaluation of the XTS-400, but they can be
accredited separately for use in a particular deployment.
The XTS-400 can be used as a desktop, server, or network gateway. The interactive environment, typical Unix
command line tools, and a GUI are present in support of a desktop solution. Since the XTS-400 supports multiple,
concurrent network connections at different sensitivity levels, it can be used to replace several single-level desktops
connected to several different networks.
In support of server functionality, the XTS-400 can be purchased in a rack-mount configuration, accepts a UPS,
allows multiple network connections, accommodates many hard disks on a SCSI subsystem (also saving disk blocks
using a "sparse file" implementation in the file system), and provides a trusted backup/save tool. Server software,
such as an Internet daemon, can be ported to run on the XTS-400.
A popular application for high-assurance systems like the XTS-400 is to "guard" information flow between two
networks of differing security characteristics. Several customer guard solutions are available based on XTS systems.
XTS-400 version 6.0.E completed a Common Criteria (CC) evaluation in March 2004 at EAL4 augmented with
ALC_FLR.3 (validation report CCEVS-VR-04-0058). Version 6.0.E also conformed with the protection profiles
entitled "Labeled Security Protection Profile" (LSPP) and "Controlled Access Protection Profile" (CAPP), though
both profiles are surpassed in both functionality and assurance.
XTS-400 version 6.1.E completed evaluation in March 2005 at EAL5 augmented with ALC_FLR.3 and ATE_IND.3
(validation report CCEVS-VR-05-0094), still conforming to the LSPP and CAPP. The EAL5+ evaluation included
analysis of covert channels and additional vulnerability analysis and testing by the National Security Agency.
XTS-400 version 6.4.U4 completed evaluation in July 2008 at EAL5 augmented with ALC_FLR.3 and ATE_IND.3
(validation report CCEVS-VR-VID10293-2008), also still conforming to the LSPP and CAPP. Like its predecessor,
it also included analysis of covert channels and additional vulnerability analysis and testing by the National Security
Agency.
The official postings for all the XTS-400 evaluations can be seen on the Validated Product List. [4] [5]
The main security feature that sets STOP apart from most operating systems is the mandatory sensitivity policy.
Support for a mandatory integrity policy also sets STOP apart from most MLS or trusted systems. While a
sensitivity policy deals with preventing unauthorized disclosure, an integrity policy deals with preventing
unauthorized deletion or modification (such as the damage that a virus might attempt). Normal (i.e., untrusted) users
do not have the "discretion" to change the sensitivity or integrity levels of objects. The Bell-La Padula and Biba
formal models are the basis for these policies.
Both the sensitivity and integrity policies apply to all users and all objects on the system. STOP provides 16
hierarchical sensitivity levels, 64 non-hierarchical sensitivity categories, 8 hierarchical integrity levels, and 16
non-hierarchical integrity categories. The mandatory sensitivity policy enforces the U.S. DoD data sensitivity
classification model (i.e., Unclassified, Secret, Top Secret), but can be configured for commercial environments.
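A label check in the spirit of the Bell-La Padula and Biba rules mentioned above can be sketched as follows. The label layout (a hierarchical level plus a set of non-hierarchical categories) follows the structure described in this section, but the code is only an illustration and is not STOP's actual implementation.

# Illustrative mandatory-access check in the spirit of the Bell-La Padula
# (sensitivity) and Biba (integrity) models. The label layout mirrors the
# structure described above; this is not STOP's actual code.
from dataclasses import dataclass

@dataclass(frozen=True)
class Label:
    level: int             # hierarchical level (e.g., 0..15 for sensitivity)
    categories: frozenset  # non-hierarchical categories

def dominates(a: Label, b: Label) -> bool:
    """a dominates b if its level is >= b's and its categories are a superset."""
    return a.level >= b.level and a.categories >= b.categories

def may_read(subject_sens, object_sens, subject_int, object_int) -> bool:
    # Bell-La Padula "no read up": subject sensitivity must dominate the object's.
    # Biba "no read down": object integrity must dominate the subject's.
    return dominates(subject_sens, object_sens) and dominates(object_int, subject_int)

secret = Label(2, frozenset({"crypto"}))
unclass = Label(0, frozenset())
low_int = Label(3, frozenset())
hi_int = Label(5, frozenset())

print(may_read(secret, unclass, low_int, hi_int))   # True: reading down is allowed
print(may_read(unclass, secret, hi_int, low_int))   # False: no read up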
Other security features include:
• Identification and authentication, which forces users to be uniquely identified and authenticated before using any
system services or accessing any information. The user's identification is used for access control decisions and for
accountability via the auditing mechanism.
• Discretionary access control (DAC), which appears just as in Unix/Linux, including the presence of access
control lists on every object. Set-id functionality is supported in a controlled fashion.
• A mandatory "subtype" policy, which allows some of the functionality of trusted systems which support a full
"Type Enforcement" or "Domain-Type Enforcement" policy.
• Auditing of all security-relevant events and trusted tools to allow administrators to detect and analyze potential
security violations.
• Trusted path, which allows a user to be sure s/he is interacting directly with the TSF during sensitive operations.
This prevents, for example, a Trojan horse from spoofing the login process and stealing a user's password.
• Isolation, of the operating system code and data files from the activity of untrusted users and processes. Thus,
even if, for example, a user downloads a virus, the virus will be unable to corrupt or affect the operating system.
• Separation, of processes from one another (so that one process/user cannot tamper with the internal data and code
of another process).
• Reference monitor functionality, so that no access can bypass scrutiny by the operating system.
• Strong separation of administrator, operator, and user roles using the mandatory integrity policy.
• Residual information (i.e., object reuse) mechanisms to prevent data scavenging.
• Trusted, evaluated tools for configuring the system, managing security-critical data, and repairing file systems.
• Self-testing of security mechanisms, on demand.
• Exclusion of higher layer network services from the trusted security functions (TSF), so that the TSF is not
susceptible to the publicly known vulnerabilities in those services.
STOP comes in only a single package, so that there is no confusion about whether a particular package has all
security features present. Mandatory policies cannot be disabled. Policy configuration does not require a potentially
complicated process of defining large sets of domains and data types (and the attendant access rules).
To maintain the trustworthiness of the system, the XTS-400 must be installed, booted, and configured by trusted
personnel. The site must also provide physical protection of the hardware components. The system, and software
upgrades, are shipped from BAE Systems in a secure fashion.
For customers who want them, XTS-400 supports a Mission Support Cryptographic Unit (MSCU) and Fortezza
cards. The MSCU performs "type 1" cryptography and has been separately scrutinized by the U.S. National Security
Agency.
The CC evaluation forces particular hardware to be used in the XTS-400. Though this places restrictions on the
hardware configurations that can be used, several configurations are available, including rack-mount and tower form
factors. The XTS-400 uses only standard PC, COTS components, except for an optional MSCU (Mission Support
Cryptographic Unit).
The hardware is based around an Intel Xeon (P4) CPU at up to 2.8 GHz speeds. Up to 2 GB of main memory is
supported.
A PCI bus is used for add-in cards such as Gigabit Ethernet. Up to 16 simultaneous Ethernet connections can be
made, all of which can be configured at different mandatory security and integrity levels.
A SCSI subsystem is used to allow a number of high-performance peripherals to be attached. One SCSI peripheral is
a PC Card reader that can support Fortezza. Multiple SCSI host adapters can be included.
The XTS-400 has been preceded by several evaluated ancestors, all developed by the same group: SCOMP (Secure
Communications Processor), "XTS-200", and "XTS-300". All of the predecessor products were evaluated under
TCSEC (a.k.a. Orange Book) standards. SCOMP completed evaluation in 1984 at the highest functional and
assurance level then in place: "A1". Since then the product has evolved from proprietary hardware and interfaces to
commodity hardware and Linux interfaces.
The XTS-200 was designed as a general-purpose operating system supporting a Unix-like application and user
environment. XTS-200 completed evaluation in 1992 at the "B3" level.
The XTS-300 transitioned from proprietary, mini-computer hardware to COTS, Intel x86 hardware. XTS-300
completed evaluation in 1994 at the B3 level. XTS-300 also went through several ratings maintenance cycles (a.k.a.
RAMP), very similar to an "assurance continuity" cycle under CC, ultimately ending up with version 5.2.E being
evaluated in 2000.
Development of the XTS-400 began in June 2000. The main customer-visible change was specific conformance to
the programming API of Linux. Though the security features of the XTS system put some restrictions on the API
and require additional, proprietary interfaces, conformance is close enough that most applications will run on the
XTS without recompilation. Some security features were added or improved as compared to earlier versions of the
system and performance was also improved.
As of July 2006, enhancements continue to be made to the XTS line of products.
On September 5, 2006, the United States Patent and Trademark Office granted BAE Systems Information Technology, LLC
United States Patent # 7,103,914 "Trusted computer system".
STOP is a monolithic kernel operating system (as is Linux). Though it provides a Linux-like API, STOP was not
based on Unix or Linux source. STOP is highly layered and highly modularized and relatively small and simple.
These characteristics have historically facilitated high-assurance evaluations.
STOP is layered into four "rings" and each ring is further subdivided into layers. The innermost ring has hardware
privilege, and applications, including privileged commands, run in the outermost. The inner three rings constitute the
"kernel". Software in an outer ring is prevented from tampering with software in an inner ring. The kernel is part of
every process's address space and is needed by both normal and privileged processes.
A "security kernel" occupies the innermost and most privileged ring and enforces all mandatory policies. It provides
a virtual process environment, which isolates one process from another. It performs all low-level scheduling,
memory management, and interrupt handling. The security kernel also provides I/O services and an IPC message
mechanism. The security kernel's data is global to the system.
Trusted system services (TSS) software executes in ring 1. TSS implements file systems, implements TCP/IP, and
enforces the discretionary access control policy on file system objects. TSS's data is local to the process within which
it is executing.
Operating system services (OSS) executes in ring 2. OSS provides a Linux-like API to applications as well as
providing additional proprietary interfaces for using the security features of the system. OSS implements signals,
process groups, and some memory devices. OSS's data is local to the process within which it is executing.
Software is considered trusted if it performs functions upon which the system depends to enforce the security policy
(e.g., the establishment of user authorization). This determination is based on integrity level and privileges.
Untrusted software runs at integrity level 3, with all integrity categories, or lower. Some processes require privileges
to perform their functions; for example, the Secure Server needs access to the User Access Authentication
database, kept at "system high", while establishing a session for a user at a lower sensitivity level.
Potential weaknesses
The XTS-400 can provide a high level of security in many application environments, but trade-offs are made to
attain that security. Potential weaknesses for some customers may include:
• Slower performance due to more rigid internal layering and modularity and to additional security checks.
• Fewer application-level features available out-of-the-box.
• Some source level changes may be necessary to get complicated applications to run.
• The trusted user interface does not utilize a GUI and has weak command line features.
• Limited hardware choices.
• Not intended for embedded or real-time solutions.
[1] http://www.baesystems.com/ProductsServices/bae_prod_csit_xts400.html
[2] http://www.commoncriteriaportal.org/products/
[3] http://www.baesystems.com/ProductsServices/bae_prod_csit_xtsstop7.html
[4] http://www.niap-ccevs.org/cc-scheme/vpl/
[5] http://www.commoncriteriaportal.org/products_OS.html#OS
External links
• BAE's XTS-400 product page (http://www.baesystems.com/ProductsServices/bae_prod_csit_xts400.html)
• XTS-400 EAL5+ validated product page (http://www.niap-ccevs.org/cc-scheme/st/vid10293)
• XTS-400 EAL5+ validated product page (http://www.niap-ccevs.org/cc-scheme/st?vid=3012)
• XTS-400 EAL4+ validated product page (http://www.niap-ccevs.org/cc-scheme/st?vid=9503)
• United States Patent 7,103,914: Trusted computer system (http://patft.uspto.gov/netacgi/nph-Parser?TERM1=7103914&u=/netahtml/srchnum.htm&Sect1=PTO1&Sect2=HITOFF&p=1&r=0&l=50&f=S&d=PALL)
• Paper on the need for secure operating systems and mandatory security (http://www.nsa.gov/selinux/papers/inevitability/)
• Monterey Security Architecture (MYSEA) (http://cisr.nps.navy.mil/projects/mysea.html), a Naval Postgraduate School project which utilized the STOP OS
• XMPP & Cross Domain Collaborative Information Environment (CDCIE) Overview (http://www.sensornet.gov/net_ready_workshop/Boyd_Fletcher_CDCIE_XMPP_Overview_for_NetReadySensors_Conf.pdf), multinational information sharing in both single and cross domain environments (utilizes STOP OS)
• Trustifier TCB overview (http://www.afcea.org/wiki/index.php?title=Trusted_Computing_Base)
Accelerated aging
Accelerated aging is testing that uses aggravated conditions of heat, oxygen, sunlight, vibration, etc. to speed up the
normal aging processes of items. It is used to help determine the long term effects of expected levels of stress within
a shorter time, usually in a laboratory by controlled standard test methods. It is used to estimate the useful lifespan of
a product or its shelf life when actual lifespan data is unavailable. This occurs with products that have not existed
long enough to have gone through their useful lifespan: for example, a new type of car engine or a new polymer for
replacement joints.
Physical testing or chemical testing is carried out by subjecting the product to (1) representative levels of stress for
long time periods, (2) unusually high levels of stress used to accelerate the effects of natural aging, or (3) levels of
stress that intentionally force failures (for further analysis). Mechanical parts are run at very high speed, far in excess
of what they would receive in normal usage. Polymers are often kept at elevated temperatures, in order to accelerate
chemical breakdown. Environmental chambers are often used.
Also, the device or material under test can be exposed to rapid (but controlled) changes in temperature, humidity,
pressure, strain, etc. For example, cycles of heat and cold can simulate the effect of day and night within the span of
a few hours.
Library and archival preservation science
Accelerated aging is also used in library and archival preservation science. In this context, a material, usually paper,
is subjected to extreme conditions in an effort to speed up the natural aging process. Usually, the extreme conditions
consist of elevated temperature, but tests making use of concentrated pollutants or intense light also exist. These
tests may be used for several purposes.
• To predict the long-term effects of particular conservation treatments. In such a test, treated and untreated papers
are both subjected to a single set of fixed, standardized conditions. The two are then compared in an effort to
determine whether the treatment has a positive or negative effect on the lifespan of the paper.
• To study the basic processes of paper decay. In such a test, the purpose is not to predict a particular outcome for a
specific type of paper, but rather to gain a greater understanding of the chemical mechanisms of decay.
• To predict the lifespan of a particular type of paper. In such a test, paper samples are generally subjected to
several elevated temperatures and a constant level of relative humidity equivalent to the relative humidity in
which they would be stored. The researcher then measures a relevant quality of the samples, such as folding
endurance, at each temperature. This allows the researcher to determine how many days at each temperature it
takes for a particular level of degradation to be reached. From the data collected, the researcher extrapolates the
rate at which the samples might decay at lower temperatures, such as those at which the paper would be stored
under normal conditions. In theory, this allows the researcher to predict the lifespan of the paper. This test is
based on the Arrhenius equation; a minimal extrapolation sketch follows this list. This type of test is, however, a subject of frequent criticism.
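A minimal sketch of that extrapolation step, assuming invented degradation rates, an illustrative "percent folding endurance lost per day" measure, and a simple least-squares fit of ln(rate) against 1/T, might look like this:

# Sketch of the Arrhenius extrapolation described above: measure how fast a
# property degrades at several elevated temperatures, fit ln(rate) against
# 1/T, and extrapolate the rate (and hence lifetime) at storage temperature.
# The measured rates below are invented for illustration.
import math

# (temperature in kelvin, degradation rate in "% folding endurance lost per day")
measurements = [(363.15, 0.80), (353.15, 0.35), (343.15, 0.15)]  # 90, 80, 70 deg C

# Least-squares fit of ln(rate) = ln(A) - Ea/(R*T), i.e. y = b + m*x with x = 1/T
xs = [1.0 / t for t, _ in measurements]
ys = [math.log(r) for _, r in measurements]
n = len(xs)
mean_x, mean_y = sum(xs) / n, sum(ys) / n
m = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
    sum((x - mean_x) ** 2 for x in xs)
b = mean_y - m * mean_x

storage_T = 293.15  # 20 deg C
rate_at_storage = math.exp(b + m / storage_T)        # predicted %/day at 20 deg C
days_to_50_percent_loss = 50.0 / rate_at_storage
print(f"predicted lifetime: {days_to_50_percent_loss / 365:.0f} years")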
There is no single recommended set of conditions at which these tests should be performed. In fact, temperatures
from 22 to 160 degrees Celsius, relative humidities from 1% to 100%, and test durations from one hour to 180 days
have all been used.
ISO 5630-3 recommends accelerated aging at 80 degrees Celsius and 65% relative humidity
when using a fixed set of conditions.
Besides variations in the conditions to which the papers are subjected, there are also multiple ways in which the test
can be set up. For instance, rather than simply placing single sheets in a climate-controlled chamber, the Library of
Congress recommends sealing samples in an air-tight glass tube and aging the papers in stacks, which more closely
resembles the way in which they are likely to age under normal circumstances.
The technique of artificially accelerating the deterioration of paper through heat was known by 1899, when it was
described by W. Herzberg.
Accelerated aging was further refined during the 1920s, with tests using sunlight and
elevated temperatures being used to rank the permanence of various papers in the United States and Sweden. In
1929, a frequently used method in which 72 hours at 100 degrees Celsius is considered equivalent to 18–25 years of
natural aging was established by R. H. Rasch.
In the 1950s, researchers began to question the validity of accelerated aging tests which relied on dry heat and a
single temperature, pointing out that relative humidity affects the chemical processes which produce paper
degradation and that the reactions which cause degradation have different activation energies. This led researchers
like Baer and Lindström to advocate accelerated aging techniques using the Arrhenius equation and a realistic
relative humidity.
Accelerated aging techniques, particularly those using the Arrhenius equation, have frequently been criticized in
recent decades. While some researchers claim that the Arrhenius equation can be used to quantitatively predict the
lifespan of tested papers, other researchers disagree. Many argue that this method cannot predict an exact lifespan
for the tested papers, but that it can be used to rank papers by permanence. A few researchers claim that even
such rankings can be deceptive, and that these types of accelerated aging tests can only be used to determine whether
a particular treatment or paper quality has a positive or negative effect on the paper’s permanence.
There are several reasons for this skepticism. One argument is that entirely different chemical processes take place at
higher temperatures than at lower temperatures, which means the accelerated aging process and natural aging process
are not parallel. Another is that paper is a "complex system" and the Arrhenius equation is only applicable to
elementary reactions. Other researchers criticize the ways in which deterioration is measured during these
experiments. Some point out that there is no standard point at which a paper is considered unusable for library and
archival purposes.
Others claim that the degree of correlation between macroscopic, mechanical properties of
paper and molecular, chemical deterioration has not been convincingly proven.

In an effort to improve the quality of accelerated aging tests, some researchers have begun comparing materials
which have undergone accelerated aging to materials which have undergone natural aging.
The Library of
Congress, for instance, began a long-term experiment in 2000 to compare artificially aged materials to materials
allowed to undergo natural aging for a hundred years.
[1] Porck, H. J. (2000). Rate of paper degradation: The predictive value of artificial aging tests. Amsterdam: European Commission on Preservation and Access. (http://www.knaw.nl/ecpa/publ/porck2.pdf)
[2] Bansa, H. (1992). Accelerated aging tests in conservation research: Some ideas for a future method. Restaurator 13.3, 114-137.
[3] Library of Congress (2006). Accelerated aging of paper: A new test. The Library of Congress: Preservation. Retrieved 8 August 2009. (http://www.loc.gov/preserv/acceltest.html)
[4] Zou, X.; Uesaka, T.; & Gurnagul, G. (1996). Prediction of paper permanence by accelerated aging I. Kinetic analysis of the aging process. Cellulose 3, 243-267.
[5] Strofer-Hua, E. (1990). Experimental measurement: Interpreting extrapolation and prediction by accelerated aging. Restaurator 11, 254-266.
[6] Bégin, P. L. & Kaminska, E. (2002). Thermal accelerated ageing test method development. Restaurator 23, 89-105.
[7] Bansa, H. (2002). Accelerated aging of paper: Some ideas on its practical benefit. Restaurator 23, 106-117.
[8] Bansa, H. (1989). Artificial aging as a predictor of paper's future useful life. The Abbey Newsletter Monograph Supplement 1.
[9] Calvini, P. & Gorassini, A. (2006). On the rate of paper degradation: Lessons from the past. Restaurator 27, 275-290.
[10] Batterham, I. & Rai, R. (2008). A comparison of artificial ageing with 27 years of natural ageing. 2008 AICCM Book, Paper and Photographic Materials Symposium, 81-89. (http://www.naa.gov.au/images/batterham-rai_tcm2-13043.pdf)
[11] Library of Congress (2008). 100-year paper natural aging project. The Library of Congress: Preservation. Retrieved 8 August 2009. (http://www.loc.gov/preserv/rt/projects/100-yr_nat_aging.html)
External links
• Medical Plastics and Biomaterials Magazine (http://www.devicelink.com/mpb/archive/98/07/002.html)
Adaptive comparative judgement
Adaptive Comparative Judgement is a technique borrowed from psychophysics which is able to generate reliable
results for educational assessment; as such, it is an alternative to traditional exam script marking. In the approach,
judges are presented with pairs of student work and are asked to choose which of the two is better. By
means of an iterative and adaptive algorithm, a scaled distribution of student work can then be obtained without
reference to criteria.
Traditional exam script marking began in Cambridge in 1792 when, with undergraduate numbers rising, the importance
of proper ranking of students was growing. So in 1792 the new Proctor of Examinations, William Farish, introduced
marking, a process in which every examiner gives a numerical score to each response by every student, and the
overall total mark puts the students in the final rank order. Francis Galton (1869) noted that, in an unidentified year
about 1863, the Senior Wrangler scored 7,634 out of a maximum of 17,000, while the Second Wrangler scored
4,123. (The ‘Wooden Spoon’ scored only 237.)
Prior to 1792, a team of Cambridge examiners convened at 5pm on the last day of examining, reviewed the 19
papers each student had sat – and published their rank order at midnight. Marking solved the problems of numbers
and prevented unfair personal bias, and its introduction was a step towards modern objective testing, the format it is
best suited to. But the technology of testing that followed, with its major emphasis on reliability and the
automatisation of marking, has been an uncomfortable partner for some areas of educational achievement: assessing
writing or speaking, and other kinds of performance need something more qualitative and judgemental.
The technique of Adaptive Comparative Judgement is an alternative to marking. It returns to the pre-1792 idea of
sorting papers according to their quality, but retains the guarantee of reliability and fairness. It is by far the most
reliable way known to score essays or more complex performances. It is much simpler than marking, and has been
preferred by almost all examiners who have tried it. The real appeal of Adaptive Comparative Judgement lies in how
it can re-professionalise the activity of assessment and how it can re-integrate assessment with learning.
Thurstone's Law of Comparative Judgement
"There is no such thing as absolute judgement." Laming (2004)
The science of comparative judgement began with Louis Leon Thurstone of the University of Chicago. A pioneer of
psychophysics, he proposed several ways to construct scales for measuring sensation and other psychological
properties. One of these was the Law of Comparative Judgment (Thurstone, 1927a, 1927b), which defined a
mathematical way of modeling the chance that one object will ‘beat’ another in a comparison, given values for the
‘quality’ of each. This is all that is needed to construct a complete measurement system.
A variation on his model (see pairwise comparison and the Bradley–Terry–Luce (BTL) model) states that the
difference between their quality values is equal to the log of the odds that object A will beat object B:
    v(A) - v(B) = ln [ P(A beats B) / P(B beats A) ]
Before the availability of modern computers, the mathematics needed to calculate the ‘values’ of each object’s quality
meant that the method could only be used with small sets of objects, and its application was limited. For Thurstone,
the objects were generally sensations, such as intensity, or attitudes, such as the seriousness of crimes, or statements
of opinions. Social researchers continued to use the method, as did market researchers for whom the objects might be
different hotel room layouts, or variations on a proposed new biscuit.
Adaptive comparative judgement
In the 1970s and 1980s Comparative Judgement appeared, almost for the first time in educational assessment, as a
theoretical basis or precursor for the new Latent Trait or Item Response Theories. (Andrich, 1978) These models are
now standard, especially in item banking and adaptive testing systems.
Re-introduction in education
The first published paper using Comparative Judgement in education was Pollitt & Murray (1994), essentially a
research paper concerning the nature of the English proficiency scale assessed in the speaking part of Cambridge’s
CPE exam. The objects were candidates, represented by 2-minute snippets of video recordings from their test
sessions, and the judges were Linguistics post-graduate students with no assessment training. The judges compared
pairs of video snippets, simply reporting which they thought the better student, and were then clinically interviewed
to elicit the reasons for their decisions.
Pollitt then introduced Comparative Judgement to the UK awarding bodies, as a method for comparing the standards
of A Levels from different boards. Comparative judgement replaced their existing method which required direct
judgement of a script against the official standard of a different board. For the first two or three years of this Pollitt
carried out all of the analyses for all the boards, using a program he had written for the purpose. It immediately
became the only experimental method used to investigate exam comparability in the UK; the applications for this
purpose from 1996 to 2006 are fully described in Bramley (2007).
In 2004 Pollitt presented a paper at the conference of the International Association for Educational Assessment titled
Let’s Stop Marking Exams, and another at the same conference in 2009 titled Abolishing Marksism. In each paper
the aim was to convince the assessment community that there were significant advantages to using Comparative
Judgement in place of marking for some types of assessment. In 2010 he presented a paper at the Association for
Educational Assessment – Europe, How to Assess Writing Reliably and Validly, which presented evidence of the
extraordinarily high reliability that has been achieved with Comparative Judgement in assessing primary school
pupils' skill in first-language English writing.
Adaptive Comparative Judgement
Comparative Judgement becomes a viable alternative to marking when it is implemented as an adaptive web-based
assessment system. In this, the 'scores' (the model parameter for each object) are re-estimated after each 'round' of
judgements in which, on average, each object has been judged one more time. In the next round, each script is
compared only to another whose current estimated score is similar, which increases the amount of statistical
information contained in each judgement. As a result, the estimation procedure is more efficient than random
pairing, or any other pre-determined pairing system like those used in classical comparative judgement applications.
As with computer-adaptive testing, this adaptivity maximises the efficiency of the estimation procedure, increasing
the separation of the scores and reducing the standard errors. The most obvious advantage is that this produces
significantly enhanced reliability, compared to assessment by marking, with no loss of validity.
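A simplified sketch of one adaptive round is given below. The score update is a crude gradient-style fit of the comparative-judgement model, and the pairing rule simply matches scripts with adjacent current estimates; neither is the algorithm of any particular ACJ system, and the judgement data are invented.

# Minimal sketch of an adaptive comparative judgement round: re-estimate a
# quality score for each script from the judgements so far (a crude
# Bradley-Terry style update), then pair scripts whose current estimates
# are closest. Illustrative only.
import math

def update_scores(scores, judgements, lr=0.1, sweeps=50):
    """judgements is a list of (winner, loser) pairs of script ids."""
    for _ in range(sweeps):
        for winner, loser in judgements:
            # Predicted probability that 'winner' beats 'loser' under the model.
            p = 1.0 / (1.0 + math.exp(scores[loser] - scores[winner]))
            # Nudge both scores toward agreement with the observed outcome.
            scores[winner] += lr * (1.0 - p)
            scores[loser] -= lr * (1.0 - p)
    return scores

def next_pairs(scores):
    """Pair scripts with adjacent current estimates (the adaptive step)."""
    ranked = sorted(scores, key=scores.get)
    return [(ranked[i], ranked[i + 1]) for i in range(0, len(ranked) - 1, 2)]

scripts = ["s1", "s2", "s3", "s4"]
scores = {s: 0.0 for s in scripts}
judgements = [("s1", "s2"), ("s3", "s4"), ("s1", "s3"), ("s2", "s4")]
scores = update_scores(scores, judgements)
print(next_pairs(scores))   # e.g. [('s4', 's2'), ('s3', 's1')]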
Current Comparative Judgement projects
The first application of Comparative Judgement to the direct assessment of students was in a project called e-scape,
led by Prof. Richard Kimbell of London University's Goldsmiths College (Kimbell & Pollitt, 2008). The
development work was carried out in collaboration with a number of awarding bodies in a Design & Technology
course. Kimbell’s team developed a sophisticated and authentic project in which students were required to develop,
as far as a prototype, an object such as a children’s pill dispenser in two three-hour supervised sessions.
The web-based judgement system was designed by Karim Derrick and Declan Lynch from TAG Developments, a
part of BLi Education, and based on the MAPS (software) assessment portfolio system. Goldsmiths, TAG
Developments and Pollitt ran three trials, increasing the sample size from 20 to 249 students, and developing both
the judging system and the assessment system. There are three pilots, involving Geography and Science as well as
the original in Design & Technology.
Primary school writing
In late 2009 TAG Developments and Pollitt trialled a new version of the system for assessing writing. A total of
1000 primary school scripts were evaluated by a team of 54 judges in a simulated national assessment context. The
reliability of the resulting scores after each script had been judged 16 times was 0.96, considerably higher than in any
other reported study of similar writing assessment. Further development of the system has shown that reliability of
0.93 can be reached after about 9 judgements of each script, when the system is no more expensive than single
marking but still much more reliable.
Several projects are underway at present, in England, Scotland, Ireland, Israel, Singapore and Australia. They range
from primary school to university in context, and include both formative and summative assessment, from writing to
Mathematics. The basic web system is now available on a commercial basis from TAG Developments
(http://www.tagdevelopments.com), and can be modified to suit specific needs.
[1] Laming, D R J (2004) Human judgment: the eye of the beholder. London, Thomson.
[2] Thurstone, L L (1927a). Psychophysical analysis. American Journal of Psychology, 38, 368-389. Chapter 2 in Thurstone, L.L. (1959). The
measurement of values. University of Chicago Press, Chicago, Illinois.
[3] Thurstone, L L (1927b). The method of paired comparisons for social values. Journal of Abnormal and Social Psychology, 21, 384-400.
Chapter 7 in Thurstone, L.L. (1959). The measurement of values. University of Chicago Press, Chicago, Illinois
[4] Bramley, T (2007) Paired comparison methods. In Newton, P, Baird, J, Patrick, H, Goldstein, H, Timms, P and Wood, A (Eds). Techniques
for monitoring the comparability of examination standards. London, QCA.
[5] Kimbell R, A and Pollitt A (2008) Coursework assessment in high stakes examinations: authenticity, creativity, reliability Third international
Rasch measurement conference. Perth: Western Australia: January.
• APA, AERA and NCME (1999) Standards for Educational and Psychological Testing.
• Galton, F (1869) Hereditary genius: an inquiry into its laws and consequences. London: Macmillan.
• Kimbell, R A, Wheeler A, Miller S, and Pollitt A (2007) e-scape portfolio assessment (e-solutions for creative
assessment in portfolio environments) phase 2 report. TERU Goldsmiths, University of London ISBN
• Pollitt, A (2004) Let’s stop marking exams. Annual Conference of the International Association for Educational
Assessment, Philadelphia, June. Available at http://www.camexam.co.uk publications.
• Pollitt, A, (2009) Abolishing Marksism, and rescuing validity. Annual Conference of the International Association
for Educational Assessment, Brisbane, September. Available at http://www.camexam.co.uk publications.
• Pollitt, A, & Murray, NJ (1993) What raters really pay attention to. Language Testing Research Colloquium,
Cambridge. Republished in Milanovic, M & Saville, N (Eds), Studies in Language Testing 3: Performance
Testing, Cognition and Assessment, Cambridge University Press, Cambridge.
External links
• E-scape
Alternative assessment
In the education industry, alternative assessment or portfolio assessment is in direct contrast to what is known as
performance evaluation, traditional assessment, standardized assessment or summative assessment. Alternative
assessment is also known under various other terms, including:
• authentic assessment
• integrative assessment
• holistic assessment
• assessment for learning
• formative assessment
In the model, students, teachers, and sometimes parents select pieces from a student's combined work over the
(usually four) years of school to demonstrate that learning and improvement has taken place over those years. Some
of the characteristics of a portfolio assessment are that it emphasizes and evidences the learning process as an active
demonstration of knowledge. It is used for evaluating learning processes and learning outcomes. Alternative
assessments are used to encourage student involvement in their assessment, their interaction with other students,
teachers, parents and the larger community.
Formats vary: demonstrations and journals can be used as alternative assessments, but portfolio presentations are
considered the most wholly representative of a student's learning.
Portfolios can be organized by developmental category, content area, or by topics or themes. Portfolios have three
main purposes. One is for assessment and evaluation, assessing progress, achievement, developmental strengths, and
areas for continued work. Another purpose is for self-assessment and reflection, where students can chart their
progress and take ownership of their learning. Finally, portfolios can be used as a means for reporting progress, in
which progress and achievement can be shown to parents.
The type of portfolio used depends on the purpose and what it will be used for. A working portfolio is used to collect
samples of student work for future evaluation. Samples are collected by students and teachers without making final
decisions as to what will be kept or discarded. Later, these items can become part of another type of portfolio. In an
evaluative portfolio, the teacher uses the materials included to complete both formative and summative evaluation of
progress. This is not a full collection of all work, but a definitive collection to show mastery of skills in an area. A
showcase portfolio is used to exhibit a child's best work, chosen by the child. Often, a showcase portfolio may be
used as a way to share accomplishments with parents. Finally, an archival portfolio follows a student over time.
These show a history of student work that follows from class to class. An archival portfolio can pass along
information about the student from one teacher to another as well as allow a student to look back at his or her own
work over time.
Notable practitioners
• Giselle O. Martin-Kniep
• e-scape project from TERU, Goldsmiths University, http://en.wikipedia.org/wiki/E-scape
Aptitude
An aptitude is an innate component of a competency (the others being knowledge, understanding, learned or
acquired abilities (skills) and attitude) to do a certain kind of work at a certain level. Aptitudes may be physical or
mental. The innate nature of aptitude is in contrast to achievement, which represents knowledge or ability that is
gained through learning.
Intelligence and aptitudes
Aptitude and intelligence quotient are related, and in some ways opposite, views of human mental ability. Whereas
intelligence quotient sees intelligence as being a single measurable characteristic affecting all mental ability, aptitude
refers to one of many different characteristics which can be independent of each other, such as aptitude for military
flight or computer programming.
This is more similar to the theory of multiple intelligences.
On the contrary, causal analysis with any group of test scores will nearly always show them to be highly correlated.
The U.S. Department of Labor's General Learning Ability, for instance, is determined by combining Verbal,
Numerical and Spatial aptitude subtests. In a given person, some aptitudes are low and others high. In the context of an
aptitude test the "high" and "low" scores are usually not far apart, because all ability test scores tend to be correlated.
Aptitude is better applied intra-individually to determine what tasks a given individual is more skilled at performing.
Inter-individual aptitude differences are typically not very significant due to IQ differences. Of course this assumes
individuals have not already been pre-screened for IQ through some other process such as SAT scores, GRE scores,
finishing medical school, etc.
Combined aptitude and knowledge tests
Tests that assess learned skills or knowledge are frequently called achievement tests. However, certain tests can
assess both types of constructs. An example that leans both ways is the Armed Services Vocational Aptitude Battery
(ASVAB), which is given to recruits entering the armed forces of the United States. Another is the SAT, which is
designed as a test of aptitude for college in the United States, but has achievement elements. For example, it tests
mathematical reasoning, which depends both on innate mathematical ability and education received in mathematics.



External links
• Cognitive Styles and Implications for the Engineering Curriculum
• Measuring Aptitude - from the Education Resources Information Center Clearinghouse on Tests, Measurement and Evaluation, Washington DC.
• Detailed Description and History of Aptitude Testing, Comparison to Other Types of Assessment
[1] Standardized tests: Mental ability (UC Davis) (http://psychology.ucdavis.edu/sommerb/sommerdemo/stantests/mental.htm)
[2] Standardized tests: Mental ability (UC Davis) (http://psychology.ucdavis.edu/sommerb/sommerdemo/stantests/mental.htm)
[3] The Too Many Aptitudes Problem (http://megasociety.org/noesis/138/aptitude.html)
[4] Multipotentiality: multiple talents, multiple challenges (http://www.wellsphere.com/happiness-article/
[5] Personal Reflections on Testing (http://www.jocrf.org/about_aptitudes/Pease.html)
[6] What Do Aptitude Career Tests Measure? (http://www.jobdiagnosis.com/myblog/aptitude-career-tests.htm)
[7] http://fie.engrng.pitt.edu/fie95/4d3/4d32/4d32.htm
[8] http://www.ericdigests.org/pre-9218/aptitude.htm
[9] http://www.theworksuite.com/id15.html
Axiomatic design
Axiomatic design is a systems design methodology using matrix methods to systematically analyze the
transformation of customer needs into functional requirements, design parameters, and process variables.
The method gets its name from its use of design principles or design Axioms (i.e., given without proof) governing
the analysis and decision making process in developing high quality product or system designs. Axiomatic design is
considered to be a design method that addresses fundamental issues in Taguchi methods.
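In the standard axiomatic-design formulation the mapping between domains is written as {FR} = [A]{DP}, where [A] is the design matrix, and the Independence Axiom favours matrices that are diagonal (an uncoupled design) or triangular (a decoupled design). The sketch below is only an illustration of that classification rule; the function name, tolerance and example matrices are assumptions, not material from this text.

```python
# Minimal sketch: classify an axiomatic-design matrix as uncoupled, decoupled
# or coupled, following the standard {FR} = [A]{DP} formulation.
import numpy as np

def classify_design_matrix(A, tol=1e-9):
    """Return 'uncoupled', 'decoupled' or 'coupled' for a square design matrix A."""
    A = np.asarray(A, dtype=float)
    off_diag = A - np.diag(np.diag(A))
    if np.all(np.abs(off_diag) < tol):
        return "uncoupled"   # diagonal: each FR is satisfied by exactly one DP
    if np.all(np.abs(np.triu(A, 1)) < tol) or np.all(np.abs(np.tril(A, -1)) < tol):
        return "decoupled"   # triangular: FRs can be satisfied in a fixed order
    return "coupled"         # full matrix: DPs interact, violating independence

print(classify_design_matrix([[1, 0], [0, 1]]))   # uncoupled
print(classify_design_matrix([[1, 0], [2, 1]]))   # decoupled
print(classify_design_matrix([[1, 1], [1, 1]]))   # coupled
```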
The methodology has been developed by Dr. Suh Nam Pyo at the MIT Department of Mechanical Engineering since the 1990s. A series of academic conferences has been held to present current developments of the methodology. The most recent International Conference on Axiomatic Design (ICAD) was held in 2009 in Portugal.
[1] Suh (1990). The Principles of Design, Oxford University Press, 1990, ISBN 0-19-504345-6
• Suh (2001). Axiomatic Design: Advances and Applications, Oxford University Press, 2001, ISBN 0-19-513466-4
• Suh (2005). Complexity: Theory and Applications, Oxford University Press, 2005, ISBN 0-19-517876-9
• El-Haik, Axiomatic Quality, Wiley, 2005, ISBN 0-471-68273-X
• Stamatis, Six Sigma and Beyond: Design for Six Sigma, Volume VI, CRC Press, 2002, ISBN 1-57444-315-1
External links
A discussion of the methodology is given here:
• Building Better Vehicles via Axiomatic Design (http://www.autofieldguide.com/articles/060001.html)
• Axiomatic Design for Complex Systems (http://web.mit.edu/mitpep/pi/courses/axiomatic_design.html) is a professional short course offered at MIT
Past proceedings of International Conferences on Axiomatic Design can be downloaded here:
• ICAD2009 (http://www.axiomaticdesign.com/technology/cat27.asp)
• ICAD2006 (http://www.axiomaticdesign.com/technology/cat25.asp)
• ICAD2004 (http://www.axiomaticdesign.com/technology/cat21.asp)
• ICAD2002 (http://www.axiomaticdesign.com/technology/cat23.asp)
• ICAD2000 (http://www.axiomaticdesign.com/technology/cat22.asp)
Axiomatic product development lifecycle
The Axiomatic Product Development Lifecycle (APDL) in systems engineering is a model developed by Bulent Gumus in 2005. This new model is based on the Axiomatic Design method developed by MIT Professor Nam P. Suh since the 1990s; hence it inherits the benefits of applying Axiomatic Design to product development. The Axiomatic Design method is extended to cover the whole product development lifecycle, including the test domain, and new domain characteristic vectors are introduced, such as the input constraint and system component vectors.
The objectives of the APDL model are to guide the designers, developers, and other members of a transdisciplinary
product development team throughout the development effort as well as to help capture, maintain, and manage the
product development knowledge. The APDL model aims to improve the quality of the design, requirements
management, change management, project management, and communication between stakeholders as well as to
shorten the development time and reduce the cost.
For the purposes of managing development lifecycle knowledge and supporting different development lifecycle
activities such as requirements and change management throughout the whole product development lifecycle, one
new domain and four new characteristic vectors are added to the existing AD domains and characteristic vectors.
A characteristic vector for the system components (SCs), that provide the design solution stated in the DPs, is
defined in the Physical Domain. The SC hierarchy represents the physical architecture of the system or the product
tree. The method for categorizing the components with respect to system physical architecture varies with each
organization. A general portrayal used by Eppinger (2001) is system, subsystem, and component, although further categories are available, such as system, segment, element, subsystem, assembly, subassembly, and part (NASA).
The SC vector and the SC hierarchy (system physical architecture) make it possible to perform such analyses and activities as Design Structure Matrices (DSM), change management, component-based cost management and impact analysis, as well as capturing structural information and requirement traceability.
Another difference between the AD and the APDL model is that in the APDL model the PVs describe the processes
to produce the SCs, not the DPs. Another addition to the AD method is the input constraint (IC) vector that exists in
the functional domain along with the functional requirement (FR) vector. The IC vector is used to capture the input
constraints (IC), which are specific to overall design goals and imposed externally by the customer, by the industry,
or by government regulations. The ICs are derived from the CNs and then updated based on the other rules and
regulations that the product has to comply with but not mentioned in the Customer Domain. This new vector helps
establish the relationships between ICs and the CNs and also helps allocate the ICs to the DPs. The mapping between
the ICs and DPs may require the decomposition of the ICs to allocate specific ICs to the lower level DPs. This
mapping is used in evaluating the design solutions to assess if the proposed design satisfies the allocated ICs.
The component test cases (CTCs), that are used to verify that the corresponding component satisfies the allocated
FRs and ICs, are defined in the {CTC} characteristic vector in the test domain. Component test is defined by IEEE
Std. 610.12-1990 as “Testing of individual hardware or software components or groups of related components.” Each
system component (including subsystems) must be tested before it is integrated into the system to make sure that the
requirements and constraints allocated to that component are all satisfied.
At the end of the system development, the system must be tested to make sure that the system satisfies all of the
functional requirements defined in the functional specification document. The functional test cases (FTCs) are stored
in the {FTC} characteristic vector in the test domain. A functional test is a black-box test and its purpose is to
prove that the requirements are achieved by the system. IEEE (1990) defines functional testing as “(1) Testing that
ignores the internal mechanism of a system or component and focuses solely on the outputs generated in response to
selected inputs and execution conditions. (2) Testing conducted to evaluate the compliance of a system or component
with specified functional requirements.”
APDL Domain Contents
Customer domain
The customer needs (CNs) that the customer seeks in a product or system, voice of the customer.
Functional domain
The functional requirements (FRs) completely characterize the functional needs of the design solution (i.e., software,
organization, etc.) in the functional domain.
The input constraints (ICs) are imposed externally by the customer, by industry standard, or by government
regulations and they set limits for acceptable DPs.
Physical domain
The design parameters (DPs) are the elements of the design solution in the physical domain that are chosen to satisfy
the specified FRs. DPs can be conceptual design solutions, subsystems, components, or component attributes.
The system components (SCs) are the physical entities that provide the design solution described as DPs. The
hierarchical collection of the SCs forms the system physical architecture. SCs are either produced or selected from
commercially available alternatives.
Process domain
The process variables (PVs) characterize the processes used to produce (i.e., manufacture, implement, code, etc.) the SCs.
Test domain
The functional test cases (FTCs) are used to verify that the FRs documented in the requirement specification (RS)
document are satisfied by the system.
The component/unit test cases (CTCs) are used to verify that the SCs (either subsystems or components) satisfy the
allocated FRs and design ICs.
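To make the bookkeeping across domains concrete, the following sketch (illustrative only; the class name, fields and example strings are assumptions rather than part of the APDL specification) records the characteristic vectors of the five domains as simple lists.

```python
# Illustrative sketch of the APDL domains and characteristic vectors described
# above; the dataclass name and fields are assumptions for demonstration only.
from dataclasses import dataclass, field
from typing import List

@dataclass
class APDLModel:
    customer_needs: List[str] = field(default_factory=list)           # CNs (customer domain)
    functional_requirements: List[str] = field(default_factory=list)  # FRs (functional domain)
    input_constraints: List[str] = field(default_factory=list)        # ICs (functional domain)
    design_parameters: List[str] = field(default_factory=list)        # DPs (physical domain)
    system_components: List[str] = field(default_factory=list)        # SCs (physical domain)
    process_variables: List[str] = field(default_factory=list)        # PVs (process domain)
    functional_test_cases: List[str] = field(default_factory=list)    # FTCs (test domain)
    component_test_cases: List[str] = field(default_factory=list)     # CTCs (test domain)

model = APDLModel(
    customer_needs=["Easy to carry"],
    functional_requirements=["FR1: total mass under 1 kg"],
    design_parameters=["DP1: aluminium housing"],
)
print(model.functional_requirements)
```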
The APDL model proposes a V-shaped process in which the detail design (DPs and SCs), PVs and CTCs are developed with a top-down approach, and the PVs, CTCs, and FTCs are then completed and the product produced and tested with a bottom-up approach.
The APDL model is also known as:
• The Transdisciplinary System Development Lifecycle (TSDL) model.
• The Transdisciplinary Product Development Lifecycle (TPDL) model.
[1] Bulent Gumus (2005). Axiomatic Product Development Lifecycle (APDL) Model. PhD Dissertation, TTU, 2005, http://etd.lib.ttu.edu/theses/available/etd-11282005-154139/unrestricted/Gumus_Bulent_Diss.pdf
[2] Suh (1990). The Principles of Design, Oxford University Press, 1990, ISBN 0-19-504345-6
Further reading
• B. Gumus, A. Ertas, D. Tate and I. Cicek, Transdisciplinary Product Development Lifecycle, Journal of
Engineering Design, 19(03), pp. 185–200, June 2008. DOI: 10.1080/09544820701232436.
• B. Gumus, A. Ertas, and D. Tate, “Transdisciplinary Product Development Lifecycle Framework And Its
Application To An Avionics System”, Integrated Design and Process Technology Conference, June 2006.
• B. Gumus and A. Ertas, “Requirements Management and Axiomatic Design”, Journal of Integrated Design and
Process Science, Vol. 8 Number 4, pp. 19–31, Dec 2004.
• Suh, Complexity: Theory and Applications, Oxford University Press, 2005, ISBN 0-19-517876-9
• Suh, Axiomatic Design: Advances and Applications, Oxford University Press, 2001, ISBN 0-19-513466-4
Behavioral Risk Factor Surveillance System
The Behavioral Risk Factor Surveillance System (BRFSS) is a United States health survey that looks at behavioral
risk factors. It is run by Centers for Disease Control and Prevention and conducted by the individual state health
departments. The survey is administered by telephone and is the world's largest such survey. In 2009, the BRFSS
began conducting surveys by cellular phone in addition to traditional “landline” telephones.
The BRFSS is a cross-sectional telephone survey conducted by state health departments with technical and
methodological assistance provided by the CDC. In addition to all 50 states, the BRFSS is also conducted by health
departments in The District of Columbia, Guam, Puerto Rico, and the U.S. Virgin Islands.
Individual states can add their own questions to the survey instrument, which consists of a core set of questions on
certain topics like car safety, obesity, or exercise. States get funding from the federal government to administer these
questionnaires, and they pay for the additional questions themselves.
The U.S. federal government can then compare states based on the core questions to allocate funding and focus
interventions. The states themselves also use the survey results to focus interventions for the public and to decide
what is worth their while to focus on. City, county, tribal, and local governments also rely on BRFSS data for
information about their jurisdictions.
[1] "BRFSS Frequently Asked Questions (FAQs)" (http:/ / www.cdc. gov/ BRFSS/ faqs.htm#1). Center for Disease Control. . Retrieved
External links
• Behavioral Risk Factor Surveillance System (http://www.cdc.gov/BRFSS/)
• CDC Website (http://www.cdc.gov/)
Between-group design
In the design of experiments, a between-group design is an experiment that has two or more groups of subjects each
being tested by a different testing factor simultaneously. This design is usually used in place of, or in some cases, in
conjunction with, the ‘within-subjects’ design, which applies the same variations of conditions to each subject to
observe the reactions. The simplest between-group design involves two groups: a treatment group, which receives the 'special' treatment (that is, it is exposed to the variable under study), and a control group, which receives no such treatment and serves as a reference (to show that any deviation in the results of the treatment group is indeed a direct result of the variable). The between-group design is widely used in psychological, economic, and sociological experiments, as well as in several other areas of the natural and social sciences.
Applying "blind" in the between-group design
In order to avoid bias in the experiments, “blinds” are usually applied in between-group designs. The most commonly
used type is the single blind, which keeps the subjects blind without identifying themselves as members of the
treatment group or the control group. In a single-blind experiment, a placebo is usually offered to the control group
members. Occasionally, the double blind, a more secure way to avoid bias from both the subjects and the testers, is implemented. In this case, not only are the subjects blinded with placebos, but the testers are also unaware of which group (control or treatment) they are dealing with. The double-blind design can protect the experiment from observer-expectancy effects.
The between-groups experimental design has several advantages. With this design, multiple variables, or multiple levels of a variable, can be tested simultaneously, and with enough test subjects a large number of conditions can be compared. The inquiry is thus broadened and extended beyond the effect of a single variable (as with within-subjects designs). This design can also save a great deal of time, which is ideal if the results bear on some pressing and time-sensitive issue, such as health care.
The main disadvantage with between-group designs is that they can be complex and often require a large number of
participants to generate any useful and reliable data. For example, researchers testing the effectiveness of a treatment
for severe depression might need two groups of twenty patients for a control and a test group. If they wanted to add
another treatment to the research, they would need another group of twenty patients. The potential scale of these
experiments can make between-group designs impractical due to limited resources, subjects and space.
Another major concern for between-group designs is bias. Assignment bias, observer-expectancy and
subject-expectancy biases are common causes for skewed data results in between-group experiments, leading to false
conclusions being drawn. These problems can be prevented by implementing random assignment and creating
double-blind experiments whereby both the subject and experimenter are kept blind about the hypothesized effects of
the experiment.
Some other disadvantages for between-group designs are generalization, individual variability and environmental
factors. Whilst it is easy to try to select subjects of the same age, gender and background, this may lead to
generalization issues, as you cannot then extrapolate the results to include wider groups. At the same time, the lack
of homogeneity within a group due to individual variability may also produce unreliable results and obscure genuine
patterns and trends. Environmental variables can also influence results; these usually arise from poor research design.
Practice effect
A practice effect is the outcome/performance change resulting from repeated testing. If multiple levels or some other
variation of the variable is tested repeatedly, which is the case in between-groups experiments, the subjects within
each sub-group become more familiarized with testing conditions, thus increasing responsiveness and performance.
The combination of within-subject design and between-group design
Some might wonder if it is possible to design an experiment that combines the two research design methods –
Within-Subject and Between Group – or if they are two completely distinct methods with their own advantages and
disadvantages. In fact, there is a way to design psychological experiments using both the within-subject and
between-group designs. It is sometimes known as “mixed factorial design” [3]. In this design setup, there are multiple
variables, some classified as within-subject variables, and some classified as between-group variables [3]. Richard
Hall provides an example study that combines both variables:
“So, for example, if we are interested in examining the effects of a new type of cognitive therapy on depression, we
would give a depression pre-test to a group of persons diagnosed as clinically depressed and randomly assign them
into two groups (traditional and cognitive therapy). After the patients were treated according to their assigned
condition for some period of time, let’s say a month, they would be given a measure of depression again (post-test).
This design would consist of one within subject variable (test), with two levels (pre and post), and one between
subjects variable (therapy), with two levels (traditional and cognitive)”
In this example, an experimenter can analyze reasons for depression among specific individuals through the
within-subject variable, and also determine the effectiveness of the two treatment options through a comparison of
the between group variable.
For example, suppose a group of scientists wants to find out which flavor of ice cream people enjoy most out of chocolate, vanilla, strawberry, and mint chocolate chip. Thirty participants are chosen for the experiment, half male and half female. Each participant tastes two spoonfuls of each flavor and then ranks the flavors from best tasting to least favorite. At the end of the experiment the scientists analyze the data both holistically and by gender. They find that vanilla is rated highest among all participants; interestingly, men prefer mint chocolate chip to plain chocolate, whereas women prefer strawberry to mint chocolate chip.
The above example is both between-groups and within-groups. It is between-groups because 15 men and 15 women were tested, and none of the participants could be part of both the male and female groups. The experiment was also within-groups because each participant tasted all four flavors of ice cream.
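A minimal sketch of how such a mixed design might be tabulated, with gender as the between-groups factor and flavor as the within-groups factor; all data, names and rankings below are invented for illustration.

```python
# Hypothetical data for the mixed design above: gender is the between-groups
# factor, flavor is the within-groups factor, values are rankings (1 = best).
from collections import defaultdict

ratings = [
    # (participant, gender, flavor, rank): invented for illustration
    ("p1", "male", "vanilla", 1), ("p1", "male", "mint_choc_chip", 2),
    ("p1", "male", "chocolate", 3), ("p1", "male", "strawberry", 4),
    ("p2", "female", "vanilla", 1), ("p2", "female", "strawberry", 2),
    ("p2", "female", "mint_choc_chip", 3), ("p2", "female", "chocolate", 4),
]

# Mean rank per (gender, flavor): the between-by-within cell means an analyst
# would feed into a mixed-design (split-plot) analysis.
cells = defaultdict(list)
for _, gender, flavor, rank in ratings:
    cells[(gender, flavor)].append(rank)

for (gender, flavor), ranks in sorted(cells.items()):
    print(gender, flavor, sum(ranks) / len(ranks))
```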
• Psychology (sixth edition) by Peter Gray
• Hall, Richard. "2x2 Mixed Factorial Design." Psychology World. 1998. Web. 13 Dec. 2010.
• "Learning Objectives." Research Methods in Psychology. 6th. New York, NY: McGraw-Hill, 2003. Web.
British Polling Council
The British Polling Council (BPC) is an association of market research companies whose opinion polls are
regularly published or broadcast in media in the United Kingdom. The objective of the BPC is to ensure standards of
disclosure, designed to provide readers of published quantitative survey results with an adequate basis for judging
the reliability and validity of the presented statistical findings.
In practice this means that within a few days of a member organisation's poll being published in a newspaper, or
reported in the broadcast media, the members must provide a website address where detailed statistical tables may be
viewed. The full methodology used must also be available to the public. The aim is to increase the level of
confidence that the general public can place on published polls.
The BPC is modelled on the National Council on Public Polls in the United States. The BPC was established in
2004, twelve years after the perceived failure of opinion polls to come close to predicting the actual result of the
United Kingdom general election, 1992. This had led to an inquiry by the Market Research Society, and most
opinion polling companies changed their methodology in the belief that a 'Shy Tory Factor' affected the polling.
However, the BPC does not aim to dictate which methodologies its members use, and indeed they employ a wide
range of fieldwork methods (telephone, door-to-door, and internet) and statistical tools; for example, most companies now weight by 'past vote', but this technique is rejected by some.
Full disclosure
Through full disclosure the BPC aims to encourage the highest professional standards, and to advance the
understanding, among politicians, the media and general public, of how opinion polls are conducted, and how to
interpret poll results. The BPC also provides advice on best practice in the conduct and reporting of polls.
The BPC is concerned only with polls and surveys that set out to measure the opinions of representative samples, that is, the views of an explicitly stated population, such as 'all adults' or 'all voters' in a given area. It does not concern itself with qualitative research.
Membership is limited to organisations who can show to the satisfaction of the BPC that the sampling methods and
weighting procedures used are designed to accurately represent the views of all people within designated target
groups (such as all adults, or voters etc).
Journalists and academics with expertise in polling are appointed as officers of the BPC.
The following organisations are members of the BPC:
• Ipsos MORI
• Dods Polling
• Populus Ltd
• TNS System 3
• Comres
• Opinion Research Business
• Marketing Means
• Opinium
• YouGov
• CELLO mruk research
• Angus Reid Public Opinion
• ICM Research
• Harris Interactive
[1] http://www.britishpollingcouncil.org/officers.html
External links
• Official website (http://www.britishpollingcouncil.org/)
Business excellence
Business excellence is the systematic use of quality management principles and tools in business management, with
the goal of improving performance based on the principles of customer focus, stakeholder value, and process
management. Key practices in business excellence applied across functional areas in an enterprise include
continuous and breakthrough improvement, preventative management and management by facts. Some of the tools
used are the balanced scorecard, Lean, the Six Sigma statistical tools, process management, and project management.
Business Excellence Models
In general, business excellence models have been developed by national bodies as a basis for award programs. For
most of these bodies, the awards themselves are secondary in importance to the widespread adoption of the concepts
of business excellence, which ultimately leads to improved national economic performance.
By far the majority of organizations that use these models do so for self-assessment, through which they may
identify improvement opportunities, areas of strength, and ideas for future organizational development. Users of the
EFQM Excellence Model, for instance, do so for the following purposes: self-assessment, strategy formulation,
visioning, project management, supplier management, and mergers.
When used as a basis for an organization's improvement culture, the business excellence criteria within the models
broadly channel and encourage the use of best practices into areas where their effect will be most beneficial to
performance. When used simply for self-assessment, the criteria can clearly identify strong and weak areas of
management practice so that tools such as benchmarking can be used to identify best-practice to enable the gaps to
be closed. These critical links between business excellence models, best practice, and benchmarking are fundamental
to the success of the models as tools of continuous improvement.
EFQM Model
Business excellence, as described by the European Foundation for Quality Management (EFQM), refers to
"outstanding practices in managing the organization and achieving results, all based on a set of eight fundamental
concepts." These concepts are:
1. orientation on balanced results
2. focus on customer value
3. leadership and constancy of purpose
4. management by processes and facts
5. people development and involvement including continuous learning
6. innovation and improvement
7. partnership development and
8. public responsibility
Malcolm Baldrige Award
The most popular and influential model in the western world is the Malcolm Baldrige National Quality Award Model (also known as the Baldrige model, the Baldrige Criteria, or the Criteria for Performance Excellence), launched by the US government. More than 60 national and state/regional awards base their frameworks upon the Baldrige criteria.
Business Excellence Process Phases
Because business excellence blends different methodologies, each with specific phases within its process, it drives results through four well-defined phases:
1. Discover/Define
2. Measure/Analyze
3. Create/Optimize/Improve
4. Monitor/Control
These phases recur continuously as the organization grows, driving constant monitoring, optimization and re-evaluation, and they fit together with the Six Sigma core process named DMAIC and the PDCA improvement cycle, which can also be found in ISO 9001.
• EFQM official website
[1] http://www.nist.gov/baldrige/publications/criteria.cfm
[2] http://www.efqm.org
Case series
A case series (also known as a clinical series) is a medical research descriptive study that tracks patients with a known exposure given similar treatment, or examines their medical records for exposure and outcome. It can be retrospective or prospective and usually involves a smaller number of patients than more powerful case-control studies or randomized controlled trials. Case series may be consecutive or non-consecutive, depending on whether all cases presenting to the reporting authors over a period of time were included, or only a selection.
Case series may be confounded by selection bias, which limits statements on the causality of correlations observed; for example, physicians who look at patients with a certain illness and a suspected linked exposure will have a selection bias in that they have drawn their patients from a narrow selection (namely their hospital or clinic).
[1] "Definition of case series - NCI Dictionary of Cancer Terms" (http:// www.cancer.gov/ Templates/ db_alpha.aspx?CdrID=44006). .
[2] "Definition of consecutive case series - NCI Dictionary of Cancer Terms" (http:/ / www.cancer.gov/ Templates/ db_alpha.
aspx?CdrID=285747). .
[3] "Definition of nonconsecutive case series - NCI Dictionary of Cancer Terms" (http:// www.cancer.gov/ Templates/ db_alpha.
aspx?CdrID=44575). .
External links
• Study Design Tutorial (http://www.vet.cornell.edu/imaging/tutorial/4studydesigns/descriptive.html) Cornell University College of Veterinary Medicine
Case study
A case study is an intensive analysis of an individual unit (e.g., a person, group, or event) stressing developmental
factors in relation to context.
The case study is common in social sciences and life sciences. Case studies may be
descriptive or explanatory. The latter type is used to explore causation in order to find underlying principles.

They may be prospective, in which criteria are established and cases fitting the criteria are included as they become
available, or retrospective, in which criteria are established for selecting cases from historical records for inclusion in
the study.
Thomas offers the following definition of case study: "Case studies are analyses of persons, events, decisions,
periods, projects, policies, institutions, or other systems that are studied holistically by one or more methods. The
case that is the subject of the inquiry will be an instance of a class of phenomena that provides an analytical frame —
an object — within which the study is conducted and which the case illuminates and explicates."
Rather than using samples and following a rigid protocol (strict set of rules) to examine a limited number of variables,
case study methods involve an in-depth, longitudinal (over a long period of time) examination of a single instance or
event: a case. They provide a systematic way of looking at events, collecting data, analyzing information, and
reporting the results. As a result the researcher may gain a sharpened understanding of why the instance happened as
it did, and what might become important to look at more extensively in future research. Case studies lend themselves
to both generating and testing hypotheses.
Another suggestion is that case study should be defined as a research strategy, an empirical inquiry that
investigates a phenomenon within its real-life context. Case study research means single and multiple case studies,
can include quantitative evidence, relies on multiple sources of evidence and benefits from the prior development of
theoretical propositions. Case studies should not be confused with qualitative research and they can be based on any
mix of quantitative and qualitative evidence. Single-subject research provides the statistical framework for making
inferences from quantitative case-study data.

This is also supported and well-formulated in (Lamnek, 2005):
"The case study is a research approach, situated between concrete data taking techniques and methodologic
The case study is sometimes mistaken for the case method, but the two are not the same.
Case selection and structure of the case study
An average, or typical, case is often not the richest in information. In clarifying lines of history and causation it is
more useful to select subjects that offer an interesting, unusual or particularly revealing set of circumstances. A case
selection that is based on representativeness will seldom be able to produce these kinds of insights. When selecting a
subject for a case study, researchers will therefore use information-oriented sampling, as opposed to random sampling. Outlier cases (that is, those which are extreme, deviant or atypical) reveal more information than the
putatively representative case. Alternatively, a case may be selected as a key case, chosen because of the inherent
interest of the case or the circumstances surrounding it. Or it may be chosen because of researchers' in-depth local
knowledge; where researchers have this local knowledge they are in a position to “soak and poke” as Fenno
puts it,
and thereby to offer reasoned lines of explanation based on this rich knowledge of setting and circumstances. Three
types of cases may thus be distinguished:
1. Key cases
2. Outlier cases
3. Local knowledge cases
Whatever the frame of reference for the choice of the subject of the case study (key, outlier, local knowledge), there
is a distinction to be made between the subject and the object of the case study. The subject is the “practical,
historical unity”
through which the theoretical focus of the study is being viewed. The object is that theoretical
focus – the analytical frame. Thus, for example, if a researcher were interested in US resistance to communist
expansion as a theoretical focus, then the Korean War might be taken to be the subject, the lens, the case study
through which the theoretical focus, the object, could be viewed and explicated.
Beyond decisions about case selection and the subject and object of the study, decisions need to be made about
purpose, approach and process in the case study. Thomas
thus proposes a typology for the case study wherein
purposes are first identified (evaluative or exploratory), then approaches are delineated (theory-testing,
theory-building or illustrative), then processes are decided upon, with a principal choice being between whether the
study is to be single or multiple, and choices also about whether the study is to be retrospective, snapshot or
diachronic, and whether it is nested, parallel or sequential. It is thus possible to take many routes through this
typology, with, for example, an exploratory, theory-building, multiple, nested study, or an evaluative, theory-testing,
single, retrospective study. The typology thus offers many permutations for case study structure.
For more on case selection, see [11]
Generalizing from case studies
A critical case can be defined as having strategic importance in relation to the general problem. A critical case allows
the following type of generalization, ‘If it is valid for this case, it is valid for all (or many) cases.’ In its negative
form, the generalization would be, ‘If it is not valid for this case, then it is not valid for any (or only few) cases.’
The case study is also effective for generalizing using the type of test that Karl Popper called falsification, which
forms part of critical reflexivity.
Falsification is one of the most rigorous tests to which a scientific proposition can
be subjected: if just one observation does not fit with the proposition it is considered not valid generally and must
therefore be either revised or rejected. Popper himself used the now famous example of, "All swans are white," and
proposed that just one observation of a single black swan would falsify this proposition and in this way have general
significance and stimulate further investigations and theory-building. The case study is well suited for identifying
"black swans" because of its in-depth approach: what appears to be "white" often turns out on closer examination to
be "black." case
Galileo Galilei’s rejection of Aristotle’s law of gravity was based on a case study selected by information-oriented
sampling and not random sampling. The rejection consisted primarily of a conceptual experiment and later on of a
practical one. These experiments, with the benefit of hindsight, are self-evident. Nevertheless, Aristotle’s incorrect
view of gravity dominated scientific inquiry for nearly two thousand years before it was falsified. In his experimental
thinking, Galileo reasoned as follows: if two objects with the same weight are released from the same height at the
same time, they will hit the ground simultaneously, having fallen at the same speed. If the two objects are then stuck
together into one, this object will have double the weight and will according to the Aristotelian view therefore fall
faster than the two individual objects. This conclusion seemed contradictory to Galileo. The only way to avoid the
contradiction was to eliminate weight as a determinant factor for acceleration in free fall. Galileo’s experimentalism
did not involve a large random sample of trials of objects falling from a wide range of randomly selected heights
under varying wind conditions, and so on. Rather, it was a matter of a single experiment, that is, a case
study.(Flyvbjerg, 2006, p. 225-6) [11]
Galileo’s view continued to be subjected to doubt, however, and the Aristotelian view was not finally rejected until
half a century later, with the invention of the air pump. The air pump made it possible to conduct the ultimate
experiment, known by every pupil, whereby a coin or a piece of lead inside a vacuum tube falls with the same speed
as a feather. After this experiment, Aristotle’s view could be maintained no longer. What is especially worth noting,
however, is that the matter was settled by an individual case due to the clever choice of the extremes of metal and
feather. One might call it a critical case, for if Galileo’s thesis held for these materials, it could be expected to be
valid for all or a large range of materials. Random and large samples were at no time part of the picture. However it
was Galileo's view that was the subject of doubt, as it did not yet seem as reasonable as the Aristotelian view. By
selecting cases strategically in this manner one may arrive at case studies that allow generalization.(Flyvbjerg, 2006,
p. 225-6) For more on generalizing from case studies, see [11]
The case study paradox
Case studies have existed as long as recorded history. Much of what is known about the empirical world has been
produced by case study research, and many of the classics in a long range of disciplines are case studies, including in
psychology, sociology, anthropology, history, education, economics, political science, management, geography,
biology, and medical science. Half of all articles in the top political science journals use case studies, for instance.
But there is a paradox here, as argued by Oxford professor Bent Flyvbjerg. At the same time that case studies are
extensively used and have produced canonical works, one may observe that the case study is generally held in low
regard, or is simply ignored, within the academy. Statistics on courses offered in universities confirm this. It has
been argued that the case study paradox exists because the case study is widely misunderstood as a research method.
Flyvbjerg argues that by clearing up these misunderstandings about the case study, the case study paradox may be resolved.
Misunderstandings about case study research
Flyvbjerg (2006) identifies and corrects five prevalent misunderstandings about case study research:
1. General, theoretical knowledge is more valuable than concrete, practical knowledge.
2. One cannot generalize on the basis of an individual case and, therefore, the case study cannot contribute to
scientific development.
3. The case study is most useful for generating hypotheses, whereas other methods are more suitable for hypotheses
testing and theory building.
4. The case study contains a bias toward verification, i.e., a tendency to confirm the researcher’s preconceived notions.
5. It is often difficult to summarize and develop general propositions and theories on the basis of specific case studies.
History of the case study
It is generally believed that the case-study method was first introduced into social science by Frederic Le Play in
1829 as a handmaiden to statistics in his studies of family budgets (Les Ouvriers Européens, 2nd edition, 1879).
The use of case studies for the creation of new theory in social sciences has been further developed by the
sociologists Barney Glaser and Anselm Strauss who presented their research method, Grounded theory, in 1967.
The popularity of case studies in testing hypotheses has developed only in recent decades. One of the areas in which
case studies have been gaining popularity is education and in particular educational evaluation.
Case studies have also been used as a teaching method and as part of professional development, especially in
business and legal education. The problem-based learning (PBL) movement is such an example. When used in
(non-business) education and professional development, case studies are often referred to as critical incidents.
When the Harvard Business School was started, the faculty quickly realized that there were no textbooks suitable to
a graduate program in business. Their first solution to this problem was to interview leading practitioners of business
and to write detailed accounts of what these managers were doing. Cases are generally written by business school
faculty with particular learning objectives in mind and are refined in the classroom before publication. Additional
relevant documentation (such as financial statements, time-lines, and short biographies, often referred to in the case
as "exhibits"), multimedia supplements (such as video-recordings of interviews with the case protagonist), and a
carefully crafted teaching note often accompany cases.
[1] Bent Flyvbjerg, 2011, "Case Study," (http://www.sbs.ox.ac.uk/centres/bt/directory/Documents/CaseStudy4 2HBQR11PRINT.pdf) in Norman K. Denzin and Yvonna S. Lincoln, eds., The Sage Handbook of Qualitative Research, 4th Edition (Thousand Oaks, CA: Sage), pp. 301-316.
[2] Shepard, Jon; Robert W. Greene (2003). Sociology and You (http://www.glencoe.com/catalog/index.php/program?c=1675&s=21309&p=4213&parent=4526). Ohio: Glencoe McGraw-Hill. pp. A-22. ISBN 0078285763.
[3] Robert K. Yin. Case Study Research: Design and Methods. Fourth Edition. SAGE Publications. California, 2009. ISBN 978-1-4129-6099-1 (http://www.sagepub.com/booksProdDesc.nav?prodId=Book232182)
[4] G. Thomas (2011) A typology for the case study in social science following a review of definition, discourse and structure. Qualitative Inquiry, 17, 6, 511-521
[5] Bent Flyvbjerg, 2006, "Five Misunderstandings About Case Study Research." (http://flyvbjerg.plan.aau.dk/Publications2006/0604FIVEMISPUBL2006.pdf) Qualitative Inquiry, vol. 12, no. 2, April, pp. 219-245; Bent Flyvbjerg, 2011, "Case Study," (http://www.sbs.ox.ac.uk/centres/bt/directory/Documents/CaseStudy4 2HBQR11PRINT.pdf) in Norman K. Denzin and Yvonna S. Lincoln, eds., The Sage Handbook of Qualitative Research, 4th Edition (Thousand Oaks, CA: Sage), pp. 301-316.
[6] Siegfried Lamnek. Qualitative Sozialforschung. Lehrbuch. 4. Auflage. Beltz Verlag. Weihnhein, Basel, 2005
[7] R. Fenno (1986) Observation, context, and sequence in the study of politics. American Political Science Review, 80, 1, 3-15
[8] M. Wieviorka (1992) Case studies: history or sociology? In C.C. Ragin and H.S. Becker (Eds) What is a case? Exploring the foundations of
social inquiry. New York: Cambridge University Press.
[9] Gary Thomas, How to do your Case Study (Thousand Oaks: Sage, 2011)
[10] G. Thomas (2011) A typology for the case study in social science following a review of definition, discourse and structure. Qualitative
Inquiry, 17, 6, 511-521
[11] http://flyvbjerg.plan.aau.dk/Publications2006/0604FIVEMISPUBL2006.pdf
[12] Bent Flyvbjerg, 2011, "Case Study," (http://www.sbs.ox.ac.uk/centres/bt/directory/Documents/CaseStudy4 2HBQR11PRINT.pdf) in Norman K. Denzin and Yvonna S. Lincoln, eds., The Sage Handbook of Qualitative Research, 4th Edition (Thousand Oaks, CA: Sage), pp. 301-316.
[13] Sister Mary Edward Healy, C. S. J. (1947). "Le Play's Contribution to Sociology: His Method". The American Catholic Sociological Review
8 (2): 97–110.
[14] Robert E. Stake, The Art of Case Study Research (Thousand Oaks: Sage, 1995). ISBN 080395767X
Useful Sources
• Baxter, P and Jack, S. (2008) Qualitative Case Study Methodology: Study design and implementation for novice
researchers, in The Qualitative Report, 13(4): 544-559. Available from (http://www.nova.edu/ssss/QR/QR13-4/baxter.pdf)
• Dul, J. and Hak, T (2008). Case Study Methodology in Business Research. Oxford: Butterworth-Heinemann.
ISBN 978-0-7506-8196-4.
• Eisenhardt, K. M. (1989). Building theories from case study research. The Academy of Management Review, 14
(4), Oct, 532-550. doi:10.2307/258557
• Flyvbjerg, Bent, Making Social Science Matter: Why Social Inquiry Fails and How It Can Succeed Again
(Cambridge: Cambridge University Press, 2001). ISBN 052177568X
• Flyvbjerg, Bent. (2006). Five Misunderstandings About Case-Study Research, in Qualitative Inquiry, 12(2):
219-245. Available: (http://flyvbjerg.plan.aau.dk/Publications2006/0604FIVEMISPUBL2006.pdf)
• Flyvbjerg, Bent. (2011) "Case Study (http://www.sbs.ox.ac.uk/centres/bt/directory/Documents/CaseStudy4 2HBQR11PRINT.pdf)," in Norman K. Denzin and Yvonna S. Lincoln, eds., The Sage Handbook of Qualitative Research, 4th Edition. Thousand Oaks, CA: Sage, pp. 301-316.
• George, Alexander L. and Bennett, Andrew. (2005). Case studies and theory development in the social sciences.
London, MIT Press 2005. ISBN 0-262-57222-2
• Gerring, John. (2005) Case Study Research. New York: Cambridge University Press. ISBN 978-0-521-67656-4
• Hancké, Bob. (2009) Intelligent Research Design. A guide for beginning researchers in the social sciences.
Oxford University Press.
• Lijphart, Arend. (1971) Comparative Politics and the Comparative Method, in The American Political Science Review, 65(3): 682-693. Available from (http://www.jstor.org/stable/1955513)
• Ragin, Charles C. and Becker, Howard S. eds. (1992) What is a Case? Exploring the Foundations of Social
Inquiry Cambridge: Cambridge University Press. ISBN 0521421888
• Scholz, Roland W. and Tietje, Olaf. (2002) Embedded Case Study Methods. Integrating Quantitative and Qualitative Knowledge. Thousand Oaks: Sage. ISBN 0761919465
• Straits, Bruce C. and Singleton, Royce A. (2004) Approaches to Social Research, 4th ed. Oxford University Press. ISBN 0195147944. Available from: (http://www.oup.com/us/catalog/general/subject/Sociology/TheoryMethods/?view=usa&ci=0195147944)
• Thomas, Gary (2011) How to do your Case Study: A Guide for Students and Researchers. Thousand Oaks: Sage.
External links
• Case Studies (http://writing.colostate.edu/guides/research/casestudy/)
• Darden Business Case Studies (http://store.darden.virginia.edu/)
• ETH Zurich: Case studies in Environmental Sciences (http://www.uns.ethz.ch/translab/)
• Globalens Business Case Studies (http://globalens.com/cases1.aspx)
• Case Studies in Science (http://sciencecases.lib.buffalo.edu/)
Central composite design
In statistics, a central composite design is an experimental design, useful in response surface methodology, for
building a second order (quadratic) model for the response variable without needing to use a complete three-level
factorial experiment.
After the designed experiment is performed, linear regression is used, sometimes iteratively, to obtain results. Coded
variables are often used when constructing this design.
The design consists of three distinct sets of experimental runs:
1. A factorial (perhaps fractional) design in the factors studied, each having two levels;
2. A set of center points, experimental runs whose values of each factor are the medians of the values used in the
factorial portion. This point is often replicated in order to improve the precision of the experiment;
3. A set of axial points, experimental runs identical to the centre points except for one factor, which will take on
values both below and above the median of the two factorial levels, and typically both outside their range. All
factors are varied in this way.
Design matrix
The design matrix for a central composite design experiment involving k factors is derived from a matrix, d,
containing the following three different parts corresponding to the three types of experimental runs:
1. The matrix F obtained from the factorial experiment. The factor levels are scaled so that its entries are coded as
+1 and −1.
2. The matrix C from the center points, denoted in coded variables as (0,0,0,...,0), where there are k zeros.
3. A matrix E from the axial points, with 2k rows. Each factor is sequentially placed at ±α while all other factors are at zero. The value of α is determined by the designer; while arbitrary, some values give the design desirable properties. For k = 3 factors, for example, this part would look like:
E = [[α, 0, 0], [−α, 0, 0], [0, α, 0], [0, −α, 0], [0, 0, α], [0, 0, −α]]
Then d is the vertical concatenation of these three parts:
d = [F; C; E]
The design matrix X used in the linear regression is the horizontal concatenation of a column of 1s (the intercept), d, and all elementwise products of pairs of columns of d (including each column with itself, which gives the squared terms):
X = [1, d, d(1)×d(1), d(1)×d(2), ..., d(k)×d(k)]
where d(i) represents the i-th column of d.
Choosing α
There are many different methods to select a useful value of α. Let F be the number of points due to the factorial design and T = 2k + n the number of additional points, where n is the number of center points in the design. Common values are as follows (Myers, 1971):
1. Orthogonal design: α = (QF/4)^(1/4), where Q = (√(F+T) − √F)²;
2. Rotatable design: α = F^(1/4) (the design implemented by MATLAB’s ccdesign function).
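As an illustration of the construction described above, the sketch below builds the coded design matrix d from its factorial (F), center (C) and axial (E) parts and uses the rotatable choice α = F^(1/4); the function name and the default of four center points are assumptions, not part of the source.

```python
# A minimal sketch of building a central composite design in coded units,
# following the F / C / E construction described in the text above.
import itertools
import numpy as np

def central_composite_design(k, n_center=4, alpha=None):
    """Return the design matrix d for k factors, one row per experimental run."""
    # Factorial part F: all 2^k combinations of -1/+1.
    F = np.array(list(itertools.product([-1.0, 1.0], repeat=k)))
    # Center points C: n_center rows of zeros.
    C = np.zeros((n_center, k))
    # Rotatable alpha = (number of factorial points)^(1/4) unless overridden.
    if alpha is None:
        alpha = len(F) ** 0.25
    # Axial part E: each factor at +/- alpha, all others at zero (2k rows).
    E = np.vstack([alpha * np.eye(k), -alpha * np.eye(k)])
    # d is the vertical concatenation of the three parts.
    return np.vstack([F, C, E])

d = central_composite_design(k=3)
print(d.shape)  # (2**3 + 4 + 2*3, 3) = (18, 3)
```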
Myers, Raymond H. Response Surface Methodology. Boston: Allyn and Bacon, Inc., 1971
Challenge-dechallenge-rechallenge
Challenge-dechallenge-rechallenge (CDR) is a medical testing protocol in which a medicine or drug is administered, withdrawn, then re-administered, while being monitored for adverse effects at each stage. The protocol is used when statistical testing is inappropriate because the reaction is idiosyncratic to a specific individual, or because there is a lack of sufficient test subjects and the unit of analysis is the individual. During the withdrawal phase, the medication is allowed to wash out of the system in order to determine what effect it is having on the individual.
Use in drug testing
CDR is one means of establishing the validity and benefits of medication in treating specific conditions, as well as any adverse drug reactions. The Food and Drug Administration of the United States lists positive dechallenge reactions (an adverse event which disappears on withdrawal of the medication) and negative dechallenge reactions (an adverse event which continues after withdrawal), as well as positive rechallenge (symptoms re-occurring on re-administration) and negative rechallenge (failure of a symptom to re-occur after re-administration). It is one of the standard means of assessing adverse drug reactions in France.
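The outcome labels listed above can be summarized in a small helper; this is an illustrative sketch only, and the function name and boolean inputs are assumptions rather than any standard implementation.

```python
# Illustrative sketch: encoding the dechallenge/rechallenge outcome labels
# described above as a simple classification helper (names are assumptions).
def classify_cdr(event_on_challenge, event_after_withdrawal, event_on_rechallenge=None):
    """Label the dechallenge and (optionally) rechallenge phases of a CDR trial."""
    results = {}
    if event_on_challenge:
        # Positive dechallenge: the adverse event disappears when the drug is withdrawn.
        results["dechallenge"] = "negative" if event_after_withdrawal else "positive"
    if event_on_rechallenge is not None:
        # Positive rechallenge: the symptom re-occurs when the drug is re-administered.
        results["rechallenge"] = "positive" if event_on_rechallenge else "negative"
    return results

print(classify_cdr(True, False, True))  # {'dechallenge': 'positive', 'rechallenge': 'positive'}
```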
Fluoxetine and suicide
Peter Breggin asserted that there was an association between fluoxetine (Prozac) use and suicidal ideation. While his
research group were investigating the effectiveness and side effects of the medication, Breggin noticed that only
certain individuals responded to the medication with increased thoughts of suicide, and used the
challenge-dechallenge-rechallenge protocol in an effort to verify the link. Given the low occurrence rate of
suicidality, statistical testing was considered inappropriate. Other researchers have similarly suggested that CDR is useful for researching the adverse effect of suicidality while taking fluoxetine, and Eli Lilly adopted the protocol rather than randomized controlled trials when testing for increased risk of suicide. In addition to suicidality, akathisia is a reaction to medication which has been suggested as amenable to a CDR protocol.
Clinical trials using a CDR protocol are also reported by clinicians attempting to assess the effects of a medication on patients.
CDR has also been suggested as a means for patients to self-diagnose, self-treat and self-monitor their own reactions to medication.
[1] Spitzer WO (1986). "Importance of valid measurements of benefit and risk". Med Toxicol 1 Suppl 1: 74–8. PMID 3821430.
[2] "Guideline for Adverse Experience Reporting for Licensed Biological Products: Definitions" (http:// web. archive. org/web/
20071007215452/http:/ / www.fda.gov/ medwatch/ report/cberguid/ define.htm). Food and Drug Administration. Archived from the
original (http:// www. fda.gov/ medwatch/ report/cberguid/define.htm) on 2007-10-07. . Retrieved 2008-03-15.
[3] Begaud B (1984). "Standardized assessment of adverse drug reactions: the method used in France. Special workshop--clinical". Drug Inf J 18
(3–4): 275–81. PMID 10268556.
[4] Breggin, Ginger Ross; Breggin, Peter Roger (1995). Talking back to Prozac: what doctors won't tell you about today's most controversial
drug. New York: St. Martin's Paperbacks. ISBN 0-312-95606-1.
[5] Maris, RWM (2002-10-04). "Suicide and Neuropsychiatric Adverse Effects of SSRI Medications: Methodological Issues" (http://www.oism.info/en/therapy/theory/suicide_and_neuropsychiatric_adverse_effects_of_ssri.htm). Philadelphia, Pennsylvania. Retrieved
[6] Healy D, Whitaker C (2003). "Antidepressants and suicide: risk-benefit conundrums". J Psychiatry Neurosci 28 (5): 331–7. PMC 193979.
PMID 14517576.
[7] Healy D (2003). "Lines of evidence on the risks of suicide with selective serotonin reuptake inhibitors". Psychother Psychosom 72 (2): 71–9.
doi:10.1159/000068691. PMID 12601224.
[8] Rothschild AJ, Locke CA (1991). "Reexposure to fluoxetine after serious suicide attempts by three patients: the role of akathisia". J Clin
Psychiatry 52 (12): 491–3. PMID 1752848.
[9] Charlton BG (2005). "Self-management of psychiatric symptoms using over-the-counter (OTC) psychopharmacology: the S-DTM therapeutic
model--Self-diagnosis, self-treatment, self-monitoring". Med. Hypotheses 65 (5): 823–8. doi:10.1016/j.mehy.2005.07.013. PMID 16111835.
Check weigher
Example checkweigher: the product passes along the conveyor belt, where it is weighed.
A checkweigher is an automatic machine for checking the weight of
packaged commodities. It is normally found at the offgoing end of a
production process and is used to ensure that the weight of a pack of
the commodity is within specified limits. Any packs that are outside
the tolerance are taken out of line automatically.
A checkweigher can weigh in excess of 500 items per minute
(depending on carton size and accuracy requirements).
Checkweighers often incorporate additional checking devices such as
metal detectors and X-ray machines to enable other attributes of the
pack to be checked and acted upon accordingly.
A typical machine
A checkweigher incorporates a series of conveyor belts. Checkweighers are known also as belt weighers, in-motion
scales, conveyor scales, dynamic scales, and in-line scales. In filler applications, they are known as check scales.
Typically, there are three belts or chain beds:
• An infeed belt that may change the speed of the package, bringing it up or down to the speed required for weighing. The infeed is also sometimes used as an indexer, which sets the gap between products to an optimal distance for weighing, and it sometimes has special belts or chains to position the product.
• A weigh belt. This is typically mounted on a weight transducer which can typically be a strain-gauge load cell or
a servo-balance (also known as a force-balance), or sometimes known as a split-beam. Some older machines may
pause the weigh bed belt before taking the weight measurement. This may limit line speed and throughput.
• A reject belt that provides a method of removing an out-of-tolerance package from the conveyor line. The reject
can vary by application. Some require an air-amplifier to blow small products off the belt, but heavier
applications require a linear or radial actuator. Some fragile products are rejected by "dropping" the bed so that
the product can slide gently into a bin or other conveyor.
For high-speed precision scales, a load cell using electromagnetic force restoration (EMFR) is appropriate. This kind of system charges an inductive coil, effectively floating the weigh bed in an electromagnetic field. When weight is added, the movement of ferrous material through that coil causes a loss of electromagnetic force, and a precision circuit recharges the coil to its original level. The amount added to the coil is precisely measured, and the resulting voltage is filtered and sampled into digital data. That signal is then passed through a digital signal processor (DSP) filter and ring buffer to further reduce ambient and digital noise before being delivered to a computerized controller.
It is usual for a built-in computer to take many weight readings from the transducer over the time that the package is
on the weigh bed to ensure an accurate weight reading.
Calibration is critical. A lab scale, which usually sits in an isolated chamber pressurized with dry nitrogen (pressurized at sea level), can weigh an object to within plus or minus a hundredth of a gram, but ambient air pressure is a factor. This is straightforward when there is no motion; in motion, however, there is noise from the movement of the weigh belt, vibration, and drafts from air-conditioning or refrigeration. Torque on the load cell also causes erratic readings.
A dynamic, in-motion checkweigher takes samples and analyzes them to form an accurate weight over a given time period. In most cases, there is a trigger from an optical (or ultrasonic) device to signal the passing of a package. Once the trigger fires, a delay is applied to allow the package to move to the "sweet spot" (center) of the weigh bed, and the weight is then sampled for a given duration. If either of these times is wrong, the weight will be wrong. There is no exact method for predicting these timings; some systems have a "graphing" feature to help, but the settings are generally arrived at empirically.
• A reject conveyor that enables out-of-tolerance packages to be removed from the normal flow while still moving at the conveyor velocity. The reject mechanism can be one of several types: a simple pneumatic pusher that pushes the reject pack sideways off the belt, a diverting arm that sweeps the pack sideways, or a reject belt that lowers or lifts to divert the pack vertically. A checkweigher usually has a bin to collect the out-of-tolerance packs.
Tolerance methods
There are several tolerance methods:
• The traditional "minimum weight" system, where packs below a specified weight are rejected. Normally the minimum weight is the weight printed on the pack, or a level slightly above it to allow for weight losses after production, such as evaporation from commodities with a moisture content. The larger wholesale companies have mandated that any product shipped to them have accurate weight checks, so that a customer can be confident of getting the amount of product paid for. These wholesalers charge large fees for inaccurately filled packages.
• The European Average Weight System, which follows three specified rules known as the "Packers Rules" (a brief code sketch of these rules, as they are commonly summarised, follows this list).
• Other published standards and regulations such as NIST Handbook 133
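As a rough illustration of the average-weight approach, the sketch below checks a batch of pack weights against the three Packers Rules as they are commonly summarised: the batch average must be at least the nominal quantity, only a small proportion of packs may fall below the nominal by more than one tolerable negative error (TNE), and no pack may fall short by more than twice the TNE. The 2.5% limit, the function name and the parameters are illustrative assumptions, not a statement of the regulations themselves.

    def passes_packers_rules(weights_g, nominal_g, tne_g, max_nonstandard_fraction=0.025):
        """Hedged sketch of the 'Packers Rules' as commonly summarised:
        1. the batch average must be at least the nominal quantity;
        2. only a small fraction of packs may fall below nominal by more
           than one tolerable negative error (TNE);
        3. no pack may fall below nominal by more than two TNEs.
        Thresholds here are illustrative parameters, not legal limits."""
        average_ok = sum(weights_g) / len(weights_g) >= nominal_g
        nonstandard = [w for w in weights_g if w < nominal_g - tne_g]
        fraction_ok = len(nonstandard) / len(weights_g) <= max_nonstandard_fraction
        no_inadequate = all(w >= nominal_g - 2 * tne_g for w in weights_g)
        return average_ok and fraction_ok and no_inadequate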
Data Collection
There is also a requirement under the European Average Weight System that data collected by checkweighers is
archived and is available for inspection. Most modern checkweighers are therefore equipped with communications
ports to enable the actual pack weights and derived data to be uploaded to a host computer. This data can also be
used for management information enabling processes to be fine-tuned and production performance monitored.
Checkweighers equipped with high-speed communications such as Ethernet ports can integrate themselves into groups, so that a set of production lines producing identical products can be treated as one production line for the purposes of weight control. For example, a line running with a low average weight can be complemented by another running with a high average weight, so that the aggregate of the two lines still complies with the rules.
An alternative is to program the checkweigher to check bands of different weight tolerances. For instance, suppose the total valid weight is 100 grams ±15 grams, meaning the product can weigh 85 g to 115 g. If you are producing 10,000 packs a day and most of your packs weigh 110 g, you are giving away 100 kg of product; if you try to run closer to 85 g, you may have a high rejection rate.
EXAMPLE: A checkweigher is programmed to indicate 5 zones with resolution to 1 g:
1. Under Reject.... the product weighs less than 85 g
2. Under OK........ the product weighs at least 85 g but less than 95 g
3. Valid........... the product weighs at least 95 g but less than 105 g
4. Over OK......... the product weighs at least 105 g but not more than 115 g
5. Over Reject..... the product weighs more than the 115 g limit
With a checkweigher programmed as a zone checkweigher, the data collected over the network, as well as local statistics, can indicate the need to check the settings on upstream equipment to better control flow into the packaging. In some cases the dynamic scale sends a real-time signal to a filler, for instance, controlling the actual flow into a barrel, can, bag, etc. In many cases a checkweigher has a light-tree with different lights to indicate the weight zone of each product.
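As an illustration of zone programming, the hypothetical function below classifies a pack into the five zones of the example above; the boundaries are taken from that example and the function name is illustrative.

    def classify_pack(weight_g):
        """Return the zone for a pack, using the five-zone example above
        (nominal 100 g, reject limits at 85 g and 115 g, 10 g OK bands)."""
        if weight_g < 85:
            return "Under Reject"
        if weight_g < 95:
            return "Under OK"
        if weight_g < 105:
            return "Valid"
        if weight_g <= 115:
            return "Over OK"
        return "Over Reject"

Counts accumulated per zone over a shift can then be reported over the network or used to signal the upstream filler, as described above.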
Application considerations
The speed and accuracy that can be achieved by a checkweigher are influenced by the following:
• Pack length
• Pack weight
• Line speed required
• Pack content (solid or liquid)
• Motor technology
• Stabilization time of the weight transducer
• Airflow causing erroneous readings
• Vibration from nearby machinery causing unnecessary rejects
• Temperature, as load cells can be temperature sensitive
In-motion scales are dynamic machines that can be designed to perform thousands of tasks. Some are used as simple caseweighers at the end of the conveyor line to ensure the overall finished package is within its target weight.
An in-motion conveyor checkweigher can be used to detect missing pieces of a kit, such as a cell phone package that is missing the manual or other collateral. Checkweighers are typically used on the incoming conveyor chain and on the output pre-packaging conveyor chain in a poultry processing plant. The bird is weighed when it comes onto the conveyor and again after processing and washing; the network computer can then determine whether the bird absorbed too much water, which will drain away during further processing and leave the bird under its target weight.
A high-speed conveyor scale can also be used to change the pacing, or pitch, of the products on the line by speeding up or slowing down individual packs, changing the distance between them before they enter a conveyor machine that boxes multiple packs together.
A checkweigher can also count packs and record the aggregate (total) weight of the boxes going onto a pallet for shipment, including reading each package's weight and cubic dimensions. The controller computer can print a shipping label and a bar-code label identifying the weight, the cubic dimensions, the ship-to address, and other data for machine identification throughout the shipment of the product. A receiving checkweigher can read the label with a bar-code scanner and determine whether the shipment is as it was when the transportation carrier received it from the shipper's loading dock, and whether a box is missing or something was pilfered or broken in transit.
Checkweighers are also used for quality management. For instance, raw material for machining a bearing is weighed before the process begins, and afterwards the quality inspector expects that a certain amount of metal was removed in the finishing process. The finished bearings are checkweighed, and bearings that are over- or underweight are rejected for physical inspection. This benefits the inspector, who can have high confidence that the bearings not rejected are within machining tolerance. Another common use is throttling plastic extruders so that a bottle used to package detergent meets the requirements of the finished packager.
Quality management can also use a checkweigher for nondestructive testing, applying common evaluation methods to detect pieces missing from a "finished" product, such as grease missing from a bearing or a missing roller within the housing.
Checkweighers can be built with metal detectors, X-ray machines, open-flap detection, bar-code scanners, holographic scanners, temperature sensors, vision inspectors, timing screws to set the timing and spacing between products, indexing gates, and concentrator ducts to line up the product into a designated area on the conveyor. An industrial in-motion checkweigher can sort products from a fraction of a gram to many kilograms; in English units, this is from less than a hundredth of an ounce to 500 lb or more. Specialized checkweighers can weigh commercial aircraft and even find their center of gravity.
Checkweighers can run at very high speed, processing products weighing fractions of a gram, such as pharmaceuticals, at over 100 m/min (meters per minute), and 200 lb bags of produce at over 100 fpm (feet per minute). They can be designed in many shapes and sizes, hung from ceilings, raised on mezzanines, and operated in ovens or refrigerators. Their conveying medium can be industrial belting, low-static belting, chains similar to bicycle chains (but much smaller), or interlocked chain belts of any width. They can have chain belts made of special materials, different polymers, metals, etc.
Checkweighers are used in cleanrooms, dry-atmosphere environments, wet environments, produce barns, food processing, drug processing, etc. Checkweighers are specified by the kind of environment and the kind of cleaning that will be used. Typically, a checkweigher for produce is made of mild steel, while one that will be cleaned with harsh chemicals, such as bleach, is made with all stainless-steel parts, even the load cells. These machines are labeled "full washdown", and every part and component must be specified to survive the washdown environment.
Checkweighers in some applications are operated for extremely long periods of time: 24/7, year round. Generally, conveyor lines are not stopped unless maintenance is required or there is an emergency stop, called an E-stop. Checkweighers operating in high-density conveyor lines may include special equipment in their design to ensure that, if an E-stop occurs, all power to all motors is removed until the E-stop is cleared and reset.
[1] "The Weights and Measures (Packaged Goods) Regulations 2006" (http://www.nmo.bis.gov.uk/Documents/PGR guidance 13 august 2007.pdf), NWML, Dept for Innovation, Universities & Skills URN 07/1343, 2006.
[2] Checking the Net Contents of Packaged Goods, NIST Handbook 133 (http://ts.nist.gov/WeightsAndMeasures/h1334-05.cfm), Fourth Edition, 2005.
• Yam, K. L., "Encyclopedia of Packaging Technology", John Wiley & Sons, 2009, ISBN 978-0-470-08704-6
Class rank
Class rank is a measure of how a student's performance compares to that of other students in his or her class. It is also commonly expressed as a percentile. For instance, a student may have a GPA better than that of 750 of his or her classmates in a graduating class of 800. In this case, the class rank would be 50, and the class percentile would be 94.
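The arithmetic can be illustrated with a small, hypothetical helper; the convention that rank 1 is the top student and that the percentile is the share of the class the student outperforms follows the example above.

    def class_rank_and_percentile(class_size, classmates_outperformed):
        """Hypothetical helper: rank 1 is the top student; the percentile is
        the share of the class whose GPA the student exceeds."""
        rank = class_size - classmates_outperformed
        percentile = 100.0 * classmates_outperformed / class_size
        return rank, round(percentile)

    # A GPA better than 750 of 800 classmates gives rank 50 and the 94th percentile.
    print(class_rank_and_percentile(800, 750))  # -> (50, 94)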
Use in high schools
The use of class rank is currently in practice at about 45% of American high schools. Large public schools are more likely to rank their students than small private schools. Because many admissions officers were frustrated that many applications did not contain a rank, some colleges use other information provided by high schools, in combination with a student's GPA, to estimate a student's class rank. Many colleges say that the absence of a class rank forces them to put more weight on standardized test scores.
Use in college admissions
Colleges often use class rank as a factor in college admissions, although because of differences in grading standards between schools, admissions officers have begun to attach less weight to it, both for granting admission and for awarding scholarships. Class rank is more likely to be used at large schools with more formulaic admissions programs.
Percent plans
Some U.S. states guarantee that students who achieve a high enough class rank at their high school will be admitted to a state university, in a practice known as percent plans. Students in California who are in the top four percent of their graduating class, and students in Florida who are in the top twenty percent of their graduating class, are guaranteed admission to some state school, but not necessarily to any particular institution. Valedictorians at Alaskan high schools are given free tuition through the University of Alaska. The top eight percent of students in Texas high schools are guaranteed admission to the state school of their choice, excluding the University of Texas, which allocates only 75% of its incoming freshman class seats to top-8% students.
[1] "Counselor's Connection - Apply to College: Class Rank and College Admissions" (http://www.collegeboard.com/prof/counselors/apply/14.html). The College Board. Retrieved July 5, 2007.
[2] Finder, Alex (March 5, 2006). "Schools Avoid Class Ranking, Vexing Colleges" (http://www.nytimes.com/2006/03/05/education/05rank.html). The New York Times. Retrieved July 5, 2007.
[3] Lang, David M. (Spring 2007). "Class Rank, GPA, and Valedictorians: How High Schools Rank Students" (http://www.highbeam.com/doc/1P3-1298329431.html) (PDF; login required, free trial available). American Secondary Education 35 (2): 36–48. ISSN 0003-1003. Retrieved 2007-07-06.
[4] "The University of Texas at Austin to Automatically Admit Top 10 Percent of High School Graduates for 2011" (http://www.utexas.edu/news/2009/09/16/top8_percent/). Retrieved 2010-04-05.
External links
• DESPERATELY SEEKING DIVERSITY; The 10 Percent Solution (http://query.nytimes.com/gst/fullpage.html?res=9507E5D6123DF937A25757C0A9649C8B63&partner=rssnyt&emc=rss)
Clerk of Works
The clerk of works (or clerk of the works), often abbreviated CoW, is employed by the architect or client on a construction site. The role is primarily to represent the interests of the client by ensuring that the quality of both materials and workmanship is in accordance with the design information, such as the specification and engineering drawings, as well as with recognized quality standards. The role is defined in standard forms of contract such as those published by the Joint Contracts Tribunal. In the Royal Engineers, Clerks of Works are the most highly qualified non-commissioned tradesmen; the qualification can be held in three specialisations: Electrical, Mechanical and Construction.
Historically the CoW was employed by the architect on behalf of a client, or by local authorities to oversee public works. Maître d'oeuvre (master of work) is the term used in many European jurisdictions for the office that carries out this job on major projects; the Channel Tunnel project had such an office. In Italy the term used is direttore dei lavori (manager of the works).
Origins of the title
The job title Clerk of Works is believed to derive from the thirteenth century, when monks and priests (i.e., "clerics" or "clerks") were accepted as being more literate than the builders of the age and took on the responsibility of supervising the works associated with the erection of churches and other religious property. As craftsmen and masons became more educated they in turn took on the role, but the title did not change. By the nineteenth century the role had expanded to cover the majority of building works, and the Clerk of Works was drawn from experienced tradesmen with wide knowledge and understanding of the building process.
The role
The role, to this day, is based on the impartiality of the Clerk of Works in ensuring that value for money for the client, rather than the contractor, is achieved through rigorous and detailed inspection of materials and workmanship throughout the build process. In many cases, the traditional title has been discarded in favour of modern alternatives such as Site Inspector, Architectural Inspector and Quality Inspector, but the requirement for the role remains unchanged since the origins of the title.
The Clerk of Works is a very isolated profession on site. He/she is the person that must ensure quality of both
materials and workmanship and, to this end, must be absolutely impartial and independent in his decisions and
judgements. He/she cannot normally, by virtue of the quality role, be employed by the contractor - only the client,
normally by the architect on behalf of the client. His/her role is not to judge, but simply to report all occurrences that
are relevant to the role.
Notable Clerks of Works
• Geoffrey Chaucer (1343–1400) was an English author, poet, philosopher, bureaucrat, courtier, diplomat and Clerk
of the King's Works.
• John Louth was appointed first Clerk of Works of the Board of Ordnance by Henry V in 1414 along with
Nicholas Merbury, Master of Ordnance (the Royal Artillery, Royal Engineers & Royal Army Ordnance Corps can
all trace their origins to this date).
• Lord Chancellor and Bishop of Winchester, William of Wykeham (1323–1404) was Clerk of the King's Works.
The Institute of Clerks of Works and Construction Inspectorate of Great
Britain Incorporated
The ICWCI - motto: Potestate, Probitate et Vigilantia (Ability, Integrity and Vigilance) - is the professional body
that supports quality construction through inspection. As a membership organisation it provides a support network of
meeting centres, technical advice, publications and events to help keep members up to date with the ever changing
construction industry.
Post nominals for members are FICWCI (Fellow), MICWCI (Member) and LICWCI (Licentiate).
The Institute was founded in 1882 as the Clerk of Works Association, becoming the Incorporated Clerk of Works Association of Great Britain in 1903. In 1947, its name was amended again to the Institute of Clerks of Works of Great Britain Incorporated, a title it retained until 2009, when it was expanded to the Institute of Clerks of Works and Construction Inspectorate of Great Britain Incorporated.
The organisation was originally founded to give those required to operate in isolation on site a central organisation to look after the interests of their chosen profession, whether through association with other professional bodies, through education, or simply through social contact with their peers and contemporaries. Essential to this, as the Institute developed, was a central body that could lobby Parliament in relation to the profession and the quality issues it stands for.
Although the means of construction, the training of individuals and the way in which individuals are employed have changed dramatically over the years, the principles for which the Institute was originally formed remain sacrosanct. Experience in the many facets of the building trade is essential and, in general terms, most practitioners will have "come from the tools", though further third-level education in the built environment is also essential.
'Building on Quality' Awards
The Institute of Clerks of Works and Construction Inspectorate holds the biannual Building on Quality Awards, and nominations are accepted from all involved in quality site inspection, regardless of whether they are members of the Institute. Categories include New Build, Civil Engineering and Refurbishment/Mechanical and Electrical. Judging is based on the Clerk of Works' ability, his or her contribution to the projects involved, record keeping and reports, and commitment to the role of Clerk of Works.
Awards given in each category are Winner, Highly Commended and Commended. The Overall Winner is chosen from all categories and is widely regarded as the highest accolade that can be awarded to a Clerk of Works in recognition of his or her work.
2009 Award Winners:
• Overall Winner - Les Howard MICWCI of Leixlip, County Kildare, Ireland for his involvement with the New
Eircom Headquarters in Dublin.
• New Build – Peter McGuone FICWCI for involvement with Altnagelvin Hospital, Londonderry, Northern Ireland.
• Refurbishment – Peter Airey MICWCI for involvement with Eden Court Theatre, Inverness, Scotland.
• New Build / Refurbishment – Allan Sherwood MICWCI for involvement with The Spa, Bridlington, England.
• Civil Engineering – Mike Readman FICWCI for involvement with the A590 High and Low Newton Bypass,
Cumbria, England.
• Special Judges Award – Carol Heidschuster MICWCI for involvement with Lincoln Cathedral, England.
ICWCI Meeting Centres
Cumbria and North Lancashire, Deeside, Dublin, East Anglia, East Midlands, Gibraltar, Home Counties North,
Hong Kong, Isle of Man, London, Merseyside, North Cheshire, North East, Northern, Northern Ireland, Scotland,
South Wales, Southern, Staffordshire and District, Western Counties.
External links
• The Institute of Clerks of Works and Construction Inspectorate
• JCT Website
[1] http://www.icwgb.org/page_viewer.asp?page=History+of+ICWCI&pid=27
[2] http://www.icwgb.org/
[3] http://www.jctltd.co.uk/stylesheet.asp?file=18062003153316
Clinical trial
Clinical trials are a set of procedures in medical research conducted to allow safety (or more specifically,
information about adverse drug reactions and adverse effects of other treatments) and efficacy data to be collected
for health interventions (e.g., drugs, diagnostics, devices, therapy protocols). These trials can take place only after
satisfactory information has been gathered on the quality of the non-clinical safety, and Health Authority/Ethics
Committee approval is granted in the country where the trial is taking place.
Depending on the type of product and the stage of its development, investigators enroll healthy volunteers and/or
patients into small pilot studies initially, followed by larger scale studies in patients that often compare the new
product with the currently prescribed treatment. As positive safety and efficacy data are gathered, the number of
patients is typically increased. Clinical trials can vary in size from a single center in one country to multicenter trials
in multiple countries.
Due to the sizable cost a full series of clinical trials may incur, the burden of paying for all the necessary people and services is usually borne by the sponsor, which may be a governmental organization or a pharmaceutical or biotechnology company. Since the diversity of roles may exceed the resources of the sponsor, a clinical trial is often managed by an outsourced partner such as a contract research organization or a clinical trials unit in the academic setting.
Clinical trials often involve patients with specific health conditions who then benefit from receiving otherwise
unavailable treatments. In early phases, participants are healthy volunteers who receive financial incentives for their
inconvenience. During dosing periods, study subjects typically remain on site at the unit for durations of anything
from 1 to 30 nights, occasionally longer, although this is not always required.
In planning a clinical trial, the sponsor or investigator first identifies the medication or device to be tested. Usually,
one or more pilot experiments are conducted to gain insights for design of the clinical trial to follow. In medical
jargon, effectiveness is how well a treatment works in practice and efficacy is how well it works in a clinical trial. In
the U.S., the elderly comprise only 14% of the population but they consume over one-third of drugs.
Despite this,
they are often excluded from trials because their more frequent health issues and drug use produce unreliable data.
Women, children, and people with unrelated medical conditions are also frequently excluded.
In coordination with a panel of expert investigators (usually physicians well known for their publications and clinical
experience), the sponsor decides what to compare the new agent with (one or more existing treatments or a placebo),
and what kind of patients might benefit from the medication or device. If the sponsor cannot obtain enough patients with the specific disease or condition at one location, investigators at other locations who can recruit the same kind of patients are brought into the study.
During the clinical trial, the investigators: recruit patients with the predetermined characteristics, administer the
treatment(s), and collect data on the patients' health for a defined time period. These patients are volunteers and they
are not paid for participating in clinical trials. These data include measurements like vital signs, concentration of the
study drug in the blood, and whether the patient's health improves or not. The researchers send the data to the trial
sponsor who then analyzes the pooled data using statistical tests.
Some examples of what a clinical trial may be designed to do:
• Assess the safety and effectiveness of a new medication or device on a specific kind of patient (e.g., patients who
have been diagnosed with Alzheimer's disease)
• Assess the safety and effectiveness of a different dose of a medication than is commonly used (e.g., 10 mg dose
instead of 5 mg dose)
• Assess the safety and effectiveness of an already marketed medication or device for a new indication, i.e. a
disease for which the drug is not specifically approved
• Assess whether the new medication or device is more effective for the patient's condition than the already used,
standard medication or device ("the gold standard" or "standard therapy")
• Compare the effectiveness in patients with a specific disease of two or more already approved or common
interventions for that disease (e.g., Device A vs. Device B, Therapy A vs. Therapy B)
Note that while most clinical trials compare two medications or devices, some trials compare three or four
medications, doses of medications, or devices against each other.
Except for very small trials limited to a single location, the clinical trial design and objectives are written into a
document called a clinical trial protocol. The protocol is the 'operating manual' for the clinical trial and ensures that
researchers in different locations all perform the trial in the same way on patients with the same characteristics. (This
uniformity is designed to allow the data to be pooled.) A protocol is always used in multicenter trials.
Because the clinical trial is designed to test hypotheses and rigorously monitor and assess what happens, clinical
trials can be seen as the application of the scientific method, and specifically the experimental step, to understanding
human or animal biology.
The most commonly performed clinical trials evaluate new drugs, medical devices (like a new catheter), biologics, psychological therapies, or other interventions. Clinical trials may be required before the national regulatory authority approves marketing of the drug or device, or a new dose of the drug, for use on patients.
The history of clinical trials before 1750 is brief.

The concepts behind clinical trials, however, are ancient. The Book of Daniel (chapter 1, verses 12 through 15), for instance, describes a planned experiment with both baseline and follow-up observations of two groups who either partook of, or did not partake of, "the King's meat" over a trial period of ten days. The Persian physician and philosopher Avicenna gave such inquiries a more formal structure.
In The Canon of Medicine in 1025 AD, he laid down rules for the
experimental use and testing of drugs and wrote a precise guide for practical experimentation in the process of
discovering and proving the effectiveness of medical drugs and substances.
He laid out the following rules and
principles for testing the effectiveness of new drugs and medications:

1. The drug must be free from any extraneous accidental quality.
2. It must be used on a simple, not a composite, disease.
3. The drug must be tested with two contrary types of diseases, because sometimes a drug cures one disease by its
essential qualities and another by its accidental ones.
4. The quality of the drug must correspond to the strength of the disease. For example, there are some drugs whose
heat is less than the coldness of certain diseases, so that they would have no effect on them.
5. The time of action must be observed, so that essence and accident are not confused.
6. The effect of the drug must be seen to occur constantly or in many cases, for if this did not happen, it was an
accidental effect.
7. The experimentation must be done with the human body, for testing a drug on a lion or a horse might not prove
anything about its effect on man.
One of the most famous clinical trials was James Lind's demonstration in 1747 that citrus fruits cure scurvy. He compared the effects of various acidic substances, ranging from vinegar to cider, on groups of afflicted sailors, and found that the group given oranges and lemons had largely recovered from scurvy after six days.
Frederick Akbar Mahomed (d. 1884), who worked at Guy's Hospital in London,
made substantial contributions to
the process of clinical trials during his detailed clinical studies, where "he separated chronic nephritis with secondary
hypertension from what we now term essential hypertension." He also founded "the Collective Investigation Record
for the British Medical Association; this organization collected data from physicians practicing outside the hospital
setting and was the precursor of modern collaborative clinical trials."
One way of classifying clinical trials is by the way the researchers behave.
• In an observational study, the investigators observe the subjects and measure their outcomes. The researchers do
not actively manage the study. An example is the Nurses' Health Study.
• In an interventional study, the investigators give the research subjects a particular medicine or other intervention.
Usually, they compare the treated subjects to subjects who receive no treatment or standard treatment. Then the
researchers measure how the subjects' health changes.
Another way of classifying trials is by their purpose. The U.S. National Institutes of Health (NIH) organizes trials into the following types:
• Prevention trials: look for better ways to prevent disease in people who have never had the disease, or to prevent a disease from returning. These approaches may include medicines, vitamins, vaccines, minerals, or lifestyle changes.
• Screening trials: test the best way to detect certain diseases or health conditions.
• Diagnostic trials: conducted to find better tests or procedures for diagnosing a particular disease or condition.
• Treatment trials: test experimental treatments, new combinations of drugs, or new approaches to surgery or
radiation therapy.
• Quality of life trials: explore ways to improve comfort and the quality of life for individuals with a chronic illness
(a.k.a. Supportive Care trials).
• Compassionate use trials or expanded access: provide partially tested, unapproved therapeutics to a small number of patients who have no other realistic options. Usually, this involves a disease for which no effective therapy exists, or a patient who has already attempted and failed all other standard treatments and whose health is so poor that he or she does not qualify for participation in randomized clinical trials. Usually, case-by-case approval must be granted by both the FDA and the pharmaceutical company for such exceptions.
A fundamental distinction in evidence-based medicine is between observational studies and randomized controlled
trials. Types of observational studies in epidemiology such as the cohort study and the case-control study provide
less compelling evidence than the randomized controlled trial. In observational studies, the investigators only observe associations (correlations) between the treatments experienced by participants and their health status or diseases.
A randomized controlled trial is the study design that can provide the most compelling evidence that the study
treatment causes the expected effect on human health.
Currently, some Phase II and most Phase III drug trials are designed as randomized, double-blind, and placebo-controlled:
• Randomized: Each study subject is randomly assigned to receive either the study treatment or a placebo.
• Blind: The subjects involved in the study do not know which study treatment they receive. If the study is
double-blind, the researchers also do not know which treatment is being given to any given subject. This 'blinding'
is to prevent biases, since if a physician knew which patient was getting the study treatment and which patient
was getting the placebo, he/she might be tempted to give the (presumably helpful) study drug to a patient who
could more easily benefit from it. In addition, a physician might give extra care to only the patients who receive
the placebos to compensate for their ineffectiveness. A form of double-blind study called a "double-dummy"
design allows additional insurance against bias or placebo effect. In this kind of study, all patients are given both
placebo and active doses in alternating periods of time during the study.
• Placebo-controlled: The use of a placebo (fake treatment) allows the researchers to isolate the effect of the study treatment.
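A minimal sketch of how randomized, blinded assignments might be generated is shown below, assuming a simple 1:1 allocation and coded kit labels; the names and labelling scheme are illustrative and do not describe any particular trial system.

    import random

    def blinded_assignments(subject_ids, seed=2024):
        """Randomly assign subjects to 'treatment' or 'placebo' in roughly
        equal numbers (1:1). Sites see only neutral kit codes; the kit-to-arm
        key is held back (e.g. by the sponsor's statistician), keeping both
        subjects and researchers blinded."""
        rng = random.Random(seed)
        arms = (["treatment", "placebo"] * len(subject_ids))[:len(subject_ids)]
        rng.shuffle(arms)
        site_view = {}       # what the site sees: subject -> kit code
        unblinding_key = {}  # held separately: kit code -> actual arm
        for i, (sid, arm) in enumerate(zip(subject_ids, arms)):
            kit = f"KIT-{1000 + i}"
            site_view[sid] = kit
            unblinding_key[kit] = arm
            # Kit contents (drug or matching placebo) are packaged identically.
        return site_view, unblinding_key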
Although the term "clinical trials" is most commonly associated with the large, randomized studies typical of Phase
III, many clinical trials are small. They may be "sponsored" by single physicians or a small group of physicians, and
are designed to test simple questions. In the field of rare diseases sometimes the number of patients might be the
limiting factor for a clinical trial. Other clinical trials require large numbers of participants (who may be followed
over long periods of time), and the trial sponsor is a private company, a government health agency, or an academic
research body such as a university.
Active comparator studies
Of note, during the last ten years or so it has become a common practice to conduct "active comparator" studies (also
known as "active control" trials). In other words, when a treatment exists that is clearly better than doing nothing for
the subject (i.e. giving them the placebo), the alternate treatment would be a standard-of-care therapy. The study
would compare the 'test' treatment to standard-of-care therapy.
A growing trend in the pharmacology field involves the use of third-party contractors to obtain the required
comparator compounds. Such third parties provide expertise in the logistics of obtaining, storing, and shipping the
comparators. As an advantage to the manufacturer of the comparator compounds, a well-established comparator
sourcing agency can alleviate the problem of parallel importing (importing a patented compound for sale in a country
outside the patenting agency's sphere of influence).
Clinical trial protocol
A clinical trial protocol is a document used to gain confirmation of the trial design by a panel of experts and
adherence by all study investigators, even if conducted in various countries.
The protocol describes the scientific rationale, objective(s), design, methodology, statistical considerations, and
organization of the planned trial. Details of the trial are also provided in other documents referenced in the protocol
such as an Investigator's Brochure.
The protocol contains a precise study plan for executing the clinical trial, not only to assure safety and health of the
trial subjects, but also to provide an exact template for trial conduct by investigators at multiple locations (in a
"multicenter" trial) to perform the study in exactly the same way. This harmonization allows data to be combined
collectively as though all investigators (referred to as "sites") were working closely together. The protocol also gives
the study administrators (often a contract research organization or CRO) as well as the site team of physicians,
nurses and clinic administrators a common reference document for site responsibilities during the trial.
The format and content of clinical trial protocols sponsored by pharmaceutical, biotechnology or medical device
companies in the United States, European Union, or Japan has been standardized to follow Good Clinical Practice
issued by the International Conference on Harmonization of Technical Requirements for Registration of
Pharmaceuticals for Human Use (ICH).
Regulatory authorities in Canada and Australia also follow ICH
guidelines. Some journals, e.g. Trials, encourage trialists to publish their protocols in the journal.
Design features
Informed consent
An essential component of initiating a clinical trial is to recruit study subjects following procedures using a signed
document called "informed consent".
Informed consent is a legally-defined process of a person being told about key facts involved in a clinical trial before
deciding whether or not to participate. To fully describe participation to a candidate subject, the doctors and nurses
involved in the trial explain the details of the study using terms the person will understand. Foreign-language translation is provided if the participant's native language is not the same as that of the study protocol.
The research team provides an informed consent document that includes trial details, such as its purpose, duration,
required procedures, risks, potential benefits and key contacts. The participant then decides whether or not to sign
the document in agreement. Informed consent is not an immutable contract, as the participant can withdraw at any
time without penalty.
Statistical power
The number of patients enrolled in a study has a large bearing on the ability of the study to reliably detect the size of
the effect of the study intervention. This is described as the "power" of the trial. The larger the sample size or
number of participants in the trial, the greater the statistical power.
However, in designing a clinical trial, this consideration must be balanced with the fact that more patients make for a
more expensive trial. The power of a trial is not a single, unique value; it estimates the ability of a trial to detect a
difference of a particular size (or larger) between the treated (tested drug/device) and control (placebo or standard
treatment) groups. For example, a trial of a lipid-lowering drug versus placebo with 100 patients in each group might
have a power of .90 to detect a difference between patients receiving study drug and patients receiving placebo of
10 mg/dL or more, but only have a power of .70 to detect a difference of 5 mg/dL.
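The relationship between sample size, effect size and power can be sketched with a normal-approximation calculation using SciPy's normal distribution. The standard deviation used below is an assumed value chosen for illustration, so the output only roughly echoes the 0.90 versus 0.70 contrast in the text; this is a sketch, not a substitute for a formal power analysis.

    from math import sqrt
    from scipy.stats import norm

    def two_sample_power(delta, sd, n_per_group, alpha=0.05):
        """Approximate power of a two-sided, two-sample comparison of means
        (normal approximation): the probability of detecting a true difference
        of `delta`, given per-group size `n_per_group` and common SD `sd`."""
        se = sd * sqrt(2.0 / n_per_group)      # standard error of the difference
        z_crit = norm.ppf(1 - alpha / 2)       # critical value for a two-sided test
        return norm.cdf(delta / se - z_crit)

    # With an assumed SD of 22 mg/dL and 100 patients per arm, a 10 mg/dL
    # difference is detected far more reliably than a 5 mg/dL difference.
    print(two_sample_power(10, 22, 100))  # roughly 0.9
    print(two_sample_power(5, 22, 100))   # much lower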
Placebo groups
Merely giving a treatment can have nonspecific effects, and these are controlled for by the inclusion of a placebo
group. Subjects in the treatment and placebo groups are assigned randomly and blinded as to which group they
belong. Since researchers can behave differently towards subjects given treatments or placebos, trials are also double-blinded so that the researchers do not know to which group a subject is assigned.
Assigning a person to a placebo group can pose an ethical problem if it violates his or her right to receive the best
available treatment. The Declaration of Helsinki provides guidelines on this issue.
Clinical trials involving new drugs are commonly classified into four phases. Each phase of the drug approval
process is treated as a separate clinical trial. The drug-development process will normally proceed through all four
phases over many years. If the drug successfully passes through Phases I, II, and III, it will usually be approved by
the national regulatory authority for use in the general population. Phase IV are 'post-approval' studies.
Before pharmaceutical companies start clinical trials on a drug, they conduct extensive pre-clinical studies.
Pre-clinical studies
Pre-clinical studies involve in vitro (test tube or cell culture) and in vivo (animal) experiments using wide-ranging doses of the study drug to obtain preliminary efficacy, toxicity and pharmacokinetic information. Such tests help pharmaceutical companies decide whether a drug candidate has scientific merit for further development as an investigational new drug.
Phase 0
Phase 0 is a recent designation for exploratory, first-in-human trials conducted in accordance with the United States
Food and Drug Administration's (FDA) 2006 Guidance on Exploratory Investigational New Drug (IND) Studies.
Phase 0 trials are also known as human microdosing studies and are designed to speed up the development of
promising drugs or imaging agents by establishing very early on whether the drug or agent behaves in human
subjects as was expected from preclinical studies. Distinctive features of Phase 0 trials include the administration of
single subtherapeutic doses of the study drug to a small number of subjects (10 to 15) to gather preliminary data on
the agent's pharmacodynamics (what the drug does to the body) and pharmacokinetics (what the body does to the drug).
A Phase 0 study gives no data on safety or efficacy, being by definition a dose too low to cause any therapeutic
effect. Drug development companies carry out Phase 0 studies to rank drug candidates in order to decide which has
the best pharmacokinetic parameters in humans to take forward into further development. They enable go/no-go
decisions to be based on relevant human models instead of relying on sometimes inconsistent animal data.
Questions have been raised by experts about whether Phase 0 trials are useful, ethically acceptable, feasible, speed
up the drug development process or save money, and whether there is room for improvement.
Phase I
Phase I trials are the first stage of testing in human subjects. Normally, a small (20-100) group of healthy volunteers
will be selected. This phase includes trials designed to assess the safety (pharmacovigilance), tolerability,
pharmacokinetics, and pharmacodynamics of a drug. These trials are often conducted in an inpatient clinic, where
the subject can be observed by full-time staff. The subject who receives the drug is usually observed until several
half-lives of the drug have passed. Phase I trials also normally include dose-ranging, also called dose escalation,
studies so that the appropriate dose for therapeutic use can be found. The tested range of doses will usually be a
fraction of the dose that causes harm in animal testing. Phase I trials most often include healthy volunteers. However,
there are some circumstances when real patients are used, such as patients who have terminal cancer or HIV and lack
other treatment options. "The reason for conducting the trial is to discover the point at which a compound is too
poisonous to administer."
Volunteers are paid an inconvenience fee for their time spent in the volunteer centre. Pay ranges from a small amount of money for a short period of residence to a larger amount of up to approximately $6,000, depending on the length of participation.
There are different kinds of Phase I trial:
Single Ascending Dose studies are those in which small groups of subjects are given a single dose of the drug
while they are observed and tested for a period of time. If they do not exhibit any adverse side effects, and the
pharmacokinetic data is roughly in line with predicted safe values, the dose is escalated, and a new group of
subjects is then given a higher dose. This is continued until pre-calculated pharmacokinetic safety levels are
reached, or intolerable side effects start showing up, at which point the drug is said to have reached the maximum tolerated dose (MTD). A simple sketch of this escalation loop appears after these descriptions.
Multiple Ascending Dose studies are conducted to better understand the pharmacokinetics &
pharmacodynamics of multiple doses of the drug. In these studies, a group of patients receives multiple low
doses of the drug, while samples (of blood, and other fluids) are collected at various time points and analyzed
to acquire information on how the drug is processed within the body. The dose is subsequently escalated for
further groups, up to a predetermined level.
Food effect
A short trial designed to investigate any differences in absorption of the drug by the body, caused by eating
before the drug is given. These studies are usually run as a crossover study, with volunteers being given two
identical doses of the drug while fasted, and after being fed.
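The escalation logic of a Single Ascending Dose study, described above, can be sketched as a simple loop; the dose schedule and the cohort_is_safe callable are hypothetical placeholders for the study's pre-calculated doses and its safety review.

    def single_ascending_dose(dose_levels, cohort_is_safe):
        """Walk up a predefined dose schedule one cohort at a time.

        dose_levels: increasing doses to test (assumed to be pre-calculated).
        cohort_is_safe: callable that doses a fresh cohort at the given level
            and returns True if no intolerable side effects were seen (assumed).
        Returns the highest dose cleared, i.e. an estimate of the MTD region.
        """
        highest_cleared = None
        for dose in dose_levels:
            if cohort_is_safe(dose):
                highest_cleared = dose    # this cohort tolerated the dose
            else:
                break                     # intolerable effects: stop escalating
        return highest_cleared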
Phase II
Once the initial safety of the study drug has been confirmed in Phase I trials, Phase II trials are performed on larger
groups (20-300) and are designed to assess how well the drug works, as well as to continue Phase I safety
assessments in a larger group of volunteers and patients. When the development process for a new drug fails, this
usually occurs during Phase II trials when the drug is discovered not to work as planned, or to have toxic effects.
Phase II studies are sometimes divided into Phase IIA and Phase IIB.
• Phase IIA is specifically designed to assess dosing requirements (how much drug should be given).
• Phase IIB is specifically designed to study efficacy (how well the drug works at the prescribed dose(s)).
Some trials combine Phase I and Phase II, and test both efficacy and toxicity.
Trial design
Some Phase II trials are designed as case series, demonstrating a drug's safety and activity in a selected group
of patients. Other Phase II trials are designed as randomized clinical trials, where some patients receive the
drug/device and others receive placebo/standard treatment. Randomized Phase II trials have far fewer patients
than randomized Phase III trials.
Phase III
Phase III studies are randomized controlled multicenter trials on large patient groups (300–3,000 or more depending
upon the disease/medical condition studied) and are aimed at being the definitive assessment of how effective the
drug is, in comparison with current 'gold standard' treatment. Because of their size and comparatively long duration,
Phase III trials are the most expensive, time-consuming and difficult trials to design and run, especially in therapies
for chronic medical conditions.
It is common practice that certain Phase III trials will continue while the regulatory submission is pending at the
appropriate regulatory agency. This allows patients to continue to receive possibly lifesaving drugs until the drug can
be obtained by purchase. Other reasons for performing trials at this stage include attempts by the sponsor at "label
expansion" (to show the drug works for additional types of patients/diseases beyond the original use for which the
drug was approved for marketing), to obtain additional safety data, or to support marketing claims for the drug.
Studies in this phase are by some companies categorised as "Phase IIIB studies."

While not required in all cases, it is typically expected that there be at least two successful Phase III trials,
demonstrating a drug's safety and efficacy, in order to obtain approval from the appropriate regulatory agencies such
as FDA (USA), or the EMA (European Union), for example.
Once a drug has proved satisfactory after Phase III trials, the trial results are usually combined into a large document
containing a comprehensive description of the methods and results of human and animal studies, manufacturing
procedures, formulation details, and shelf life. This collection of information makes up the "regulatory submission"
that is provided for review to the appropriate regulatory authorities
in different countries. They will review the
submission, and, it is hoped, give the sponsor approval to market the drug.
Most drugs undergoing Phase III clinical trials can be marketed under FDA norms with proper recommendations and guidelines, but if any adverse effects are reported, the drugs must be recalled from the market immediately. While most pharmaceutical companies refrain from this practice, it is not unusual to see drugs that are still undergoing Phase III clinical trials on the market.
Phase IV
Phase IV trial is also known as Post-Marketing Surveillance Trial. Phase IV trials involve the safety surveillance
(pharmacovigilance) and ongoing technical support of a drug after it receives permission to be sold. Phase IV studies
may be required by regulatory authorities or may be undertaken by the sponsoring company for competitive (finding
a new market for the drug) or other reasons (for example, the drug may not have been tested for interactions with
other drugs, or on certain population groups such as pregnant women, who are unlikely to subject themselves to
trials). The safety surveillance is designed to detect any rare or long-term adverse effects over a much larger patient
population and longer time period than was possible during the Phase I-III clinical trials. Harmful effects discovered
by Phase IV trials may result in a drug being no longer sold, or restricted to certain uses: recent examples involve
cerivastatin (brand names Baycol and Lipobay), troglitazone (Rezulin) and rofecoxib (Vioxx).
Phase V
Phase V is a growing term used in the literature of translational research to refer to comparative effectiveness
research and community-based research; it is used to signify the integration of a new clinical treatment into
widespread public health practice. [25]
Clinical trials are only a small part of the research that goes into developing a new treatment. Potential drugs, for
example, first have to be discovered, purified, characterized, and tested in labs (in cell and animal studies) before
ever undergoing clinical trials. In all, about 1,000 potential drugs are tested before just one reaches the point of being
tested in a clinical trial. For example, a new cancer drug has, on average, 6 years of research behind it before it even
makes it to clinical trials. But the major holdup in making new cancer drugs available is the time it takes to complete
clinical trials themselves. On average, about 8 years pass from the time a cancer drug enters clinical trials until it
receives approval from regulatory agencies for sale to the public. Drugs for other diseases have similar timelines.
Some reasons a clinical trial might last several years:
• For chronic conditions like cancer, it takes months, if not years, to see if a cancer treatment has an effect on a patient.
• For drugs that are not expected to have a strong effect (meaning a large number of patients must be recruited to
observe any effect), recruiting enough patients to test the drug's effectiveness (i.e., getting statistical power) can
take several years.
• Only certain people who have the target disease condition are eligible to take part in each clinical trial.
Researchers who treat these particular patients must participate in the trial. Then they must identify the desirable
patients and obtain consent from them or their families to take part in the trial.
The biggest barrier to completing studies is the shortage of people who take part. All drug and many device trials
target a subset of the population, meaning not everyone can participate. Some drug trials require patients to have
unusual combinations of disease characteristics. It is a challenge to find the appropriate patients and obtain their
consent, especially when they may receive no direct benefit (because they are not paid, the study drug is not yet
proven to work, or the patient may receive a placebo). In the case of cancer patients, fewer than 5% of adults with
cancer will participate in drug trials. According to the Pharmaceutical Research and Manufacturers of America
(PhRMA), about 400 cancer medicines were being tested in clinical trials in 2005. Not all of these will prove to be
useful, but those that are may be delayed in getting approved because the number of participants is so low.
For clinical trials involving a seasonal indication (such as airborne allergies, seasonal affective disorder, influenza, and others), the study can only be done during a limited part of the year (such as spring for pollen allergies), when the drug can be tested. This can add to the length of the study, but proper planning and the use of trial sites in the southern as well as the northern hemisphere allow for year-round trials and can reduce the length of the studies.

Clinical trials that do not involve a new drug usually have a much shorter duration. (Exceptions are epidemiological
studies like the Nurses' Health Study.)
Clinical trials designed by a local investigator and (in the U.S.) federally funded clinical trials are almost always
administered by the researcher who designed the study and applied for the grant. Small-scale device studies may be
administered by the sponsoring company. Phase III and Phase IV clinical trials of new drugs are usually
administered by a contract research organization (CRO) hired by the sponsoring company. (The sponsor provides the
drug and medical oversight.) A CRO is a company that is contracted to perform all the administrative work on a
clinical trial. It recruits participating researchers, trains them, provides them with supplies, coordinates study
administration and data collection, sets up meetings, monitors the sites for compliance with the clinical protocol, and
ensures that the sponsor receives 'clean' data from every site. Recently, site management organizations have also
been hired to coordinate with the CRO to ensure rapid IRB/IEC approval and faster site initiation and patient recruitment.
At a participating site, one or more research assistants (often nurses) do most of the work in conducting the clinical
trial. The research assistant's job can include some or all of the following: providing the local Institutional Review
Board (IRB) with the documentation necessary to obtain its permission to conduct the study, assisting with study
start-up, identifying eligible patients, obtaining consent from them or their families, administering study treatment(s),
collecting and statistically analyzing data, maintaining and updating data files during followup, and communicating
with the IRB, as well as the sponsor and CRO.
Ethical conduct
Clinical trials are closely supervised by appropriate regulatory authorities. All studies that involve a medical or
therapeutic intervention on patients must be approved by a supervising ethics committee before permission is granted
to run the trial. The local ethics committee has discretion on how it will supervise noninterventional studies
(observational studies or those using already collected data). In the U.S., this body is called the Institutional Review
Board (IRB). Most IRBs are located at the local investigator's hospital or institution, but some sponsors allow the use
of a central (independent/for profit) IRB for investigators who work at smaller institutions.
To be ethical, researchers must obtain the full and informed consent of participating human subjects. (One of the
IRB's main functions is ensuring that potential patients are adequately informed about the clinical trial.) If the patient
is unable to consent for him/herself, researchers can seek consent from the patient's legally authorized representative.
In California, the state has prioritized the individuals who can serve as the legally authorized representative.
In some U.S. locations, the local IRB must certify researchers and their staff before they can conduct clinical trials.
They must understand the federal patient privacy (HIPAA) law and good clinical practice. International Conference
of Harmonisation Guidelines for Good Clinical Practice (ICH GCP) is a set of standards used internationally for the
conduct of clinical trials. The guidelines aim to ensure that the "rights, safety and well-being of trial subjects are protected".
The notion of informed consent of participating human subjects exists in many countries all over the world, but its
precise definition may still vary.
Informed consent is clearly a necessary condition for ethical conduct but does not ensure ethical conduct. The final
objective is to serve the community of patients or future patients in a best-possible and most responsible way.
However, it may be hard to turn this objective into a well-defined quantified objective function. In some cases this
can be done, however, as for instance for questions of when to stop sequential treatments (see Odds algorithm), and
then quantified methods may play an important role.
Additional ethical concerns are present when conducting clinical trials on children (pediatrics).
Responsibility for the safety of the subjects in a clinical trial is shared between the sponsor, the local site
investigators (if different from the sponsor), the various IRBs that supervise the study, and (in some cases, if the
study involves a marketable drug or device) the regulatory agency for the country where the drug or device will be sold.
For safety reasons, many clinical trials of drugs are designed to exclude women of childbearing age, pregnant
women, and/or women who become pregnant during the study. In some cases the male partners of these women are
also excluded or required to take birth control measures.
• Throughout the clinical trial, the sponsor is responsible for accurately informing the local site investigators of the
true historical safety record of the drug, device or other medical treatments to be tested, and of any potential
interactions of the study treatment(s) with already approved medical treatments. This allows the local
investigators to make an informed judgment on whether to participate in the study or not.
• The sponsor is responsible for monitoring the results of the study as they come in from the various sites, as the
trial proceeds. In larger clinical trials, a sponsor will use the services of a Data Monitoring Committee (DMC,
known in the U.S. as a Data Safety Monitoring Board). This is an independent group of clinicians and
statisticians. The DMC meets periodically to review the unblinded data that the sponsor has received so far. The
DMC has the power to recommend termination of the study based on their review, for example if the study
treatment is causing more deaths than the standard treatment, or seems to be causing unexpected and study-related
serious adverse events.
• The sponsor is responsible for collecting adverse event reports from all site investigators in the study, and for
informing all the investigators of the sponsor's judgment as to whether these adverse events were related or not
related to the study treatment. This is an area where sponsors can slant their judgment to favor the study treatment.
• The sponsor and the local site investigators are jointly responsible for writing a site-specific informed consent that
accurately informs the potential subjects of the true risks and potential benefits of participating in the study, while
at the same time presenting the material as briefly as possible and in ordinary language. FDA regulations and ICH
guidelines both require that “the information that is given to the subject or the representative shall be in language
understandable to the subject or the representative." If the participant's native language is not English, the sponsor
must translate the informed consent into the language of the participant.
Local site investigators
• A physician's first duty is to his/her patients, and if a physician investigator believes that the study treatment may
be harming subjects in the study, the investigator can stop participating at any time. On the other hand,
investigators often have a financial interest in recruiting subjects, and can act unethically in order to obtain and
maintain their participation.
• The local investigators are responsible for conducting the study according to the study protocol, and supervising
the study staff throughout the duration of the study.
• The local investigator or his/her study staff are responsible for ensuring that potential subjects in the study
understand the risks and potential benefits of participating in the study; in other words, that they (or their legally
authorized representatives) give truly informed consent.
• The local investigators are responsible for reviewing all adverse event reports sent by the sponsor. (These adverse
event reports contain the opinion of both the investigator at the site where the adverse event occurred, and the
sponsor, regarding the relationship of the adverse event to the study treatments). The local investigators are
responsible for making an independent judgment of these reports, and promptly informing the local IRB of all
serious and study-treatment-related adverse events.
• When a local investigator is the sponsor, there may not be formal adverse event reports, but study staff at all
locations are responsible for informing the coordinating investigator of anything unexpected.
• The local investigator is responsible for being truthful to the local IRB in all communications relating to the study.
Approval by an IRB, or ethics board, is necessary before all but the most informal medical research can begin.
• In commercial clinical trials, the study protocol is not approved by an IRB before the sponsor recruits sites to
conduct the trial. However, the study protocol and procedures have been tailored to fit generic IRB submission
requirements. In this case, and where there is no independent sponsor, each local site investigator submits the
study protocol, the consent(s), the data collection forms, and supporting documentation to the local IRB.
Universities and most hospitals have in-house IRBs. Other researchers (such as in walk-in clinics) use
independent IRBs.
• The IRB scrutinizes the study for both medical safety and protection of the patients involved in the study, before
it allows the researcher to begin the study. It may require changes in study procedures or in the explanations given
to the patient. A required yearly "continuing review" report from the investigator updates the IRB on the progress
of the study and any new safety information related to the study.
Regulatory agencies
• If a clinical trial concerns a new regulated drug or medical device (or an existing drug for a new purpose), the
appropriate regulatory agency for each country where the sponsor wishes to sell the drug or device is supposed to
review all study data before allowing the drug/device to proceed to the next phase, or to be marketed. However, if
the sponsor withholds negative data, or misrepresents data it has acquired from clinical trials, the regulatory
agency may make the wrong decision.
• In the U.S., the FDA can audit the files of local site investigators after they have finished participating in a study,
to see if they were correctly following study procedures. This audit may be random, or for cause (because the
investigator is suspected of fraudulent data). Avoiding an audit is an incentive for investigators to follow study procedures.
Different countries have different regulatory requirements and enforcement abilities. "An estimated 40 percent of all clinical trials now take place in Asia, Eastern Europe, and Central and South America. 'There is no compulsory registration system for clinical trials in these countries and many do not follow European directives in their operations', says Dr. Jacob Sijtsma of the Netherlands-based WEMOS, an advocacy health organisation tracking clinical trials in developing countries."
Beginning in the 1980s, harmonization of clinical trial protocols was shown as feasible across countries of the
European Union. At the same time, coordination between Europe, Japan and the United States led to a joint
regulatory-industry initiative on international harmonization named after 1990 as the International Conference on
Harmonisation of Technical Requirements for Registration of Pharmaceuticals for Human Use (ICH). Currently,
most clinical trial programs follow ICH guidelines, aimed at "ensuring that good quality, safe and effective
medicines are developed and registered in the most efficient and cost-effective manner. These activities are pursued
in the interest of the consumer and public health, to prevent unnecessary duplication of clinical trials in humans and
to minimize the use of animal testing without compromising the regulatory obligations of safety and effectiveness".
The cost of a study depends on many factors, especially the number of sites that are conducting the study, the
number of patients required, and whether the study treatment is already approved for medical use. Clinical trials
follow a standardized process.
The costs to a pharmaceutical company of administering a Phase III or IV clinical trial may include, among others:
• manufacturing the drug(s)/device(s) tested
• staff salaries for the designers and administrators of the trial
• payments to the contract research organization, the site management organization (if used) and any outside consultants
• payments to local researchers (and their staffs) for their time and effort in recruiting patients and collecting data
for the sponsor
• study materials and shipping
• communication with the local researchers, including onsite monitoring by the CRO before and (in some cases)
multiple times during the study
• one or more investigator training meetings
• costs incurred by the local researchers such as pharmacy fees, IRB fees and postage.
• any payments to patients enrolled in the trial (all payments are strictly overseen by the IRBs to ensure that
patients do not feel coerced to take part in the trial by overly attractive payments)
These costs are incurred over several years.
In the U.S. there is a 50% tax credit for sponsors of certain clinical trials.
National health agencies such as the U.S. National Institutes of Health offer grants to investigators who design
clinical trials that attempt to answer research questions that interest the agency. In these cases, the investigator who
writes the grant and administers the study acts as the sponsor, and coordinates data collection from any other sites.
These other sites may or may not be paid for participating in the study, depending on the amount of the grant and the
amount of effort expected from them.
Clinical trials are traditionally expensive and difficult to undertake. Using internet resources can, in some cases,
reduce the economic burden.
Many clinical trials do not involve any money. However, when the sponsor is a private company or a national health
agency, investigators are almost always paid to participate. These amounts can be small, just covering a partial salary
for research assistants and the cost of any supplies (usually the case with national health agency studies), or be
substantial and include 'overhead' that allows the investigator to pay the research staff during times in between
clinical trials.
In Phase I drug trials, participants are paid because they give up their time (sometimes away from their homes) and
are exposed to unknown risks, without the expectation of any benefit. In most other trials, however, patients are not
paid, in order to ensure that their motivation for participating is the hope of getting better or contributing to medical
knowledge, without their judgment being skewed by financial considerations. However, they are often given small
payments for study-related expenses like travel or as compensation for their time in providing follow-up information
about their health after they are discharged from medical care.
Participating in a clinical trial
[Figure: Newspaper advertisements seeking patients and healthy volunteers to participate in clinical trials.]
Phase 0 and Phase I drug trials seek healthy volunteers.
Most other clinical trials seek patients who have a
specific disease or medical condition.
Locating trials
Depending on the kind of participants required,
sponsors of clinical trials use various recruitment
strategies, including patient databases, newspaper and
radio advertisements, flyers, posters in places the
patients might go (such as doctor's offices), and
personal recruitment of patients by investigators.
Volunteers with specific conditions or diseases have
additional online resources to help them locate clinical
trials. For example, people with Parkinson's disease can
use PDtrials to find up-to-date information on Parkinson's disease trials currently enrolling participants in the U.S.
and Canada, and search for specific Parkinson's clinical trials using criteria such as location, trial type, and symptom.
Other disease-specific services exist for volunteers to find trials related to their condition.
Volunteers may search directly on ClinicalTrials.gov, a registry run by the U.S. National Institutes of Health and National Library of Medicine, or use services such as CenterWatch Patient Resources to locate trials.
However, many clinical trials will not accept participants who contact them directly to volunteer as it is believed this
may bias the characteristics of the population being studied. Such trials typically recruit via networks of medical
professionals who ask their individual patients to consider enrollment.
Steps for volunteers
Before participating in a clinical trial, interested volunteers should speak with their doctors, family members, and
others who have participated in trials in the past. After locating a trial, volunteers will often have the opportunity to
speak or e-mail the clinical trial coordinator for more information and to answer any questions. After receiving
consent from their doctors, volunteers then arrange an appointment for a screening visit with the trial coordinator.
All volunteers being considered for a trial are required to undertake a medical screen. There are different
requirements for different trials, but typically volunteers will have the following tests:
• Measurement of the electrical activity of the heart (ECG)
• Measurement of blood pressure, heart rate and temperature
• Blood sampling
• Urine sampling
• Weight and height measurement
• Drug abuse testing
• Pregnancy testing (females only)
Information technology
The last decade has seen a proliferation of information technology use in the planning and conduct of clinical trials.
Clinical trial management systems (CTMS) are often used by research sponsors or CROs to help plan and manage
the operational aspects of a clinical trial, particularly with respect to investigational sites. Web-based electronic data
capture (EDC) and clinical data management systems (CDMS) are used in a majority of clinical trials to collect
case report data from sites, manage its quality and prepare it for analysis. Interactive voice response systems (IVRS)
are used by sites to register the enrollment of patients using a phone and to allocate patients to a particular treatment
arm (although phones are being increasingly replaced with web-based (IWRS) tools which are sometimes part of the
EDC system). Patient-reported outcome measures are being increasingly collected using hand-held, sometimes
wireless ePRO (or eDiary) devices. Statistical software is used to analyze the collected data and prepare it for
regulatory submission. Access to many of these applications is increasingly aggregated in web-based clinical trial portals.
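As a minimal illustration (not the behaviour of any particular commercial IVRS/IWRS product), the sketch below shows permuted-block randomization, one common scheme such systems use when allocating enrolled patients to treatment arms; the arm names, block size and seed are assumptions made for the example.

```python
import random

# Minimal sketch of permuted-block randomization, a common scheme an
# IVRS/IWRS might use to allocate patients to treatment arms.
# Arm names, block size and seed here are illustrative assumptions.

def block_randomization(n_patients, arms=("active", "placebo"), block_size=4, seed=42):
    """Allocate n_patients to arms in shuffled blocks so the groups stay balanced."""
    assert block_size % len(arms) == 0, "block size must be a multiple of the number of arms"
    rng = random.Random(seed)
    allocations = []
    while len(allocations) < n_patients:
        block = list(arms) * (block_size // len(arms))
        rng.shuffle(block)  # randomize the order within each block
        allocations.extend(block)
    return allocations[:n_patients]

if __name__ == "__main__":
    for patient_id, arm in enumerate(block_randomization(10), start=1):
        print(f"patient {patient_id:02d} -> {arm}")
```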
In 2001, the editors of 12 major journals issued a joint editorial, published in each journal, on the control over
clinical trials exerted by sponsors, particularly targeting the use of contracts which allow sponsors to review the
studies prior to publication and withhold publication. They strengthened editorial restrictions to counter the effect.
The editorial noted that contract research organizations had, by 2000, received 60% of the grants from
pharmaceutical companies in the U.S. Researchers may be restricted from contributing to the trial design, accessing
the raw data, and interpreting the results.
Seeding trials are particularly controversial.
[1] Avorn J. (2004). Powerful Medicines, pp. 129-133. Alfred A. Knopf.
[2] Van Spall HG, Toren A, Kiss A, Fowler RA (March 2007). "Eligibility criteria of randomized controlled trials published in high-impact
general medical journals: a systematic sampling review". JAMA 297 (11): 1233–40. doi:10.1001/jama.297.11.1233. PMID 17374817.
[3] The regulatory authority in the USA is the Food and Drug Administration; in Canada, Health Canada; in the European Union, the European
Medicines Agency; and in Japan, the Ministry of Health, Labour and Welfare
[4] " Clinical trials in oncology (http:/ / books.google. com/ books?id=Zke8ocubNXAC& pg=PA1& dq& hl=en#v=onepage&q=& f=false)".
Stephanie Green, Jacqueline Benedetti, John Crowley (2003). CRC Press. p.1. ISBN 1-58488-302-2
[5] " Clinical Trials Handbook (http:/ / books. google. com/ books?id=d8GxG0d9rpgC& pg=PA118& dq& hl=en#v=onepage&q=& f=false)".
Shayne Cox Gad (2009). John Wiley and Sons. p.118. ISBN 0-471-21388-8
[6] Curtis L. Meinert, Susan Tonascia (1986). Clinical trials: design, conduct, and analysis (http:/ / books.google.com/ ?id=i1oAxuE29MUC&
pg=PA3&lpg=PA3&q). Oxford University Press, USA. p. 3. ISBN 978-0195035681. .
[7] Toby E. Huff (2003), The Rise of Early Modern Science: Islam, China, and the West, p. 218. Cambridge University Press, ISBN
[8] Tschanz, David W. (May/June 1997). "The Arab Roots of European Medicine". Saudi Aramco World 48 (3): 20–31.
[9] D. Craig Brater and Walter J. Daly (2000), "Clinical pharmacology in the Middle Ages: Principles that presage the 21st century", Clinical
Pharmacology & Therapeutics 67 (5), p. 447-450 [448].
[10] "James Lind: A Treatise of the Scurvy (1754)" (http:// www.bruzelius.info/Nautica/ Medicine/ Lind(1753).html). 2001. . Retrieved
[11] O'Rourke, Michael F. (1992). "Frederick Akbar Mahomed". Hypertension (American Heart Association) 19: 212–217 [213]
[12] O'Rourke, Michael F. (1992). "Frederick Akbar Mahomed". Hypertension (American Heart Association) 19: 212–217 [212]
[13] Glossary of Clinical Trial Terms, NIH Clinicaltrials.gov (http:/ / clinicaltrials. gov/ ct2/ info/ glossary)
[14] Helene S (2010). "EU Compassionate Use Programmes (CUPs): Regulatory Framework and Points to Consider before CUP
Implementation" (http:// adisonline. com/ pharmaceuticalmedicine/Fulltext/ 2010/ 24040/
EU_Compassionate_Use_Programmes__CUPs___Regulatory. 4. aspx). Pharm Med 24 (4): 223-229. .
[15] ICH Guideline for Good Clinical Practice: Consolidated Guidance (http:/ / www.ich.org/ LOB/ media/ MEDIA482.pdf)
[16] International Conference on Harmonization of Technical Requirements for Registration of Pharmaceuticals for Human Use (http:// www.
[17] What is informed consent? US National Institutes of Health, Clinicaltrials.gov (http:// clinicaltrials.gov/ ct2/ info/understand)
[18] "Guidance for Industry, Investigators, and Reviewers" (http:// www.fda. gov/downloads/ Drugs/
GuidanceComplianceRegulatoryInformation/Guidances/ ucm078933. pdf). Food and Drug Administration. January 2006. . Retrieved
[19] The Lancet (2009). "Phase 0 trials: a platform for drug development?". Lancet 374 (9685): 176. doi:10.1016/S0140-6736(09)61309-X.
[20] Silvia Camporesi (October 2008). "Phase 0 workshop at the 20th EORT-NCI-AARC symposium, Geneva" (http:/ / www.
ecancermedicalscience. com/ blog.asp?postId=27). ecancermedicalscience. . Retrieved 2008-11-07.
[21] http:/ / www. medscape. com/ viewarticle/582554_2
[22] "Guidance for Institutional Review Boards and Clinical Investigators" (http:// www.fda.gov/ oc/ ohrt/ irbs/ drugsbiologics. html). Food and
Drug Administration. 1999-03-16. . Retrieved 2007-03-27.
[23] "Periapproval Services (Phase IIIb and IV programs)" (http:// www.covance.com/ periapproval/svc_phase3b. php). Covance Inc.. 2005. .
Retrieved 2007-03-27.
[24] Arcangelo, Virginia Poole; Andrew M. Peterson (2005). Pharmacotherapeutics for Advanced Practice: A Practical Approach. Lippincott
Williams & Wilkins. ISBN 0781757843.
[25] http:/ / www. asha. org/academic/ questions/ PhasesClinicalResearch/
[26] Web Site Editor; Crossley, MJ; Turner, P; Thordarson, P (2007). "Clinical Trials - What Your Need to Know" (http:/ / www.cancer.org/
docroot/ ETO/content/ ETO_6_3_Clinical_Trials_-_Patient_Participation.asp). American Cancer Society 129 (22): 7155.
doi:10.1021/ja0713781. PMID 17497782. .
[27] Yamin Khan and Sarah Tilly. "Seasonality: The Clinical Trial Manager's Logistical Challenge" (http:/ / www.pharm-olam.com/ pdf/
POI-Seasonality.pdf). Pharm-Olam International (POI) (http:// www. pharm-olam.com). . Retrieved 26 April 2010.
[28] Yamin Khan and Sarah Tilly. "Flu, Season, Diseases Affect Trials" (http:// appliedclinicaltrialsonline.findpharma.com/
appliedclinicaltrials/ Drug+Development/ Flu-Season-Diseases-Affect-Trials/ArticleStandard/Article/ detail/ 652128). Applied Clinical
Trials Online. . Retrieved 26 February 2010.
[29] Assembly Bill No. 2328 (http:/ / irb.ucsd. edu/ ab_2328_bill_20020826_enrolled. pdf)
[30] Back Translation for Quality Control of Informed Consent Forms (http:/ / www.gts-translation.com/ medicaltranslationpaper.pdf)
[31] Common Dreams (http:// www.commondreams. org/archive/ 2007/ 12/ 14/ 5838/ )
[32] Pmda.go.jp 独 立 行 政 法 人 医 薬品 医 療 機 器 総 合 機 構 (http:// www.pmda.go. jp/ ich/ s/ s1b_98_7_9e. pdf) (Japanese)
[33] ICH (http:// www. ich.org/ cache/ compo/ 276-254-1.html)
[34] "Tax Credit for Testing Expenses for Drugs for Rare Diseases or Conditions" (http:/ / www.fda.gov/ orphan/ taxcred.htm). Food and Drug
Administration. 2001-04-17. . Retrieved 2007-03-27.
[35] Paul, J. .; Seib, R. .; Prescott, T. . (Mar 2005). "The Internet and clinical trials: background, online resources, examples and issues" (http:/ /
www.jmir.org/ 2005/ 1/ e5/ ) (Free full text). Journal of medical Internet research 7 (1): e5. doi:10.2196/jmir.7.1.e5. PMC 1550630.
PMID 15829477. .
[36] http:// www. pdtrials. org/en/ about_PDtrials_what
[37] http:// www. mlanet. org/resources/ hlth_tutorial/mod4c. html
[38] http:/ / www. pdtrials. org/en/ participate_clinicalresearch_how
[39] Life on a Trial - What to Expect (http:// www.beavolunteer.co. uk/ index. php?option=com_content& view=article&id=25& Itemid=21)
[40] Life Sciences Strategy Group, "Clinical Trial Technology Utilization, Purchasing Preferences & Growth Outlook" Syndicated Publication,
May, 2009
[41] Davidoff F, DeAngelis CD, Drazen JM, et al (September 2001). "Sponsorship, authorship and accountability" (http:// www.cmaj.ca/ cgi/
pmidlookup?view=long&pmid=11584570). CMAJ 165 (6): 786–8. PMC 81460. PMID 11584570. .
[42] Sox HC, Rennie D (August 2008). "Seeding trials: just say "no"" (http:// www. annals. org/ cgi/ pmidlookup?view=long&
pmid=18711161). Ann. Intern. Med. 149 (4): 279–80. PMID 18711161. . Retrieved 2008-08-21.
• Rang HP, Dale MM, Ritter JM, Moore PK (2003). Pharmacology 5 ed. Edinburgh: Churchill Livingstone. ISBN
• Finn R, (1999). Cancer Clinical Trials: Experimental Treatments and How They Can Help You., Sebastopol:
O'Reilly & Associates. ISBN 1-56592-566-1
• Chow S-C and Liu JP (2004). Design and Analysis of Clinical Trials : Concepts and Methodologies, ISBN
• Pocock SJ (2004), Clinical Trials: A Practical Approach, John Wiley & Sons, ISBN 0-471-90155-5
External links
• The International Conference on Harmonization of Technical Requirements for Registration of Pharmaceuticals
for Human Use (ICH) (http:// www.ich. org)
• The International Clinical Trials Registry Platform (ICTRP) (http:// www.who.int/ trialsearch)
• IFPMA Clinical Trials Portal (IFPMA CTP) (http:/ / clinicaltrials. ifpma.org) to Find Ongoing & Completed
Trials of New Medicines
• ClinicalTrials.gov (http:/ / clinicaltrials. gov)
• Clinical Trials for cancer research (http:// www. cancer.gov/ clinicaltrials) - National Cancer Institute
Cohort study
A cohort study or panel study is a form of longitudinal study (a type of observational study) used in medicine,
social science, actuarial science, and ecology. It is an analysis of risk factors that follows a group of people who do not have the disease, and uses correlations to determine the absolute risk of contracting it. It is one type of
clinical study design and should be compared with a cross-sectional study. Cohort studies are largely about the life
histories of segments of populations, and the individual people who constitute these segments.

A cohort is a group of people who share a common characteristic or experience within a defined period (e.g., are
born, are exposed to a drug or a vaccine, etc.). Thus a group of people who were born on a day or in a particular
period, say 1948, form a birth cohort. The comparison group may be the general population from which the cohort is
drawn, or it may be another cohort of persons thought to have had little or no exposure to the substance under
investigation, but otherwise similar. Alternatively, subgroups within the cohort may be compared with each other.
Randomized controlled trials, or RCTs, are a superior methodology in the hierarchy of evidence in therapy, because they limit the potential for any biases by randomly assigning one patient pool to an intervention and another patient pool to non-intervention (or placebo). This minimizes the chance that the incidence of confounding (particularly unknown confounding) variables will differ between the two groups. However, it is important to note that RCTs may not be suitable in all cases, and other methodologies may be much more suitable to investigate the study's objective.
Cohort studies can either be conducted prospectively, or retrospectively from archived records.
In medicine, a cohort study is often undertaken to obtain evidence to try to refute the existence of a suspected
association between cause and effect; failure to refute a hypothesis strengthens confidence in it. Crucially, the cohort
is identified before the appearance of the disease under investigation. The study follows a group of people who do not have the disease for a period of time and sees who develops the disease (new incidence). The cohort cannot therefore be defined as a group of people who already have the disease. Prospective (longitudinal) cohort studies of the relationship between exposure and disease strongly aid in studying causal associations, though distinguishing true causality usually requires further corroboration from experimental trials.
The advantage of prospective cohort study data is that it can help determine risk factors for contracting a new disease
because it is a longitudinal observation of the individual through time, and the collection of data at regular intervals,
so recall error is reduced. However, cohort studies are expensive to conduct, are sensitive to attrition and take a long
follow-up time to generate useful data. Nevertheless, the results that are obtained from long-term cohort studies are
of substantially superior quality to retrospective/cross-sectional studies, and cohort studies are considered the gold
standard in observational epidemiology. Moreover, cohort studies are informative for efficiently studying a
wide-range of exposure-disease associations.
Some cohort studies track groups of children from their birth, and record a wide range of information (exposures)
about them. The value of a cohort study depends on the researchers' capacity to stay in touch with all members of the
cohort. Some of these studies have continued for decades.
In contrast to what was stated above, a cohort analysis is not a longitudinal study but a multiple cross-sectional study design. The cohort refers to the group of respondents who experience the same event within the same interval (Malhotra & Birks, 2010), for instance when comparing the preferences of an age category (10–20 years old). For this reason, it is unlikely that a respondent who took part in the first test will take part in the second one.
An example of an epidemiological question that can be answered by the use of a cohort study is: does exposure to X
(say, smoking) associate with outcome Y (say, lung cancer)? Such a study would recruit a group of smokers and a
group of non-smokers (the unexposed group) and follow them for a set period of time and note differences in the
incidence of lung cancer between the groups at the end of this time. The groups are matched in terms of many other variables, such as economic status and general health, so that the variable being assessed, the independent variable (in this case, smoking), can be isolated as the cause of the dependent variable (in this case, lung cancer). In this
example, a statistically significant increase in the incidence of lung cancer in the smoking group as compared to the
non-smoking group is evidence in favor of the hypothesis. However, rare outcomes, such as lung cancer, are
generally not studied with the use of a cohort study, but are rather studied with the use of a case-control study.
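A minimal sketch of the arithmetic behind such a comparison, using made-up counts rather than real study data: the incidence is computed in each group and their ratio gives the relative risk (in practice a formal significance test would follow).

```python
# Minimal sketch with made-up counts: compare lung-cancer incidence between
# an exposed (smoking) cohort and an unexposed cohort and report relative risk.

def cohort_summary(cases_exposed, n_exposed, cases_unexposed, n_unexposed):
    risk_exposed = cases_exposed / n_exposed        # incidence in the exposed group
    risk_unexposed = cases_unexposed / n_unexposed  # incidence in the unexposed group
    relative_risk = risk_exposed / risk_unexposed
    return risk_exposed, risk_unexposed, relative_risk

if __name__ == "__main__":
    # Hypothetical follow-up results, not real data.
    re, ru, rr = cohort_summary(cases_exposed=30, n_exposed=1000,
                                cases_unexposed=10, n_unexposed=1000)
    print(f"incidence (exposed)   = {re:.3f}")
    print(f"incidence (unexposed) = {ru:.3f}")
    print(f"relative risk         = {rr:.1f}")
```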
Shorter term studies are commonly used in medical research as a form of clinical trial, or means to test a particular
hypothesis of clinical importance. Such studies typically follow two groups of patients for a period of time and
compare an endpoint or outcome measure between the two groups.
Randomized controlled trials, or RCTs, are a superior methodology in the hierarchy of evidence, because they limit
the potential for bias by randomly assigning one patient pool to an intervention and another patient pool to
non-intervention (or placebo). This minimizes the chance that the incidence of confounding variables will differ
between the two groups.
Nevertheless, it is sometimes not practical or ethical to perform RCTs to answer a clinical question. To take our
example, if we already had reasonable evidence that smoking causes lung cancer then persuading a pool of
non-smokers to take up smoking in order to test this hypothesis would generally be considered quite unethical.
Two examples of cohort studies that have been going on for more than 50 years are the Framingham Heart Study and
the National Child Development Study (NCDS), the most widely-researched of the British birth cohort studies.
Key findings of NCDS and a detailed profile of the study appear in the International Journal of Epidemiology, which has also published a comparison of two cohorts, the Millennium Cohort Study (United States) and the King's Cohort (United Kingdom).
The largest cohort study in women is the Nurses' Health Study. Started in 1976, it is tracking over 120,000 nurses
and has been analyzed for many different conditions and outcomes.
The largest cohort study in Africa is the Birth to Twenty Study which began in 1990 and tracks a cohort of over
3,000 children born in the weeks following Nelson Mandela's release from prison.
Other famous examples are the Grant Study tracking a number of Harvard graduates from ca. 1950, and the
Whitehall Study tracking 10,308 British civil servants.
Retrospective cohort
A "prospective cohort" defines the groups before the study is done, while a "retrospective cohort" defines the
grouping after the data is collected. Examples of a retrospective cohort are Long-Term Mortality after Gastric
Bypass Surgery and the Lothian Birth Cohort Studies.
Nested case-control study
An example of a nested case-control study is Inflammatory markers and the risk of coronary heart disease in men
and women, which was a case-control analysis extracted from the Framingham Heart Study cohort.
Household panel survey
Household panel surveys are an important sub-type of cohort study. These draw representative samples of
households and survey them, following all individuals through time on a usually annual basis. Examples include the
US Panel Study of Income Dynamics (since 1968), the German Socio-Economic Panel (since 1984), the British
Household Panel Survey (since 1991), the Household, Income and Labour Dynamics in Australia Survey (since
2001) and the European Community Household Panel (1994–2001).
[1] http:/ / www. socialresearchmethods. net/ tutorial/Cho2/ cohort.html
[2] Porta M (editor). A dictionary of epidemiology. 5th. edition. New York: Oxford University Press, 2008. (http:// www.oup. com/ us/ catalog/
general/subject/ Medicine/ EpidemiologyBiostatistics/ ?view=usa& ci=9780195314502)
[3] http:// www. ehib. org/faq. jsp?faq_key=37
[4] Power C and Elliott J (2006). "Cohort profile: 1958 British Cohort Study". International Journal of Epidemiology 35 (1): 34–41.
doi:10.1093/ije/dyi183. PMID 16155052.
[5] http:// ije.oxfordjournals.org/content/ early/ 2011/ 06/ 29/ije.dyr096. full
[6] Adams TD, Gress RE, Smith SC, et al. (2007). "Long-term mortality after gastric bypass surgery". N. Engl. J. Med. 357 (8): 753–61.
doi:10.1056/NEJMoa066603. PMID 17715409.
[7] "The Lothian Birth Cohort Studies" (http:/ / www.psy. ed. ac.uk/ research/ lbc/ LBC. html). University of Edinburgh. . Retrieved 8 May
[8] Pai JK, Pischon T, Ma J, et al. (2004). "Inflammatory markers and the risk of coronary heart disease in men and women". N. Engl. J. Med.
351 (25): 2599–610. doi:10.1056/NEJMoa040967. PMID 15602020.
External links
• Prospective cohorts (http:// clio. stanford. edu:7080/ cocoon/ cliomods/ trailmaps/ design/ design/
prospectiveCohort/ index. html)
• Retrospective cohorts (http:/ / clio. stanford.edu:7080/ cocoon/ cliomods/ trailmaps/ design/ design/
retrospectiveCohort/index. html)
• Study Design Tutorial (http:// www. vet. cornell.edu/ imaging/ tutorial/4studydesigns/ cohort. html) Cornell
University College of Veterinary Medicine
• Birth cohort study timelines (ESDS Longitudinal) (http:// www.esds. ac. uk/longitudinal/ resources/
international. asp)
• Centre for Longitudinal Studies (http:// www.cls. ioe. ac.uk)
Common-cause and special-cause
Types of variation and their synonyms:
• Common cause: chance cause; natural pattern
• Special cause: assignable cause; unnatural pattern
Common- and special-causes are the two distinct origins of variation in a process, as defined in the statistical
thinking and methods of Walter A. Shewhart and W. Edwards Deming. Briefly, "common-cause" is the usual,
historical, quantifiable variation in a system, while "special-causes" are unusual, not previously observed,
non-quantifiable variation.
The distinction is fundamental in philosophy of statistics and philosophy of probability, with different treatment of
these issues being a classic issue of probability interpretations, being recognised and discussed as early as 1703 by
Gottfried Leibniz; various alternative names have been used over the years.
The distinction has been particularly important in the thinking of economists Frank Knight, John Maynard Keynes
and G. L. S. Shackle.
Origins and concepts
In 1703, Jacob Bernoulli wrote to Gottfried Leibniz to discuss their shared interest in applying mathematics and
probability to games of chance. Bernoulli speculated whether it would be possible to gather mortality data from
gravestones and thereby calculate, by their existing practice, the probability of a man currently aged 20 years
outliving a man aged 60 years. Leibniz replied that he doubted this was possible as:
Nature has established patterns originating in the return of events but only for the most part. New illnesses flood the
human race, so that no matter how many experiments you have done on corpses, you have not thereby imposed a
limit on the nature of events so that in the future they could not vary.
This captures the central idea that some variation is predictable, at least approximately in frequency. This
common-cause variation is evident from the experience base. However, new, unanticipated, emergent or previously
neglected phenomena (e.g. "new diseases") result in variation outside the historical experience base. Shewhart and
Deming argued that such special-cause variation is fundamentally unpredictable in frequency of occurrence or in severity.
John Maynard Keynes emphasised the importance of special-cause variation when he wrote:
By “uncertain” knowledge ... I do not mean merely to distinguish what is known for certain from what is only
probable. The game of roulette is not subject, in this sense, to uncertainty ... The sense in which I am using the term
is that in which the prospect of a European war is uncertain, or the price of copper and the rate of interest twenty
years hence, or the obsolescence of a new invention ... About these matters there is no scientific basis on which to
form any calculable probability whatever. We simply do not know!
Common-cause variation
Common-cause variation is characterised by:
• Phenomena constantly active within the system;
• Variation predictable probabilistically;
• Irregular variation within an historical experience base; and
• Lack of significance in individual high or low values.
The outcomes of a perfectly balanced roulette wheel are a good example of common-cause variation. Common-cause
variation is the noise within the system.
Walter A. Shewhart originally used the term chance-cause. The term common-cause was coined by Harry Alpert in 1947. The Western Electric Company used the term natural pattern. Shewhart described a process that features only common-cause variation as being in statistical control. This term is deprecated by some modern statisticians, who prefer the phrase stable and predictable.
Special-cause variation
Special-cause variation is characterised by:
• New, unanticipated, emergent or previously neglected phenomena within the system;
• Variation inherently unpredictable, even probabilistically;
• Variation outside the historical experience base; and
• Evidence of some inherent change in the system or our knowledge of it.
Special-cause variation always arrives as a surprise. It is the signal within a system.
Walter A. Shewhart originally used the term assignable-cause. The term special-cause was coined by W. Edwards Deming. The Western Electric Company used the term unnatural pattern.
Common causes
• Inappropriate procedures
• Poor design
• Poor maintenance of machines
• Lack of clearly defined standing operating procedures
• Poor working conditions, e.g. lighting, noise, dirt, temperature, ventilation
• Substandard raw materials
• Measurement error
• Quality control error
• Vibration in industrial processes
• Ambient temperature and humidity
• Normal wear and tear
• Variability in settings
• Computer response time
Special causes
• Poor adjustment of equipment
• Operator falls asleep
• Faulty controllers
• Machine malfunction
• Computer crashes
• Poor batch of raw material
• Power surges
• High healthcare demand from elderly people
• Abnormal traffic (click-fraud) on web ads
• Extremely long lab testing turnover time due to switching to a new computer system
• Operator absent
Importance to economics
In economics, this circle of ideas is referred to under the rubric of "Knightian uncertainty". John Maynard Keynes
and Frank Knight both discussed the inherent unpredictability of economic systems in their work and used it to
criticise the mathematical approach to economics, in terms of expected utility, developed by Ludwig von Mises and
others. Keynes in particular argued that economic systems did not automatically tend to the equilibrium of full
employment owing to their agents' inability to predict the future. As he remarked in The General Theory of
Employment, Interest and Money:
... as living and moving beings, we are forced to act ... [even when] our existing knowledge does not provide a
sufficient basis for a calculated mathematical expectation.
Keynes's thinking was at odds with the classical liberalism of the Austrian school of economists, but G. L. S. Shackle
recognised the importance of Keynes's insight and sought to formalise it within a free-market philosophy.
In financial economics, the black swan theory of Nassim Nicholas Taleb is based on the significance and
unpredictability of special-causes.
Importance to industrial and quality management
A special-cause failure is a failure that can be corrected by changing a component or process, whereas a common-cause failure is equivalent to noise in the system, and specific actions cannot be taken to prevent the failure.
Harry Alpert observed:
A riot occurs in a certain prison. Officials and sociologists turn out a detailed report about the prison, with a
full explanation of why and how it happened here, ignoring the fact that the causes were common to a majority
of prisons, and that the riot could have happened anywhere.
The quote recognises that there is a temptation to react to an extreme outcome and to see it as significant, even where its causes are common to many situations and the distinctive circumstances surrounding its occurrence are the result of mere chance. Such behaviour has many implications within management, often leading to interventions in processes
that merely increase the level of variation and frequency of undesirable outcomes.
Deming and Shewhart both advocated the control chart as a means of managing a business process in an
economically efficient manner.
Importance to statistics
Deming and Shewhart
Within the frequency probability framework, there is no process whereby a probability can be attached to the future
occurrence of special cause. However the Bayesian approach does allow such a probability to be specified. The
existence of special-cause variation led Keynes and Deming to an interest in Bayesian probability, but no formal
synthesis has ever been forthcoming. Most statisticians of the Shewhart-Deming school take the view that special
causes are not embedded in either experience or in current thinking (that's why they come as a surprise) so that any
subjective probability is doomed to be hopelessly badly calibrated in practice.
It is immediately apparent from the Leibniz quote above that there are implications for sampling. Deming observed
that in any forecasting activity, the population is that of future events while the sampling frame is, inevitably, some
subset of historical events. Deming held that the disjoint nature of population and sampling frame was inherently
problematic once the existence of special-cause variation was admitted, rejecting the general use of probability and
conventional statistics in such situations. He articulated the difficulty as the distinction between analytic and
enumerative statistical studies.
Shewhart argued that, as processes subject to special-cause variation were inherently unpredictable, the usual
techniques of probability could not be used to separate special-cause from common-cause variation. He developed
the control chart as a statistical heuristic to distinguish the two types of variation. Both Deming and Shewhart
advocated the control chart as a means of assessing a process's state of statistical control and as a foundation for prediction.
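As a rough sketch of the control-chart idea (assuming the common three-sigma heuristic and using invented measurements), the code below estimates limits from an in-control baseline period and flags later points falling outside them as possible special causes.

```python
# Minimal sketch of a Shewhart-style individuals chart: control limits are
# estimated from an in-control baseline period, and later points falling
# outside mean +/- 3 standard deviations are flagged as possible special causes.
# The data and the 3-sigma rule of thumb are illustrative assumptions.

def control_limits(baseline):
    mean = sum(baseline) / len(baseline)
    variance = sum((x - mean) ** 2 for x in baseline) / (len(baseline) - 1)
    sigma = variance ** 0.5
    return mean - 3 * sigma, mean, mean + 3 * sigma

if __name__ == "__main__":
    baseline = [10.1, 9.8, 10.0, 10.3, 9.9, 10.2, 10.0, 9.7, 10.1, 10.0]
    new_points = [10.2, 9.9, 14.5, 10.1]
    lcl, centre, ucl = control_limits(baseline)
    print(f"centre = {centre:.2f}, limits = ({lcl:.2f}, {ucl:.2f})")
    for i, x in enumerate(new_points, start=1):
        status = "possible special cause" if (x < lcl or x > ucl) else "common-cause variation"
        print(f"point {i}: {x:>5} -> {status}")
```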
Keynes identified three domains of probability:
• Frequency probability;
• Subjective or Bayesian probability; and
• Events lying outside the possibility of any description in terms of probability (special causes)
- and sought to base a probability theory thereon.
In engineering
Common mode, or common cause, failure has a more specific meaning in engineering. It refers to events which are
not statistically independent. That is, failures in multiple parts of a system caused by a single fault, particularly
random failures due to environmental conditions or aging. An example is when all of the pumps for a fire sprinkler
system are located in one room. If the room becomes too hot for the pumps to operate, they will all fail at essentially
the same time, from one cause (the heat in the room).
For example, in an electronic system, a fault in a power supply which injects noise onto a supply line may cause
failures in multiple subsystems.
This is particularly important in safety-critical systems using multiple redundant channels. If the probability of a failure in one subsystem is p, then it would be expected that an N-channel system would have a probability of failure of p^N. However, in practice, the probability of failure is much higher because the channel failures are not statistically independent; for example, ionizing radiation or electromagnetic interference (EMI) may affect both channels.
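A minimal sketch of this arithmetic, with hypothetical per-year probabilities: the idealised N-channel failure probability p^N is contrasted with a simple model in which a single common-cause fault can defeat all channels at once.

```python
# Minimal sketch contrasting the idealised independent-failure probability of
# an N-channel redundant system (p**N) with a simple common-cause model in
# which a single shared fault defeats every channel. The numbers are
# illustrative assumptions, not data from any real system.

def independent_failure(p_channel, n_channels):
    """All channels fail together only if each fails independently."""
    return p_channel ** n_channels

def with_common_cause(p_channel, n_channels, p_common):
    """System fails if the shared fault occurs, or all channels fail independently."""
    return p_common + (1 - p_common) * independent_failure(p_channel, n_channels)

if __name__ == "__main__":
    p, n, p_cc = 1e-3, 2, 1e-4   # per-year probabilities (hypothetical)
    print(f"idealised (independent) : {independent_failure(p, n):.1e} per year")
    print(f"with common-cause fault : {with_common_cause(p, n, p_cc):.1e} per year")
```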
The principle of redundancy states that, when events of failure of a component are statistically independent, the
probabilities of their joint occurrence multiply. Thus, for instance, if the probability of failure of a component of a
system is one in one thousand per year, the probability of the joint failure of two of them is one in one million per
year, provided that the two events are statistically independent. This principle favors the strategy of the redundancy
of components. One place this strategy is implemented is in RAID 1, where two hard disks store a computer's data.
But even so there can be many common modes: consider a RAID 1 array in which two disks are purchased online and installed in a computer:
• The disks are likely to be from the same manufacturer and of the same model, therefore they share the same
design flaws.
• The disks are likely to have similar serial numbers, thus they may share any manufacturing flaws affecting
production of the same batch.
• The disks are likely to have been shipped at the same time, thus they are likely to have suffered from the same
transportation damage.
• As installed, both disks are attached to the same power supply, making them vulnerable to the same power supply failures.
• As installed, both disks are in the same case, making them vulnerable to the same overheating events.
• They will be both attached to the same card or motherboard, and driven by the same software, which may have
the same bugs.
• Because of the very nature of RAID1, both disks will be subjected to the same workload and very closely similar
access patterns, stressing them in the same way.
Also, if the events of failure of two components are maximally statistically dependent, the probability of the joint
failure of both is identical to the probability of failure of them individually. In such a case, the advantages of
redundancy are negated. Strategies for the avoidance of common mode failures include keeping redundant
components physically isolated.
A prime example of redundancy with isolation is a nuclear power plant. The new ABWR has three divisions of
Emergency Core Cooling Systems, each with its own generators and pumps and each isolated from the others. The
new European Pressurized Reactor has two containment buildings, one inside the other. However, even here it is not
impossible for a common mode failure to occur (for example, caused by a highly unlikely Richter 9 earthquake).
[1] Shewhart, Walter A. (1931), Economic control of quality of manufactured product, New York, New York: D. Van Nostrand Company, Inc,
p. 7, OCLC 1045408
[2] Western Electric Company (1956), Introduction to Statistical Quality Control handbook. (1 ed.), Indianapolis, Indiana: Western Electric Co.,
pp. 23–24, OCLC 33858387
[3] Shewhart, Walter A. (1931), Economic control of quality of manufactured product, New York, New York: D. Van Nostrand Company, Inc,
p. 14, OCLC 1045408
[4] "Financial Risk in Healthcare Provision and Contracts" (http:/ / www. decisioneering.com/ cbuc/ 2004/ papers/ CBUC04-Jones. pdf) (PDF). .
Retrieved 13 November 2006.
[5] "Statistical Inference" (http:// web. archive.org/ web/ 20061007055524/ http:/ / www. anu.edu.au/ nceph/ surfstat/ surfstat-home/5-1-2.
html). Archived from the original (http:// www. anu. edu. au/ nceph/ surfstat/ surfstat-home/5-1-2.html) on 7 October 2006. . Retrieved 13
November 2006.
[6] "Common Cause Failures" (http:// www.rgwcherry.co. uk/ download/ Common Cause Failures. pdf) (PDF). . Retrieved 13 April 2011.
• Deming, W E (1975) On probability as a basis for action, The American Statistician, 29(4), pp146–152
• Deming, W E (1982) Out of the Crisis: Quality, Productivity and Competitive Position ISBN 0-521-30553-5
• Keynes, J M (1921) A Treatise on Probability, ISBN 0-333-10733-0
• Keynes, J M (1936) The General Theory of Employment, Interest and Money ISBN 1-57392-139-4
• Knight, F H (1921) Risk, Uncertainty and Profit ISBN 1-58798-126-2
• Shackle, G L S (1972) Epistemics and Economics: A Critique of Economic Doctrines ISBN 1-56000-558-0
• Shewhart, W A (1931) Economic Control of Quality of Manufactured Product ISBN 0-87389-076-0
• Shewhart, W A (1939) Statistical Method from the Viewpoint of Quality Control ISBN 0-486-65232-7
• Wheeler, D J & Chambers, D S (1992) Understanding Statistical Process Control ISBN 0-945320-13-2
Component-Based Usability Testing
Component-based usability testing (CBUT) is a testing approach which aims at empirically testing the usability of
an interaction component. The latter is defined as an elementary unit of an interactive system on which behaviour-based evaluation is possible. For this, a component needs to have a state that is independent and that the user can perceive and control, such as a radio button, a slider or a whole word processor application. The CBUT approach can be regarded as part of the component-based software engineering branch of software engineering.
CBUT is based on both software architectural views such as Model–View–Controller (MVC),
Presentation-Abstraction-Control (PAC), ICON and CNUCE agent models that split up the software in parts, and
cognitive psychology views where a person’s mental process is split up in smaller mental processes. Both software
architecture and cognitive architecture use the principle of hierarchical layering, in which low level processes are
more elementary and for humans often more physical in nature, such as the coordination movement of muscle
groups. Processes that operate on higher level layers are more abstract and focus on a person’s main goal, such as
writing an application letter to get a job. The Layered Protocol Theory (LPT), which is a special version of
Perceptual Control Theory (PCT), brings these views together by suggesting that users interact with a system across
several layers by sending messages. Users interact with components on high layers by sending messages, such as
pressing keys, to components operating on lower layers, which on their turn relay a series of these messages into a
single high level message, such as ‘DELETE *.*’, to a component on a higher layer. Components operating on higher
layers, communicate back to the user by sending messages to components operating on lower level layers. Whereas
this layered-interaction model explains how the interaction is established, control loops explain the purpose of the
interaction. LPT sees the purpose of the users’ behaviour as the users’ attempt to control their perception, in this case
the state of the component they perceive. This means that users will only act if they perceive the component to be in
an undesirable state. For example, if a person has an empty glass but wants a full glass of water, he or she will act (e.g. walking to the tap and turning it on to fill the glass). The action of filling the glass will continue until the person perceives the glass as full. As interaction with components takes place on several layers, interacting with a single
device can include several control loops. The amount of effort put into operating a control loop is seen as an
indicator for the usability of an interaction component.
CBUT can be categorized according to two testing paradigms, the Single-Version Testing Paradigm (SVTP) and the
Multiple-Versions Testing Paradigm (MVTP). In SVTP only one version of each interaction component in a system
is tested. The focus is to identify interaction components that might reduce the overall usability of the system. SVTP
is therefore suitable as part of a software-integration test. In MVTP on the other hand, multiple versions of a single
component are tested while the remaining components in the system remain unchanged. The focus is on identifying
the version with the highest usability of specific interaction component. MVTP therefore is suitable for component
development and selection. Different CBUT methods have been proposed for SVTP and MVTP, which include
measures based on recorded user interaction and questionnaires. Whereas in MVTP the recorded data can directly be
interpreted by making a comparison between two versions of the interaction component, in SVTP log file analysis is more extensive as interaction with both higher and lower components must be considered. Meta-analysis on the data from several lab experiments that used CBUT measures suggests that these measures can be statistically more powerful than overall (holistic) usability measures.
Usability questionnaire
Whereas holistically oriented usability questionnaires such as the System Usability Scale (SUS) examine the usability of a system on several dimensions, such as the effectiveness, efficiency and satisfaction defined in the ISO 9241-11 standard, a Component-Based Usability Questionnaire (CBUQ) is a questionnaire which can be used to evaluate the usability of individual interaction components, such as the volume control or the play control of an MP3 player. To evaluate an interaction component, the six Perceived Ease-Of-Use (PEOU) statements from the Technology acceptance model are taken with a reference to the interaction component instead of to the entire system, for example:
"Learning to operate the Volume Control would be easy for me."
Users are asked to rate these statements on a seven-point Likert scale. The average rating on these six statements is regarded as the user's usability rating of the interaction component. Based on lab studies with difficult-to-use and easy-to-use interaction components, a break-even point of 5.29 on the seven-point Likert scale has been determined. Using a one-sample Student's t-test, it is possible to examine whether users' ratings of an interaction component deviate from this break-even point. Interaction components that receive ratings below this break-even point can be regarded as more comparable to the set of difficult-to-use interaction components, whereas ratings above this break-even point are more comparable to the set of easy-to-use interaction components.
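A minimal sketch of this scoring procedure, using invented ratings: each user's six PEOU ratings are averaged, and the resulting scores are tested against the 5.29 break-even point with a one-sample t statistic (the corresponding critical value would be looked up in a t table).

```python
import statistics

# Minimal sketch (with made-up ratings) of scoring a Component-Based Usability
# Questionnaire: average the six PEOU items per respondent, then compute a
# one-sample t statistic of the mean rating against the 5.29 break-even point.
# The ratings below are illustrative assumptions.

BREAK_EVEN = 5.29

def component_scores(ratings_per_user):
    """Each row holds one user's six PEOU ratings (1-7 Likert); return the per-user means."""
    return [sum(row) / len(row) for row in ratings_per_user]

def one_sample_t(scores, mu):
    """Return the t statistic and degrees of freedom for H0: mean == mu."""
    n = len(scores)
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)
    t = (mean - mu) / (sd / n ** 0.5)
    return t, n - 1

if __name__ == "__main__":
    ratings = [
        [6, 6, 5, 6, 7, 6],
        [5, 6, 6, 5, 6, 6],
        [7, 6, 6, 6, 7, 6],
        [6, 5, 6, 6, 6, 5],
        [6, 7, 6, 6, 6, 7],
    ]
    scores = component_scores(ratings)
    t, df = one_sample_t(scores, BREAK_EVEN)
    print(f"mean rating = {statistics.mean(scores):.2f}, t({df}) = {t:.2f}")
    # Compare t against a t-table critical value (e.g. 2.776 for df = 4, two-sided alpha = .05).
```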
If engineers wish to evaluate multiple interaction components simultaneously, the CBUQ consists of separate sections, one for each interaction component, each with its own six PEOU statements.
[1] Farrell, P.S.E., Hollands, J.G., Taylor, M.M., Gamble, H.D., (1999). Perceptual control and layered protocols in interface design: I.
Fundamental concepts. International Journal of Human-Computer Studies 50 (6), 489-520. online (http:/ / dx. doi.org/doi:10. 1006/ ijhc.
1998. 0259)
[2] Brinkman, W.-P., Haakma, R., & Bouwhuis, D.G. (2007), Towards an empirical method of efficiency testing of system parts: a
methodological study, Interacting with Computers, vol. 19, no. 3, pp. 342-356. preliminary version (http:// mmi. tudelft.nl/ ~willem-paul/
Towards_an_empirical_method_of_efficiency_testing_of_system_parts_a_methodological_study_preliminary_version.pdf) online (http://
dx.doi.org/doi:10. 1016/ j. intcom. 2007. 01. 002)
[3] Brinkman, W.-P., Haakma, R., & Bouwhuis, D.G. (2008). Component-Specific Usability Testing, IEEE Transactions on Systems, Man, and
Cybernetics - Part A, vol. 38, no. 5, pp. 1143-1155, September 2008. preliminary version (http:// mmi. tudelft.nl/ ~willem-paul/
WP_Papers_online_versie/Component_specific_usability_testing_preliminary_version. pdf) online (http:// dx. doi.org/doi:10. 1109/
TSMCA. 2008.2001056)
[4] Brinkman, W.-P., Haakma, R., & Bouwhuis, D.G. (2009), Theoretical foundation and validity of a component-based usability questionnaire,
Behaviour and Information Technology, 2, no. 28, pp. 121 - 137. preliminary version (http:/ / mmi. tudelft.nl/ ~willem-paul/
pdf) MP3 example study (http:// mmi. tudelft. nl/ ~willem-paul/mp3player/ Intro.htm) online (http:// dx. doi.org/ DOI:10.1080/
External links
• Example study (http:// mmi. tudelft.nl/ ~willem-paul/ mp3player/Intro.htm) that uses Component-based
Usability Questionnaire including instructions, questionnaires (http:/ / mmi. tudelft.nl/ ~willem-paul/mp3player/
study. htm) and data analysis (http:/ / mmi. tudelft.nl/ ~willem-paul/mp3player/ results. htm).
Computer-based assessment
A Computer-Based Assessment (CBA), also known as Computer-Based Testing (CBT), e-assessment,
computerized testing and computer-administered testing, is a method of administering tests in which the responses
are electronically recorded, assessed, or both. As the name implies, Computer-Based Assessment makes use of a
computer or an equivalent electronic device such as a cell phone or PDA. CBA systems enable educators and trainers
to author, schedule, deliver, and report on surveys, quizzes, tests and exams.
Computer-Based Assessment may be
a stand-alone system or a part of a virtual learning environment, possibly accessed via the World Wide Web.
General advantages of CBA systems over traditional paper-and-pencil testing (PPT) have been demonstrated in
several comparative works and include: increased delivery, administration and scoring efficiency; reduced costs for
many elements of the testing lifecycle; improved test security resulting from electronic transmission and encryption;
consistency and reliability; faster and more controlled test revision process with shorter response time; faster
decision-making as the result of immediate scoring and reporting; unbiased test administration and scoring; fewer
response entry and recognition errors; fewer comprehension errors caused by the testing process; improved
translation and localization with universal availability of content; new advanced and flexible item types; increased
candidate acceptance and satisfaction; evolutionary step toward future testing methodologies.
In addition to traditional testing approaches carried out in a PPT mode, there are a variety of aspects that need to be taken into account when CBA is deployed, such as software quality, secure delivery, reliable network (if
Internet-based), capacities, support, maintenance, software costs for development and test delivery, including
licenses. Each delivery mode, whether paper-and-pencil and/or computer-based, involves advantages and challenges which can hardly be compared, especially in relation to estimated costs. The use of CBA includes
additional benefits which can be achieved from an organisational, psychological, analytical and pedagogical
perspective. Many experts agree on the overall added value and advantages of e-testing in large scale assessments.
It is also envisaged that computer-based formative assessment, in particular, will play an increasingly important role
in learning,
with the increased use of banks of question items for the construction and delivery of dynamic,
on-demand assessments. This can be witnessed by current pioneering projects such as the SQA's SOLAR Project.
[1] TCExam
[2] Asuni 2008
[3] Scheuermann 2008
[4] Gomersall 2005
[5] Scottish Qualifications Authority 2008
• Asuni, Nicola. "TCExam :: Computer-Based Assessment" (http://www.tcexam.org). Retrieved 2008-07-15.
• Gomersall, Bob (2005-12-10). "Practical implementation of e-testing on a large scale, and implications for future
e-assessment and e-learning" (http://www.btl.com/pdfs/Educa 2005 - Bob Gomersall.pdf). Shipley, West
Yorkshire, UK. Retrieved 2007-10-01.
• Scheuermann, Friedrich; Ângela Guimarães Pereira (2008-04-01). "Towards A Research Agenda On
Computer-Based Assessment" (http://crell.jrc.it/CBA/EU-Report-CBA.pdf). Luxembourg, Luxembourg.
Retrieved 2008-07-15.
• Scheuermann, Friedrich; Julius Björnsson (2008-04-01). "The Transition to Computer-Based Assessment - New
Approaches to Skills Assessment and Implications for Large-scale Testing" (http://crell.jrc.it/RP/reporttransition.pdf).
Luxembourg, Luxembourg. Retrieved 2009-04-02.
• Scottish Qualifications Authority (2008). "SOLAR White Paper" (http://www.sqasolar.org.uk/mini/files/SOLARWhitePaperMay2008-master.pdf).
Glasgow, UK. Retrieved 2008-02-15.
External links
• Using computers for assessment in medicine (http://www.pubmedcentral.gov/articlerender.fcgi?artid=516661)
• CAT Central, a comprehensive resource on computerized testing (http://www.psych.umn.edu/psylabs/
Conformity assessment
Conformity assessment, also known as compliance assessment, is any activity to determine, directly or
indirectly, that a process, product, or service meets relevant technical standards and fulfills relevant requirements.
Conformity assessment activities may include:
• Testing
• Surveillance
• Inspection
• Auditing
• Certification
• Registration
• Accreditation
Additionally, the World Trade Organisation (WTO) governs conformity assessment through the Agreement on
Mutual Recognition in Relation to Conformity Assessment (signed July 4, 2000).
The international standards on the topic are published by ISO and covered in the divisions of ICS 03.120.20
(product and company certification; conformity assessment) and ICS 23.040.01 (pipeline components and
pipelines in general). Other standalone ISO standards on the topic include:
• ISO/TR 13881:2000 Petroleum and natural gas industries—Classification and conformity assessment of
products, processes and services
• ISO 18436-4:2008 Condition monitoring and diagnostics of machines—Requirements for qualification and
assessment of personnel—Part 4: Field lubricant analysis
• ISO/IEC 18009:1999 Information technology—Programming languages—Ada: Conformity assessment of a
language processor
Notes and references
[1] International Committee for Information Technology Standards. "Report on Issues for Harmonizing Conformity Assessment to
Biometric Standards" (http://www.incits.org/tc_home/m1htm/docs/m1050067.pdf). Retrieved 12 April 2009.
[2] International Software Benchmarking Standards Group. "ISO Standard for Functional Size Measurement" (http://www.isbsg.org/ISBSGnew.nsf/WebPages/C5908EE2873DA659CA25734E00175B3C). Retrieved 12 April 2009.
[3] NIST. "Part 3 Testing Requirements - Chapter 5: Test Methods" (http://xsun.sdct.itl.nist.gov/tgdc/vvsg/part3/chapter05.php).
Retrieved 12 April 2009.
[4] ISO/IEC Guide 2:2004 (http://www.iso.ch/iso/en/CatalogueDetailPage.CatalogueDetail?CSNUMBER=39976&ICS1=1&
[5] International Organization for Standardization. "03.120.20: Product and company certification. Conformity assessment" (http://www.iso.org/iso/products/standards/catalogue_ics_browse.htm?ICS1=03&ICS2=120&ICS3=20&). Retrieved 10 April 2009.
[6] International Organization for Standardization. "23.040.01: Pipeline components and pipelines in general" (http://www.iso.org/iso/products/standards/catalogue_ics_browse.htm?ICS1=23&ICS2=040&ICS3=01&). Retrieved 10 April 2009.
[7] International Organization for Standardization. "ISO/TR 13881:2000 Petroleum and natural gas industries -- Classification and
conformity assessment of products, processes and services" (http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=23137). Retrieved 10 April 2009.
[8] International Organization for Standardization. "ISO 18436-4:2008 Condition monitoring and diagnostics of machines --
Requirements for qualification and assessment of personnel -- Part 4: Field lubricant analysis" (http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=37395). Retrieved 10 April 2009.
[9] International Organization for Standardization. "ISO/IEC 18009:1999 Information technology -- Programming languages -- Ada:
Conformity assessment of a language processor" (http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=31051). Retrieved 10 April 2009.
Consensus decision-making
Consensus decision-making is a group decision making process that seeks not only the agreement of most
participants but also the resolution or mitigation of minority objections. Consensus is defined by Merriam-Webster
as, first, general agreement and, second, group solidarity of belief or sentiment. The word has its origin in a Latin
term meaning literally "feel together". It is used to describe both general agreement and the process of reaching
such agreement; consensus decision-making is thus concerned primarily with that process.
Consensus should not be confused with unanimity.
As a decision-making process, consensus decision-making aims to be:
• Agreement Seeking: A consensus decision making process attempts to help everyone get what they need.
• Collaborative: Participants contribute to a shared proposal and shape it into a decision that meets the concerns of
all group members as much as possible.
• Cooperative: Participants in an effective consensus process should strive to reach the best possible decision for
the group and all of its members, rather than competing for personal preferences.
• Egalitarian: All members of a consensus decision-making body should be afforded, as much as possible, equal
input into the process. All members have the opportunity to present and amend proposals.
• Inclusive: As many stakeholders as possible should be involved in the consensus decision-making process.
• Participatory: The consensus process should actively solicit the input and participation of all decision-makers.
Alternative to common decision-making practices
Consensus decision making is an alternative to commonly practiced non-collaborative decision making processes.
Robert's Rules of Order, for instance, is a process used by many organizations. The goal of Robert’s Rules is to
structure the debate and passage of proposals that win approval through majority vote. This process does not
emphasize the goal of full agreement. Critics of Robert’s Rules believe that the process can involve adversarial
debate and the formation of competing factions. These dynamics may harm group member relationships and
undermine the ability of a group to cooperatively implement a contentious decision.
Consensus decision making is also an alternative to “top-down” decision making, commonly practiced in
hierarchical groups. Top-down decision making occurs when leaders of a group make decisions in a way that does
not include the participation of all interested stakeholders. The leaders may (or may not) gather input, but they do not
open the deliberation process to the whole group. Proposals are not collaboratively developed, and full agreement is
not a primary objective. Critics of top-down decision making believe the process fosters incidence of either
complacency or rebellion among disempowered group members. Additionally, the resulting decisions may overlook
important concerns of those directly affected. Poor group relationship dynamics and decision implementation
problems may result.
Consensus decision making addresses the problems of both Robert’s Rules of Order and top-down models. The
outcomes of the consensus process include:
• Better Decisions: Through including the input of all stakeholders the resulting proposals can best address all
potential concerns.
• Better Implementation: A process that includes and respects all parties, and generates as much agreement as
possible sets the stage for greater cooperation in implementing the resulting decisions.
• Better Group Relationships: A cooperative, collaborative group atmosphere fosters greater group cohesion and
interpersonal connection.
Decision rules
The level of agreement necessary to finalize a decision is known as a decision rule. Possible decision rules include:
• Unanimous agreement
• Unanimity minus one vote
• Unanimity minus two votes
• Super majority thresholds (90%, 80%, 75%, two-thirds, and 60% are common).
• Simple majority
• Executive committee decides
• Person-in-charge decides
Some groups require unanimous consent (unanimity) to approve group decisions. If any participant objects, he can
block consensus according to the guidelines described below. These groups use the term consensus to denote both
the discussion process and the decision rule. Other groups use a consensus process to generate as much agreement as
possible, but allow decisions to be finalized with a decision rule that does not require unanimity.
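The relationship between a decision rule and a set of recorded positions can be illustrated with a minimal sketch. The function below is purely illustrative; the rule names and numeric thresholds are assumptions for demonstration, not part of any formal consensus procedure. It simply checks whether a set of yes/no positions satisfies a chosen rule such as unanimity, unanimity minus one, or a supermajority threshold.

# Illustrative sketch only: checks whether recorded positions satisfy a decision rule.
# The rule names and thresholds are assumptions for demonstration purposes.
def decision_passes(positions, rule="unanimity", threshold=0.75):
    """positions: list of booleans (True = consent, False = object)."""
    objections = positions.count(False)
    consents = positions.count(True)
    total = len(positions)
    if total == 0:
        return False
    if rule == "unanimity":
        return objections == 0
    if rule == "unanimity_minus_one":      # U-1: a single dissenter cannot block
        return objections <= 1
    if rule == "unanimity_minus_two":      # U-2: two dissenters cannot block
        return objections <= 2
    if rule == "supermajority":            # e.g. 90%, 80%, 75%, two-thirds, 60%
        return consents / total >= threshold
    if rule == "simple_majority":
        return consents > total / 2
    raise ValueError("unknown decision rule: " + rule)

# Example: 9 consents and 1 objection passes U-1 but not strict unanimity.
votes = [True] * 9 + [False]
print(decision_passes(votes, "unanimity"))            # False
print(decision_passes(votes, "unanimity_minus_one"))  # True
print(decision_passes(votes, "supermajority", 0.75))  # True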
Consensus blocking and other forms of dissent
Groups that require unanimity allow individual participants the option of blocking a group decision. This provision
motivates a group to make sure that all group members consent to any new proposal before it is adopted. Proper
guidelines for the use of this option, however, are important. The ethics of consensus decision making encourage
participants to place the good of the whole group above their own individual preferences. When there is potential for
a group decision to be blocked, both the group and any dissenters in the group are encouraged to collaborate until
agreement can be reached. Simply vetoing a decision is not considered a responsible use of consensus blocking.
Some common guidelines for the use of consensus blocking include:

• Limiting the option to block consensus to issues that are fundamental to the group’s mission or potentially
disastrous to the group.
• Providing an option for those who do not support a proposal to “stand aside” rather than block.
• Requiring two or more people to block for a proposal to be put aside.
• Requiring the blocking party to supply an alternative proposal or a process for generating one.
• Limiting each person’s option to block consensus to a handful of times in one’s life.
Dissent options
When a participant does not support a proposal, he does not necessarily need to block it. When a call for consensus
on a motion is made, a dissenting delegate has one of three options:
• Declare reservations: Group members who are willing to let a motion pass but desire to register their concerns
with the group may choose "declare reservations." If there are significant reservations about a motion, the
decision-making body may choose to modify or re-word the proposal.
• Stand aside: A "stand aside" may be registered by a group member who has a "serious personal disagreement"
with a proposal, but is willing to let the motion pass. Although a stand aside does not halt a motion, it is often
regarded as a strong "nay vote", and the concerns of group members standing aside are usually addressed by
modifications to the proposal. Stand asides may also be registered by members who feel incapable of
adequately understanding or participating in the proposal.


• Block: Any group member may "block" a proposal. In most models, a single block is sufficient to stop a proposal,
although some measures of consensus may require more than one block (see the section "Near-unanimous
consensus" below). Blocks are generally considered to be an extreme measure, only used when a member
feels a proposal "endanger[s] the organization or its participants, or violate[s] the mission of the organization"
(i.e., a principled objection). In some consensus models, a group member opposing a proposal must work with its
proponents to find a solution that will work for everyone.

Agreement vs. consent
Unanimity is achieved when the full group consents to a decision. Giving consent does not necessarily mean that the
proposal being considered is one’s first choice. Group members can vote their consent to a proposal because they
choose to cooperate with the direction of the group, rather than insist on their personal preference. Sometimes the
vote on a proposal is framed, “Is this proposal something you can live with?” This relaxed threshold for a yes vote
can help make unanimity more easily achievable.
Another method to achieve unanimity is by using a special kind of voting process under which all members of the
group have a strategic incentive to agree rather than block.
There are multiple stepwise models of how to make decisions by consensus. They vary in the amount of detail the
steps describe. They also vary depending on how decisions are finalized. The basic model involves
• collaboratively generating a proposal,
• identifying unsatisfied concerns, and then
• modifying the proposal to generate as much agreement as possible.
After a concerted attempt at generating full agreement, the group can then apply its final decision rule to determine if
the existing level of agreement is sufficient to finalize a decision.
Consensus decision-making with consensus blocking
[Figure: flowchart of the basic consensus decision-making process.]
Groups that require unanimity commonly use a core set
of procedures depicted in this flow chart.


Once an agenda for discussion has been set and,
optionally, the ground rules for the meeting have been
agreed upon, each item of the agenda is addressed in
turn. Typically, each decision arising from an agenda
item follows a simple structure (sketched in code after the list):
• Discussion of the item: The item is discussed with
the goal of identifying opinions and information on
the topic at hand. The general direction of the group
and potential proposals for action are often
identified during the discussion.
• Formation of a proposal: Based on the discussion
a formal decision proposal on the issue is presented
to the group.
• Call for consensus: The facilitator of the
decision-making body calls for consensus on the
proposal. Each member of the group usually must
actively state their agreement with the proposal,
often by using a hand gesture or raising a colored
card, to avoid the group interpreting silence or
inaction as agreement.
• Identification and addressing of concerns: If consensus is not achieved, each dissenter presents his or her
concerns on the proposal, potentially starting another round of discussion to address or clarify the concern.
• Modification of the proposal: The proposal is amended, re-phrased or given riders in an attempt to address the
concerns of the decision-makers. The process then returns to the call for consensus and the cycle is repeated until
a satisfactory decision is made.
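A minimal sketch of this loop is shown below, assuming a simple representation of proposals and concerns. The function names, the stand-in callbacks and the fixed round limit are illustrative assumptions, not part of any group's formal procedure.

# Illustrative sketch of the discussion / proposal / call-for-consensus cycle above.
# collect_concerns() and revise() stand in for real group discussion; they are
# hypothetical placeholders, not an actual facilitation API.
def run_consensus_item(proposal, collect_concerns, revise, max_rounds=10):
    """Repeat the call for consensus until no concerns remain or rounds run out."""
    for round_number in range(1, max_rounds + 1):
        concerns = collect_concerns(proposal)      # identification of concerns
        if not concerns:                           # call for consensus succeeds
            return proposal, round_number
        proposal = revise(proposal, concerns)      # modification of the proposal
    return None, max_rounds                        # fall back to the group's decision rule

# Example with toy stand-ins for the discussion steps:
concerns_by_text = {"meet weekly": ["too often"], "meet fortnightly": []}
result = run_consensus_item(
    "meet weekly",
    collect_concerns=lambda p: concerns_by_text.get(p, []),
    revise=lambda p, c: "meet fortnightly",
)
print(result)  # ('meet fortnightly', 2)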
The consensus decision-making process often has several roles which are designed to make the process run more
effectively. Although the name and nature of these roles varies from group to group, the most common are the
facilitator, a timekeeper, an empath and a secretary or notes taker. Not all decision-making bodies use all of these
roles, although the facilitator position is almost always filled, and some groups use supplementary roles, such as a
Devil's advocate or greeter. Some decision-making bodies opt to rotate these roles through the group members in
order to build the experience and skills of the participants, and prevent any perceived concentration of power.
The common roles in a consensus meeting are:
• Facilitator: As the name implies, the role of the facilitator is to help make the process of reaching a consensus
decision easier. Facilitators accept responsibility for moving through the agenda on time; ensuring the group
adheres to the mutually agreed-upon mechanics of the consensus process; and, if necessary, suggesting alternate
or additional discussion or decision-making techniques, such as go-arounds, break-out groups or role-playing.
Some consensus groups use two co-facilitators. Shared facilitation is often adopted to diffuse the perceived
power of the facilitator and create a system whereby a co-facilitator can pass off facilitation duties if he or she
becomes more personally engaged in a debate.
• Timekeeper: The purpose of the timekeeper is to ensure the decision-making body keeps to the schedule set in
the agenda. Effective timekeepers use a variety of techniques to ensure the meeting runs on time including: giving
frequent time updates, ample warning of short time, and keeping individual speakers from taking an excessive
amount of time.
• Empath or 'Vibe Watch': The empath, or 'vibe watch' as the position is sometimes called, is charged with
monitoring the 'emotional climate' of the meeting, taking note of the body language and other non-verbal cues of
the participants. Defusing potential emotional conflicts, maintaining a climate free of intimidation and being
aware of potentially destructive power dynamics, such as sexism or racism within the decision-making body, are
the primary responsibilities of the empath.
• Note taker: The role of the notes taker or secretary is to document the decisions, discussion and action points of
the decision-making body.
Near-unanimous consensus
Healthy consensus decision-making processes usually encourage dissent and surface it early, maximizing the chance of
accommodating the views of all minorities. Since unanimity may be difficult to achieve, especially in large groups,
or may be the result of coercion, fear, undue persuasive power or eloquence, inability to comprehend
alternatives, or plain impatience with the process of debate, consensus decision-making bodies may use an
alternative benchmark of consensus. These include the following:
• Unanimity minus one (or U−1), requires all delegates but one to support the decision. The individual dissenter
cannot block the decision although he or she may be able to prolong debate (e.g. via a filibuster). The dissenter
may be the ongoing monitor of the implications of the decision, and their opinion of the outcome of the decision
may be solicited at some future time. Betting markets in particular rely on the input of such lone dissenters. A
lone bettor against the odds profits when his or her prediction of the outcomes proves to be better than that of the
majority. This disciplines the market's odds.
• Unanimity minus two (or U−2), does not permit two individual delegates to block a decision and tends to curtail
debate with a lone dissenter more quickly. Dissenting pairs can present alternate views of what is wrong with the
decision under consideration. Pairs of delegates can be empowered to find the common ground that will enable
them to convince a third, decision-blocking, decision-maker to join them. If the pair are unable to convince a third
party to join them, typically within a set time, their arguments are deemed to be unconvincing.
• Unanimity minus three, (or U−3), and other such systems recognize the ability of four or more delegates to
actively block a decision. U−3 and lesser degrees of unanimity are usually lumped in with statistical measures of
agreement, such as: 80%, mean plus one sigma, two-thirds, or majority levels of agreement. Such measures
usually do not fit within the definition of consensus.
• Rough Consensus is a process with no specific rule for "how much is enough." Rather, the question of consensus
is left to the judgment of the group chair (an example is the IETF working group, discussed below). While this
makes it more difficult for a small number of disruptors to block a decision, it puts increased responsibility on the
chair, and may lead to divisive debates about whether rough consensus has in fact been correctly identified.
Historical examples
Perhaps the oldest example of consensus decision-making is the Iroquois Confederacy Grand Council, or
Haudenosaunee, who have traditionally used consensus in decision-making, using a 75% supermajority to finalize
decisions, potentially as early as 1142.
Examples of consensus decision-making can likely be found among
many indigenous peoples, such as the African Bushmen.
Although the modern popularity of consensus
decision-making in Western society dates from the women's liberation movement
and anti-nuclear movement
of the 1970s, the origins of formal consensus can be traced significantly farther back.
Anthropologically, an early practical example from Babylon appeared during a massive awakening amongst the
tribe of Abraham, which decided to unite around the principle of Mutual Guarantee ("Arvut" in Hebrew). First,
Abraham allowed their cooperative self-organization to form, and then he taught them the quality of Mercy
("Hesed" in Hebrew) and how to unite by having each member openly express, of their own free will, their desire
and intention to accept the rule of "Arvut" (mutual guarantee). The only required commitment (accepted without
force) of each member was to put the collective's desires before their own self-interest; only out of this sum of
agreements between all participants would the guarantee itself emerge and thereby promote the well-being of the
whole group.
The most notable of early Western consensus practitioners are the Religious Society of Friends, or Quakers, who
adopted the technique as early as the 17th century. The Anabaptists, including the Mennonites, also have a history of
using consensus decision-making, and some believe Anabaptists practiced consensus as early as the Martyrs' Synod
of 1527. Some Christians trace consensus decision-making back to the Bible; the Global Anabaptist Mennonite
Encyclopedia references, in particular, Acts 15 as an example of consensus in the New Testament.
Specific models
Quaker model
Quaker-based consensus
is effective because it puts in place a simple, time-tested structure that moves a group
towards unity. The Quaker model has been employed in a variety of secular settings. The process allows for
individual voices to be heard while providing a mechanism for dealing with disagreements.

The following aspects of the Quaker model, in an adaptation prepared by Earlham College, can be effectively
applied in any consensus decision-making process:
• Multiple concerns and information are shared until the sense of the group is clear.
• Discussion involves active listening and sharing information.
• Norms limit the number of times one asks to speak, to ensure that each speaker is fully heard.
• Ideas and solutions belong to the group; no names are recorded.
• Differences are resolved by discussion. The facilitator ("clerk" or "convenor" in the Quaker model) identifies
areas of agreement and names disagreements to push discussion deeper.
• The facilitator articulates the sense of the discussion, asks if there are other concerns, and proposes a "minute" of
the decision.
• The group as a whole is responsible for the decision and the decision belongs to the group.
• The facilitator can discern if one who is not uniting with the decision is acting without concern for the group or in
selfish interest.
• Dissenters' perspectives are embraced.
Key components of Quaker-based consensus include a belief in a common humanity and the ability to decide
together. The goal is "unity, not unanimity." Ensuring that group members speak only once until others are heard
encourages a diversity of thought. The facilitator is understood as serving the group rather than acting as
person-in-charge. In the Quaker model, as with other consensus decision-making processes, by articulating the
emerging consensus, members can be clear on the decision, and, as their views have been taken into account, will be
likely to support it.
CODM Model
The Consensus-Oriented Decision-Making
model offers a detailed step-wise description of the consensus process. It
can be used with any type of decision rule. It outlines the process of how proposals can be collaboratively built with
full participation of all stakeholders. This model allows groups to be flexible enough to make decisions when they
need to, while still following a format that is based on the primary values of consensus decision making. The CODM
steps include:
1. Framing the topic
2. Open Discussion
3. Identifying Underlying Concerns
4. Collaborative Proposal Building
5. Choosing a Direction
6. Synthesizing a Final Proposal
7. Closure
Japanese companies normally use consensus decision making, meaning that everyone in the company is consulted on
each decision. A ringi-sho is a circulation document used to obtain agreement. It must first be signed by the lowest
level manager, and then upwards, and may need to be revised and the process started over.
IETF rough consensus model
In the Internet Engineering Task Force (IETF), decisions are assumed to be taken by "rough consensus".
IETF has studiously refrained from defining a mechanical method for verifying such consensus, apparently in the
belief that any such codification will lead to attempts to "game the system." Instead, a working group (WG) chair or
BoF chair is supposed to articulate the "sense of the group."
One tradition in support of rough consensus is the tradition of humming rather than (countable) hand-raising; this
allows a group to quickly tell the difference between "one or two objectors" or a "sharply divided community",
without making it easy to slip into "majority rule".
Much of the business of the IETF is carried out on mailing lists, where all parties can speak their view at all times.
Other modern examples
The ISO process for adopting new standards is called consensus-based decision making,
even though in practice,
it is a complex voting process with significant supermajorities needed for agreement.
Overlaps with deliberative methods
Consensus decision-making models overlap significantly with deliberative methods, which are processes for
structuring discussion that may or may not be a lead-in to a decision.
Tools and methods
Colored cards
Some consensus decision-making bodies use a system of colored cards to speed up and ease the consensus process.
Most often, each member is given a set of three colored cards: red, yellow and green. The cards can be raised during
the process to indicate the member's input. Cards can be used during the discussion phase as well as during a call for
consensus. The cards have different meanings depending on the phase in which they are used.

The meanings of the colors are:
• Red: During discussion, a red card is used to indicate a point of process or a breach of the agreed upon
procedures. Identifying offtopic discussions, speakers going over allowed time limits or other breaks in the
process are uses for the red card. During a call for consensus, the red card indicates the member's opposition
(usually a "principled objection") to the proposal at hand. When a member, or members, use a red card, it
becomes their responsibility to work with the proposing committee to come up with a solution that will work for everyone.
• Yellow: In the discussion phase, the yellow card is used to indicate a member's ability to clarify a point being
discussed or answer a question being posed. Yellow is used during a call for consensus to register a stand aside to
the proposal or to formally state any reservations.
• Green: A group member can use a green card during discussion to be added to the speakers list. During a call for
consensus, the green card indicates consent.
Some decision-making bodies use a modified version of the colored card system with additional colors, such as
orange to indicate a non-blocking reservation stronger than a stand-aside.
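The phase-dependent meaning of the cards can be captured in a small lookup table. The sketch below is illustrative only; the dictionary layout and the wording of the meanings are assumptions that paraphrase the list above, including the optional orange card.

# Illustrative lookup of card meanings by phase; paraphrased from the list above.
CARD_MEANINGS = {
    "discussion": {
        "red": "point of process / breach of agreed procedure",
        "yellow": "can clarify a point or answer a question",
        "green": "request to be added to the speakers list",
    },
    "call_for_consensus": {
        "red": "principled objection (block)",
        "yellow": "stand aside or formal reservation",
        "green": "consent",
        "orange": "non-blocking reservation stronger than a stand aside",  # optional extra color
    },
}

def card_meaning(phase, color):
    return CARD_MEANINGS.get(phase, {}).get(color, "unknown card/phase")

print(card_meaning("call_for_consensus", "red"))  # principled objection (block)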
Hand signals
Hand signals are often used by consensus decision-making bodies as a way for group members to nonverbally
indicate their opinions or positions. Although the nature and meaning of individual gestures varies from group to
group, there is a widely-adopted core set of hand signals. These include: wiggling of the fingers on both hands, a
gesture sometimes referred to as "twinkling", to indicate agreement; raising a fist or crossing both forearms with
hands in fists to indicate a block or strong disagreement; and making a "T" shape with both hands, the "time out"
gesture, to call attention to a point of process or order.


One common set of hand signals is called the
"Fist-to-Five" or "Fist-of-Five". In this method each member of the group can hold up a fist to indicate blocking
consensus, one finger to suggest changes, two fingers to discuss minor issues, three fingers to indicate willingness to
let issue pass without further discussion, four fingers to affirm the decision as a good idea, and five fingers to
volunteer to take a lead in implementing the decision.
Another common set of hand signals used is the "Thumbs" method, where Thumbs Up = agreement; Thumbs
Sideways = have concerns but won't block consensus; and Thumbs Down = I don't agree and I won't accept this
proposal. This method is also useful for "straw polls" to take a quick reading of the group's overall sentiment for the
active proposal.
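As a rough illustration, a Fist-to-Five straw poll can be summarized programmatically. The sketch below simply tallies responses and flags any fist (zero fingers) as a block, following the meanings described above; the function name and the "all at least three fingers" cut-off are assumptions for demonstration, not a standard rule.

# Illustrative tally of a Fist-to-Five straw poll (0 = fist/block, 5 = will lead implementation).
def summarize_fist_to_five(responses):
    """responses: list of integers 0-5, one per group member."""
    if any(r == 0 for r in responses):
        return "blocked: at least one member raised a fist"
    if all(r >= 3 for r in responses):
        return "consensus: everyone is willing to let the decision pass"
    return "more discussion needed: some members suggest changes (1) or see minor issues (2)"

print(summarize_fist_to_five([5, 4, 3, 3]))   # consensus
print(summarize_fist_to_five([4, 2, 3]))      # more discussion needed
print(summarize_fist_to_five([5, 0, 4]))      # blocked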
Dotmocracy sheets
[Figure: a completed Dotmocracy sheet.]
Dotmocracy sheets are designed to complement a consensus
decision-making process by providing a simple way to visibly
document levels of agreement among participants on a large variety of
ideas. Participants write down ideas on paper forms called
Dotmocracy sheets and fill in one dot per sheet to record their opinion
of each idea on a scale of “strong agreement”, “agreement”, “neutral”,
“disagreement”, “strong disagreement” or “confusion”. Participants sign
each sheet they dot and may add brief comments. The result is a
graph-like visual representation of the group's collective opinions on
each idea.
The Step-by-Step Process and Rules defined in the Dotmocracy Handbook
reinforce consensus decision-making
by promoting equal opportunity, open discussion, the drafting of many proposals, the identification of concerns and
the encouragement of idea modification.
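The graph-like summary that a set of completed sheets produces can be sketched as a simple tally per idea. The data layout below is an assumption for illustration only, not the format prescribed by the Dotmocracy Handbook; the scale labels follow the sheet description above.

from collections import Counter

# Illustrative tally of dots per idea; the scale labels follow the sheet description above.
SCALE = ["strong agreement", "agreement", "neutral",
         "disagreement", "strong disagreement", "confusion"]

def tally_sheets(dots_by_idea):
    """dots_by_idea: dict mapping an idea to a list of participants' dot placements."""
    summary = {}
    for idea, dots in dots_by_idea.items():
        counts = Counter(dots)
        summary[idea] = {label: counts.get(label, 0) for label in SCALE}
    return summary

example = {"longer lunch break": ["agreement", "strong agreement", "neutral", "agreement"]}
print(tally_sheets(example))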
Fall-back methods
Sometimes a common form of voting, such as first-past-the-post, is used as a fall-back method when consensus
cannot be reached within a given time frame.
However, if the potential outcome of the fall-back method can be
anticipated, then those who support that outcome have incentives to block consensus so that the fall-back method
gets applied. Special fall-back methods have been developed that reduce this incentive.
The idea of consensus in the abstract
This section focuses strictly on the idea of consensus in the abstract, not on the implications of consensus for politics
or economics, where follow-up action is required.
Consensus as collective thought
A close equivalent phrase might be the "collective agreement" of a group, keeping in mind that a high degree of
variation is still possible among individuals, and certainly if there must be individual commitment to follow up the
decision with action, this variation remains important. There is considerable debate and research into both collective
intelligence and consensus decision-making.
Consensus usually involves collaboration, rather than compromise. Instead of one opinion being adopted by a
plurality, stakeholders are brought together (often with facilitation) until a convergent decision is developed. If this is
done in a purely mechanical way it can result in simple trading—we'll sacrifice this if you'll sacrifice that. Genuine
consensus typically requires more focus on developing the relationships among stakeholders, so that they work
together to achieve agreements based on willing consent.
Abstract models of consensus
The most common and most successful model of consensus is called the prisoner's dilemma. An introduction and
discussion of this concept can be found in any contemporary introduction to political science. This approach might
be called "algebraic" as opposed to analytic, within mathematics, because it represents an agent by a symbol and then
examines the algebraic properties of that symbol. For example, the question, "Can two agents be combined to make a
new agent?" sounds like an algebraic question. (More formally, "is the operation of consensus closed in the domain
of agents? Is there a larger domain of "abstract agents" in which this operation is closed?")
In a more analytic style, we might naively start by envisioning the distribution of opinions in a population as a
Gaussian distribution in one parameter. We would then say that the initial step in a consensus process would be the
written or spoken synthesis that represents the range of opinions within perhaps three standard deviations of the
mean opinion. Other standards are possible, e.g. two standard deviations, or one, or a unanimity minus a certain
tolerable number of dissenters. The following steps then operate both to check understanding of the different
opinions (parameter values), and then to find new parameters in the multi-dimensional parameter space of all
possible decisions, through which the consensus failure in one-dimensional parameter space can be replaced by a
solution in multi-dimensional parameter space.
An alternative, qualitative, mathematical description is to say that there is an iterative process through
(m+n)-dimensional parameter space, starting from initial guesses at a solution in (m)-dimensional parameter space,
which tries to converge to find a common solution in (m+n)-dimensional parameter space.
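Under the naive one-parameter Gaussian picture described above, the initial synthesis step amounts to computing the interval of the mean opinion plus or minus k standard deviations and checking which opinions fall inside it. The sketch below assumes opinions are already encoded as numbers on a single axis, which is itself a strong simplification.

import statistics

# Illustrative version of the one-dimensional synthesis step described above:
# opinions are modeled as points on a single axis, and the synthesis covers
# those within k standard deviations of the mean.
def synthesis_range(opinions, k=3):
    mean = statistics.mean(opinions)
    sd = statistics.pstdev(opinions)
    low, high = mean - k * sd, mean + k * sd
    covered = [x for x in opinions if low <= x <= high]
    return (low, high), covered

opinions = [2.0, 2.5, 3.0, 3.5, 9.0]   # one outlying position
bounds, covered = synthesis_range(opinions, k=1)
print(bounds, covered)  # with k=1 the outlier falls outside the synthesis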
A criticism of such modeling is that the opinions or agreements are only theoretical, and that the strength or degree
of conviction as measured is not closely correlated to the willingness of any given individual to take action. In direct
action politics, the consensus is constantly tested by asking those who agree to immediately place their own bodies
'on the line' and in harm's way, to actually demonstrate that they are committed to a consensus. The ecology
movement, peace movement, and labor movement have historically required such demonstrations of commitment.
Some have disdained any attempt at formal models or methods, but others have prepared extensive documentation
on both formal and informal consensus decision-making processes.
Typically, the usefulness of formal models of consensus is confined to cases where follow up action is closely and
centrally controlled, e.g. in a military hierarchy or a set of similar computer programs executing on hardware that it
completely controls. The idea of consensus itself is probably quite different when considering action by a group of
independent human agents, or considering action by those taking orders and committed to executing them all without
question, or suffering great harm or exile for any disobedience.
Consensus upon a particular formal model of consensus can lead to groupthink, by making it harder for those who
reject that formal model (and using informal or different models) to be heard. This recursion suggests the extreme
complexity of reasoning about consensus in a political context. An example is the peace movement's objection to the
game theory logic of mutual assured destruction during the Cold War. Peace activists, objecting to military goals and
spending, found the formal models of the military to be major obstacles. As they had not mastered game theory
models, they simply were not heard.
In democracy
As this example suggests, the concept of consensus is a particularly important one in the context of society and
government, and forms a cornerstone of the concept of democracy. Democracy, in its most essential form, direct
democracy, has been criticized by a significant number of scholars since the time of Plato as well as adherents to
strict republican principles, and is sometimes referred to as the "tyranny of the majority", with the implication that
one faction of the society is dominating other factions, possibly repressively.
Others, however, argue that if the democracy adheres to principles of consensus, becoming a deliberative democracy,
then party or factional dominance can be minimized and decisions will be more representative of the entire society.
This too is discussed in depth in the article on consensus decision-making, with many actual examples of the
tradeoffs and different tests for consensus used in actual societies and polities.
A major cornerstone of the Westminster System is Cabinet government. All Cabinet decisions are consensual,
collective and inclusive; a vote is never taken in a Cabinet meeting. All ministers, whether senior and in the Cabinet
or junior ministers, must support the policy of the government publicly regardless of any private reservations. If a
minister does not agree with a decision, he or she may resign from the government, as did several British ministers
over the 2003 invasion of Iraq. This means that in the Westminster system of government the Cabinet always
decides collectively, and all ministers are responsible for arguing in favour of any decision made by the Cabinet.
See also: Criticisms of consensus decision-making.
Examples within computing
Within the Internet Engineering Task Force (IETF), the concept of "rough consensus and running code" is the basis
for the standardization process. It has proven extremely effective for standardizing protocols for inter-computer
communication, particularly during its early years.
In computer science, consensus is a distributed computing problem in which a group of nodes must reach agreement
on a single value. Achieving consensus is a challenging problem in distributed systems, particularly as the number of
nodes grows or the reliability of links between nodes decreases.
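As a toy illustration of the problem statement only (not of a real fault-tolerant protocol such as Paxos or Raft), the sketch below has a set of reliable, fully connected nodes see the same proposals and apply the same deterministic choice rule, so they all decide the same value; handling crashed nodes and lost messages is precisely what makes real consensus algorithms hard.

# Toy illustration of the distributed consensus problem: every node sees the same
# set of proposals and applies the same deterministic choice rule, so all nodes
# decide the same value. Real protocols must achieve this despite failures.
from collections import Counter

def decide(proposals):
    # Deterministic rule: pick the most frequent proposal, breaking ties by value.
    counts = Counter(proposals.values())
    best = max(counts.items(), key=lambda kv: (kv[1], kv[0]))
    return best[0]

proposals = {"node-a": "blue", "node-b": "green", "node-c": "blue"}
decisions = {node: decide(proposals) for node in proposals}
print(decisions)  # every node decides 'blue'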
"Consensus" may also refer to the Consensus theorems in Boolean algebra.
Examples of non-consensus
The peer review process in most scientific journals does not use a consensus based process. Referees submit their
opinions individually and there is not a strong effort to reach a group opinion.
Consensus blocking
Critics of consensus blocking often observe that the option, while potentially effective for small groups of motivated
or trained individuals with a sufficiently high degree of affinity, has a number of possible shortcomings, notably
• Preservation of the Status quo: In decision-making bodies that use formal consensus, the ability of individuals
or small minorities to block agreement gives an enormous advantage to anyone who supports the existing state of
affairs. This can mean that a specific state of affairs can continue to exist in an organization long after a majority
of members would like it to change.
The incentive to block can however be removed by using a special kind of
voting process.
• Susceptibility to widespread disagreement: Giving the right to block proposals to all group members may result
in the group becoming hostage to an inflexible minority or individual. When a popular proposal is blocked the
group actually experiences widespread disagreement, the opposite of the consensus process's goal. Furthermore,
"opposing such obstructive behavior [can be] construed as an attack on freedom of speech and in turn [harden]
resolve on the part of the individual to defend his or her position."
As a result, consensus decision-making has
the potential to reward the least accommodating group members while punishing the most accommodating.
• Abilene paradox: Consensus decision-making is susceptible to all forms of groupthink, the most dramatic being
the Abilene paradox. In the Abilene paradox, a group can unanimously agree on a course of action that no
individual member of the group desires because no one individual is willing to go against the perceived will of the
decision-making body.
• Time Consuming: Since consensus decision-making focuses on discussion and seeks the input of all participants,
it can be a time-consuming process. This is a potential liability in situations where decisions need to be made
speedily or where it is not possible to canvass the opinions of all delegates in a reasonable period of time.
Additionally, the time commitment required to engage in the consensus decision-making process can sometimes
act as a barrier to participation for individuals unable or unwilling to make the commitment.
However, once a
decision has been reached, it can often be acted on more quickly than a decision handed down from above. American
businessmen complained that in negotiations with a Japanese company they had to discuss the idea with everyone,
even the janitor, yet once a decision was made the Americans found the Japanese were able to act much more quickly
because everyone was on board, while the Americans had to struggle with internal opposition.
Majority voting processes
Proponents of consensus decision-making view procedures that use majority rule as undesirable for several reasons.
Majority voting is regarded as competitive, rather than cooperative, framing decision-making in a win/lose
dichotomy that ignores the possibility of compromise or other mutually beneficial solutions.
Carlos Santiago
Nino, on the other hand, has argued that majority rule leads to better deliberation practice than the alternatives,
because it requires each member of the group to make arguments that appeal to at least half the participants.
Lijphart reaches the same conclusion about majority rule, noting that majority rule encourages coalition-building.
Additionally, proponents of consensus argue that majority rule can lead to a 'tyranny of the majority'. Voting
theorists argue that majority rule may actually prevent tyranny of the majority, in part because it maximizes the
potential for a minority to form a coalition that can overturn an unsatisfactory decision.
Advocates of consensus would assert that a majority decision reduces the commitment of each individual
decision-maker to the decision. Members of a minority position may feel less commitment to a majority decision,
and even majority voters who may have taken their positions along party or bloc lines may have a sense of reduced
responsibility for the ultimate decision. The result of this reduced commitment, according to many consensus
proponents, is potentially less willingness to defend or act upon the decision.
[1] http:/ / www. merriam-webster.com/ dictionary/ consensus
[2] Joseph Michael Reagle, Jr.; Lawrence Lessig (30 September 2010). Good Faith Collaboration: The Culture of Wikipedia (http:// books.
google. com/ books?id=ml7SlTq8XvIC& pg=PA100). MIT Press. p. 100. ISBN 9780262014472. . Retrieved 10 June 2011.
[3] http:/ / www. consensusdecisionmaking. org/
[4] Hartnett, T. (2011). Consensus-Oriented Decision Making. Gabriola Island, BC, Canada:New Society Publishers.
[5] Rob Sandelin. "Consensus Basics, Ingredients of successful consensus process" (http:/ / www.ic.org/nica/ Process/ Consensusbasics.
htm#Ingredients). Northwest Intentional Communities Association guide to consensus. Northwest Intentional Communities Association. .
Retrieved 2007-01-17.
[6] http:/ / www. groupfacilitation.net/ Articles%20on%20Meeting%20Facilitation.html
[7] Kaner, S. (2011). Facilitator's Guide to Participatory Decision-making. San Francisco, CA:Jossey-Bass.
[8] Christian, D. Creating a Life Together: Practical Tools to Grow Ecovillages and Intentional Communities. (2003). Gabriola Island, BC,
Canada:New Society Publishers.
[9] Richard Bruneau (2003). "If Agreement Cannot Be Reached" (http:// web.archive.org/web/ 20070927025409/ http:/ / www.augustana. ca/
rdx/ bruneau/ documents/ PDM+ in+an+ Intercultural+context.doc) (DOC). Participatory Decision-Making in a Cross-Cultural Context.
Canada World Youth. p. 37. Archived from the original (http:/ / www.augustana. ca/ rdx/bruneau/ documents/ PDM in an Intercultural
context. doc) on September 27, 2007. . Retrieved 2007-01-17.
[10] Consensus Development Project (1998). "FRONTIER: A New Definition" (http:// www.frontierus.org/ documents/ consensus. htm).
Frontier Education Center. . Retrieved 2007-01-17.
[11] Rachel Williams; Andrew McLeod (2006). "Introduction to Consensus Decision Making" (http:/ / www. nwcdc.coop/ Resources/ CSS/
CSSIntro2Consensus. pdf) (PDF). Cooperative Starter Series. Northwest Cooperative Development Center. . Retrieved 2007-01-17.
[12] Dorcas; Ellyntari (2004). "Amazing Graces' Guide to Consensus Process" (http:// www.webofoz.org/consensus. shtml). . Retrieved
[14] "The Consensus Decision Process in Cohousing" (http:/ / www.cohousing.ca/ consensus. htm). Canadian Cohousing Network. . Retrieved
[15] Heitzig J, Simmons FW (2010). Some Chance For Consensus (http:/ / dx. doi. org/10. 1007/ s00355-010-0517-y) Soc Choice Welf 35.
[16] C.T. Lawrence Butler; Amy Rothstein. "On Conflict and Consensus" (http:/ / www.consensus. net/ ocac2.htm). Food Not Bombs
Publishing. . Retrieved 2007-01-17.
[17] "What is Consensus?" (http:/ / web. archive. org/web/ 20061015105352/ http:/ / www. thecommonplace.org.uk/ information.
php?page=articles&iID=4). The Common Place. 2005. Archived from the original (http:// www.thecommonplace.org. uk/ information.
php?page=articles& iID=4) on October 15, 2006. . Retrieved 2007-01-17.
[18] "The Process" (http:// seedsforchange. org. uk/ free/consens#proc). Consensus Decision Making. Seeds for Change. 2005-12-01. .
Retrieved 2007-01-17.
[19] Sheila Kerrigan (2004). "How To Use a Consensus Process To Make Decisions" (http:// www.communityarts. net/ readingroom/
archivefiles/2004/09/ how_to_use_a_co. php). Community Arts Network. . Retrieved 2007-01-17.
[20] Lori Waller. "Guides: Meeting Facilitation" (http:/ / www. otesha.ca/ bike+ tours/ guides/ meeting+ facilitation.en.
html#toc_putting_on_your_facilitator_hat). The Otesha Project. . Retrieved 2007-01-17.
[21] Berit Lakey (1975). "Meeting Facilitation – The No-Magic Method" (http:// www.reclaiming.org/ resources/ consensus/ blakey. html).
Network Service Collaboration. . Retrieved 2007-01-17.
[22] "How Does the Grand Council Work?" (http:/ / sixnations. buffnet.net/ Great_Law_of_Peace/?article=how_does_grand_council_work).
Great Law of Peace. . Retrieved 2007-01-17.
[23] M. Paul Keesler (2004). "League of the Iroquois" (http:// www. paulkeeslerbooks.com/ Chap5Iroquois. html). Mohawk – Discovering the
Valley of the Crystals. . Retrieved 2007-01-18.
[24] Bruce E. Johansen (1995). "Dating the Iroquois Confederacy" (http:// www.ratical.org/many_worlds/ 6Nations/ DatingIC.html).
Akwesasne Notes. . Retrieved 2007-01-17.
[25] United Nations (2002). "Consensus Tradition can Contribute to Conflict Resolution, Secretary-General Says in Indigenous People's Day
Message" (http:/ / www. un. org/News/ Press/ docs/ 2002/ sgsm8332. doc.htm). Press release. . Retrieved 2007-01-17.
[26] David Graeber; Andrej Grubacic (2004). "Anarchism, Or The Revolutionary Movement Of The Twenty-first Century" (http:/ / www.zmag.
org/ content/ showarticle. cfm?ItemID=4796). ZNet. . Retrieved 2007-01-17.
[27] Sanderson Beck (2003). "Anti-Nuclear Protests" (http:/ / san. beck.org/GPJ29-AntiNuclearProtests.html). Sanderson Beck. . Retrieved
[28] Ethan Mitchell (2006). "Participation in Unanimous Decision-Making: The New England Monthly Meetings of Friends" (http:/ / www.
philica.com/ display_article. php?article_id=14). Philica. . Retrieved 2007-01-17.
[29] "Error: no |title= specified when using {{[[Template:Cite web|Cite web (http:/ / www.aecm.be/ en/ guarantee-societies.
html?IDC=34)]}}"]. .
[30] "Error: no |title= specified when using {{[[Template:Cite web|Cite web (http:/ / books. google.com/ books?id=n3SF58ibuSgC&
lpg=PA251&ots=PP_gXfrngs&dq=mutual guarantee arvut&pg=PA251#v=onepage&q=mutual guarantee arvut&f=false)]}}"]. .
[31] Abe J. Dueck (1990). "Church Leadership: A Historical Perspective" (http:// www.directionjournal.org/article/ ?676). Direction. .
Retrieved 2007-01-17.
[32] Ralph A Lebold (1989). "Consensus" (http:// web. archive.org/web/ 20070313044601/http:/ / www.gameo. org/ index. asp?content=http:/
/www. gameo.org/encyclopedia/ contents/ C6667ME. html). Global Anabaptist Mennonite Encyclopedia Online. Global Anabaptist
Mennonite Encyclopedia Online. Archived from the original (http:/ / www.gameo.org/ index. asp?content=http:/ / www.gameo.org/
encyclopedia/contents/ C6667ME. html) on March 13, 2007. . Retrieved 2007-01-17.
[33] Quaker Foundations of Leadership (1999). A Comparison of Quaker-based Consensus and Robert's Rules of Order. (http:/ / www.earlham.
edu/ ~consense/ rrocomp.shtml) Richmond, Indiana: Earlham College. Retrieved on 2009-03-01.
[34] Woodrow, P. (1999). "Building Consensus Among Multiple Parties: The Experience of the Grand Canyon Visibility Transport
Commission." (http:/ / www.earlham.edu/ ~consense/ peterw.shtml) Kellogg-Earlham Program in Quaker Foundations of Leadership.
Retrieved on 2009-03-01.
[35] Berry, F. and M. Snyder (1999). "Notes prepared for Round table: Teaching Consensus-building in the Classroom." (http:/ / www.earlham.
edu/ ~consense/ pateach. shtml) National Conference on Teaching Public Administration, Colorado Springs, Colorado, March 1998. Retrieved
on 2009-03-01.
[36] Quaker Foundations of Leadership (1999). "Our Distinctive Approach (http:/ / www.earlham.edu/ ~consense/ distfea.shtml). Richmond,
Indiana: Earlham College. Retrieved on 2009-03-01.
[37] Maine.gov. What is a Consensus Process? (http:/ / www.maine.gov/ consensus/ ppcm_consensus_home. htm) State of Maine Best
Practices. Retrieved on: 2009-03-01.
[38] "Consensus-Oriented Decision-Making: The CODM Model for Facilitating Groups to Widespread Agreement" (http://www.consensusbook.com/)
[39] Ringi-Sho (http:// www. japanese123. com/ ringisho. htm)
[40] RFC 2418. "IETF Working Group Guidelines and Procedures."
[41] "The Tao of IETF: A Novice's Guide to the Internet Engineering Task Force" (http:// www. ietf.org/tao.html). The Internet Society. 2006.
. Retrieved 2007-01-17.
[42] International Organization for Standardization (September 28, 2000) Report of the ISO Secretary-General to the ISO General Assembly
(http:// www. iso. org/iso/ livelinkgetfile?llNodeId=21553&llVolId=-2000). Retrieved on: April 6, 2008
[43] Andrew Updegrove (August 31, 2007). "The ISO/IEC Voting Process on OOXML Explained (and What Happens Next)" (http:/ /
consortiuminfo.org/ standardsblog/ article. php?story=20070831151800414). . Retrieved 2008-09-13.
[45] "Color Cards" (http:/ / www.mosaic-commons. org/ node/ 44). Mosaic Commons. . Retrieved 2007-01-17.
[46] Jan H; Erikk, Hester, Ralf, Pinda, Anissa and Paxus. "A Handbook for Direct Democracy and the Consensus Decision Process" (http:/ /
www.zhaba.cz/ uploads/ media/ Shared_Path.pdf) (PDF). Zhaba Facilitators Collective. . Retrieved 2007-01-18.
[47] "Hand Signals" (http:// seedsforchange. org.uk/ free/handsig. pdf) (PDF). Seeds for Change. . Retrieved 2007-01-18.
[48] "Guide for Facilitators: Fist-to-Five Consensus-Building" (http:/ / www. freechild.org/ Firestarter/Fist2Five.htm). . Retrieved 2008-02-04.
[49] http://dotmocracy.org – Dotmocracy facilitator's resource website
[50] http://dotmocracy.org/handbook – Dotmocracy Handbook
[51] Saint S, Lawson JR (1994) Rules for reaching consensus: a modern approach to decision making. Pfeiffer, San Diego
[52] The Common Wheel Collective (2002). "Introduction to Consensus" (http:/ / web.archive.org/ web/ 20060630154451/ http:/ / geocities.
com/collectivebook/ introductiontoconsensus. html). The Collective Book on Collective Process. Archived from the original (http:// www.
geocities.com/ collectivebook/ introductiontoconsensus. html) on 2006-06-30. . Retrieved 2007-01-17.
[53] Alan McCluskey (1999). "Consensus building and verbal desperados" (http:/ / www.connected.org/ govern/consensus. html). . Retrieved
[54] Harvey, Jerry B. (Summer 1974). "The Abilene Paradox and other Meditations on Management". Organizational Dynamics 3 (1): 63.
[55] "Consensus Team Decision Making" (http:/ / www.au. af.mil/ au/ awc/ awcgate/ ndu/ strat-ldr-dm/pt3ch11.html). Strategic Leadership
and Decision Making. National Defense University. . Retrieved 2007-01-17.
[56] The World's Business Cultures and How to Unlock Them 2008 Barry Tomalin, Mike Nicks pg. 109 "Consensus or individually-driven
decision making" ISBN 978-1-85418-369-9
[57] Friedrich Degenhardt (2006). "Consensus: a colourful farewell to majority rule" (http:// web.archive.org/ web/ 20061206132304/ http:/ /
www. oikoumene. org/en/ news/ news-management/ all-news-english/ display-single-english-news/ browse/ 4/ article/1634/
consensus-a-colourful-fa-1. html). World Council of Churches. Archived from the original (http:// www.oikoumene. org/ en/ news/
news-management/ all-news-english/ display-single-english-news/ browse/ 4/ article/1634/ consensus-a-colourful-fa-1.html) on 2006-12-06.
. Retrieved 2007-01-17.
[58] McGann, Anthony J. The Logic of Democracy: Reconciling, Equality, Deliberation, and Minority Protection. Ann Arbor: University of
Michigan Press. 2006. ISBN 0-472-06949-7.
[59] Anthony J. McGann (2002). "The Tyranny of the Supermajority: How Majority Rule Protects Majorities" (http:// repositories. cdlib. org/
cgi/ viewcontent.cgi?article=1001&context=csd) (PDF). Center for the Study of Democracy. . Retrieved 2008-06-09.
External links
• "Consensus-Oriented Decision Making: The CODM Model for Facilitating Groups to Widespread Agreement"
(http:// www. consensusbook. com/ ) ConsensusBook.com
• "A Virtual Learning Center for People Interested in Making Decisions by Consensus" (http:// www.
consensusdecisionmaking. org/) – ConsensusDecisionMaking.Org
• "Articles on Group Facilitation and Consensus Decision Making (http:/ / www.groupfacilitation.net/ Articles on
Meeting Facilitation.html) – GroupFacilitation.Net
• "Consensus Decision Making" (http:/ / seedsforchange. org.uk/ free/consens) – Seeds for Change
• "On Conflict and Consensus. (http:/ / www. ic. org/pnp/ ocac/ )" – C. T. Lawrence Butler and Amy Rothstein
(1987) Food Not Bombs Publishing. Also available in .pdf format (http:// www.wandreilagh. org/consensus.
• "The Formal Consensus Website" (http:/ / www.consensus. net/ ) – Based on work by C. T. Lawrence Butler and
Amy Rothstein
• "Papers on Cooperative Decision-Making" (http:// www.vernalproject.org/papers/ Process. html) – Randy
• "One Vote for Democracy" (http:// www. diemer.ca/ Docs/ Diemer-OneVoteforDemocracy.htm) – Ulli Diemer
• "Some Materials on Consensus." (http:/ / www.earlham.edu/ ~consense/ mats. htm) – Quaker Foundations of
Leadership, 1999. Richmond, Indiana: Earlham College.
• Theory of Consent (http:// home. arcor.de/ danneskjoeld/ F/ E/T/Consent. html) – in a natural order philosophy
(from an anarchocapitalistic point of view)
• Shared Path, Shared Goal (http://www.zhaba.cz/uploads/media/Shared_Path.pdf) – a short pamphlet on the consensus decision process
Consensus-seeking decision-making
Consensus-seeking decision-making (also known as consensus/voting hybrid decision-making) is a term
sometimes used to describe a formal decision process similar to the consensus decision-making variant known as
Formal Consensus but with the additional option of a fallback voting procedure if consensus appears unattainable
during the consensus-seeking phase of the deliberations.
Ideally the fallback voting option is only exercised after all reasonable attempts to address concerns have been
exhausted, although in practice this might be constrained by time limits imposed on the deliberations. When
consensus is deeme