
Use of the normal distribution in evaluating the learning process

Prof. Aquiles Fernández V.


Abstract
When discussing subjects related to the assessment of learning, it is frequent to refer to the normal model for the distribution of grades, linking it invariably to the type of evaluation known as "by norm" and setting it definitively apart from the evaluation known as "by criteria". In addition, it is usually maintained that the first evaluation method is traditional, whereas the second is modern. Such an oversimplification, however, is dangerous and may lead users who are not specialists in evaluation to make serious mistakes. In this article we attempt to reach a better understanding of the topic.
1. INTRODUCTION
Often, when dealing with issues related to the assessment of learning, reference is made to the normal model for the distribution of grades, tying it indissolubly to the type of evaluation known as "by norm" and setting it apart, in an equally blunt and definitive way, from the type of evaluation known as "by criteria". It is also said that evaluation "by norm" corresponds to a traditional approach to assessment, while evaluation "by criteria" corresponds to an updated point of view. Moreover, it is often said that it would not be logical for the distribution of grades to correspond to a probabilistic model after a period devoted to teaching and learning: supposedly, chance would have no place where there has been a systematic (non-random) effort to improve levels of knowledge, whereas it could have a place before that effort had taken place. Some books on general issues of educational evaluation (which, naturally, do not delve into the theoretical foundations underlying some of their procedures) have helped spread the views we have just outlined.
Understood in this way, the normal model would be applicable only in those cases where the grades must respond to the need to rank a certain group of individuals "from best to worst" in order to choose among the "best", without any reference in that selection to specific learning objectives previously identified as goals to be met. It would not apply, however, in those other cases where the grades should reflect the degree of agreement between individual performance and the targets set to be achieved, regardless of what happens with the other individuals assessed.
The above remarks describe positions that are quite widespread among evaluators of learning and that, at a first approximation, seem reasonable. However, understanding these issues in such a simplified way is dangerous and can lead users who are not specialists in evaluation to fall into gross errors.
In this work we try to arrive at a better understanding of the subject. We seek to clarify, as far as the theoretical difficulties permit, the darker areas of the whole situation, starting by explaining the general use of the normal probabilistic model; we then see under what circumstances it applies to educational evaluation, and finally we examine its possible relationships with the types of assessment "by criteria" and "by norm".
2. USE OF THE NORMAL DISTRIBUTION

The Central Limit Theorem (which makes explicit reference to the normal model) is perhaps the most important result of all statistical theory developed so far. Discovered by De Moivre in 1733 and later studied by Laplace and Gauss, it reached its final form only at the beginning of the twentieth century (Kendall 1980). Although its theoretical importance is very great, interest in it also lies in the huge number of practical applications that its generality allows. It is precisely those applications that have contributed to "popularizing" the normal model, making knowledge of it universal among users of statistics. So much so that the term "normal" has come to be understood (wrongly, in the context of statistical theory) as a synonym for "natural" or "customary". It is true that this may be the usual or appropriate model in many circumstances, but it is also true that it is not in many others. Therefore, before applying it, or setting it aside a priori, it is necessary to determine in each particular case whether the conditions for its use are or are not met.
We shall state this theorem in the simplest way we can; but first, for a better understanding of it, we give brief explanations of what the normal model is and of the meaning of some of the technical terms that the statement of the theorem (inevitably) has to use.
i. A variable is an observable feature of each of the elements of a population under study whose natural form of expression is a number (the term "variable" is often used with a broader meaning, covering non-numerical characteristics such as marital status, national origin, sex, etc.; in this context, however, we will use only the more restrictive sense outlined here). Generally, the value of a variable derives from a measurement made on the element observed. For its part, the term "random" means "dependent on chance"; in the statistical context, however, "random" is also understood as the effect resulting from one or more uncontrolled variables, known or unknown, even if they have nothing to do with chance. The essence of the concept of "random" is that, in any particular case, the number resulting from the observation is unpredictable. Thus, a random variable is a characteristic that is expressed by a number whose value, in a particular case, is unpredictable (even when it is known that this value must stay within certain preset limits).
For example, when we give a written test in a course, we do not know in advance exactly what the average of that test will be, although we do know that it must be a number between 1 and 7. Obviously, the level of knowledge achieved by the students during their preparation for the test will be decisive for the results, and knowing it allows us to predict "more or less" what might happen with the grades; but absolute certainty will slip from our hands owing to the inevitable influence of other variables that escape our control, such as the validity and reliability of the instrument used; the choice of the specific items included in it and of those left out; and the circumstantial state of health, concentration and motivation of each individual student. That average is thus a random variable; so are the individual grades of each student.
Going a little further, we could say that the result of any measurement (of any kind) that cannot be achieved with absolute accuracy (and this happens in most cases) is a random variable.
ii. A probability distribution, or probabilistic model, is a concrete, anticipated proposal for the behavior of a random variable, in terms of the distribution of its possible values. In the example we considered, assuming that our course has 40 students, we could draw on our teaching experience and predict a distribution of grades as follows:

Grades          Freq.       %
1.00 - 2.49       1        2.50
2.50 - 3.99       6       15.00
4.00 - 5.49      21       52.50
5.50 - 7.00      12       30.00

Although the adjacent table is a prediction of what might eventually occur in the future, and not a description of what has already happened, it is a probability model for the random variable described.
Naturally, the result actually obtained after giving the test may differ from what the previous model indicates. If so, and to the extent that those differences are significant, it will mean that our prediction was wrong. By contrast, a good probabilistic model must be characterized by its ability to predict, with sufficient accuracy for any practical purpose, what can be expected to result "regularly" every time the phenomenon that the model represents occurs. By "regularly" we mean that only very rarely will a probabilistic model, however good, describe exactly what happens on one particular occasion on which the phenomenon takes place (obviously, if a variable always behaved in a predictable way, it would not be random). However, the discrepancies that may arise between a good probabilistic model and the realizations of the phenomenon it represents should never be too big, much less systematic.

It is thanks to the use of good probabilistic models that, for example, a casino or an insurance company can plan its income and expenses in the long term with sufficient approximation, even if it cannot, perhaps, do so for any single day.

With a probabilistic model such as the one shown in the following table, which refers to the number of fires in industrial plants occurring weekly in a particular city, an insurance company could plan its finances over (for example) a one-year period. In this example, the random variable is the number of fires in one week; we denote this variable by the letter X.
x = number of fires          n(X) = number of weeks
occurring in one week        in which X fires occur

        0                            12
        1                            17
        2                            13
        3                             6
    4 or more                         4

A model like this one (assuming it is right) lets us know in advance, for example, that in one year about 77 such accidents will occur; or that in a "regular" week one or two fires are to be expected, since the weekly average calculated from the table above is equal to 1.48.

Pieces of information such as the above, and others of equal practical value that can likewise be deduced from the same probabilistic model, are perfectly usable for purposes of forecasting and planning, given their high long-term regularity, even though they constitute neither absolute and irrevocable truths nor, much less, statements about what will (or will not) happen in a particular given case. Similarly, the director of an educational establishment (who always plans his work using good probabilistic models to make decisions) can know in advance, for example, the number of students who will fail each of the subjects at the various levels of study, even without knowing exactly who those students will be; he can also know the number of class hours that will be lost for various reasons, the number of students who will achieve levels of excellence, the expenditure arising from the use of gymnasiums and laboratories, and so on.
iii. The expected value of a random variable (under a given probabilistic model) is the average that this variable has in the model. For example, in the case of the fires, we said that the average is 1.48; that is, the expected value of the weekly number of fires is 1.48. In the example of the distribution of grades, one can check, by doing the calculations, that the expected value of the grades is 4.875. Do not forget, however, that once the test has actually been given and marked, the actual average of the grades will almost certainly be somewhat different from the expected value of the model.
If the random variable is denoted by X, then its expected value is denoted by E(X). Since the existence of a probabilistic model is independent of, and prior to, any actual observation of the variable (X) involved, we can say that E(X) is the a priori average of the variable, over the course of a large number of observations.
iv. Just as the a priori average of the relevant variable (X) can be calculated from the model, it is possible to calculate a priori, for the same variable, any of the familiar statistics such as the median, the mode, the standard deviation, etc. In particular, we call expected variance the variance calculated from the distribution given by the model, and denote it by V(X). In the examples given, it is possible to calculate that the expected variance of the grades is 1.397 and that the expected variance of the weekly number of fires is 1.442. (Recall that the variance, calculated as the square of the standard deviation, is a measure of the dispersion of the values taken by the variable.)
v. We are now in a position to recall briefly what the normal model is. In Illustration No. 1 we can see a drawing depicting the famous "bell".
Illustration No. 1
Its symmetrical shape, with a central part (where most of the possible observations accumulate) and two long "tails" in perpetual decline (where the most exceptional cases are cornered, on either side), is among its most conspicuous features.

This characteristic curved shape represents a probability distribution, that is, a model that explains the behavior of many random variables. The model has two parameters (fixed numbers that fully determine it): one of position, which marks the central place of the curve and corresponds to the expected value (which can be any number), and one of dispersion, which determines both the "width" and the "height" of the curve and corresponds to the expected variance (which can only take positive values). Different combinations of these parameters allow the existence of infinitely many normal curves that differ in position, in shape, or in both respects (see Illustration No. 2). Although in all cases both tails of the curve are infinitely long, the majority of the observations are found around the expected value, at a distance no greater than three standard deviations. If X is a normal random variable such that E(X) = 0 and V(X) = 1, then we say that it is a standard normal variable. This distribution is unique (as is any whose parameters have been fixed) and can be found tabulated in any statistics book.
Finally, we say that two or more random variables are independent if none of them can explain, or be explained by, one (or more) of the others under any circumstances. Although in practice this is a difficult condition to establish reliably, it is usual to assume independence whenever this seems sufficiently acceptable to common sense. For example, returning to the examples we have used, we can assume that the result of our test is independent of the weekly number of fires declared in the city. No one could prove that these variables are truly independent; but until a relationship between the two is found, we may well assume that they are.
Consider another example: if a well-balanced die is thrown several times, it will not be possible to predict the number obtained on any throw, even when we know all the numbers obtained in previous throws. This being so, we say that those results are independent of each other and constitute, therefore, a set of independent random variables.
From the above explanations, we are now in a position to give a statement of the Central Limit Theorem:
Central Limit Theorem. If we have a large group of independent random variables, say X1, X2, X3, ..., Xn, then their sum is a random variable S which follows an approximately normal distribution, whose expected value and variance are, respectively:

E(S) = E(X1) + E(X2) + E(X3) + ... + E(Xn)

V(S) = V(X1) + V(X2) + V(X3) + ... + V(Xn)

The approximation to the normal model improves as the set of variables added grows larger. The theorem is called a "limit" theorem because, if the number of variables added were infinite, the distribution of the sum would be exactly normal.
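The content of the theorem can be made visible with a small simulation (a sketch in Python with numpy; the uniform variables and the sample sizes are our choices for illustration, not part of the theorem):

```python
import numpy as np

rng = np.random.default_rng(0)

# 10,000 realizations of S = X1 + ... + Xn, each Xi uniform on [0, 1).
# The individual Xi are flat, not bell-shaped; their sum is nearly normal.
n = 30
S = rng.random((10_000, n)).sum(axis=1)

# The theorem predicts E(S) = n * 1/2 and V(S) = n * 1/12.
print(f"mean of S: {S.mean():.2f} (theory: {n * 0.5:.2f})")
print(f"variance of S: {S.var():.2f} (theory: {n / 12:.2f})")
```

A histogram of S drawn from this simulation traces the familiar bell of Illustration No. 1.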
3. APPLICATIONS TO EDUCATIONAL ASSESSMENT
Let us say again that the fundamental assertion of the Central Limit Theorem is that if we add many independent random variables, the result of that addition should be a normal random variable, regardless of how the different variables being summed arise or what they involve. Notwithstanding this great generality, which the theory fully supports, in the usual applications (especially educational ones) the sum is obtained from a single variable that is measured several times independently. It should be understood that it is the sum (and not each of the variables combined) that the theorem identifies as normal.
For example, if in our class of 40 students we may assume that the grades on a test are independent of each other, so that the grade obtained by one student is unconnected with the grade obtained by any other, then the sum of all those grades will be a normal random variable. As a consequence, the average of those grades (that is, their sum divided by 40) is also a normal random variable.
Note that in this example we make no reference to whether the test is taken at the beginning or at the end of the year; we do not know whether it is a pretest or the final exam. The graphs of Illustration No. 2 could correspond to models suitable for representing the distribution of the class average on a pretest and on a final exam, respectively. In the first case one would have (for example) an expected value equal to 1.75, with an expected standard deviation equal to 0.25; in the other, an expected value equal to 5.5, with an expected standard deviation equal to 0.50. Note also that no reference is made to the form of evaluation used: we do not know whether those grades come from written or oral questions, from multiple-choice tests or essays, from research assignments, lectures, practical work, etc. One may assume that any of these alternatives occurs, and assume moreover that these grades correspond to assessments "by criteria", and none of this can annul the fact that these are normal distributions.
In the example we have given, concerning the class average, we have assumed two premises: that the course is large (40 students) and that the (40) grades are all independent. Those are the conditions established in the Central Limit Theorem, and they assure us that this average should be a normal random variable, whatever the method by which each particular grade is produced and whatever the destination or interpretation those grades may have.
Here is another example. Consider as a random variable the (single) grade that one student can obtain on a given test. Suppose it is a multiple-choice test in which each correct answer adds points to the student's score and each error subtracts points from it. Again consider two premises: that the number of test questions is quite large (say, 60) and that success or failure on any one question does not affect success or failure on any other. From these premises, which are those of the Central Limit Theorem, we conclude that the final score accumulated by the student on the test (and, consequently, his final grade) should be a normal random variable. And once again we can see that it is irrelevant whether or not that grade corresponds to predetermined achievement criteria, and irrelevant whether the grade is used to select people, to eliminate them, or for any other purpose.
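This second example can also be simulated (a sketch in Python with numpy; the per-item probability of success and the scoring scheme are assumptions made only for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

items, students = 60, 10_000
p_correct = 0.7            # assumed chance of answering one item correctly
gain, penalty = 1.0, 0.25  # assumed points added per hit, subtracted per error

# Each row is one student: 60 independent item results (True = correct).
correct = rng.random((students, items)) < p_correct
scores = correct.sum(axis=1) * gain - (~correct).sum(axis=1) * penalty

# By the Central Limit Theorem the 10,000 scores should be roughly normal.
print(f"mean score: {scores.mean():.1f}, standard deviation: {scores.std():.1f}")
```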

Illustration No. 2

Illustration No. 3
4. THE NORMAL MODEL AND EVALUATION BY CRITERIA
From the examples we have developed, it should be clear to the reader that the fact of evaluating "by criteria", that is, clearly setting preset performance standards for assigning the corresponding grades, is by no means a reason to rule out in advance the possibility that such grades, or their averages, may have a normal distribution. Furthermore, if the conditions established by the Central Limit Theorem are met exactly, such variables will inevitably be normal.
Nevertheless, we must also be clear that the Central Limit Theorem only provides us with an approximate model to explain the behavior of the variables that interest us (and this is true whether we evaluate "by criteria" or in some other mode). It is therefore perfectly possible to conceive of other models, different from the normal, which also offer the possibility of giving good descriptions (approximate as well, as with any theoretical model) of the behavior of our variables. They may be used as alternatives, especially in cases where the aforementioned conditions of the Central Limit Theorem do not hold or appear dubious. In this connection, a probabilistic model has been proposed (Fernández 1987), called the Edumetric Model, specially designed to be applied to distributions of grades in the context of assessments "by criteria". For example, in Illustration No. 3 an edumetric distribution of grades can be seen whose expected value is 5.5 and whose standard deviation is 0.90 (on a scale of 1 to 7). In a softer dotted line, a normal distribution with the same expected value and the same standard deviation is traced. Note that part of the right tail of the normal curve lies outside the box, that is, outside the range of the grades, which is certainly not desirable, since the model then assumes the existence of grades where none can exist.
In short, when the evaluation is "by criteria" the distribution of grades may or may not be conveniently explained by a normal model, depending on whether or not the conditions of the Central Limit Theorem are met. If those conditions are not met, the appropriate model could be some other, different from the normal, e.g. the edumetric model.
5. THE NORMAL MODEL AND EVALUATION BY NORM

As we know, the essential feature of evaluation "by norm" is that it relativizes the grading scale, suitably adjusting it according to the results obtained by a (relatively large) group of students who take the same test. In this context the interest is not so much in having the students demonstrate mastery of certain predetermined goals as in making a selection of students relative to the group evaluated, choosing those who prove to be the best.
Obviously, the simplest procedure for the above purpose is to sort all the scores, from highest to lowest or vice versa, and make a cut at the place that suits the purposes of the selection. However, if we want to present the results as grades (for example, on a scale of 1 to 7), and in such a way that only students at or above the average performance of the group receive a grade equal to or greater than the pass mark (e.g. 4.0), then the problem is more complicated.
Consider an example. To fill 8 vacancies in a certain job, a written proficiency test has been given to 52 applicants. Suppose that the test is designed to measure the percentage of achievement of certain predetermined objectives (criteria) and that the results are expressed as a score ranging from 0 to 100 points. The following are the sorted results obtained by the applicants:

94, 88, 82, 79, 74, 71, 65, 64, 62, 61, 58, 57, 55, 54, 50, 44, 43, 41, 40, 38, 36, 34, 34, 34, 31, 30, 29,
29, 28, 28, 25, 24, 24, 23, 23, 20, 18, 18, 15, 15, 14, 14, 13, 13, 12, 11, 10, 10, 7, 7, 5, 2
It is clear that the successful applicants should be those whose scores were 94, 88, 82, 79, 74, 71, 65 and 64. But what grades correspond to them?
The usual mechanism for answering the question above begins by calculating the mean and standard deviation of all the scores. In this case these statistics are 35.7 and 23.2, respectively. They allow the scores to be standardized, that is, the average is subtracted from each score and the resulting difference is then divided by the standard deviation. In this case, 35.7 is subtracted from each score and the difference is divided by 23.2. In the example, the standardized scores are:
2.51, 2.25, 2.00, 1.87, 1.65, 1.52, 1.26, 1.22, 1.13, 1.09, 0.96, 0.92, 0.83, 0.79, 0.62, 0.36, 0.32, 0.23, 0.19, 0.10, 0.01, -0.07, -0.07, -0.07, -0.20, -0.25, -0.29, -0.29, -0.33, -0.33, -0.46, -0.50, -0.50, -0.55, -0.55, -0.68, -0.76, -0.76, -0.89, -0.89, -0.94, -0.94, -0.98, -0.98, -1.02, -1.06, -1.11, -1.15, -1.24, -1.24, -1.32, -1.45
Once the scores have been standardized, they are converted into final grades by adapting them to the scale one wishes to use. In our case, this is achieved simply by adding 4 points to each standardized score. That done, the final grades turn out to be:

6.51, 6.25, 6.00, 5.87, 5.65, 5.52, 5.26, 5.22, 5.13, 5.09, 4.96, 4.92, 4.83, 4.79, 4.62, 4.36, 4.31, 4.23, 4.19, 4.10, 4.01, 3.93, 3.93, 3.93, 3.80, 3.75, 3.71, 3.71, 3.67, 3.67, 3.54, 3.50, 3.50, 3.45, 3.45, 3.32, 3.24, 3.24, 3.11, 3.11, 3.06, 3.06, 3.02, 3.02, 2.98, 2.94, 2.89, 2.85, 2.76, 2.76, 2.68, 2.55
The average of these grades is 4.0 and their variance is 1.
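The whole procedure fits in a few lines (a sketch in Python; the mean 35.7 and standard deviation 23.2 are taken as given, being the statistics reported above for the 52 scores):

```python
scores = [94, 88, 82, 79, 74, 71, 65, 64, 62, 61, 58, 57, 55, 54, 50,
          44, 43, 41, 40, 38, 36, 34, 34, 34, 31, 30, 29, 29, 28, 28,
          25, 24, 24, 23, 23, 20, 18, 18, 15, 15, 14, 14, 13, 13, 12,
          11, 10, 10, 7, 7, 5, 2]

mean, sd = 35.7, 23.2  # statistics of the 52 scores, as computed above

# Standardize each score, then shift by 4 so that the group average
# lands exactly on the pass mark of the 1-7 scale.
grades = [round((x - mean) / sd + 4, 2) for x in scores]

print(grades[:8])  # grades of the 8 successful applicants: 6.51, 6.25, ...
```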

It is a theorem of statistical theory that if the distribution of the raw scores is normal, then the distribution of the standardized scores must be standard normal. We have already said that the standard model is the one whose expected value is 0 and whose variance is 1. The shape of this model is that of Illustration No. 1, which means that almost all the data (approximately 99.74% of them) must lie between the values -3, on the left, and +3, on the right.
However, if the original distribution is not normal, then the standardized scores, and therefore the distribution of the final grades, need not be normal either.
Illustration No. 4, which represents the adjacent table graphically, expresses better what has been said in relation to it. Attention should be paid to the numbers placed at the bottom, which represent the three principal class marks.

Table 3

Class     Grades          Freq.
  1       2.50 - 2.99       8
  2       3.00 - 3.49      13
  3       3.50 - 3.99      10
  4       4.00 - 4.49       6
  5       4.50 - 4.99       5
  6       5.00 - 5.49       4
  7       5.50 - 5.99       3
  8       6.00 - 6.49       2
  9       6.50 - 7.00       1

In the adjacent table we have grouped the final grades of the example into nine classes of equal width, in order to observe the approximate shape of the distribution of these grades.
At first glance one can see that this distribution is not symmetrical, as it would have to be if it were a normal distribution. If the distribution were normal, classes 3 and 4, which are adjacent to the average value, would have to be of equal size; likewise classes 2 and 5, and classes 1 and 6. It can be seen that the high grades are more spread out.

Illustration No. 4

Compare the previous profile with the corresponding profiles of a normal distribution (in dotted line) and an edumetric distribution (in thick line): the normal distribution drawn has the same average (4) and the same variance (1) as the set of final grades, while the edumetric distribution, which is characterized by a single parameter called t (time), is taken here with the value t = 2.2. The parameter values used in both the normal model and the edumetric model are those that best fit the data of the example.
At first glance, neither model appears to represent the data set well. However, the Kolmogorov-Smirnov goodness-of-fit test (Canavos 1988) allows both models to be accepted as valid, at a significance level of α = 0.20. With respect to the original data set (i.e., the scores before being standardized), the respective normal and edumetric models are also acceptable.
In short, we wish to state once again that it is not enough for scores to be standardized, or to be used to discriminate the best among a group of students who take the same test, for the normal model automatically to be the best tool available. Whether or not the resulting final grades fit a normal model does not depend on the standardization. Even more, it is possible for other models to be as good as the normal one, or even better, even when the evaluation is "by norm", as in the example.

Illustration No. 5

6. CONCLUSION
In this paper we have tried to establish two ideas:
i) There are two forms of assessment, which serve different purposes, called "by criteria" and "by norm" respectively. Both forms of assessment are current, and both are valid in their respective contexts.
ii) There are several probabilistic models applicable to the assessment of learning, among which the normal model counts. The correct use of the normal model is not subject to the form of evaluation to which one wishes to apply it; its applicability depends on other premises, independent of the type of assessment used.

Av. J.P. Alessandri No. 1701-J


Providencia, Santiago, Chile

7. BIBLIOGRAPHY
AHUMADA, P. 1983. Principles and Procedures of Educational Evaluation. Ediciones Universitarias de Valparaíso, Chile.
CANAVOS, G.C. 1988. Probability and Statistics: Applications and Methods. McGraw-Hill, Mexico.
FERNÁNDEZ V., A. 1987. "Measurement and time management in a context of mastery learning". Estudios Pedagógicos 13: 15-27. Universidad Austral, Valdivia, Chile.
KENDALL, M.G. and BUCKLAND, W.R. 1980. Dictionary of Statistics. Ediciones Pirámide, Madrid.
MEZA and OLIVARES, P. 1986. Educational Assessment: A Manual for Educators. Chile Educational Service Institute.
