
2014 13th International Conference on Machine Learning and Applications

Improving an Early Warning System to Prediction of Student Examination Achievement
Hanife GÖKER
The Institute of Informatics
Gazi University
Ankara, TURKEY
hanifegoker@gmail.com

Halil Ibrahim BÜLBÜL
Department of Computer and Instructional Technologies
Gazi University
Ankara, TURKEY
bhalil@gazi.edu.tr

Abstract: In Turkey, there are many exams for transition to a higher education institution. These exams occur at every stage of life and are of great importance in students' lives. Because of the weight attached to the exam, students, parents and teachers in particular are affected, and exams create anxiety for students. Examination results are very important in shaping students' future lives. Therefore, this study aims to estimate students' success in the university entrance exam, which is a turning point in their lives.

The aim of this study is to estimate, by applying data mining algorithms to the created student data warehouse, the success of students who are taking the university entrance exam. In this study, software based on the Naive Bayes algorithm was developed for the student data warehouse. With this software, written in the C# language, the aim is to build an early warning system that can estimate students' success in the university entrance exam, for students and also for their families.

Keywords: data mining, data warehouse, naive bayes, classification, estimation of students' successes, educational data mining (EDM).

I. INTRODUCTION:

In the education systems of many countries, entering higher education requires obtaining enough points in a specific university entrance exam. In our country, more than half a million students take the university entrance exam every year, and this number is increasing every year.

There is a large supply-demand imbalance between high schools and universities. This imbalance greatly worries many sections of society, including government officials, school administrators, teachers, students and students' families [1].

Since higher education ensures high income and better job opportunities, it is directly related to individuals' success in business. From the social point of view, it has an important role in a community's attainment of such socially determined aims as a high-quality workforce, equal opportunities, social unity and development of civil awareness [2].

Students are prepared for the exam from a young age, and therefore competition is growing. Many factors affect success in this exam, such as students' grades, family, classroom and tutoring information, test anxiety level, age and sex. For students to be successful in this exam, the factors that affect examination success need to be analyzed. Therefore, the factors affecting students' success in the university entrance examination should be identified, and it should be estimated whether secondary school students will succeed or fail in the examination. Identifying these factors and estimating exam success will play an important role in increasing success.

For all students in primary and secondary schools in Turkey, student transactions are carried out via the e-school system. Students' records are kept regularly in the e-school information system. All student data are stored on digital media, and the sheer volume of these data has made traditional query tools ineffective.

As a result, new tendencies have emerged, such as extracting valuable knowledge from student databases by using data mining techniques. Applying data warehouse and data mining techniques to student data helps to uncover valuable information hidden in the student data warehouse and contributes to improving the quality of education [3]. Therefore, many data mining methods and algorithms have been developed for solving different problems.

In recent years, there has been increasing interest in the use of data mining to investigate scientific questions within educational research, an area of

978-1-4799-7415-3/14 $31.00 © 2014 IEEE
DOI 10.1109/ICMLA.2014.114

inquiry termed educational data mining (EDM). EDM is defined as the area of scientific inquiry centered on the development of methods for making discoveries within the unique kinds of data that come from educational settings, and using those methods to better understand students and the settings which they learn in. For example, in mining data about how students choose to use educational software, it may be worthwhile to simultaneously consider data at the keystroke level, answer level, session level, student level, classroom level, and school level. Issues of time, sequence, and context also play important roles in the study of educational data [4].

In the literature, a variety of data mining applications have been carried out on student data warehouses, such as estimation of students who are inclined to use substances [5], predicting student course performance [6], determination of vocational fields with a machine learning algorithm [7], genetic algorithms in data mining [8], high-level student modeling with machine learning [9] and investigation of the reasons for students' academic failure [10].

In this study, a data warehouse containing information about students was formed first; then all the factors affecting the degree of success were determined cumulatively. Afterwards, an early warning system that can estimate students' success in the university entrance exam, for students and also for their families, was developed.

II. IMPROVED EARLY WARNING SYSTEM:

The software was developed using Microsoft Visual Studio 2008 C# .NET and is based on the Naive Bayes algorithm applied to the student data warehouse. SQL Server Management Studio Express was used for database design, data cleaning, handling missing values and data transformation.

A. Process of Obtaining Data

In the application process, a data warehouse containing information about students was formed. While forming this data warehouse, information obtained from student identification registries, primary school student files, the documentation of the guidance unit and the students' grades was combined.

In this study, the student data warehouse consists of 220 records, 20 features and 1 class attribute. These data were formed by investigating factors that affect academic achievement, such as the socio-economic level of the students, the education and occupation of their parents, individual characteristics, classroom information, students' grades and absenteeism information, book reading habits and test anxiety [11-14].

B. Creating the Data Warehouse

The software consists of four forms: the main form, the form for creating the data warehouse, the data display form and the data mining application form.

When the software is first run, the main form shown in Figure 1 appears.

Figure 1: The main form

In the software, the data warehouse of high school students' information is created first, by clicking the "Data Warehousing" option. This form allows users to create a data warehouse easily and makes data entry more convenient. The form consists of four interconnected parts, namely Student General Information, Family Information, Personal Information, and Students' Grades and Absenteeism Information, as shown in Figure 2.

Figure 2: Creating Data Warehouse Form

A student data warehouse consisting of 220 records was created with this form.

After the data warehouse is created, the data are listed in the data display form shown in Figure 3, and the records of each feature can be displayed graphically.

Figure 3: Displaying Data Warehouse Form

The student data warehouse was evaluated after the data had been preprocessed.

C. Data Preprocessing

After the creation of the student data warehouse, the necessary corrections were made, such as handling missing values, correcting noisy data and transforming data. In the data preprocessing step, Structured Query Language (SQL) commands were used.

In this study, the data preprocessing step consists of four phases:

Data Cleaning

In the data obtained from students while forming the data warehouse, there are missing values in the students' grades and absenteeism features, because the records of students who came from other schools were filled in incompletely. The 9th and 10th grade marks and absenteeism information are missing for 2 students who arrived by transfer. The records in which the D9, N9 and N10 features were empty were corrected by taking the average of these fields, as shown in the following commands:

- UPDATE ogrveriambar SET n9 = (SELECT AVG(n9) FROM ogrveriambar WHERE n9 IS NOT NULL) WHERE n9 IS NULL ;

- UPDATE ogrveriambar SET n10 = (SELECT AVG(n10) FROM ogrveriambar WHERE n10 IS NOT NULL) WHERE n10 IS NULL ;

- UPDATE ogrveriambar SET D9 = (SELECT AVG(D9) FROM ogrveriambar WHERE D9 IS NOT NULL) WHERE D9 IS NULL ;

As the D9, N9 and N10 features contain numeric values, each missing value was filled with the average of that feature over the other examples.

Data Integration

In the data integration phase, data coming from databases and various information sources were combined, such as student identification registries, primary school student files, the documentation of the guidance unit, and the students' grades and absenteeism information, book reading habits, test anxiety, socio-economic level, the education and occupation of their parents, individual characteristics and classroom information.

Data Transform

After the data cleaning and data integration steps, the data transform process is performed to prepare the data for the Naive Bayes algorithm. The Naive Bayes algorithm is applied to categorized data rather than numeric data. Therefore, numerical records were categorized with SQL commands. Some of the data transform commands are given below:

- UPDATE ogrveriambar SET n9 = 'n1' WHERE n9 <= 100 AND n9 >= 95

- UPDATE ogrveriambar SET n9 = 'n2' WHERE n9 < 95 AND n9 >= 90

- UPDATE ogrveriambar SET kardes = 'K2' WHERE kardes < 4 AND kardes >= 2

- UPDATE ogrveriambar SET kardes = 'K1' WHERE kardes < 2 AND kardes >= 0

- UPDATE ogrveriambar SET ilkogr = 'I5' WHERE ilkogr <= 5 AND ilkogr >= 4.5

- UPDATE ogrveriambar SET ilkogr = 'I4' WHERE ilkogr < 4.5 AND ilkogr >= 4

- UPDATE ogrveriambar SET agirbaspuan = 'A1' WHERE agirbaspuan <= 100 AND agirbaspuan >= 95

- UPDATE ogrveriambar SET agirbaspuan = 'A2' WHERE agirbaspuan < 95 AND agirbaspuan >= 90

Data Reduction

The data reduction process was carried out to determine the features that most affect the result, to reduce process complexity and to allow accurate generalization. The absenteeism information for each grade level (D9, D10, D11 and D12) reflects the student's overall attendance. Therefore, the "Devam" feature has been

created to reduce processing, by taking the average of the absences in these grades.

- UPDATE ogrveriambar SET Devam = ((D9+D10+D11+D12) / 4) WHERE Devam IS NULL ;

D. Interpretation / Evaluation

In this study, in order to apply the Naive Bayes algorithm, the student data in the data warehouse were divided into two groups, training data and test data. The training data set and test data set were created using the holdout method.

The holdout method, sometimes called test sample estimation, partitions the data into two mutually exclusive subsets called a training set and a test set, or holdout set. The holdout estimate is a random number that depends on the division into a training set and a test set [15].

About 1/5 of the data were selected as test data and 4/5 as training data. In the holdout method, a number of samples are set aside as the test data set and the remaining samples are used as the training data set. Of the 220 records, 175 were used as the training data set and 45 as the test data set. In determining the test data set, one of every five records in the data set was taken as test data. The holdout method was used because the distribution of data in the data set is balanced and the number of samples in each class is large.

The Naive Bayes algorithm is applied to the training and test data sets thus formed. The algorithm is trained with the training data set and checked with the test data set. The test data are listed in the form, so the algorithm's success can be checked against the entries in the test data set.

When making a prediction, the training data are first pulled from the database and transferred to DataRow objects.

In the second stage, the "Successful" and "Unsuccessful" samples found in the selected vocational field are counted in two variables, p and q. The p (positive) probability for each feature is found by taking the number of training samples that fall into the positive class and have the same value as the feature in the test data, and dividing it by the number of all positive samples. Likewise, the q (negative) probability is found by taking the number of training samples that fall into the negative class and have the same value as the feature in the test data, and dividing it by the number of all negative samples.

In the third stage, the probability of each class is calculated from the frequencies of the feature values selected in the ComboBox objects. When calculating this probability, the probabilities of the individual features are multiplied.

The students' success in the exam is predicted from the successful and unsuccessful examples in the past. If the characteristics of a new sample are completely different from past experience, the prediction may come out as 0. In other words, if the probability of any feature is 0, the result will be 0. One of the disadvantages of the Naive Bayes method is this zero conditional probability. To eliminate this problem, the data leading to 0 are identified and a very small value is added to the numerator and denominator of each criterion.

In the last stage, the probability that the test sample is successful or not is calculated according to the training data set.

The student's data are entered in the form shown in Figure 4 to predict success or failure in the exam.

Figure 4: Success estimates form

As a result, if the successful rate is greater than the unsuccessful rate, the prediction is reported to the individual as successful; if it is smaller, as unsuccessful, as shown in Figure 5.

Figure 5: Success results message (p rate and q rate, result: successful)
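
The holdout partition and the prediction stages described in this section can be illustrated with a minimal sketch. It is written in Python for brevity and is not the authors' C# implementation: the function names, the class labels, the "result" key, the sample records and the value of EPSILON are all assumptions made for illustration. Following the text, each class score is the product of per-feature conditional frequencies (the text does not mention a class prior, so none is used here), and a very small value is added to the numerator and denominator so that an unseen feature value does not force the product to 0.

```python
from collections import Counter, defaultdict

# Assumed "very small value"; the paper does not state the number it used.
EPSILON = 1e-6


def holdout_split(records, every=5):
    """Send every `every`-th record to the test set, the rest to training."""
    train_set, test_set = [], []
    for index, record in enumerate(records, start=1):
        (test_set if index % every == 0 else train_set).append(record)
    return train_set, test_set


def train(records, class_key="result"):
    """Count class sizes and per-class feature-value frequencies."""
    class_counts = Counter()
    value_counts = defaultdict(Counter)  # (class, feature) -> value counts
    for record in records:
        label = record[class_key]
        class_counts[label] += 1
        for feature, value in record.items():
            if feature != class_key:
                value_counts[(label, feature)][value] += 1
    return class_counts, value_counts


def class_score(model, sample, label):
    """Multiply the smoothed conditional frequencies of the sample's values."""
    class_counts, value_counts = model
    total = class_counts[label]
    product = 1.0
    for feature, value in sample.items():
        matches = value_counts[(label, feature)][value]
        # EPSILON is added to numerator and denominator to avoid a zero factor.
        product *= (matches + EPSILON) / (total + EPSILON)
    return product


def predict(model, sample, positive="successful", negative="unsuccessful"):
    """Compare the p (positive) and q (negative) scores and pick the larger."""
    p = class_score(model, sample, positive)
    q = class_score(model, sample, negative)
    return positive if p > q else negative


def accuracy(model, test_records, class_key="result"):
    """Fraction of test records whose predicted class matches the stored one."""
    correct = 0
    for record in test_records:
        sample = {k: v for k, v in record.items() if k != class_key}
        if predict(model, sample) == record[class_key]:
            correct += 1
    return correct / len(test_records)


if __name__ == "__main__":
    # Hypothetical categorized records; only the category codes come from the paper.
    demo = [
        {"n9": "n1", "kardes": "K1", "result": "successful"},
        {"n9": "n1", "kardes": "K2", "result": "successful"},
        {"n9": "n2", "kardes": "K1", "result": "successful"},
        {"n9": "n2", "kardes": "K2", "result": "unsuccessful"},
        {"n9": "n2", "kardes": "K2", "result": "unsuccessful"},
    ]
    model = train(demo)
    print(predict(model, {"n9": "n1", "kardes": "K1"}))
```

Taking exactly every fifth record of 220 would give 176 training and 44 test records, whereas 175 and 45 are reported, so the authors' selection rule evidently differed slightly from this sketch.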

With the current study, an early warning system that can estimate students' success in the university entrance exam was developed.

The students' exam results and the early warning system's estimates of their success, obtained by running the software, are given in Table 1.

Table 1: Estimation results for test data

Total Number of Records            220
Training data                      175
Test data                          45
Correctly Classified Instances     39
Incorrectly Classified Instances   6
Success Percentage (Accuracy)      86.66 %

Analyzing Table 1, it is observed that the system produces accurate results for a very large proportion of the test data (86.66 %).

III. CONCLUSIONS:

The university entrance exam, which affects students' future status and their success in professional life, is the most important event in their lives. Students' success in the university entrance exam is of great importance for students and also for their families. So, in this study, the success of students taking the university entrance exam was estimated by data mining. The results obtained in the study are listed below:

- In this study, to estimate student achievement, a student data warehouse was created that includes factors such as secondary school students' grades, family environment, classroom information, test anxiety level, working order, age, gender and so on.

- As shown in Table 1, when students' exam success was estimated using the Naive Bayes classification algorithm, it was concluded that a large portion of it (86.66 %) could be explained by the features in the student data warehouse.

- In this study, an early warning system that predicts the success of students in the university entrance exam has been developed for students and their families. The students' success status in the university entrance exam was estimated using a Naive Bayes algorithm.

By predicting examination achievement in advance, this study gives students an opportunity to improve their success. Remedial and developmental studies can be carried out for the students whose examination success was predicted to be low. Therefore, this study is thought to play an important role in increasing students' achievements.

IV. REFERENCES:

[1] Köse, M. R., (1999) "University entrance and high schools", University of Hacettepe Journal of Education, 15: 51-60.

[2] Tomul, E., and Polat, G., (2013) "The Effects of Socioeconomic Characteristics of Students on Their Academic Achievement in Higher Education." American Journal of Educational Research, 1.10: 449-455.

[3] Göker, H., Bülbül, H. I., & Irmak, E. (2013, December). The Estimation of Students' Academic Success by Data Mining Methods. In Machine Learning and Applications (ICMLA), 2013 12th International Conference on (Vol. 2, pp. 535-539). IEEE.

[4] Baker, R., (2010). Data Mining for Education. To appear in McGaw, B., Peterson, P., Baker, E. (Eds.) International Encyclopedia of Education (3rd edition). Oxford, UK: Elsevier.

[5] Bulut, F., (2010) "Detecting students at risk of substance abuse by using data mining classification", M. Sc. Thesis, Institute of Natural and Applied Sciences University of Fatih, İstanbul.

[6] P.L. Hsu, R. Lai, C.C. Chiu and C.I. Hsu, (2003) The hybrid of association rule algorithms and genetic algorithms for tree induction: an example of predicting the student course performance, Expert Systems with Applications, 51–62, 25.

[7] Bülbül, H. İ., Ünsal, Ö., (2010) "Determination of vocational fields with machine learning algorithm", The Ninth International Conference on Machine Learning and Applications (ICMLA 2010), IEEE Computer Society, Washington D.C., 710-713.

[8] Gündoğdu, S., (2007) "Genetic algorithms in data mining", M. Sc. Thesis, Institute of Natural and Applied Sciences University of Kocaeli, Kocaeli, 99-101.

[9] Beck, J. E. and Woolf, B. P., (2000). "High-level student modeling with machine learning," in Proc. 5th Int. Conf. Intell. Tutoring Syst., Alagoas, Brazil, pp. 584–593.

[10] Bırtıl, F.S., (2011) "Analysis of girls vocational high school students' academic failure causes with data mining techniques", M. Sc. Thesis, Institute of

Natural and Applied Sciences University of Afyon, Afyon, 70-71,2.

[11] Berberoğlu, G. and Kalender, İ., (2005) "Investigation of student achievement across years, school types and regions: the SSE and PISA analyses", Journal of Educational Sciences and Practice, 4 (7): 21-35.

[12] Pakır, F., (2006) "The Effects on the success in the university entrance exam of the students who graduated from high school of which features on family social-economic and demographic.", M. Sc. Thesis, Institute of Social Sciences University of Yüzüncü Yıl, Van, 18-22.

[13] Karaman, İ., Dilber, R. and Sönmez, E., (2004) "Investigation on relation between secondary education achievement criteria and points of student selection examination", Journal of Kazım Karabekir Education Faculty 9, 263-269.

[14] Demirtaş, Z., (2010) "The relationship in high schools between school culture and student achievement", Mustafa Kemal University Journal of Social Sciences Institute, 7 (13), 208-223.

[15] Kohavi, R. (1995, August). A study of cross-validation and bootstrap for accuracy estimation and model selection. In IJCAI (Vol. 14, No. 2, pp. 1137-1145).
