Abstract— In Turkey, students must pass many exams to move on to a higher education institution. These exams occur at every stage of schooling and carry great weight in students' lives. Because the stakes are high, students, parents and teachers are all affected, and the exams cause anxiety among students. Examination results play a major part in shaping students' futures. This study therefore aims to estimate students' success in the university entrance exam, which is a turning point in their lives.

The aim of this study is to estimate the success of students taking the university entrance exam by applying data mining algorithms to a purpose-built student data warehouse. Software based on the Naive Bayes algorithm was developed for the student data warehouse. With this software, written in C#, the goal is an early warning system that estimates students' likely success in the university entrance exam, for the benefit of students and their families.

Keywords- data mining, data warehouse, naive bayes, classification, estimation of students' successes, educational data mining (EDM).

I. INTRODUCTION:
In the education systems of many countries, admission to higher education requires a sufficient score on a university entrance exam. In our country, more than half a million students sit the university entrance exam every year, and this number is increasing every year.

There is a large supply-demand imbalance between high schools and universities. This imbalance greatly worries many sections of society, including government officials, school administrators, teachers, students and students' families [1].

Since higher education ensures high income and better job opportunities, it is directly related to individuals' success in business. From the social point of view, it has an important role in a community's attainment of such socially determined aims as a high-quality workforce, equal opportunities, social unity and the development of civil awareness [2].

Students are prepared for the exam from a young age, and competition is therefore growing. Many factors affect success in this exam, such as students' grades, family, classroom and tutoring information, test anxiety level, age and sex. For students to succeed in this exam, these factors need to be analysed. Therefore, the factors affecting students' success in the university entrance examination should be identified, and whether secondary school students will pass or fail the examination should be estimated. Identifying these factors and estimating exam success will play an important role in increasing success.

In Turkey, the records of all primary and secondary school students are processed through the e-school system. Students' records are kept regularly in the e-school information system. Because all student data are stored on digital media, the sheer volume of these data has made traditional query tools ineffective.

As a result, new approaches have emerged, such as extracting valuable knowledge from student databases with data mining techniques. Applying data warehouse and data mining techniques to student data helps to uncover valuable information hidden in the student data warehouse and contributes to improving the quality of education [3]. Many data mining methods and algorithms have therefore been developed to solve different problems.

In recent years, there has been increasing interest in the use of data mining to investigate scientific questions within educational research, an area of
On the student data warehouse, the evaluation was carried out after preprocessing the data.

C. Data Preprocessing

After the student data warehouse was created, the necessary corrections were made: missing values were resolved, noisy data were corrected and the data were transformed. "Structured Query Language" (SQL) commands were used in the data preprocessing step.
In this study, the data preprocessing step consists of four phases:

Data Cleaning

In the data collected from students while forming the data warehouse, some grade and absenteeism features have missing values because the records of students who came from other schools were filled in incompletely. The 9th and 10th grade marks and absenteeism information are missing for 2 students who arrived through the transfer procedure. The records in which the D9, N9 and N10 features were empty were corrected by taking the average of these fields, as shown in the following commands:

- UPDATE ogrveriambar SET n9 = (SELECT AVG(n9) FROM ogrveriambar WHERE n9 IS NOT NULL) WHERE n9 IS NULL;

- UPDATE ogrveriambar SET n10 = (SELECT AVG(n10) FROM ogrveriambar WHERE n10 IS NOT NULL) WHERE n10 IS NULL;

- UPDATE ogrveriambar SET D9 = (SELECT AVG(D9) FROM ogrveriambar WHERE D9 IS NOT NULL) WHERE D9 IS NULL;

Because the D9, N9 and N10 features hold numeric values, each missing value was filled with the average of that feature over the other records.

Data Integration

After the data cleaning and data integration steps, the data were transformed so that the Naive Bayes algorithm could be applied. Because the Naive Bayes algorithm is applied here to categorical rather than numeric data, the numerical records were categorized with SQL commands. Some of the data transformation commands are given below:

- UPDATE ogrveriambar SET n9 = 'n1' WHERE n9 <= 100 AND n9 >= 95

- UPDATE ogrveriambar SET n9 = 'n2' WHERE n9 < 95 AND n9 >= 90

- UPDATE ogrveriambar SET kardes = 'K2' WHERE kardes < 4 AND kardes >= 2

- UPDATE ogrveriambar SET kardes = 'K1' WHERE kardes < 2 AND kardes >= 0

- UPDATE ogrveriambar SET ilkogr = 'I5' WHERE ilkogr <= 5 AND ilkogr >= 4.5

- UPDATE ogrveriambar SET ilkogr = 'I4' WHERE ilkogr < 4.5 AND ilkogr >= 4

- UPDATE ogrveriambar SET agirbaspuan = 'A1' WHERE agirbaspuan <= 100 AND agirbaspuan >= 95

- UPDATE ogrveriambar SET agirbaspuan = 'A2' WHERE agirbaspuan < 95 AND agirbaspuan >= 90

Data Reduction

Data reduction was carried out to identify the features that most affect the result, to reduce processing complexity and to allow accurate generalization. The absenteeism information for each grade level (D9, D10, D11 and D12) reflects a student's overall attendance. Therefore, the "Devam" feature has been
created to reduce the amount of processing by taking the average of the absences in these grades.

In the second stage, the "Successful" and "Unsuccessful" samples in the selected vocational field are tallied in two variables, p and q. For each feature, the p (positive) probability is found by dividing the number of training samples that fall into the positive class and have the same value as the test record for that feature by the number of all positive samples. Likewise, the q (negative) probability is found by dividing the number of training samples that fall into the negative class and have the same value as the test record for that feature by the number of all negative samples.

As a result, if the successful rate is greater than the unsuccessful rate, the prediction is reported to the individual as successful; if it is smaller, as unsuccessful, as shown in Figure 5.

Figure 5: Success results message (p rate and q rate, result: successful)
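
The p and q calculation described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the C# software developed in the study. The function names, the "result" column and the six training rows are hypothetical; only the category codes (n1, n2, K1, K2) follow the transformation commands in the text. Multiplying the per-feature ratios is the usual Naive Bayes step and is assumed here, the class prior is left out because the text does not describe it, and a tie between p and q is arbitrarily reported as "Unsuccessful".

```python
def class_score(train, test_record, label, target="result"):
    """Multiply, over all features, the share of training rows of class
    `label` whose value matches the test record for that feature."""
    rows = [r for r in train if r[target] == label]
    if not rows:
        return 0.0
    score = 1.0
    for feature, value in test_record.items():
        matches = sum(1 for r in rows if r[feature] == value)
        score *= matches / len(rows)
    return score


def predict(train, test_record):
    """Return (label, p, q); a tie is reported as "Unsuccessful" (assumption)."""
    p = class_score(train, test_record, "Successful")
    q = class_score(train, test_record, "Unsuccessful")
    label = "Successful" if p > q else "Unsuccessful"
    return label, p, q


# Hypothetical, already-categorized training rows (not data from the study).
train = [
    {"n9": "n1", "kardes": "K1", "result": "Successful"},
    {"n9": "n1", "kardes": "K2", "result": "Successful"},
    {"n9": "n2", "kardes": "K1", "result": "Successful"},
    {"n9": "n2", "kardes": "K2", "result": "Unsuccessful"},
    {"n9": "n2", "kardes": "K2", "result": "Unsuccessful"},
    {"n9": "n1", "kardes": "K2", "result": "Unsuccessful"},
]

label, p, q = predict(train, {"n9": "n1", "kardes": "K1"})
print(label, round(p, 3), round(q, 3))
```

In this toy data the test record matches two of the three successful rows on each feature, so p = (2/3)(2/3) = 4/9, while no unsuccessful row has kardes = K1, so q = 0. A zero count for a single feature wipes out the whole product; a production implementation would normally add smoothing, which the text does not mention.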
571
570
With the current study, an early warning system that estimates students' likely success in the university entrance exam has been developed.

The students' exam results and the early warning system's estimates of their success, obtained by running the software, are given in Table 1.

Table 1: Estimation results for test data

Total Number of Records            220
Training data                      175
Test data                          45
Correctly Classified Instances     39
Incorrectly Classified Instances   6
Success Percentage (Accuracy)      86.66 %

Table 1 shows that the test produces accurate results for a very large proportion of records (86.66 %).

- In Table 2, the estimation of students' exam success with the Naive Bayes classification algorithm led to the conclusion that a large portion of it (86.66 %) could be explained by the features in the student data warehouse.

- In this study, an early warning system that predicts students' success in the university entrance exam has been developed for students and their families. Students' success status in the university entrance exam was estimated using a Naive Bayes algorithm.

This study provides an opportunity to improve students' success by predicting their examination achievement in advance. Remedial and developmental work can be carried out for students whose examination success is predicted to be low. Therefore, this study is thought to play an important role in increasing students' achievement.

IV. REFERENCES:

[1] Köse, M. R. (1999) "University entrance and high schools", University of Hacettepe Journal of Education, 15: 51-60.

[2] Tomul, E. and Polat, G. (2013) "The Effects of Socioeconomic Characteristics of Students on Their Academic Achievement in Higher Education", American Journal of Educational Research, 1.10: 449-455.

[3] Göker, H., Bülbül, H. I. and Irmak, E. (2013, December) "The Estimation of Students' Academic Success by Data Mining Methods", in Machine Learning and Applications (ICMLA), 2013 12th International Conference on (Vol. 2, pp. 535-539). IEEE.

[8] Gündoğdu, S. (2007) "Genetic algorithms in data mining", M.Sc. Thesis, Institute of Natural and Applied Sciences University of Kocaeli, Kocaeli, 99-101.

[9] Beck, J. E. and Woolf, B. P. (2000) "High-level student modeling with machine learning", in Proc. 5th Int. Conf. Intell. Tutoring Syst., Alagoas, Brazil, pp. 584–593.

[10] Bırtıl, F. S. (2011) "Analysis of girls vocational high school students' academic failure causes with data mining techniques", M.Sc. Thesis, Institute of
Natural and Applied Sciences University of Afyon,
Afyon, 70-71,2.