
A Review of Educational Data Mining Techniques for Classification of Students' Academic Performance


Muhammad Awais
Reg # 1873134
SZABIST H-8, Islamabad

1. Introduction

In higher education institutions and universities, the volume of stored data is so large that retrieving it from large databases and other resources is difficult. Data can be stored as files, text, images, sound and video. Data mining is an advanced technique used for Knowledge Discovery in Databases (KDD), applied here to classifying students by their learning behavior. Data mining has recently attracted growing research interest in education [4, 26], where it is used to predict and classify students' behavior with respect to their academic performance; this field is known as Educational Data Mining (EDM). EDM explores how data mining can be useful in learning systems [18]. Data mining enables researchers to make discoveries more efficiently, and these discoveries support decision making. Various data mining decision algorithms are used to classify students by academic performance. EDM is the process that transforms the raw data stored in educational databases into useful information, which has an important impact on educational research and helps improve students' learning systems.
However, analyzing students' academic performance is a complex problem: students must be classified into different categories using decision-making algorithms such as Naïve Bayes, Neural Networks, Decision Trees, Random Forest, SMO, Fuzzy Logic, Genetic Algorithms, K-nearest neighbor and J48, with tools such as the open-source Weka, among many others.
This study explores the ability of data mining techniques, Knowledge Discovery in Databases (KDD) and decision tree algorithms to predict and classify students. Data mining is a powerful tool that provides information useful for decision making. Prediction and modeling with data mining techniques can improve academic performance and help teachers increase their teaching potential for students whose academic performance is poor.

Literature Review:
EDM is a new area of education research whose methods can also be applied in other fields such as sports and accounting. Researchers in EDM founded the Journal of Educational Data Mining (2009) and an international conference, enabling users to examine data from different settings, classify it, and identify the associations uncovered by the mining process.
Parneet Kaur and Manpreet Singh [1] (2015) proposed a learning-behavior classification technique that categorizes students using the WEKA tool. The data were collected from a high school database and prepared manually: the value domains were checked, the files were converted to the required format keeping the most meaningful values, and classification algorithms were then applied to obtain the desired output. The method was applied to the records of 152 high-school students [1]. Five classifiers were used as predictors (Multilayer Perceptron (MLP), Naïve Bayes, SMO, J48 and REP Tree), and MLP was the most accurate, at 75%. The classifiers were compared in the WEKA Experimenter, which reported 82% (the F-measure). The experiment used only the 152 records, and the variable values were not changed; repeating it with more or fewer student records might show other classifiers to be more accurate [1]. The proposed system should also be applied to the records of students in higher education. The proposed technique addresses the following questions:
• What technique is used to predict students' classification based on their academic performance?
• How can academic data be collected for the decision-making process?
• To what extent does it enable teachers to improve the learning behavior of poorly performing students?
• How can student groupings be extracted to support the analysis of students' academic performance at an early stage?

K Prasada Rao and M.V.P. Chandra Sekhara Rao [2] proposed a simple predictive model for classifying students, using prior higher-education data and the data mining process of Knowledge Discovery in Databases. The data set consists of 200 rural-based Computer Science and Engineering students [2]. The model is intended to predict, and thereby improve, the performance of academically weak students. The research process is divided into two sections: the first predicts students' performance, and the second constructs the models and examines their accuracy. The study compares J48, Naïve Bayes and Random Forest using WEKA 3.7.5. The data were collected in MS Excel format and converted to the required format. During preprocessing, all values were converted, irrelevant features were eliminated, and missing values were handled. Students are placed into four learning categories: Excellent, Good, Average and Slow learners. The methodology was tested on the 200 Computer Science and Engineering students, predicting their class with the tree algorithms and comparing the results. Rao et al. [2] conclude that the Random Forest algorithm gives the most accurate classification of students' learning behavior. A weakness is that the experiment was not repeated with a smaller sample of students, nor with other algorithms that might reach the same accuracy.
The experiment was applied only to CS and Engineering students; whether the same procedure can be used in other domains, such as accounting or transportation, is not stated. The results indicate that Random Forest produces the most efficient and accurate classification of the selected sample, and that as the sample size increases, Random Forest achieves better accuracy than the other two algorithms [2]. In future, social media and internet access may further influence students' learning performance. The proposed classifier addresses the following questions:
• What technique can predict the classification of students by their learning behavior in higher education?
• What types of collected data can be used in decision making to predict the classification?
• How can students' learning behavior be improved using decision trees?
• How can accurate output be produced from sample data by comparing the proposed algorithms?
• How can students' performance be increased using the internet and social media?
• How does it help teachers build good learning behavior in rural-based computer science and engineering students?
• Does it improve only rural students, or can it also improve the academic performance of urban students?

Ahmed Mueen, Bassam Zafar et al. (2016) proposed a simple approach to modeling and predicting students' academic performance using data mining techniques such as Naïve Bayes, Neural Networks and Decision Tree algorithms, applied to undergraduate students [3]. Naïve Bayes gave the best result, with 86% accuracy [3]. EDM is used to support the learning process [18], examine learners' academic behavior [9] and build early-warning systems [17]. Prediction tasks fall into three types: classification, regression and density estimation. Well-known algorithms include Naïve Bayes, Decision Trees, Neural Networks and Support Vector Machines [23]. In the proposed method, data were collected from two undergraduate courses and the experiments were run on this sample; Naïve Bayes produced the required output with 86% accuracy. The data are used in two parts, training and testing, to predict the classification. In the first part, data containing information on all classes are collected; in the second, a decision tree is built whose internal nodes are decision nodes and whose end nodes, called leaves, give the specified class. The best-known tree-construction algorithms are ID3, C4.5 and CART [3, 7]. ID3 chooses the attribute A with the highest information gain, computed from node set S and the subsets S_i induced by the values of A, where E(S) is the entropy of S:

$$\mathrm{Gain}(S,A) = E(S) - I(S,A) = E(S) - \sum_{i} \frac{|S_i|}{|S|}\,E(S_i) \qquad (1)$$

The C4.5 algorithm uses the gain ratio:

$$\mathrm{GainRatio}(S,A) = \frac{\mathrm{Gain}(S,A)}{\mathrm{SplitInfo}(S,A)} \qquad (2)$$

The CART algorithm uses the Gini index, calculated by equation (3), where p_i is the proportion of class i in S:

$$\mathrm{Gini}(S) = 1 - \sum_{i} p_i^{2} \qquad (3)$$

The Naïve Bayes classifier uses Bayes' theorem to find the probability of class c given the attribute values x, equation (4):

$$P(c \mid x) = \frac{P(x \mid c)\,P(c)}{P(x)} \qquad (4)$$

The algorithms were tested on randomly drawn data subsets with 38 attributes. The comparison of the algorithms is shown in the table below, in which Naïve Bayes shows the highest accuracy, 86%.

Classifier   Accuracy   Precision   Recall   Specificity
NB           86.0%      88.4        85.5     86.3
MLP          82.7%      82.5        86.3     79.1
C4.5         79.2%      81.4        78.0     80.2
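To make equations (1) to (4) concrete, the sketch below evaluates them on a toy dataset in plain Python. It is an illustration, not the implementation used in [3]: the six student records and the attribute names `attendance` and `assignments` are hypothetical, the entropy definition E(S) = -Σ p_i log2 p_i and the SplitInfo formula are the standard C4.5 definitions (not spelled out in the text), and the Laplace smoothing in the Naïve Bayes part is an added assumption.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """E(S) = -sum_i p_i * log2(p_i), where p_i is the share of class i in S."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def subsets(rows, labels, attr):
    """Split the class labels of S into subsets S_i, one per value of attribute attr."""
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[row[attr]].append(label)
    return list(groups.values())

def gain(rows, labels, attr):
    """Equation (1): Gain(S, A) = E(S) - sum_i |S_i|/|S| * E(S_i)."""
    n = len(labels)
    expected = sum(len(s) / n * entropy(s) for s in subsets(rows, labels, attr))
    return entropy(labels) - expected

def gain_ratio(rows, labels, attr):
    """Equation (2), with SplitInfo(S, A) = -sum_i |S_i|/|S| * log2(|S_i|/|S|)."""
    n = len(labels)
    split_info = -sum(len(s) / n * math.log2(len(s) / n)
                      for s in subsets(rows, labels, attr))
    return gain(rows, labels, attr) / split_info if split_info > 0 else 0.0

def gini(labels):
    """Equation (3): Gini(S) = 1 - sum_i p_i^2."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def naive_bayes(rows, labels, query, alpha=1.0):
    """Equation (4): pick the class c maximising P(c) * prod_j P(x_j | c).

    P(x) is the same for every class, so it is dropped when comparing classes.
    Log-probabilities avoid underflow; Laplace smoothing (alpha) avoids zeros.
    """
    n = len(labels)
    scores = {}
    for c, n_c in Counter(labels).items():
        score = math.log(n_c / n)  # log P(c)
        for attr, value in query.items():
            n_values = len({row[attr] for row in rows})
            matches = sum(1 for row, label in zip(rows, labels)
                          if label == c and row[attr] == value)
            score += math.log((matches + alpha) / (n_c + alpha * n_values))
        scores[c] = score
    return max(scores, key=scores.get)

# Hypothetical toy records: two categorical attributes and a pass/fail class.
rows = [
    {"attendance": "high", "assignments": "done"},
    {"attendance": "high", "assignments": "missed"},
    {"attendance": "high", "assignments": "done"},
    {"attendance": "low", "assignments": "done"},
    {"attendance": "low", "assignments": "missed"},
    {"attendance": "low", "assignments": "missed"},
]
labels = ["pass", "pass", "pass", "fail", "fail", "fail"]

for attr in ("attendance", "assignments"):
    print(attr, round(gain(rows, labels, attr), 4), round(gain_ratio(rows, labels, attr), 4))
print("Gini(S) =", gini(labels))
print(naive_bayes(rows, labels, {"attendance": "high", "assignments": "missed"}))
```

On this data `attendance` separates the classes perfectly (gain 1.0), while `assignments` gives a gain of about 0.082, so both ID3 and C4.5 would split on `attendance` first.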
In the proposed technique, four stages are performed to classify the students: data gathering, data transformation, classification of the prepared data, and interpretation of the results to predict students' classification by academic performance.

[Figure: Artificial Neural Network (input layer, hidden layer, output layer)]

During data transformation into the required format, WEKA provides various feature-selection options for choosing efficient algorithms. To reduce class imbalance in the data, the SMOTE algorithm was used, and the data were then converted to Weka's ARFF format. A decision tree recursively splits the data into branches to reach the most accurate output [20]. In the proposed method, 10-fold cross-validation was performed for all algorithms, using 10 data subsets of equal size.

In this study, the three classifiers were compared, and the Naïve Bayes classifier gave the most accurate result [3]. The research aims to help teachers identify students with poor learning behavior early, encourage them, and support them with good study material. Many different factors affect students' learning behavior, including family, teachers and personal factors, which can make a student a slow learner. In future work, the proposed method should be applied to a larger data sample, and an effective method should be applied to increase slow learners' learning capability or to overcome these factors. The proposed classifiers were developed to address the following questions:
• What data mining techniques are used for knowledge discovery in databases to predict the classification of slow learners?
• What factors cause a learner to become slow?
• How can an instructor turn a student into a good learner?
• Which decision-making algorithms predict slow learners most accurately?
• How can the data be applied and transformed?
• What are the important features of an institution's learning system?

Alemu K. Tegegne [4] proposed an educational data mining model (2018) to predict students' academic performance in selected Ethiopian higher education institutions. Advances in educational data mining make it easier to mine data and help teachers, students and institutions make decisions. Exploring students' academic performance is a difficult way to identify the factors that cause students to fail, and data mining and knowledge discovery make decision makers more efficient by providing accurate results. In higher learning institutions the volume of data has grown significantly, and collecting, storing and analyzing such large data is a difficult task. Data mining techniques are used to transform the data into meaningful, required information. In this paper, the data were gathered from five Ethiopian universities: Bahirdar University, Wollo University, Gondar University, Debre Markos University and Debre Berhan University [4]. The gathered data have various attributes related to the students (2007 and 2012). The data were processed and transformed using Python 3.0, and the processed data were converted to the format required by the WEKA 5.0 decision tree classification model [4]. The study is divided into two parts, training and testing: 70% of the data are used to train the model and 30% to test it and measure the accuracy of the predicted model. The values used to predict the model of academic behavior are as follows. PSGPA, the preparatory school grade point average, is grouped into three levels: High, Medium and Low. Similarly, the Ethiopian University Entrance Examination score (out of 700 marks) is divided into High, Medium and Low. The other two attributes, first-year first-semester and second-semester academic achievement, classify students as promoted when their GPA is 2.00 or above. A pruning technique removes leaves that contain fewer than the required minimum number of objects. The C4.5 (J48) decision tree is used to predict the classification of students' slow learning behavior, and it achieves 81.4% accuracy [4]. A weakness of this paper is that it does not compare different algorithms to find the most accurate one. This study answers the following questions:
• What is the purpose of data mining techniques in classifying students' academic performance in light of educational problems?
• Which findings are gathered for decision making to classify students' academic performance?
• How can the factors of the prediction model be explored to classify students' academic behavior at the beginning of their studies?
• How are data mining and knowledge discovery used to enhance the decision-making process?
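The 70%/30% holdout protocol and the 2.00 promotion threshold described for [4] can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions: the GPA values are hypothetical, the fixed random seed only makes the split reproducible, and a majority-class baseline stands in for the J48 tree, which is not reimplemented here.

```python
import random
from collections import Counter

def promotion_label(gpa, threshold=2.00):
    """Target class: 'promoted' when the semester GPA is at least 2.00."""
    return "promoted" if gpa >= threshold else "not promoted"

def holdout_split(records, train_fraction=0.7, seed=0):
    """Shuffle a copy of the records, then cut it into training (70%) and test (30%) sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = round(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]

def majority_class(train):
    """Stand-in model (NOT the J48 tree): predict the most frequent training label."""
    return Counter(label for _, label in train).most_common(1)[0][0]

def accuracy(test, predicted_label):
    """Share of test records whose true label equals the predicted label."""
    return sum(label == predicted_label for _, label in test) / len(test)

# Hypothetical semester GPAs for ten students.
gpas = [3.1, 1.8, 2.4, 2.0, 1.5, 3.6, 2.9, 1.9, 2.2, 3.0]
records = [(gpa, promotion_label(gpa)) for gpa in gpas]

train, test = holdout_split(records)
model = majority_class(train)
print(len(train), len(test), model, accuracy(test, model))
```

Swapping `majority_class` for a real decision-tree learner would leave the split and scoring steps unchanged; the baseline also shows how much of a reported accuracy figure is explained by class balance alone.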
Akarshita Tripathi (2019) proposed data mining techniques to increase learning performance. The main aim of education is to deliver quality education, and educational data mining techniques help increase the quality of education and students' academic performance. Data may be stored in different sources, such as databases, CD drives or other resources, and accessing large volumes of data from these different resources is difficult. Educational data mining is used to obtain data from databases using knowledge discovery techniques. Techniques such as decision tree algorithms [6], support vector machines and artificial neural networks are used to classify students' academic data. Educational data mining is used to gather data from different resources and surveys. The Apache PredictionIO machine learning server was used to predict academic performance. Jie Xu et al. (2016) proposed a machine learning technique that predicts students' academic performance using two features [8]. In predicting students' programming performance, five algorithms were applied, and the experimental results showed an accuracy of over 93% [15]. The literature reviewed shows that data mining techniques are useful for knowledge discovery and decision making when classifying students by academic behavior or performance. Prediction is a technique used to anticipate future problems and their solutions from current information; it can be carried out with classification techniques using different decision-making algorithms [5]. The research questions are as follows:
• What techniques are used to predict the classification of students?
• How can stored data be obtained from different resources?
• Which techniques are used to enhance the educational environment and students' learning behavior based on academic performance?
Research: Parneet Kaur (2015)
Method: Five algorithms (MLP, Naïve Bayes, SMO, J48 and REP Tree) tested on the data values; the WEKA workbench is used to compare their accuracy and rank the algorithms.
Domain: Education
Conclusion: Classification is performed on 152 students using classification and prediction techniques. MLP is the most accurate, with 75% accuracy and an F-measure of 82%.

Research: K. Prasada Rao (2016)
Method: Decision tree models used to predict classifications: Random Forest, J48 and the Naïve Bayes classifier. WEKA 3.7.5 is used to rank the algorithms against each other; Random Forest is the most accurate.
Domain: Education
Conclusion: Classifies 200 students into the categories Excellent, Good, Average and Poor. In the comparison, Random Forest shows higher accuracy as the dataset size increases, but it takes more time to build its model than the other algorithms.

Research: Ahmed Mueen (2016)
Method: Data mining classification techniques using Decision Tree, Naïve Bayes and Neural Network algorithms; Weka 3.6 is used to compare the algorithms.
Domain: Education
Conclusion: Data are collected from two semesters of undergraduate students. Three algorithms are compared on 38 attributes over 10 equal-size data subsets; Naïve Bayes gives the most accurate result, 86%.

Research: Alemu K. Tegegne (2018)
Method: ID3 and C4.5 decision trees; Python 3.0 for data analysis and WEKA 5.0 for the classification model.
Domain: Education
Conclusion: Data mining techniques applied to the collected data predict the classification with 81.4% accuracy.

Research: Akarshita Tripathi (2019)
Method: Knowledge discovery and data mining techniques; decision tree algorithms and support vector machines.
Domain: Education
Conclusion: Data mining techniques are reliable for predicting the classification of students' academic performance. Using decision trees, clustering analysis, Naïve Bayes and WEKA tools, students in different institutions can be classified.
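The accuracy, precision, recall, specificity and F-measure figures compared in these studies all derive from a confusion matrix. The minimal sketch below assumes a two-class problem with a hypothetical positive class `slow` and hypothetical predictions; multi-class studies such as [2] would need per-class or averaged versions of the same formulas.

```python
def _ratio(numerator, denominator):
    """Safe division: return 0.0 instead of raising when a denominator is zero."""
    return numerator / denominator if denominator else 0.0

def confusion_counts(y_true, y_pred, positive):
    """Count TP, FP, FN and TN, treating `positive` as the positive class."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1
            else:
                fp += 1
        elif actual == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall (sensitivity), specificity and F-measure."""
    precision = _ratio(tp, tp + fp)
    recall = _ratio(tp, tp + fn)
    return {
        "accuracy": _ratio(tp + tn, tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "specificity": _ratio(tn, tn + fp),
        "f_measure": _ratio(2 * precision * recall, precision + recall),
    }

# Hypothetical labels for 20 students; "slow" is treated as the positive class.
y_true = ["slow"] * 10 + ["normal"] * 10
y_pred = ["slow"] * 8 + ["normal"] * 2 + ["slow"] * 1 + ["normal"] * 9

counts = confusion_counts(y_true, y_pred, positive="slow")
for name, value in classification_metrics(*counts).items():
    print(f"{name}: {value:.3f}")
```

Because precision and specificity depend on false positives while recall depends on false negatives, two classifiers with the same accuracy can rank differently on the other columns, which is why the reviewed studies report several metrics side by side.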
Summary:
Various data mining techniques for knowledge discovery in databases can classify students' behavior and academic performance using different algorithms and decision trees. Every institute wants to enhance educational outcomes using these classification techniques. For students with poor learning behavior, instructors can potentially increase academic performance, and many of the factors that make students slow learners can be reduced. Different algorithms can be used for classification, and their accuracy varies with the size of the dataset. The main objective of the research papers reviewed here is to improve students' learning behavior and categorize students into groups, which helps instructors, students and institutes improve.
In the future, academic performance may be improved through social media and internet access. These data mining techniques can also be applied in other fields such as business and sports.

References:
[1] Parneet Kaur, Manpreet Singh, Gurpreet Singh Josan, "Classification and Prediction based Data Mining Algorithms to Predict Slow Learners in Education Sector," 3rd ICRTC-2015.
[2] K Prasada Rao, M.V.P. Chandra Sekhara Rao, B. Ramesh, "Predicting Learning Behavior of Students using Classification Techniques," International Journal of Computer Science, Vol. 139, No. 7, April 2016.
[3] Ahmed Mueen, Bassam Zafar, Umar Manzoor, "Modeling and Predicting Students' Academic Performance Using Data Mining Techniques," Modern Education and Computer Science, 2016.
[4] Alemu K. Tegegne, Tamir A. Alemu, "Educational Data Mining for Students' Academic Performance Analysis in Selected Ethiopian Universities," Journal of I&KM, Vol. 9, 2018.
[5] Akarshita Tripathi, Amit Kumar, "Analysis of Education Data Mining Techniques," International Journal of Computer Science and Mobile Computing, Vol. 8, January 2019.
[6]
