
Analysis of data mining techniques applied to LMS for personalized education

W. Villegas-Ch¹, S. Luján-Mora²

¹ Universidad de Las Américas, Quito, Ecuador, william.villegas@udla.edu.ec
² University of Alicante, Alicante, Spain, sergio.lujan@ua.es

Abstract - This article describes the models and the use of data mining techniques applied to Learning Management Systems (LMS), which allow institutions to offer the student a personalized education. It considers the ways in which the concepts of educational data mining (EDM) are applied to the information extracted from the LMS. The data from these systems can be evaluated to convert the information collected into useful knowledge for providing an education tailored to the needs of each student. This approach seeks to improve the effectiveness and efficiency of education by recognizing patterns in student performance. This article presents an analysis of the data mining techniques that fit LMS, specifically in terms of a case study applied to the e-learning platform Moodle. The objective is to provide stakeholders with guidance on the use of EDM tools.

Keywords - Data mining, e-learning, Moodle.

I. INTRODUCTION

Learning Management Systems (LMS) store large volumes of information that are not usually fully exploited by educational institutions. Such information can be used for the continuous improvement of the content and structure of the virtual courses contained in the LMS.

Under this scheme, the pedagogical model consists of web-based virtual courses supervised by a tutor. This work is developed with the use of online resources, activities, forums and various shared services. The data are gathered in the LMS and are based on the activities and didactic resources that are offered to the students. When evaluated with data mining, the data obtained make it possible to choose the most appropriate resources and adapt them to the characteristics and personal interests of each stakeholder in the educational context.

In this article, we propose the analysis of the different models and tools that allow the exchange of experiences with regard to how students learn. Section II presents several concepts necessary to address the problem based on previous work; Section III details the pre-processing stage required for the analysis of the data using the main techniques of data mining; Section IV analyzes the results obtained; finally, Section V presents the conclusions and makes recommendations for future investigations.

II. PRELIMINARY CONCEPTS

In this article, several key concepts that help the management and application of different models and data mining tools are taken into account.

A. E-learning and LMS platforms

E-learning consists of education and training through the web [1]. This online teaching model allows the interaction of the user with the material under consideration through the use of various information and communication technology tools (ICTs).

Such an approach is presented as one of the formative strategies that can be used to solve several educational problems [2]. These problems include the geographical isolation of the student, the access to information, and the need for constant improvement.

B. Data mining

Data mining is a process that uses statistical techniques, mathematics, artificial intelligence and machine learning to extract and identify useful information from large databases in order to generate knowledge [3].

Based on the concept of data mining, EDM techniques can be used to extract knowledge from e-learning systems. The objective of these techniques is to look for behavior patterns in terms of system use by both teachers and students [4]. This helps to customize the virtual environment by identifying useful knowledge, using techniques such as prediction, classification, clustering, fuzzy logic, etc.

III. METHOD

For the development of this work, an analysis of the different techniques and models of data mining applied to education was carried out, taking into account the stages of the knowledge discovery in databases process. The academic data of students on an undergraduate program of a university in Ecuador were selected, in order to follow up their results during 2016. The aim was to determine their strengths and weaknesses in each activity proposed by a tutor within a virtual course generated in the Moodle platform.

Moodle is a modular platform which makes it easy to manage the users and courses, as well as to add content. This tool stores all records of the activities undertaken by both teachers and students in a database engine based on MySQL [5].

The data obtained were included in a SQL data repository. A processing and transformation phase was applied to these data in order to obtain clean data. The purpose of obtaining clean data is to be able to apply a number of different data mining techniques, so that the results can be analyzed, evaluated and interpreted.

In the analysis of the proposed EDM techniques, the method that each uses to generate information that is useful

978-1-5090-4886-1/17/$31.00 ©2017 IEEE


for decision making is analyzed. Based on this information, preventive or corrective measures can be taken with regard to the different activities proposed to create a personalized education environment. Personalized education seeks to strengthen students' skills and help reduce their weaknesses by providing resources and activities that align with their needs.

It must be noted that the application of data mining has several formalities that must be taken into account for the representation of data: probabilities, rules, trees and a series of statistical methods [6]. These aspects allow stakeholders to save time on tasks such as finding one or a group of individuals with similar characteristics.

A. Main approaches

There are methods, models, and data mining tools that can be applied to data generated on LMS platforms. These methods require that the data to be processed are in a repository, and that the data are cleaned as mentioned above. For this, the following actions are considered: selection and data processing.

B. Selection of data

This stage seeks to collect data from various sources, whether external or internal, to contribute to data mining. For this work, the data generated in the Moodle platform of the course of Ofimatica III was used as a source, taking into account the grades of all the activities that the students have to carry out during a 16-week period.

The grades were downloaded from the management module offered by the Moodle platform. This file is generated in .xlsx format. The file consists of 1,820 records and 58 attributes; Table 1 shows a summary of the records. The data was imported into SQL, creating a table entitled TB001OFI3 that will serve as the basis for the discovery stage relating to patterns in the students' academic performance. (Table TB001OFI3 is only mentioned, as the number of records does not allow its inclusion in this article.)

Table 1. Categories taken into account for table TB001OFI3

CATEGORIES               # Records
Course participants             34
Records by tasks               585
Records by forums              475
Partial evaluations            236
Online questionnaires          490
Total                         1820

C. Data processing

Data processing is the stage at which the data is cleaned. This implies the elimination of null values, or ones that are not in the proper format. The analysis performed on table TB001OFI3 was a detailed one, taking into account the relevance of the attributes. For this procedure we used SQL statements that allow us to filter the unwanted data and then delete them. This process is automated by means of a stored procedure, so that the data repository can continue to be fed with clean data.

D. Prediction

A predictive model predicts the value of an attribute or label of a dataset in the light of other attributes, referred to as descriptive attributes. From data whose label is known, a relationship is induced between that label and another set of attributes. These relationships serve to predict data whose label or value is unknown [7]. The use of this tool is key when measuring the behavior of a course with respect to the development of activities. For example, consider an activity (a questionnaire) proposed for the seventh week of the Moodle case presented here: the descriptive attributes indicate that students who review the module material for at least five weeks pass the questionnaire with more than 6/10. With these attributes, it is possible to predict, based on each student's access to the platform, the number of students who will exceed the proposed average. This tool allows the tutor of a course to develop a control method that identifies students who are likely to fail an activity or an entire educational period.

E. Grouping

Data mining tools that use this method allow data to form groups that were not previously known. The elements of the variables can be connected to each other according to unknown links. In this way, all variables are treated at the same level, and there is no causality hypothesis. According to the above-mentioned characteristics, the students could be grouped for an analysis of the similarities and differences between them. Applying this tool to the Moodle case, we find that students who solve tasks but do not take part in virtual tutorials are less able to solve exercises than students who both solve the tasks and take part in the tutorials. This tool is very useful whenever we want to identify the different levels of learning on the part of the students, and take corrective measures in certain cases.

The use of models of this type is also very important as they help to provide a personalized education to the different groups generated.

F. Relationship mining

The objective is to discover how the largest possible number of variables in a data set are related. The way to do this is to locate the variables that are most closely related to a single variable of interest, or to discover the strongest relationship between two variables. There are four types of relationship mining: association rule mining, correlation mining, sequence pattern mining, and causal data mining. In association rule mining, the goal is to find rules of the form 'if A then B', so that if a set of variables is found, another variable will have a given value. Thus, we might want to determine the sequence of behaviors of a student who has, as a variable, their interest in learning. In causal data mining, the objective is to find out if one event has been the cause of another event. The
relationships found through relationship mining must satisfy two criteria: statistical relevance and a certain level of interest [8].

G. Discovery through models

In this approach, a model is developed through prediction, clustering or, in some cases, knowledge engineering (using human reasoning methods instead of automated methods). The model can then be used as a component in other analyses such as prediction or data mining. In general, any prediction model can be used to create rules that are used as input variables against a new resulting variable. For example, learning by discovery influences how concepts or content are acquired through an active method, without having primary information about learning content.

Teaching or learning by discovery focuses on the development of research skills on the part of the individual, based in particular on the inductive method, which facilitates the development of this type of learning. Here the tutor proposes a series of exercises, and then the student works to find the criteria or rules necessary to solve them [9]. The consequences of discovery by means of models have a novel effect on the student, since it implies a construction based on the knowledge that the student already possesses when facing a learning situation [10]. This effect derives from the possibility of connecting what has been learned with what the student already knows. In this way, the student establishes meaningful links with the new information, with the possibility of applying this knowledge in new situations. Often, model discovery emphasizes the generalized validation of a prediction model across multiple contexts.

H. Data filtering

Another area of interest in educational data mining is data filtering leading to human interpretation. Humans can make inferences about the data as long as they are presented properly. The methods commonly used in the EDM area are information visualization methods.

The visualizations most used in the educational field are usually different from those used to solve general problems of information visualization. This is due to the specific structure that is usually present in educational data, and the meaning embedded in that structure. Data are filtered for human interpretation for two key reasons: identification and classification. When the data are filtered for identification, they are shown so that a human being can identify known patterns which are, however, difficult to express formally. For example, a classic visualization in educational data mining is the learning curve, which represents the number of opportunities to practice a skill on one axis, and shows the performance (such as the percentage of hits or the time taken to respond) on the other axis [8].

I. Tools for data mining in Moodle

One of the tools with which graphical and interactive monitoring can be done is GISMO¹. It provides a useful visualization of the activities of the students. It allows tutors to carry out a detailed follow-up of their students in terms of such aspects as course attendance, reading the materials, and participation in the forums. This reporting tool offers advantages over native Moodle reports by providing comprehensive visualizations of the entire course [11]. This analysis helps tutors manage data about class learning or data generated from a past period. The benefits offered by this tool can be used by all Moodle users, since it is available as an additional module of the platform.

Another tool analyzed is Weka². This is software oriented to the extraction of knowledge from databases incorporating large volumes of information [12]. Weka's advantage over other tools is that it has been developed under the GPL license. This means that it is a freely distributable program; it can be applied to the data through the interfaces that it offers, which also allow it to be embedded within any application. It contains the tools needed to perform transformations of data, classification tasks, regression, clustering, association and visualization. It is designed as an extensibility-oriented tool. Consequently, adding new functionality is a simple task. The data mining algorithms can be applied from a command-line interface or from its graphical interface.

IV. ANALYSIS OF RESULTS

In general, the data obtained after the application of educational data mining techniques can be observed in Table II, where, from a sample of 528 students, it was found that 73% fulfilled all the proposed activities and reached the established goal of 8-10/10. For the next criterion, 16% of the students fulfilled the minimum base of activities and obtained a performance of between 6 and 8 out of 10. For the final criterion, the 11% of students who did not meet the minimum base of activities found themselves below the expected goal.

¹ http://gismo.sourceforge.net/
² http://www.cs.waikato.ac.nz/ml/weka/
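
As a quick check on the figures reported in Section IV, the following minimal sketch recomputes the percentage shares from the group sizes listed in Table II (385, 86 and 57 students out of 528). The function and variable names are illustrative assumptions and are not part of the original study. The handling of grades that fall exactly on a band boundary (6 or 8) is also an assumption, since the text does not specify it.

```python
# Group sizes as reported in Table II (sample of 528 students).
# The dictionary keys are hypothetical labels chosen for this sketch.
GROUP_SIZES = {
    "meets_all_activities": 385,
    "meets_minimum_base": 86,
    "below_minimum_base": 57,
}


def group_shares(sizes):
    """Return each group's share of the sample as a whole-number percentage."""
    total = sum(sizes.values())
    return {name: round(100 * count / total) for name, count in sizes.items()}


def grade_band(grade):
    """Map a grade out of 10 to one of the three bands used in Table II.

    Assumption: a grade of exactly 8 counts as "8-10" and a grade of
    exactly 6 counts as "6-8"; the source does not state this.
    """
    if grade >= 8:
        return "8-10"
    if grade >= 6:
        return "6-8"
    return "<6"


shares = group_shares(GROUP_SIZES)
print(shares)
```

The rounded shares can be compared directly with the Percentages column of Table II, and `grade_band` could be applied to each student's average to rebuild the three groups from raw grades.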
Table II. Application of relationship mining (activity counts are per period)

Group                                   Sample  # of tasks  # of forums  # of questionnaires  # of tutorials  Percentage
Meets all activities;                      385          16            8                   16              14         73%
  expected average 8-10/10
Meets minimum established base              86         >12           >6                  >12             >12         16%
  of activities; expected average 6-8/10
Does not meet minimum established           57         <12           <6                  <12             <12         11%
  base of activities; expected average <6/10

The technique used for this analysis is relationship mining, where the largest number of variables is used to discover an existing relationship. In the case at hand, the results indicate that students who fulfill the greater number of activities are able to surpass the expected goal. The analyzed EDM techniques can be applied to the data in parallel, so that several results can be obtained from the analysis. For example, the prediction technique can be applied to the results presented in Table II in order to identify, well in advance, students who are at risk of failing the course.

This information allows the course tutor to take steps to prevent the student from leaving the course and to provide the student with a personalized education based on the characteristics of the student, such as knowledge, motivation and attitudes. These tools allow stakeholders to analyze what factors lead the student to make concrete decisions in a learning environment.

Data mining tools were applied to measure learning outcomes, taking into account the types of assessment mechanisms, both in terms of homework and with regard to examinations. These simulations allow the tutor to predict the behavior of a course in terms of the development of an activity, and allow the establishment of groups of students with similar learning characteristics so that they can be offered material that is adapted to their needs. The use of these evaluation elements helps to obtain direct data on the criteria to be measured, without having to process the data further. The management of virtual courses also allows continuous improvement in the use of data, which helps to produce projections or simulations.

V. CONCLUSIONS

In this paper, we analyze the usefulness of the application of data mining techniques in a course management system such as Moodle. The analysis demonstrates how a system that, by default, contains only very general data management tools allows the user to apply data mining in order to obtain useful information in a more efficient and agile way.

Data mining applied to LMS platforms provides useful information from their databases with regard to the interaction between the student and the system in terms of activities performed, hits and failures, grades and knowledge levels. In addition to the ability of these systems to support adaptive or personalized teaching, they make the data mining application process address the specific problem of the students' learning process.

Within the analysis of the data mining tools that allow the user to manage Moodle information, we have GISMO as a monitoring and follow-up system for students, which allows us to extract the data associated with an online course and generates graphical representations that can be explored by the tutors. However, this management system does not have the depth needed for the improvement of the diverse resources available in the courses.

On the other hand, Weka is a tool that allows us greater freedom in the management of courses since, with the use of its libraries, the extraction of knowledge from the database can be deepened. It allows the user to automatically analyze a large amount of data and decide which information is most relevant for fast and effective decision-making. This helps to carry out evaluations of the different activities that have been considered as priorities in the case presented.

The use of these techniques allows tutors to obtain a detailed view of the data relating to each student, thereby personalizing the education provided to the characteristics of each participant. For example, by obtaining detailed information on the different events of a course with statistical values and their progression, a student classifier can be created.

The purpose of classifying the students in this way is to create groups with similar characteristics and to ensure that the resources are adapted to their needs. In the same way, the use of these tools allows the tutor to discover the relationships between the characteristics of the group under consideration and external factors.

REFERENCES

[1] J. M. Boneu, «Plataformas abiertas de e-learning,» International journal of educational technology in higher education (ETHE), vol. 1, nº 1, pp. 12-16, 2007.
[2] E. Dans, «Educación online: plataformas educativas y el dilema de la apertura,» RUSC. Universities and Knowledge Society Journal, vol. 6, nº 1, pp. 1-9, 2009.
[3] Ferruccio, M. A., Alonso, A. I. G., & Gómez, S. X, «Minería de Datos,» in International Conference on Information Systems and Technology Management, vol. 1, nº 1, p. 1, 2004.
[4] Morales, C. R., Soto, S. V., & Martínez, C. H., «Estado actual de la aplicación de la minería de datos a los sistemas de enseñanza basada en web,» in TAMIDA, Sevilla, 2005.
[5] Rice, W. H., & William, H., Moodle, Birmingham: Packt Publishing, 2006.
[6] Klösgen, W., & Zytkow, J. M., Handbook of data mining and knowledge discovery, New York: Oxford University Press, 2002.
[7] Cabena, P., Hadjinian, P., Stadler, R., Verhees, J., & Zanasi, A., Discovering data mining: from concept to implementation, New Jersey: Prentice-Hall, 1998.
[8] Galindo, Á. J., & García, H., «Minería de Datos en la Educación,» Universidad Carlos III, pp. 1-8, 2010.
[9] Jiménez Pierre, C. O., Parra Cervantes, P., & Bascuñan Blaset, N. A., «Modelo de aprendizaje por descubrimiento para alumnos de química básica experimental,» Edusfarm, vol. 1, nº 2, pp. 1-18, 2006.
[10] A. Barrón Ruiz, «Aprendizaje por descubrimiento: principios y aplicaciones inadecuadas,» Enseñanza de las Ciencias, vol. 1, nº 11, pp. 3-11, 1993.
[11] D. V. Mazza R., «Monitoring an Online Course with the GISMO Tool: A Case Study,» Journal of Interactive Learning Research, vol. 18, nº 2, pp. 251-265, 2007.
[12] Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., & Witten, I. H., «The WEKA data mining software: an update,» ACM SIGKDD Explorations Newsletter, vol. 11, nº 1, pp. 10-18, 2009.
