© 2014 IJIRT | Volume 1 Issue 6 | ISSN: 2349-6002

A SURVEY ON FOOD RECOMMENDATION SYSTEM USING DATA MINING CONCEPTS
Rimoni Patel, Mitesh Patel
Department of Computer Engineering, Silver Oak College of Engineering and Technology

Abstract: Studies of general populations consuming diets high in fat, particularly saturated fat, have shown an increased risk of cancer, diabetes and heart disease. The worse their diet, the more diet-related diseases people suffer from. Poor diet is a risk factor for cancer, heart disease, etc. Significant health benefits can be achieved at both the population and the individual level by enabling a shift towards a recommended balanced diet. A recommended healthy diet can act as a weapon to fight against disease. Data mining concepts can be applied and used in food recommendation systems.

Index Terms: Food Recommendation, Recommender System, Data Mining, ID3, C4.5

I. INTRODUCTION

A balanced diet is crucial to maintaining one's physical health, while an unbalanced diet may lead to disease and sickness. The nutrients that need to be ingested vary greatly depending on the disease or illness and on personal preferences. Therefore, how to provide personalized food recommendations according to different personal requirements and diseases is an important issue. The measurement of population intakes of foods and nutrients is central to the science of human nutrition. At present, patterns of dietary intake are studied on a food-by-food basis, given that the base units for analysis using food composition databases are the individual food components of every meal. While the use of individual foods for the study of dietary patterns has served us well, the concept of analyzing food combinations at the meal level is not entirely new. The examination of food combinations at the meal level provides an approach to deal with the complexity and unpredictability of the diet and aims to overcome the limitations of the study of nutrients and foods in isolation [2].

Data mining is the extraction of hidden predictive information from large databases; it is a powerful technology with great potential to help organizations focus on the most important information in their data warehouses. Data mining tools predict future trends and behaviors, helping organizations to make proactive, knowledge-driven decisions. A framework based on different data mining concepts can be used for building a food recommender system. The application of data mining techniques to the World Wide Web, called Web mining, can be used for recommendation systems. Web mining aims to discover useful information and knowledge from the web hyperlink structure, page contents, and usage data.

Recommendation systems represent an effective solution for reducing complexity when searching for information over the Internet. The main characteristic of recommender systems is that they can personalize their interaction to each individual user. Personalization involves the design of systems able to infer the needs of each person and then to satisfy those needs.

A recommendation system serves as an information filtering and customization tool. A food recommender system is an intermediary program with a user interface that automatically and intelligently extracts useful information about people's eating habits according to an individual's needs. According to the algorithms they use, current recommendation systems can be divided into three categories: content-based filtering (CBF) recommendation, collaborative filtering (CF) recommendation, and combined recommendation. By using data mining algorithms, the information filtering processes can be performed prior to the actual recommending process so that the system can be improved.

II. DATA MINING TECHNIQUES

Several major data mining techniques have been developed and used, including association, classification, clustering, prediction and sequential patterns. A brief explanation of these techniques follows. [1]

Association

Association is one of the best known data mining techniques. In association, a pattern is discovered based on the relationship of a particular item to other items in the same transaction. Association searches for relationships between variables. For example, a supermarket might gather data on customer purchasing habits. Using association, the supermarket can determine which products are frequently bought together and use this information for marketing purposes.

Classification

Classification is a classic data mining technique based on machine learning. Classification is used to assign each item in a set of data to one of a predefined set of classes or groups. Classification methods use mathematical techniques such as decision trees, linear programming, neural networks and statistics. For example, we can apply classification in an application that, “given all past records of employees who left the company, predicts which current employees are likely to leave in the future.” In this case, we divide the employees' records into two groups, “leave” and “stay”. And then

IJIRT 101119 INTERNATIONAL JOURNAL OF INNOVATIVE RESEARCH IN TECHNOLOGY 617


we can ask our data mining software to classify the employees into each group.

Clustering

Clustering is a data mining technique that automatically forms meaningful or useful clusters of objects that have similar characteristics. Unlike classification, where objects are assigned to predefined classes, clustering also defines the classes themselves and puts objects in them. For example, the books in a library cover a wide range of topics. We need to keep those books in a way that lets readers take several books on a specific topic without hassle. By using clustering we can keep books that have some kind of similarity in one cluster, i.e. on one shelf, and give it an appropriate name. If readers want to grab books on a particular topic, they need only go to that shelf instead of searching the whole library.

Prediction

Prediction is a data mining technique that discovers relationships between independent variables and relationships between dependent and independent variables. For instance, prediction analysis can be used in sales to predict future profit: if we consider sales as an independent variable, profit could be a dependent variable. Then, based on the historical sales and profit data, we can draw a fitted regression curve that is used for profit prediction.

Sequential Patterns

Sequential pattern analysis is a data mining technique that seeks to discover similar patterns in transaction data over a business period. The uncovered patterns are used for further business analysis to recognize relationships among data.

III. WEB MINING

Web mining aims to extract useful information and knowledge from the web. The web data mining process can be described as five functional modules, namely the data acquisition, data preprocessing, data mining, analysis and evaluation, and knowledge formulation modules. The functions of each module are given below. [8]

A. Data acquisition

In terms of functionality, the data acquisition module selectively obtains data from the outside web environment to provide material and resources for the later data mining. The data sources that the web environment provides include web page data, hyperlink data and the history data of user visit logs. This module is composed of three relatively independent processes: data search, data selection and data collection.

B. Data preprocessing

Data preprocessing mainly processes and reconstructs the source data acquired in the data acquisition phase and builds a data warehouse of related themes to create a basic platform for the data mining process. Data preprocessing is the preparation for data mining and mainly includes data scrubbing, data integration, data conversion, data reduction, etc.

C. Data mining

The data mining module is the core of the whole system. Data mining is the process of extracting patterns from data, which is becoming an increasingly important tool to transform this data into information. It is commonly used in a wide range of profiling practices, such as marketing, surveillance, fraud detection and scientific discovery. Generally speaking, the ultimate goals of data mining are description and prediction. Description uses a comprehensible model to express the attributes and characteristics contained in the data; prediction finds regularities in the attributes according to their existing data values and then estimates a possible attribute value in the future. Classical data mining techniques include classification of users, finding associations between different product items or customer behaviors, and clustering of users.

D. Analysis and evaluation

The analysis and evaluation module analyzes the credibility and effectiveness of the knowledge model obtained by data mining, and condenses the evaluated conclusions into information that supports the management and decision-making of users.

E. Knowledge formulation

The knowledge formulation module takes the knowledge models mined from the web data by the data mining tools and presents them in an appropriate form to facilitate user acceptance and mutual exchange.

IV. METHODOLOGY

A. Clustering Techniques

A food recommendation system can make use of clustering analysis. Clustering analysis is a common technique for statistical data analysis that is used in various fields including machine learning, pattern recognition, and data mining. Clustering is a method of unsupervised learning which groups similar objects, on the basis of their attributes, into the same group, called a cluster. The purpose of clustering is to group the objects based on the principle of maximizing the intraclass similarity and minimizing the interclass similarity.
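
The grouping principle stated for clustering (high intraclass similarity, low interclass similarity) can be made concrete with a small sketch. The Python snippet below is only an illustration and is not taken from any of the surveyed systems: the food items, their (fat, sugar) values and the cluster assignment are hypothetical, and Euclidean distance is assumed as the inverse of similarity. It compares the average distance between foods placed in the same cluster with the average distance between foods placed in different clusters.

```python
from itertools import combinations
from math import dist

# Hypothetical foods described by (fat, sugar) in grams per serving.
foods = {
    "apple": (0.2, 10.0),
    "orange": (0.1, 9.0),
    "butter": (11.0, 0.1),
    "cheese": (9.0, 0.5),
}

# Hypothetical cluster assignment: 0 = sugary foods, 1 = fatty foods.
clusters = {"apple": 0, "orange": 0, "butter": 1, "cheese": 1}


def mean_pair_distance(same_cluster):
    """Average Euclidean distance over all pairs of foods that are in the
    same cluster (same_cluster=True) or in different clusters (False)."""
    distances = [
        dist(foods[a], foods[b])
        for a, b in combinations(foods, 2)
        if (clusters[a] == clusters[b]) == same_cluster
    ]
    return sum(distances) / len(distances)


intra = mean_pair_distance(True)   # small value = high intraclass similarity
inter = mean_pair_distance(False)  # large value = low interclass similarity
print(f"intra-cluster: {intra:.2f}, inter-cluster: {inter:.2f}")
```

For a reasonable clustering the first value should be clearly smaller than the second, which is the case for this toy assignment.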

There are many clustering algorithms, which can be classified into the following categories: partitioning methods, hierarchical methods, density-based methods, grid-based methods, and model-based methods. We can use the Self-Organizing Map and K-means clustering for food clustering analysis. [3]

Self-Organizing Map

The Self-Organizing Map (SOM), commonly known as the Kohonen network, is a type of artificial neural network that is trained using unsupervised learning for the visualization and analysis of high-dimensional data. It was invented by the Finnish professor Teuvo Kohonen. The SOM is composed of map units called nodes or neurons, which are connected to adjacent neurons by a neighborhood relation. The SOM algorithm computes a model, m_i, for each node. These models represent the input space of the training samples and are organized into an n-dimensional ordered map in which similar models are closer to each other in the grid than dissimilar ones.

SOM training can be considered competitive learning: when an input vector is presented to the network, the Euclidean distance to all nodes in the map is calculated, and the node that gives the smallest distance is called the best matching unit (BMU). Then the weights of the BMU and its neighbors are adjusted towards the input vector. This adjustment process stretches the prototypes of the BMU and its topological neighbors towards the input vector. The BMU's local neighborhood is determined by the neighborhood radius, which shrinks with time.

K-means clustering

K-means clustering is one of the best known and most commonly used partitioning clustering methods. The k-means algorithm takes the input parameter k and partitions a set of n objects into k non-overlapping clusters, where k < n. The method aims to minimize the sum of squared distances between each object and the centroid of its cluster, which is called the sum of squared error. The k-means algorithm proceeds as follows. The first step is to randomly select k centroids from the dataset. A centroid represents the mean value of all the objects in its cluster. The next step is to assign each remaining object to the nearest cluster, based on the distance between the object and the cluster mean (the centroid), and then to calculate the new mean for each cluster. This process iterates until the centroid values no longer change, in other words, until the criterion function converges.

B. Classification techniques

Classification can be described as a supervised learning algorithm. Data records are assigned class labels on the basis of prior knowledge of the classes. Classification deals with knowledge extraction from database records and with the prediction of class labels for unknown data records. Classification can be defined as a process in which a given set of data records is separated into training and test data sets. The training data set is required for constructing the classification model, and the test data set for validating it. The constructed classification model is then used for classifying and predicting new data records, which are different from both the training and the test data. Higher classification accuracy requires prior knowledge of the class labels of the data records, which makes attribute selection easier. For higher classification accuracy, a supervised learning algorithm (such as classification) is preferred to an unsupervised learning algorithm (such as clustering). Widespread classification algorithms used in data mining and decision support systems include neural networks, logistic regression and decision trees. Among these, decision tree algorithms are the most frequently used because they are easy to understand and cheap to implement.

Decision tree

Decision trees are generally used to gain information for the purpose of decision-making. Decision trees are classifiers on a target attribute (or class) in the form of a tree structure. The observations (or items) to classify are composed of attributes and their target value. The nodes of the tree can be: a) decision nodes, in which a single attribute value is tested to determine which branch of the subtree applies; or b) leaf nodes, which indicate the value of the target attribute. There are many algorithms for decision tree induction; Hunt's algorithm, CART, ID3, C4.5, SLIQ, and SPRINT are among the most common. Decision tree induction stops once all observations belong to the same class (or the same range in the case of continuous attributes). This implies that the impurity of the leaf nodes is zero. For practical reasons, however, most decision tree implementations use pruning, by which a node is not split further if its impurity measure or the number of observations in the node is below a certain threshold. The most frequently used decision tree algorithms are ID3, C4.5 and CART.

The main advantages of building a classifier using a decision tree are that it is inexpensive to construct and extremely fast at classifying unknown instances. Another appreciated aspect of decision trees is that they can be used to produce a set of rules that are easy to interpret while maintaining an accuracy comparable to other basic classification techniques.

1. ID3 (Iterative Dichotomiser 3)

This decision tree algorithm was introduced in 1986 by J. Ross Quinlan. It is based on Hunt's algorithm. Decision tree construction in general has two phases, tree building and pruning, although ID3 itself performs no pruning. ID3 uses the information gain measure to choose the splitting attribute. It only accepts categorical attributes in

building a tree model. It does not give accurate results when there is noise in the data, so a preprocessing technique has to be used to remove the noise.

To build the decision tree, information gain is calculated for every attribute, and the attribute with the highest information gain is selected as the root node. The possible values of that attribute are represented as arcs. Then all the instances under each arc are tested to check whether they fall under the same class. If all the instances fall under the same class, the node is labeled with that single class name; otherwise a splitting attribute is chosen to classify the instances further. ID3 does not support pruning, and it does not deal with continuous, numeric data. It uses training data sets to make decisions, so any corruption of the training data will result in wrong classification.

2. C4.5

This algorithm is a successor to ID3, also developed by J. Ross Quinlan and also based on Hunt's algorithm. C4.5 handles both categorical and continuous attributes when building a decision tree. In order to handle continuous attributes, C4.5 splits the attribute values into two partitions based on a selected threshold, such that all the values above the threshold go to one child and the remaining values to the other. It also handles missing attribute values. C4.5 uses gain ratio as the attribute selection measure to build a decision tree. This removes the bias of information gain towards attributes with many outcome values.

First, the gain ratio of each attribute is calculated. The root node is the attribute whose gain ratio is maximum. C4.5 uses pessimistic pruning to remove unnecessary branches in the decision tree and so improve the accuracy of classification. C4.5 builds decision trees from a set of training data in the same way as ID3, using the concept of information entropy. At each node of the tree, C4.5 chooses the attribute of the data that most effectively splits its set of samples into subsets enriched in one class or the other. The splitting criterion is the normalized information gain (difference in entropy). The attribute with the highest normalized information gain is chosen to make the decision. The C4.5 algorithm then recurses on the smaller sublists.

ANN (Artificial Neural Networks)

An ANN can be used to predict dietary quality, measured by the HEI (Healthy Eating Index), based on dietary intake, i.e. using foods consumed together in meals or using meals consumed together over the course of a survey [2]. An Artificial Neural Network (ANN) is an assembly of interconnected nodes and weighted links that is inspired by the architecture of the biological brain. Nodes in an ANN are called neurons by analogy with biological neurons. These simple functional units are composed into networks that have the ability to learn a classification problem after they are trained with sufficient data. The simplest case of an ANN is the perceptron model. The perceptron model is a linear classifier that has a simple and efficient learning algorithm. An ANN can have any number of layers. Layers in an ANN are classified into three types: input, hidden, and output. Units in the input layer respond to data that is fed into the network. Hidden units receive the weighted output from the input units. The output units respond to the weighted output from the hidden units and generate the final output of the network. The main advantages of ANNs are that, depending on the activation function, they can perform non-linear classification tasks, and that, due to their parallel nature, they can be efficient and even operate if part of the network fails. The main disadvantage is that it is hard to come up with the ideal network topology for a given problem, and once the topology is decided it acts as a lower bound for the classification error.

CONCLUSION

A food recommendation system can make use of several data mining techniques, such as clustering and classification. This survey covers clustering techniques such as k-means and the self-organizing map, and classification techniques such as decision trees. ANNs can also be used for the analysis of patterns. Decision tree algorithms such as ID3 and C4.5 can be used in food recommendation systems.

REFERENCES

[1] Sita Gupta, Vinod Todwal, “Web Data Mining & Applications,” International Journal of Engineering and Advanced Technology (IJEAT), February 2012.

[2] Aine P. Hearty and Michael J. Gibney, “Analysis of meal patterns with the use of supervised data mining techniques: artificial neural networks and decision trees,” Am J Clin Nutr 2008;88:1632–42.

[3] Maiyaporn Phanich, Phathrajarin Pholkul, and Suphakant Phimoltares, “Food Recommendation System Using Clustering Analysis for Diabetic Patients,” IEEE, 2010.

[4] Choochart Haruechaiyasak, Klong Luang, Pathumthani, Mei-Ling Shy, Shu-Ching Chen, “A Data Mining Framework for Building A Web-Page Recommender System.”

[5] Wei Peng, Juhua Chen and Haiping Zhou, “An Implementation of ID3 - Decision Tree Learning Algorithm.”

[6] Abdullah A. Aljumah, Mohammed Gulam Ahamad, Mohammad Khubeb Siddiqui, “Application of data mining: Diabetes health care in young and old patients,” Journal of King Saud University – Computer and Information Sciences (2013) 25, 127–136.

[7] Shilpa Dharkar, Anand Rajavat, “Performance Analysis of Healthy Diet Recommendation System using Web Data Mining,” International Journal of Scientific & Engineering Research, Volume 3, Issue 5, May 2012.

[8] Xiaocheng Li, Xinliu, Zengjie Zhang, Yongming Xia, Songrong Qian, “Design of Healthy Eating System based on Web Data Mining,” International Conference on Information Engineering, 2011.
