
Mrs. V. SUJATHA et al. / (IJAEST) INTERNATIONAL JOURNAL OF ADVANCED ENGINEERING SCIENCES AND TECHNOLOGIES
Vol No. 1, Issue No. 2, 112 - 117

AN APPROACH TO USER NAVIGATION PATTERN BASED ON ANT BASED CLUSTERING AND CLASSIFICATION USING DECISION TREES
Mrs. V. SUJATHA 1*
Computer Science Department
CMS College of Science and Commerce, Coimbatore, India
sujatha.padmakumar@rediffmail.com

Dr. PUNITHAVALLI 2
Computer Application Department
SNS Arts and Science College women's, Coimbatore, India
mpunitha_srcw@yahoo.co.in

Abstract: Web Usage Mining (WUM) is the automatic discovery of user access patterns from web servers. Organizations collect large volumes of data in their daily operations, generated automatically by web servers and collected in server access logs. This data can also provide information on how to restructure a website to serve its users more effectively. This paper presents how to mine the secondary data (web logs) derived from users' interaction with web pages during a certain period of web sessions. First, an ant-based clustering algorithm is applied to the pre-processed log files to extract frequent patterns, which are then displayed in an interpretable format. Second, a decision tree method is used to find and predict the user's navigation behavior. The approach has two phases: the offline phase is based on ant-based clustering and the online phase is based on decision trees. The experimental results indicate that the approach can improve the quality of clustering for user navigation patterns in web usage mining systems. These results can be used for predicting a user's next request in large web sites.

Keywords - Web usage mining, web mining, web log files, classification and navigation pattern

I. INTRODUCTION

Web mining: The term web mining was coined by Etzioni in 1996 to signify the use of data mining techniques to automatically discover web documents and services, uncover general patterns on the web and observe user behavior (viewing, bookmarking and browsing history). Web mining is the process of finding out what users are looking for on the internet. Some users might be looking at only textual data, whereas others might be interested in multimedia data. Web mining is classified into three categories: web content mining, web structure mining and web usage mining.

Web usage mining focuses on techniques that could predict user behavior while the user interacts with the web. As mentioned before, the mined data in this category are the secondary data on the web that result from this interaction. These data can range very widely, but they are generally classified into usage data that resides in the web client, the proxy server and the web servers. The aim of understanding the navigation preferences of visitors is to enhance the quality of electronic commerce (e-commerce) services, to personalize web portals or to improve web structure and web server performance. Web usage mining proceeds in three stages: the first stage is preprocessing, the next stage is pattern discovery and the last stage is pattern analysis.

ISSN: 2230-7818 @ 2010 http://www.ijaest.iserp.org. All rights Reserved. Page 112


Fig 1: General Architecture for Web Usage Mining

II. WEB USAGE MINING ARCHITECTURE

A. Preprocessing

Pre-processing "consists of converting the usage, content, and structure information contained in the various available data sources into the data abstractions necessary for pattern discovery". This step can be broken into at least four sub-steps: Data Cleaning, User Identification, Session Identification and Formatting. In the data cleaning step, unneeded data is deleted from the raw data in the web log files. At least two log file formats exist: Common Log File format (CLF) and Extended Log File format (see [16] for more details). Our university log file consists of these fields: Date, Time, Client IP address, Method, URI stem, Protocol status, Bytes sent, Protocol version, Host, User Agent and Referrer.

B. Pattern Discovery

• Statistical Analysis, such as frequency analysis, mean, median, etc.
• Clustering of users helps to discover groups of users with similar navigation patterns (for example, to provide personalized Web content).
• Classification is the technique of mapping a data item into one of several predefined classes.
• Association Rules discover correlations among pages accessed together by a client.
• Sequential Patterns extract frequently occurring inter-session patterns such that the presence of a set of items is followed by another item in time order.
• Dependency Modeling determines whether there are any significant dependencies among the variables in the Web.

C. Pattern Analysis

Pattern analysis is the final stage of WUM (Web Usage Mining) and involves the validation and interpretation of the mined patterns.
Validation: to eliminate the irrelevant rules or patterns and to extract the interesting rules or patterns from the output of the pattern discovery process.
Interpretation: the output of mining algorithms is mainly in mathematical form and not suitable for direct human interpretation.

III. RELATED WORK

Identifying Web browsing strategies is a crucial step in Website design and evaluation, and requires approaches that provide information on both the extent of any particular type of user behavior and the motivations for such behavior [9]. Pattern discovery from web data is the key component of web mining, and it brings together algorithms and techniques from several research areas. Baraglia and Palmerini (2002) proposed a WUM system called SUGGEST that provides useful information to make web user navigation easier and to optimize web server performance. Liu and Keselj (2007) proposed a novel approach to automatically classifying user navigation patterns and predicting users' future requests, and Mobasher (2003) presents a Web Personalizer system which provides dynamic


recommendations, as a list of hypertext links, to users. Jespersen et al. (2002) [10] proposed a hybrid approach for analyzing visitor click sequences. Jalali et al. (2008a [7] and 2008b [8]) proposed a system for discovering user navigation patterns using a graph partitioning model. An undirected graph based on connectivity between each pair of Web pages was considered, and weights were assigned to the edges of the graph. Dixit and Gadge (2010) [5] presented another user navigation pattern mining system based on graph partitioning. An undirected graph based on connectivity between Referrer and URI pages was presented, along with a preprocessing method for raw web log files and a formula for assigning weights to the edges of the undirected graph. Ant-based clustering, owing to its flexibility and self-organization, has been applied in a variety of areas, from problems arising in e-commerce to circuit design, and from text mining to web mining (Jianbin et al., 2000). This section has reviewed the work proposed in this area, with particular emphasis on web usage mining, clustering and classification. The present work is one more attempt to propose a hybrid system that uses clustering and classification methods to discover users' navigation patterns and analyze them from the server's web log file.

IV. METHODOLOGY

The refined web log files are given as input to the ant-based clustering algorithm to find user behavior patterns; classification using decision trees is then applied to predict the user's next request in large web sites. The hybrid system improves the quality of clustering for user navigation patterns in web usage mining systems.

Figure 2: Offline & Online phase

A. Offline phase of the architecture

This phase consists of two major modules: Data Pretreatment and Navigation Pattern Mining. The phase starts with the primary web-log preprocessing (data pretreatment), which extracts user navigation sessions from the dataset; a clustering algorithm then mines navigational patterns from these sessions.

B. Online phase of the architecture

During the online phase, when a new request arrives at the server, the URL requested and the session to which the user belongs are identified, the underlying knowledge base is updated, and a list of suggestions is appended to the requested page [6].

C. Prediction Engine

The main objective of the prediction engine in this part of the architecture is to classify user navigation patterns and predict users' future requests.

D. Ant-based Clustering

In ant-based clustering and sorting, two related types of natural ant behavior are modeled. When clustering, ants gather items to form heaps. When sorting, ants discriminate between different kinds of items and spatially arrange them according to their properties. Lumer and Faieta


proposed an ant-based data clustering algorithm (shown in Figure 3), which resembles the ant behavior described in [4].

Figure 3: Ant based algorithm

E. Decision Trees

Decision trees are used in classification and prediction. They are a simple yet powerful way of representing knowledge. The models produced by decision trees are represented in the form of a tree structure. A leaf node indicates the class of the examples. Instances are classified by sorting them down the tree from the root node to a leaf node.

Input: training samples, represented by discrete attributes; the set of candidate attributes, attribute-list.
Output: set of classes
Method:
1. Create a node N;
2. If samples are all of the same class C, then return N as a leaf node labeled with the class C;
3. If attribute-list is empty, then return N as a leaf node labeled with the most common class in samples (majority voting);
4. Select test-attribute, the attribute among attribute-list with the highest information gain ratio;
5. Label node N with test-attribute;
6. For each known value ai of test-attribute:
7.    Grow a branch from node N for the condition test-attribute = ai;
8.    Let si be the set of samples in samples for which test-attribute = ai;
9.    If si is empty then
10.      Attach a leaf labeled with the most common class in samples;
11.   Else attach the node returned by generate-decision-tree for si.

Figure 4: Classification using decision trees

V. EXPERIMENTAL EVALUATION

To test the effectiveness of the proposed system, a server web log data file was obtained. The system was tested with data collected over 90 days; for ease of discussion, the experiments presented here are from one day, that is, data collected on 29-12-2009. As mentioned in Section II, the preprocessing is conducted in four steps, namely (i) Cleaning, (ii) User Identification, (iii) Session Identification and (iv) Formatting.

Figure 5: clusters group
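
The four preprocessing steps named in this section can be illustrated with a minimal Python sketch. It is not the authors' implementation: the sample records, the list of ignored file suffixes, the use of the (IP address, user agent) pair to identify a user, and the 30-minute session timeout are all assumptions made for illustration, since the paper does not state these details.

```python
from datetime import datetime, timedelta

# Hypothetical records using a subset of the log fields listed in the paper.
RAW_LOG = [
    {"date": "2009-12-29", "time": "10:00:00", "ip": "116.68.91.110",
     "method": "GET", "uri": "/index.html", "status": 200, "agent": "Mozilla/4.0"},
    {"date": "2009-12-29", "time": "10:00:01", "ip": "116.68.91.110",
     "method": "GET", "uri": "/logo.gif", "status": 200, "agent": "Mozilla/4.0"},
    {"date": "2009-12-29", "time": "10:05:00", "ip": "116.68.91.110",
     "method": "GET", "uri": "/courses.html", "status": 200, "agent": "Mozilla/4.0"},
    {"date": "2009-12-29", "time": "11:30:00", "ip": "116.68.91.110",
     "method": "GET", "uri": "/index.html", "status": 200, "agent": "Mozilla/4.0"},
    {"date": "2009-12-29", "time": "10:02:00", "ip": "117.204.97.156",
     "method": "GET", "uri": "/missing.html", "status": 404, "agent": "Opera/9.0"},
    {"date": "2009-12-29", "time": "10:03:00", "ip": "117.204.97.156",
     "method": "GET", "uri": "/index.html", "status": 200, "agent": "Opera/9.0"},
]

IGNORED_SUFFIXES = (".gif", ".jpg", ".png", ".css", ".js")  # assumed, not from the paper
SESSION_TIMEOUT = timedelta(minutes=30)                     # assumed, not from the paper


def clean(records):
    """Data cleaning: keep successful GET requests for pages only."""
    return [r for r in records
            if r["method"] == "GET"
            and 200 <= r["status"] < 300
            and not r["uri"].lower().endswith(IGNORED_SUFFIXES)]


def identify_users(records):
    """User identification: treat each (IP, user agent) pair as one user."""
    users = {}
    for r in records:
        users.setdefault((r["ip"], r["agent"]), []).append(r)
    return users


def timestamp(record):
    """Combine the Date and Time fields into one datetime value."""
    return datetime.strptime(f'{record["date"]} {record["time"]}', "%Y-%m-%d %H:%M:%S")


def identify_sessions(user_records):
    """Session identification: open a new session after a gap longer than the timeout."""
    sessions = []
    last_seen = None
    for r in sorted(user_records, key=timestamp):
        t = timestamp(r)
        if last_seen is None or t - last_seen > SESSION_TIMEOUT:
            sessions.append([])
        sessions[-1].append(r["uri"])
        last_seen = t
    return sessions


def preprocess(records):
    """Formatting: map each user to a list of sessions, each a list of URIs."""
    return {user: identify_sessions(recs)
            for user, recs in identify_users(clean(records)).items()}


SESSIONS = preprocess(RAW_LOG)
```

With these sample records, the image request and the failed request are dropped during cleaning, and the first user's visits are split into two sessions because the last request arrives more than 30 minutes after the previous one.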
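
The tree-induction procedure of Figure 4 can likewise be sketched in Python. This is a hedged illustration rather than the authors' code: the attribute names (`previous`, `current`), the target `next`, the training samples and the `domains` argument listing the known values of each attribute are hypothetical, and the paper does not say which features feed its decision tree. The comments map each statement to the numbered steps of Figure 4.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total) for n in Counter(values).values())


def gain_ratio(samples, attribute, target):
    """Information gain of `attribute` divided by its split information."""
    groups = {}
    for s in samples:
        groups.setdefault(s[attribute], []).append(s[target])
    remainder = sum(len(g) / len(samples) * entropy(g) for g in groups.values())
    split_info = entropy([s[attribute] for s in samples])
    if split_info == 0:  # attribute takes a single value: no useful split
        return 0.0
    return (entropy([s[target] for s in samples]) - remainder) / split_info


def majority_class(samples, target):
    """Most common class among the samples (majority voting)."""
    return Counter(s[target] for s in samples).most_common(1)[0][0]


def build_tree(samples, attributes, target, domains):
    classes = {s[target] for s in samples}
    if len(classes) == 1:                       # step 2: all samples in one class
        return classes.pop()
    if not attributes:                          # step 3: no attributes left
        return majority_class(samples, target)
    best = max(attributes, key=lambda a: gain_ratio(samples, a, target))  # step 4
    node = {"attribute": best, "branches": {}}  # steps 1 and 5
    remaining = [a for a in attributes if a != best]
    for value in domains[best]:                 # steps 6 and 7
        subset = [s for s in samples if s[best] == value]  # step 8
        if not subset:                          # steps 9 and 10
            node["branches"][value] = majority_class(samples, target)
        else:                                   # step 11: recurse on the subset
            node["branches"][value] = build_tree(subset, remaining, target, domains)
    return node


def classify(tree, sample):
    """Sort a sample down the tree from the root to a leaf."""
    while isinstance(tree, dict):
        tree = tree["branches"][sample[tree["attribute"]]]
    return tree


# Hypothetical training data: previous page, current page and the page requested next.
SAMPLES = [
    {"previous": "P1", "current": "P4", "next": "P11"},
    {"previous": "P1", "current": "P4", "next": "P11"},
    {"previous": "P1", "current": "P8", "next": "P17"},
    {"previous": "P3", "current": "P8", "next": "P17"},
    {"previous": "P3", "current": "P15", "next": "P17"},
]
DOMAINS = {"previous": ["P1", "P3"], "current": ["P4", "P8", "P15", "P9"]}

TREE = build_tree(SAMPLES, ["previous", "current"], "next", DOMAINS)
PREDICTED = classify(TREE, {"previous": "P1", "current": "P4"})
```

In this toy data set `current` has the higher gain ratio, so it becomes the root test. The value `P9` never occurs in the samples, so its branch receives a leaf with the majority class, as in steps 9 and 10 of Figure 4.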


S.No.  IP Address       User Profile                                Unique Pages
1      116.68.91.110    1 → 15 → 3 → 8 → 15 → 17                    {1, 15, 3, 8, 17}
2      117.204.97.156   1 → 8 → 3 → 11 → 15 → 6 → 1 → 17 → 23 → 6   {1, 6, 3, 11, 15, 17, 23}
3      118.94.8.197     1 → 2 → 8 → 6 → 17 → 2                      {1, 2, 8, 6, 17}
4      119.27.62.254    1 → 4 → 9 → 11 → 23                         {1, 4, 9, 11, 23}
5      121.242.52.2     1 → 8 → 13 → 1 → 17                         {1, 8, 13, 17}
6      122.178.146.123  1 → 4 → 11 → 15 → 4                         {1, 4, 11, 15}

Figure 6: Extracted navigation patterns

NP number  Navigational Pattern
1          (P1, P15, P3, P8, P17)
2          (P1, P6, P3, P11, P15, P17, P23)
3          (P1, P2, P8, P6, P17)
4          (P1, P4, P9, P11, P23)
5          (P1, P8, P13, P17)
6          (P1, P4, P11, P15)

Figure 7: Navigation pattern generated by clustering algorithm

A. Output

Figure 8: Effect of cleaning step on raw web log file

Figure 9: Interested user & non interested user

VI. CONCLUSION

This paper presents a new method to extract navigational patterns from web logs. The work focused on grouping the frequently accessed patterns of interested users. It assists web site designers in improving the performance of the web site by giving preference to the patterns navigated by regular interested users. After the clustering is completed, alignment processing is applied to the extracted sequences in each cluster to extract a representative for each cluster. A classification algorithm is used in the online phase to predict the user's future request.

VII. REFERENCES

[1] Abraham. Natural Computation for Business Intelligence from Web Usage Mining, Proceedings of the Seventh International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC 2005), pp. 3-11, 2005.
[2] Baraglia, R. and Palmerini, P. (2002) SUGGEST: A web usage mining system, Proc. of IEEE Int'l Conf. on Information Technology: Coding and Computing, p. 282.
[3] Clark, L., Ting, I.H., Kimble, C., Wright, P. and Kudenko, D. (2006) Combining ethnographic and clickstream data to identify user Web browsing strategies, Information Research, Vol. 11, No. 2.


[4] Deneubourg, J.L., Goss, S., Franks, N., Sendova-Franks, A., Detrain, C. and Chretien, L. (1990) The Dynamics of Collective Sorting: Robot-Like Ants and Ant-Like Robots, From Animals to Animats, Proc. of the 1st Int. Conf. on Simulation of Adaptive Behaviour, pp. 356-363.
[5] Dixit, D. and Gadge, J. (2010) A New Approach for Clustering of Navigation Patterns of Online Users, International Journal of Engineering Science and Technology, Vol. 2, No. 6, pp. 1670-1676.
[6] Handl, J. and Meyer, B. (2002) Improved ant-based clustering and sorting in a document retrieval interface, Proceedings of the Seventh International Conference on Parallel Problem Solving from Nature, Vol. 2439 of LNCS, Springer-Verlag, Berlin, Germany, pp. 913-923.
[7] Jalali, M., Mustapha, M., Mamat, A. and Sulaiman, M.N.B. (2008a) A new clustering approach based on graph partitioning for navigation patterns mining, 9th International Conference on Pattern Recognition, pp. 1-4.
[8] Jalali, M., Mustapha, N., Mamat, A. and Sulaiman, N.B. (2008b) Web user navigation pattern mining approach based on graph partitioning algorithm, Journal of Theoretical and Applied Information Technology, pp. 1125-1131.
[9] Jalali, M., Mustapha, N., Sulaiman, N.B. and Mamat, A. (2008c) A web usage mining approach based on LCS algorithm in online predicting recommendation systems, 12th International Conference Information Visualization, IEEE Computer Society, pp. 302-307.
[10] Jespersen, S.E., Thorhauge, J. and Bach, T. (2002) A Hybrid Approach to Web Usage Mining, Data Warehousing and Knowledge Discovery, LNCS 2454, Y. Kambayashi, W. Winiwarter, M. Arikawa (Eds.), pp. 73-82.
