
2015 1st International Conference on Futuristic trend in Computational Analysis and Knowledge Management (ABLAZE-2015)

Investigation and Performance Improvement of Web Cache Recommender System

Priyansha Bangar
Department of Computer Science
TIT Science, Bhopal (M.P), India
priyanshabangar@gmail.com

Kedar Nath Singh
Department of Computer Science
TIT Science, Bhopal (M.P), India
cseknsingh@gmail.com

978-1-4799-8433-6/15/$31.00 ©2015 IEEE

Abstract: A number of large- and small-scale applications are being developed nowadays to fulfil users' needs, and in recent years web-based applications have also grown rapidly. As a result, network performance is affected and the browsing experience becomes slow. Performance improvements to traditional browsing and prefetching techniques are therefore required, so that application speed is optimized and web pages are delivered with high performance. In this paper, pre-fetching techniques are investigated, and a recommendation system is developed for cache replacement. A promising data model for designing the recommendation engine is found in [6]. That system uses the proxy access log for data analysis; the main advantage of the proxy access log is that it contains the entire web page navigation of a targeted user. This data model offers high performance, but its computational complexity is not satisfactory. The traditional data model is therefore modified using a new scheme in which the K-means algorithm is applied for user data personalization. The ID3 algorithm is then used to learn user navigation patterns, and KNN and probability theory are used to predict the upcoming web URLs for pre-fetching. The proposed data model is implemented using the Visual Studio framework, and the performance of the system is evaluated and compared in terms of memory usage, time consumption, accuracy, and error rate. According to the obtained results, the proposed predictive system performs better than the traditional data model.

Keywords: KNN, ID3, K-means, pre-fetching, caching

I. INTRODUCTION

Data mining refers to extracting or discovering knowledge from large amounts of data. The term data mining is in fact a misnomer, since the process is really knowledge mining from data. In a data mining process, the quality and quantity of data may affect the performance of classification and decision-making tasks [1]. A dataset may contain inappropriate or redundant instances, known as noise, which introduce difficulties into the learning process. To overcome this, data pre-processing is performed on the dataset to improve data quality; it includes data cleaning, normalization, transformation, feature extraction [3], [4], and feature selection [4], [5]. In the real world, data has many features, of which only a few may be related to the target concept. There may also be redundancy in the data, where certain features are correlated; in such a case, it may not be necessary to include all of them in modelling. Data mining thus has a large set of applications in data analysis, and one of its essential uses is web mining, where web-related data is evaluated to find essential patterns.

The web is a rich source of information that is spread over the network and extracted using user browsers. However, due to the rapidly increasing demand for web applications, their navigation has become slower than expected. Prefetching techniques are therefore used with intermediate proxy servers. These schemes find the data most frequently accessed by users and pre-fetch the most relevant data that a user will access next.

Such systems therefore require a predictive technique that directs the proxy to pre-fetch the required data before the user requests it, so that the data is retrieved before the user navigates to it. The proposed work thus focuses on finding the best predictive technique for analysing previous browsing history from proxy server log files. The technique discovers the most appropriate access patterns of users from the available proxy log data, and from this user behaviour analysis the system predicts the web pages that will be accessed next.

To design such an efficient and accurate technique, a number of research articles were explored, and the article [6] was found most promising for the proposed system design. In that model, frequent access patterns and the K-means clustering algorithm are used to find the next web pages, which gives acceptable performance for web pre-fetching. This paper extends the approach of [6]. The following section describes the presented concept and its improvements over the traditional approach.

II. PROPOSED WORK

The pre-fetching technique is used to enhance the browsing experience. However, the increasing traffic load on networks degrades network speed, so improvements to current pre-fetching techniques are required. The pre-fetching technique of [6] requires a predictive system, which does not exist in the available data model. In addition, frequent itemset mining generates rules whose computational complexity increases as the number of transactional patterns increases. Thus, the given technique needs to
enhance its predictive accuracy and reduce its space and time complexity. To improve the previously designed technique, the following modifications are suggested:

1. User-wise web access data clustering: In this phase, the K-means clustering algorithm is applied to the proxy server log data, so that the data accessed by all users connected to the proxy server is grouped according to their IP addresses.
2. Preparation of a supervised learning model based on the clustered data: In this phase, the ID3 decision tree algorithm is applied to the clustered data, from which ID3 builds the learning model for prediction.
3. Probability-based next web page selection and recommendation design: The next web page to be accessed is selected using probability estimation and the KNN algorithm.

To justify the proposed modification and demonstrate the effectiveness of the model, both techniques need to be simulated. For this purpose, the simulation architecture shown in Fig. 1 is proposed for a comparative study.

Fig. 1. Simulation architecture

1. Input proxy access log: The historical proxy web access log file is provided as input to the system for analysis.
2. Pre-processing: The input log file is cleaned and transformed, and the selected attributes and their values are extracted from it.
3. Intermediate storage: The extracted web log attributes are stored temporarily in a database table, which is used later for developing the predictive learning model.
4. Proposed data model: The pre-processed log data is used with the proposed learning model to recommend web URLs.
5. Traditional data model: In this phase, the traditional data model is implemented for web URL recommendation.
6. Recommender data: The learned data model is used to predict the URLs of the next web pages to be fetched.
7. Performance analysis: In this phase, the performance of the predictive system is measured and visualized.

Fig. 2. Proposed system

The proposed pre-fetching system is shown in Fig. 2, and its subcomponents are described as follows:

1. Input web access log: The proxy web access log is provided as input to the system.
2. Pre-processing: The input data is cleaned and transformed into a relational data table.
3. Intermediate storage: The pre-processed data is stored in this database table for use by the learning and recommendation engine.
4. K-means clustering: The pre-processed data is consumed first in this phase, where K-means clustering is applied to generate IP-based clusters for learning with the ID3 algorithm.
5. ID3: The clustered data is provided to the ID3 decision tree, which learns from the database.
6. Probability computations: The leaf node probabilities are computed to find the most frequent
URLs from the available historical data for the current input sequence.
7. KNN: The K-nearest neighbour algorithm is applied over the URLs to find the URLs at the smallest distance, which are taken as the prediction for the next prefetch.
8. Recommended data: The system predicts the upcoming web URLs for the user input sequences currently being generated.

III. RESULT ANALYSIS

In this section, the performance of the proposed pre-fetching system is evaluated and compared with that of the traditional system in terms of memory consumption, time consumption, accuracy, and error rate.

Accuracy

In predictive systems, accuracy is the proportion of the input test data that is correctly recognized during classification. It can be evaluated using the following formula:

$$\text{Accuracy}\,(\%) = \frac{\text{Number of correctly recognized test samples}}{\text{Total number of test samples}} \times 100$$

Fig. 3. Accuracy

Fig. 3 compares the two techniques: the X axis shows the amount of training data, the Y axis shows the percentage accuracy, the blue line shows the proposed algorithm, and the red line shows the traditional algorithm.

Table 1. Accuracy (%)
Dataset size | Proposed method | Traditional method | Gain (percentage points)
100 | 71.43 | 67.22 | 4.21
300 | 73.91 | 68.91 | 5
500 | 74.29 | 70.15 | 4.14
800 | 77.37 | 71.08 | 6.29
1000 | 79.27 | 73.81 | 5.46
1500 | 81.42 | 74.76 | 6.66
2000 | 85.43 | 75.09 | 10.34

According to these results, the accuracy of the proposed algorithm is higher than that of the traditional algorithm and increases as the amount of training data increases. The tabular data is given in Table 1; the average gain over the traditional technique is 6.014 percentage points.

Error Rate

The error rate of a predictive algorithm is the proportion of data that is not correctly recognized by the algorithm. It can be computed using the following formula:

$$\text{Error rate}\,(\%) = \frac{\text{Number of incorrectly recognized test samples}}{\text{Total number of test samples}} \times 100$$

or, equivalently,

$$\text{Error rate}\,(\%) = 100 - \text{Accuracy}\,(\%)$$

The error rate of the system is shown in Fig. 4, where the X axis shows the training data size, the Y axis shows the percentage error rate, the blue line shows the proposed algorithm, and the red line shows the traditional algorithm. According to the obtained results, the error rate decreases as the amount of training data increases, and the proposed classifier is more efficient than the traditional classification approach.

Fig. 4. Error rate

Table 2. Error rate (%)
Dataset size | Proposed method | Traditional method | Improvement (percentage points)
100 | 28.57 | 32.78 | 4.21
300 | 26.09 | 31.09 | 5
500 | 25.71 | 29.85 | 4.14
800 | 22.63 | 28.92 | 6.29
1000 | 20.73 | 26.19 | 5.46
1500 | 18.58 | 25.24 | 6.66
2000 | 14.57 | 24.91 | 10.34

As Table 2 shows, the error rate of the predictive algorithm decreases as the amount of training input increases, with an average improvement of about 6.014 percentage points over the traditional predictive technique.

Memory Used

Memory consumption is sometimes also termed space complexity: the amount of main memory required to execute the algorithm successfully. Fig. 5 compares both techniques, with the X axis showing the data size and the Y axis showing the memory consumed in KB; the red line shows the memory consumption of the traditional algorithm as the amount of data increases, and the blue line shows that of the proposed algorithm. According to the obtained results, the proposed algorithm consumes less memory than the traditional algorithm.

Fig. 5. Memory used

Table 3. Memory consumption (KB)
Dataset size | Proposed method | Traditional method | Memory gain
100 | 28911 | 28991 | -80
300 | 29019 | 29918 | 899
500 | 30193 | 31842 | 1649
800 | 31583 | 32842 | 1259
1000 | 32891 | 34928 | 2037
1500 | 33882 | 35837 | 1955
2000 | 35781 | 36911 | 1130

According to the results in Table 3, the proposed method consumes on average about 1264 KB less memory than the traditional approach.

Time Consumption

The time consumption of the system, also termed time complexity, denotes the amount of time required to process the algorithm. Fig. 6 compares the proposed and traditional algorithms: the red line shows the traditional algorithm, the blue line shows the proposed algorithm, the X axis shows the data size, and the Y axis shows the time in milliseconds. According to the obtained results, the proposed algorithm requires less training time than the traditional algorithm.

Fig. 6. Time consumption

Table 4. Time consumption (ms)
Dataset size | Proposed method | Traditional method | Time gain
100 | 4.5 | 5.7 | 1.2
300 | 8.2 | 12.5 | 4.3
500 | 13.5 | 18.4 | 4.9
800 | 29.8 | 36.2 | 6.4
1000 | 39.1 | 48.5 | 9.4
1500 | 53.3 | 63.9 | 10.6
2000 | 69.2 | 75.09 | 5.89

On average, the proposed predictive technique requires about 6.09 milliseconds less training time than the traditional technique.

IV. CONCLUSION

The main motive of the proposed work is to investigate techniques for enhancing web browsing performance and to improve the performance of a traditional pre-fetching technique. To this end, a number of research articles and papers were studied, and two issues were then addressed: first, prediction accuracy, and second, computational cost. A new data model is therefore proposed that offers high prediction accuracy at a lower computational cost.

The proposed data model is a hybrid approach to data analysis and prediction in which three different algorithms are applied in sequence. First, the input web access log of the proxy server is pre-processed and stored in an intermediate storage, where the K-means clustering algorithm is applied to produce IP-address-based clusters. This user-based clustered data is then used to develop ID3-based decision rules. Finally, the next URLs are predicted from the current user's navigation sequence using the KNN (K-nearest neighbour) algorithm. The proposed model is implemented on the Visual Studio platform, and the performance of the system is evaluated in terms of accuracy, error rate, memory consumption, and time consumption. The performance summary of the system is given in Table 5.
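The ID3 and leaf-probability stages summarized above (steps 5 and 6 of the proposed system) can be illustrated with a short Python sketch. It is not the authors' Visual Studio implementation, and the paper does not list the attributes it learns from; the sketch assumes each training row holds a hypothetical K-means cluster label (`cluster`) and the currently viewed URL (`url`), with the next requested URL (`next_url`) as the class. Internal nodes split on the attribute with the highest information gain, as in ID3, and each leaf stores the relative frequency of each next URL as its probability. All names and sample rows are illustrative assumptions.

```python
import math
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """ID3 tree node: internal nodes test `attr`; leaves hold next-URL probabilities."""
    attr: Optional[str] = None
    children: dict = field(default_factory=dict)  # attribute value -> child Node
    probs: dict = field(default_factory=dict)     # leaf only: next URL -> probability


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(n / total * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attr, target):
    """ID3 splitting criterion: entropy reduction obtained by splitting rows on attr."""
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r[target] for r in rows if r[attr] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy([r[target] for r in rows]) - remainder


def build_id3(rows, attrs, target="next_url"):
    """Recursively split on the highest-gain attribute until labels are pure."""
    labels = [r[target] for r in rows]
    if len(set(labels)) == 1 or not attrs:
        # Leaf: relative frequency of each next URL among the rows reaching it.
        return Node(probs={u: n / len(labels) for u, n in Counter(labels).items()})
    best = max(attrs, key=lambda a: information_gain(rows, a, target))
    remaining = [a for a in attrs if a != best]
    node = Node(attr=best)
    for value in {r[best] for r in rows}:
        subset = [r for r in rows if r[best] == value]
        node.children[value] = build_id3(subset, remaining, target)
    return node


def leaf_probabilities(node, sample):
    """Walk the tree for one sample; return {} for an attribute value never seen."""
    while node.attr is not None:
        node = node.children.get(sample[node.attr])
        if node is None:
            return {}
    return node.probs


# Hypothetical training rows: cluster label, current URL, and the URL requested next.
rows = [
    {"cluster": "c1", "url": "/home", "next_url": "/news"},
    {"cluster": "c1", "url": "/home", "next_url": "/news"},
    {"cluster": "c1", "url": "/home", "next_url": "/sports"},
    {"cluster": "c2", "url": "/home", "next_url": "/mail"},
    {"cluster": "c1", "url": "/news", "next_url": "/sports"},
    {"cluster": "c2", "url": "/news", "next_url": "/mail"},
]
tree = build_id3(rows, ["cluster", "url"])

if __name__ == "__main__":
    probs = leaf_probabilities(tree, {"cluster": "c1", "url": "/home"})
    print(max(probs, key=probs.get), probs)
```

With these six rows the root splits on `cluster` (about 0.92 bits of gain, against about 0.25 bits for `url`), and a user in cluster c1 viewing /home is predicted to request /news next with probability 2/3 and /sports with probability 1/3. An unseen cluster yields an empty distribution, so a deployed system would need some fallback, such as the globally most frequent URL.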
Table 5. Performance summary
S. No. | Parameter | Proposed method | Traditional method
1 | Accuracy | High | Low
2 | Error rate | Low | High
3 | Memory consumption | Low | High
4 | Time consumption | Low | High
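The qualitative summary in Table 5 rests on the per-dataset values in Tables 1 to 4. The following Python sketch is only an illustration and not part of the authors' system: it encodes the accuracy and error-rate definitions of Section III (error rate = 100 - accuracy, which the values in Tables 1 and 2 satisfy), recomputes the Table 2 error rates from the Table 1 accuracies, and averages the accuracy and time gains. The function and variable names are hypothetical.

```python
from statistics import mean

# Per-dataset results transcribed from Table 1 (accuracy, %) and Table 4 (time, ms).
SIZES = [100, 300, 500, 800, 1000, 1500, 2000]
ACC_PROPOSED = [71.43, 73.91, 74.29, 77.37, 79.27, 81.42, 85.43]
ACC_TRADITIONAL = [67.22, 68.91, 70.15, 71.08, 73.81, 74.76, 75.09]
TIME_PROPOSED = [4.5, 8.2, 13.5, 29.8, 39.1, 53.3, 69.2]
TIME_TRADITIONAL = [5.7, 12.5, 18.4, 36.2, 48.5, 63.9, 75.09]


def accuracy_percent(correct: int, total: int) -> float:
    """Accuracy (%) = correctly recognized test samples / total test samples * 100."""
    return correct / total * 100


def error_rate_percent(accuracy: float) -> float:
    """Error rate (%) = 100 - accuracy (%)."""
    return 100 - accuracy


def mean_difference(first, second) -> float:
    """Average of first[i] - second[i] over all dataset sizes."""
    return mean(a - b for a, b in zip(first, second))


if __name__ == "__main__":
    err_proposed = [round(error_rate_percent(a), 2) for a in ACC_PROPOSED]
    err_traditional = [round(error_rate_percent(a), 2) for a in ACC_TRADITIONAL]
    for size, ep, et in zip(SIZES, err_proposed, err_traditional):
        print(f"{size:>5}  error proposed={ep:.2f}%  traditional={et:.2f}%")
    print(f"mean accuracy gain:  {mean_difference(ACC_PROPOSED, ACC_TRADITIONAL):.3f} points")
    print(f"mean error decrease: {mean_difference(err_traditional, err_proposed):.3f} points")
    print(f"mean time saving:    {mean_difference(TIME_TRADITIONAL, TIME_PROPOSED):.3f} ms")
```

Run as a script, it prints error rates identical to Table 2, a mean accuracy gain and error-rate decrease of 6.014 points, and a mean time saving of about 6.099 ms, which the text reports as 6.09 ms.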

According to the evaluated results, the performance of the proposed data model is acceptable owing to its highly accurate predictions, reduced error rate, and low time and space complexity. In future work, the performance of the proposed predictive system may be enhanced further by applying other classification techniques.
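Step 7 of the proposed system in Section II applies KNN over URLs to find the candidates at the smallest distance, but the paper does not state the distance measure, the value of K, or how the neighbours vote. The Python sketch below is therefore one hypothetical reading rather than the authors' implementation: it treats each logged session as a set of visited URLs, measures the Jaccard distance between the current session and each historical session, and lets the K nearest sessions vote for the URL that followed them, weighted by similarity. The function names, the sample sessions, and the defaults `k=3` and `top_n=2` are assumptions.

```python
from collections import Counter


def jaccard_distance(a: set, b: set) -> float:
    """1 - |A ∩ B| / |A ∪ B|; 0.0 means the two URL sets are identical."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def knn_next_urls(current: set, history: list, k: int = 3, top_n: int = 2) -> list:
    """Rank candidate next URLs for prefetching.

    current: URLs already visited in the user's current session.
    history: (visited_urls, next_url) pairs mined from the proxy log.
    """
    # The k historical sessions closest to the current one.
    nearest = sorted(history, key=lambda h: jaccard_distance(current, h[0]))[:k]
    votes = Counter()
    for visited, next_url in nearest:
        # Closer sessions cast heavier votes (similarity = 1 - distance).
        votes[next_url] += 1.0 - jaccard_distance(current, visited)
    return [url for url, score in votes.most_common(top_n) if score > 0]


# Hypothetical sessions extracted from a proxy log.
history = [
    ({"/home", "/news"}, "/sports"),
    ({"/home", "/news", "/weather"}, "/sports"),
    ({"/home", "/mail"}, "/inbox"),
    ({"/shop", "/cart"}, "/checkout"),
]

if __name__ == "__main__":
    print(knn_next_urls({"/home", "/news"}, history))
```

For the current session {/home, /news}, the three nearest sessions vote for /sports (similarity 1.0 + 0.67) ahead of /inbox (0.33), so the sketch returns ['/sports', '/inbox'] as the URLs to prefetch. Weighting votes by similarity rather than simply counting them is a design choice of this sketch only.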

REFERENCES

[1] Der-Chiang Li and Chian-Wen Liu, "Extending Attribute Information for Small Dataset Classification," IEEE Transactions on Knowledge and Data Engineering, vol. 24, no. 3, pp. 452-464, March.
[2] J. Dy and C. Brodley, "Feature Subset Selection and Order Identification for Unsupervised Learning," Proc. 17th Int'l Conf. Machine Learning, pp. 247-254, 2000.
[3] H. Liu and H. Motoda, Feature Extraction, Construction and Selection: A Data Mining Perspective, Kluwer Academic Publishers, 1998.
[4] H. Motoda and H. Liu, "Feature Selection, Extraction and Construction," Proc. Sixth Pacific-Asia Conf. Knowledge Discovery and Data Mining, pp. 67-72, 2002.
[5] H. Liu and H. Motoda, Feature Selection for Knowledge Discovery and Data Mining, Kluwer Academic Publishers, Norwell, MA, USA, pp. 167-178, 1998.
[6] N. Singh, A. Panwar, and R. Shringar Raw, "Enhancing the Performance of Web Proxy Server through Cluster Based Prefetching Techniques," 978-1-4673-6217-7/13/$31.00 ©2013 IEEE.
