ISSN: 2321-8827
Vol. 1 Issue 5, October - 2013
techniques available assume a handcrafted user profile that encodes the repertoire of the observed user. We will be able to see the challenges arising from this result through a comparative study carried out while creating the evolving system approach and its predictions.

Keywords - Evolving Fuzzy Systems, Fuzzy-Rule Based (FRB) Classifier, User Modelling.

1. Introduction
Knowledge about computer users is very beneficial for assisting them, predicting their actions, and creating and recognizing their behaviour profiles. Recognizing the behaviour profiles of others in real time is significant for different tasks, such as predicting their future actions. Specifically, computer user modelling learns about ordinary users by observing them, in order to build a user profile from experience. However, constructing an effective user profile is problematic because human behaviour is often erratic and sometimes changes as the user's goals change. Several definitions of a user profile exist [1]. It can be defined as a description of the user's interests, characteristics, behaviours and preferences. In recent years, significant work has been carried out on profiles that adapt to the environment and to the new goals of the user. An example of such a profile was proposed in a previous work [2]. We approach this with (EVABCD) Evolving Agent behaviour Classification

commands the users typed in UNIX shells.
1.2 Classifying a new sequence of commands into the predefined profiles.
In summary, our contributions are:
• We discover the limitations and their root causes when creating a user behaviour profile in terms of classifying relevant sequences of events.
• We generalize previously proposed work regarding knowledge about computer users with the increased complexity of user behaviour.
• We extend a new algorithm to execute in environments in which the segmentation of subsequent relevant events is evaluated using a frequency-based method.
• A comparative study of revising an existing hypothesis rather than generating a new hypothesis each time a new instance is observed.
• We detect masquerades (unauthorized work) using knowledge about the computer user.

2. Motivation & Preliminary
Various approaches have been proposed in the literature, from the point of view that a user profile usually changes, to recognize the behaviour of others in real time. The aim is to predict, to coordinate and to recognize future actions, a capacity of the human brain. Different methods have been used to
IJCSRTV1IS050006 www.ijcsrt.org 28
International Journal of Computer Science Research & Technology (IJCSRT)
find out relevant information about computer user behaviour in different computer areas:

2.1 Discovery of navigation patterns
Spiliopoulou and Faulstich present the Web Utilization Miner (WUM), a mining system for discovering interesting navigation patterns in a website. WUM prepares the web log data for mining, and the language MINT mines the aggregated data according to the directives of the human expert [5].

2.2 Computer security
Pepyne et al. [6] describe a method using queuing theory and logistic regression modelling methods for profiling computer users based on simple temporal aspects of their behaviour.

3. Existing Effective Classification Techniques in Observed Classifier
The following are several effective incremental classifiers related to evolving fuzzy-rule-based systems, which learn automatically from observed behaviour and adapt to the distribution of relevant events. These classifiers are implemented using different frameworks.

3.1 Prototype-based supervised algorithm
Learning Vector Quantization (LVQ) is the nearest-prototype learning algorithm [7]. LVQ is considered to be a supervised clustering algorithm in which each weight vector is interpreted as a cluster center. With this algorithm, the number of reference vectors has to be set by the user. Poirier and Ferrieux proposed a method to generate new prototypes dynamically. In Dynamic Vector Quantization (DVQ), however, prototype generation is problematic for applications with noisy data.
Because this research focuses on a command-line interface, an approach is needed that can process streaming data in real time. Capturing sudden and abrupt changes in streaming data requires not only tuning parameters but also changing the structure. Taking these aspects into account, this paper proposes an evolving fuzzy-rule-based system. The approach has important advantages that make it very useful in real environments:

• It can cope with huge amounts of data.
• Its evolving structure can capture sudden and abrupt changes in the streams of data.
• The meaning of its structure is very clear, as we propose a rule-based classifier.
• It is computed in a single pass, so it is efficient and fast.
• Its classifier structure is simple and interpretable.

4. Proposed Methodology
The aim is to improve the performance of the user behaviour profile system by using EVABCD as an agent for predicting masquerades. The prediction is made with the help of a standard platform such as JAVA in a LINUX environment. The primary objectives of the proposed system
can be summarized as follows: creating a user behaviour profile in terms of classifying relevant sequences of events; gaining knowledge about the computer user with the increased complexity of user behaviour; and segmenting subsequent relevant events, evaluated using a frequency-based method.

to batch Bayesian classifier in terms of classification accuracy. However, the proposed incremental Bayesian classifier has very high speed efficiency in comparison to the batch Bayesian classifier.
4.4.2 Storage of the subsequences in a trie
The subsequences of commands are stored in a trie data structure. When a new model needs to be constructed, we create an empty trie and insert each subsequence of events into it, such that all possible subsequences are accessible and explicitly represented. Every trie node represents a subsequence of commands stored in the trie.

The construction of a user profile from a single sequence of commands is done by a three-step process. When a new subsequence is inserted into a trie, the existing nodes are modified and/or new nodes are created. As the dependencies of the commands are relevant in the user profile, the subsequence suffixes (subsequences that extend to the end of the given sequence) are also inserted. Considering the previous example, the first subsequence ({ls → date → ls}) is added as the first branch of the empty trie (Fig.4.4.2.A). Each node is labeled with the number 1, which indicates that the command has been inserted in the node once; this number is enclosed in square brackets. Then, the suffixes of the subsequence ({date → ls} and {ls}) are also inserted (Fig.4.4.2.B). Finally, after inserting the three subsequences and their corresponding suffixes, the completed trie is obtained (Fig.4.4.2.C).

matrix will represent a particular subsequence of commands. In the previous example, the trie consists of nine nodes; therefore, the corresponding profile consists of nine different subsequences, each labeled with its support. It shows the distribution of these subsequences. Once a user behaviour profile has been created, it is classified and used to update the Evolving-Profile-Library, as explained in the next section.

5. Mathematical steps in implementation of each block
A prototype is a data sample (a behaviour represented by a distribution of subsequences of commands) that represents several samples which represent a certain class. The classifier is initialized with the first data sample, which is stored in EPLib. Then, each data sample is classified into one of the prototypes (classes) defined in the classifier. Finally, based on the potential of the new data sample to become a prototype, it could form a new prototype or replace an existing one.

5.1 Calculate the Potential of Data Sample
The potential (P) of the kth data sample is calculated by (1), which represents a function of the accumulated distance between a sample and all the other k-1 samples in the data space [11]. The result of this function represents the density of the data that surrounds a certain data sample.
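Returning to the trie storage of Section 4.4.2, the insertion of a subsequence together with its suffixes can be sketched as follows. This is a minimal illustration, not the authors' implementation: the names `TrieNode`, `insert_with_suffixes`, `build_trie` and `flatten` are hypothetical, and the commands `ls` and `date` are assumed from the worked example as far as it can be read.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class TrieNode:
    """One trie node: a counter plus children keyed by command name."""
    count: int = 0
    children: Dict[str, "TrieNode"] = field(default_factory=dict)


def insert(root: TrieNode, commands: List[str]) -> None:
    """Walk or create the branch for `commands`, incrementing each node on the path."""
    node = root
    for cmd in commands:
        node = node.children.setdefault(cmd, TrieNode())
        node.count += 1


def insert_with_suffixes(root: TrieNode, subsequence: List[str]) -> None:
    """Insert the subsequence and every suffix that extends to its end."""
    for start in range(len(subsequence)):
        insert(root, subsequence[start:])


def build_trie(subsequences: List[List[str]]) -> TrieNode:
    """Start from an empty trie and insert each subsequence with its suffixes."""
    root = TrieNode()
    for subsequence in subsequences:
        insert_with_suffixes(root, subsequence)
    return root


def flatten(root: TrieNode) -> Dict[Tuple[str, ...], int]:
    """Map every root-to-node path (one stored subsequence) to its counter."""
    result: Dict[Tuple[str, ...], int] = {}

    def walk(node: TrieNode, path: Tuple[str, ...]) -> None:
        for cmd, child in node.children.items():
            child_path = path + (cmd,)
            result[child_path] = child.count
            walk(child, child_path)

    walk(root, ())
    return result


# First subsequence of the example and its suffixes.
trie = build_trie([["ls", "date", "ls"]])
profile = flatten(trie)
```

After this single insertion the root-level `ls` node carries the count 2 (once from the full subsequence, once from the suffix `ls`), while every other node carries 1. Each path in `profile` corresponds to one stored subsequence, which is how a trie with nine nodes yields nine subsequences in the text.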
processed by the classifier. The structure of this classifier includes the following steps:

1. Classify the new sample in a class represented by a prototype.
2. Calculate the potential of the new data sample to be a prototype.
3. Update all the prototypes considering the new data sample. This is done because the density of the data space surrounding a certain data sample changes with the insertion of each new data sample. Insert the new data

distance and in using cosine distance. This formula is

$$B_k = \sum_{j=1}^{n} z_k^j b_k^j ; \qquad b_k^j = b_{(k-1)}^j + \sqrt{\frac{(z_k^j)^2}{\sum_{l=1}^{n} (z_k^l)^2}} \qquad (4)$$

$$b_1^j = \sqrt{\frac{(z_1^j)^2}{\sum_{l=1}^{n} (z_1^l)^2}} ; \qquad j = [1, n+1]$$
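To make the role of the accumulator in (4) concrete, the sketch below compares a direct computation of the potential with a recursive one. It rests on assumptions that the text does not state in full: the potential is taken to be 1 / (1 + mean cosine distance to the earlier samples), with the first sample given potential 1, and the accumulator is updated only after the potential of the current sample has been computed. The names `cos_dist`, `batch_potential` and `RecursivePotential` are hypothetical.

```python
import math
from typing import List, Sequence


def cos_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine distance: 1 minus the cosine of the angle between a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def batch_potential(samples: List[Sequence[float]], k: int) -> float:
    """Assumed form: potential of sample k (1-based) from all k-1 earlier samples."""
    if k == 1:
        return 1.0
    z_k = samples[k - 1]
    mean_dist = sum(cos_dist(z_k, z) for z in samples[: k - 1]) / (k - 1)
    return 1.0 / (1.0 + mean_dist)


class RecursivePotential:
    """Same quantity computed in one pass, storing only an n-vector b."""

    def __init__(self, n: int) -> None:
        self.k = 0
        self.b = [0.0] * n  # running sum of unit-normalised earlier samples

    def update(self, z: Sequence[float]) -> float:
        self.k += 1
        norm = math.sqrt(sum(x * x for x in z))
        if self.k == 1:
            potential = 1.0
        else:
            big_b = sum(zj * bj for zj, bj in zip(z, self.b))  # B_k in (4)
            potential = 1.0 / (2.0 - big_b / ((self.k - 1) * norm))
        # For non-negative entries, z_j / norm equals the square-root term in (4).
        self.b = [bj + zj / norm for bj, zj in zip(self.b, z)]
        return potential


samples = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0], [2.0, 1.0, 0.0], [1.0, 1.0, 1.0]]
recursive = RecursivePotential(n=3)
recursive_values = [recursive.update(z) for z in samples]
batch_values = [batch_potential(samples, k) for k in range(1, len(samples) + 1)]
```

Under these assumptions the two computations agree, which is the point of the recursion: the classifier never has to keep the earlier samples, only the vector `b`.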
$$\exists\, i,\ i = [1, \mathit{NumPrototypes}] : \mu_i(z_k) > e^{-1} \qquad (6)$$

For this reason, we calculate the membership function between a data sample and a prototype. It is defined in terms of cosDist(Prot_i, z_k), the cosine distance between a data sample z_k and the ith prototype Prot_i, and σ_i, the spread of the membership function, which also symbolizes the radius of the zone of influence of the prototype. This spread is determined based on the scatter of the data. The equation to get the spread of the kth data sample is defined as:

$$\sigma_i(k) = \sqrt{\frac{1}{k} \sum_{j=1}^{k} \mathrm{cosDist}(Prot_i, z_k)} ; \qquad \sigma_i(0) = 1 \qquad (7)$$

depends on the number of prototypes and its number of attributes. The self-evolving system makes use of a recursive formula to find the potential of a new data sample to form a new prototype or replace an existing prototype. The masquerade user's data samples are compared with the trained prototypes, and the behaviour is detected based on standard deviation.

TABLE 5.6 Total Number of Different Subsequences Obtained

No. of Commands per User    Sub-Sequence Length    No. of Different Sub-Sequences
100                         3                      799
100                         4                      799
100                         5                      799
100                         6                      799
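The counts in Table 5.6 are numbers of different subsequences of a given length. A minimal sketch of how such counts, and the support attached to each subsequence, could be obtained is given below. It assumes that the command sequence is segmented into overlapping windows of fixed length and that support is the relative frequency of a window; the names `segment` and `relative_support` and the sample command list are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def segment(commands: List[str], length: int) -> List[Tuple[str, ...]]:
    """Cut the sequence into overlapping windows of `length` commands."""
    return [tuple(commands[i:i + length]) for i in range(len(commands) - length + 1)]


def relative_support(commands: List[str], length: int) -> Dict[Tuple[str, ...], float]:
    """Relative frequency of each distinct window among all windows."""
    counts = Counter(segment(commands, length))
    total = sum(counts.values())
    return {window: count / total for window, count in counts.items()}


commands = ["ls", "date", "ls", "date", "cat"]
support = relative_support(commands, length=3)
num_different = len(support)  # number of different subsequences of length 3
```

For this five-command list there are three windows of length 3, all different, so each has support 1/3. Repeating the call with other lengths gives one row per length, in the layout of Table 5.6.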
Fig.5.7 Evolution of the Classification Rate during Online Learning with a Subset of UNIX User Data Set

The learning agent combines features of knowledge-based systems, logic systems, case-based reasoning and connectionist-based systems. A suitable classifier algorithm [18] shows significant interfacing with the command-line environment. It is suitable for detecting masquerades when it ignores the fact that user behaviour cannot change and evolve.

5.8 Classification Rate of Different Classifiers in the UNIX Users Environments

Fig.5.8 Data Sample Entity for Different User Prototypes

Applying this technique in ABCD, the subsequences typed by a user are indexed [19] with a number that indicates the moment they were read. This value can be considered as an integer from 1 (the first subsequence read) to the number of subsequences read. Using this value, the age of a subsequence can be calculated. This age value indicates how old a subsequence stored in a user profile is; the formula for calculating this value is shown in Fig.5.8.
engineering tasks as well as for simulation of emerging and evolving biological and cognitive processes. The self-evolving system makes use of a recursive formula to find the potential of a new data sample to form a new prototype or replace an existing prototype. The masquerade user's data samples are compared with the trained prototypes, and the behaviour is detected based on standard deviation.

8. References
[1] D. Godoy and A. Amandi, “User Profiling in Personal Information Agents: A Survey,” Knowledge Eng. Rev., vol. 20, no. 4, pp. 329-361, 2005.
[2] J.A. Iglesias, A. Ledezma, and A. Sanchis, “Creating User Profiles from a Command-Line Interface: A Statistical Approach,” Proc. Int’l Conf. User Modeling, Adaptation, and Personalization (UMAP), pp. 90-101, 2009.
[3] M. Schonlau, W. Dumouchel, W.H. Ju, A.F. Karr, and Theus, “Computer Intrusion: Detecting Masquerades,” Statistical Science, vol. 16, pp. 58-74, 2001.
[4] E. Fredkin, “Trie Memory,” Comm. ACM, vol. 3, no. 9, pp. 490-499, 1960.
[5] A. Wexelblat, “An Environment for Aiding Information-Browsing Tasks,” Proc. of AAAI Spring Symposium on Acquisition, Learning and Demonstration: Automating Tasks for Users, AAAI Press, Menlo Park, 1996.
[6] D.L. Pepyne, J. Hu, and W. Gong, “User Profiling for Computer Security,” Proceedings of the American Control Conference, pp. 982-987, 2004.
[7] T. Kohonen, J. Kangas, J. Laaksonen, and K. Torkkola, “Lvq pak: A Program Package for the Correct Application of
4, pp. 497-508, http://dx.doi.org/10.1109/5326.983933, Nov. 2001.
[15] G. Widmer and M. Kubat, “Learning in the Presence of Concept Drift and Hidden Contexts,” Machine Learning, vol. 23, pp. 69-101, 1996.
[16] D. Kalles and T. Morris, “Efficient Incremental Induction of Decision Trees,” Machine Learning, vol. 24, no. 3, pp. 231-242, 1996.
[17] F.J. Ferrer-Troyano, J.S. Aguilar-Ruiz, and J.C.R. Santos, “Data Streams Classification by Incremental Rule Learning with Parameterized Generalization,” Proc. ACM Symp. Applied Computing (SAC), pp. 657-661, 2006.
[18] N. Kasabov, “Evolving Fuzzy Neural Networks for Supervised/Unsupervised Online Knowledge-Based Learning,” IEEE Trans. Systems, Man and Cybernetics Part B: Cybernetics, vol. 31, no. 6, pp. 902-918, Dec. 2001.
[19] F. Poirier and A. Ferrieux, “Dvq: Dynamic Vector Quantization An Incremental Lvq,” Proc. Int’l Conf. Artificial Neural Networks, pp. 1333-1336, 1991.
[20] P. Angelov and X. Zhou, “Evolving Fuzzy Rule Based Classifiers from Data Streams,” IEEE Trans. Fuzzy Systems: Special Issue on Evolving Fuzzy Systems, vol. 16, no. 6, pp. 1462-1475, Dec. 2008.