Web Site: www.ijettcs.org Email: editor@ijettcs.org, editorijettcs@gmail.com Volume 2, Issue 2, March April 2013 ISSN 2278-6856
Abstract:
Actionable patterns are identified by applying different data mining methods. Traditional data mining techniques identify homogeneous features of patterns in data sources. Enterprise applications hold large volumes of data, and mining such data incurs high space and time complexity. In this paper, the authors implement a combined mining approach, namely a location-based search mining algorithm and the cluster kinship search technique, to generate incremental cluster patterns. The informative patterns obtained from these techniques support efficient actionable decision making and allow user interestingness to be evaluated.
1. INTRODUCTION
Real-time complex data contains a vast amount of information, and the required information cannot be mined from it with a single traditional data mining method. To acquire useful knowledge from such a data source, several data mining methods should be integrated. The combined mining approach is implemented here with different data mining algorithms on a mobile transaction data set. This is time series data that represents users, locations, services, and transaction path sequences. Users may access different services from various locations, and sometimes a few users access a similar service from the same location. Similar patterns [11] are identified with the LBSA technique [5] by observing the behavior of the user across various transactions.

To provide effective and efficient location management in mobile communication systems, researchers have recently focused on characterizing mobile users' moving or calling characteristics, such as the call-to-mobility ratio (Markov model). Knowledge of users' moving behaviors can help to allocate personal data, pre-fetch useful information, pre-allocate wireless resources, and design personal paging areas. Moving sequential patterns are one kind of moving behavior, and we first describe systematically the problem of mining them. It can be viewed as a special case of mining sequential patterns with an extended notion of support, which leads to more reasonable pattern discovery.

There are two major differences between mining conventional sequential patterns and mining moving sequential patterns. First, if two items are consecutive in a moving sequence that is a subsequence of another moving sequence, those two items must also be consecutive in the containing sequence. This is because, in a mobile computing environment, we care about what the next move of a mobile user is. Second, in mining moving sequential patterns the support takes into account the number of occurrences within a moving sequence, so the support of a moving sequence is the sum of its numbers of occurrences over all the moving sequences in the moving sequence database.
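The two differences above can be illustrated with a minimal Python sketch. It assumes that a moving sequence is a list of location labels, that a pattern must match as a contiguous run, and that overlapping matches each count as an occurrence. The function names and the toy database are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def count_occurrences(pattern: Sequence[str], sequence: Sequence[str]) -> int:
    """Count the start positions where `pattern` appears as a contiguous run."""
    n, m = len(sequence), len(pattern)
    if m == 0 or m > n:
        return 0
    target = list(pattern)
    return sum(1 for i in range(n - m + 1) if list(sequence[i:i + m]) == target)


def support(pattern: Sequence[str], database: List[Sequence[str]]) -> int:
    """Sum the occurrence counts of `pattern` over every moving sequence."""
    return sum(count_occurrences(pattern, seq) for seq in database)


# Hypothetical moving sequence database: each list is one user's path.
database = [
    ["A", "B", "C", "A", "B"],  # "A" -> "B" occurs twice
    ["A", "B", "A", "B"],       # "A" -> "B" occurs twice
    ["A", "C", "B"],            # "A" and "B" are not consecutive here
]

ab_support = support(["A", "B"], database)
ac_support = support(["A", "C"], database)
```

Here `ab_support` is 4, whereas a conventional sequential pattern count, which allows gaps and counts each containing sequence once, would give 3 for the same pattern.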
The frequent patterns of all users from the various locations are maintained as a list of similarity matrices. Users are dynamic in nature and access services from various locations, so their level of similarity varies when compared with users at other locations. All users accessing the same services have the same similarity value. Location-based services have recently attracted a lot of interest from both industry and research. When using these services, many users may be concerned about giving up one more piece of private information by revealing their exact location, or by disclosing that they have used a particular service. More generally, the association between the real identity of the user issuing a location-based service request and the request itself, as it reaches the service provider, can be considered a privacy threat. The similarity of clusters is defined by the number of time intersecting points. The resulting clusters represent hyper-rectangular approximations of the true subspace clusters. In an optional post-processing step, these approximations can be refined by again applying any clustering algorithm to the points included in the approximation, projected onto the corresponding subspace. The cluster kinship technique [10] is applied to get efficient informative patterns of users from similar
2. BACKGROUND
This section introduces combined mining and its relationship to actionable pattern discovery. The principle of combined mining is to combine multiple data mining methods for the discovery of actionable patterns. Combined cluster patterns can be generated easily, without the need for in-depth mining. Papers [2], [3], and [4] describe how to create cluster patterns and cluster rules.

The most common way to measure the proximity between categorical data is simple matching, which is the number of matching attributes divided by the total number of attributes. The Jaccard coefficient has also been used as a similarity measure between transactions in transactional data. These two non-metric proximity measures, although they reflect the mutual proximity between all pairs of data points, do not give a global measure of the topology of the dataset. Relying on such non-metric proximity measures in the presence of categorical attributes limits the choice of clustering algorithms. Minimum spanning tree hierarchical clustering and hierarchical clustering with group averages are widely used in such situations. The minimum spanning tree algorithm is known to be very sensitive to outliers, while the group average algorithm has a tendency to split large clusters.

With the objective of enabling distance-based clustering methods in data sets with mixed attributes, Tuv and Runger suggested a procedure for mapping categorical variables to numeric scores. The scoring approach explores mutual relationships between the variables in the data set and attempts to preserve the mutual information between all the variables. It uses a supervised contrasting independence clustering method relying on CKST as a supervised learning tool to discover contrasts between the
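The two proximity measures for categorical data described in this section can be sketched in Python as follows. This is a minimal illustration, assuming that records are fixed-length tuples of categorical values and that transactions are sets of service identifiers. The record layout and all labels are hypothetical.

```python
from typing import Hashable, Iterable, Sequence


def simple_matching(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Fraction of attribute positions on which two records agree."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("records must be non-empty and of equal length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


def jaccard(s: Iterable[Hashable], t: Iterable[Hashable]) -> float:
    """Size of the intersection divided by the size of the union."""
    s, t = set(s), set(t)
    union = s | t
    if not union:
        return 1.0  # assumption: two empty transactions count as identical
    return len(s & t) / len(union)


# Hypothetical records: (location, service, time slot).
record_a = ("L1", "S2", "T1")
record_b = ("L1", "S3", "T1")
sm = simple_matching(record_a, record_b)  # agree on 2 of 3 attributes

# Hypothetical transactions: sets of accessed services.
jc = jaccard({"S1", "S2", "S3"}, {"S2", "S3", "S4"})  # 2 shared of 4 total
```

Both values describe a single pair of records, which is why neither measure on its own gives a global picture of the dataset.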
3. RELATED WORK
Analyzing a multidimensional mobile transaction dataset by clustering involves the following steps.
1) Determination of transaction path sequence data: The data can be represented by a real-valued expression matrix I, where Iij is the measured transaction pattern of user i under experimental condition j. The i-th row of the matrix is the vector forming the transaction pattern of user
Figure 2: Prediction of cluster patterns (axes: threshold values, incremental clusters)
The temporal mobile transaction dataset consists of a number of users and the different locations from which they access services. It records the similarity levels of the services available to users in different locations. The threshold values are varied to observe the resulting number of incremental clusters.
4. EXPERIMENTAL WORK
Mobile path sequence data consists of time series and categorical data. A user can access different services from various locations in alternation, which forms multiple patterns. The similarity value for multiple users is calculated from the similarity matrix of the various patterns. As the number of users increases, evaluating the level of similarity takes more time. Similarity is fundamental to the definition of a cluster, so a measure of the similarity between two patterns drawn from the same feature space is essential to most clustering procedures. Because of the variety of feature types and scales, the distance measure must be chosen carefully. It is most common to calculate the dissimilarity between two patterns using a distance measure defined on the feature space. We focus on the well-known distance measures used for patterns whose features are all continuous.
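As one concrete instance of a distance measure on continuous features, the sketch below computes Euclidean distances and a pairwise dissimilarity matrix. The text does not state which measure the authors used, so Euclidean distance is an assumption here, and the pattern values are made up for illustration.

```python
import math
from typing import List, Sequence


def euclidean(p: Sequence[float], q: Sequence[float]) -> float:
    """Euclidean distance between two patterns of equal dimension."""
    if len(p) != len(q):
        raise ValueError("patterns must have the same number of features")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def distance_matrix(patterns: List[Sequence[float]]) -> List[List[float]]:
    """Symmetric matrix of pairwise distances; the diagonal is zero."""
    return [[euclidean(p, q) for q in patterns] for p in patterns]


# Hypothetical patterns with two continuous features each.
patterns = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
D = distance_matrix(patterns)
```

Computing the full matrix requires one distance per pair of patterns, which is consistent with the observation that evaluation time grows with the number of users. When features are on very different scales, they would normally be rescaled before distances are computed.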
Figure 3: Transactions versus number of users
Every user creates more than one transaction, at different times. A transaction created by one user at a particular point in time may differ from other transactions in service, user identity, and location. The incremental cluster patterns depend on the number of user transactions and on the threshold values supplied as input to the clustering algorithm.
Figure 4: Number of users and their similarity of services
A few users may access a similar service at different times. Identifying them helps to predict which service a particular user will use, to determine the service a user is currently working with, and to estimate cost. It also makes it possible to analyze service tariffs and the services most preferred by users.
Table 1: Mobile transactions dataset analysis
Users   Location   Services   Similarity levels   Threshold values
5       3          4          0.1                 0.3
Figure 1: Level of similarity
In this synthetic data set, users that exhibit similar patterns are further evaluated by supplying different affinity threshold values and observing the number of cluster patterns formed. Beyond a certain range of threshold values, the same cluster pattern repeats. This makes prediction of users' cluster patterns possible. As the number of users grows, the number of cluster patterns increases.
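The effect of the affinity threshold on the number of clusters can be illustrated with a deliberately simplified sketch. It is not the authors' CKST procedure. As a stand-in, it groups users into connected components of the graph that links two users whenever their similarity is at least the threshold. The similarity matrix and the threshold values are hypothetical.

```python
from typing import Dict, List


def threshold_clusters(sim: List[List[float]], threshold: float) -> List[List[int]]:
    """Group indices into connected components of the graph sim[u][v] >= threshold."""
    n = len(sim)
    seen = [False] * n
    clusters = []
    for start in range(n):
        if seen[start]:
            continue
        seen[start] = True
        stack, members = [start], []
        while stack:  # depth-first traversal of one component
            u = stack.pop()
            members.append(u)
            for v in range(n):
                if not seen[v] and sim[u][v] >= threshold:
                    seen[v] = True
                    stack.append(v)
        clusters.append(sorted(members))
    return clusters


# Hypothetical symmetric similarity matrix for five users.
S = [
    [1.0, 0.9, 0.6, 0.2, 0.1],
    [0.9, 1.0, 0.5, 0.2, 0.1],
    [0.6, 0.5, 1.0, 0.3, 0.2],
    [0.2, 0.2, 0.3, 1.0, 0.8],
    [0.1, 0.1, 0.2, 0.8, 1.0],
]

# Number of clusters obtained for each threshold value.
counts: Dict[float, int] = {
    t: len(threshold_clusters(S, t)) for t in (0.3, 0.5, 0.7, 0.95)
}
```

With this matrix the cluster count rises from 1 to 2 to 3 to 5 as the threshold increases. Any threshold above 0.9 yields the same five singleton clusters, which mirrors the remark that the same cluster pattern repeats beyond a certain threshold range.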
[10] Amir Ben-Dor, Ron Shamir, Zohar Yakhini, Clustering gene expression patterns, 1999.
[11] Hua Yuan, Junjie Wu, Mining maximal frequent patterns with similarity matrices of data records.
5. CONCLUSION
Combined mining is better than individual data mining methods at retrieving useful knowledge from a dataset with heterogeneous features and at measuring the technical and business interestingness of users. We developed a tool, Actionable Pattern Discovery, that uses the combined mining process to identify incremental cluster patterns [7] as informative patterns. A similarity matrix is built with the LBSA algorithm [6], based on the time distance between similar patterns in the mobile sequence data set, and the similarity value is measured. The Cluster Kinship Search Technique (CKST) is applied to cluster similar patterns, using minimum and maximum affinity threshold values, to obtain incremental cluster patterns. Our experimental results indicate that the level of similarity and the cluster patterns depend on the number of user transactions. The approach makes it possible to assess the interestingness of users' patterns and to predict user behavior. These patterns can be used to take efficient actionable decisions.
6. FUTURE WORK
This process will be extended to other kinds of real-time datasets, such as categorical and temporal datasets, to mine informative patterns and to study the prediction, merging, and interestingness of patterns.
References
[1] L. Cao, Y. Zhao, H. Zhang, D. Luo, and C. Zhang, Flexible frameworks for actionable knowledge discovery, IEEE Trans., Sep. 2010.
[2] B. Lent, A. N. Swami, and J. Widom, Clustering association rules, ICDE, 1997.
[3] H. Zhang, Y. Zhao, L. Cao, and C. Zhang, Combined association rule mining, in Proc. PAKDD, 2008.
[4] H. Zhang, L. Cao, H. Bohlscheid, Combined pattern mining: from learned rules to actionable knowledge, AI, 2008.
[5] Eric Hsueh-Chan Lu, Vincent S. Tseng, Philip S. Yu, Mining cluster-based temporal mobile sequential patterns in location-based service environments, IEEE, 2011.
[6] L. Cao, P. S. Yu, C. Zhang, Y. Zhao, Domain Driven Data Mining, Springer, 2009.
[7] Haixun Wang, Wei Wang, Jiong Wang, Clustering by pattern similarity in large data sets.
[8] Dorina Marghescu, Evaluation of projection techniques using Hubert's statistics, 2007.
[9] J. Wang, G. Karypis, HARMONY: Efficiently mining the best rules for classification, SDM, 2005.