
JOURNAL OF COMPUTING, VOLUME 3, ISSUE 11, NOVEMBER 2011, ISSN 2151-9617 HTTPS://SITES.GOOGLE.COM/SITE/JOURNALOFCOMPUTING WWW.JOURNALOFCOMPUTING.ORG

A Vertical Format to Mine Regular Patterns in Incremental Transactional Databases


G. Vijay Kumar, M. Sreedevi, NVS Pavan Kumar
Abstract: In real life, transactional databases grow incrementally, and the occurrence behaviour of patterns may change significantly when the database is updated. Mining regular patterns from scratch each time a small set of transactions is added to the database is undesirable, so mining regular patterns in incremental transactional databases is an important problem in data mining. Although some efforts have been made to find regular patterns in incremental transactional databases, no method has yet been proposed that uses the vertical format with one database scan. In this paper we therefore propose a new method, called the IncVDRP method, to generate the complete set of regular patterns in incremental transactional databases for a user-given regularity threshold. Our experimental results are promising.

Index Terms: Incremental transactional database, regular patterns, regularity threshold, vertical format.

1 INTRODUCTION
Discovering interesting patterns in transactional databases plays an important role in the Knowledge Discovery in Databases (KDD) process. Frequent pattern mining [4], [5] is one of the active research areas in data mining; it mines the patterns that appear frequently in a database. A good number of algorithms have been proposed so far to mine frequent patterns in transactional databases as well as in incremental transactional databases. However, the frequency of a pattern does not always represent its significance, so there are other interesting kinds of patterns to mine from databases. The temporal regularity of a pattern can also be an important measure in applications such as web page sequences, retail markets, stock markets, network monitoring and gene data analysis. For example, in a retail market the stockist may be more interested in customers who purchase the same type of goods regularly than in those who purchase goods in bulk during a specific period. Similarly, in a stock market the set of stocks that rise at regular intervals may be of interest to stock brokers and traders. Pattern regularity can therefore be a useful metric in data mining.

A regular pattern is defined as a pattern that occurs at regular intervals in a database, within a user-given regularity threshold. Recently, Tanbeer et al. [8] introduced a new problem, regular patterns, which follow a temporal regularity in their occurrence behaviour. They mined regular patterns in a transactional database using the Regular Pattern tree (RP-tree) and the FP-growth algorithm [4]. They also introduced the same problem in incremental transactional databases and mined regular patterns using the Incremental Regular Pattern Tree (IncRT) [9] and the FP-growth algorithm. In both transactional and incremental transactional databases they mined regular patterns from a horizontal database. In this paper we use the DB table from [9] and propose a method, called the IncVDRP method, to mine regular patterns using the vertical data format [10], [11], [12]. We define a regular pattern in an incremental transactional database using the vertical format so that it satisfies the downward closure property [1]. Our experimental results show that mining regular patterns with the IncVDRP method is efficient in both memory and time.

The rest of the paper is organized as follows. Section 2 introduces the related work. Section 3 gives the problem definition of regular pattern mining in an incremental transactional database. Section 4 describes the mining process of the IncVDRP method, Section 5 presents our experimental results and Section 6 concludes the paper.

G. Vijay Kumar is with School of Computing, K L University, Guntur, Andhra Pradesh, India.
M. Sreedevi is with School of Computing, K L University, Guntur, Andhra Pradesh, India.
NVS Pavan Kumar is with School of Computing, K L University, Guntur, India.

2 RELATED WORK

In data mining, one of the most important techniques is association rule mining, first introduced by Agrawal et al. [1]. It extracts frequent patterns, correlations and associations among sets of items in transactional databases. It is a two-step process. In the first step it finds all frequent itemsets that satisfy the minimum support threshold. In the second step it generates, from the frequent itemsets, the association rules that satisfy the minimum confidence. Several algorithms have been developed so far to generate association rules. The classical algorithm is the Apriori algorithm [2], proposed by R. Agrawal and R. Srikant in 1994 for mining frequent itemsets. The key idea of the algorithm is to begin by generating frequent 1-itemsets, then generate frequent 2-itemsets, then frequent 3-itemsets and so on until frequent itemsets of all sizes have been generated. The main drawback of the classical Apriori algorithm is that it needs repeated scans


to generate the candidate sets. The frequent pattern tree and the FP-growth algorithm [4] were introduced by Han et al. to mine frequent patterns without candidate generation.

Periodic patterns [3], [6] and cyclic patterns [7] are also closely related to regular patterns. Periodic pattern mining in time-series data focuses on the cyclic behaviour of patterns in either the whole or some part of the time series. Although periodic pattern mining is closely related to our work, it cannot be applied directly to mine regular patterns from an incremental transactional database because it processes either time-series or sequential data.

Tanbeer et al. [9] introduced regular patterns that follow a temporal regularity in an incremental transactional database, using the IncRT and the FP-growth mining technique [4]. They constructed the IncRT by scanning the DB once. The IncR-table consists of five fields (i, r, tl, m, p), where i is the item name, r is the regularity of i, tl is the current last transaction of i, m is the modification indicator for item i and p is a pointer to the IncRT. They treated the new transactions of db+ together with the transactions of DB as the updated database UDB. Whenever a new itemset is updated, the modification indicator field m is reset. The IncRT is mined recursively by constructing the conditional pattern bases and the corresponding conditional trees for each regular item in the IncR-table.

3 PROBLEM DEFINITION
Let $I = \{i_1, i_2, \ldots, i_n\}$ be a set of items. A set $X = \{i_l, \ldots, i_q\} \subseteq I$, with $l \le q$ and $l, q \in [1, n]$, is called a pattern or an itemset. A transaction $t = (tid, Y)$ is a tuple where $tid$ is a transaction-id and $Y$ is a pattern. The set of transactions $T = \{t_1, t_2, \ldots, t_m\}$ is a transactional database DB over $I$, with $m = |DB|$ the size of DB. If $X \subseteq Y$, then $X$ occurs in $t$ and the tid of $t$ is denoted $t_j^X$, $j \in [1, m]$. Therefore, $T^X = \{t_j^X, \ldots, t_k^X\}$, with $j \le k$ and $j, k \in [1, m]$, is the set of all transactions in which $X$ occurs in DB.

Let $t_j^X$ and $t_{j+1}^X$, $j \in [1, (m-1)]$, be two transactions in which pattern $X$ appears successively. The number of transactions, or the time difference, between $t_{j+1}^X$ and $t_j^X$ is defined as a period of pattern $X$, say $p^X$ (i.e., $p^X = t_{j+1}^X - t_j^X$, $j \in [1, (m-1)]$). For simplicity of period computation, a null transaction with no items is considered at the beginning of DB, with tid $t_f = 0$, where $t_f$ denotes the first transaction. Similarly, $t_l$ is the last, or $m$-th, transaction in DB, i.e., $t_l = t_m$. For a given $T^X$, let $P^X = \{p_1^X, \ldots, p_r^X\}$ be the set of all periods of $X$, where $r$ is the total number of periods of $X$ in $T^X$. The regularity of $X$ is then $reg(X) = \max(p_1^X, \ldots, p_r^X)$, because the largest occurrence period of a pattern provides the upper limit of its periodic occurrence characteristic.

A pattern is called a regular pattern if its regularity is not more than a user-given maximum regularity threshold called max_reg, $\lambda$, with $1 \le \lambda \le |DB|$. Let $db^+$ represent the set of transactions added to DB. The updated database, denoted UDB, is obtained as $DB \cup db^+$. Given DB, $db_i^+$ (where $i$ is the number of updates) and $\lambda$, the user-given regularity threshold, incremental regular pattern mining is to discover the complete set of regular patterns having regularity not more than $\lambda$ in the UDB.

4 MINING REGULAR PATTERNS
Both the Apriori algorithm and the FP-growth algorithm mine frequent patterns in the horizontal data format (i.e., {TID : itemset}), where TID is a transaction-id and itemset is the set of items in transaction TID. The data can also be represented in {item : TID-set} format, where item is an item name and TID-set is the set of transactions containing the item. This is known as the vertical data format. We mine regular patterns in an incremental transactional database using the vertical format. Table 1 contains the transactional database DB. With one database scan, it is converted from the given horizontal format into the vertical format shown in Table 2.

Table 1. Transactional Database DB

TID   Itemsets
1     a, d
2     b, c, a, e
3     a, e, b, f
4     a, e, b, c
5     a, b, e, f
6     b, c, d
7     c, e, d
8     d, e, f
9     d, c, b
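The one-scan conversion from horizontal to vertical format can be illustrated with a short Python sketch. It is only an illustration on the data of Table 1: the function name `to_vertical` and the dictionary layout are assumptions made here, not part of the published method, whose implementation is in Visual C++.

```python
from collections import defaultdict

# Table 1 in horizontal format: TID -> items of that transaction.
DB = {
    1: ["a", "d"],
    2: ["b", "c", "a", "e"],
    3: ["a", "e", "b", "f"],
    4: ["a", "e", "b", "c"],
    5: ["a", "b", "e", "f"],
    6: ["b", "c", "d"],
    7: ["c", "e", "d"],
    8: ["d", "e", "f"],
    9: ["d", "c", "b"],
}


def to_vertical(horizontal):
    """Build item -> sorted TID list in a single pass over the transactions."""
    vertical = defaultdict(list)
    for tid in sorted(horizontal):             # visit TIDs in increasing order
        for item in set(horizontal[tid]):      # ignore repeated items, if any
            vertical[item].append(tid)
    return dict(vertical)


vertical = to_vertical(DB)
for item in sorted(vertical):
    print(item, vertical[item])
```

Because each transaction is visited once and TIDs are appended in increasing order, every TID-set comes out already sorted, which keeps the later period computation simple.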

An IncVDRP-table is generated to store all length-1 items with their regularity and support.

Table 2. IncVDRP-Table

Itemset   TID-Set              tl   S   R
a         1, 2, 3, 4, 5        5    5   4
b         2, 3, 4, 5, 6, 9     9    6   3
c         2, 4, 6, 7, 9        9    5   2
d         1, 6, 7, 8, 9        9    5   5
e         2, 3, 4, 5, 7, 8     8    6   2
f         3, 5, 8              8    3   3
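The tl, S and R values of Table 2 can be reproduced with a small Python sketch. It assumes, as the values in Table 2 suggest, that the periods of an item include the gap from the null transaction (tid 0) to its first occurrence and the gap from its last occurrence to the last tid of DB. The names `regularity` and `build_table` are illustrative only.

```python
# Vertical representation of DB (item -> TID-set), copied from Table 2.
TID_SETS = {
    "a": [1, 2, 3, 4, 5],
    "b": [2, 3, 4, 5, 6, 9],
    "c": [2, 4, 6, 7, 9],
    "d": [1, 6, 7, 8, 9],
    "e": [2, 3, 4, 5, 7, 8],
    "f": [3, 5, 8],
}
DB_SIZE = 9  # tid of the last transaction in DB


def regularity(tids, db_size):
    """Largest period of a TID-set (assumption: both boundary gaps count)."""
    points = [0] + sorted(tids) + [db_size]
    return max(later - earlier for earlier, later in zip(points, points[1:]))


def build_table(tid_sets, db_size):
    """Return item -> (tl, S, R) for every length-1 item."""
    return {
        item: (max(tids), len(tids), regularity(tids, db_size))
        for item, tids in tid_sets.items()
    }


for item, (tl, s, r) in sorted(build_table(TID_SETS, DB_SIZE).items()):
    print(item, tl, s, r)
```

For item a, for instance, the consecutive gaps are all 1, but the gap from tid 5 to the end of DB at tid 9 is 4, which is the regularity listed for a in Table 2.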

Our IncVDRP-table consists of four fields (i, tl, S, R): i is the item name, tl is the most recent tid in which item i occurred, S is the support of item i and R is the regularity of i. After inserting all transactions into the IncVDRP-table, we calculate the regularity of each item i by deriving $P^X$ from $T^X$. Regular patterns are the patterns whose regularity is less than or equal to the user-given regularity threshold $\lambda$ (here $\lambda = 3$). Any itemset whose regularity is greater than $\lambda$ is deleted from the IncVDRP-table.
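The pruning rule, and the way longer itemsets can then be grown from the surviving ones, might be sketched in Python as follows. Several points are assumptions of this sketch rather than details given in the paper: the and operation is taken to be TID-set intersection, candidates are joined Apriori-style on a shared prefix, and the regularity counts both boundary gaps as the values of Table 2 suggest. The name `mine_regular` is hypothetical.

```python
from itertools import combinations


def regularity(tids, db_size):
    """Largest period of a TID-set (assumption: both boundary gaps count)."""
    points = [0] + sorted(tids) + [db_size]
    return max(later - earlier for earlier, later in zip(points, points[1:]))


def mine_regular(tid_sets, db_size, max_reg):
    """Level-wise mining: prune by max_reg, then intersect TID-sets."""
    # Level 1: keep only the single items whose regularity is within max_reg.
    level = {
        (item,): sorted(tids)
        for item, tids in tid_sets.items()
        if regularity(tids, db_size) <= max_reg
    }
    regular = {}
    while level:
        regular.update(level)
        next_level = {}
        # Join two k-itemsets that share their first k-1 items.
        for x, y in combinations(sorted(level), 2):
            if x[:-1] != y[:-1]:
                continue
            tids = sorted(set(level[x]) & set(level[y]))  # the "and" operation
            if tids and regularity(tids, db_size) <= max_reg:
                next_level[x + (y[-1],)] = tids
        level = next_level
    return regular


TID_SETS = {
    "a": [1, 2, 3, 4, 5],
    "b": [2, 3, 4, 5, 6, 9],
    "c": [2, 4, 6, 7, 9],
    "d": [1, 6, 7, 8, 9],
    "e": [2, 3, 4, 5, 7, 8],
    "f": [3, 5, 8],
}

for itemset, tids in sorted(mine_regular(TID_SETS, db_size=9, max_reg=3).items()):
    print(itemset, tids)
```

On the data of Table 2 with a threshold of 3, this sketch drops a and d and keeps b, c, e, f together with the pairs {b, c}, {c, e} and {e, f}. These pairs are what the sketch computes under the stated assumptions; they are not results reported in the paper. Intersecting TID-sets can only widen periods, which is why pruning an irregular itemset never loses a regular superset.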


Fig. 1 IncVDRP method

Input: DB, $\lambda = 3$
Output: Complete set of regular patterns
Procedure:
    Let $X_i \subseteq I$ be a k-itemset
    $P^{X_i} = 0$ for all $X_i$
    For each $X_i$
        Update S
        Find the periods of $X_i$: $p^{X_i} = t_{j+1}^{X_i} - t_j^{X_i}$
        $reg(X_i) = \max(P^{X_i})$
    repeat
    If $reg(X_i) \le \lambda$
        $X_i$ is a regular itemset
    Else
        Delete $X_i$
    Find if any db+ exists
    If db+ exists
        Repeat the procedure recursively
    Else
        Increase the k value using the and operation until no candidate is generated.

In Table 2, the regularities of two items, a and d, are greater than the user-given regularity threshold. Therefore, a and d are deleted from the table. The and operation is then applied to obtain the regular (k+1)-itemsets, and the process is repeated to obtain the regular (k+i)-itemsets. We stop applying the and operation when no more regular itemsets are found.

Let Table 3 be the update db+ to the original database. It is also inserted into the IncVDRP-table by updating tl and the support S.

Table 3. Updated Database UDB

TID   Itemsets
10    a, e, d
11    b, d

Since our method satisfies the downward closure property, when the new database is added to the old database, the support S and tl of items b, c, e and f are updated (see Table 4). Items a and d are not updated because they have already been deleted. After refreshing the table, only the updated itemsets, i.e. those whose support S has been incremented, go through the above procedure again to find regular patterns.

Table 4. IncVDRP-table

Itemset   TID-Set                 tl   S   R
b         2, 3, 4, 5, 6, 9, 11    11   7   3
c         2, 4, 6, 7, 9           9    5   2
e         2, 3, 4, 5, 7, 8, 10    10   7   2
f         3, 5, 8                 8    3   3

5 EXPERIMENTAL RESULTS
We performed a broad experimental analysis of the performance of the IncVDRP method over several synthetic (e.g., T10I4D100K) and real (e.g., mushroom, kosarak) datasets that are regularly used in frequent pattern mining experiments. All programs are written in Microsoft Visual C++ 6.0 and run with Windows XP on a 2.66 GHz machine with 2 GB of main memory. Execution times for different regularity values over the synthetic dataset T10I4D100K and the real dataset mushroom are reported in Fig. 2 and Fig. 3 respectively. Our experimental results are promising.

Fig. 2 Execution Time over T10I4D100K
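The incremental update of Section 4, in which the transactions of Table 3 are applied to the items retained from Table 2, can be sketched in Python as follows. The function `apply_increment` and its return values are hypothetical names chosen for this illustration. The sketch also makes one design choice that is an assumption: it recomputes the regularity of every retained item after the update, because the last tid of the database has moved and the final boundary gap of untouched items changes with it.

```python
def regularity(tids, last_tid):
    """Largest period of a TID-set (assumption: both boundary gaps count)."""
    points = [0] + sorted(tids) + [last_tid]
    return max(later - earlier for earlier, later in zip(points, points[1:]))


def apply_increment(retained, db_plus, last_tid):
    """Append the tids of db+ to the TID-sets of the retained items only."""
    updated = {item: list(tids) for item, tids in retained.items()}
    touched = set()
    for tid in sorted(db_plus):
        for item in set(db_plus[tid]):
            if item in updated:        # deleted items (a, d) are skipped
                updated[item].append(tid)
                touched.add(item)
        last_tid = max(last_tid, tid)
    return updated, touched, last_tid


# Items that survived pruning in Table 2, and db+ from Table 3.
RETAINED = {
    "b": [2, 3, 4, 5, 6, 9],
    "c": [2, 4, 6, 7, 9],
    "e": [2, 3, 4, 5, 7, 8],
    "f": [3, 5, 8],
}
DB_PLUS = {10: ["a", "e", "d"], 11: ["b", "d"]}

tid_sets, touched, last_tid = apply_increment(RETAINED, DB_PLUS, last_tid=9)
for item in sorted(tid_sets):
    tids = tid_sets[item]
    print(item, tids, tids[-1], len(tids), regularity(tids, last_tid))
```

Only b and e gain new tids here, so they are the itemsets whose support is incremented; c and f keep their TID-sets and are re-checked only against the new end of the database.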

6 CONCLUSION
We proposed an efficient method for mining regular patterns from incremental transactional databases using the vertical data format. We also mined the updated database with the help of the IncVDRP-table, reusing the previously mined knowledge. Our experimental results indicate that the method performs well.


Fig. 3 Execution Time over Mushroom

REFERENCES
[1] Agrawal, R., Imielinski, T., Swami, A.N. Mining Association Rules between Sets of Items in Large Databases, ACM SIGMOD Conference on Management of Data, pp. 207-216, 1993.
[2] Agrawal, R., Srikant, R. Fast algorithms for mining association rules, In Proc. 1994 International Conference on Very Large Data Bases (VLDB'94), Santiago, Chile, pp. 487-499, Sept. 1994.
[3] Elfeky, M.G., Aref, W.G., Elmagarmid, A.K. Periodicity detection in time series databases, IEEE Transactions on Knowledge and Data Engineering 17(7), pp. 875-887, 2005.
[4] Han, J., Pei, J., Yin, Y. Mining frequent patterns without candidate generation, In Proc. ACM SIGMOD International Conference on Management of Data, pp. 1-12, 2000.
[5] Han, J., Kamber, M. Data Mining: Concepts and Techniques, 2nd edn., Morgan Kaufmann Publishers, an imprint of Elsevier, pp. 232-248, 2006.
[6] Lee, G., Yang, W., Lee, J-M. A Parallel algorithm for mining partial periodic patterns, Information Sciences 176, pp. 3591-3609, 2006.
[7] Ozden, B., Ramaswamy, S., Silberschatz, A. Cyclic Association Rules, In: 14th International Conference on Data Engineering, pp. 412-421, 1998.
[8] Tanbeer, S.K., Ahmed, C.F., Jeong, B.S., Lee, Y-K. Mining Regular Patterns in Transactional Databases, IEICE Trans. on Information and Systems, E91-D, 11, pp. 2568-2577, 2008.
[9] Tanbeer, S.K., Ahmed, C.F., Jeong, B.S. Mining Regular Patterns in Incremental Transactional Databases, 12th International Asia-Pacific Web Conference, IEEE, DOI 10.1109/APWeb.2010.68, pp. 375-377, 2010.
[10] Yi-ming, G., Zhi-jun, W. A Vertical format algorithm for mining frequent itemsets, IEEE Transactions, pp. 11-13, 2010.
[11] Zaki, M.J., Gouda, K. Fast Vertical Mining using Diffsets, SIGKDD'03, August 24-27, 2003, Copyright 2003 ACM 1-58113-737-0/03/0008.
[12] Vijay Kumar, G., Sreedevi, M., Pavan Kumar, NVS. Mining Regular Patterns in Transactional Databases using Vertical Format, IJARCS, Sep-Oct 2011, pp. 581-583.

G. Vijay Kumar received his M.C.A. degree from Acharya Nagarjuna University, Andhra Pradesh, India, in 1998 and his M.Tech degree in Information Technology from Punjabi University, India, in 2003. From 1998 to 2004 he worked as a faculty member at KBN College, Vijayawada, India. From 2004 to 2006 he worked as a faculty member at PVP Siddhartha Institute of Technology. Since 2006 he has been working as a faculty member in the School of Computing, K L University, Guntur, India. He is currently pursuing his Ph.D. degree in the Department of Computer Science and Engineering, Acharya Nagarjuna University, Guntur, India. His research interests are data mining and knowledge discovery.

M. Sreedevi received her M.C.A. degree from Acharya Nagarjuna University, Andhra Pradesh, India, in 1998 and her M.Tech degree in Information Technology from Punjabi University, India, in 2003. From 1998 to 2004 she worked as a faculty member at Nimra Institute of Technology, Vijayawada, India. From 2004 to 2006 she worked as a faculty member at PVP Siddhartha Institute of Technology. Since 2006 she has been working as a faculty member in the School of Computing, K L University, Guntur, India. She is a life member of the Computer Society of India. She is currently pursuing her Ph.D. degree in the Department of Computer Science and Engineering, Acharya Nagarjuna University, Guntur, India. Her research interests are data security, data mining and knowledge discovery.

NVS Pavan Kumar received his M.C.A. degree from Andhra University, Andhra Pradesh, India, in 1999. Between 2000 and 2009 he worked for about seven years as a faculty member at Krishnaveni Institute of Technology and Science, Hyderabad, India, and has about two years of industry experience. Since 2009 he has been working as a faculty member in the School of Computing, K L University, Guntur, India. He is currently pursuing his Ph.D. degree in the Department of Computer Science and Engineering, K L University, Guntur, India. His research interests are data mining and knowledge discovery.
