
Text Trend Analysis via Significant Term History

A Study Based on Indonesia News


Jing-Doo Wang, Arie Budiansyah
Department of Computer Science and Information Engineering, Asia University, Taichung, Taiwan
Data Analysis and System Engineering Lab.

Abstract
This study provides the frequency distribution of significant terms over past time periods for text trend analysis of an Indonesian newspaper. The approach consists of two steps: (1) Data Pre-processing and (2) Term History Generation. The former adopts agent techniques to download the news articles automatically and extracts the contents of these articles. The latter uses an existing external memory approach to extract significant terms while computing the term history simultaneously. A significant term, in this study, is a series of words significant enough to represent one event, action or concept. The term history of a term is the frequency distribution of that term over consecutive time periods, viewed as time series data. The experimental resources include one year of an Indonesian newspaper, Serambi, containing 28,071 articles. Experimental results show that the approach is attractive and meaningful for foreigners who wish to know the trends and situations in the Aceh province of Indonesia, with which the majority of Serambi's coverage is concerned.

Keywords: significant term, trend analysis, text mining

(Introduction)
In the last decade, electronic text sources (email, MSM, electronic books, web browsers, etc.) have grown massively and are served in many ways, which makes it a daunting task for users to read them one by one to find related data or information. Showing related data or information in a trend chart, whose data are gathered from the significant term histories within the electronic text sources, helps users find their terms. Significant terms are determined from the user's perspective, and no stemming process is applied to find the root of a word. This study is intended for foreigners and for reviewers who investigate data over time periods.

(Experimental Results)
A significant term is a term (a continuous sequence of words) that is sufficient to specify an event or an action in line with the user's perspective. In other words, this study finds a term and verifies whether that term is significant enough. Finding text trends and presenting them in a trend chart gives knowledge about the historical dates of those trends, which is meaningful for foreigners or reviewers in general. Example: the term "Calon Pegawai Negeri Sipil" (CPNS, Indonesian government worker), one of the favorite jobs in Indonesia, shows a spike in the chart in November 2009, which tells us that this job was open for registration during that month.
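As an illustration of how a term history can reveal a spike such as the CPNS one, here is a minimal Python sketch. It assumes term occurrences are available as (term, newsdate) pairs and uses a simple ratio rule to flag spikes. The function names, the `factor` threshold, and the sample occurrences are hypothetical and are not taken from the study.

```python
from collections import Counter
from datetime import date


def term_history(occurrences, term, year):
    """Count the occurrences of `term` in each month of `year`.

    `occurrences` is an iterable of (term, newsdate) pairs.
    """
    counts = Counter(
        d.month for t, d in occurrences if t == term and d.year == year
    )
    # Return a fixed-length series so months with no mentions show as 0.
    return [counts.get(m, 0) for m in range(1, 13)]


def spike_months(history, factor=3.0):
    """Return 1-based months whose count exceeds `factor` times the mean
    of the other months (an assumed, illustrative spike rule)."""
    spikes = []
    for i, value in enumerate(history):
        others = history[:i] + history[i + 1:]
        baseline = sum(others) / len(others)
        if value > 0 and value > factor * baseline:
            spikes.append(i + 1)
    return spikes


# Made-up occurrences for illustration only; not data from the study.
sample = (
    [("CPNS", date(2009, 3, 5)), ("CPNS", date(2009, 8, 17))]
    + [("CPNS", date(2009, 11, day)) for day in range(1, 21)]
    + [("banjir", date(2009, 11, 2))]
)
history = term_history(sample, "CPNS", 2009)
print(history)                # monthly counts, January to December
print(spike_months(history))  # months flagged as spikes
```

With the made-up sample, only November is flagged, which mirrors the kind of reading the trend chart is meant to support.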

(Data Source)
The data source is one of Indonesia's local newspapers, Serambinews (http://www.serambinews.com), in Aceh Province, covering a one-year period in 2009. It contains 28,071 articles with 1,402,634 significant terms. Indonesian is, like English, one of the occidental languages.

(Method)
This study has two main steps: (1) Data Pre-processing and (2) Term History Generation. The first step includes agent implementation and content extraction. The second step contains significant term extraction and term history generation. In the first step, Perl LWP is used to download the newspaper source automatically; the content extraction process then cleans up the HTML tags and parses the newspaper source into newsdate, newstype, and newscontent.
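The content extraction step can be sketched as follows. The study used Perl LWP for downloading; this sketch is in Python, covers only the tag clean-up and parsing, and assumes a hypothetical page layout in which each field sits in an element whose class attribute is newsdate, newstype, or newscontent. The actual Serambinews markup is not described in the source, and the sample page is made up.

```python
from html.parser import HTMLParser


class ArticleParser(HTMLParser):
    """Collect text from elements whose class attribute names a field.

    The class names are an assumption for illustration only.
    """

    FIELDS = ("newsdate", "newstype", "newscontent")
    VOID = {"br", "hr", "img", "input", "link", "meta"}  # no closing tag

    def __init__(self):
        super().__init__()
        self.record = {name: "" for name in self.FIELDS}
        self._stack = []  # field name (or None) for each open element

    def handle_starttag(self, tag, attrs):
        if tag in self.VOID:
            self.handle_data(" ")  # treat line breaks etc. as whitespace
            return
        css_class = dict(attrs).get("class")
        self._stack.append(css_class if css_class in self.FIELDS else None)

    def handle_endtag(self, tag):
        if tag not in self.VOID and self._stack:
            self._stack.pop()

    def handle_data(self, data):
        # Attribute the text to the innermost enclosing field, if any.
        for field in reversed(self._stack):
            if field is not None:
                self.record[field] += data
                break


def extract_article(html_text):
    """Strip tags and return a dict with the three fields, whitespace-normalized."""
    parser = ArticleParser()
    parser.feed(html_text)
    parser.close()
    return {k: " ".join(v.split()) for k, v in parser.record.items()}


# Made-up page for illustration; not an actual Serambinews article.
SAMPLE_HTML = """
<html><body>
  <span class="newsdate">2009-11-03</span>
  <span class="newstype">Pendidikan</span>
  <div class="newscontent"><p>Pendaftaran <b>CPNS</b> dibuka<br>hari ini.</p></div>
</body></html>
"""
article = extract_article(SAMPLE_HTML)
print(article)
```

Text outside the three fields (navigation, scripts, advertisements) is discarded, since it falls under no field element.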

In the second step, finding significant terms begins by verifying the right and left boundaries of a term. If a term passes, it is a significant term. A significant term is a continuous sequence of words (a single or compound word) that is sufficient to specify an event or an action. In the last step, the significant terms, along with their time (newsdate), are stored in a database.
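The source does not spell out the boundary test, so the following Python sketch rests on an assumption: a repeated word sequence passes when it is preceded by more than one distinct word and followed by more than one distinct word, in the spirit of maximal repeats. It works in memory rather than with the external memory approach used in the study. The SQLite table, the column names, and the sample articles are hypothetical.

```python
import sqlite3
from collections import defaultdict


def significant_terms(articles, max_len=4, min_count=2):
    """Return {term: [newsdate, ...]} for word sequences that pass the
    assumed left/right boundary check.

    `articles` is a list of (newsdate, newscontent) pairs.
    """
    left = defaultdict(set)    # term -> distinct preceding words
    right = defaultdict(set)   # term -> distinct following words
    dates = defaultdict(list)  # term -> newsdate of every occurrence

    for k, (newsdate, content) in enumerate(articles):
        # Unique markers make each article edge count as its own context.
        words = [f"<start {k}>"] + content.split() + [f"<end {k}>"]
        for n in range(1, max_len + 1):
            for i in range(1, len(words) - n):
                term = " ".join(words[i:i + n])
                left[term].add(words[i - 1])
                right[term].add(words[i + n])
                dates[term].append(newsdate)

    return {
        term: dates[term]
        for term in dates
        if len(dates[term]) >= min_count
        and len(left[term]) > 1
        and len(right[term]) > 1
    }


def store_terms(conn, terms):
    """Store one (term, newsdate) row per occurrence (hypothetical schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS term_history (term TEXT, newsdate TEXT)"
    )
    rows = [(t, d) for t, ds in terms.items() for d in ds]
    conn.executemany("INSERT INTO term_history VALUES (?, ?)", rows)
    conn.commit()


# Made-up articles for illustration only.
articles = [
    ("2009-11-02", "pendaftaran calon pegawai negeri sipil dibuka"),
    ("2009-11-05", "ribuan calon pegawai negeri sipil mendaftar"),
    ("2009-11-09", "formasi calon pegawai negeri sipil diumumkan"),
]
terms = significant_terms(articles)
conn = sqlite3.connect(":memory:")
store_terms(conn, terms)
monthly = conn.execute(
    "SELECT substr(newsdate, 1, 7) AS month, COUNT(*) FROM term_history "
    "WHERE term = ? GROUP BY month ORDER BY month",
    ("calon pegawai negeri sipil",),
).fetchall()
print(terms)
print(monthly)
```

Under this rule, shorter fragments such as "calon pegawai negeri" are rejected because they are always followed by the same word, so only the full compound term survives.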

(Reference)
1. Jing-Doo Wang. External memory approach to compute the maximal repeats across classes from DNA sequences. Asian Journal of Health and Information Sciences, 1(2):276-295, 2006.
2. Jyh-Jong Tsay and Jing-Doo Wang. A scalable approach for Chinese term extraction. In 2000 International Computer Symposium (ICS2000), pages 246-253, 2000.
3. Serambi Newspaper. http://www.serambinews.com/, 2010.
4. Hsin-Hsi Chen and Guo-Wei Bian. Proper name extraction from web pages for finding people in internet. In Proceedings of ROCLING X, pages 143-158, 1997.
5. Donald Metzler, Bruce Croft, and Trevor Strohman. Search Engines: Information Retrieval in Practice. Addison-Wesley, 2009.

The IET International Conference on Frontier Computing Theory, Technologies and Applications on August 4, 2010
