
Mining WWW

Abstract: Web mining is a very hot research topic which combines two active research areas: Data Mining and the World Wide Web. Web mining research relates to several research communities, such as Database, Information Retrieval and Artificial Intelligence. Although there is still some confusion about what Web mining covers, the most widely recognized approach is to categorize it into three areas: Web content mining, Web structure mining, and Web usage mining.

I. INTRODUCTION: Data mining refers to the process of analysing data from different perspectives and summarizing it into useful information. Data mining software is one of a number of tools used for analysing data. It allows users to analyse data from many different dimensions and angles, categorize it, and summarize the relationships identified. Data mining is about techniques for finding and describing structural patterns in data.

II. DEFINITION: Data mining is the process of finding correlations or patterns among fields in large relational databases. It has also been described as the process of extracting valid, previously unknown, comprehensible, and actionable information from large databases and using it to make crucial business decisions (Simoudis 1996).

III. DIFFERENT TYPES OF DATA MINING: Business Data Mining. Scientific Data Mining. Internet Data Mining.

IV. MAJOR ELEMENTS OF DATA MINING: Extract, transform and load transaction data onto the data warehouse system. Store and manage the data in a multidimensional database system. Provide data access to business analysts and information technology professionals. Analyse the data with application software. Present the data in a useful format such as a graph or table.

Fig 1: Stages of Data Processing.

V. REQUIREMENTS OF DATA MINING: Handling of different types of data. Efficiency and scalability of algorithms. Usefulness, certainty and expressiveness of results. Expression of various kinds of mining results. Interactive mining of knowledge at multiple levels. Mining information from different sources of data. Protection of privacy and data security.

VI. VARIOUS KINDS OF DATA ON WHICH DATA MINING IS APPLIED:

Relational database. Data warehouse. Transactional database. Multimedia database. Spatial and temporal data. Object relational database.

VII. DATA MINING APPLICATION: The main application of Data Mining considered here is Web Mining.

What is Web Mining?

Web mining can be broadly defined as the automated discovery and analysis of useful information from Web documents and services using data mining techniques. It is the application of data mining or other information processing techniques to the WWW in order to find useful patterns. People can take advantage of these patterns to access the WWW more efficiently. Data Mining is also popularly known as Knowledge Discovery in Databases (KDD).

Fig 2: Web Mining

VIII. NEED FOR WEB MINING:

Nowadays, the World Wide Web is a popular and interactive medium, ideal for publishing information. It is huge, diverse and dynamic, and thus raises issues of scalability, multimedia and temporal data respectively. As a result, users are currently drowning in an information overload that expands at a rate that far outpaces the human ability to process and exploit it.

IX. DOMAINS FOR WEB MINING:

There are three domains that pertain to Web mining: Web content mining, Web structure mining, and Web usage mining.

Fig 3: Three domains of Web mining

In the database approach to Web content mining, the extracted metadata are organized into structured collections (e.g. relational or object-oriented databases) and can then be analyzed.

b. WEB STRUCTURE MINING:

Web structure mining deals with the data that describes the organization of content. Intra-page structure information includes the arrangement of various HTML or XML tags within a given page. This can be represented as a tree structure, where the <html> tag becomes the root of the tree. The principal kind of inter-page structure information is the hyperlinks connecting one page to another.
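The two kinds of structure information described above can be illustrated with a small sketch. It is only one possible way to do it, assuming Python's standard html.parser module; the class name StructureParser, the helper analyse and the sample page are hypothetical and do not come from the paper.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they are not kept open on the stack.
VOID_TAGS = {"br", "img", "hr", "meta", "link", "input"}


class StructureParser(HTMLParser):
    """Collects the tag tree of a page and its outgoing hyperlinks."""

    def __init__(self):
        super().__init__()
        self.root = None   # (tag, children) tuple for the outermost tag
        self.stack = []    # currently open nodes, innermost last
        self.links = []    # href values of <a> tags, in document order

    def handle_starttag(self, tag, attrs):
        node = (tag, [])
        if self.stack:
            self.stack[-1][1].append(node)   # child of the innermost open tag
        elif self.root is None:
            self.root = node                 # first tag becomes the root
        if tag not in VOID_TAGS:
            self.stack.append(node)
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        # Close the innermost open node with this tag name, if there is one.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break


def analyse(html):
    """Return (tag tree, list of hyperlink targets) for one page."""
    parser = StructureParser()
    parser.feed(html)
    parser.close()
    return parser.root, parser.links


# Hypothetical sample page, used only for illustration.
PAGE = ('<html><head><title>Home</title></head>'
        '<body><p>See <a href="/about.html">about</a><br>'
        'and <a href="/contact.html">contact</a>.</p></body></html>')

tree, links = analyse(PAGE)
```

Here tree holds the intra-page structure with the html tag as its root, and links holds the inter-page structure as a list of link targets. Collecting links for many pages would give the edges of a site graph.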

Fig 4: Three domains of Web mining in detail

a. WEB CONTENT MINING:

Web content mining is an automatic process that extracts patterns from online information, such as HTML files, images, or e-mails, and it already goes beyond keyword extraction or simple statistics of words and phrases in documents. It is the process of information or resource discovery from millions of sources across the World Wide Web. There are two approaches in Web content mining: agent based approaches and database approaches.

c. WEB USAGE MINING:

Web servers record and accumulate data about user interaction whenever requests for resources are received. Analysing the Web access logs of different Web sites can help to understand user behaviour and the Web structure, thereby improving the design of this colossal collection of resources.
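As a rough illustration of what analysing a Web access log can look like, the sketch below parses a few log lines and counts successful requests per page and per visitor. The paper does not name a log format, so the Common Log Format is assumed here; the function names and the sample lines are made up for the example.

```python
import re
from collections import Counter, defaultdict

# Assumed format (Common Log Format):
# host ident user [time] "METHOD path PROTOCOL" status size
LOG_LINE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)


def parse_log(lines):
    """Yield (host, path, status) for every line that matches the format."""
    for line in lines:
        match = LOG_LINE.match(line)
        if match:
            yield match["host"], match["path"], int(match["status"])


def summarise(lines):
    """Count successful hits per page and list the pages requested per host."""
    hits = Counter()
    pages_by_host = defaultdict(list)
    for host, path, status in parse_log(lines):
        if 200 <= status < 300:
            hits[path] += 1
            pages_by_host[host].append(path)
    return hits, dict(pages_by_host)


# Invented sample data; the last line is deliberately malformed.
SAMPLE_LOG = [
    '10.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '10.0.0.1 - - [10/Oct/2000:13:56:01 -0700] "GET /products.html HTTP/1.0" 200 1500',
    '10.0.0.2 - - [10/Oct/2000:13:57:12 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '10.0.0.2 - - [10/Oct/2000:13:57:40 -0700] "GET /missing.html HTTP/1.0" 404 209',
    'not a log line',
]

hits, pages_by_host = summarise(SAMPLE_LOG)
```

The per-host page lists are a crude stand-in for user sessions. A real usage mining system would also need to split visits by time and deal with proxies and caching, which this sketch ignores.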

Agent based approaches.

In Web content mining, the agent based approach involves artificial intelligence systems that can act autonomously or semi-autonomously on behalf of a particular user to discover and organize Web based information. Some intelligent Web agents use a user profile to search for relevant information, then organize and interpret the discovered information (e.g. Harvest).

Database approaches.

The database approach focuses on integrating and organizing the heterogeneous and semi-structured data on the Web into more structured, high-level collections of resources.

X. WEB MINING TECHNIQUES: The common techniques for Web mining are: Clustering / Classification. Association. Path analysis. Sequential patterns.

CLUSTERING / CLASSIFICATION.

This technique is used to develop profiles of items with similar characteristics. This ability enhances the discovery of relationships that are otherwise not obvious. E.g.: classification of Web access logs allows a company to discover the average age of customers who order a certain product.
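To make the clustering side of the clustering / classification technique more concrete, here is a minimal sketch that groups users whose visited pages overlap. It uses a simple greedy "leader" clustering with Jaccard similarity, chosen only for brevity; the paper does not prescribe an algorithm, and the function names, threshold and user profiles are assumptions made for the example.

```python
def jaccard(a, b):
    """Similarity of two sets: size of intersection over size of union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def leader_cluster(profiles, threshold=0.5):
    """Assign each profile to the first cluster whose leader is similar enough.

    profiles: dict mapping a user id to the set of pages that user visited.
    Returns a list of clusters, each a list of user ids; the first id in a
    cluster is its leader.
    """
    clusters = []
    for user, pages in profiles.items():
        for cluster in clusters:
            leader_pages = profiles[cluster[0]]
            if jaccard(pages, leader_pages) >= threshold:
                cluster.append(user)
                break
        else:
            # No existing cluster was similar enough: start a new one.
            clusters.append([user])
    return clusters


# Invented user profiles, for illustration only.
PROFILES = {
    "u1": {"/books", "/books/fiction", "/cart"},
    "u2": {"/books", "/books/fiction"},
    "u3": {"/music", "/music/jazz"},
    "u4": {"/music", "/music/jazz", "/cart"},
    "u5": {"/help"},
}

clusters = leader_cluster(PROFILES)
```

With these profiles the book readers and the music listeners end up in separate groups, and the user who only visited the help page forms a group alone. The result depends on the order of the input and on the threshold, which is a known limitation of this greedy scheme.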

XII. CURRENT RESEARCH: As many researchers believe, it was Etzioni who first coined the term Web mining in his paper. He raised the question of whether it is practical to mine Web data, and suggested dividing Web mining into three processes. The paper opened up a new, active research field. An increasing number of researchers are working in this field and have published surveys of data mining on the Web. By 1999, Web mining had been clearly categorized into Web content mining, Web structure mining and Web usage mining, and research work has been well classified since then. There has been some work on content mining and structure mining, based on research in Data Mining, Information Retrieval, Information Extraction, and Artificial Intelligence.

Association Rules.

Association rules govern databases of transactions, where each transaction consists of a set of items. This technique is used to predict the correlation of items, where the presence of one set of items in a transaction implies (with a certain degree of confidence) the presence of other items.
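The notion of a rule holding "with a certain degree of confidence" can be sketched as follows. This is a simplified illustration restricted to rules between two single items, not a full association rule miner; the function name, the thresholds and the sample transactions (sets of pages seen in one visit) are assumptions made for the example.

```python
from collections import Counter
from itertools import combinations, permutations


def pair_rules(transactions, min_support=0.5, min_confidence=0.7):
    """Return rules (a, b, support, confidence) meaning 'a implies b'.

    support    = fraction of transactions containing both a and b
    confidence = (transactions with a and b) / (transactions with a)
    """
    n = len(transactions)
    item_count = Counter()
    pair_count = Counter()
    for transaction in transactions:
        items = set(transaction)
        item_count.update(items)
        pair_count.update(
            frozenset(pair) for pair in combinations(sorted(items), 2)
        )

    rules = []
    for pair, count in pair_count.items():
        support = count / n
        if support < min_support:
            continue
        # Try the rule in both directions: a -> b and b -> a.
        for a, b in permutations(sorted(pair)):
            confidence = count / item_count[a]
            if confidence >= min_confidence:
                rules.append((a, b, support, confidence))
    return sorted(rules)


# Invented transactions: the pages requested during four visits.
VISITS = [
    {"/home", "/products"},
    {"/home", "/products", "/cart"},
    {"/products", "/cart"},
    {"/home"},
]

rules = pair_rules(VISITS)
```

In this sample every visit that includes /cart also includes /products, so the rule from /cart to /products has confidence 1.0 at a support of 0.5. The reverse rule is dropped because only two of the three visits with /products include /cart.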

Path Analysis.

Path analysis is a technique that involves the generation of some form of graph that represents relations defined on Web pages. This can be the physical layout of a Web site, in which the Web pages are nodes and the hypertext links between these pages are directed edges. E.g.: what paths do users travel before they reach a particular URL?
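A small sketch of the graph view used in path analysis is given below. It stores a site as a dictionary of directed edges and lists every cycle-free path from a start page to a target URL. The site layout and the function name are hypothetical, and a real analysis would weight such paths by how often users actually follow them.

```python
def simple_paths(graph, start, target, path=None):
    """Yield every cycle-free path from start to target in a directed graph.

    graph: dict mapping a page to the list of pages it links to.
    """
    path = (path or []) + [start]
    if start == target:
        yield path
        return
    for next_page in graph.get(start, []):
        if next_page not in path:   # skip pages already on this path
            yield from simple_paths(graph, next_page, target, path)


# Hypothetical site layout: pages are nodes, hyperlinks are directed edges.
SITE = {
    "/": ["/products", "/about"],
    "/products": ["/products/item", "/"],
    "/about": ["/products"],
    "/products/item": ["/checkout"],
}

paths = list(simple_paths(SITE, "/", "/checkout"))
```

For this layout there are two ways to reach /checkout from the home page, one directly through /products and one that passes through /about first.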

Sequential Patterns.

This technique is applied to Web server access logs. The purpose is to discover sequential patterns that indicate user visit patterns over a certain period.
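One simple way to look for sequential patterns is sketched below: count how many sessions contain each run of consecutive pages and keep the runs that occur often enough. This covers only contiguous sequences, which is a simplification of general sequential pattern mining; the function name, the thresholds and the sessions are assumptions made for the example.

```python
from collections import Counter


def frequent_sequences(sessions, length=2, min_support=2):
    """Return contiguous page sequences seen in at least min_support sessions.

    sessions: list of page lists, each in the order the user visited them.
    A sequence is counted at most once per session.
    """
    support = Counter()
    for pages in sessions:
        seen = {
            tuple(pages[i:i + length])
            for i in range(len(pages) - length + 1)
        }
        support.update(seen)
    return {seq: n for seq, n in support.items() if n >= min_support}


# Invented sessions, each an ordered list of requested pages.
SESSIONS = [
    ["/", "/products", "/cart", "/checkout"],
    ["/", "/products", "/products", "/cart"],
    ["/about", "/", "/products"],
    ["/cart", "/checkout"],
]

patterns = frequent_sequences(SESSIONS)
```

With these sessions the step from the home page to /products appears in three sessions, while the steps from /products to /cart and from /cart to /checkout appear in two each. No sequence of three pages reaches the minimum support.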

XI. WEB MINING AS A TOOL: Web mining can be a promising tool to address ineffective search engines, which produce incomplete indexing and retrieve information of unverified reliability. Web mining not only discovers information from mounds of data on the WWW, but also monitors and predicts user visit habits. This gives designers more reliable information. Web mining technology can help librarians design Web sites with paths that can be travelled easily by end users, saving time and effort. E.g.: Web mining technology and academic librarianship.

In current research on usage mining, several groups have done distinguished work. R. Cooley et al. at the University of Minnesota did in-depth research on the whole usage mining procedure. They proposed a mining prototype, WebMiner, and derived a system, WebSIFT, to perform usage mining, which is relatively practical. O. Zaiane et al. proposed how to apply OLAP techniques to Web mining. Their work on multimedia data also provided a valuable solution for content mining. M. Spiliopoulou et al. focused on the applications of usage mining. Their work on navigation pattern discovery and Web site personalization is especially meaningful for the e-commerce community and Web marketplace allocation, and will be very helpful for both Web users and administrators. The Web Utilization Miner system is an innovative sequential mining system.

J. Borges et al. have explored algorithms for mining user navigation patterns in several papers. They proposed a data mining model for efficient mining, which captures user navigation behaviour patterns by using an N-grammar approach.

Fig 5: Web Mining Architecture.

XIII. MINING TOOL: Mozenda

Mozenda is a Software as a Service (SaaS) company that enables users of all types to easily and affordably extract and manage Web data. With Mozenda, users can set up agents that routinely extract data, store data, and publish data to multiple destinations. Once information is in the Mozenda systems, users can format, repurpose, and mash up the data for use in other online/offline applications or as intelligence. All data in the Mozenda system is secure and is hosted in class A data warehouses, but can be accessed securely over the Web via the Mozenda Web Console. With the addition of a fully featured REST API, companies can now seamlessly integrate their data automation with the Mozenda application.

CONCLUSION: Data warehousing provides the means to change raw data into information for making effective business decisions; the emphasis is on information, not data. The data warehouse is the hub for decision support. Data mining is a useful tool with multiple algorithms that can be tuned for specific tasks. It benefits business, medicine, and science.

REFERENCES:

[1] www.datawarehousingonline.com
[2] Data base System, Elmasri, Navathe. Data Mining Technologies, Arun K Pujari.
[3] http://www.cse.aucegypt.edu/~rafea/CSCE564/sldes/WebMiningOverview.pdf
[4] https://cs.uwaterloo.ca/~tozsu/courses/cs748t/surveys/wang.pdf
[5] http://www.jatit.org/volumes/researchpapers/Vol18No1/10Vol18No1.pdf
[6] http://www.mozenda.com/web-miningsoftware
