
Uploaded by ARVIND


DATA MINING

ABSTRACT:

Data mining is an evolving set of techniques that can be used to extract valuable information and knowledge from massive volumes of data. Data mining research and tools have focused on commercial sector applications; only a few data mining studies have focused on scientific data. This paper aims at further data mining study on scientific data. It highlights the data mining techniques applied to mine for surface changes over time (e.g. earthquake rupture). These techniques help researchers to predict changes in the intensity of volcanoes. This paper uses predictive statistical models that can be applied to areas such as seismic activity and the spreading of fire. The basic problem in this class of systems is that the dynamics underlying earthquakes are unobservable, whereas the space-time patterns associated with the time, location and magnitude of the sudden events from the force threshold are observable. This paper highlights the extraction of patterns from unobservable dynamics using data mining techniques, pattern recognition and ensemble forecasting. Thus this paper gives insight into how data mining can be applied in finding the consequences of earthquakes and hence alerting the public.

INTRODUCTION

The field of data mining has evolved from its roots in databases, statistics, artificial intelligence, information theory and algorithms into a core set of techniques that have been applied to a range of problems. Computational simulation and data acquisition in scientific and engineering domains have made tremendous progress over the past two decades. A mix of advanced algorithms, exponentially increasing computing power and accurate sensing and measurement devices has resulted in more data repositories.

PREDICTING EARTHQUAKES THROUGH DATA MINING

... have enabled the communication of large volumes of data across the world. This results in a need for tools and technologies for effectively analyzing scientific data sets with the objective of interpreting the underlying physical phenomena. Data mining applications in geology and geophysics have achieved significant success in areas such as weather prediction, mineral prospecting, ecology and modeling, and finally in predicting earthquakes from satellite maps.

An interesting aspect of many of these applications is that they combine both spatial and temporal aspects in the data and in the phenomena being mined. Data sets in these applications come from both observations and simulation. Investigations of earthquake prediction are based on the assumption that all of the regional factors can be filtered out and general information about the earthquake precursory patterns can be extracted.

Feature extraction involves a pre-selection process over various statistical properties of the data and the generation of a set of seismic parameters, which correspond to linearly independent coordinates in the feature space. The seismic parameters, in the form of time series, can be analyzed using various pattern recognition techniques. Statistical or pattern recognition methodology usually performs this extraction process. Thus this paper gives insight into mining scientific data.

• Data mining is defined as the process of extracting relevant data and hidden facts contained in databases and data warehouses.

• It refers to finding new knowledge about an application domain using data on the domain, usually stored in databases. The application domain may be astrophysics, earth science or the solar system.

Data mining techniques help to identify nuggets of information and to extract this information in such a way that it supports decision making, prediction, forecasting and estimation.

DATA MINING GOALS:

• Bring together representatives of the data mining community and the domain science community so that they can understand each other's current capabilities and research objectives related to data mining.

• Identify a set of research objectives from the domain science community that would be facilitated by current or anticipated data mining techniques.

• Identify a set of research objectives for the data mining community that could support the research objectives of the domain science community.
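As a rough illustration of the feature extraction step described in the introduction, the sketch below turns an earthquake catalog into a time series of seismic parameters, one feature vector per time window. It is a minimal sketch under stated assumptions, not the method used in this paper: the function name `window_features`, the catalog format, the choice of parameters (event count, mean and maximum magnitude, and an Aki-type maximum-likelihood b-value), the completeness magnitude `m_min` and the toy catalog are all hypothetical.

```python
import math

def window_features(events, t_start, t_end, width, m_min):
    """Turn a catalog into one feature vector per time window.

    events: list of (time, magnitude) pairs (hypothetical catalog format).
    width:  window length, in the same unit as the event times.
    m_min:  assumed completeness magnitude; smaller events are ignored.
    """
    n_windows = math.ceil((t_end - t_start) / width)
    features = []
    for k in range(n_windows):
        t0 = t_start + k * width
        mags = [m for (t, m) in events if t0 <= t < t0 + width and m >= m_min]
        count = len(mags)
        mean_mag = sum(mags) / count if count else float("nan")
        max_mag = max(mags) if count else float("nan")
        # Aki-type maximum-likelihood estimate of the Gutenberg-Richter
        # b-value; left undefined (NaN) for empty or degenerate windows.
        if count and mean_mag > m_min:
            b_value = math.log10(math.e) / (mean_mag - m_min)
        else:
            b_value = float("nan")
        features.append({"t0": t0, "count": count, "mean_mag": mean_mag,
                         "max_mag": max_mag, "b_value": b_value})
    return features

# Toy catalog: (time in days, magnitude). Values are made up for illustration.
catalog = [(0.5, 3.0), (0.7, 3.5), (1.2, 4.0), (1.8, 2.0), (2.5, 5.0)]
for row in window_features(catalog, t_start=0, t_end=4, width=1, m_min=2.5):
    print(row)
```

Each dictionary is one point of the resulting time series. Whether these particular parameters are linearly independent for a given catalog would still have to be checked before using them as coordinates of a feature space.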

DATA MINING MODELS:

Data mining is used to find patterns and relationships in data. The relationships in data patterns can be analyzed via two types of models.

1. Descriptive models: Used to describe patterns and to create meaningful subgroups or clusters.

2. Predictive models: Used to forecast explicit values, based upon patterns in known results.

** This paper focuses on predictive models.

In large databases, data mining and knowledge discovery come in two flavors:

1. Event based mining:

• Known events/known algorithms: Use existing physical models (descriptive models and algorithms) to locate known phenomena of interest either spatially or temporally within a large database.

• Known events/unknown algorithms: Use pattern recognition and clustering properties of data to discover new observational (physical) relationships (algorithms) among known phenomena.

• Unknown events/known algorithms: Use expected physical relationships (predictive models, algorithms) among observational parameters of physical phenomena to predict the presence of previously unknown events within a large complex database.

• Unknown events/unknown algorithms: Use thresholds or trends to identify transient or otherwise unique events and therefore to discover new physical phenomena.

** This paper focuses on unknown events and known algorithms.

2. Relationship based mining:

• Spatial Associations: Identify events (e.g. astronomical objects) at the same location (e.g. same region of the sky).

• Temporal Associations: Identify events occurring during the same or related periods of time.

• Coincidence Associations: Use clustering techniques to identify events that are co-located within a multi-dimensional parameter space.

** This paper focuses on all relationship-based mining.

User requirements for data mining in large scientific databases:

• Cross identifications: Refers to the classical problem of associating the source list in one database with the source list in another.

• Cross correlation: Refers to the search for correlations, tendencies, and trends between physical parameters in multidimensional data, usually across databases.

• Nearest neighbor identification: Refers to the general application of clustering algorithms in multidimensional parameter space, usually within a database.

• Systematic data exploration: Refers to the application of a broad range of event based queries and relationship based queries to a database, with the aim of making a serendipitous discovery of new objects or a new class of objects.

** This paper focuses on correlation and clustering.

DATA MINING TECHNIQUES:

The various data mining techniques are
1. Statistics
2. Clustering
3. Visualization
4. Association
5. Classification & Prediction
6. Outlier analysis
7. Trend and evolution analysis

1. Statistics:

♦ Data cleansing, i.e. the removal of erroneous or irrelevant data, known as outliers.
♦ Exploratory data analysis (EDA), e.g. frequency counts, histograms.
♦ Attribute redefinition, e.g. body mass index.
♦ Data analysis, i.e. measures of association and relationships between attributes, interestingness of rules, classification, prediction etc.

2. Visualization:

♦ Enhances EDA and makes patterns visible in different views.

3. Clustering (cluster analysis):

Clustering is a process of grouping similar data. Data points that do not belong to any cluster are called outliers. How to cluster in different conditions:

♦ Class label is unknown: Group related data to form new classes, e.g. cluster houses to find distribution patterns.
♦ Clustering is based on the principle of maximizing the intra-class similarity and minimizing the inter-class similarity.
♦ It provides subgroups of a population for further analysis or action, which is very important when dealing with large databases.

4. Association (correlation and causality)

Mining association rules finds interesting correlation relationships in large databases.

5. Classification and Prediction

♦ Finding models (functions) that describe and distinguish classes or concepts for future prediction, e.g. classify countries based on climate, or classify cars based on gas mileage.
♦ Presentation: decision tree, classification rule, neural network.

7. Trend and evolution analysis

♦ Sequential pattern mining, periodicity analysis
♦ Similarity-based analysis

** This paper focuses on clustering and visualization techniques for predicting earthquakes.

EARTHQUAKE PREDICTION.

(i) Ground water levels
(ii) Chemical changes in ground water
(iii) Radon gas in ground water wells

Ground Water Levels:-

Changing water levels in deep wells are recognized as a precursor to earthquakes. The pre-seismic variations at observation wells are as follows.

1. A gradual lowering of water levels over a period of months or years.
2. An accelerated lowering of water levels in the last few months or ...

... Tokyo tested the water after the composition of the water changed significantly in the period around the earthquake in the area.

3. They observed that the chloride concentration is almost constant.
4. Levels of sulphate also showed a similar rise.

Radon Gas in Ground water wells.

The level of radon gas in ground water wells is a precursor of earthquakes recognized by research groups.

♦ Although radon has a relatively short half-life and is unlikely to seep to the surface through rocks from the depths at which seismic activity originates, it is very soluble in water and can routinely be monitored in wells. Springs show a reaction to seismic events, and they are monitored for earthquake prediction.

♦ There is no effective solution to the problem.

♦ To solve this problem, earthquake catalogs, geo-monitoring time series, data about stationary seismo-tectonic properties of the geological environment, and expert knowledge and hypotheses about earthquake precursors are used.

This paper proposes a multi-resolutional approach, which combines local clustering techniques in the data space with non-hierarchical clustering in the feature space. The raw data are represented by n-dimensional vectors Xi of measurements Xk. The data space can be searched for patterns and can be visualized by using local or remote pattern recognition and by advanced visualization capabilities. The data space X is transformed to a new abstract space Y of vectors Yj. The coordinates Yl of these vectors represent nonlinear functions of the measurements Xk, which are averaged in space and time in given space-time windows. This transformation allows for coarse graining of data (data quantization),

and suppression of the noise and other random components. The new features Yl form an N-dimensional feature space. We use multi-dimensional scaling procedures for visualizing the multi-dimensional events in 3D space. This transformation allows a visual inspection of the N-dimensional feature space. The visual analysis helps greatly in detecting subtle cluster structures which are not recognized by classical clustering techniques, selecting the best pattern detection procedure used for data clustering, classifying the anonymous data and formulating new hypotheses.

Clustering schemes

Clustering analysis is a mathematical concept whose main role is to extract separated sets of the most similar objects according to a given similarity measure. This concept has been used for many years in pattern recognition. Depending on the data structures and goals of classification, different clustering schemes must be applied.

In our new approach we use two different classes of clustering algorithms for different resolutions. In data space we use the Mutual Nearest Neighbour algorithm (MNN). This type of clustering extracts the localized clusters in the high resolution data space. In the feature space we search for global clusters of time events comprising similar events from the whole time interval.

The non-hierarchical clustering algorithms are used mainly for extracting compact clusters by using global knowledge about the data structure. We use improved mean based schemes, such as a suite of moving schemes, which uses the k-means procedure and four strategies of its tuning by moving the data vectors between clusters to obtain a more precise location of the minimum of the goal function:

$$J(\omega, n) = \sum_{j} \sum_{i \in C_j} |x_i - z_j|^2$$

where z_j is the center of mass of the cluster j, while x_i are the feature vectors closest to z_j. To find a global minimum of the function J(), we repeat the clustering procedures with different initial conditions. Each new initial configuration is constructed in a special way from the previous results by using these methods. The cluster structure with the lowest J(ω, n) minimum is selected.

HIERARCHICAL CLUSTERING METHODS:

A hierarchical clustering method produces a classification in which small clusters of very similar molecules are nested within larger clusters of less closely related molecules.
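The nesting of small clusters inside larger ones can be made concrete with a short bottom-up (agglomerative) example. What follows is a minimal single-link sketch under stated assumptions, not the procedure used in this paper: the function name `single_link_merges`, the Euclidean distance and the five 2D points are hypothetical, and the brute-force search is only suitable for very small data sets.

```python
import math

def single_link_merges(points):
    """Agglomerate points bottom-up and return the merge history.

    Each history entry is (distance, merged_cluster), where merged_cluster
    is a sorted tuple of point indices. Early entries are small, tight
    clusters; later entries are the larger clusters that contain them.
    """
    clusters = [(i,) for i in range(len(points))]
    history = []

    def link(a, b):
        # Single-link distance: closest pair of members, one from each cluster.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > 1:
        # Pick the pair of clusters with the smallest single-link distance.
        d, ia, ib = min(
            (link(clusters[x], clusters[y]), x, y)
            for x in range(len(clusters))
            for y in range(x + 1, len(clusters))
        )
        merged = tuple(sorted(clusters[ia] + clusters[ib]))
        history.append((d, merged))
        clusters = [c for k, c in enumerate(clusters) if k not in (ia, ib)]
        clusters.append(merged)
    return history

# Made-up 2D points: two tight pairs and one distant point.
points = [(0, 0), (0, 1), (5, 0), (5, 1.5), (20, 0)]
for distance, cluster in single_link_merges(points):
    print(round(distance, 3), cluster)
```

Reading the history from top to bottom gives the hierarchy: the two tight pairs form first, they are then fused into one larger cluster, and the distant point joins last. Cutting the history at a chosen distance yields one particular partition.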

Hierarchical agglomerative methods generate a classification in a bottom-up manner, by a series of agglomerations in which small clusters, initially containing individual molecules, are fused together to form progressively larger clusters. Hierarchical agglomerative methods are often characterized by the shape of the clusters they tend to find, as exemplified by the following range: single-link tends to find long, straggly, chained clusters; Ward and group-average tend to find globular clusters; complete-link tends to find extremely compact clusters. Hierarchical divisive methods generate a classification in a top-down manner, by progressively sub-dividing the single cluster which represents an entire dataset. Monothetic (divisions based on just a single descriptor) hierarchical divisive methods are generally much faster in operation than the corresponding polythetic (divisions based on all descriptors) hierarchical divisive and hierarchical agglomerative methods, but tend to give poor results. One problem with these methods is how to choose which clusters or partitions to extract from the hierarchy, because display of the complete hierarchy is not really appropriate for data sets of more than a few hundred compounds.

NON-HIERARCHICAL CLUSTERING METHODS

A non-hierarchical method generates a classification by partitioning a dataset, giving a set of (generally) non-overlapping groups with no hierarchical relationships between them. A systematic evaluation of all possible partitions is quite infeasible, and many different heuristics have been described to allow the identification of good, but possibly sub-optimal, partitions. Three of the main categories of non-hierarchical method are single-pass, relocation and nearest neighbour. Single-pass methods (e.g. Leader) produce clusters that are dependent upon the order in which the compounds are processed, and so will not be considered further. Relocation methods, such as k-means, assign compounds to a user-defined number of seed clusters and then iteratively reassign compounds to produce better clusters. Such methods are prone to reaching a local optimum rather than a global optimum, and it is generally not possible to determine when or whether the global optimum solution has been reached. Nearest neighbour methods, such as the Jarvis-Patrick method, assign compounds to the same cluster as some number of their nearest neighbours. User-defined parameters determine how many nearest neighbours need to be considered, and the necessary level of similarity between nearest neighbour lists. Other non-hierarchical methods are generally inappropriate for use on large, high-dimensional datasets such as those used in chemical applications.

DATA MINING APPLICATIONS

♦ In Scientific discovery: superconductivity research, knowledge acquisition.

♦ ... cost analysis, genetic sequence analysis, prediction etc.
♦ In Engineering: automotive diagnostics, expert systems, fault detection etc.
♦ In Finance: stock market prediction, credit assessment, fraud detection etc.

FUTURE ENHANCEMENTS

Developments in data mining since 2000 have been truly Darwinian and show promise of consolidating and stabilizing around predictive analytics. Nevertheless, the emerging market for predictive analytics has been sustained by professional services, service bureaus and profitable applications in verticals such as ... applications. Predictive analytics have successfully proliferated into applications to support customer recommendations, customer value and churn management, campaign optimization, and fraud detection. On the product side, success stories in demand planning, just-in-time inventory and market basket optimization are a staple of predictive analytics. Predictive analytics should be used to get to know the customer, to segment and predict customer behavior, and to forecast product demand and related market dynamics. Finally, they are at different stages of growth in the life cycle of technology innovation.

The problem of earthquake prediction is based on data extraction of precursory phenomena, and it is a highly challenging task. Various computational methods and tools are used for the detection of precursors by extracting general information from ... clustering, we are able to perform multi-resolutional analysis of seismic data, starting from the raw data events described by their magnitude and spatio-temporal data space. This new methodology can also be used for the analysis of data from other geological phenomena, e.g. we can apply this clustering ...
