
DISSERTATION

SEMESTER & PROGRAM: XI SEM B.TECH + LL.B (HONS.)

TITLE: DATA MINING & ITS IMPACT ON PRIVACY

Name: Abhimanyu Agarwal
Enrollment No.: R120214001
SAP ID: 500040403

Submitted under the guidance of: Mr. Himanshu Dhandharia


(Assistant Professor)

This synopsis of dissertation is submitted in partial fulfilment of the degree of


B.Tech., LL.B. (Hons.)

School of Law
University of Petroleum and Energy Studies
Dehradun

(November, 2019)
DECLARATION/UNDERTAKING OF ORIGINALITY

I, Abhimanyu Agarwal, having Enrollment No. R120214001 and SAP ID 500040403, declare
that the Synopsis titled “Data Mining and its Impact on Privacy” is the outcome of
my original work conducted under the supervision of Mr. Himanshu Dhandharia at
School of Law, University of Petroleum and Energy Studies, Dehradun.

I undertake full responsibility for the contents of this Synopsis, which complies with
the ‘Academic Integrity’ policy of UPES, and I understand that if this work is found to
be in violation of the same, this may result in rejection of the Synopsis/Dissertation
and entail appropriate disciplinary proceedings as per the Rules of the University.

Signature
[Name of the Student]
Date……………
Place…………….

Endorsement by the Mentor:


Date of final Submission:………………………..
Antiplagiarism Check /Similarity found: ………..
Late Submission………………………………….

Signature
[Name of the Mentor]
Date……………

Introduction
These days, web applications and services provide users with a wide range of services
such as e-commerce, e-banking and e-governance. To access them, users need to provide
private information such as social security numbers and card numbers. Companies and
organizations want to gather and analyze this information efficiently through data
mining processes. The aim of data mining is to turn raw data into useful information
that allows organizations to develop more effective marketing strategies, increase
sales, decrease costs, etc.1

The field of data mining is attaining significant recognition due to the availability of
large amounts of data, easily collected and stored via computer systems. This
information can be used to increase revenue, cut costs, or both. Data mining software is
one of a number of analytical tools for analyzing data.2 It allows users to analyze
data. At the same time, concern over privacy is growing constantly. Data mining,
popularly known as Knowledge Discovery in Databases (KDD), is the non-trivial extraction
of implicit, previously unknown and potentially useful information from databases.3
Though data mining and KDD are frequently treated as synonyms, data mining is actually
part of the knowledge discovery process.4

Data mining is the process of analyzing data from many different perspectives,
categorizing it, and summarizing the relationships identified into useful information.
Continuous innovations in computer processing power, disk storage, and statistical
software are dramatically increasing the accuracy of analysis while driving down the
cost.5 “Data mining, the discovery of new and interesting patterns in large datasets, is
an exploding field. One aspect is the use of data mining to improve security, e.g., for
intrusion detection. A second aspect is the potential security hazards posed when an
adversary has data mining capabilities.”6

1 https://ieeexplore.ieee.org/document/8123561
2 Introduction to Data Mining and Knowledge Discovery, Third Edition, ISBN: 1-892095-02-5, Two Crows Corporation, 10500 Falls Road, Potomac, MD 20854 (U.S.A.), 1999.
3 Dunham, M. H., Sridhar S., “Data Mining: Introductory and Advanced Topics,” Pearson Education, New Delhi, ISBN: 81-7758-785-4, 1st Edition, 2006.
4 Fayyad, U., Piatetsky-Shapiro, G., and Smyth P., “From Data Mining to Knowledge Discovery in Databases,” AI Magazine, American Association for Artificial Intelligence, 1996.
5 L. Getoor, C. P. Diehl, “Link mining: a survey,” ACM SIGKDD Explorations, vol. 7, pp. 3-12, 2005.
6 http://ijcttjournal.org/Volume4/issue-2/IJCTT-V4I2P129.pdf

Privacy issues have attracted the attention of the media, government agencies, privacy
advocates and businesses.

WHAT IS DATA MINING?

“Data mining is an iterative and interactive process of discovering something
innovative. The same as Novel-something we are not aware, Valid-generalize the
future, Useful-some reaction is possible, Understandable-leading to insight, many step
and process. Data mining is the process of discovering meaningful new correlations,
patterns and trends by sifting through large amounts of data stored in repositories, using
pattern recognition technologies as well as statistical and mathematical techniques.”7
There are other definitions:

“Data mining is the analysis of (often large) observational data sets to find unsuspected
relationships and to summarize the data in novel ways that are both understandable and
useful to the data owner”.8

“Data mining is an interdisciplinary field bringing together techniques from machine
learning, pattern recognition, statistics, databases, and visualization to address the issue
of information extraction from large data bases”.9
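
To make the idea of "discovering patterns by sifting through large amounts of data" concrete, the following is a minimal sketch of one classic pattern-discovery task: finding pairs of items that are frequently bought together. The function name `frequent_pairs`, the support threshold, and the four sample transactions are illustrative assumptions and are not taken from any of the works cited here.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(transactions, min_support):
    """Return {pair: support} for item pairs whose support is at least min_support.

    Support of a pair = share of transactions that contain both items.
    """
    counts = Counter()
    for basket in transactions:
        # Deduplicate and sort so that each pair has one canonical form.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    n = len(transactions)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}


# Hypothetical purchase records, invented for illustration only.
transactions = [
    ["bread", "milk"],
    ["bread", "milk", "butter"],
    ["bread", "milk"],
    ["butter", "jam"],
]

print(frequent_pairs(transactions, 0.5))  # {('bread', 'milk'): 0.75}
```

Even this toy example shows why privacy questions arise: the same counting applied to records linked to a named customer reveals that customer's habits and preferences.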

The legal and policy foundation for data mining rests on specified protocols and
legislation that establish penalties for breaches of data security and privacy, and that
require the level of security applied to data mining to be commensurate with the level
of security provided for the underlying data.10

7 Ibid.
8 David Hand, Heikki Mannila, and Padhraic Smyth, “Principles of Data Mining,” MIT Press, Cambridge, MA, 2001.
9 Peter Cabena, Pablo Hadjinian, Rolf Stadler, Jaap Verhees, and Alessandro Zanasi, Discovering Data Mining: From Concept to Implementation, Prentice Hall, Upper Saddle River, NJ, 1998.
10 http://ijcttjournal.org/Volume4/issue-2/IJCTT-V4I2P129.pdf

PRIVACY

As additional information sharing and data mining initiatives have been announced,
increased attention has focused on the implications for privacy. Concerns about privacy
focus both on the actual projects proposed and on the potential for data mining
applications to be expanded beyond their original purposes. For example, some experts
suggest that anti-terrorism data mining applications might also be useful for combating
other types of crime.11 “Observers contend that tradeoffs should be made regarding
privacy to ensure security. Others suggest that existing laws and regulations regarding
privacy protections are adequate, and that these initiatives do not pose any threats to
privacy.”12 Still other observers argue that not enough is known about how data mining
projects will be carried out, and that greater oversight is needed. There is also some
disagreement over how privacy concerns should be addressed. Some observers suggest that
technical solutions are adequate.13 From the security perspective, data mining has been
shown to be beneficial in confronting various types of attacks on computer systems.
However, the same technology can be used to create potential security hazards.14 In
addition, data collection and analysis efforts by government agencies and businesses
have raised fears about privacy, which motivated research into privacy-preserving data
mining.15
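
One family of technical solutions in privacy-preserving data mining coarsens identifying attributes before the data is mined. The following is a minimal sketch of that idea, checking a k-anonymity-style condition after generalization. It is an illustration under stated assumptions and not the algorithm of any work cited above: the functions `generalize` and `is_k_anonymous`, the choice of age and postcode as quasi-identifiers, and the four sample records are all hypothetical.

```python
from collections import Counter


def generalize(record):
    """Coarsen quasi-identifiers: age to a 10-year band, postcode to its first 3 characters."""
    age, postcode = record
    low = (age // 10) * 10
    return (f"{low}-{low + 9}", postcode[:3] + "***")


def is_k_anonymous(records, k):
    """True if every combination of quasi-identifier values occurs at least k times."""
    return all(count >= k for count in Counter(records).values())


# Hypothetical (age, postcode) records, invented for illustration only.
raw = [(23, "248001"), (27, "248007"), (34, "110021"), (38, "110045")]
masked = [generalize(r) for r in raw]

print(is_k_anonymous(raw, 2))     # False: every raw record is unique
print(is_k_anonymous(masked, 2))  # True: each masked record matches one other
```

The trade-off is visible even here: the masked records are harder to link to one person, but an analysis that needs exact ages or full postcodes loses precision.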

Statement of Problem

The central problem raised by data mining is that of individual privacy. Data mining
helps in analyzing business transactions and similar records, and in gathering a
significant amount of information about individuals’ habits and preferences.

Organizations collect the data either first-hand, by asking customers for it directly,
or second-hand, by buying it from other organizations. What they then do with the data
is not known to the customers. Organizations use the data they receive for digital
profiling of individuals, which can be detrimental to those individuals' privacy.

Since great importance is now attached to the data moving across networks, and society
is becoming more and more money-minded, organizations are mining huge amounts of data
in order to sell it for economic benefit. These entities are concerned not with the
damage caused to individuals but with the benefits they reap.

11 Agrawal, R., and R. Srikant, “Privacy-preserving Data Mining,” Proceedings of the ACM SIGMOD Conference, Dallas, TX, May 2000.
12 Clifton, C., M. Kantarcioglu and J. Vaidya, “Defining Privacy for Data Mining,” Purdue University, 2002.
13 Evfimievski, A., R. Srikant, R. Agrawal, and J. Gehrke, “Privacy Preserving Mining of Association Rules,” Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Edmonton, Alberta, Canada, July 2002.
14 Fung B., Wang K., Yu P., “Top-Down Specialization for Information and Privacy Preservation,” ICDE Conference, 2005.
15 Wang K., Yu P., Chakraborty S., “Bottom-Up Generalization: A Data Mining Solution to Privacy Protection,” ICDM Conference, 2004.

Review of Literature

1. Data Security and Privacy in Data Mining: Research Issues & Preparation

In this paper the authors focus on key online privacy and security issues
and concerns, the role of self-regulation and of the user in privacy and
security protection, data protection laws, regulatory trends, and the outlook
for privacy and security legislation. Recent developments in information
technology have enabled the collection and processing of enormous amounts of
personal data, such as criminal records, online shopping habits, online
banking, credit and medical history, driving records and, most
importantly, government-related data.16

2. When do Data Mining Results Violate Privacy?


Privacy-preserving data mining has concentrated on obtaining valid results
when the input data is private. An extreme example is Secure Multiparty
Computation-based methods, where only the results are revealed. However, this
still leaves a potential privacy breach: Do the results themselves violate privacy?
This paper explores this issue, developing a framework under which this
question can be addressed. Metrics are proposed, along with analysis that those
metrics are consistent in the face of apparent problems.17

16 http://ijcttjournal.org/Volume4/issue-2/IJCTT-V4I2P129.pdf
17 http://mg.scihub.ltd/10.1145/1014052.1014126

3. Optimal Randomization for Privacy Preserving Data Mining

Randomization is an economical and efficient approach for privacy preserving
data mining (PPDM). In order to guarantee the performance of data mining and
the protection of individual privacy, optimal randomization schemes need to
be employed. This paper demonstrates the construction of optimal
randomization schemes for privacy preserving density estimation. The authors
propose a general framework for randomization using mixture models. The impact
of randomization on data mining is quantified by performance degradation
and mutual information loss, while privacy and privacy loss are quantified by
interval-based metrics. Two different types of problems are defined to identify
optimal randomization for PPDM. Illustrative examples and simulation results
are reported.18
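
The general idea behind randomization can be illustrated with a much simpler technique than the mixture-model framework of the paper summarized above: randomized response on a single yes/no attribute. In the sketch below, which is an assumption-laden illustration and not the paper's method, each individual reports the true value with probability p and the opposite value otherwise, and the analyst corrects for the known noise to estimate the population share. The function names, p = 0.8, and the synthetic population are all hypothetical.

```python
import random


def randomize(bit, p, rng):
    """Report the true bit with probability p, otherwise report its opposite."""
    return bit if rng.random() < p else 1 - bit


def estimate_proportion(reports, p):
    """Estimate the true share of 1s from randomized reports.

    If pi is the true share, the expected observed share is
    p*pi + (1-p)*(1-pi), so pi = (observed - (1-p)) / (2p - 1).
    """
    if p == 0.5:
        raise ValueError("p = 0.5 removes all information from the reports")
    observed = sum(reports) / len(reports)
    return (observed - (1 - p)) / (2 * p - 1)


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic population: 30% of individuals hold the sensitive attribute.
    true_bits = [1] * 3000 + [0] * 7000
    reports = [randomize(b, 0.8, rng) for b in true_bits]
    # The estimate should land close to 0.3 although no single report is reliable.
    print(round(estimate_proportion(reports, 0.8), 3))
```

The design choice is the value of p: values near 1 give accurate estimates but little protection, while values near 0.5 protect individuals strongly at the cost of noisier results. This is the performance-versus-privacy trade-off that optimal randomization schemes try to balance.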

18 http://mg.scihub.ltd/10.1145/1014052.1014153

4. Big data's impact on privacy, security and consumer welfare

Big data has some intrinsic features that are tightly linked to a number
of privacy, security and welfare concerns. These concerns arise from the
collection and storage of data as well as from data sharing and
accessibility by third parties and various user types. Overall, firms' uses
of big data raise a wide range of ethical issues because they may lead to
exploitation of consumers and disregard for their interests, and firms
sometimes even engage in deceptive practices. While consumers' decisions to
withhold information may hinder the ability of society to benefit
from big data, consumers are also rightly concerned about potential
abuses and misuses of their information. Regarding privacy,
consumers are often uncomfortable and embarrassed when they feel that
companies know more about them than they are willing to provide
voluntarily. Big data is likely to affect the welfare of unsophisticated,
vulnerable and technologically unsavvy consumers more negatively.
Such consumers may lack awareness of multiple information sources
and are less likely to receive up-to-date and accurate information about
multiple suppliers in a manner that facilitates effective search and
comparison. They are also not in a position to assess the sensitivity
of their online actions and are more likely to be tricked by
illicit actors. A number of uses of big data currently fall into a regulatory
gray area. Because regulatory institutions are underdeveloped, there is a
need for firm-level big data policies, which must take into account
the degree of sensitivity of the information used in predictive modeling. Yet
most organizations have not developed best practices to ensure the privacy
and security of customer data. There is also the question of whose
welfare, preferences and opinions are to prevail in the formulation of big
data related laws and policies in the future. Increasing consumer
concern is likely to force further regulatory response to ensure that
consumers' interests are protected.19

Research Objective

The dissertation has the following objectives:

• To understand the threats and challenges of data mining.

• To analyze the impact of data mining on the privacy of individuals.

• To understand the laws and regulations regarding the privacy of individuals.

• To understand the applicability of privacy laws to data mining and its
technologies.

• To understand the practical application of these laws to data warehouse
mining and big data.

Research Questions

• Whether the analysis of data violates the privacy of the individuals to whom the
data relates?
• Can privacy be preserved while mining data?
• Is masking of databases possible?
• Whether the present laws on privacy are able to tackle the problem posed by data
mining?
• What is the impact on privacy of the growth in technology?

19 http://mg.scihub.ltd/10.1016/j.telpol.2014.10.002

Hypothesis

That data mining, in a general sense, has a greatly detrimental effect on the privacy
of individuals, since the data that organizations take from users is often used to
relate to and access crucial personal and private data of those individuals, which can
be, and generally is being, used to cause grievous damage to their physical and
mental health.

Methodology

The research methodology adopted for this project is doctrinal. Doctrinal research asks
what the law is on a particular issue. It is concerned with analysis of legal doctrine
and how it has been developed and applied. This type of research is also known as pure
research.
The methodology includes comparative study, an inductive approach, qualitative
analysis and, most importantly, historical and recent analysis.
Historical analysis is an integral component of the study of history.
Specifically, it entails the interpretation and understanding of various historical
events, documents and processes. History is best understood not as a series of facts
but as a series of competing interpretive narratives. Qualitative analysis, by
contrast, is a research method that uses open-ended interviewing to study and
understand the attitudes, opinions, feelings and behavior of individuals or groups.

Scope of Study

The scope of this study is the use of the internet for the purposes of understanding the
process of data mining and its impact on the privacy of individuals.

Limitations of the study

Limitations faced during the study are:

• The study is limited to the information available on the internet.
• Limited understanding of foreign languages restricts the study.
• Geographical restrictions apply to some of the data available on the internet.

Chapterization (tentative)

• Introduction
• Data Mining
• Data Privacy
• Impact of Mining on Privacy
• Measures that can be adopted to prevent loss of privacy
• Conclusion
