K-Anonymity Model

Presented by Anubhav, Saurav, Ravi, Ashutosh (ASRA Group), CSE/2k7
Guided by Prof. Binod Kumar
ASRA Group
13/07/2011
1. Introduction
2. Motivation
3. Achieving Anonymity via Clustering
4. Proposed Algorithm
5. Experimental Results
6. Conclusion
7. Future Work
Data holders such as statistics offices face tremendous demand for person-specific data for applications such as data mining, cost analysis, and fraud detection.
How can a data holder release a version of its private data with scientific guarantees that the individuals who are the subjects of the data cannot be re-identified, while the data remains practically useful for survey work?
k-Anonymity Model
Zipcode  Age  Gender  Disease
75275    22   Male    Flu
75277    23   Male    Cold
75278    24   Male    Diabetes
75275    33   Male    Flu
75275    38   Female  Arthritis
75275    36   Female  Heart problem

Quasi-identifiers (approximate foreign keys): Zipcode, Age, Gender. Together they can uniquely identify you!
Sensitive: Disease

Mobile number  Name    Zipcode  Gender  Age  Disease
9905150112     Amit    75275    Male    22   Flu
9905121223     John    75277    Male    23   Cold
9431103097     Rajan   75278    Male    24   Diabetes
9334292352     Robin   75275    Male    33   Flu
9431109087     Ramesh  75275    Female  38   Arthritis
9421345678     Dhoni   75275    Female  36   Arthritis

Identifying: Mobile number, Name
Quasi-identifiers (approximate foreign keys): Zipcode, Gender, Age
Sensitive: Disease
Age  Gender  Zip code  Disease
22   Male    75275     Flu
23   Male    75277     Cold
24   Male    75278     Diabetes
33   Male    75275     Flu
38   Female  75275     Arthritis
36   Female  75275     Heart problem

Quasi-identifiers (approximate foreign keys): Age, Gender, Zip code
Sensitive: Disease
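Removing the identifying columns is not enough if the quasi-identifiers still single people out. The following is a minimal sketch of that check on the six rows above; the function names `quasi_identifier` and `unique_records` are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Rows copied from the table above: (age, gender, zip code, disease).
records = [
    (22, "Male", "75275", "Flu"),
    (23, "Male", "75277", "Cold"),
    (24, "Male", "75278", "Diabetes"),
    (33, "Male", "75275", "Flu"),
    (38, "Female", "75275", "Arthritis"),
    (36, "Female", "75275", "Heart problem"),
]


def quasi_identifier(record):
    """Return the (age, gender, zip code) part of a record."""
    age, gender, zipcode, _disease = record
    return (age, gender, zipcode)


def unique_records(rows):
    """Rows whose quasi-identifier combination occurs exactly once."""
    counts = Counter(quasi_identifier(r) for r in rows)
    return [r for r in rows if counts[quasi_identifier(r)] == 1]


exposed = unique_records(records)
print(f"{len(exposed)} of {len(records)} records are unique on (age, gender, zip code)")
```

Every row here is unique on its quasi-identifiers, so anyone holding an external list with age, gender, and zip code could link each person to a disease.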

Zip Code  Gender  Age  Disease   Expense
75277     Male    22   Flu       100
75277     Male    23   Cancer    3000
75277     Male    24   HIV+      5000
75275     Male    33   Diabetes  2500
75275     Female  38   Diabetes  2800
75275     Female  36   Diabetes  2600

Quasi-identifiers (approximate foreign keys): Zip Code, Gender, Age
Zip Code  Gender  Age      Disease   Expense
7527*     Person  [21-30]  Flu       100
7527*     Person  [21-30]  Cancer    3000
7527*     Person  [21-30]  HIV+      5000
7527*     Person  [31-40]  Diabetes  2500
7527*     Person  [31-40]  Diabetes  2800
7527*     Person  [31-40]  Diabetes  2600
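The generalized table above can be reproduced and checked mechanically. This is a minimal sketch, assuming the generalization rules visible in the table (last zip digit masked, gender replaced by "Person", age mapped to a ten-year range); the helper names `generalize` and `is_k_anonymous` are hypothetical.

```python
from collections import Counter

# Original rows from the expense table: (zip code, gender, age, disease, expense).
rows = [
    ("75277", "Male", 22, "Flu", 100),
    ("75277", "Male", 23, "Cancer", 3000),
    ("75277", "Male", 24, "HIV+", 5000),
    ("75275", "Male", 33, "Diabetes", 2500),
    ("75275", "Female", 38, "Diabetes", 2800),
    ("75275", "Female", 36, "Diabetes", 2600),
]


def generalize(row):
    """Coarsen the quasi-identifiers; leave disease and expense untouched."""
    zipcode, _gender, age, disease, expense = row
    low = ((age - 1) // 10) * 10 + 1  # 22 -> 21, 38 -> 31
    return (zipcode[:4] + "*", "Person", f"[{low}-{low + 9}]", disease, expense)


def is_k_anonymous(table, k, qi_columns=(0, 1, 2)):
    """True if every quasi-identifier combination occurs at least k times."""
    groups = Counter(tuple(r[i] for i in qi_columns) for r in table)
    return all(size >= k for size in groups.values())


released = [generalize(r) for r in rows]
print(is_k_anonymous(rows, 3))      # False: every original row is unique
print(is_k_anonymous(released, 3))  # True: two groups of three rows each
```

The released table has two equivalence classes of three rows, so it satisfies k-anonymity for k = 3 but not for k = 4.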
Zip Code  Gender  Age      Disease   Expense
7527*     Male    [21-25]  Flu       100
7527*     Male    [21-25]  Cancer    3000
7527*     Male    [21-25]  HIV+      5000
75275     Person  [31-40]  Diabetes  2500
75275     Person  [31-40]  Diabetes  2800
75275     Person  [31-40]  Diabetes  2600
Zipcode  Gender  Age      Disease
83100*   Person  [25-30]  Flu
82530*   Person  [10-15]  Obesity
83400*   Person  [30-35]  Cancer
83100*   Person  [25-30]  HIV+
82530*   Person  [15-20]  Cancer
83400*   Person  [30-35]  Diabetes
82530*   Person  [25-30]  Obesity
83100*   Person  [25-30]  Flu
83400*   Person  [30-35]  Flu
How do we decide the number of clusters?
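One bound follows directly from the k-anonymity requirement: every cluster must hold at least k records, so n records can form at most floor(n/k) clusters. A tiny sketch of that arithmetic (the function name `max_clusters` is hypothetical):

```python
def max_clusters(n_records: int, k: int) -> int:
    """Largest number of clusters possible when each needs at least k records."""
    if k <= 0:
        raise ValueError("k must be positive")
    return n_records // k


print(max_clusters(9, 3))   # 3
print(max_clusters(10, 3))  # 3: the tenth record must join an existing cluster
```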
Distance between two numerical values
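A minimal sketch, assuming the normalized definition common in clustering-based k-anonymization: the absolute difference of the two values divided by the width of the attribute's domain (maximum minus minimum). The function name and the choice of 22..38 as the age domain are illustrative assumptions.

```python
def numeric_distance(v1: float, v2: float, domain_min: float, domain_max: float) -> float:
    """Absolute difference of two values, scaled to [0, 1] by the domain width."""
    width = domain_max - domain_min
    if width <= 0:
        raise ValueError("domain_max must be greater than domain_min")
    return abs(v1 - v2) / width


# Ages in the example tables span 22..38 (assumed here to be the whole domain).
print(numeric_distance(22, 38, 22, 38))  # 1.0
print(numeric_distance(22, 24, 22, 38))  # 0.125
```

Normalizing by the domain width keeps attributes with large raw ranges, such as expense, from dominating attributes with small ranges, such as age.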
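Distance between two Categorical values

Country
    America
        North: USA, Canada
        South: Brazil, Mexico
    Asia
        East: Iran, Egypt
        West: India, Pakistan
Fig : Taxonomy Tree of Country

$\delta_C(v_i, v_j) = \dfrac{H(\Lambda(v_i, v_j))}{H(T_D)}$

Here $\Lambda(v_i, v_j)$ is the subtree rooted at the lowest common ancestor of $v_i$ and $v_j$, $T_D$ is the taxonomy tree of the domain $D$, and $H(T)$ is the height of tree $T$.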
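The categorical distance can be sketched on the Country taxonomy as follows. This is an illustration under stated assumptions: the tree is stored as a hypothetical child-to-parent dictionary, a leaf is taken to have height 0, and the placement of Iran and Egypt under East and of India and Pakistan under West follows the left-to-right order of the figure.

```python
# Taxonomy tree of Country from the figure, stored as child -> parent.
PARENT = {
    "America": "Country", "Asia": "Country",
    "North": "America", "South": "America",
    "East": "Asia", "West": "Asia",
    "USA": "North", "Canada": "North",
    "Brazil": "South", "Mexico": "South",
    "Iran": "East", "Egypt": "East",
    "India": "West", "Pakistan": "West",
}


def path_to_root(node):
    """Return [node, parent, ..., root]."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def subtree_height(node):
    """Height of the subtree rooted at node (a leaf has height 0)."""
    children = [c for c, p in PARENT.items() if p == node]
    return 0 if not children else 1 + max(subtree_height(c) for c in children)


def categorical_distance(v_i, v_j):
    """Height of the subtree at the lowest common ancestor over the tree height."""
    ancestors_j = set(path_to_root(v_j))
    lca = next(a for a in path_to_root(v_i) if a in ancestors_j)
    return subtree_height(lca) / subtree_height("Country")


print(categorical_distance("USA", "USA"))     # 0.0
print(categorical_distance("USA", "Canada"))  # 1/3: they meet at North
print(categorical_distance("USA", "Brazil"))  # 2/3: they meet at America
print(categorical_distance("USA", "India"))   # 1.0: they meet only at Country
```

Values that share a low ancestor are close; values that meet only at the root are at the maximum distance of 1.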
Function greedy_k_member_clustering (S, k)
  If ( |S| ≤ k )
    Return S;
  End if;
  Result = ∅;
  r = a randomly picked record from S;
  While ( |S| ≥ k )
    r = the furthest record from r;
    S = S − {r};
    C = {r};
    While ( |C| < k )
      r = find_best_record(S, C);
      S = S − {r};
      C = C ∪ {r};
    End while;
    Result = Result ∪ {C};
  End while;
  While ( |S| ≠ 0 )
    r = a randomly picked record from S;
    S = S − {r};
    C = find_best_cluster(Result, r);
    C = C ∪ {r};
  End while;
  Return Result;
End;
Function find_best_record (S, c)
Input: a set of records S and a cluster c
Output: a record r ∈ S such that IL(c ∪ {r}) is minimal
  n = |S|; min = ∞; best = null;
  for ( i = 1..n )
    r = i-th record in S;
    diff = IL(c ∪ {r}) − IL(c);
    If ( diff < min )
      min = diff;
      best = r;
    End if;
  End for;
  Return best;
End;
Function find_best_cluster (C, r)
Input: a set of clusters C and a record r
Output: a cluster c ∈ C such that IL(c ∪ {r}) is minimal
  n = |C|; min = ∞; best = null;
  for ( i = 1..n )
    c = i-th cluster in C;
    diff = IL(c ∪ {r}) − IL(c);
    If ( diff < min )
      min = diff;
      best = c;
    End if;
  End for;
  Return best;
End;
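The three functions above can be sketched together in Python. This is a sketch under stated assumptions, not the presenters' implementation: the slides do not define IL, so `information_loss` here assumes numeric attributes only and uses cluster size times the sum of normalized value ranges; the `distance` helper, the `seed` parameter, and treating the zip code as a number are likewise simplifications made for the example.

```python
import random


def information_loss(cluster, widths):
    """Assumed IL: |cluster| * sum of (value range in cluster / domain width)."""
    if not cluster:
        return 0.0
    total = 0.0
    for i, width in enumerate(widths):
        values = [rec[i] for rec in cluster]
        total += (max(values) - min(values)) / width
    return len(cluster) * total


def distance(r1, r2, widths):
    """Sum of normalized absolute differences over all attributes."""
    return sum(abs(a - b) / w for a, b, w in zip(r1, r2, widths))


def find_best_record(S, c, widths):
    """Record in S whose addition to cluster c increases IL the least."""
    base = information_loss(c, widths)
    return min(S, key=lambda r: information_loss(c + [r], widths) - base)


def find_best_cluster(clusters, r, widths):
    """Cluster whose IL increases the least when record r is added."""
    return min(
        clusters,
        key=lambda c: information_loss(c + [r], widths) - information_loss(c, widths),
    )


def greedy_k_member_clustering(records, k, seed=0):
    S = list(records)  # work on a copy so the caller's list is untouched
    if len(S) <= k:
        return [S]     # everything fits in a single cluster
    # Domain width per attribute; fall back to 1 for constant attributes.
    widths = [(max(col) - min(col)) or 1 for col in zip(*S)]
    rng = random.Random(seed)
    result = []
    r = rng.choice(S)
    while len(S) >= k:
        r = max(S, key=lambda x: distance(x, r, widths))  # furthest from r
        S.remove(r)
        c = [r]
        while len(c) < k:
            r = find_best_record(S, c, widths)
            S.remove(r)
            c.append(r)
        result.append(c)
    while S:  # fewer than k records remain: add each to its best cluster
        r = S.pop(rng.randrange(len(S)))
        find_best_cluster(result, r, widths).append(r)
    return result


# (age, zip code as a number) for the six records of the expense example.
data = [(22, 75277), (23, 75277), (24, 75277), (33, 75275), (38, 75275), (36, 75275)]
clusters = greedy_k_member_clustering(data, k=3)
for cluster in clusters:
    print(sorted(cluster))
```

On this data the sketch groups the three records in their twenties and the three in their thirties, which matches the two equivalence classes of the generalized example tables. Each cluster would then be generalized to a common range per attribute.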
The time complexity of this algorithm is $O((n^2 \log n)/c)$, where $c$ is the average number of records in each cluster. This is better than the time complexity of the greedy k-member algorithm.
It is difficult to decide a proper value for the user-defined threshold.
This algorithm might delete many records, which in turn causes a significant information loss.
This algorithm is less sensitive to outliers.
The main goal of the experiments was to investigate the implementation of the k-anonymity model using a clustering algorithm. We focus mainly on data quality, k-anonymization, and scalability, which are the main considerations of the k-anonymity model.
Finally, data quality is the big problem in k-anonymization. We therefore focus on data quality rather than computational efficiency, since data quality should be the main consideration in the k-anonymity model. We are encouraged by our results, which demonstrate that our algorithm is flexible and able to produce a range of desired anonymizations.
Encouraged by the experimental results, we are currently working on more efficient heuristics to improve the performance of our approach. We are also working on using this clustering algorithm to detect fraud.
1. Sweeney, L.: k-Anonymity: A Model for Protecting Privacy. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 10, 557-570 (2002)
2. Byun, J.-W., et al.: Efficient k-Anonymization Using Clustering Techniques. In: Kotagiri, R., et al. (eds.) DASFAA 2007. LNCS, vol. 4443, pp. 188-200 (2007)