Binod Kumar
ASRA Group
13/07/2011
1. Introduction
2. Motivation
3. Achieving Anonymity via Clustering
4. Proposed Algorithm
5. Experimental Results
6. Conclusion
7. Future Work
Data holders and statistics offices face tremendous demand for person-specific data for applications such as data mining, cost analysis, and fraud detection.
How can a data holder release a version of its private data with scientific guarantees that the individuals who are the subjects of the data cannot be re-identified, while the data remains practically useful for survey work?
k-Anonymity Model
Quasi-identifiers (approximate foreign keys; in combination they can uniquely identify you): Zipcode, Age, Gender
Sensitive values that survive in this table: Diabetes, Flu

Zipcode  Age  Gender
75275    22   Male
75277    23   Male
75278    24   Male
75275    33   Male
75275    38   Female
75275    36   Female
Identifying attributes: Mobile number, Name
Quasi-identifiers: Zipcode, Gender, Age
Sensitive attribute: Disease

Mobile number  Name    Gender  Age  Disease
9905150112     Amit            22   Flu
9905121223     John            23   Cold
9431103097     Rajan
9334292352     Robin
9431109087     Ramesh  Female  38
9421345678     Dhoni   Female  36
Age  Gender  Sensitive
22   Male    Flu
23   Male
24   Male
33   Male
38   Female
36   Female
Quasi-identifiers (approximate foreign keys): Zip Code, Gender, Age

Zip Code  Gender  Age  Disease   Expense
75277     Male    22   Flu       100
75277     Male    23   Cancer    3000
75277     Male    24   HIV+      5000
75275     Male    33   Diabetes  2500
75275     Female  38   Diabetes  2800
75275     Female  36   Diabetes  2600
Zip Code  Gender
7527*     Person
7527*     Person
7527*     Person
7527*     Person
7527*     Person
7527*     Person
Zip Code  Gender  Age      Disease   Expense
7527*     Male    [21-25]  Flu       100
7527*     Male    [21-25]  Cancer    3000
7527*     Male    [21-25]  HIV+      5000
75275     Person  [31-40]  Diabetes  2500
75275     Person  [31-40]  Diabetes  2800
75275     Person  [31-40]  Diabetes  2600
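The effect of the generalization above can be checked mechanically. The following is a minimal sketch, assuming the six records shown in the tables and treating Zip Code, Gender, and Age as the quasi-identifiers; the function names `anonymity_level` and `is_k_anonymous` are illustrative and not part of the original slides.

```python
from collections import Counter

# Records as (zip code, gender, age, disease, expense), copied from the tables.
original = [
    ("75277", "Male", 22, "Flu", 100),
    ("75277", "Male", 23, "Cancer", 3000),
    ("75277", "Male", 24, "HIV+", 5000),
    ("75275", "Male", 33, "Diabetes", 2500),
    ("75275", "Female", 38, "Diabetes", 2800),
    ("75275", "Female", 36, "Diabetes", 2600),
]

anonymized = [
    ("7527*", "Male", "[21-25]", "Flu", 100),
    ("7527*", "Male", "[21-25]", "Cancer", 3000),
    ("7527*", "Male", "[21-25]", "HIV+", 5000),
    ("75275", "Person", "[31-40]", "Diabetes", 2500),
    ("75275", "Person", "[31-40]", "Diabetes", 2800),
    ("75275", "Person", "[31-40]", "Diabetes", 2600),
]

QI = (0, 1, 2)  # column indexes of the quasi-identifiers (assumed)


def anonymity_level(rows, qi_columns):
    """Size of the smallest group of rows sharing the same quasi-identifier values."""
    groups = Counter(tuple(row[i] for i in qi_columns) for row in rows)
    return min(groups.values())


def is_k_anonymous(rows, qi_columns, k):
    """A table is k-anonymous if every quasi-identifier combination occurs at least k times."""
    return anonymity_level(rows, qi_columns) >= k


print(anonymity_level(original, QI))    # 1: every record is unique on its quasi-identifiers
print(anonymity_level(anonymized, QI))  # 3: two groups of three records each
```

Note that the second group is 3-anonymous yet every record in it carries the same disease, so the check says nothing about how much the sensitive column reveals.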
Zipcode  Gender  Age      Disease
83100*   Person  [25-30]  Flu
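[Figure: Taxonomy tree of Country. Nodes from top to bottom: America, Asia; North, South, East, West; USA, Canada, Brazil, Mexico, Iran, Egypt, India, Pakistan.]

Distance between two categorical values v_i and v_j:

$$\delta_C(v_i, v_j) = \frac{H(\Lambda(v_i, v_j))}{H(T_D)}$$

where $T_D$ is the taxonomy tree of the attribute's domain, $\Lambda(v_i, v_j)$ is the subtree rooted at the lowest common ancestor of $v_i$ and $v_j$, and $H(T)$ is the height of tree $T$.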
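The taxonomy-based distance can be illustrated with a short sketch. The slide lists the node names but the extracted figure does not preserve which countries sit under which region, so the parent links below are an assumption made for illustration, as is the root name `Country` and every function name.

```python
# Hypothetical parent links for the Country taxonomy (assumed layout).
PARENT = {
    "America": "Country", "Asia": "Country",
    "North": "America", "South": "America",
    "West": "Asia", "East": "Asia",
    "USA": "North", "Canada": "North",
    "Brazil": "South", "Mexico": "South",
    "Iran": "West", "Egypt": "West",
    "India": "East", "Pakistan": "East",
}


def path_to_root(node):
    """Return the node followed by all of its ancestors up to the root."""
    path = [node]
    while node in PARENT:
        node = PARENT[node]
        path.append(node)
    return path


def height(node):
    """Height of the subtree rooted at node; a leaf has height 0."""
    children = [child for child, parent in PARENT.items() if parent == node]
    return 0 if not children else 1 + max(height(child) for child in children)


def lowest_common_ancestor(a, b):
    """First node on b's path to the root that is also an ancestor of a (or a itself)."""
    ancestors_of_a = set(path_to_root(a))
    for node in path_to_root(b):
        if node in ancestors_of_a:
            return node
    raise ValueError("values do not belong to the same taxonomy")


def categorical_distance(a, b, root="Country"):
    """Height of the subtree under the lowest common ancestor, divided by the tree height."""
    return height(lowest_common_ancestor(a, b)) / height(root)


for pair in [("USA", "USA"), ("USA", "Canada"), ("USA", "Brazil"), ("USA", "India")]:
    print(pair, round(categorical_distance(*pair), 3))
```

Under the assumed layout, two countries in the same region are at distance 1/3, two countries on the same continent at 2/3, and countries on different continents at 1.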
Function greedy_k_member_clustering (S, k)
Input: a set of records S and a threshold value k
Output: a set of clusters, each containing at least k records
  If ( |S| ≤ k )
    Return S;
  End if;
  Result = ∅;
  r = a randomly picked record from S;
  While ( |S| ≥ k )
    r = the furthest record from r;
    S = S - {r};
    C = {r};
    While ( |C| < k )
      r = find_best_record(S, C);
      S = S - {r};
      C = C U {r};
    End while;
    Result = Result U {C};
  End while;
  While ( |S| ≠ 0 )
    r = a randomly picked record from S;
    S = S - {r};
    C = find_best_cluster(Result, r);
    C = C U {r};
  End while;
  Return Result;
End;
Function find_best_record (S, c)
Input: a set of records S and a cluster c
Output: a record r ∈ S such that IL(c U {r}) is minimal
  n = |S|; min = ∞; best = null;
  For ( i = 1..n )
    r = i-th record in S;
    diff = IL(c U {r}) - IL(c);
    If ( diff < min )
      min = diff;
      best = r;
    End if;
  End for;
  Return best;
End;
Function find_best_cluster (C, r)
Input: a set of clusters C and a record r
Output: a cluster c ∈ C such that IL(c U {r}) is minimal
  n = |C|; min = ∞; best = null;
  For ( i = 1..n )
    c = i-th cluster in C;
    diff = IL(c U {r}) - IL(c);
    If ( diff < min )
      min = diff;
      best = c;
    End if;
  End for;
  Return best;
End;
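The three functions above can be put together in a short runnable sketch. The slides do not define the information loss IL or the record distance, so this version assumes purely numeric records, takes IL(c) as the cluster size times the summed range-normalized width of each attribute within the cluster, and uses a range-normalized Manhattan distance. These choices, the `seed` parameter, and the helper names are assumptions for illustration only.

```python
import random


def make_metrics(records):
    """Build distance and information-loss functions normalized by each attribute's range."""
    dims = range(len(records[0]))
    # Range of each attribute over the whole data set; 1 avoids division by zero.
    spans = [(max(r[i] for r in records) - min(r[i] for r in records)) or 1 for i in dims]

    def distance(a, b):
        return sum(abs(a[i] - b[i]) / spans[i] for i in dims)

    def info_loss(cluster):
        # Assumed IL: |c| times the summed normalized width of each attribute in the cluster.
        return len(cluster) * sum(
            (max(r[i] for r in cluster) - min(r[i] for r in cluster)) / spans[i] for i in dims
        )

    return distance, info_loss


def greedy_k_member_clustering(records, k, seed=0):
    rng = random.Random(seed)
    remaining = list(records)
    if len(remaining) <= k:
        return [remaining]  # too few records to split: a single cluster
    distance, info_loss = make_metrics(remaining)

    def added_loss(cluster, record):
        return info_loss(cluster + [record]) - info_loss(cluster)

    result = []
    r = rng.choice(remaining)
    while len(remaining) >= k:
        # Start the next cluster with the record furthest from the previous r.
        r = max(remaining, key=lambda x: distance(r, x))
        remaining.remove(r)
        cluster = [r]
        while len(cluster) < k:
            # find_best_record: the record whose addition increases IL the least.
            r = min(remaining, key=lambda x: added_loss(cluster, x))
            remaining.remove(r)
            cluster.append(r)
        result.append(cluster)
    while remaining:
        r = remaining.pop(rng.randrange(len(remaining)))
        # find_best_cluster: the cluster whose IL grows the least when r joins it.
        min(result, key=lambda c: added_loss(c, r)).append(r)
    return result


# (age, expense) pairs from the example table, plus one extra made-up record.
data = [(22, 100), (23, 3000), (24, 5000), (33, 2500), (38, 2800), (36, 2600), (30, 2700)]
for cluster in greedy_k_member_clustering(data, k=3):
    print(cluster)
```

Every returned cluster holds at least k records, and the leftover records (fewer than k) are assigned to the clusters they distort least, which is what lets each cluster be generalized into a single k-anonymous group.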
The time complexity of this algorithm is O((n² log n)/c), where n is the number of records and c is the average number of records in each cluster. This is better than the time complexity of the greedy k-member algorithm.
The main goal of the experiments was to investigate the implementation of the k-anonymity model using a clustering algorithm. We mainly focus on data quality, k-anonymization, and scalability, which are the main considerations of the k-anonymity model.
Finally, data quality is the major problem in k-anonymization. We therefore focus on data quality rather than computational efficiency, since data quality should be the main consideration in the k-anonymity model. We are encouraged by our results, which demonstrate that our algorithm is flexible and able to produce a range of desired anonymizations.
Encouraged by the experimental results, we are currently working on more efficient heuristics to improve the performance of our approach. We are also working on applying this clustering algorithm to fraud detection.
1. Sweeney, L.: k-Anonymity: A Model for Protecting Privacy. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 10, 557-570 (2002)
2. Byun, J.-W. et al.: Efficient k-Anonymization Using Clustering Techniques. In: Kotagiri, R. et al. (Eds.): DASFAA 2007, LNCS 4443, pp. 188-200 (2007)