Ashok Index

Uploaded by ali

ABSTRACT

Duplicate detection is the process of identifying multiple representations of the same
real-world entities. Today, duplicate detection methods need to process ever larger
datasets in ever shorter time, so maintaining the quality of a dataset becomes increasingly
difficult. This project presents two novel, progressive duplicate detection algorithms that
significantly increase the efficiency of finding duplicates when the execution time is limited:
they maximize the gain of the overall process within the time available by reporting
most results much earlier than traditional approaches. Comprehensive experiments show
that our progressive algorithms can double the efficiency over time of traditional
duplicate detection and significantly improve upon related work.
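To make "reporting most results much earlier" concrete, the sketch below contrasts the comparison order of a traditional sorted neighborhood method (SNM) with a simplified progressive order over the same sorted list. It is an illustration under stated assumptions, not this project's implementation: the function names, the string records, the sort key and the `is_dup` predicate are hypothetical, and the progressive order omits refinements such as partitioning or look-ahead. Both orders perform exactly the same comparisons; only their sequence differs.

```python
from typing import Callable, Iterator, List, Sequence, Tuple

Pair = Tuple[int, int]


def snm_pairs(n: int, window: int) -> Iterator[Pair]:
    """Traditional SNM order: slide a fixed window over the sorted list and
    compare each position with its next window - 1 neighbours."""
    for i in range(n):
        for j in range(i + 1, min(i + window, n)):
            yield (i, j)


def progressive_snm_pairs(n: int, window: int) -> Iterator[Pair]:
    """Simplified progressive order (hypothetical, no partitioning or
    look-ahead): the same pairs as snm_pairs, emitted by rank distance, so all
    direct neighbours are compared before any pair two positions apart."""
    for dist in range(1, window):
        for i in range(n - dist):
            yield (i, i + dist)


def find_duplicates(
    records: Sequence[str],
    key: Callable[[str], str],
    is_dup: Callable[[str, str], bool],
    pair_order: Callable[[int, int], Iterator[Pair]],
    window: int,
) -> List[Tuple[int, Pair]]:
    """Sort records by key, run the comparisons in the given order and return
    (comparison number, duplicate pair as original record indices)."""
    order = sorted(range(len(records)), key=lambda r: key(records[r]))
    found: List[Tuple[int, Pair]] = []
    for step, (a, b) in enumerate(pair_order(len(order), window), start=1):
        x, y = order[a], order[b]
        if is_dup(records[x], records[y]):
            found.append((step, (min(x, y), max(x, y))))
    return found


if __name__ == "__main__":
    people = ["smith j", "smith john", "lee a", "lee ann", "ray b", "ray bob"]

    def same_surname(a: str, b: str) -> bool:
        # Hypothetical match rule: identical first token.
        return a.split()[0] == b.split()[0]

    print(find_duplicates(people, str.lower, same_surname, snm_pairs, 3))
    # [(1, (2, 3)), (5, (4, 5)), (9, (0, 1))]
    print(find_duplicates(people, str.lower, same_surname, progressive_snm_pairs, 3))
    # [(1, (2, 3)), (3, (4, 5)), (5, (0, 1))]
```

With a window of 3, both orders find the same three duplicate pairs in nine comparisons, but the progressive order has reported all of them after comparison 5 instead of comparison 9.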
Both algorithms, the progressive sorted neighborhood method (PSNM) and progressive
blocking (PB), increase the efficiency of duplicate detection in situations with limited
execution time; they dynamically change the ranking of comparison candidates based on
intermediate results to execute promising comparisons first and less promising
comparisons later.
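The dynamic re-ranking described above can be sketched for progressive blocking. This simplified version is an assumption, not this project's code: the names and parameters (`block_size`, `max_block_dist`, the duplicate-density priority, the choice of neighbouring block pairs) are hypothetical. It sorts the records, cuts them into equal-size blocks, compares within each block first, and then keeps extending whichever compared block pair produced the highest share of duplicates, so intermediate results decide which comparisons run next.

```python
import heapq
from itertools import combinations, product
from typing import Callable, List, Sequence, Set, Tuple

Pair = Tuple[int, int]


def progressive_blocking(
    records: Sequence[str],
    key: Callable[[str], str],
    is_dup: Callable[[str, str], bool],
    block_size: int,
    max_block_dist: int,
) -> List[Pair]:
    """Simplified sketch in the spirit of PB: returns duplicate pairs
    (original record indices) in the order in which they are found."""
    order = sorted(range(len(records)), key=lambda r: key(records[r]))
    blocks = [order[s:s + block_size] for s in range(0, len(order), block_size)]
    found: List[Pair] = []
    done: Set[Pair] = set()  # block pairs already compared
    # heapq is a min-heap, so densities are stored negated to pop the
    # most productive block pair first.
    queue: List[Tuple[float, int, int]] = []

    def compare(i: int, j: int) -> None:
        """Compare all record pairs of block pair (i, j), then queue (i, j)
        with its duplicate density (duplicates / comparisons) as priority."""
        if i == j:
            candidates = combinations(blocks[i], 2)
        else:
            candidates = product(blocks[i], blocks[j])
        comparisons = duplicates = 0
        for x, y in candidates:
            comparisons += 1
            if is_dup(records[x], records[y]):
                duplicates += 1
                found.append((min(x, y), max(x, y)))
        done.add((i, j))
        density = duplicates / comparisons if comparisons else 0.0
        heapq.heappush(queue, (-density, i, j))

    # Phase 1: compare records inside every block.
    for i in range(len(blocks)):
        compare(i, i)
    # Phase 2: extend the densest block pair to its neighbours one step
    # further from the diagonal, (i - 1, j) and (i, j + 1).
    while queue:
        _, i, j = heapq.heappop(queue)
        for a, b in ((i - 1, j), (i, j + 1)):
            if (0 <= a and b < len(blocks) and b - a <= max_block_dist
                    and (a, b) not in done):
                compare(a, b)
    return found
```

For example, with the records `["lee a", "lee ann", "lee anne", "ray b", "smith j", "smith john"]`, a same-surname predicate, `block_size=2` and `max_block_dist=1`, the function returns `[(0, 1), (4, 5), (0, 2), (1, 2)]`: the within-block duplicates come first, followed by the cross-block "lee" duplicates found by extending the densest block.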
To determine the performance gain of these algorithms, this project also proposes a
novel quality measure for progressiveness that integrates seamlessly with existing
measures. Using this measure, experiments show that these approaches outperform the
traditional sorted neighborhood method (SNM) by up to 100 percent and related work by up
to 30 percent. In future work, these progressive approaches can be combined with
scalable approaches for duplicate detection to deliver results even faster.
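The abstract does not spell out how the proposed progressiveness measure is defined. As a hedged stand-in (an assumption, not the measure proposed in this project), one simple way to reward early results is the normalized area under the "recall over time" curve within a fixed time budget, where time may be wall-clock seconds or simply the number of comparisons performed. The function name and parameters are hypothetical.

```python
from typing import Iterable


def progressiveness_auc(find_times: Iterable[float], total_duplicates: int,
                        budget: float) -> float:
    """Normalised area under the recall-over-time step curve on [0, budget].

    Each duplicate found at time t raises recall by 1 / total_duplicates for
    the rest of the budget, so it contributes (budget - t) / total_duplicates
    to the area; times are clamped to [0, budget]. Dividing by budget maps the
    score to [0, 1]: 1.0 means every duplicate was reported at t = 0, and
    0.0 means none was reported within the budget.
    """
    if total_duplicates <= 0 or budget <= 0:
        raise ValueError("total_duplicates and budget must be positive")
    area = sum(budget - min(max(t, 0.0), budget) for t in find_times)
    return area / total_duplicates / budget
```

With three duplicates found after comparisons 1, 5 and 9 of a nine-comparison budget, the score is 12/27 (about 0.44); finding the same duplicates after comparisons 1, 3 and 5 raises it to 18/27 (about 0.67), even though recall at the end of the budget is identical.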

Contents
Abstract
Contents
List of Figures

Chapter 1: Introduction
1.1 Objective of the Project
1.2 Existing System
1.3 Proposed System
1.4 Organization of Thesis

Chapter 2: System Requirements and Feasibility Study
2.1 Functional Requirements
2.2 Non-Functional Requirements
2.3 Other Requirements
2.4 Pseudo Requirements
2.4.1 Hardware Requirements
2.4.2 Software Requirements
2.5 Feasibility Study

Chapter 3: System Design
3.1 System Design Introduction
3.2 Input Design
3.3 Output Design
3.4 UML Diagrams
3.4.1 Use-Case Diagram
3.4.2 Class Diagram
3.4.3 Sequence Diagram
3.4.4 Activity Diagram
3.4.5 Collaboration Diagram

Chapter 4: Implementation
4.1 Modules Description
4.1.1 Dataset Collection
4.1.2 Preprocessing Method
4.1.3 Data Separation
4.1.4 Duplicate Detection
4.1.5 Quality Measures

Chapter 5: Software Testing
5.1 Testing Introduction
5.2 Testing Cycle
5.3 White Box Testing
5.4 Black Box Testing
5.5 Types of Testing
5.5.1 Unit Testing
5.5.2 Integration Testing
5.5.3 System Testing

Chapter 6: Results
6.1 Home Page
6.2 Dataset Loading Page
6.3 Data Pre-processing Page
6.4 Data Separation Page
6.5 Data Duplicate Detection Page
6.6 Quality Measures Page

Chapter 7: Conclusion
References
Appendix

List of Figures
Figure No.  Figure Name
3.4.1       Use-Case Diagram for User
3.4.2       Class Diagram for User
3.4.3       Sequence Diagram for User
3.4.4       Activity Diagram for User
3.4.5       Collaboration Diagram for User
5.3         White Box Testing
5.4         Black Box Testing
5.5.1       Unit Testing
5.5.2       Integration Testing
5.5.3       System Testing
6.1         Home Page
6.2         Dataset Loading Page
6.3         Data Pre-processing Page
6.4         Data Separation Page
6.5.1       Progressive Blocking Page
6.5.2       PSNM Comparison Page
6.5.3       Delete Duplicate Page
6.5.4       Result Page
6.6         Quality Measures Page
6.6.1       Effectiveness Page
6.6.2       Runtime Results Page
6.6.3       Compare Page
