Bem-vindo(a) ao Scribd!

Active Learning with Multiple Views for Extracting Information

Enviado por

0% acharam este documento útil (0 voto)

5 visualizações1 página

This document discusses active learning techniques for wrapper induction, which is the task of automatically generating extraction rules to extract structured data from web pages. It introduces three multi-view active learning algorithms - Co-Testing, Co-EMT, and Adaptive View Validation - that can learn accurate wrappers from only a few labeled examples by exploiting redundancy across different representations or views of the data. Co-Testing and Co-EMT actively select the most informative examples to label by considering disagreement between views, while Adaptive View Validation predicts whether a new task is suitable for multi-view learning based on prior tasks.

Descrição original:

Wrapper Introduction

Título original

Wrapper Introduction

Direitos autorais

Formatos disponíveis

PDF, TXT ou leia online no Scribd

Compartilhar este documento

Compartilhar ou incorporar documento

Opções de compartilhamento

Você considera este documento útil?

Este conteúdo é inapropriado?

Denunciar este documento

Direitos autorais:

Formatos disponíveis

Baixe no formato PDF, TXT ou leia online no Scribd

Sinalizar o conteúdo como inadequado

0% acharam este documento útil (0 voto)

5 visualizações1 página

Active Learning with Multiple Views for Extracting Information

Enviado por

Srinivas

Direitos autorais:

Formatos disponíveis

Baixe no formato PDF, TXT ou leia online no Scribd

Sinalizar o conteúdo como inadequado

Pular para a página

Você está na página 1de 1

Pesquisar no documento

Active Learning with Multiple Views

detecting the most informative examples, while also • From Zagat’s, it obtains the name and address of
exploiting the remaining unlabeled examples. Second, all Thai restaurants in L.A. A
we discuss Adaptive View Validation (Muslea et al., • From the L.A. County Web site, it gets the health
2002b), which is a meta-learner that uses the experience rating of any restaurant of interest.
acquired while solving past learning tasks to predict • From the Geocoder, it obtains the latitude/longi-
whether multi-view learning is appropriate for a new, tude of any physical address.
unseen task. • From Tiger Map, it obtains the plot of any loca-
tion, given its latitude and longitude.
A Motivating Problem: Wrapper
Induction Information agents typically rely on wrappers to
extract the useful information from the relevant Web
Information agents such as Ariadne (Knoblock et al., pages. Each wrapper consists of a set of extraction rules
2001) integrate data from pre-specified sets of Web sites and the code required to apply them. As manually writ-
so that they can be accessed and combined via database- ing the extraction rules is a time-consuming task that
like queries. For example, consider the agent in Figure requires a high level of expertise, researchers designed
1, which answers queries such as the following: wrapper induction algorithms that learn the rules from
user-provided examples (Muslea et al., 2001).
Show me the locations of all Thai restaurants in L.A. In practice, information agents use hundreds of
that are A-rated by the L.A. County Health Depart- extraction rules that have to be updated whenever the
ment. format of the Web sites changes. As manually labeling
examples for each rule is a tedious, error-prone task,
To answer this query, the agent must combine data one must learn high accuracy rules from just a few
from several Web sources: labeled examples. Note that both the small training
sets and the high accuracy rules are crucial to the suc-
cessful deployment of an agent. The former minimizes
the amount of work required to create the agent, thus
making the task manageable. The latter is required in
order to ensure the quality of the agent’s answer to
Figure 1. An information agent that combines data each query: when the data from multiple sources is
from the Zagat’s restaurant guide, the L.A. County integrated, the errors of the corresponding extraction
Health Department, the ETAK Geocoder, and the Tiger rules get compounded, thus affecting the quality of
Map service the final result; for instance, if only 90% of the Thai
restaurants and 90% of their health ratings are extracted
Restaurant Guide
correctly, the result contains only 81% (90% x 90% =
81%) of the A-rated Thai restaurants.
Query:
L.A. County
Health Dept. A-rated Thai
We use wrapper induction as the motivating problem
restaurants for this article because, despite the practical importance
in L.A. of learning accurate wrappers from just a few labeled
examples, there has been little work on active learn-
ing for this task. Furthermore, as explained in Muslea
Agent (2002), existing general-purpose active learners can-
RESULTS: not be applied in a straightforward manner to wrapper
induction.
Geocoder

MAIN THRUST
Tiger Map Server

In the context of wrapper induction, we intuitively

describe three novel algorithms: Co-Testing, Co-EMT,

Você também pode gostar

Paper 32-A New Automatic Method To Adjust Parameters For Object Recognition
Documento5 páginas
Paper 32-A New Automatic Method To Adjust Parameters For Object Recognition
Editor IJACSA
Ainda não há avaliações
A Practical QA System in Restricted Domains
Documento7 páginas
A Practical QA System in Restricted Domains
postscript
Ainda não há avaliações
Customer Churn Prediction For Telecom Services: Utku Yabas Hakki Candan Cankaya Turker Ince
Documento2 páginas
Customer Churn Prediction For Telecom Services: Utku Yabas Hakki Candan Cankaya Turker Ince
nima
Ainda não há avaliações
Degradation Detection Algorithm For LTE Root Cause Analysis
Documento10 páginas
Degradation Detection Algorithm For LTE Root Cause Analysis
Ahmed
Ainda não há avaliações
WP Analyzing Interpreting Reporting Load Test Results-A4 EN PDF
Documento12 páginas
WP Analyzing Interpreting Reporting Load Test Results-A4 EN PDF
Mohammed Fazullah Khan
Ainda não há avaliações
Testing-As-A-Service (Taas) - Capabilities and Features For Real-Time Testing in Cloud
Documento8 páginas
Testing-As-A-Service (Taas) - Capabilities and Features For Real-Time Testing in Cloud
Anonymous Gl4IRRjzN
Ainda não há avaliações
4 DOE Elements of Success
Documento2 páginas
4 DOE Elements of Success
cvenkatasunil
Ainda não há avaliações
Comparison of Performances of Jaya Algorithm and Cuckoo Search Algorithm Using Benchmark Functions
Documento14 páginas
Comparison of Performances of Jaya Algorithm and Cuckoo Search Algorithm Using Benchmark Functions
Mashuk Ahmed
Ainda não há avaliações
Estimating Handling Time of Software Defects
Documento14 páginas
Estimating Handling Time of Software Defects
CS & IT
Ainda não há avaliações
Automated Time Table Generator System Ijariie18951
Documento5 páginas
Automated Time Table Generator System Ijariie18951
Mukesh Prajapat
Ainda não há avaliações
Vide
Documento80 páginas
Vide
RajnishKumar
Ainda não há avaliações
Techniques For Testing Performance/Scalability and Stress-Testing ADF Applications
Documento24 páginas
Techniques For Testing Performance/Scalability and Stress-Testing ADF Applications
dinilmithra
Ainda não há avaliações
Sample Exam 2
Documento23 páginas
Sample Exam 2
Nguyen Khanh
Ainda não há avaliações
An Efficient Hardware Implementation of Reinforcement Learning: The Q-Learning Algorithm
Documento12 páginas
An Efficient Hardware Implementation of Reinforcement Learning: The Q-Learning Algorithm
Nguyễn Hồng Nhung
Ainda não há avaliações
An Efficient Hardware Implementation of Reinforcement Learning: The Q-Learning Algorithm
Documento12 páginas
An Efficient Hardware Implementation of Reinforcement Learning: The Q-Learning Algorithm
Nguyễn Hồng Nhung
Ainda não há avaliações
Icac2004 Chen Diagnosis
Documento8 páginas
Icac2004 Chen Diagnosis
Hddjdj
Ainda não há avaliações
Test Algebra For Combinatorial Testing
Documento7 páginas
Test Algebra For Combinatorial Testing
mingchiat88
Ainda não há avaliações
Load Balancing in Cloud Datacenters Using Honey Bee Behavior
Documento6 páginas
Load Balancing in Cloud Datacenters Using Honey Bee Behavior
Víctor Rojas
Ainda não há avaliações
Case Study 219302405
Documento14 páginas
Case Study 219302405
nishantjain2k03
Ainda não há avaliações
UNIT 4 Seminar
Documento10 páginas
UNIT 4 Seminar
Pradeep Chaudhari
50% (2)
Data Quality
Documento4 páginas
Data Quality
AsalSanjabzadeh
Ainda não há avaliações
Vahidnia, Arash Ledwich, Gerard Ghosh, Arindam Palmer, Edward
Documento8 páginas
Vahidnia, Arash Ledwich, Gerard Ghosh, Arindam Palmer, Edward
MarioTresić
Ainda não há avaliações
Diagnosis Based On Genetic Fuzzy Algorithms For LTE Self-Healing
Documento13 páginas
Diagnosis Based On Genetic Fuzzy Algorithms For LTE Self-Healing
PranaviAgarwal
Ainda não há avaliações
Different Type of Feature Selection For Text Classification
Documento6 páginas
Different Type of Feature Selection For Text Classification
seventhsensegroup
Ainda não há avaliações
Unit 10 PL-SQL Query Processing & Query Optimization
Documento8 páginas
Unit 10 PL-SQL Query Processing & Query Optimization
malav piya
Ainda não há avaliações
Mf195 Preliminary Report
Documento4 páginas
Mf195 Preliminary Report
qwerty
Ainda não há avaliações
Relating Reinforcement Learning Performance To Classification Performance
Documento8 páginas
Relating Reinforcement Learning Performance To Classification Performance
Paul Weng
Ainda não há avaliações
A Multi Objective Binary Bat Approach For Testcase Selection in Object Oriented Testing
Documento12 páginas
A Multi Objective Binary Bat Approach For Testcase Selection in Object Oriented Testing
Madhu Yarru
Ainda não há avaliações
BT interview questions cover HashMaps, HashTables, super vs this, types of testing
Documento8 páginas
BT interview questions cover HashMaps, HashTables, super vs this, types of testing
Administrator Bussines
Ainda não há avaliações
Test Suite Quality
Documento5 páginas
Test Suite Quality
Eduardo Luna
Ainda não há avaliações
Efficient Bug Triage Using Data Reduction
Documento6 páginas
Efficient Bug Triage Using Data Reduction
Anbu Vidhya
Ainda não há avaliações
Timetable by GA
Documento3 páginas
Timetable by GA
nhpancholi01
Ainda não há avaliações
Report
Documento2 páginas
Report
Ali
Ainda não há avaliações
Stock Price Trend Forecasting Using Supervised Learning Methods
Documento2 páginas
Stock Price Trend Forecasting Using Supervised Learning Methods
PRAFULKUMAR PARMAR
Ainda não há avaliações
R1 - Adaptive Profiling For Root-Cause Analysis of Performance Anomalies in Web-Based Applications
Documento8 páginas
R1 - Adaptive Profiling For Root-Cause Analysis of Performance Anomalies in Web-Based Applications
N EM
Ainda não há avaliações
AI Final Report
Documento29 páginas
AI Final Report
Vishesh Bhargava
Ainda não há avaliações
Niis Project SIP
Documento61 páginas
Niis Project SIP
Imperial Systems
Ainda não há avaliações
Relating Reinforcement Learning Performance To Cla PDF
Documento9 páginas
Relating Reinforcement Learning Performance To Cla PDF
Anthony Soares de Alencar
Ainda não há avaliações
Mathematical Formulation & Results
Documento1 página
Mathematical Formulation & Results
Muhammad Furqan
Ainda não há avaliações
Use Case Points
Documento6 páginas
Use Case Points
jack
Ainda não há avaliações
Stock Price Prediction using ML
Documento2 páginas
Stock Price Prediction using ML
Ali
Ainda não há avaliações
Created by Roshan Nair Page 1 of 5: Performance Tuning in Oracle
Documento5 páginas
Created by Roshan Nair Page 1 of 5: Performance Tuning in Oracle
gorthi.pavan
Ainda não há avaliações
A Matlab-Based Identification Procedure Applied To A Two-Degrees-Of-Freedom Robot Manipulator For Engineering Students
Documento22 páginas
A Matlab-Based Identification Procedure Applied To A Two-Degrees-Of-Freedom Robot Manipulator For Engineering Students
Harishanker Tiwari
Ainda não há avaliações
Artificial Intelligence Research in Particle Accelerator Control Systems For Beam Line Tuning
Documento3 páginas
Artificial Intelligence Research in Particle Accelerator Control Systems For Beam Line Tuning
Amna Majid
Ainda não há avaliações
Very Fast Estimation For Result and Accuracy of Big Data Analytics: The EARL System
Documento4 páginas
Very Fast Estimation For Result and Accuracy of Big Data Analytics: The EARL System
Dxtr Medina
Ainda não há avaliações
Real-Time Intrusion Detection in Network Traffic Using Adaptive and Auto-Scaling Stream Processor
Documento7 páginas
Real-Time Intrusion Detection in Network Traffic Using Adaptive and Auto-Scaling Stream Processor
edgar
Ainda não há avaliações
Li - 2020 - Software Reliability Growth Fault Correction Model Based On Machine Learning and Neural Network Algorithm
Documento5 páginas
Li - 2020 - Software Reliability Growth Fault Correction Model Based On Machine Learning and Neural Network Algorithm
ibrahim
Ainda não há avaliações
Ding 2015
Documento10 páginas
Ding 2015
bilalkhurshid
Ainda não há avaliações
2010, Zaragoza: Web Search Solved? All Result Rankings The Same?
Documento10 páginas
2010, Zaragoza: Web Search Solved? All Result Rankings The Same?
xkjon
Ainda não há avaliações
Restaurant Review Analysis Using Natural Language Processing
Documento6 páginas
Restaurant Review Analysis Using Natural Language Processing
IJRASETPublications
Ainda não há avaliações
Lab 4 Modeling Simple Model
Documento8 páginas
Lab 4 Modeling Simple Model
Pard Teekasap
Ainda não há avaliações
Reverse Engineering Domain Analysis
Documento13 páginas
Reverse Engineering Domain Analysis
Hari Prakash
Ainda não há avaliações
1.1 Background
Documento36 páginas
1.1 Background
Anshi Agrawal
Ainda não há avaliações
Assignment 2
Documento2 páginas
Assignment 2
ZERO TO VARIABLE
Ainda não há avaliações
Thesis Genetic Algorithm
Documento7 páginas
Thesis Genetic Algorithm
carmensanbornmanchester
100% (1)
The Study of A Sales Forecast Model Based On SA-LS
Documento7 páginas
The Study of A Sales Forecast Model Based On SA-LS
Nicholas ielts
Ainda não há avaliações
Deep Learning Recommends Java Annotations Based on Code Context
Documento12 páginas
Deep Learning Recommends Java Annotations Based on Code Context
Rutuparn Sadvelkar
Ainda não há avaliações
Manual Testing Interview Questions
Documento36 páginas
Manual Testing Interview Questions
Corythia
Ainda não há avaliações
DATA MINING AND MACHINE LEARNING. PREDICTIVE TECHNIQUES: REGRESSION, GENERALIZED LINEAR MODELS, SUPPORT VECTOR MACHINE AND NEURAL NETWORKS
No Everand
DATA MINING AND MACHINE LEARNING. PREDICTIVE TECHNIQUES: REGRESSION, GENERALIZED LINEAR MODELS, SUPPORT VECTOR MACHINE AND NEURAL NETWORKS
César Pérez López
Ainda não há avaliações
DATA MINING and MACHINE LEARNING. PREDICTIVE TECHNIQUES: ENSEMBLE METHODS, BOOSTING, BAGGING, RANDOM FOREST, DECISION TREES and REGRESSION TREES.: Examples with MATLAB
No Everand
DATA MINING and MACHINE LEARNING. PREDICTIVE TECHNIQUES: ENSEMBLE METHODS, BOOSTING, BAGGING, RANDOM FOREST, DECISION TREES and REGRESSION TREES.: Examples with MATLAB
César Pérez López
Ainda não há avaliações
Bioinformatics Programmers
Documento1 página
Bioinformatics Programmers
Srinivas
Ainda não há avaliações
Machine Learning Tools: (Scherf Et. Al. 2005)
Documento1 página
Machine Learning Tools: (Scherf Et. Al. 2005)
Srinivas
Ainda não há avaliações
Bio in For Matics
Documento1 página
Bio in For Matics
Srinivas
Ainda não há avaliações
Bibliomining For Library Decision-Making: Key Terms
Documento1 página
Bibliomining For Library Decision-Making: Key Terms
Srinivas
Ainda não há avaliações
Familiar With The Browser
Documento1 página
Familiar With The Browser
Srinivas
Ainda não há avaliações
Circulation and Usage Information
Documento1 página
Circulation and Usage Information
Srinivas
Ainda não há avaliações
Bibliomining For Library Decision-Making: Background
Documento1 página
Bibliomining For Library Decision-Making: Background
Srinivas
Ainda não há avaliações
Databases and Ontologies
Documento1 página
Databases and Ontologies
Srinivas
Ainda não há avaliações
Which Are The Two Main
Documento1 página
Which Are The Two Main
Srinivas
Ainda não há avaliações
Bibliomining for Library Decision Making
Documento1 página
Bibliomining for Library Decision Making
Srinivas
Ainda não há avaliações
Historic Nature of Data
Documento1 página
Historic Nature of Data
Srinivas
Ainda não há avaliações
Have Realized The Importance
Documento1 página
Have Realized The Importance
Srinivas
Ainda não há avaliações
Experts Based Upon Past Assignments
Documento1 página
Experts Based Upon Past Assignments
Srinivas
Ainda não há avaliações
American Standard Code For Informa
Documento1 página
American Standard Code For Informa
Srinivas
Ainda não há avaliações
Discussed The Application
Documento1 página
Discussed The Application
Srinivas
Ainda não há avaliações
Business Areas Served
Documento1 página
Business Areas Served
Srinivas
Ainda não há avaliações
Behavioral Pattern-Based Customer Segmentation
Documento1 página
Behavioral Pattern-Based Customer Segmentation
Srinivas
Ainda não há avaliações
It Will Mean Delays in Refreshing
Documento1 página
It Will Mean Delays in Refreshing
Srinivas
Ainda não há avaliações
Best Practices in Data Warehousing: Les Pang
Documento1 página
Best Practices in Data Warehousing: Les Pang
Srinivas
Ainda não há avaliações
Bayesian ML Application to Task Analysis
Documento1 página
Bayesian ML Application to Task Analysis
Srinivas
Ainda não há avaliações
Intuitive Method For Grouping
Documento1 página
Intuitive Method For Grouping
Srinivas
Ainda não há avaliações
Behavioral customer segmentation using patterns
Documento1 página
Behavioral customer segmentation using patterns
Srinivas
Ainda não há avaliações
A Bayesian Based Machine Learning Application To Task Analysis
Documento1 página
A Bayesian Based Machine Learning Application To Task Analysis
Srinivas
Ainda não há avaliações
Behavioral Pattern-Based Customer Segmentation
Documento1 página
Behavioral Pattern-Based Customer Segmentation
Srinivas
Ainda não há avaliações
Task Analysis Compared
Documento1 página
Task Analysis Compared
Srinivas
Ainda não há avaliações
Bayesian ML Application for Task Analysis Decomposition
Documento1 página
Bayesian ML Application for Task Analysis Decomposition
Srinivas
Ainda não há avaliações
Classic Task Analysis Methods
Documento1 página
Classic Task Analysis Methods
Srinivas
Ainda não há avaliações
Sampling Problems Related
Documento1 página
Sampling Problems Related
Srinivas
Ainda não há avaliações
What Are Musical Pitch
Documento1 página
What Are Musical Pitch
Srinivas
Ainda não há avaliações
Bayesian Based Machine Learning
Documento1 página
Bayesian Based Machine Learning
Srinivas
Ainda não há avaliações
Design Patterns: Horstmann Chapter 5 (Sections 1-7)
Documento14 páginas
Design Patterns: Horstmann Chapter 5 (Sections 1-7)
ben simanjuntak
Ainda não há avaliações
Design Patterns Explained With Java and Uml2 2008
Documento86 páginas
Design Patterns Explained With Java and Uml2 2008
Benckau
100% (68)
Wrapper Class
Documento26 páginas
Wrapper Class
asad_raza4u
Ainda não há avaliações
1.6.1 Manual Quality Game Settings
Documento76 páginas
1.6.1 Manual Quality Game Settings
Kagehiza Aghna Ilma
0% (1)
Chap 6 - Software Reuse
Documento51 páginas
Chap 6 - Software Reuse
Mohamed Med
Ainda não há avaliações
Implement Bridge Pattern in C
Documento9 páginas
Implement Bridge Pattern in C
sandymcan
Ainda não há avaliações
Chapter 4 - Design Patterns
Documento48 páginas
Chapter 4 - Design Patterns
Trần Quang Hà
Ainda não há avaliações
Designing Objects with Responsibilities
Documento86 páginas
Designing Objects with Responsibilities
MALARMANNAN A
Ainda não há avaliações
Case Study Cad/Cam Problem: Course Teacher
Documento27 páginas
Case Study Cad/Cam Problem: Course Teacher
Hira Fatima
Ainda não há avaliações
Siebel OpenUI
Documento17 páginas
Siebel OpenUI
Priya
Ainda não há avaliações
Core Java Interview Questions
Documento17 páginas
Core Java Interview Questions
Vishnu Rajoria
Ainda não há avaliações
CTL The Language For Describing Core-Based Test
Documento10 páginas
CTL The Language For Describing Core-Based Test
imoon63
Ainda não há avaliações
18 Design Patterns 1682947063
Documento2 páginas
18 Design Patterns 1682947063
Viji Lakshmi
Ainda não há avaliações
PR No 14-Mad
Documento8 páginas
PR No 14-Mad
Sandesh Gaikwad
Ainda não há avaliações
Design Patterns
Documento19 páginas
Design Patterns
Himanshu Srivastava
Ainda não há avaliações
Java Means Durgasoft: DURGA SOFTWARE SOLUTIONS, 202 HUDA Maitrivanam, Ameerpet, Hyd. PH: 040-64512786
Documento12 páginas
Java Means Durgasoft: DURGA SOFTWARE SOLUTIONS, 202 HUDA Maitrivanam, Ameerpet, Hyd. PH: 040-64512786
Shubh
Ainda não há avaliações
ROP Mainframe Data Services
Documento29 páginas
ROP Mainframe Data Services
Sameer Mathad
Ainda não há avaliações
Design-Patterns - Course-Notes PDF
Documento97 páginas
Design-Patterns - Course-Notes PDF
Andi Faizal
Ainda não há avaliações
Not Yet Answered Marked Out of 1.00: Clear My Choice
Documento34 páginas
Not Yet Answered Marked Out of 1.00: Clear My Choice
Đinh Quang Huy
Ainda não há avaliações
Chapter 10 Thinking in Objects
Documento67 páginas
Chapter 10 Thinking in Objects
Bhavik Dave
Ainda não há avaliações
Computer Science Syllabus Second Year - Fourth Semister
Documento9 páginas
Computer Science Syllabus Second Year - Fourth Semister
Turtle17
Ainda não há avaliações
MVC design patterns application object controller
Documento3 páginas
MVC design patterns application object controller
jyothi
100% (1)
Thinking in Java (4th Edition) - Bruce Eckel
Documento7 páginas
Thinking in Java (4th Edition) - Bruce Eckel
dazaciku
0% (2)
Session 17
Documento2 páginas
Session 17
ganesh
Ainda não há avaliações
SDA Lec5
Documento68 páginas
SDA Lec5
sana alam
Ainda não há avaliações
Java Lecture Notes
Documento66 páginas
Java Lecture Notes
Kumar Harsha
100% (1)
Gang of Four Design Patterns 4.5
Documento80 páginas
Gang of Four Design Patterns 4.5
pablo@
100% (4)
Rc0008 Designpatterns Online
Documento7 páginas
Rc0008 Designpatterns Online
Harmeet Singh Bawa
Ainda não há avaliações
Constructors and Parsing Methods of Wrapper Classes in Java
Documento3 páginas
Constructors and Parsing Methods of Wrapper Classes in Java
Mohan Raj
Ainda não há avaliações
Design Patterns Tutorialspoint
Documento94 páginas
Design Patterns Tutorialspoint
Qwerty Developer
Ainda não há avaliações