Escolar Documentos
Profissional Documentos
Cultura Documentos
1
F
1
1
( SC ( F ( {L } ), H ))
F
1 2
( SC ( F ( {L } ), H ))
2
(9)
1 2
1
F
1
1
( SC ( F ( {L } ), H )) F ( SC ( F ( {L } ), H ))
1 2 2
where O
1
and O
2
are two ontologies, F is the reference function that links set of lexical entries
to the set of concepts they refer to. But there is another case: L X
case, the taxonomic overlap can be computed as follows:
1 1
c
1
c
and L X . In this
2
TO ' '( L, O , O ) =
e F 1 ( SC ( F ( {L } ),
H 1
c
)) F 2 ( SC ( F ( {L } ), H 2 ))
| (10)
1 2
max
C C
F
1
( SC ( F ( {L } ), H
))
1
F ( SC ( F ( {L } ), H ))
2 1 1 2 2
Now we can get the average similarity for taxonomies:
TO (
O
which is constrained with
1
, O ) =
2
1
c
X 1
TO
9
L X 1 c
( L ,
O
1
,
O
2
) (11)
c
eTO'(L,O ,O ) if L X
1
TO ( L , O , O ) =
2 2
(12)
1 2
b) Comparing relations
c
TO ' ' (L , O 1, O 2 ) if
L
c
X 2
Similar to the above comparison, we can compute the relation overlap (RO), i.e. the accuracy two
relations match. RO can be computed based on the geometric mean value of similarity their
domain concepts. Similar to taxonomies comparison, we define the upwards cotopy (UC) in order
to consider the similarity of concepts:
UC(C , H ) = {C A H (C ,C )} (13)
i j i j
Then, similar to the definition of TO, the concept match is defined as (14) shows:
1
F ( UC ( C , H ))
F
1
( UC ( C , H ))
CM ( C , O , C , O ) =
1 1 1 2 2 2
(14)
1 1 2 2 1
F ( UC ( C , H ))
F
1
( UC ( C , H ))
1 1
RO of relations R1 and R2 can be defined as (15).
RO '( R , O , R , O ) = CM ( d ( R ), O , d (
R
1 2 2 2
), O ) CM ( r ( R ), O , r ( R ), O ) (15)
1 1 2 2
r r
Consider L X 2, L X 1
RO ' ' ( L , O , O ) =
1 1 2 2
1
max {
RO '
1 1 2 1
( R , O , R , O )} (16)
1 2
9
1 1 2 2
RO ' ' ' ( L , O 1 ,
O 2 ) =
G
1
({ L })
R 1 G 1 ({
L })
1
9
R 2 G 2 ({ L })
max { RO ' ( R 1 , O 1 , R 2 , O 2 )} (17)
G
1
({ L }) R 2 P2
R 1 G 1 ({ L })
where R1 and R2 are two relations we want to compare. G is the reference function that links
set of lexical entries to the set of relations they refer to. Since there are several different
2013 Page 21 of 29
conditions, the author gives (15), (16) and (17) respectively. Then with combination of the
above definitions, we get RO for L X
e RO
'
c
1
r
' ( L , O , O ) if L X
RO ( L , O , O ) =
1 2 2
(18)
1 2 c r
RO'''(L,O
1,
O2 ) if L X 2
Similar to taxonomies, the average relation overlap is then defined grounded with the
condition (18):
RO (
O
1
, O ) =
2
1
r
9
RO ( L,
O
1
,
O
2
) (19)
X
1 L X r
1
Now we can see this method is more complex than the previous approaches, but it is very
interesting and creative.
2.8 Toolkits
2.8.1 Jena
HP has developed a toolkit named Jena for developing applications within the Semantic Web. The
toolkit contains a RDF/XML Parser (ARP), Jena relational database interface, integrated query
language (RDQL), a server for publishing RDF models on the web (Joseki [27]) and a set of Java
API for RDF. It has added support for DAML+OIL since version 1.2.
2.8.2 Sesame
Sesame [34] is an open source project which is an RDF Schema-based repository. It has
several useful features:
a) Data administration: add, delete data.
b) Export: exports the data in repository to RDF documents.
c) RQL engine: can be used to evaluate RQL queries.
2.8.3 DAMLJessKB
DAMLJessKB [32] is set of Java API for reasoning with DAML within the Semantic Web. It is
built based on Jena and Jess [31]. It now supports DAML, RDFS, RDF and XML Schemas. It
provides abilities such as reading DAML files, converting the information to the DAML
language, and making queries.
2013 Page 22 of 29
Part II
3. Project Plan
3.1. Aims and Objectives
The main aim of this project is to develop a search engine based on ontology matching within the
Semantic Web. Since the Semantic Web is distributive, there are a lot of resource
descriptions where two concepts within different ontologies are equivalent, but they are
described in different terms. This project is to match those elements, and then bring a more
accuracy search result. This project can be divided into several sub-tasks.
c. A query builder: A query builder is the bridge between the user and machine learning
agent. It accepts the input of the user and constructs a kind of query language. Then it
transfers the query to machine learning agent. It has a friendly interface that will also
ease the user input.
d. A machine learning agent: This module is the core of this project. Some machine
learning techniques involve this project in this layer. The agent will measure the
similarity of the elements and determine if they are equivalent. Then it will transfer the
output to a CGI-like program to return the result to the user.
3.2. Initial Design and Specification
Since a detailed design and specification of this project is not presented now, I can only give a
rough illustration. I dont guarantee that all features here will be implemented and some
components may be changed, added or removed. The project has 3 main sub-tasks:
3.2.1 DAML Converter and Builder
I will prefer DAML to RDF as the semantic web data form in this project since there are
many data sets that open to public. Data will be stored in a relational database since the
response time is very important to a search engine. I will also use some artificial data if the
data on the Web is not suitable for my project. In this particular case, I will make an agent that
Page 23 of 29
will generate data in SW (Semantic Web) form automatically and convert data in Non-SW
form to SW form. This agent will be implemented by java with some DAML API. It has a
GUI user interface, and I will also implement a web interface if time is enough.
3.2.2 Query Builder
The aim of query builder is to ease the process of user input, especially when a user wants to
make a complex query. It will take the user requests and format them into a kind of Query
Language to agents. This part will be a cgi-like program, built with some jsp pages or
servlets.
Mahcine
UserInput QueryBuilder
Learning
Agent
Database
Figure 3.2.2.1 Qurey Builder
3.2.3 Machine Learning Agent
This is the core of this project. The agent will accept the queries, match the contents against the
requests and measure the similarity among elements of two ontologies and determine if they are
matched candidates. Then it will bring the information back to the user.
Some machine learning techniques such as Nave Bayes, involve in this layer. As we
mentioned in previous sections, there are several types of mapping. I will focus on one-to-one
mapping in this project, rather than complex types. Complex mapping may be dealt with if
there is time left, e.g. Full Name maps to the combination of First Name and Last
Name.
As for the similarity measures, I have not decide what type of measures I should use. This is
because almost all methods are not suitable for every domain or condition. So I will compare
several methods when developing and choose one that best fits my data sets, or may
2013 Page 24 of 29
implement a hybrid system.
3.2.4 DAML Crawler
DAML crawler is a program that collects DAML statements by traveling on the Web. The
main aim of the crawler is to collect DAML content on the Web periodically. This part is
optional since there are several existing crawlers on the Web and I can also use the existing
tools.
Database
Root URIs
Crawler
Internet
DAML
content
rdfDB
Figure 3.2.4.1 DAML Crawler
2013 Page 25 of 29
3.3 Project Schedule
June July August September
Project Steps: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
Final Design
1. Data sets survey
2. Compare several theories of similarity measures
3. Design the detailed program structure
Software Development
1. DAML converter
2. Machine Learning Agent
3. Query Builder
Testing
1. Unit Test & System Test
Documentation
1. Prepare Presentation
2. Final documentation
3. Prepare demostration
Development Final Design
Milestones Testing Documentation
2013 Page 26 of 29
Part III
4. Bibliography
1. The Semantic Web - ISWC 2002: First International Semantic Web Conference, Sardinia,
Italy, June 9-12, 2002 : proceedings / Ian Horrocks, James Hendler (eds.)
2. Advances in web-based learning: first international conference, ICWL 2002, Hong Kong,
China, August 17-19, 2002 : proceedings / Joseph Fong ... [et al.] (eds.)
5. References
[1] W3C Semantic Web, http://www.w3.org/2001/sw/
[2] Tim Berners-Lee, James Hendler, Ora Lassila, The Semantic Web, Scientific American,
http://www.sciam.com/article.cfm?articleID=00048144-10D2-1C70-84A9809EC588EF21
[3] Sean B. Palmer, The Semantic Web: An Introduction, 2001,
http://infomesh.net/2001/swintro/
[4] Eric Prudhommeaux, Presentation of W3C and Semantic Web, 2001,
http://www.w3.org/2001/Talks/0710-ep-grid
[5] Tim Berners-Lee, W3C Naming and Addressing Overview (URIs. URLs, ...),
http://www.w3.org/Addressing/
[6] W3C, W3C Resource Description Framework (RDF), http://www.w3.org/RDF/
[7] The DARPA Agent Markup Language (DAML), http://www.daml.org/
[8] AQ-Search Group (Jianghua Tu, Rui Feng, Zhuoyun Li, Wei Tong, Zaiqiang Liu and
Instructor --- Prof. Kokar), AQ-Search Project, 2002
[9] Aaron Swartz, The Semantic Web (for Web Developers), May 2001,
http://logicerror.com/semanticWeb-webdev
[10] Aaron Swartz, The Semantic Web In Breadth, May 2002,
http://logicerror.com/semanticWeb-long#acks
2013 Page 27 of 29
[12] Sandro Hawke, How the Semantic Web Works, April 2002,
http://www.w3.org/2002/03/semweb/
[13] W3C, HTTP Specifications and Drafts, http://www.w3.org/Protocols/Specs.html
[14] IETF, Hypertext Transfer Protocol -- HTTP/1.1 - Draft Standard RFC 2616,
http://www.ietf.org/rfc/rfc2616.txt
[15] W3C, Extensible Markup Language (XML), http://www.w3.org/XML/
[16] W3C, Extensible Markup Language (XML) 1.0 (Second Edition), 6 October 2000,
http://www.w3.org/TR/REC-xml
[17] Erik T. Ray, Learning XML, First Edition, January 2001, ISBN: 0-59600-046-4, 368
pages.
[18] R. Allen Wyke, Brad Leupen, Sultan Rehman, XML Programming, 2002, Microsoft
Press
[19] W3Schools (http://www.w3schools.com), XML Schema Tutorial, (2001)
http://www.w3schools.com/schema/schema_intro.asp
[20] Tim Bray, What is RDF? http://www.xml.com/lpt/a/2001/01/24/rdf.html
[21] W3C, RDF Primer, W3C Working Draft 23 January 2003,
http://www.w3.org/TR/2003/WD-rdf-primer-20030123/
[22] W3C, Resource Description Framework (RDF) Model and Syntax Specification, W3C
Recommendation 22 February 1999, http://www.w3.org/TR/1999/REC-rdf-syntax-19990222
[23] W3C, RDF Vocabulary Description Language 1.0: RDF Schema, W3C Working Draft 23
January 2003, http://www.w3.org/TR/rdf-schema/
[24] Roxane Ouellet, Uche Ogbuji, Introduction to DAML: Part I (2002),
http://www.xml.com/lpt/a/2002/01/30/daml1.html
[25] Roxane Ouellet, Uche Ogbuji, Introduction to DAML: Part II (2002),
http://www.xml.com/lpt/a/2002/03/13/daml.html
[26] Adam Pease, Why Use DAML? (10 April, 2002),
http://www.daml.org/2002/04/why.html
[27] http://www.joseki.org/
2013 Page 28 of 29
[28] Jena, http://www.hpl.hp.com/semweb/
[29] T. R. Gruber. A translation approach to portable ontologies. Knowledge Acquisition,
5(2):199-220, 1993.
[30] AnHai Doan, Jayant Madhavan, Pedro Domingos, Alon Halevy. Learning to Map
Between Ontologies on the Semantic Web. May 2002.
http://www2002.org/CDROM/refereed/232/
[31] Jess, http://herzberg.ca.sandia.gov/jess/
[32] DAMLJessKB, http://edge.mcs.drexel.edu/assemblies/software/damljesskb
[33] Xiaomeng Su, A Text Categorization Perspective for Ontology Mapping,
http://www.idi.ntnu.no/~xiaomeng/paper/Position.pdf
[34] Sesame, http://sesame.aidministrator.nl/
[35] Alexander Maedche and Steffen Staab, Comparing Ontologies Similarity Measures
and a Comparison Study, March 2001,
http://www.aifb.uni-karlsruhe.de/~sst/Research/Publications/report-aifb-408.pdf
2013 Page 29 of 29