S.No. Topic
Abstract
1. Introduction
2. History
2.1 Web 1.0
2.2 Web 2.0
3. Web 3.0: A Basic Introduction
4. Architecture of Semantic Web
5. Key Components
5.1 URI
5.2 RDF
5.3 RDFS
5.4 OWL
5.5 Microformat
6. Practical Illustration
7. Difference between Web 1.0, Web 2.0 and Web 3.0
8. Challenges
9. Project Implementation
10. Conclusion
11. Bibliography
FIGURE INDEX
Figure 1. Web 1.0 Example
Figure 2. Web 2.0 Example
Figure 3. Semantic Web Architecture
Figure 4. RDF Example
Figure 5. Traditional Web Model
Figure 6. Semantic Web Model
ABSTRACT
The Semantic Web is an evolving development of the World Wide Web in which the meaning (semantics) of information and services on the web is defined, making it possible for the web to "understand" and satisfy the requests of people and machines to use web content. At its core, the Semantic Web comprises a set of design principles, collaborative working groups, and a variety of enabling technologies. Some elements of the Semantic Web are expressed as prospective future possibilities that are yet to be implemented or realized. Other elements are expressed in formal specifications. These include the Resource Description Framework (RDF), a variety of data interchange formats (e.g. RDF/XML, N3, Turtle, N-Triples), and notations such as RDF Schema (RDFS) and the Web Ontology Language (OWL), all of which are intended to provide a formal description of concepts, terms, and relationships within a given knowledge domain.
The key components of Semantic Web technology are as follows:
1. OWL: The Web Ontology Language (OWL) is a family of knowledge representation languages for authoring ontologies, endorsed by the World Wide Web Consortium. They are characterised by formal semantics and RDF/XML-based serializations for the Semantic Web. OWL has attracted academic, medical and commercial interest.
2. Resource Description Framework: The Resource Description Framework (RDF) is a family of World Wide Web Consortium (W3C) specifications originally designed as a metadata data model. It has come to be used as a general method for conceptual description or modeling of information that is implemented in web resources, using a variety of syntax formats.
3. RDF Schema: RDF Schema (variously abbreviated as RDFS, RDF(S), RDF-S, or RDF/S) is an extensible knowledge representation language, providing basic elements for the description of ontologies, otherwise called Resource Description Framework (RDF) vocabularies, intended to structure RDF resources.
4. Microformat: A microformat (sometimes abbreviated μF) is a web-based approach to semantic markup that seeks to re-use existing HTML/XHTML tags to convey metadata and other attributes, in web pages and other contexts that support (X)HTML, such as RSS. This approach allows information intended for end-users to also be automatically processed by software.
1. INTRODUCTION
Currently the focus of a W3C working group, the Semantic Web vision was conceived by Tim Berners-Lee, the inventor of the World Wide Web. The World Wide Web changed the way we communicate, the way we do business, the way we seek information and entertainment; indeed, the very way most of us live our daily lives. Calling it the next step in Web evolution, Berners-Lee defines the Semantic Web as a web of data that can be processed directly and indirectly by machines. In the Semantic Web, data itself becomes part of the Web and can be processed independently of application, platform, or domain. This is in contrast to the World Wide Web as we know it today, which contains virtually boundless information in the form of documents. We can use computers to search for these documents, but they still have to be read and interpreted by humans before any useful information can be extracted. Computers can present you with information but can't understand what the information is well enough to display the data that is most relevant in a given circumstance. The Semantic Web, on the other hand, is about having data as well as documents on the Web so that machines can process, transform, assemble, and even act on the data in useful ways.
Imagine this scenario. You're a software consultant and have just received a new project. You're to create a series of SOAP-based Web services for one of your biggest clients. First, you need to learn a bit about SOAP, so you search for the term using your favorite search engine. Unfortunately, the results you're presented with are hardly helpful. There are listings for dish detergents, facial soaps, and even soap operas mixed into the results. Only after sifting through multiple listings and reading through the linked pages are you able to find information about the W3C's SOAP specifications.
Because of the different semantic associations of the word "soap", the results you receive vary in relevance, and you still have to do a lot of work to find the information you're looking for. However, in a Semantic Web-enabled environment, you could use a Semantic Web agent to search the Web for SOAP where SOAP is a type of technology specification used in Web services. This time, the results of your search will be relevant. Your Semantic Web agent can also search your corporate network for the SOAP specification and discover whether your colleagues have completed similar projects or have posted SOAP-related research on the network. Based on the semantic information available for SOAP, your agent also presents you with a list of related technologies. Now you know that WSDL, XML, and URI are all technologies related to SOAP, and that you'll need to do some research on them, too, before beginning your project. Armed with the information returned by your Semantic Web agent, you read the related technology specifications and send emails to the colleagues who have made SOAP-related materials available on the network to ask for their input before starting your new project.
2. HISTORY
Typical design elements of a Web 1.0 site include:
Static pages instead of dynamic user-generated content.
The use of framesets.
Proprietary HTML extensions such as the <blink> and <marquee> tags, introduced during the first browser war.
Online guestbooks.
GIF buttons, typically 88x31 pixels in size, promoting web browsers and other products.
HTML forms sent via email: a user would fill in a form, and upon clicking submit their email client would attempt to send an email containing the form's details.
Facebook is a social networking site and a prominent example of Web 2.0. The site allows users to make friends, write them messages, chat with them, and upload and share photos, among other activities.
one database, and then move through an unending set of databases which are connected not by wires but by being about the same thing.
Figure 3. Semantic Web Architecture
Here:
URI: Uniform Resource Identifier
OWL: Web Ontology Language
XML: Extensible Markup Language
RDF: Resource Description Framework
RDFS: Resource Description Framework Schema
SPARQL: SPARQL Protocol and RDF Query Language
RIF/SWRL: Rule Interchange Format / Semantic Web Rule Language
Architecture Description:
The base of the Semantic Web architecture consists of identifiers (Uniform Resource Identifiers) and a character set (Unicode). Above this is the syntax layer, which defines syntactical relationships and is based on XML. Above that is the data interchange layer, defined by RDF. On top of RDF, querying is handled by SPARQL and taxonomies are defined by RDFS. Ontologies are governed by OWL and rules by RIF/SWRL. Above these are the unifying logic layer and the proof layer. Cryptography runs alongside all of the aforementioned layers to secure them. At the top is the trust layer. A brief description of these layers and components is given in the following sections of the report.
5. Key Components
Semantic Web has five main components which help in accomplishing the required task and define the functioning of the web:
A URI may be classified as a locator (URL), or a name (URN), or both. A Uniform Resource Name (URN) functions like a person's name, while a Uniform Resource Locator (URL) resembles that person's street address. In other words, the URN defines an item's identity, while the URL provides a method for finding it. The URI syntax consists of a URI scheme name followed by a colon character, and then by a scheme-specific part. The specifications that govern the schemes determine the syntax and semantics of the scheme-specific part, although the URI syntax does force all schemes to adhere to a certain generic syntax that, among other things, reserves certain characters for special purposes (without always identifying those purposes). The URI syntax also enforces restrictions on the scheme-specific part in order to, for example, provide for a degree of consistency when the part has a hierarchical structure. Percent-encoding allows characters outside the permitted set to be represented in a URI.
A URI reference is another type of string that represents a URI, and (in turn) represents the resource identified by that URI. Informal usage does not often maintain the distinction between a URI and a URI reference, but protocol documents should not allow for ambiguity. A URI reference may take the form of a full URI, or just the scheme-specific portion of one, or even some trailing component thereof, including the empty string. An optional fragment identifier, preceded by #, may be present at the end of a URI reference. The part of the reference before the # indirectly identifies a resource, and the fragment identifier identifies some portion of that resource. In order to derive a URI from a URI reference, software converts the URI reference to 'absolute' form by merging it with an absolute 'base' URI according to a fixed algorithm. The system treats the URI reference as relative to the base URI, although in the case of an absolute reference, the base has no relevance.
The base URI typically identifies the document containing the URI reference, although this can be overridden by declarations made within the document or as part of an external data transmission protocol. If the base URI includes a fragment identifier, it is ignored during the merging process. If a fragment identifier is present in the URI reference, it is preserved during the merging process. Web document markup languages frequently use URI references to point to other resources, such as external documents or specific portions of the same logical document.
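The merging of a URI reference with a base URI can be tried out with Python's standard urllib.parse module. The following is only a small sketch: the base URI and the references are hypothetical example.org values chosen for illustration, not addresses taken from this report.

```python
from urllib.parse import urljoin, urldefrag, urlsplit

# Hypothetical base URI of the document that contains the references.
# Its own fragment (#intro) is ignored when references are merged with it.
BASE = "http://example.org/reports/semantic-web.html#intro"


def resolve(reference: str, base: str = BASE) -> str:
    """Convert a URI reference to absolute form by merging it with the base."""
    return urljoin(base, reference)


# A relative reference is resolved against the base document's location.
relative = resolve("figures/architecture.png")

# A fragment-only reference points at a portion of the same document,
# and its fragment is preserved in the result.
same_doc = resolve("#key-components")

# For an absolute reference the base has no relevance.
absolute = resolve("https://www.w3.org/RDF/")

# Split a URI into the part before '#' and the fragment identifier.
resource, fragment = urldefrag(same_doc)

# The scheme name is the part before the first colon.
scheme = urlsplit(absolute).scheme
```

Here `resource` is the document being identified and `fragment` names a portion of it, which mirrors the distinction drawn in the text above.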
5.2 RDF:
The Resource Description Framework (RDF) is a family of World Wide Web Consortium (W3C) specifications originally designed as a metadata data model. It has come to be used as a general method for conceptual description or modeling of information that is implemented in web resources, using a variety of syntax formats. The RDF data model is similar to classic conceptual modeling approaches such as Entity-Relationship or Class diagrams, as it is based upon the idea of making statements about resources (in particular Web resources) in the form of subject-predicate-object expressions. These expressions are known as triples in RDF terminology. The subject denotes the resource, and the predicate denotes traits or aspects of the resource and expresses a relationship between the subject and the object. For example, one way to represent the notion "The sky has the color blue" in RDF is as the triple: a subject denoting "the sky", a predicate denoting "has the color", and an object denoting "blue". RDF is an abstract model with several serialization formats (i.e., file formats), and so the particular way in which a resource or triple is encoded varies from format to format.
A collection of RDF statements intrinsically represents a labeled, directed multigraph. As such, an RDF-based data model is more naturally suited to certain kinds of knowledge representation than the relational model and other ontological models. In practice, however, RDF data is often persisted in relational databases or in native representations called triplestores (or quad stores, if the context, i.e. the named graph, is also persisted for each RDF triple). As RDFS and OWL demonstrate, additional ontology languages can be built upon RDF. The subject of an RDF statement is either a Uniform Resource Identifier (URI) or a blank node, both of which denote resources. Resources indicated by blank nodes are called anonymous resources. They are not directly identifiable from the RDF statement.
The predicate is a URI which also indicates a resource, representing a relationship. The object is a URI, a blank node or a Unicode string literal. In Semantic Web applications, and in relatively popular applications of RDF like RSS and FOAF (Friend of a Friend), resources tend to be represented by URIs that intentionally denote, and can be used to access, actual data on the World Wide Web. But RDF, in general, is not limited to the description of Internet-based resources. In fact, the URI that names a resource does not have to be dereferenceable at all. For example, a URI that begins with "http:" and is used as the subject of an RDF statement does not necessarily have to represent a resource that is accessible via HTTP, nor does it need to represent a tangible, network-accessible resource; such a URI could represent absolutely anything. However, there is broad agreement that a bare URI (without a # symbol) which returns a 200-level coded response when used in an HTTP GET request should be treated as denoting the internet resource that it succeeds in accessing.
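The subject-predicate-object model described above can be sketched in a few lines of plain Python. This is an illustrative toy, not an RDF library: the example.org URIs, the blank node label and the match helper are all assumptions made up for this sketch.

```python
from typing import Iterator, Optional, Set, Tuple

# A triple is (subject, predicate, object), each held as a plain string here.
Triple = Tuple[str, str, str]

# Hypothetical URIs: they only name things and need not be dereferenceable.
SKY = "http://example.org/things#sky"
HAS_COLOR = "http://example.org/terms#hasColor"

graph: Set[Triple] = {
    (SKY, HAS_COLOR, '"blue"'),      # the object is a string literal
    ("_:b1", HAS_COLOR, '"green"'),  # the subject is a blank (anonymous) node
}


def match(
    triples: Set[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> Iterator[Triple]:
    """Yield every triple that agrees with the given terms (None = wildcard)."""
    for triple in sorted(triples):
        if all(want is None or want == got for want, got in zip((s, p, o), triple)):
            yield triple


# "What colour does the sky have?"
sky_colors = [obj for _, _, obj in match(graph, s=SKY, p=HAS_COLOR)]
```

A real triplestore adds indexing, serialization formats and a query language such as SPARQL on top of the same basic idea.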
5.3 RDFS:
RDF Schema (variously abbreviated as RDFS, RDF(S), RDF-S, or RDF/S) is an extensible knowledge representation language, providing basic elements for the description of ontologies, otherwise called Resource Description Framework (RDF) vocabularies, intended to structure RDF resources. The first version was published by the World Wide Web Consortium (W3C) in April 1998, and the final W3C recommendation was released in February 2004. Many RDFS components are included in the more expressive Web Ontology Language (OWL).
For example, rdfs:Class declares a resource as a class for other resources. A typical example of an rdfs:Class is foaf:Person in the Friend of a Friend (FOAF) vocabulary. An instance of foaf:Person is a resource that is linked to the class using the rdf:type property, as in the following formal expression of the natural-language sentence 'John is a Person':
    ex:John rdf:type foaf:Person
The definition of rdfs:Class is recursive: rdfs:Class is the rdfs:Class of any rdfs:Class. rdfs:subClassOf allows hierarchies of classes to be declared. For example, the following declares that 'Every Person is an Agent':
    foaf:Person rdfs:subClassOf foaf:Agent
Hierarchies of classes support inheritance of a property domain and range from a class to its subclasses. The RDF Schema specification describes rdf:Property as the class of RDF properties. Each member of the class is an RDF predicate. The rdfs:domain of a property declares the class of the subject in a triple whose second component is that property. The rdfs:range of a property declares the class or datatype of the object in a triple whose second component is that property. For example, the following declarations express that the property ex:employer relates a subject of type foaf:Person to an object of type foaf:Organization:
    ex:employer rdfs:domain foaf:Person
    ex:employer rdfs:range foaf:Organization
Given the previous two declarations, the following triple entails that ex:John is necessarily a foaf:Person, and ex:CompanyX is necessarily a foaf:Organization:
    ex:John ex:employer ex:CompanyX
rdfs:subPropertyOf is an instance of rdf:Property that is used to state that all resources related by one property are also related by another.
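The way rdfs:domain, rdfs:range and rdfs:subClassOf let software derive new rdf:type statements can be sketched as a small fixed-point loop. This is a simplified illustration that covers only these three rules, not a complete RDFS reasoner; the rdfs_closure function is a hypothetical helper written for this report.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

# Schema statements taken from the examples in this section.
schema: Set[Triple] = {
    ("ex:employer", "rdfs:domain", "foaf:Person"),
    ("ex:employer", "rdfs:range", "foaf:Organization"),
    ("foaf:Person", "rdfs:subClassOf", "foaf:Agent"),
}

# Instance data: a single triple using the ex:employer property.
data: Set[Triple] = {("ex:John", "ex:employer", "ex:CompanyX")}


def rdfs_closure(triples: Set[Triple]) -> Set[Triple]:
    """Apply three RDFS rules repeatedly until no new triple appears."""
    inferred = set(triples)
    while True:
        new: Set[Triple] = set()
        for s, p, o in inferred:
            for s2, p2, o2 in inferred:
                if p2 == "rdfs:domain" and s2 == p:
                    new.add((s, "rdf:type", o2))   # the subject gets the domain class
                elif p2 == "rdfs:range" and s2 == p:
                    new.add((o, "rdf:type", o2))   # the object gets the range class
                elif p == "rdf:type" and p2 == "rdfs:subClassOf" and s2 == o:
                    new.add((s, "rdf:type", o2))   # instances inherit superclasses
        if new <= inferred:
            return inferred
        inferred |= new


closure = rdfs_closure(schema | data)
```

Note that nothing in the data says that ex:John is a foaf:Person or a foaf:Agent; both statements follow from the schema alone.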
RDF Triple: (Aditya Thatte, stays in, Pune). This can be mapped to a schema which contains the classes Citizen and Country. If a Citizen abc stays in a Country X, then X also involves abc. The class Citizen has the subclasses Voting Citizen and Non-Voting Citizen, and the class Country has the subclass State, which in turn has the subclasses City, Town and Taluka, each represented by the subClassOf property.
The rectangles represent properties, the ellipses in the RDFS layer represent classes, and the ellipses in the RDF layer represent instances. The domain and range place constraints on the subjects and objects of a property. So the diagram suggests that the subject (Aditya Thatte) is a type of Voting Citizen, the object (Pune) is a type of City, and the relationship between them is "stays in" or "resides in".
5.4 OWL:
The Web Ontology Language (OWL) is a family of knowledge representation languages for authoring ontologies, endorsed by the World Wide Web Consortium. They are characterised by formal semantics and RDF/XML-based serializations for the Semantic Web. OWL has attracted academic, medical and commercial interest. In October 2007, a new W3C working group was started to extend OWL with several new features as proposed in the OWL 1.1 member submission. This new version, called OWL 2, soon found its way into semantic editors such as Protégé and semantic reasoners such as Pellet, RacerPro and FaCT++. W3C announced the new version on 27 October 2009.
The OWL family contains many species, serializations, syntaxes and specifications with similar names. This may be confusing unless a consistent approach is adopted. Here, OWL and OWL 2 refer to the 2004 and 2009 specifications, respectively. Full species names are used, including the specification version (for example, OWL 2 EL). When referring more generally, "OWL family" is used.
The data described by an ontology in the OWL family is interpreted as a set of "individuals" and a set of "property assertions" which relate these individuals to each other. An ontology consists of a set of axioms which place constraints on sets of individuals (called "classes") and the types of relationships permitted between them. These axioms provide semantics by allowing systems to infer additional information based on the data explicitly provided. A full introduction to the expressive power of OWL is provided in the W3C's OWL Guide.
Example:
An ontology describing families might include axioms stating that a "hasMother" property is only present between two individuals when "hasParent" is also present, and individuals of class "HasTypeOBlood" are never related via "hasParent" to members of the "HasTypeABBlood" class. If it is stated that the individual Harriet is related via "hasMother" to the individual Sue, and that Harriet is a member of the "HasTypeOBlood" class, then it can be inferred that Sue is not a member of "HasTypeABBlood".
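The inference in this example can be traced step by step with ordinary Python sets. This is only a hand-written sketch of the two axioms; an actual OWL reasoner such as those named above works from the ontology itself, and the variable names below are assumptions chosen for illustration.

```python
# Asserted facts: Harriet hasMother Sue, and Harriet is in class HasTypeOBlood.
has_mother = {("Harriet", "Sue")}   # pairs of (child, mother)
has_parent = set()                  # pairs of (child, parent), nothing asserted yet
type_o_blood = {"Harriet"}

# Axiom 1: "hasMother" only holds where "hasParent" also holds,
# so every hasMother pair is also a hasParent pair.
has_parent |= has_mother

# Axiom 2: members of HasTypeOBlood are never related via "hasParent" to
# members of HasTypeABBlood, so each parent of a type O individual is
# known NOT to be a member of HasTypeABBlood.
not_type_ab_blood = {
    parent for child, parent in has_parent if child in type_o_blood
}
```

After these two steps `not_type_ab_blood` contains Sue, although no statement about Sue's blood type was ever asserted.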
5.5 Microformat:
A microformat (sometimes abbreviated μF) is a web-based approach to semantic markup that seeks to re-use existing HTML/XHTML tags to convey metadata and other attributes, in web pages and other contexts that support (X)HTML, such as RSS. This approach allows information intended for end-users (such as contact information, geographic coordinates, calendar events, and the like) to also be automatically processed by software. Although the content of web pages is technically already capable of "automated processing," and has been since the inception of the web, such processing is difficult because the traditional markup tags used to display information on the web do not describe what the information means. Microformats are intended to bridge this gap by attaching semantics, and thereby obviate other, more complicated, methods of automated processing, such as natural language processing or screen scraping. The use, adoption and processing of microformats enables data items to be indexed, searched for, saved or cross-referenced, so that information can be reused or combined.
Current microformats allow the encoding and extraction of events, contact information, social relationships and so on. More are being developed. Version 3 of the Firefox browser, as well as version 8 of Internet Explorer, are expected to include native support for microformats.
Microformats emerged as part of a grassroots movement to make recognizable data items (such as events, contact details or geographical locations) capable of automated processing by software, as well as directly readable by end-users. Link-based microformats emerged first. These include vote links that express opinions of the linked page, which can be tallied into instant polls by search engines. As the microformats community grew, CommerceNet, a nonprofit organization that promotes electronic commerce on the Internet, helped sponsor and promote the technology and support the microformats community in various ways. CommerceNet also helped co-found the Microformats.org community site. Neither CommerceNet nor Microformats.org is a standards body. The microformats community is an open wiki, mailing list, and Internet relay chat (IRC) channel. Most of the existing microformats were created at the Microformats.org wiki and associated mailing list, by a process of gathering examples of web publishing behaviour, then codifying it. Some other microformats (such as rel=nofollow and unAPI) have been proposed, or developed, elsewhere.
Example:
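In this example, the contact information is presented as follows:
<div>
  <div>Joe Doe</div>
  <div>The Example Company</div>
  <div>604-555-1234</div>
  <a href="http://example.com/">http://example.com/</a>
</div>
With hCard microformat markup, that becomes:
<div class="vcard">
  <div class="fn">Joe Doe</div>
  <div class="org">The Example Company</div>
  <div class="tel">604-555-1234</div>
  <a class="url" href="http://example.com/">http://example.com/</a>
</div>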
Here, the formatted name (fn), organisation (org), telephone number (tel) and web address (url) have been identified using specific class names, and the whole thing is wrapped in class="vcard", which indicates that the other classes form an hCard (short for "HTML vCard") and are not merely coincidentally named. Other, optional, hCard classes also exist. It is now possible for software, such as browser plug-ins, to extract the information and transfer it to other applications, such as an address book.
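To show how software might extract such information, the following sketch reads an hCard with Python's standard html.parser module. It is a minimal illustration that assumes a single, flat hCard using only the four class names discussed above; the HCardParser class is hypothetical and is not a complete microformats parser.

```python
from html.parser import HTMLParser

# The hCard from this section, with the class names written out.
HCARD_HTML = """
<div class="vcard">
  <div class="fn">Joe Doe</div>
  <div class="org">The Example Company</div>
  <div class="tel">604-555-1234</div>
  <a class="url" href="http://example.com/">http://example.com/</a>
</div>
"""


class HCardParser(HTMLParser):
    """Collect fn, org, tel and url from one flat hCard (simplified)."""

    FIELDS = {"fn", "org", "tel", "url"}

    def __init__(self) -> None:
        super().__init__()
        self.in_vcard = False   # set once an element with class "vcard" is seen
        self.current = None     # field whose text content is being read
        self.card = {}

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        if "vcard" in classes:
            self.in_vcard = True
            return
        if not self.in_vcard:
            return  # class names outside a vcard are treated as coincidental
        for name in classes:
            if name in self.FIELDS:
                if name == "url" and attributes.get("href"):
                    self.card["url"] = attributes["href"]  # take the link target
                else:
                    self.current = name

    def handle_data(self, data):
        if self.current and data.strip():
            self.card[self.current] = data.strip()

    def handle_endtag(self, tag):
        self.current = None


parser = HCardParser()
parser.feed(HCARD_HTML)
card = parser.card
```

The resulting `card` dictionary could then be handed to another application, such as an address book. The sketch never leaves the vcard once it has entered it, which is sufficient for a single card but not for a page with several.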
A Faculty Page
A research Page
A Blog Site
Now, if she decides to use the Semantic Web instead of the traditional web model, the richness and presentability of her web pages would increase immensely. We can link Professor Sharma's faculty page to her research, then link data in her blog to both of these, and link her profile data to her staff listing. Her staff listing could show some of the other academics she works with, while her research page shows her links with research collaborators worldwide, who also know one of her colleagues and who comment on Professor Sharma's blog regularly. Because all of this data can be displayed simply, it provides a much richer user experience and offers information that previously might not have been exposed. The web page would now look like: Figure 6 Semantic Web Model
The straight lines show the relationships between the various web pages, researchers, staff and other web entities. The intertwined relationships show the complex relations between the data that can be viewed and the entities.
role was limited only to reading the information presented to him. The best examples are the millions of static websites which mushroomed during the dot-com boom. There was no active communication or information flow from the consumer of the information to its producer.
Web 2.0: The lack of active interaction between the common user and the web led to the birth of Web 2.0. The year 1999 marked the beginning of a Read-Write-Publish era, with notable contributions from LiveJournal (launched in April 1999) and Blogger (launched in August 1999). Now even a non-technical user can actively interact with and contribute to the web using different blog platforms. This era empowered the common user with a few new concepts, viz. blogs, social media and video streaming. Publishing your content is only a few clicks away. A few remarkable developments of Web 2.0 are Twitter, YouTube, eZineArticles, Flickr and Facebook.
Web 3.0: It seems that Web 2.0 gives us everything we had wished for, but it is way behind when it comes to intelligence. Perhaps a six-year-old child has better analytical abilities than the existing search technologies. The keyword-based search of Web 2.0 resulted in information overload. The following attributes are going to be a part of Web 3.0:
Contextual search
Tailor-made search
Personalized search
Evolution of the 3D Web
Deductive reasoning
Though the Web is yet to see something which can be termed fairly intelligent, the efforts to achieve this goal have already begun. Two weeks before this writing, the Official Google Blog described how the Google search algorithm is becoming more intelligent, as it can now identify many synonyms. For example, "pictures" and "photos" are now treated as similar in meaning. From now on, the search query "GM crop" will not lead you to the GM (General Motors) website. Why? First, by synonym identification, Google will understand that GM may mean General Motors or genetically modified. Then by context, i.e. the keyword "crop", it will deduce that the user wants information on genetically modified crops and not on General Motors. Similarly, "GM car" will not lead you to genetically modified crops. Try it out yourself to check how this newly added intelligence works in Google. There are also many websites built on Web 3.0 ideas which personalize your search. The web is indeed getting intelligent.
8. Challenges :
1. Vastness: The World Wide Web contains at least 48 billion pages as of this writing (August 2, 2009). The SNOMED CT medical terminology ontology contains 370,000 class names, and existing technology has not yet been able to eliminate all semantically duplicated terms. Any automated reasoning system will have to deal with truly huge inputs.
2. Vagueness: These are imprecise concepts like "young" or "tall". Vagueness arises from user queries, from the concepts represented by content providers, from matching query terms to provider terms, and from trying to combine different knowledge bases with overlapping but subtly different concepts. Fuzzy logic is the most common technique for dealing with vagueness.
3. Uncertainty: These are precise concepts with uncertain values. For example, a patient might present a set of symptoms which correspond to a number of distinct diagnoses, each with a different probability. Probabilistic reasoning techniques are generally employed to address uncertainty.
4. Inconsistency: These are logical contradictions which will inevitably arise during the development of large ontologies. Deductive reasoning fails catastrophically when faced with inconsistency, because "anything follows from a contradiction".
5. Deceit: This is when the producer of the information intentionally misleads the consumer of the information. Cryptography techniques are currently utilized to alleviate this threat.
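As a small illustration of how fuzzy logic treats a vague concept such as "tall", the sketch below assigns a degree of membership between 0 and 1 instead of a yes/no answer. The thresholds of 160 cm and 190 cm are arbitrary assumptions chosen for this example, and the function names are hypothetical.

```python
def tall_membership(height_cm: float, low: float = 160.0, high: float = 190.0) -> float:
    """Degree (0.0 to 1.0) to which a height counts as "tall".

    Below `low` a person is not tall at all, above `high` fully tall,
    and in between the degree rises linearly. Both thresholds are assumptions.
    """
    if height_cm <= low:
        return 0.0
    if height_cm >= high:
        return 1.0
    return (height_cm - low) / (high - low)


def fuzzy_and(a: float, b: float) -> float:
    """A common fuzzy conjunction: the minimum of the two degrees."""
    return min(a, b)


# Someone 175 cm tall is "tall" to degree 0.5 under these thresholds.
degree = tall_membership(175.0)
```

A query for "tall" people can then rank results by degree rather than having to draw one sharp boundary.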
9. Project Implementation:
This section provides some example projects and tools, but is very incomplete. The choice of projects is somewhat arbitrary but may serve illustrative purposes. It is also remarkable that in this early stage of the development of semantic web technology, it is already possible to compile a list of hundreds of components that in one way or another can be used in building or extending semantic webs.
A). DBPEDIA
DBpedia is an effort to publish structured data extracted from Wikipedia. The data is published in RDF and made available on the Web for use under the GNU Free Documentation License, thus allowing Semantic Web agents to provide inferencing and advanced querying over the Wikipedia-derived dataset, and facilitating interlinking, re-use and extension in other data sources.
B). FOAF
A popular application of the semantic web is Friend of a Friend (or FoaF), which uses RDF to describe the relationships people have to other people and the "things" around them. FOAF permits intelligent agents to make sense of the thousands of connections people have with each other, their jobs and the items important to their lives; connections that may or may not be enumerated in searches using traditional web search engines. Because the connections are so vast in number, human interpretation of the information may not be the best way of analyzing them. FOAF is an example of how the Semantic Web attempts to make use of the relationships within a social context.
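The kind of connection an agent can follow in FOAF data can be sketched with a handful of foaf:knows triples. The people below (ex:alice and so on) are invented for illustration, and the two helper functions are hypothetical rather than part of any FOAF tool.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

# Invented example data: who knows whom, expressed with foaf:knows.
knows: Set[Triple] = {
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob", "foaf:knows", "ex:alice"),
    ("ex:bob", "foaf:knows", "ex:carol"),
    ("ex:carol", "foaf:knows", "ex:dave"),
}


def friends(person: str) -> Set[str]:
    """Everyone the given person is stated to know."""
    return {o for s, p, o in knows if s == person and p == "foaf:knows"}


def friends_of_friends(person: str) -> Set[str]:
    """People known by the person's friends, excluding the person and direct friends."""
    direct = friends(person)
    indirect: Set[str] = set()
    for friend in direct:
        indirect |= friends(friend)
    return indirect - direct - {person}


suggestions = friends_of_friends("ex:alice")
```

With thousands of such statements spread across many sites, this sort of traversal is exactly what is impractical for a human reader but straightforward for software.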
D). SIOC
The SIOC Project - Semantically-Interlinked Online Communities provides a vocabulary of terms and relationships that model web data spaces. Examples of such data spaces include, among others: discussion forums, weblogs, blogrolls / feed subscriptions, mailing lists, shared bookmarks, image galleries.
E). SIMILE
Semantic Interoperability of Metadata and Information in unLike Environments (SIMILE) is a joint project, conducted by the MIT Libraries and MIT CSAIL, which seeks to enhance interoperability among digital assets, schemata/vocabularies/ontologies, metadata, and services.
F). NEXTBIO
A database consolidating high-throughput life sciences experimental data tagged and connected via biomedical ontologies. Nextbio is accessible via a search engine interface. Researchers can contribute their findings for incorporation to the database. The database currently supports gene or protein expression data and is steadily expanding to support other biological data types.
H). OPENPSI
OpenPSI (the OpenPSI project) is a community effort to create a UK government linked data service that supports research. It is a collaboration between the University of Southampton and the UK government, led by OPSI at the National Archives, and is supported by JISC funding.
I). ERFGOEDPLUS.BE
Erfgoedplus.be ('heritage-plus') is a Belgian project aimed at disclosing all types of heritage from the provinces of Limburg and Flemish Brabant and the city of Leuven to the public by applying semantic web technology. Erfgoedplus.be uses RDF/XML, OWL and SKOS to describe relationships to heritage types, concepts, objects, people, place and time. Data are normalized and enriched by means of thesauri (AAT) and an ontology (CIDOC CRM), available for input, conversion and navigation. Erfgoedplus.be is a regional aggregator for EuropeanaLocal (Europeana) and an example of how semantic web technology is applied within the heterogeneous context of heritage.
CONCLUSION
The Semantic Web is the future of the Internet. It is expected to rewrite the Internet as we know it and change the way we search for information online. Searches will become personalized, and the results will be more accurate and more relevant. The use of the Resource Description Framework and microformats will help bring this technology about. Although many challenges have to be overcome first, the prospects of this technology overtaking and replacing the traditional web model currently seem bright. The traditional model of the Internet does not allow for intelligent searches, and searching takes a lot of time because irrelevant results are displayed too. The Semantic Web can overcome these problems to provide a better and richer user experience to consumers all over the globe. The next generation of the web will better connect people and will further advance the information technology revolution.
BIBLIOGRAPHY
www.wikipedia.org
University of Leeds
W3C (the World Wide Web Consortium), the main international standards organisation for the World Wide Web
thesemanticway.files.wordpress.com
www.google.com
www.bing.com
w3c school.org