
Contents

ACKNOWLEDGEMENT
INTRODUCTION
PROBLEM STATEMENT
GOALS AND OBJECTIVES
METHODOLOGY
PROJECT SCHEDULE
REFERENCES

ACKNOWLEDGEMENT
We would like to thank the Head of the Department of Electronics and Computer Engineering, Prof. Dr. Shashidhar Ram Joshi, for providing us the opportunity to work on this project. We express our humble gratitude to Dr. Aman Shakya for helping us decide on our project title and for his valuable guidance; we could not have done it without him. We would also like to thank the other faculty members of DOECE who helped us, directly or indirectly, in compiling this project. Lastly, we thank our friends and classmates of 065/BCT for helping us in every way possible to get us here.

Sameer Dangol (065/BCT/537)
Subash Poudel (065/BCT/542)
Susan Joshi (065/BCT/546)
Rajan Maharjan (065/BCT/548)

INTRODUCTION
Efficient access to multimedia information requires the ability to search and organize that information. While the technology to search text has been available for some time, and in the form of web search engines is familiar to many people, the technology to search images and videos is much more challenging. In general, people would like to pose semantic queries using textual descriptions and find images relevant to those queries. For example, one should be able to pose a query like "find me all images of tigers in grass". This is difficult, if not impossible, with many image retrieval systems, and hence has not led to their widespread adoption. The traditional solution to this problem, used by libraries and other organizations, is to annotate such images manually and then search those annotations. Although this allows semantic image retrieval, manual annotations are expensive and do not always capture the content of images and videos well. Image annotation is the process of assigning metadata, in the form of captions or keywords, to a digital image.

According to the W3C, "The Semantic Web provides a common framework that allows data to be shared and reused across application, enterprise, and community boundaries." Multimedia and the Semantic Web are, in theory, a perfect match. The Semantic Web, on the one hand, provides a stack of languages and technologies for annotating Web resources, enabling machine processing of metadata describing the semantics of web content. Image applications, on the other hand, require metadata descriptions of their media items to facilitate search and retrieval, intelligent processing, and effective presentation of multimedia information. This need for multimedia metadata was recognized by the media industry long ago. Semantic Web technologies, however, still play a very minor role within multimedia applications such as images, and most approaches employ non-RDF-based techniques.
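To make the idea concrete, the following is a minimal sketch of how such a semantic query could be answered once images carry RDF annotations. It uses the Python rdflib library; the namespace and the property names (ex:depicts, ex:setting) are invented for illustration and are not part of any standard vocabulary.

```python
from rdflib import Graph, Namespace, URIRef
from rdflib.namespace import RDF

EX = Namespace("http://example.org/annotation#")

g = Graph()
g.bind("ex", EX)

img = URIRef("http://example.org/images/0042.jpg")
g.add((img, RDF.type, EX.Image))    # the resource is an image
g.add((img, EX.depicts, EX.Tiger))  # what the image shows
g.add((img, EX.setting, EX.Grass))  # the scene it is set in

# The semantic query "find me all images of tigers in grass":
q = """
SELECT ?img WHERE {
    ?img a ex:Image ;
         ex:depicts ex:Tiger ;
         ex:setting ex:Grass .
}
"""
for row in g.query(q, initNs={"ex": EX}):
    print(row.img)  # -> http://example.org/images/0042.jpg
```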

PROBLEM STATEMENT
Manual annotation of a large number of images, and their sorting and searching, can be very tedious, exhausting and expensive. A general rule is that it is much easier to annotate earlier rather than later: the single best practice in image annotation is that adding metadata during the production process is much cheaper and yields higher-quality annotations than adding metadata at a later stage (such as by automatic analysis of the digital artifact or by manual post-production annotation). Manual annotation can provide image descriptions at the right level of abstraction. On the other hand, annotation based on automatic feature extraction is relatively fast and cheap, and can be more systematic.

There are various classifications of metadata, and these may or may not be interoperable. Reusing metadata created by another tool is often hindered by a lack of interoperability. First, a tool may use a different syntax for its file formats; as a consequence, other tools are not able to read in the annotations it produces. Second, a particular tool may assign a different meaning (semantics) to the same annotation; in this situation, a tool may be able to read annotations from other tools, but will fail to process them in the way originally intended. Both problems can be solved by using Semantic Web technology. First, the Semantic Web provides developers with the means to explicitly specify the syntax and semantics of their metadata. Second, it allows tool developers to make explicit how their terminology relates to that of other tools.
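As a sketch of how the second (semantic) interoperability problem can be addressed, the fragment below declares that one tool's annotation property is a specialization of another tool's, and then queries across both with a SPARQL 1.1 property path. All namespaces and property names here are hypothetical, chosen only to illustrate the mechanism.

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDFS

TOOL_A = Namespace("http://example.org/toolA#")
TOOL_B = Namespace("http://example.org/toolB#")

g = Graph()
# Tool A calls its annotation property "keyword"; tool B calls it "tag".
g.add((URIRef("http://example.org/img/1.jpg"), TOOL_A.keyword, Literal("dog")))
g.add((URIRef("http://example.org/img/2.jpg"), TOOL_B.tag, Literal("dog")))

# Make the relationship between the two vocabularies explicit.
g.add((TOOL_A.keyword, RDFS.subPropertyOf, TOOL_B.tag))

# The property path follows the declared mapping, so a single query
# now reaches annotations written by either tool.
q = """
SELECT ?img ?value WHERE {
    ?img ?p ?value .
    ?p rdfs:subPropertyOf* <http://example.org/toolB#tag> .
}
"""
for img, value in g.query(q, initNs={"rdfs": RDFS}):
    print(img, value)
```

Once the mapping triple is in place, neither tool needs to be changed; the shared semantics live in the metadata itself.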

GOALS AND OBJECTIVES


The major objectives of our project are as follows:

- Investigate potential approaches to image annotation that integrate Semantic Web languages such as RDF, OWL and SPARQL.
- Apply semantic technologies in multimedia annotation.
- Implement content-based annotation of image data so that image retrieval is accurate and simple.
- Automate the annotation of images.
- Develop a standardized approach for retrieving, aggregating, using and presenting metadata.
- Experiment with different approaches to automatic annotation extraction based on machine learning algorithms and the available semantic descriptions.
- Study the syntactic and semantic interoperability problem.

METHODOLOGY
What is to be done: The information conveyed by a multimedia document can be formalized, represented, analyzed and processed at three different levels of abstraction: the sub-symbolic, the symbolic and the logical.

Figure 1: Abstraction levels of multimedia annotation

The sub-symbolic level of abstraction covers the raw multimedia information represented in well-known formats for video, image, audio, text, metadata, etc. Note that these are typically binary formats, optimized for compression and streaming delivery. They are not necessarily well suited for further processing that uses, for example, the internal structure or specific features of the media stream. To address this issue, one can introduce a level of abstraction, the middle layer in Figure 1, which provides this information. This is the approach of MPEG-7, which allows one to use the output of feature detectors, (multi-cue) segmentation algorithms, etc. to provide a structural layer on top of the binary media stream. Note that information at this level is typically serialized in XML. The problem with this XML-based, structural layer is that the semantics of the information encoded in the XML is specified only in the specification of, for example, the MPEG-7 standard, and needs to be hard-wired into the code by the programmer of the MPEG-7 application software. This also makes it hard to reuse the data in environments that are not based on MPEG-7, or to integrate non-MPEG metadata into an MPEG-7 application.

To address this, one could simply replace the middle layer by another one that is open and has formal, machine-processable semantics. This, however, would not take advantage of existing XML-based metadata and, more importantly, would ignore the advantages of an XML-based structural layer (more on that later). So, rather than replacing the middle layer, a solution is to add a third layer that provides the semantics for the middle layer.
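The fragment below sketches this lifting step: a heavily simplified, MPEG-7-like XML fragment (not the real MPEG-7 schema) is read from the structural layer and re-expressed as RDF triples, so that its meaning becomes explicit and machine-processable. The ex: vocabulary is invented for illustration.

```python
import xml.etree.ElementTree as ET

from rdflib import Graph, Literal, Namespace, URIRef

EX = Namespace("http://example.org/mm#")

# A toy stand-in for the XML structural layer.
xml_fragment = """
<Image uri="http://example.org/img/7.jpg">
    <Region id="r1" label="dog"/>
    <Region id="r2" label="grass"/>
</Image>
"""

g = Graph()
g.bind("ex", EX)

root = ET.fromstring(xml_fragment)
img = URIRef(root.get("uri"))
for region in root.findall("Region"):
    r = URIRef(str(img) + "#" + region.get("id"))
    g.add((img, EX.hasRegion, r))                       # structure, kept from XML
    g.add((r, EX.label, Literal(region.get("label"))))  # semantics, now explicit RDF

print(g.serialize(format="turtle"))
```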

How annotation can be achieved:

Annotation using keywords

Each image is annotated by having a list of keywords associated with it. There are two possibilities for choosing the keywords: (1) the annotator can use arbitrary keywords as required; (2) the annotator is restricted to a pre-defined list of keywords (a controlled vocabulary).

This information can be provided at two levels of specificity: (1) a list of keywords associated with the complete image, listing what is in the image (see Figure 2a for an example); (2) a segmentation of the image along with keywords associated with each region of the segmentation; in addition, keywords describing the whole image can be provided (see Figure 2b for an example). Often the segmentation is much simpler than that shown, consisting simply of a rectangular region drawn around the region of interest or a division of the image into foreground and background pixels. A small code sketch of both annotation styles is given after Figure 2.

Figure 2(a): outdoors, dog, grass, brick surface

Figure 2(b): outdoors

Fig. 2. Examples of image annotation: (a) whole-image annotation: the listed keywords are associated with the image; (b) segmentation and annotation: keywords are associated with each region of the segmentation. Keywords describing the whole image can also be used.
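The sketch below captures the two annotation styles as simple data structures, with an optional controlled-vocabulary check. The vocabulary, file names and bounding boxes are invented examples.

```python
# A made-up controlled vocabulary, echoing the keywords of Figure 2(a).
CONTROLLED_VOCABULARY = {"outdoors", "dog", "grass", "brick surface"}

def check(keywords, vocabulary=CONTROLLED_VOCABULARY):
    """Reject any keyword outside the controlled vocabulary."""
    unknown = set(keywords) - vocabulary
    if unknown:
        raise ValueError(f"not in controlled vocabulary: {unknown}")
    return list(keywords)

# (1) Whole-image annotation: one keyword list for the complete image.
whole_image = {
    "image": "img_0042.jpg",
    "keywords": check(["outdoors", "dog", "grass", "brick surface"]),
}

# (2) Region-based annotation: keywords per segmented region, here
# approximated by rectangular bounding boxes (x, y, width, height),
# plus keywords describing the whole image.
region_based = {
    "image": "img_0042.jpg",
    "keywords": check(["outdoors"]),
    "regions": [
        {"bbox": (40, 60, 120, 90), "keywords": check(["dog"])},
        {"bbox": (0, 150, 320, 90), "keywords": check(["grass"])},
    ],
}
```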

Annotations based on ontologies

An ontology is a specification of a conceptualization. It basically contains concepts (entities) together with their relationships and rules. Adding a hierarchical structure to a collection of keywords produces a taxonomy, which is an ontology since it encodes the "is a" relationship (a dog is an animal). An ontology can resolve the ambiguity of keywords: a leopard, for example, could be a large cat, a tank, a gecko or a Mac operating system. Ontologies are important for the Semantic Web, and hence a number of languages exist for their formalization, such as OWL and RDF. A small sketch of taxonomy-aware retrieval is given at the end of this section.

Free text annotation

For this type of annotation, the user can annotate using any combination of words or sentences. This makes annotating easy, but makes the annotations harder to use later for image retrieval. Often this option is offered in addition to keywords or an ontology, to make up for a known limitation: there is no way a domain ontology can be complete, as it will not include everything a user might want to say about a photograph. Any concepts which cannot adequately be described by choosing keywords are simply added as a free-form description. This is the approach used in the W3C RDFPic software, in which the content description keywords are limited to the following: Portrait, Groupportrait, Landscape, Baby, Architecture, Wedding, Macro, Graphic, Panorama and Animal; these are supplemented by a free text description. The IBM VideoAnnEx software also provides this option.

The ImageCLEF 2004 bilingual ad hoc retrieval task used 25 categories of images, each labelled by a semi-structured title (in 13 languages). Examples of the English versions of these titles are:

- Portrait pictures of church ministers by Thomas Rodger
- Photos of Rome taken in April 1908
- Views of St. Andrews cathedral by John Fairweather
- Men in military uniform, George Middlemass Cowie
- Fishing vessels in Northern Ireland

The IAPR-TC12 dataset of 20,000 images contains free text descriptions of each image in English, German and Spanish, divided into "title", "description" and "notes" fields. Additional content-independent metadata such as date, photographer and location are also stored.
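As promised above, here is a small sketch of taxonomy-aware retrieval: a tiny RDFS taxonomy encodes the "is a" relation, and a SPARQL property path lets a query for animals also find images annotated with the more specific concept Dog. Class and property names are illustrative only.

```python
from rdflib import Graph, Namespace, URIRef
from rdflib.namespace import RDFS

EX = Namespace("http://example.org/onto#")

g = Graph()
# A tiny taxonomy: the hierarchy encodes the "is a" relationship.
g.add((EX.Dog, RDFS.subClassOf, EX.Animal))      # a dog is an animal
g.add((EX.BigCat, RDFS.subClassOf, EX.Animal))
g.add((EX.Leopard, RDFS.subClassOf, EX.BigCat))  # the large-cat sense of "leopard"

img = URIRef("http://example.org/img/9.jpg")
g.add((img, EX.depicts, EX.Dog))

# Find every image depicting an Animal, at any level of the taxonomy.
q = """
SELECT ?img WHERE {
    ?img ex:depicts ?concept .
    ?concept rdfs:subClassOf* ex:Animal .
}
"""
for row in g.query(q, initNs={"ex": EX, "rdfs": RDFS}):
    print(row.img)  # -> http://example.org/img/9.jpg
```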

PROJECT SCHEDULE

REFERENCES
- http://www.w3.org/2005/Incubator/mmsem/XGR-image-annotation/#annot_intro
- Daniela Kolarova, Gennady Agre, Danail Dochev. An Annotea-Based Approach for Multimedia Data Integration and Semantic Annotation Services in the SINUS Platform, 2010.
- Jacco van Ossenbruggen, Giorgos Stamou and Jeff Z. Pan. Multimedia Annotations and the Semantic Web.

