
Tuning a CBIR system for vector images: the interface support

Tania Di Mascio Marco Francesconi Daniele Frigioni Laura Tarantino

Dipartimento di Ingegneria Elettrica, Università degli Studi dell'Aquila, I-67040 Monteluco di Roio, L'Aquila, Italy
{tania, frigioni, laura}@ing.univaq.it

ABSTRACT
This paper presents a system that supports tuning and evaluation of a Content-Based Image Retrieval (CBIR) engine for vector images through a graphical interface providing query-by-sketch and query-by-example, interaction with query results, and analysis of result quality. Vector images are first modelled as an inertial system and are then associated with descriptors representing visual features invariant to affine transformations. To support the requirements of different application domains, the engine offers a variety of moment sets as well as different metrics for similarity computation. The graphical interface offers tools that help in the selection of the criteria and parameters necessary to tune the system to a specific application domain.
as well as editable/searchable text. Common vector formats include AI (Adobe Illustrator), CDR (CorelDRAW), CGM (Computer Graphics Metafile), SWF (Shockwave Flash), and DXF (CAD software). Vector images are also convenient on the Web: their reduced size compared to raster images allows faster downloads, and client-side scaling avoids the need to send new images. The SVG format [8], currently being developed by the W3C, will soon become a standard for vector images on the Web.

Notwithstanding this increasing interest, the great majority of content-based image retrieval (CBIR) systems proposed in the literature deal with raster images (for a general discussion on CBIR systems see, e.g., [9]). On the other hand, there is a recognized need to search vector-image databases based not only on keywords or textual annotations, but also on visual features (such as shape, color, and texture).

Our work stems from the requirements of a 2D animation production environment [12] (a typical application relying on vector images), where efficient searching of animation material is crucial for helping the cartoonist in scene reuse. Images in this domain often possess few features and belong to well-defined logical categories (e.g., background, faces). A CBIR system supporting the work of cartoonists should hence comprise retrieval by primitive features (such as shape) and retrieval by logical features, aimed at extracting images of a given category. Other application domains utilizing vector images (e.g., clip-art, drawing, and CAD systems) share similar retrieval requirements.

In [3] a CBIR system has been proposed that includes a feature extraction module associating images with descriptors representing visual features. Feature representations have to be invariant to affine transformations applied to a selected point that is representative of the image. The image is considered as an inertial system and the center of mass is used as the selected point.
The inertial system is obtained by discretizing the image and associating material points with the basic elements obtained by the discretization process. Vectors of descriptors are built from moments of the inertial system, while the similarity between any two images is computed as the similarity between the corresponding descriptor vectors. The retrieval efficiency of moment invariants for retrieval by primitive features is documented in the literature for raster-based CBIR systems (see, e.g., [1, 6, 7, 14]); the

Categories and Subject Descriptors


H.5 [Information Interfaces and Presentation]: User Interfaces; H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval - query formulation, search process

Keywords
CBIR, vector images, visual interfaces

1. INTRODUCTION

Though unsuitable for photo-realistic imagery, vector graphics are becoming steadily more advanced and widespread. Vector images are made up of many individual, scalable objects defined by mathematical equations rather than pixels. This makes vector images fully scalable, resolution independent, and not restricted to a rectangular shape, allowing layering
Partially supported by the University of L'Aquila under the project "Representation and Interaction Techniques for Spatial Data".
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. AVI '04, May 25-28, 2004, Gallipoli (LE), Italy. Copyright 2004 ACM 1-58113-867-9/04/0500 ...$5.00


experimental results of [3] for vector images, initially based on Hu's proposal [6], indeed met the expectations. To support the requirements of application domains other than 2D animation, which differ in the discriminating power they need, we extended the engine by including a variety of moment sets, as well as a repertoire of different metrics for similarity computation. The system discussed in this paper supports tuning and evaluation of the engine through a graphical interface providing query-by-sketch and query-by-example, interaction with query results, and analysis of result quality. The graphical interface offers tools to help designers in the selection of appropriate criteria and parameters. The engine can be tuned to the discrimination requirements of a selected application domain by choosing a moment set and a metric (along with a vector of weights) among those included in the system. The choice of the moment set and the metric is based on result-quality indicators that establish the effectiveness of a search.

and representative of the image. Our approach is to consider the image as an inertial system and to use the center of mass as the selected point. The inertial system is obtained by discretizing the vector image and associating material points with the basic elements obtained by the discretization process. The origin of the inertial system is then moved to the center of mass, to which transformations can be applied (see Figure 2). The shape descriptor vector contains the values of an inertial moment set. The moments in the set are computed from terms representing the distribution, skew, and kurtosis of the inertial system, which provide useful information about the image. In the literature, different moment sets have been proposed, differing in the way the moments are computed (see, e.g., [10, 13]).
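As a rough illustration of the idea, the sketch below treats a discretized image as a set of unit-mass points, moves the origin to the center of mass, and builds a small descriptor from central moments. The combinations phi1, phi2, and phi3 follow the well-known form of Hu's first three invariants [6]; the scale normalization (dividing by powers of phi1) and the function names are assumptions made for this example and are not taken from the engine described here.

```python
import math

def mass_normalized_central_moment(points, p, q):
    """Central moment of order (p, q) of unit-mass points, divided by total mass.

    Taking moments about the center of mass gives translation invariance.
    """
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    return sum((x - cx) ** p * (y - cy) ** q for x, y in points) / n

def shape_descriptor(points):
    """Two-component descriptor invariant to translation, rotation and scaling.

    Assumes the points are not all coincident (otherwise phi1 is zero).
    """
    def m(p, q):
        return mass_normalized_central_moment(points, p, q)

    # Rotation-invariant combinations in the form of Hu's first invariants.
    phi1 = m(2, 0) + m(0, 2)
    phi2 = (m(2, 0) - m(0, 2)) ** 2 + 4 * m(1, 1) ** 2
    phi3 = (m(3, 0) - 3 * m(1, 2)) ** 2 + (3 * m(2, 1) - m(0, 3)) ** 2
    # Illustrative scale normalization: phi2 grows as s^4 and phi3 as s^6,
    # while phi1 grows as s^2, so these ratios do not depend on the scale s.
    return [phi2 / phi1 ** 2, phi3 / phi1 ** 3]

def transform(points, angle, scale, dx, dy):
    """Rotate by angle (radians), scale uniformly, then translate by (dx, dy)."""
    c, s = math.cos(angle), math.sin(angle)
    return [(scale * (c * x - s * y) + dx, scale * (s * x + c * y) + dy)
            for x, y in points]
```

Calling `shape_descriptor` on a point set and on the result of `transform` applied to it should give the same values up to floating-point error, which is the invariance property the descriptors are required to have.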

2. THE CBIR ENGINE

In this section we briefly summarize the main features of a CBIR engine that, differently from most of the CBIR systems proposed in the literature [9], focuses on vector images rather than on raster images (a preliminary version of the engine was presented in [3]). The purpose of image processing in image retrieval is to enhance the aspects of the image data that are relevant to the query, and to reduce the remaining aspects. Figure 1 depicts a coarse dataflow scheme typical of similarity search in CBIR processes (we refer to [11] for a detailed discussion of the scheme). Given a query image, database images are ranked based on their similarity with the input image, so that more relevant images are returned first in the query result. The processing hence requires a feature extraction module to associate images with vectors of descriptors representing visual features (e.g., color, texture, and shape), and ranking criteria to evaluate distances between descriptor vectors. The similarity between any two images is computed as the similarity between the two corresponding descriptor vectors.
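The ranking step described above can be sketched in a few lines. This is a minimal illustration, assuming descriptor vectors have already been extracted and stored per image; the function name `rank_by_similarity`, the dictionary layout, and the default distance (`math.dist`, the unweighted Euclidean distance from the Python standard library) are choices made for the example only.

```python
import math

def rank_by_similarity(query_descriptor, database, distance=math.dist):
    """Return (image_id, distance) pairs, most similar image first.

    database maps an image identifier to its descriptor vector; a smaller
    distance between descriptor vectors is read as a higher similarity.
    """
    scored = [(distance(query_descriptor, descriptor), image_id)
              for image_id, descriptor in database.items()]
    scored.sort()  # ascending distance, ties broken by image_id
    return [(image_id, dist) for dist, image_id in scored]
```

Passing a different callable as `distance` is enough to switch the metric, which mirrors the way the engine separates the choice of descriptors from the choice of ranking criterion.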
Figure 2: Creation of the inertial system

Concerning the similarity computation, different metrics have been proposed in the literature [5]. All metrics need a vector of coefficients to adequately weigh the individual values of the descriptor vectors. Our CBIR engine can be tuned to the discrimination requirements of a selected application domain by choosing a moment set and a metric (along with a vector of weights) among those included in the system. At the present implementation stage, our engine includes:

Moments: Hu moments [6], Bamieh moments [14], Zernike moments [2];
Metrics: Euclidean distance, City Block distance, and Chebyshev distance [14]; Cross Correlation distance and Discrimination Cost distance [2].

A certain degree of uncertainty is typical of similarity search and, in particular, two types of errors may occur:
1. Images that do not exactly satisfy the user's query are returned in the query result;
2. Images that do satisfy the user's query are not returned in the query result.
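To make the role of the weight vector concrete, here is a small sketch of three of the listed metrics in a weighted form. How the engine actually combines weights with each metric is not specified in the text, so the per-component weighting used below is an assumption, as are the function names.

```python
import math

def weighted_euclidean(a, b, w):
    """Square root of the weighted sum of squared component differences."""
    return math.sqrt(sum(wi * (x - y) ** 2 for x, y, wi in zip(a, b, w)))

def weighted_city_block(a, b, w):
    """Weighted sum of absolute component differences."""
    return sum(wi * abs(x - y) for x, y, wi in zip(a, b, w))

def weighted_chebyshev(a, b, w):
    """Largest weighted absolute component difference."""
    return max(wi * abs(x - y) for x, y, wi in zip(a, b, w))
```

Setting a weight to zero removes the corresponding moment from the comparison, while a larger weight makes differences in that moment count more; this is one simple way a designer could emphasize the moments that discriminate best in a given domain.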

Figure 1: Architecture of the system

Without loss of generality, our engine deals with shape extraction [4], since shape adequately identifies and classifies elements in our application domain (the treatment of other visual features is a direct generalization of the shape case). The shape representation is required to be invariant to translation, rotation, and scaling. These affine transformations are to be regarded as applied to a selected point of the image

Engine tuning hence requires result-quality indicators to establish the effectiveness of a search, i.e., to evaluate, on the one hand, how well the engine satisfies the domain requirements and, on the other hand, the extent of the retrieval errors. Let us denote by No_Rel_Im_Ret the number of returned relevant images, by No_Rel_Im the total number of relevant images in the collection, and by No_Im_Ret the total number of images returned. The following two quantities are often used to measure the effectiveness of a search [5]:

Recall = No_Rel_Im_Ret / No_Rel_Im

Precision = No_Rel_Im_Ret / No_Im_Ret

Both values range between 0 and 1. Ideally, both Recall and Precision would be as close to 1 as possible. In practice this may be difficult to achieve, and the tuning process requires a trade-off between the two indicators. If either indicator is low, the effectiveness of the search is low as well.
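The two indicators are straightforward to compute once the set of relevant images is known. The following sketch assumes images are identified by hashable ids and that relevance judgments are available as a set; returning 0.0 when a denominator is empty is a convention chosen for this example.

```python
def recall_precision(returned, relevant):
    """Compute (recall, precision) for one query.

    returned: ids of the images in the query result
    relevant: ids of all relevant images in the collection
    """
    returned, relevant = set(returned), set(relevant)
    hits = len(returned & relevant)  # number of returned relevant images
    recall = hits / len(relevant) if relevant else 0.0
    precision = hits / len(returned) if returned else 0.0
    return recall, precision
```

For instance, if four images are returned, two of which belong to a set of three relevant images, recall is 2/3 and precision is 1/2. Returning more images can only keep recall the same or raise it, but it tends to lower precision, which is the trade-off the tuning process has to manage.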

former, always displayed on the screen, is used to handle the user's input actions and to visualize results and initial system feedback. The latter are invoked to visualize additional system feedback to support an in-depth tuning analysis. The main window of the interface (Figure 4) is partitioned into three display panels, spatially organized into three rows:
1. The upper panel accepts user input actions, and is in turn divided into three areas: the image processing request panel, the parameter setting panel, and a tool palette. The image processing request panel requires the user to first provide an image, either by sketching it or by specifying a file containing it, and then to activate the desired function: users may want to use the input image as an example image for a search, or may want to analyze it (e.g., to classify it with the support of the system). In either case processing parameters are to be set: a query requires specifying which set of moments is to be used as descriptor vector and which metrics are to be applied in the similarity computation; the analysis of an individual image requires specifying the set of moments only.
2. The middle panel displays a scrollable list of the images retrieved by a query, ranked by similarity as discussed in Section 2, along with progress bars giving feedback on the system processing. The panel is also used by users to provide relevance feedback. Images in the list may be selected as the target image of a new search, in an incremental querying process.
3. The lower panel provides an initial indication of the search effectiveness by displaying, in both tabular and chart form, figures about the recall and precision obtained with the selected metrics. The tools in the palette on the right side of the panel are used to invoke interactive log tables and charts (displayed by the chart viewers) visualizing more detailed and less aggregated data, which support an in-depth analysis of the image processing.
As discussed in Section 2, a trade-off between recall and precision may be necessary. It is therefore useful for the designer to inspect detailed log data that account for the role of moments, metrics, and weights in the similarity computation. Several charts can be visualized and manipulated, e.g., to make comparisons among moment sets, or to appreciate the discriminating power of the individual moments in a given set. Information of this type is useful to determine adequate weight vectors to efficiently answer queries at the logical level.

3. THE APPLICATION DOMAIN

The CBIR engine is being validated in the framework of the production of 2D animation, in particular within the Paperless system, an advanced high-quality 2D animation environment [12] supporting cartoon episode management. It is very common for cartoonists to reuse animation scenes and frames from previous episodes in new episodes. Possibilities for scene reuse usually stem from the memory of the animators, with little or no computational aid. Efficient archival and searching of animation material is hence appropriate. The creation of cartoons is based on two fundamental aspects: episode realization and animation. Generally, to create an episode it is necessary to animate a large number of scenes, each in turn composed of a number of frames. The traditional realization of a scene utilizes a dedicated device, called a rostrum (see Figure 3). A single frame is shot by a camera perpendicular to a series of parallel transparent trays, each carrying a slide with one or more objects of the frame (e.g., background, characters, etc.). A single scene is therefore composed of one fixed background and one or more animated characters. Hence images in this application domain often possess few characteristics, and often belong to well-defined categories (background, characters, faces, etc.). Cartoonists may want to retrieve an image by providing a sketch of it, to retrieve images similar to an example one, or to retrieve images of given categories.

Figure 3: Rostrum

5. CONCLUSIONS

We presented an overview of a system that supports tuning and evaluation of content-based image retrieval (CBIR) for vector images through a graphical interface providing query-by-sketch and query-by-example, interaction with query results, and analysis of result quality. The graphical interface offers tools to help designers in the selection of the appropriate parameters necessary to tune the system to the discriminating power required by each application domain. The architecture of the interface includes a

4. THE PROPOSED INTERFACE

The proposed interface was designed to help CBIR system designers to tune the engine in an interactive way, based on system feedback. Broadly speaking, the architecture of the interface includes a main window and chart viewers. The


Figure 4: The main window of the interface

main window and chart viewers. The former, always displayed on the screen, is used to handle the user's input actions and to visualize results and initial system feedback. The latter are invoked to visualize additional system feedback to support an in-depth tuning analysis (e.g., to single out the role of individual moments and weight them appropriately). The system is being validated in the framework of the production of 2D animation, with a focus on cartoon episode management. The retrieval system supports retrieval by primitive as well as by logical features. The search engine was initially part of the Paperless system, which utilizes a proprietary format for the images. Since we aim at using the system with different application domains, a new version of the engine is being implemented to make it SVG-compliant.

6. REFERENCES

[1] M.M. Babu, M. Kankanhalli, and W.F. Lee. Shape measures for content based image retrieval: a comparison. Information Processing and Management, 33(3):319-337, 1997.
[2] Y.C. Chim, A.A. Kassim, and Y. Ibrahim. Character recognition using statistical moments. Image and Vision Computing, 17:299-307, 1997.
[3] T. Di Mascio and L. Tarantino. Main features of a CBIR prototype supporting cartoon production. In 10th International Human Computer Interaction (HCI2003), volume 1, pages 921-925, 2003.
[4] G. Gagaudakis and P. Rosin. Shape measures for image retrieval. In IEEE International Conference on Image Processing, pages 757-760. IEEE, 2001.
[5] H. R. Hartson, T. S. Andre, and R. C. Williges. Criteria for evaluating usability evaluation methods. International Journal of Human Computer Interaction, 15(1):145-181, 2003.
[6] M.K. Hu. Visual pattern recognition by moment invariants. IRE Transactions on Information Theory, 8:179-187, 1962.
[7] D. Kapur, Y.N. Lakshman, and T. Saxena. Computing invariants using elimination methods. In IEEE International Conference on Computer Vision. IEEE, November 1995.
[8] Scalable Vector Graphics (SVG), W3 Consortium, http://www.w3.org/Graphics/SVG/
[9] A.W.M. Smeulders, M. Worring, S. Santini, A. Gupta, and R. Jain. Content-based image retrieval at the end of the early years. IEEE Trans. on Pattern Analysis and Machine Intelligence, 22(12):1349-1380, 2000.
[10] O.D. Trier, A.K. Jain, and T. Taxt. Feature extraction methods for character recognition: a survey. Pattern Recognition, 29(4):641-662, 1996.
[11] S. E. Umbaugh. Computer vision and image processing. Prentice Hall International, 1998.
[12] V. Vennarini and G. Todesco. Tools for paperless animation. Tech. report, IST Project fact sheet, 2001. http://inf2.pira.co.uk/mmctprojects/paperless.htm.
[13] L. Yang and F. Albregtsen. Fast and exact computation of cartesian geometric moments using discrete Green's theorem. Pattern Recognition, 29(7):1061-1073, 1996.
[14] L. Yang and F. Albregtsen. Fast computation of invariant geometric moments: a new method giving correct results. In IEEE International Conference on Pattern Recognition, pages 201-204, 1994.

