IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 19, NO. 5, MAY 2008
Abstract—The adaptive-subspace self-organizing map (ASSOM) is useful for invariant feature generation and visualization. However, the learning procedure of the ASSOM is slow. In this paper, two fast implementations of the ASSOM are proposed to boost ASSOM learning, based on insightful discussions of the basis rotation operator of the ASSOM. We investigate the objective function approximately maximized by the classical rotation operator. We then explore a sequence of two schemes to apply the proposed ASSOM implementations to saliency-based invariant feature construction for image classification. In the first scheme, a cumulative activity map computed from a single ASSOM is used as the descriptor of the input image. In the second scheme, we use one ASSOM for each image category, and a joint cumulative activity map is calculated as the descriptor. Both schemes are evaluated on a subset of the Corel photo database with ten classes. The multi-ASSOM scheme is favored. It is also applied to adult image filtering and shows promising results.

Index Terms—Adaptive-subspace self-organizing map (ASSOM), adult image filtering, feature construction, image classification.

I. INTRODUCTION

… of temporally consecutive speech signals or of image patches subject to transformations. By learning the episode as a whole, the ASSOM is able to capture the transformation coded therein. The simulation results in [1] and [2] have demonstrated that the ASSOM can induce ordered filter banks to account for translation, rotation, and scaling. The relationship between the neurons in the ASSOM and their biological counterparts is reported in [2]. The ASSOM has been successfully applied to speech processing [6], texture segmentation [7], image retrieval [8], and image classification [8], [9]. A supervised ASSOM was proposed by Ruiz-del-Solar in [7].

The traditional learning procedure of the ASSOM involves computations related to a rotation operator matrix, which not only is memory demanding, but also has a computational load quadratic in the input dimension, i.e., the dimension of the input vectors. Therefore, this algorithm in its original form is costly for practical applications, especially for image processing, where input patterns are often high dimensional. In order to reduce the learning time, the adaptive subspace map (ASM) proposed by De Ridder et al. [8] drops topological ordering and performs a batch-mode updating of the subspaces with PCA.
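To make the cost issue concrete, the sketch below contrasts an explicitly materialized rotation operator with an algebraically equivalent factored update that never forms the N-by-N matrix. This is an illustration only: the normalizing scalar is simplified to the squared norm of the input, not the paper's exact operator.

```python
import numpy as np

def rotate_explicit(B, x, lam):
    """Matrix form: build the full N x N rotation operator, then apply it.
    Memory and time are quadratic in the input dimension N."""
    N = x.shape[0]
    P = np.eye(N) + lam * np.outer(x, x) / (x @ x)   # O(N^2) storage
    return P @ B                                      # O(N^2 H) time

def rotate_factored(B, x, lam):
    """Factored form: P @ b = b + lam * (x.b) / (x.x) * x for each basis
    vector b, without ever forming P. Only O(N H) time and O(N) extra memory."""
    coeff = lam * (x @ B) / (x @ x)   # projections of each basis vector on x
    return B + np.outer(x, coeff)

rng = np.random.default_rng(0)
N, H = 200, 2
B = rng.standard_normal((N, H))   # basis vectors of one module, as columns
x = rng.standard_normal(N)        # one input vector
assert np.allclose(rotate_explicit(B, x, 0.1), rotate_factored(B, x, 0.1))
```

The factored form gives identical results while avoiding the quadratic cost; savings of this kind are what fast ASSOM implementations build on.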
TABLE I
TIMING RESULTS OF THE BASIC ASSOM, THE FL-ASSOM, AND THE BFL-ASSOM WITH RESPECT TO THE INPUT DIMENSION N AND THE SUBSPACE DIMENSION H. LEFT SUBTABLE: THE BASIS VECTOR UPDATING TIME (VU), REPORTED AS THE MEAN VALUE AND THE CORRESPONDING SAMPLE STANDARD DEVIATION OVER 20 RUNS. RIGHT SUBTABLE: THE WHOLE LEARNING TIME (WL), WHICH INCLUDES THE TIME FOR MODULE COMPETITION, BASIS VECTOR UPDATING, BASIS VECTOR DISSIPATION, AND ORTHONORMALIZATION. ALL TIMES ARE GIVEN IN SECONDS
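Timing results of this kind can be gathered with a small harness. The sketch below uses a hypothetical stand-in for one basis vector updating step (not the authors' benchmark code) and reports the mean and sample standard deviation of the updating time over 20 runs, as in Table I:

```python
import statistics
import time
import numpy as np

def basis_update(B, x, lam=0.1):
    """Hypothetical stand-in for one basis vector updating step."""
    return B + np.outer(x, lam * (x @ B) / (x @ x))

def time_update(N, H, runs=20):
    """Return (mean, sample standard deviation) of the update time in seconds."""
    rng = np.random.default_rng(0)
    samples = []
    for _ in range(runs):
        B = rng.standard_normal((N, H))   # basis vectors as columns
        x = rng.standard_normal(N)        # one input vector
        t0 = time.perf_counter()
        basis_update(B, x)
        samples.append(time.perf_counter() - t0)
    return statistics.mean(samples), statistics.stdev(samples)

for N in (100, 200, 400):
    mean, sd = time_update(N, H=2)
    print(f"N={N:4d}: {mean:.2e} s (sd {sd:.1e})")
```

Wall-clock timing with `time.perf_counter` is noisy for very cheap updates, which is why reporting the sample standard deviation alongside the mean, as the table does, is good practice.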
Fig. 4. Basis vector updating time with respect to (a) the input dimension at the subspace dimension H = 2 and (b) the subspace dimension at the input dimension N = 200. For the sake of clarity, the updating times of the FL-ASSOM and of the BFL-ASSOM are magnified by a factor of 50.

initializations are recorded. As was anticipated, the basis vector updating time of the basic ASSOM increases rapidly with the input dimension and is the bottleneck of the learning procedure. With the FL-ASSOM, the basis vector updating time is dramatically reduced, increasing slowly with the input dimension and with the subspace dimension. It is no longer a bottleneck of the learning procedure. However, the learning time outside basis vector updating is not reduced. With the BFL-ASSOM, the basis vector updating time is further reduced. Moreover, the learning time outside basis vector updating is also reduced considerably compared to the other two networks. Thus, the whole learning time decreases from 1956 s with the basic ASSOM to 16.2 s with the BFL-ASSOM when N = … and H = … .

The relationship between the basis vector updating time and the input dimension or the subspace dimension for the three implementations of the ASSOM is visualized in Fig. 4. The basis vector updating time increases approximately linearly with respect to the input dimension for the FL-ASSOM and for the BFL-ASSOM, but apparently nonlinearly for the basic ASSOM. In all cases, the updating time increases approximately linearly with respect to the subspace dimension.

III. SALIENCY-BASED INVARIANT FEATURE CONSTRUCTION FOR IMAGE CLASSIFICATION

In this section, we explore a sequence of two schemes in which the ASSOM is applied to saliency-based invariant feature construction for image classification. The implemented ASSOM may be the FL-ASSOM or the BFL-ASSOM. We will compare their performance in Section IV.

A. Salient-Point Single-ASSOM Scheme (SPSAS)

The first scheme to be explored is based on a single ASSOM, as shown in Fig. 5. Salient points are first detected from the input image by using the wavelet-based detector proposed in [28]. Working with wavelets is justified by the consideration of the human visual system (HVS), for which multiresolution, orientation, and frequency analysis is of prime importance. In order to extract the salient points, a wavelet transform is first performed on the grayscale image. The obtained wavelet coefficients are represented by a zerotree structure [34]. This tree is first scanned from the leaves to the root to compute the saliency value at each node. Afterwards, the tree is scanned a second time from the root to the leaves in order to determine the salient path from the root to the raw salient points on the original image. By working with grayscale images, points located on the boundaries of highlights or shadows are apt to be detected as salient even though they are only caused by illumination conditions. To remove such false salient points, a gradient image is calculated by using the color
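The two-pass scan of the zerotree can be sketched as follows. This is a toy illustration with a deliberately simplified saliency rule; the `Node`, `compute_saliency`, and `salient_path` names are hypothetical, and the actual detector of [28] is more elaborate.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """A node of the wavelet zerotree: one coefficient plus its children."""
    coeff: float
    children: list = field(default_factory=list)
    saliency: float = 0.0

def compute_saliency(node):
    """First pass, leaves to root: a node's saliency is its own coefficient
    magnitude plus the best saliency found below it (simplified rule)."""
    best_child = max((compute_saliency(c) for c in node.children), default=0.0)
    node.saliency = abs(node.coeff) + best_child
    return node.saliency

def salient_path(node):
    """Second pass, root to leaves: follow the most salient child at each
    level down to a leaf, which maps back to a raw salient point."""
    path = [node]
    while node.children:
        node = max(node.children, key=lambda c: c.saliency)
        path.append(node)
    return path

# Tiny example tree: a root with two subtrees of depth two.
root = Node(0.5, [Node(2.0, [Node(3.0)]), Node(4.0, [Node(0.5)])])
compute_saliency(root)
print([n.coeff for n in salient_path(root)])  # -> [0.5, 2.0, 3.0]
```

Note that the greedy descent picks the left subtree here because its accumulated saliency (5.0) beats the right one (4.5), even though the right child's own coefficient is larger; the leaves-to-root pass is what makes this possible.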
Fig. 8. Feature filters for the ten categories of the SIMPLIcity training set. b1: first basis vectors; b2: second basis vectors.
Fig. 10. Examples of the classification results. Horizontal lines separate the different rows. The correct labels are given in the top row. The second row presents examples of correctly classified images. The bottom row shows incorrectly classified images with the assigned (wrong) labels.
TABLE V
CONFUSION MATRIX OF THE SPMAS. THE BOLDFACED FIGURES
CORRESPOND TO CORRECTLY CLASSIFIED TEST IMAGES
Fig. 11. BFL-ASSOMs generated for adult images and benign images.
[7] J. Ruiz-del-Solar, "TEXSOM: Texture segmentation using self-organizing maps," Neurocomputing, vol. 21, no. 1-3, pp. 7-18, 1998.
[8] D. de Ridder, O. Lemmers, R. P. W. Duin, and J. Kittler, "The adaptive subspace map for image description and image database retrieval," in Lecture Notes in Computer Science. Berlin, Germany: Springer-Verlag, 2000, vol. 1876.
[9] B. Zhang, M. Fu, H. Yan, and M. Jabri, "Handwritten digit recognition by adaptive-subspace self-organizing map (ASSOM)," IEEE Trans. Neural Netw., vol. 10, no. 4, pp. 939-945, Jul. 1999.
[10] E. López-Rubio, J. Muñoz-Pérez, and J. A. Gómez-Ruiz, "A principal components analysis self-organizing map," Neural Netw., vol. 17, no. 2, pp. 261-270, 2004.
[11] E. López-Rubio, J. Muñoz-Pérez, J. A. Gómez-Ruiz, and E. Domínguez-Merino, "New learning rules for the ASSOM network," Neural Comput. Appl., vol. 12, no. 2, pp. 109-118, 2003.
[12] S. McGlinchey and C. Fyfe, "Fast formation of invariant feature maps," in Proc. Eur. Signal Process. Conf., Island of Rhodes, Greece, 1998, pp. 1329-1332.
[13] E. Oja, "Neural networks, principal components, and subspaces," Int. J. Neural Syst., vol. 1, no. 1, pp. 61-68, 1989.
[14] I. Gordon, Theories of Visual Perception, 2nd ed. New York: Wiley, 1997.
[15] D. Marr, Vision. San Francisco, CA: W. H. Freeman, 1982.
[16] C. Schmid and R. Mohr, "Local gray value invariants for image retrieval," IEEE Trans. Pattern Anal. Mach. Intell., vol. 19, no. 5, pp. 530-535, May 1997.
[17] D. G. Lowe, "Object recognition from local scale-invariant features," in Proc. Int. Conf. Comput. Vis., Corfu, Greece, 1999, pp. 1150-1157.
[18] C. Schmid, R. Mohr, and C. Bauckhage, "Evaluation of interest point detectors," Int. J. Comput. Vis., vol. 37, no. 2, pp. 151-172, 2000.
[19] A. W. M. Smeulders, M. Worring, S. Santini, A. Gupta, and R. Jain, "Content-based image retrieval at the end of the early years," IEEE Trans. Pattern Anal. Mach. Intell., vol. 22, no. 12, pp. 1349-1380, Dec. 2000.
[20] N. Sebe and M. S. Lew, "Salient points for content-based retrieval," in Proc. British Mach. Vis. Conf., Manchester, U.K., 2001, pp. 401-410.
[21] K. Mikolajczyk and C. Schmid, "A performance evaluation of local descriptors," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Madison, WI, 2003, pp. 257-263.
[22] C. Carson, S. Belongie, H. Greenspan, and J. Malik, "Blobworld: Image segmentation using expectation-maximization and its application to image querying," IEEE Trans. Pattern Anal. Mach. Intell., vol. 24, no. 8, pp. 1026-1038, Aug. 2002.
[23] J. Z. Wang, J. Li, and G. Wiederhold, "SIMPLIcity: Semantics-sensitive integrated matching for picture libraries," IEEE Trans. Pattern Anal. Mach. Intell., vol. 23, no. 9, pp. 947-963, Sep. 2001.
[24] L. Itti, C. Koch, and E. Niebur, "A model of saliency-based visual attention for rapid scene analysis," IEEE Trans. Pattern Anal. Mach. Intell., vol. 20, no. 11, pp. 1254-1259, Nov. 1998.
[25] T. Gevers and A. W. M. Smeulders, "PicToSeek: Combining color and shape invariant features for image retrieval," IEEE Trans. Image Process., vol. 9, no. 1, pp. 102-119, Jan. 2000.
[26] C. Harris and M. Stephens, "A combined corner and edge detector," in Proc. 4th Alvey Vis. Conf., Manchester, U.K., 1988, pp. 147-151.
[27] K. Mikolajczyk and C. Schmid, "An affine invariant interest point detector," in Proc. 7th Eur. Conf. Comput. Vis., Copenhagen, Denmark, 2002, pp. 128-142.
[28] C. Laurent, N. Laurent, M. Maurizot, and T. Dorval, "In-depth analysis and evaluation of saliency-based color image indexing methods using wavelet salient features," Multimedia Tools Appl., vol. 31, no. 1, pp. 73-94, 2006.
[29] J.-M. Geusebroek, R. van den Boomgaard, A. W. M. Smeulders, and H. Geerts, "Color invariance," IEEE Trans. Pattern Anal. Mach. Intell., vol. 23, no. 12, pp. 1338-1350, Dec. 2001.
[30] D. G. Lowe, "Distinctive image features from scale-invariant keypoints," Int. J. Comput. Vis., vol. 60, no. 2, pp. 91-110, 2004.
[31] G. Csurka, C. Bray, C. Dance, and L. Fan, "Visual categorization with bags of keypoints," in Proc. 8th Eur. Conf. Comput. Vis., Prague, Czech Republic, May 2004, pp. 327-334.
[32] H. Robbins and S. Monro, "A stochastic approximation method," Ann. Math. Statist., vol. 22, no. 3, pp. 400-407, 1951.
[33] E. Oja, Subspace Methods of Pattern Recognition. Letchworth, U.K.: Research Studies Press, 1983.
[34] J. Shapiro, "Embedded image coding using zerotrees of wavelet coefficients," IEEE Trans. Signal Process., vol. 41, no. 12, pp. 3445-3462, Dec. 1993.
[35] J. Lu, X. Yuan, and T. Yahagi, "A method of face recognition based on fuzzy c-means clustering and associated sub-NNs," IEEE Trans. Neural Netw., vol. 18, no. 1, pp. 150-160, Jan. 2007.
[36] I. H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques, 2nd ed. San Mateo, CA: Morgan Kaufmann, 2005.
[37] X. Tan, S. Chen, Z.-H. Zhou, and F. Zhang, "Recognizing partially occluded, expression variant faces from single training image per person with SOM and soft kNN ensemble," IEEE Trans. Neural Netw., vol. 16, no. 4, pp. 875-886, Jul. 2005.
[38] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, 2nd ed. New York: Wiley, 2001.
[39] T. Deselaers, D. Keysers, and H. Ney, "Features for image retrieval: A quantitative comparison," in Lecture Notes in Computer Science, ser. 3175. Berlin, Germany: Springer-Verlag, 2004, pp. 228-236.
[40] M. J. Jones and J. M. Rehg, "Statistical color models with application to skin detection," Int. J. Comput. Vis., vol. 46, no. 1, pp. 81-96, 2002.
[41] H. Zheng, M. Daoudi, and B. Jedynak, "Blocking adult images based on statistical skin detection," Electron. Lett. Comput. Vis. Image Anal., vol. 4, no. 2, pp. 1-14, 2004.
[42] D. Liu, X. Xiong, B. DasGupta, and H. Zhang, "Motif discoveries in unaligned molecular sequences using self-organizing neural networks," IEEE Trans. Neural Netw., vol. 17, no. 4, pp. 919-928, Jul. 2007.

Huicheng Zheng (M'07) was born in 1974. He received the B.Sc. degree in electronics and information systems and the M.Sc. (Eng.) degree in communications and information systems from Sun Yat-sen University, Guangzhou, China, in 1996 and 1999, respectively, and the Ph.D. degree in computer science from the University of Lille 1, France, in 2004.
From 2005 to 2006, he was a Research Fellow with the European Research Consortium for Informatics and Mathematics, during which period the host institutions where he stayed were France Telecom R&D, Rennes, France, and Trinity College Dublin, Dublin, Ireland. He served as a Visiting Scholar at the Center for Imaging Science, The Johns Hopkins University, Baltimore, MD, in 2004. Since 2007, he has been a Lecturer at the Department of Electronics and Communication Engineering, Sun Yat-sen University. His current research interests include computer vision, neural networks, machine learning, and other related areas.

Grégoire Lefebvre was born in 1977. He received the M.E. degree in applied mathematics from the National Institute of Applied Sciences (INSA), Rouen, France, in 2000 and the Ph.D. degree in cognitive science from the University of Bordeaux, Bordeaux, France, in 2007.
Since 2007, he has been a Researcher at France Telecom R&D—Orange Labs, Rennes, France. His research interests include multimedia indexing, face detection and recognition, and neural networks.

Christophe Laurent was born in 1971. He received the Ph.D. degree from the University of Bordeaux, Bordeaux, France, in 1998, where he worked on parallel processing in the field of computer vision.
From 1998 to 2001, he was a Research Engineer at Thomson Multimedia R&D, where he worked on security of information technology. During that period, he led several projects related to e-commerce, network security, and protection of multimedia contents. In 2001, he joined France Telecom R&D, where his research interests were image indexing and particularly salient point-based image representation, string-based representation, image classification, and face processing. Since 2005, he has been a Technical Architect at the billing IT division of France Telecom, where he is currently in charge of the evolution of the settlement technical architecture to embed new technologies. He has published more than 30 papers and is the author of 20 patents.