
IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 19, NO. 11, NOVEMBER 2010

Multiframe Super-Resolution Reconstruction of Small Moving Objects


Adam W. M. van Eekeren, Member, IEEE, Klamer Schutte, and Lucas J. van Vliet, Member, IEEE
Abstract—Multiframe super-resolution (SR) reconstruction of small moving objects against a cluttered background is difficult for two reasons: a small object consists completely of mixed boundary pixels, and the background contribution changes from frame to frame. We present a solution to this problem that greatly improves recognition of small moving objects under the assumption of a simple linear motion model in the real world. The presented method not only explicitly models the image acquisition system, but also the space-time variant fore- and background contributions to the mixed pixels. The latter is due to a changing local background as a result of the apparent motion. The method simultaneously estimates a subpixel precise polygon boundary as well as a high-resolution (HR) intensity description of a small moving object subject to a modified total variation constraint. Experiments on simulated and real-world data show excellent performance of the proposed multiframe SR reconstruction method.

Index Terms—Boundary description, moving object, partial area effect, super-resolution (SR) reconstruction.

I. INTRODUCTION

IN SURVEILLANCE applications, the most interesting events are dynamic events consisting of changes occurring in the scene, such as moving persons or moving objects. In this paper, we focus on multiframe super-resolution (SR) reconstruction of small moving objects in under-sampled image sequences. Small objects are objects that are completely comprised of boundary pixels. Each boundary pixel is a mixed pixel, and its value has contributions of both the moving foreground object and the locally varying background. Hence, not only do the fractions change from frame to frame, but the local background values also change due to the apparent motion. Especially for small moving objects, an improvement in resolution is useful to permit classification or identification.

Manuscript received November 25, 2008; revised April 24, 2010; accepted April 24, 2010. Date of publication August 19, 2010; date of current version October 15, 2010. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Michael Elad. A. W. M. van Eekeren is with the Electro Optics Group, TNO Defence, Security and Safety, The Hague, The Netherlands. He is also with the Quantitative Imaging Group, Delft University of Technology, Delft, The Netherlands (e-mail: adam.vaneekeren@tno.nl). K. Schutte is with the Electro Optics Group, TNO Defence, Security and Safety, The Hague, The Netherlands (e-mail: klamer.schutte@tno.nl). L. J. van Vliet is with the Quantitative Imaging Group, Delft University of Technology, Delft, The Netherlands (e-mail: L.J.vanVliet@TUDelft.nl). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TIP.2010.2068210

Multiframe SR reconstruction¹ improves the spatial resolution by exchanging temporal information of a sequence of subpixel displaced low-resolution (LR) images for spatial information. Although the concept of SR reconstruction has existed for more than 20 years [1], relatively little attention has been given to SR reconstruction of moving objects. In [2]–[8], this subject was addressed for various dedicated tasks. Although [2] and [5] apply different SR reconstruction methods, i.e., iterative back-projection [9] and projection onto convex sets [10], respectively, both use a validity map in their reconstruction process. This makes these methods robust to motion outliers. Both methods perform well on large moving objects that obey a simple translational motion model. For large objects, only a small fraction of the pixels are boundary pixels. Hardie et al. [7] use optical flow to segment a moving object and subsequently apply SR reconstruction to it. In their work, the background is static and SR reconstruction is only applied to the masked area inside a large moving object. In [6], Kalman filters are used to reduce edge artifacts at the boundary between fore- and background. However, the fore- and background are not explicitly modeled in this method. In previous work [3], we presented a system that applies SR reconstruction after a segmentation step simultaneously to a large moving object and the background using Hardie's method [7]. Again, no SR reconstruction is applied to the boundary of mixed pixels separating the moving object from a cluttered background. In [4], we presented the first attempt at SR reconstruction of small moving objects with simulated data. At that time no experiments were done on real-world data; with simulated data there was no need for a very precise estimate of the object's trajectory. In [8], SR reconstruction is performed on moving vehicles of approximately 10 by 20 pixels.
For object registration a trajectory model is used in combination with a consistency measure of the local background and vehicle. However, in the SR reconstruction approach no attention is given to mixed pixels. An interesting subset of moving objects consists of faces. Efforts in that area using SR reconstruction include [11] and [12], in which the modeling of complex motion is a key element. However, the faces in the LR input images used are far larger than the small objects that we focus on in this paper. SR reconstruction of moving objects is also applied in astronomy. An overview can be found in [13], where it is explained that SR reconstruction is only possible under the condition that the solution is very sparse, i.e., very few samples have a value larger than zero. In contrast, our SR reconstruction method is designed to handle nonzero cluttered backgrounds.
¹In the remainder of this paper, SR reconstruction refers to multiframe SR reconstruction.

1057-7149/$26.00 © 2010 IEEE


Fig. 1. Flow diagram illustrating the construction of a 2-D HR image ẑ representing the camera's field-of-view and the degradation thereof into a LR frame y via a camera model.

For small moving objects that consist completely of mixed pixels against a cluttered background, the state-of-the-art pixel-based SR reconstruction methods mentioned previously will fail. Pixel-based SR reconstruction methods make an error at the object boundary, because they cannot disentangle the contributions from the space-time variant background and foreground information within a mixed pixel. To tackle the aforementioned problem we incorporate a subpixel precise object boundary model with a high-resolution (HR) pixel grid. We simultaneously estimate this polygonal object boundary as well as a HR intensity description of a small moving object subject to a modified total variation constraint. Assuming rigid objects that move with constant speed through the real world, object registration is achieved by fitting a trajectory through the object's center-of-mass in each frame. The approach assumes that a HR background image is estimated first. Robust SR reconstruction methods can accomplish this. They treat the intensity fluctuations after global registration caused by the small moving object as outliers. Especially for small moving objects our approach significantly improves object recognition. Note that the use of the proposed SR reconstruction method is not limited to small moving objects. It can also be used to improve the resolution of boundary regions of larger moving objects as long as the size of the object does not prohibit proper SR reconstruction of the background.

The paper is organized as follows. First, in Section II we present the forward model for a simulated HR scene and the observed LR image data by an electro-optical sensor system. In Section III, the three steps of the proposed SR reconstruction method for small moving objects are presented. Section IV presents experiments on simulated data, followed by a real-world experiment in Section V. Finally, in Section VI the main conclusions are presented.

II. FORWARD MODEL: REAL-WORLD DATA DESCRIPTION

This section describes the two steps of our forward model to construct a LR camera frame from HR representations of the fore- and background in combination with a subpixel precise polygon model of our object. The first step models the construction of a 2-D HR image including the moving object, whereas the second step models the image degradation as a result of the physical properties of our camera system.

A. 2-D HR Scene

We model a camera's field-of-view (the scene) at frame k as a properly sampled 2-D HR image without significant degradation due to motion, blur or noise. Let us express this image in lexicographical notation as the vector z_k, consisting of N_z pixels. The image is constructed from a translated HR background intensity description b, consisting of N_b pixels, and a translated HR foreground intensity description f, consisting of N_f pixels. This is depicted in the left part of Fig. 1. Note that the foreground f has a different apparent motion with respect to the camera than the background b. The small moving object in the foreground is not only represented by its HR intensity description f, but also by a subpixel precise polygon boundary p, with N_v the number of vertices. We impose the following assumptions on the motion of the object: 1) the aspect angle (the angle between the direction of motion and the optical axis of the camera) stays the same and 2) the object is moving with a constant velocity, i.e., the acceleration is zero. These are realistic assumptions if the object is far away from the camera and for a short duration up to a few seconds. The latter does not limit the acquisition of a large number of LR frames due to the high frame rate of today's image sensors.

At frame k the HR background b and the HR foreground f are translated and merged into the 2-D HR image z_k, in which the i-th pixel is defined by

  z_k(i) = c_k(i) Σ_j F_k(i,j) f(j) + (1 − c_k(i)) Σ_j B_k(i,j) b(j)    (1)

for i = 1, …, N_z and k = 1, …, K. Here, K is the number of frames. The summation over j with weights F_k(i,j) represents the translation of foreground pixel f(j) to z_k(i) by bilinear interpolation and, similarly, the summation over j with weights B_k(i,j) translates background pixel b(j) to z_k(i). The weight c_k(i) represents the foreground contribution at pixel i in frame k, depending upon the polygon boundary p. The foreground contribution varies between 0 and 1, so the corresponding background contribution equals 1 − c_k(i) by definition.

Fig. 2 depicts the construction of the k-th HR image by masking both the translated background and the translated foreground, after which the constituents are merged into z_k. The polygon boundary p defines the foreground contributions c_k and the background contributions (1 − c_k) in HR frame z_k.
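As an illustration of the compositing step in (1), the following sketch merges a foreground patch into a background using a fractional coverage map. This is our own simplification, not the authors' implementation: array and function names are invented, and an integer patch position stands in for the bilinear-interpolated subpixel translation of the paper.

```python
import numpy as np

def compose_hr_frame(background, foreground, coverage, fg_pos):
    """Merge a small foreground patch into a background image.

    background : 2-D array of HR background intensities
    foreground : 2-D array of HR foreground (object) intensities
    coverage   : 2-D array, same shape as foreground, with the
                 foreground fraction per pixel in [0, 1] (the weights c)
    fg_pos     : (row, col) of the patch's top-left corner
    """
    z = background.copy()
    r, c = fg_pos
    h, w = foreground.shape
    region = z[r:r + h, c:c + w]
    # Each mixed pixel is a convex combination of fore- and background.
    z[r:r + h, c:c + w] = coverage * foreground + (1.0 - coverage) * region
    return z

bg = np.full((12, 12), 10.0)
fg = np.full((3, 3), 50.0)
cov = np.array([[0.25, 0.5, 0.25],
                [0.5,  1.0, 0.5 ],
                [0.25, 0.5, 0.25]])
z = compose_hr_frame(bg, fg, cov, (4, 4))
```

The interior pixel takes the pure foreground value, while boundary pixels take intermediate values; this is exactly the partial area effect that pixel-based SR methods cannot disentangle.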


Fig. 2. Flow diagram illustrating the masking of foreground and background constituents and the merging thereof into the HR image z_k. The polygon boundary p is superimposed on the background contributions (1 − c_k) for visualization purposes only. Note that in the weight images c_k and (1 − c_k), black (= 0) indicates no contribution, white (= 1) indicates full contribution and greys indicate a partial contribution.
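The fractional foreground weights of the kind shown in Fig. 2 can be approximated numerically. The sketch below is our own illustration, not the paper's computation: it estimates the coverage fraction of each HR pixel by a polygon via plain supersampling, using matplotlib's point-in-polygon test, with invented function and variable names.

```python
import numpy as np
from matplotlib.path import Path

def coverage_map(vertices, shape, samples=8):
    """Approximate per-pixel foreground fractions c for a polygon.

    vertices : list of (x, y) polygon vertices in pixel coordinates
    shape    : (rows, cols) of the output weight image
    samples  : supersampling factor per pixel axis
    """
    poly = Path(vertices)
    rows, cols = shape
    # Subpixel sample offsets, centered within each pixel.
    off = (np.arange(samples) + 0.5) / samples
    xs = (np.arange(cols)[:, None] + off[None, :]).ravel()
    ys = (np.arange(rows)[:, None] + off[None, :]).ravel()
    X, Y = np.meshgrid(xs, ys)
    pts = np.column_stack([X.ravel(), Y.ravel()])
    inside = poly.contains_points(pts).reshape(rows * samples, cols * samples)
    # Average the binary in/out decisions per pixel block.
    return inside.reshape(rows, samples, cols, samples).mean(axis=(1, 3))

# A 2x2-pixel square occupying the center of a 4x4 grid:
c = coverage_map([(1, 1), (3, 1), (3, 3), (1, 3)], (4, 4))
```

Pixels fully inside the polygon get weight 1, pixels outside get 0, and pixels cut by the boundary get the fractional coverage used in (1).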

B. Camera Model

A LR camera frame y_k is obtained by applying the camera model to the 2-D HR image z_k representing the camera's field-of-view. The camera model comprises two types of image blur, sampling, and degradation by noise.

Blur: The optical point-spread-function (PSF), together with the sensor PSF, will cause a blurring in the image plane. In this paper, the optical blur is modeled by a Gaussian function with standard deviation σ_psf. The sensor blur is modeled by a uniform rectangular function representing the fill-factor of each sensor element. A convolution of both functions represents the total blurring function.

Sampling: The sampling as depicted in Fig. 1 reflects the pixel pitch only. The integration of photons over the photosensitive area of a pixel is accounted for by the aforementioned sensor blur.

Noise: The temporal noise in the recorded data is modeled by additive, independent and identically distributed Gaussian noise samples with standard deviation σ_n. For the recorded data used, independent additive Gaussian distributed noise is a sufficiently accurate noise model. Other types of noise, like fixed pattern noise (FPN) and bad pixels, are not explicitly modeled. For applications where FPN becomes a hindrance, it is advised to correct the captured data prior to SR reconstruction using a scene-based nonuniformity correction algorithm, such as the one proposed in [14].
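The blur-sample-noise chain of the camera model can be sketched as follows. This is an illustrative approximation rather than the authors' code: the names and parameter values are ours, separable Gaussian and box filters stand in for the combined PSF, and circular boundary handling keeps the sketch short.

```python
import numpy as np

def gaussian_kernel(sigma, radius):
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()

def convolve1d(img, k, axis):
    # Wrap-around (circular) convolution along one axis keeps the
    # sketch free of boundary handling; the paper does not rely on it.
    pad = len(k) // 2
    out = np.zeros_like(img)
    for i, w in enumerate(k):
        out += w * np.roll(img, i - pad, axis=axis)
    return out

def camera_model(z, zoom=4, sigma_psf=2.0, sigma_n=1.0, rng=None):
    """Degrade a HR image z into a LR frame: blur, sample, add noise."""
    rng = rng or np.random.default_rng(0)
    g = gaussian_kernel(sigma_psf, 3 * int(np.ceil(sigma_psf)))
    box = np.full(zoom, 1.0 / zoom)          # 100% fill-factor sensor blur
    blurred = z
    for k in (g, box):
        blurred = convolve1d(convolve1d(blurred, k, 0), k, 1)
    sampled = blurred[::zoom, ::zoom]        # pixel-pitch sampling
    return sampled + rng.normal(0.0, sigma_n, sampled.shape)

y = camera_model(np.ones((32, 32)) * 100.0)
```

Since both kernels are normalized, a constant scene survives the blur unchanged; only the sampling and the additive noise alter it.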

All in all, the observed i-th LR pixel from frame k is modeled as follows:

  y_k(i) = Σ_j W_k(i,j) z_k(j) + n_k(i)    (2)

for i = 1, …, N_y and k = 1, …, K. Here, N_y denotes the number of LR pixels in y_k. The weight W_k(i,j) represents the contribution of HR pixel z_k(j) to estimated LR pixel y_k(i). Each contribution is determined by the blurring and sampling of the camera. n_k(i) represents an additive, independent and identically distributed Gaussian noise sample with standard deviation σ_n.

III. DESCRIPTION OF PROPOSED METHOD

The proposed SR reconstruction method can be divided into three parts: 1) applying SR reconstruction to the background for subsequent detection of moving objects from the residue between the observed LR frame and a simulated LR frame based upon the estimated HR background at that instance; 2) fitting a trajectory model to the detected instances of the moving object through the image sequence to obtain subpixel precise object registration; and 3) obtaining a HR object representation, comprised of a subpixel precise boundary and a HR intensity description, by solving an inverse problem based upon the model of Section II. We start with the third step, because it is the key innovative part of the proposed method.

A. SR Reconstruction of a Small Moving Object

To find the optimal HR description of the object (consisting of a polygon boundary p and a HR intensity description f), we solve an inverse problem based upon the camera observation


model described in (1) and (2). To favor sparse solutions of this ill-posed problem we added two regularization terms: one to penalize intensity transitions in the HR intensity description and one to avoid unrealistically wild object shapes. These observations give rise to the following cost function:

  C(p, f) = (1 / (K N_y σ_n²)) Σ_{k=1}^{K} Σ_{i=1}^{N_y} (y_k(i) − ŷ_k(i))² + λ_f Σ_{l,m} ‖f − S_x^l S_y^m f‖_h + λ_p ē² Σ_{v=1}^{N_v} Γ(v)    (3)

where the first summation term represents the normalized data misfit contributions for all pixels i and frames k. Normalization is performed with respect to the total number of LR pixels K·N_y and the noise variance σ_n². Here, y_k(i) denotes the measured intensities of the observed LR pixels and ŷ_k(i) the corresponding estimated intensities obtained using the forward model of Section II. Although the estimated intensities ŷ_k(i) are also dependent upon the background b, only p and f are varied to minimize (3). The HR background is estimated in advance as described in Section III-B.

The second term of the cost function is a regularization term which favors sparse solutions by penalizing the amount of intensity variation within the object according to a criterion similar to the bilateral total variation (BTV) criterion [15]. Here, S_x^l is the shift operator that shifts f by l pixels in the horizontal direction, whereas S_y^m shifts f by m pixels in the vertical direction. The actual minimization of the cost function is done in an iterative way by the Levenberg–Marquardt (LM) algorithm [16]. This optimization algorithm assumes that the cost function has a first derivative that exists everywhere. However, the L1-norm used in the TV criterion does not satisfy this assumption. Therefore, we introduce the hyperbolic norm

  ‖x‖_h = Σ_i √(x_i² + β²)    (4)

This norm has the same properties as the L1-norm for large values |x_i| ≫ β and it has a first (and second) derivative that exists everywhere. The same fixed value of β is used for all experiments.

The third term of (3) constrains the shape of the polygon by penalizing the variation of the polygon boundary p. Regularization is needed to penalize unwanted protrusions, such as spikes, which cover a very small area compared to the total object area. This constraint is embodied by the measure Γ, which is small when the polygon boundary is smooth:

  Γ(v) = 1 / A(v),  with  A(v) = ½ |e_{v−1}| |e_v| sin(β_v / 2)    (5)

Here, Γ(v) is the inverse of A(v), which is the area spanned by the edges (e_{v−1} and e_v) at vertex v and half the angle β_v between those edges, as indicated by the right part of (5). From example (a) in Fig. 3 it is clear why the area is calculated with half the angle β_v/2: if we would take the full angle β_v, A(v) would be zero, which would result in Γ(v) → ∞. Example (b) shows that the measure will be very large for small angles, i.e., sharp protrusions. Note that this measure also becomes very large for β_v → 2π (an inward pointing spike).

Fig. 3. Two examples to illustrate the expression for polygon regularization Γ at vertex v of polygon p. (a) Γ is minimal for β = π. (b) Γ is maximal for β → 0 or β → 2π.

Note that in (3) normalization is performed on Γ by a multiplication with the square of the mean edge length ē = L/N_v, with L the total edge length of p and N_v the number of vertices. This normalization prevents extensive growth of edges.

As mentioned previously, the actual minimization of the cost function is performed in an iterative way by the Levenberg–Marquardt algorithm [16]. To allow this, we put the cost function of (3) in the LM framework, which expects a sum-of-squares format in which each residue depends upon the measurement and the estimate given the parameters. In general, it is straightforward to store all residues, for example y_k(i) − ŷ_k(i), in a vector which forms the input of the LM algorithm. In our case, we have to be aware of the different norms in each of the terms of (3). The residue vector looks like

  r = [ r_d  r_f  r_p ]ᵀ    (6)

where r_d contains the K·N_y weighted data misfit residues, r_f the weighted TV residues of the second term and r_p the N_v weighted polygon residues, each scaled by the square root of the corresponding normalization and regularization factors in (3); the length of the vector in (6) is the sum of these element counts.

The cost function in (3) is iteratively minimized to simultaneously find the optimal p and f. A flow diagram of this iterative minimization procedure in steady state is depicted in Fig. 4. Here the "Cost function" refers to (3) and the "Camera model" to formulas (1) and (2). Note that the measured data used for the minimization procedure contains only a small region-of-interest (ROI) around the moving object in each frame.

The optimization scheme depicted in Fig. 4 has to be initialized with an object boundary and an object intensity description. These can be obtained in several ways; we have chosen to use a simple and robust initialization that proved to initialize


Fig. 4. Flow diagram illustrating the steady state of estimating a HR description of a moving object (p and f). y denotes the measured intensities in a region of interest containing the moving object in all frames after registration and ỹ denotes the corresponding estimated intensities at iteration i. Note that the initial HR object description (p and f) is derived from the measured LR sequence and the object mask sequence.

the method close enough to the global minimum to permit convergence to the global minimum in most practical cases. The initial object boundary is obtained by first calculating the frame-wise median width and the frame-wise median height of the mask in the object mask sequence (defined in the next section). Subsequently, we construct an elliptical object boundary from the previously calculated width and height. Upon initialization the vertices are evenly distributed over the ellipse. The number of vertices is fixed during minimization. The object intensity distribution is initialized by a constant intensity equal to the median value over all masked pixel intensities in the measured LR sequence.

Furthermore, the optimization procedure is performed in two steps. The first step consists of the initialization described previously followed by a few iterations of the LM algorithm. We derived during experimentation that using more than five iterations has no effect on the final result. After this step the intensity description often contains large gradients perpendicular to the estimated object boundary, where pixels outside the contour still contain the initialization values. As this can cause getting stuck in local minima, a partial reinitialization step is proposed. In this step, all intensities of HR foreground pixels adjacent to a mixed boundary pixel but located completely inside the object boundary are propagated outwards. After this partial reinitialization, we continue the iterative procedure until convergence or for a fixed number of iterations to be determined in a simulation experiment.

B. SR Reconstruction of Background and Moving Object Detection

A small moving object causes a temporary change of a small localized set of pixel intensities. In previous work [17], we presented a framework for the detection of moving point targets against a static cluttered background. A robust pixel-based SR reconstruction method computes a HR background image by treating the local intensity variations caused by the small object as outliers. After registration of the HR background to a recorded LR frame, we apply the camera model to simulate the LR frame with identical aliasing artifacts as in the recorded LR frame, but without the small object. Thresholding the absolute value of the residue image yields a powerful tool for object detection, provided that the apparent motion is sufficient given the number of frames to be used in background reconstruction. Assuming K LR frames containing a moving object of width w (expressed in LR pixels), the apparent lateral motion must exceed a lower bound, set by w and K, in LR pixels/frame for a proper background reconstruction.

Several robust SR reconstruction methods have been reported [15], [18], [19]. We choose the method developed by Zomet et al. [19], which is robust to intensity outliers, such as those caused by small moving objects. This method employs the same camera model as presented in (2). Its robustness is introduced by a robust back-projection

  b̂⁽ⁿ⁺¹⁾ = b̂⁽ⁿ⁾ + λ · median_k { W_kᵀ ( y_k − W_k b̂⁽ⁿ⁾ ) }    (7)

where median denotes a scaled pixel-wise median over the K frames and W_k is the projection operator from HR image to LR frame k. A LR representation of the background, obtained by applying the camera model to the shifted HR background image, is compared to the corresponding LR frame of the recorded image sequence:

  r_k(i) = y_k(i) − Σ_j W_k(i,j) b̂_k(j)    (8)

where W_k represents the blur and down-sample operation, b̂_k(j) is the j-th pixel of the shifted HR background in frame k and y_k(i) is the recorded intensity of the i-th pixel in frame k. All difference pixels constitute a residual image sequence in which a moving object can be detected. Thresholding this residual image sequence followed by tracking improves the detectability for low residue-to-noise ratios. Threshold selection is done with the chord method from Zack et al. [20], which is illustrated in Fig. 5. With this histogram-based method an object mask sequence results for k = 1, …, K frames, with N_y pixels in each LR frame.

After thresholding, multiple events may have been detected in each frame. We apply tracking to link the most similar events in each frame to a so-called reference event. This reference event is defined by the median width w̃, the median height h̃ and the median residual energy Ẽ of the largest event in each frame (the median is computed frame-wise). Next, we search in each frame for the event with the smallest normalized Euclidean distance w.r.t. the reference event, shown in (9), which selects the index of the event in frame k with the smallest normalized Euclidean distance to the reference event. After this tracking step an object mask sequence is generated with at most one event in each frame, the one corresponding to the object giving rise to the reference event. Note that a frame can be empty if no event was detected.
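Threshold selection with the chord method of Zack et al. can be sketched as follows. This is a generic reimplementation under our own naming, with a synthetic example histogram, not the authors' code: it returns the histogram bin whose perpendicular distance to the chord from the histogram peak to the last occupied bin is maximal.

```python
import numpy as np

def chord_threshold(hist):
    """Chord (triangle) threshold selection on a 1-D histogram.

    Draws a chord from the histogram peak to the far end of the
    occupied range and returns the bin with the largest perpendicular
    distance between histogram and chord.
    """
    hist = np.asarray(hist, dtype=float)
    peak = int(np.argmax(hist))
    end = int(np.nonzero(hist)[0][-1])   # last nonempty bin
    if end == peak:                      # degenerate histogram
        return peak
    b = np.arange(peak, end + 1)
    # Distance from each point (b, hist[b]) to the chord through
    # (peak, hist[peak]) and (end, hist[end]).
    dx, dy = end - peak, hist[end] - hist[peak]
    dist = np.abs(dy * (b - peak) - dx * (hist[b] - hist[peak]))
    dist /= np.hypot(dx, dy)
    return int(b[np.argmax(dist)])

# A residual histogram: a large near-zero mode plus a small object tail.
h = np.concatenate([np.array([0, 80, 40, 20, 10, 5]), np.full(5, 1.0)])
t = chord_threshold(h)
```

The selected bin sits at the "knee" of the residual histogram, separating the background-dominated mode from the tail caused by the moving object.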


Fig. 5. Threshold selection by the chord method is based upon finding the value that maximizes the distance D between the histogram and the chord. The value T is used as the threshold value.

C. Moving Object Registration

The object mask sequence, obtained after thresholding and tracking, gives a rough quantized indication of the position of the object in each frame. For performing SR reconstruction, a more precise, subpixel registration is needed. For large moving objects which contain a sufficient number of internal pixels with sufficient structure, gradient-based registration [21] can be performed. In the setting of small moving objects, this is usually not the case and another approach is needed. Assuming a linear motion model for a moving object in the real world, the projected model can be fitted to the sequence of detected object positions. We assume a constant velocity without acceleration in the real world, which seems realistic given the nature of small moving objects: the objects are far away from the observer and will have a small acceleration within the K frames due to the high frame rate of today's image sensors.

First, the position of the object in each frame is determined by computing the weighted center-of-mass (COM) of the masked pixels as follows:

  x̄_k = Σ_i m_k(i) y_k(i) x_i / Σ_i m_k(i) y_k(i)    (10)

where the sum runs over the LR pixels in frame k, x_i is the location of pixel i, m_k(i) is the corresponding mask value (0 or 1) and y_k(i) is the measured intensity.

To fit a trajectory, all object positions in time must be known w.r.t. a reference point in the background of the scene. This is done by adding the previously obtained apparent background translation to the calculated object position for each frame. To obtain all object positions with subpixel precision, a robust fit to the measured object positions is performed. Assuming constant motion, all object positions can be described by a reference object position x₀ and a translation Δx per frame. Both the reference object position and the translation of the object are estimated by minimizing the following cost function:

  C(x₀, Δx) = Σ_{k=1}^{K} ( 1 − exp( −d_k² / (2σ²) ) )    (11)

where d_k denotes the Euclidean distance in LR pixels between the measured object position x̄_k and the estimated object position at frame k:

  d_k = ‖ x̄_k − (x₀ + k Δx) ‖    (12)

The cost function in (11) is known as the Gaussian norm [22]. This norm is robust to outliers (e.g., false detections in our case). The smoothing parameter σ is set to 0.5 LR pixel. Minimizing the cost function in (11) with the Levenberg–Marquardt algorithm results in an accurate subpixel precise registration of the moving object. If, e.g., 50 frames are used, the registration precision is improved by a factor of 7.

D. Computational Complexity

The computational complexity is dominated by calculating (3), i.e., computing the SR reconstruction of the HR foreground. At every iteration of the LM optimization procedure, the cost function has to be calculated for variations in the estimated parameters to estimate the gradient w.r.t. the parameters to be solved. The cost function has to be evaluated (N_f + 2N_v) · # times, with N_f the number of HR foreground intensities, N_v the number of vertices and # the number of LM iterations. A reconstruction as described in Section IV-B using Matlab code took 37 min on a Pentium-4, 3.2 GHz processor under Windows. The processing time can be drastically reduced if a precomputation of the partial derivatives of the cost function w.r.t. the HR foreground intensities is performed off-line and stored. In this case, the computational complexity reduces to 2N_v · # evaluations. Note that typically 2N_v ≪ N_f, thereby forecasting a reduction in the computation time by one order of magnitude.

  j_k = arg min_j √( ((w_{k,j} − w̃)/w̃)² + ((h_{k,j} − h̃)/h̃)² + ((E_{k,j} − Ẽ)/Ẽ)² )    (9)

where w_{k,j}, h_{k,j} and E_{k,j} denote the width, height and residual energy of event j in frame k.
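The robust trajectory fit of Section III-C can be sketched as follows. This is our own illustration, not the authors' code: scipy's general-purpose `least_squares` replaces the paper's Levenberg–Marquardt setup, the sample data is synthetic, and only σ = 0.5 LR pixel is taken from the text. Each measured center-of-mass contributes a Gaussian-norm residual, so a gross outlier barely influences the fit.

```python
import numpy as np
from scipy.optimize import least_squares

def fit_trajectory(positions, sigma=0.5):
    """Robustly fit positions[k] ~ x0 + k * dx using the Gaussian norm.

    positions : (K, 2) array of measured object positions (LR pixels)
    sigma     : smoothing parameter of the Gaussian norm (LR pixels)
    """
    k = np.arange(len(positions))

    def residuals(theta):
        x0, dx = theta[:2], theta[2:]
        d2 = np.sum((positions - (x0 + np.outer(k, dx))) ** 2, axis=1)
        # sum of squared residuals == sum_k (1 - exp(-d_k^2 / (2 sigma^2)))
        return np.sqrt(1.0 - np.exp(-d2 / (2.0 * sigma ** 2)) + 1e-12)

    theta0 = np.concatenate([positions[0], positions[-1] - positions[0]])
    theta0[2:] /= max(len(positions) - 1, 1)
    fit = least_squares(residuals, theta0, method="lm")
    return fit.x[:2], fit.x[2:]

# Linear track with one gross outlier (a false detection):
true_x0, true_dx = np.array([3.0, 7.0]), np.array([0.6, -0.2])
pos = true_x0 + np.outer(np.arange(20), true_dx)
pos[8] += np.array([5.0, -4.0])                  # outlier
x0_est, dx_est = fit_trajectory(pos)
```

Because the Gaussian-norm residual saturates at 1 for large distances, the false detection contributes almost no gradient and the estimated trajectory stays close to the clean track.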


IV. EXPERIMENTS ON SIMULATED DATA

The proposed SR reconstruction method for small moving objects is first applied to simulated data to study the behavior under controlled conditions. In a series of experiments, we tune the regularization parameters and the number of iterations. Then we study the convergence, the robustness in the presence of clutter and noise, and the robustness against violations of the underlying linear motion model.

A. Generating the Simulated Car Sequence

The simulated car sequence was generated to resemble the real-world sequence of the next section as well as possible. We simulated an under-sampled image sequence containing a small moving car using the camera model as depicted in Fig. 1. The parameters of the camera model were chosen to match the sensor properties of the real-world system, i.e., optical blurring (a Gaussian kernel), sensor blurring (a rectangular uniform filter with a 100% fill-factor) and Gaussian distributed noise to resemble the actual noise conditions (see below). The car follows a linear motion trajectory with zero acceleration. It consists of two internal intensities, which are both above the median background intensity. The low object intensity is exactly in between the median background intensity and the high object intensity. The boundary of the car is modeled by a polygon with seven vertices. Fig. 7(a) shows a HR image of the simulated car, which serves as a ground truth for all SR reconstruction results. Fig. 7(b) and (c) show two LR image frames in which the car covers approximately 6 pixels. All 6 pixels are so-called mixed pixels and contain contributions of the fore- and background.

The image quality is further quantified by the signal-to-noise ratio (SNR) and the signal-to-clutter ratio (SCR). The SNR is a measure for the contrast between the object and the time-averaged local background compared to stochastic variations called noise. The SNR is defined as

  SNR = 20 log₁₀( (1/K) Σ_{k=1}^{K} |μ_f,k − μ_b,k| / σ_n )    (13)

with K the number of frames, μ_f,k the mean foreground intensity in frame k and μ_b,k the mean local background intensity in frame k. μ_f,k is calculated by taking the mean intensity of LR pixels that contain at least 50% foreground, and μ_b,k is defined by the mean intensity of all 100% background pixels in a small neighborhood around the object.

The SCR is a measure for the contrast between the object and the time-averaged local background compared to the variation in the local background. The SCR is defined as

  SCR = 20 log₁₀( (1/K) Σ_{k=1}^{K} |μ_f,k − μ_b,k| / σ_b,k )    (14)

with σ_b,k the standard deviation of the local background in frame k. In the LR domain, the SNR is 29 dB and the SCR is 14 dB. These are realistic values, derived from the real-world image sequence of the next section.

In the next subsections, different experiments on the simulated data are performed. For all experiments 50 LR frames are used to estimate the HR foreground and 85 LR frames are used to estimate the HR background. In all used reconstruction methods, the zoom factor is set to 4 and the camera parameters are the same as in generating the simulated data.

B. Test 1: Tuning the Algorithm

Our algorithm contains several parameters such as the camera parameters, the regularization parameters, and a stopping criterion. Although the camera parameters such as the PSF and fill-factor can be estimated rather well, the regularization parameters λ_f and λ_p are far more difficult to tune. To study the influence of the regularization parameters on the final result and select the parameters for later use, a few experiments are performed on 50 LR frames of the simulated car sequence.

In this experiment, we study the influence of the regularization parameters λ_f and λ_p on the SR result for the simulated car sequence with a SNR of 29 dB and a SCR of 14 dB. Note that both regularization parameters are kept constant during both steps of the optimization procedure. We use the normalized mean squared error (NMSE) between the SR result of the car and its ground truth as a figure-of-merit. Note that this measure considers only the foreground intensities; the background intensities are set to zero:

  NMSE = (1/N) Σ_{i=1}^{N} ( f̂(i) − f(i) )² / max(f)²    (15)

with N the number of HR pixels, f̂ the estimated foreground contributions using SR and f the ground truth. Normalization is done with the squared maximum value of f.

Fig. 6. NMSE between the SR result and the ground truth as a function of the regularization parameters λ_f and λ_p. Here both parameters are kept constant throughout all iterations in step 1 and step 2.

From the result in Fig. 6 it can be seen that λ_f has by far the largest influence on the NMSE. Therefore it is recommended to choose λ_f with care, whereas the value of λ_p is not critical. In a broad range around the selected values, more than three to five iterations in step 1 did not change the final result. After 10 to 15 iterations in step 2 the solution converged. Hence, we set the

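As a concrete illustration of the figures-of-merit above, the sketch below computes the NMSE of (15) and a dB contrast measure in the spirit of (13) and (14). This is a hedged sketch, not the authors' code: the exact right-hand sides of (13) and (14) are garbled in this copy, so the 20·log10(contrast/deviation) form is an assumption, and all array names are hypothetical.

```python
import numpy as np

def nmse(sr_fg, gt_fg):
    """Eq. (15): squared error summed over the N HR foreground pixels,
    normalized by N times the squared maximum of the ground truth."""
    sr_fg = np.asarray(sr_fg, dtype=float)
    gt_fg = np.asarray(gt_fg, dtype=float)
    return np.sum((sr_fg - gt_fg) ** 2) / (sr_fg.size * gt_fg.max() ** 2)

def contrast_db(mu_f, mu_b, sigma):
    """Assumed dB form shared by the SNR (sigma = noise std) and the
    SCR (sigma = local background std): 20*log10 of the frame-averaged
    foreground/background contrast over the given deviation."""
    contrast = np.mean(np.asarray(mu_f) - np.asarray(mu_b))
    return 20.0 * np.log10(abs(contrast) / sigma)

# Hypothetical 2x2 example: one HR pixel off by 1, ground-truth maximum 4
gt = np.array([[0.0, 1.0], [2.0, 4.0]])
sr = np.array([[0.0, 1.0], [2.0, 3.0]])
print(nmse(sr, gt))  # 1 / (4 * 16) = 0.015625
```

A perfect reconstruction gives an NMSE of exactly zero, and larger deviations from the ground truth grow quadratically, which is why the measure separates the good, medium, and bad regimes discussed below.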
Fig. 7. Four times SR reconstruction of a simulated under-sampled image sequence containing a small moving car. (a) HR image representing the scene serving as ground truth; (b), (c) two typical LR frames (5 × 4 pixels) of the moving car; (d) 4× SR by a robust state-of-the-art method [18]; and (e) 4× SR by the proposed method.

C. Test 2: Comparison With a State-of-the-Art Pixel-Based Technique

To assess the value of the proposed algorithm, we compare it with the visually best result obtained by a robust state-of-the-art pixel-based SR technique [18]. Note that the registration is performed by the trajectory fitting technique of this paper (on 85 LR frames) to put both methods on an equal footing. The state-of-the-art pixel-based SR result is shown in Fig. 7(d) and bears very little resemblance to the ground truth. This is no surprise, since the partial area effect at the boundary of the object, which affects all object pixels, is not accounted for. Using the optimal regularization parameters from Test 1 in both steps, we applied the proposed method to exactly the same LR image sequence. The result is depicted in Fig. 7(e) and shows a very good resemblance to the ground truth. Subtle changes along the boundary and along the intensity transition are caused by partial area effects due to the random placement of the reconstructed object w.r.t. the HR grid. The object boundary is approximated with 8 vertices, which is one more than used for constructing the data, so the boundary is slightly over-fitted. Comparing the results in Fig. 7(d) and (e) shows that the result of our proposed method is clearly superior to the pixel-based method of Pham [18].

D. Test 3: Robustness in the Presence of Clutter and Noise

To investigate the robustness of our method under different conditions, we varied 1) the clutter amplitude of the local background and 2) the noise level of the simulated car sequence described in Section IV-A. The clutter of the background is varied by multiplying the background with a certain factor after subtracting the median intensity; afterwards the median intensity is added again to return to the original median intensity. The object intensities as well as the size and shape of the car remain the same. All parameters used for the reconstruction are set to the same values as in Test 2 in Section IV-C. The quality of the different SR results is expressed by the NMSE w.r.t. the ground truth as before. Fig. 8 depicts the NMSE as a function of SNR and SCR. We divided the results into three categories: good, medium, and bad. For each region a typical SR result is displayed to give a visual impression of the performance. It is clear that the SR result in the good region, obtained for values of the SNR and SCR that occur in practice, bears a good resemblance to the ground truth. Note that the visible background in these pictures is not used to calculate the NMSE. Fig. 8 shows that the performance decreases for a decreasing SNR. Furthermore, the boundary between the good and medium regions indicates a decrease in performance under high-clutter conditions.

Fig. 8. NMSE for the SR results of the simulated car sequence as a function of the SNR and SCR. We have roughly divided the space into three categories: good, medium, and bad, and provided a typical SR result for each category.

E. Test 4: Robustness Against Variations in Motion

The proposed method assumes that the object moves with a constant speed and appears with the same aspect angle in all frames used for reconstruction. To demonstrate the robustness of our method to violations of these assumptions, two experiments are performed. The first experiment determines the robustness w.r.t. an acceleration of the object; the second establishes the robustness w.r.t. scaling of the object. We modified the simulated car sequence of Section IV-A. In the first experiment an acceleration a, expressed in LR pixels/frame², is added, which contributes to the object position by ½ a k², with k the frame number. In the second experiment a scale factor, defined as the vehicle size in the last frame divided by the vehicle size in the first frame, is added. A scale factor of 0.8 indicates that the observed length of the car varies from 3 LR pixels in the first frame to 2.4 LR pixels in the last frame. The NMSE as a function of acceleration and scaling is depicted in Fig. 9. Fig. 9(a) shows that a larger acceleration causes a larger error. An acceptable performance decrease is obtained for accelerations smaller than 0.001 LR pixels/frame². The error of a constant velocity model fitted to a constant acceleration motion will follow a parabolic model. This parabola is symmetric and has a top-to-end-point difference of ½ a (K/2)² = a K²/8. From the mid-point between its top and an end point we get a maximum error of a K²/16; with a = 0.001 LR pixels/frame² and K = 50 frames this gives a maximum translational error of 0.16 LR pixel. For the second experiment, Fig. 9(b) shows that a maximum scaling of 15% is allowed with an acceptable performance loss. This is a 7.5% maximum scale change from the mean scale. For a 3-pixel object this translates to a maximum shift error of 0.11 LR pixel for both the front and back object edges compared to its center-of-mass position.

Fig. 9. NMSE for the SR results of the simulated car sequence as a function of (a) acceleration and (b) object scaling.

Fig. 10. Top view of the acquisition geometry used to capture the real-world data.
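The two maximum position errors quoted above can be reproduced with elementary arithmetic. A minimal check under the stated assumptions (the constant-velocity fit is the best uniform straight-line approximation of the parabolic displacement ½ a k² over K frames, giving a maximum residual of a K²/16; the edge shift of a rescaled object is measured from its center of mass), not the authors' code:

```python
# Worked check of the two maximum position errors quoted in the text.
a = 0.001  # acceleration [LR pixels/frame^2], threshold from Fig. 9(a)
K = 50     # number of frames used for reconstruction
accel_err = a * K**2 / 16  # Chebyshev residual of a line fit to 0.5*a*k^2

# Edge shift for a 7.5% scale change of a 3 LR pixel object, measured
# from the center of mass (half the object length shifts).
scale_change = 0.075
obj_len = 3.0
scale_err = scale_change * obj_len / 2

print(f"acceleration error ~{accel_err:.2f} LR px, "
      f"scaling error ~{scale_err:.2f} LR px")
```

This prints 0.16 and 0.11 LR pixel, matching the two values quoted in the comparison with half the HR pixel pitch (0.5/4 = 0.125 LR pixel at 4× zoom).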

Note that both experiments have well-comparable maximum position errors of 0.16 and 0.11 LR pixel, rather consistent with the requirement that the registration error for SR should be smaller than half the HR pixel pitch. This can be deduced from the following argument. Critical sampling of bandlimited signals can be modeled by a Gaussian low-pass filter followed by sampling with a sampling pitch of 1.1 times the standard deviation of the Gaussian PSF [23]. In [21], we showed that Gaussian noise in the LR image sequence leads to Gaussian distributed registration estimates. These registration errors act as an additional blur, even for sequences of infinite length [24]. If the standard deviation of this registration-error induced image blur is substantially (say two times) smaller than the optical image blur, it will not affect the image quality after SR.

V. EXPERIMENT ON REAL-WORLD DATA

To demonstrate the potential of the proposed method under realistic conditions, we applied it to a real-world image sequence. Real-world data permits us to study the impact of changes in object intensities caused by variations in reflection, lens aberrations, small changes in the aspect angle of the object

Fig. 11. Four times SR reconstruction of a vehicle captured by an infrared camera (50 frames) at a large distance: (a) and (c) show the LR captured data; (b) and (d) show the SR reconstruction result obtained by the proposed method. (a) LR reference frame (64 × 64 pixels); (b) SR with zoom factor 4; (c) close-up of moving object in (a); and (d) close-up of moving object in (b).

along the trajectory, and practical violations of the linear motion assumption. The data for this experiment is captured with an infrared camera (a Radiance 1 from Amber). The sensor is an indium antimonide (InSb) detector with 256 × 256 pixels, operating in the 3–5 μm wavelength band. Furthermore, we use optics with a focal length of 50 mm and a viewing angle of 11.2° (also from Amber). We captured a vehicle (a Jeep Wrangler) at 15 frames/second, driving with a constant velocity (≈1 pixel/frame apparent velocity) approximately perpendicular to the optical axis of the camera. A top view of the acquisition geometry is depicted in Fig. 10. During image capture, the platform of the camera was gently shaken to provide subpixel motion of the camera. Panning was used to keep the moving vehicle within the field of view of the camera. We selected the distance such that the vehicle appeared small (covering approximately 5 × 2 LR pixels in area) in the image plane. Fig. 11(a) shows a typical LR frame (64 × 64 pixels). A close-up of the vehicle is depicted in Fig. 11(c). The vehicle is driving from left to right at a distance of approximately 1150 meters. The SNR of the vehicle with respect to the background is 30 dB and the SCR is 13 dB. In the simulation experiments, we have shown that for these values our method is capable of delivering good reconstruction performance.

Fig. 11(b) shows the result after applying our SR reconstruction method, with a close-up of the car in Fig. 11(d). The HR background is reconstructed from 85 frames with zoom factor 4. The camera blur is modeled by Gaussian optical blurring, followed by uniform rectangular sensor blurring (100% fill-factor). The HR foreground is reconstructed from 50 frames with zoom factor 4 with the same camera parameters. The object boundary is approximated with 12 vertices, and during the reconstruction the same regularization settings are used in both step 1 and step 2. Note that much more detail is visible in the SR result than in the LR image. The shape of the vehicle is very well pronounced and the hot engine of the vehicle is clearly visible. For comparison we display in Fig. 12 the SR result next to a captured image of the vehicle at a 4× shorter distance. Please be aware that the intensity mapping is not the same for both images, so a grey level in Fig. 12(a) may not be compared with the same grey level in Fig. 12(b). Notice that Fig. 12(b) was captured at a later time. Differences in environmental conditions (position of the sun, clouds, etc.), heating of the engine and vehicle, as well as the pose of the vehicle contribute to the observed differences between the two images. The shape of the vehicle is reconstructed very well and the hot engine is located at a similar place.
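As a sanity check on the reported object size, the ground footprint of one LR pixel follows from the stated viewing angle, detector width, and distance. A minimal sketch; the vehicle length of 4.2 m is an assumption (a typical Jeep Wrangler length), not a value from the text:

```python
import math

fov_deg = 11.2     # horizontal viewing angle of the optics [degrees]
n_pixels = 256     # detector width in pixels
distance = 1150.0  # object distance [meters]

# Width covered by the full field of view at 1150 m, per detector pixel.
footprint = 2 * distance * math.tan(math.radians(fov_deg / 2)) / n_pixels

vehicle_length = 4.2  # assumed Jeep Wrangler length [meters]
apparent_size = vehicle_length / footprint  # apparent length in LR pixels

print(f"{footprint:.2f} m/pixel, ~{apparent_size:.1f} LR pixels long")
```

This gives about 0.88 m per LR pixel, so the assumed 4.2 m vehicle spans roughly 4.8 LR pixels, consistent with the reported footprint of approximately 5 × 2 LR pixels.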

Fig. 12. SR result with zoom factor 4 of a jeep in (a) compared with the same jeep captured at a 4× shorter distance in (b). (a) 4× SR result. (b) Object 4× closer to camera.

VI. CONCLUSION

This paper presents a method for SR reconstruction of small moving objects. The method explicitly models the fore- and background contributions to the partial area effect of the boundary pixels. The main novelty of the proposed SR reconstruction method is the use of a combined object boundary and intensity description of the target object. This enables us to simultaneously estimate the object boundary with subpixel precision and the foreground intensities from the boundary pixels subject to a modified total variation constraint. This modification permits the use of the Levenberg-Marquardt algorithm for optimizing the cost function. This method is known to converge to the global optimum for a well-behaved cost function and an initial estimate that is not too far away. The proposed multiframe SR reconstruction method clearly improves the visual recognition of small moving objects under realistic imaging conditions in terms of SNR and SCR. We showed that our method performs well in reconstructing a small moving object where a state-of-the-art pixel-based SR reconstruction method [18] fails. The robustness against deteriorations such as clutter and noise as well as violations of the linear motion model was established. Our method not only performs well on simulated data, but also provides an excellent result on a real-world image sequence captured with an infrared camera.

REFERENCES
[1] R. Y. Tsai and T. S. Huang, "Multiframe image restoration and registration," in Advances in Computer Vision and Image Processing. Greenwich, CT: JAI Press, 1984, vol. 1, pp. 317–339.
[2] M. Ben-Ezra, A. Zomet, and S. K. Nayar, "Video super-resolution using controlled subpixel detector shifts," IEEE Trans. Pattern Anal. Mach. Intell., vol. 27, no. 6, pp. 977–987, Jun. 2005.
[3] A. W. M. van Eekeren, K. Schutte, J. Dijk, D. J. J. de Lange, and L. J. van Vliet, "Super-resolution on moving objects and background," in Proc. IEEE 13th Int. Conf. Image Process., 2006, vol. 1, pp. 2709–2712.
[4] A. W. M. van Eekeren, K. Schutte, and L. J. van Vliet, "Super-resolution on small moving objects," in Proc. IEEE 15th Int. Conf. Image Process., 2008, vol. 1, pp. 1248–1251.
[5] P. E. Eren, M. I. Sezan, and A. M. Tekalp, "Robust, object-based high resolution image reconstruction from low-resolution video," IEEE Trans. Image Process., vol. 6, no. 10, pp. 1446–1451, Oct. 1997.
[6] S. Farsiu, M. Elad, and P. Milanfar, "Video-to-video dynamic super-resolution for grayscale and color sequences," J. Appl. Signal Process., pp. 1–15, 2006.
[7] R. C. Hardie, T. R. Tuinstra, J. Bognart, K. J. Barnard, and E. E. Armstrong, "High resolution image reconstruction from digital video with global and non-global scene motion," in Proc. IEEE 4th Int. Conf. Image Process., 1997, vol. 1, pp. 153–156.
[8] F. W. Wheeler and A. J. Hoogs, "Moving vehicle registration and super-resolution," in Proc. IEEE Appl. Imagery Pattern Recognit. Workshop, 2007, pp. 101–107.
[9] M. Irani and S. Peleg, "Improving resolution by image registration," Graph. Models Image Process., vol. 53, pp. 231–239, 1991.
[10] A. J. Patti, M. I. Sezan, and A. M. Tekalp, "Superresolution video reconstruction with arbitrary sampling lattices and nonzero aperture time," IEEE Trans. Image Process., vol. 6, no. 8, pp. 1064–1076, Aug. 1997.
[11] R. J. M. den Hollander, D. J. J. de Lange, and K. Schutte, "Super-resolution of faces using the epipolar constraint," in Proc. British Mach. Vis. Conf., 2007, pp. 1–10.
[12] J. Wu, M. Trivedi, and B. Rao, "High frequency component compensation based super-resolution algorithm for face video enhancement," in Proc. IEEE 17th Int. Conf. Pattern Recognit., 2004, vol. 3, pp. 598–601.
[13] J. Starck, E. Pantin, and F. Murtagh, "Deconvolution in astronomy: A review," Publ. Astron. Soc. Pacific, vol. 114, pp. 1051–1069, 2002.
[14] K. Schutte, D. J. J. de Lange, and S. P. van den Broek, "Signal conditioning algorithms for enhanced tactical sensor imagery," in Proc. SPIE: Infrared Imag. Syst.: Design, Anal., Model., and Testing XIV, 2003, vol. 5076, pp. 92–100.
[15] S. Farsiu, M. D. Robinson, M. Elad, and P. Milanfar, "Fast and robust multi-frame super resolution," IEEE Trans. Image Process., vol. 13, no. 10, pp. 1327–1344, Oct. 2004.
[16] J. J. Moré, The Levenberg-Marquardt Algorithm: Implementation and Theory. New York: Springer-Verlag, 1978, vol. 630, pp. 105–116.
[17] J. Dijk, A. W. M. van Eekeren, K. Schutte, D. J. J. de Lange, and L. J. van Vliet, "Super-resolution reconstruction for moving point target detection," Opt. Eng., vol. 47, no. 8, 2008.
[18] T. Q. Pham, L. J. van Vliet, and K. Schutte, "Robust fusion of irregularly sampled data using adaptive normalized convolution," J. Appl. Signal Process., vol. 2006, pp. 1–12, 2006.
[19] A. Zomet, A. Rav-Acha, and S. Peleg, "Robust super-resolution," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2001, vol. 1, pp. 645–650.
[20] G. W. Zack, W. E. Rogers, and S. A. Latt, "Automatic measurement of sister chromatid exchange frequency," J. Histochem. Cytochem., vol. 25, no. 7, pp. 741–753, 1977.
[21] T. Q. Pham, M. Bezuijen, L. J. van Vliet, K. Schutte, and C. L. L. Hendriks, "Performance of optimal registration estimators," in Proc. Vis. Inf. Process. XIV, 2005, vol. 5817, pp. 133–144.
[22] J. van de Weijer and R. van den Boomgaard, "Least squares and robust estimation of local image structure," Int. J. Comput. Vis., vol. 64, no. 2–3, pp. 143–155, 2005.
[23] P. Verbeek and L. van Vliet, "On the location error of curved edges in low-pass filtered 2-D and 3-D images," IEEE Trans. Pattern Anal. Mach. Intell., vol. 16, no. 7, pp. 726–733, Jul. 1994.
[24] T. Q. Pham, L. J. van Vliet, and K. Schutte, "Influence of signal-to-noise ratio and point spread function on limits of super-resolution," in Proc. SPIE Image Process.: Algorithms Syst. IV, 2005, vol. 5672, pp. 169–180.

Adam W. M. van Eekeren (S'00–M'02) received the M.Sc. degree from the Department of Electrical Engineering, Eindhoven University of Technology, The Netherlands, in 2002, and the Ph.D. degree in 2009 for research performed at the Electro-Optics Group within TNO Defence, Security, and Safety, The Hague, in collaboration with the Quantitative Imaging Group at the Delft University of Technology, The Netherlands. He did his graduation project within Philips Medical Systems on the topic of image enhancement using morphological operators. Subsequently, he worked for one year at the Philips Research Laboratory on image segmentation using level sets. He is currently a Research Scientist at the Electro-Optics Group, TNO Defence, Security, and Safety, where he works on image improvement, change detection, and 3-D reconstruction. His research interests include image restoration, super-resolution, image quality assessment, and object detection.

Klamer Schutte received the M.Sc. degree in physics from the University of Amsterdam in 1989 and the Ph.D. degree from the University of Twente, Enschede, The Netherlands, in 1994. He held a post-doctoral position with the Delft University of Technology's Pattern Recognition (now Quantitative Imaging) Group. Since 1996, he has been employed by TNO, currently as Senior Research Scientist Electro-Optics within the Business Unit Observation Systems. Within TNO he has actively led multiple projects in the areas of signal and image processing. Recently, he has led many projects, including super-resolution reconstruction for both international industries and governments, resulting in super-resolution reconstruction based products in active service. His research interests include pattern recognition, sensor fusion, image analysis, and image restoration. He is Secretary of the NVPHBV, The Netherlands branch of the IAPR.

Lucas J. van Vliet (M'02) studied applied physics and received the Ph.D. degree (cum laude) from the Delft University of Technology, Delft, The Netherlands, in 1993. He was appointed Full Professor in multidimensional image analysis in 1999. Since 2009, he has been Director of the Delft Health Initiative, head of the Quantitative Imaging Group, and chairman of the Department of Imaging Science & Technology. He was president (2003–2009) of the Dutch Society for Pattern Recognition and Image Analysis (NVPHBV) and sits on the board of the International Association for Pattern Recognition (IAPR) and the Dutch graduate school on Computing and Imaging (ASCI). He supervised 25 Ph.D. theses and is currently supervising 10 Ph.D. students. He was a visiting scientist at Lawrence Livermore National Laboratories (1987), the University of California San Francisco (1988), Monash University Melbourne (1996), and Lawrence Berkeley National Laboratories (1996). He has a track record of fundamental as well as applied research in the field of multidimensional image processing, image analysis, and image recognition, and is (co)author of 200 papers and four patents. Prof. van Vliet was awarded the prestigious talent research fellowship of the Royal Netherlands Academy of Arts and Sciences (KNAW) in 1996.
