
Contents

1. Geometry-Measurement of Large Workpieces by Means of a Modular Visual Multi-Sensor-System
   1.1 Introduction
   1.2 The Importance of Object Extraction and Object Classification. The Place of Image Segmentation in the Frame of the Entire Project
2. Algorithms
   2.1 Image Smoothing
       2.1.1 Mean Value Smoothing
       2.1.2 Median Filtering
   2.2 Histograms
   2.3 Image Segmentation
       2.3.1 Image Segmentation by Thresholding
       2.3.2 Pyramidal Segmentation
       2.3.3 Combined Method
   2.4 The Extraction of the Objects from a Segmented Image
   2.5 Measurement and Classification
       2.5.1 Features Extraction
       2.5.2 Classification. Maximum Likelihood Classification
             2.5.2.1 Maximum Likelihood Classification
             2.5.2.2 Implementation
   2.6 Operations Involving Two Objects
       2.6.1 The Join of Two Objects
       2.6.2 Distances Between Objects
3. Implementations. Technical Specifications and Other Details about the Implementation
4. Test and Results
5. Final Conclusions and Further Improvements
References
Table of Figures


1. Geometry-Measurement of Large Workpieces by Means of a Modular Visual Multi-Sensor-System

1.1 Introduction

Objective

Owing to the increasing demand of the market to produce small and medium quantities of pieces with a great number of variations, industry is forced to develop suitable mechanisms to meet this demand. Quality control as a fixed part of the production process allows sensor-based feedback to manufacturing during the production process. The call for high flexibility applies to manufacturing and sensor systems alike, and therefore requires the use of flexible sensor systems. The aim of this scientific investigation is the development of a sensor module whose configuration options allow the measurement of large objects with high resolution. The measurement should be carried out taking advantage of contactless measuring methods, which are already in use for geometry measurements of small workpieces.

Problems

The goal of this scientific investigation is to design a modular optical multi-sensor system which makes it possible to measure the geometry of larger objects (initially only flat ones) without taking sequential pictures. Several sensors in different positions have to be used to measure the dimensions of the workpiece individually, and after a calibration of the system it should be possible to link their data to obtain the overall result.

Solution Concept

The conception of the solution comprises the mechanical set-up of a device to take photos with the sensors used, the analysis of the different sensor signals, and a plausibility check of the overlapping parts of different pictures.

Results

The project started with the construction of a device which allows the sensor modules to be positioned. It is not necessary to adjust each sensor with high precision, the only requirement being that the contours of the objects which are to be measured are visible in all partial pictures. For this reason it was not necessary to develop expensive constructions. The distortions appearing in each recorded picture are eliminated afterwards by means of appropriate software. The calibration of the system is therefore of decisive importance. During the calibration a 4x4 matrix, which is equivalent to the optical and geometric parameters of the sensor system, is determined using a procedure similar to the linear transformation. Having calibrated the system as described above, several series of measurements were carried out in order to establish the accuracy of measurement, which was found to be limited to 1 pixel. Therefore a special interpolation procedure has to be integrated into the measuring software in order to improve the precision by a factor of 10. In addition, strategies to expand the range of application are being explored to enable reliable statements about the dimensional stability of big workpieces for non-overlapping images. As not all measures have to be determined with the same precision, the employment of zoom lenses is being analysed. Their advantage is the possibility of realising different display scales within one measuring structure in which all sensor modules are positioned at the same distance from the object. Additional emphasis has been put on investigating the suitability of the developed system for the field of object and pattern recognition, with a view to opening a further working field for this multi-sensor system.

1.2 The Importance of Object Extraction and Object Classification. The Place of Image Segmentation in the Frame of the Entire Project


The correctness of finding and extracting objects from a picture is of vital importance for the subsequent processing and measuring. The process of extracting objects from a picture is called image segmentation. This field is one of the most developed domains of image processing and a great deal of work has been done on it, yet uncertainty still occurs in this task due to the nature of real pictures. Finding the right method for this case was one of the most important steps performed at the beginning. The segmentation of the image is followed by the object recognition process. The first step in this direction is computing features for each object. After that follows object classification, for which many concepts and methods are available, from Bayesian classification to approaches based on neural networks. Because the objects we are looking for have different sizes and different shapes, a neural network approach was not feasible, so a Bayesian classification is used in order to classify the objects. Once the objects are detected and classified, one can compute several relations between them, such as the distance between two objects, and from then on many measurements can take place. As can easily be seen, the accuracy of the measurements relies entirely on a correct and precise image segmentation and object classification.


2. Algorithms

Digital image processing, the manipulation of images by computer, is a very dynamic domain. In its short history (about 20 years) many concepts have been proposed and many of them have been dropped. It is a vast umbrella under which fall diverse aspects of optics, electronics, mathematics, digital signal processing and computer technology: a truly multidisciplinary domain. In order to extract the objects from an image, some of the digital image processing algorithms were of interest. First of all, the image should be smoothed to remove artifacts that could otherwise lead to undesired objects at the end; the section Image Smoothing presents these algorithms. The histogram of an image is very useful for many operations, especially segmentation, and the second section of this chapter deals with histograms. After these two operations the central point of this chapter follows: image segmentation. Three methods are presented and a comparison of the results obtained with each of them is given. This process is not a 100% reliable one; it contains an unavoidable uncertainty. The segmentation is followed by the object identification process, and the section The Extraction of the Objects from a Segmented Image covers this particular field. Once we have all the objects we can start to measure and classify them; these two operations are discussed in the next section. The last section of this chapter presents several operations that can be applied to two objects. Each section contains references for further reading and additional information.

2.1 Image Smoothing

In order to enhance the quality of the picture, some preliminary operations are applied. The main aim of these operations is to eliminate noise from the picture and to improve some geometric details before segmentation. To do this, some standard neighbourhood operations were taken into consideration.

2.1.1 Mean Value Smoothing

In the following chapters, I(m,n), m,n = 0...511, denotes the pixel brightness value at position (m,n). Many of the methods presented here can be expressed as template techniques, in which a template, box or window is defined and then moved over the image row by row and column by column. The products of the pixel brightness values covered by the template at a particular position and the template entries are taken and summed to give the template response, which is as follows:

$r(i,j) = \sum_{m=1}^{M} \sum_{n=1}^{N} I(m,n)\, t(m,n)$

where r(i,j) is the response, t(m,n) is the template entry at that location and M, N are the template dimensions. Images can contain random noise superimposed on the pixel brightness values, owing to noise generated by the nature of the workpieces. It can be removed by the process of low-pass filtering or smoothing, usually at the expense of some high-frequency information in the image. To smooth an image, a template with uniform entries is used:

$t(m,n) = \frac{1}{MN}$  for all m, n

The template response is a simple average of the pixel brightness values currently within the template

$r(i,j) = \frac{1}{MN} \sum_{m=1}^{M} \sum_{n=1}^{N} I(m,n)$

The pixel at the centre of the template is thus represented by the average brightness level in a neighbourhood defined by the template dimensions. In our case a template with M = N = 3 is used. Obviously, high-frequency information such as edges will be averaged and lost. In this application, a mean value smoothing is first applied to the image in order to eliminate salt-and-pepper noise.
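As an illustration, a minimal sketch of such a 3x3 mean value smoothing pass is given below. It is only a sketch, not the project's ApplyTemplate code: the plain 512x512 array access and the untouched border pixels are simplifying assumptions.

// Sketch: 3x3 mean value smoothing of a 512x512 gray level image.
// Assumption: 'src' and 'dst' are plain 512x512 arrays; border pixels are copied unchanged.
void MeanSmooth3x3( const unsigned char src[512][512], unsigned char dst[512][512] )
{
    for( int i = 0 ; i < 512 ; i++ )
        for( int j = 0 ; j < 512 ; j++ )
        {
            if( i == 0 || j == 0 || i == 511 || j == 511 )
            {
                dst[i][j] = src[i][j] ;                    // leave border pixels unchanged
                continue ;
            }
            int sum = 0 ;
            for( int m = -1 ; m <= 1 ; m++ )               // sum the 3x3 neighbourhood
                for( int n = -1 ; n <= 1 ; n++ )
                    sum += src[i+m][j+n] ;
            dst[i][j] = (unsigned char)( sum / 9 ) ;       // uniform template, t(m,n) = 1/MN
        }
}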


Note that, in order to reduce the loss of high-frequency detail, a threshold can be applied to the template response in the following manner:

$r(i,j) = \begin{cases} \frac{1}{MN} \sum_{m=1}^{M} \sum_{n=1}^{N} I(m,n) & \text{if } \left| I(i,j) - \frac{1}{MN} \sum_{m=1}^{M} \sum_{n=1}^{N} I(m,n) \right| < T \\ I(i,j) & \text{otherwise} \end{cases}$

where T is a prespecified threshold that could be determined a priori, based upon knowledge of, or an estimate of, the scene signal-to-noise ratio. In order to treat all template operations together, the method ImageProcessing :: ApplyTemplate was implemented. This method takes a 3x3 template as parameter and performs the convolution of this template with the entire image. Mean value smoothing is performed using a uniform template as parameter. The threshold alternative is not used in this application.

2.1.2 Median Filtering

Disadvantages of the method described above for avoiding edge deterioration are that the smoothing operation takes more time and that T must be determined. An alternative smoothing technique in which edges are maintained is median filtering. Here the pixel at the centre of the template is given the median brightness value of all the pixels covered by the template, i.e. the value which has as many values above it as below it. For example, the median of 6, 5, 9, 3, 11, 11, 13 is 9, whereas the mean is 8.28. Figure 2-1 shows the effect of median filtering on a single line of image data compared with simple mean value smoothing.

[Figure: brightness values along a single image line, pixel positions 1 to 13; curves for Original Image, Smoothing (3x1) and Median Filtering (3x1)]

Figure 2-1 Comparison of simple averaging and median filtering
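As an illustration only, a minimal sketch of such a 3x3 median filter is given below; it is not the project's ImageProcessing :: MedianFiltering code, and the plain array access and untouched border pixels are simplifying assumptions.

// Sketch: 3x3 median filtering; the centre pixel takes the median of its 3x3 neighbourhood.
#include <algorithm>

void MedianFilter3x3( const unsigned char src[512][512], unsigned char dst[512][512] )
{
    for( int i = 0 ; i < 512 ; i++ )
        for( int j = 0 ; j < 512 ; j++ )
        {
            if( i == 0 || j == 0 || i == 511 || j == 511 )
            {
                dst[i][j] = src[i][j] ;                            // leave border pixels unchanged
                continue ;
            }
            unsigned char window[9] ;
            int k = 0 ;
            for( int m = -1 ; m <= 1 ; m++ )                       // collect the 3x3 neighbourhood
                for( int n = -1 ; n <= 1 ; n++ )
                    window[k++] = src[i+m][j+n] ;
            std::nth_element( window, window + 4, window + 9 ) ;   // partial sort up to the median
            dst[i][j] = window[4] ;                                // the 5th of 9 values is the median
        }
}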

Median filtering is well suited for the removal of impulse-like noise. This is because pixels corresponding to noise spikes are atypical in their neighbourhood and will be replaced by the most typical pixel of that neighbourhood. The disadvantage of this method is the overhead of a sorting operation on the brightness values in each pixel neighbourhood. I have implemented this technique in the method ImageProcessing :: MedianFiltering. An important property of median filtering is that it does not introduce new brightness values into the image, so it can also be used after the segmentation process in order to eliminate the spike noise which sometimes appears there.

Remark: the template technique can also be used for many other enhancements, such as edge detection and enhancement or linear line detection. For example, the following 3x3 template:


$t(0,0) = t(0,1) = t(0,2) = t(1,0) = t(1,2) = t(2,0) = t(2,1) = t(2,2) = -a, \qquad t(1,1) = 2 - a$

where a = 1/9 implements edge enhancement by subtractive smoothing.

2.2 Histograms

If each pixel in the image is examined and its brightness value noted, a graph of the number of pixels with a given brightness versus the brightness value can be constructed. This is referred to as the histogram of the image. An image which makes good use of the available range of brightness has a histogram with occupied bins (or bars) over its full range, but without significantly large bars at black or white. In contrast, an image made up of some objects on a background will have a histogram with peaks around the object colours and the background colour. This property will be used in order to segment an image. Obviously an image has a unique histogram, but the reverse is not true in general, since the histogram contains only radiometric and no spatial information. The histogram can be viewed as a discrete probability distribution, since the relative height of a particular bar indicates the chance of finding a pixel with that particular brightness value somewhere in the image. The histogram can also be regarded as a continuous function of a continuously varying brightness value. Therefore, let h(x) be the histogram function, where x represents the brightness value and the value h(x) associated with it is the number of pixels with this brightness value. It is easy to see that:

$\int_{0}^{\infty} h(x)\,dx = MN$

where M and N represent the width and the height of the image. Many operations can be applied to the image using the information from the histogram; examples are linear/logarithmic/exponential contrast enhancement and histogram equalisation. Frequently it is desirable to match the histogram of one image to that of another image and, in so doing, make the apparent distribution of brightness values in the two images as close as possible. This technique is known as histogram matching and is used in the process of joining two images to form a mosaic, or to match the histogram of an image with a predefined histogram (with a particular form, e.g. Gaussian). These problems are covered very well in [1] and [2]. In order to perform a thresholding segmentation of the image it is necessary that the histogram be as smooth as possible, so first of all a smoothing algorithm is applied. This is performed in the CHistogram class constructor. A mean value smoothing algorithm for a 3x1 neighbourhood is used. If we denote by maxgray the value of the highest peak in the histogram, by lHistogram the histogram (lHistogram[i] = number of pixels with brightness value equal to i), and WIDTH is the dimension of the neighbourhood and GRAY_LEVELS the number of grey levels in the image, then the C code for histogram smoothing is as shown in Figure 2-2. This code is taken directly from the program source file (improc.cpp).


#define WIDTH 3

for( i = 0 , maxgray = 0 ; i < GRAY_LEVELS ; i++ )    // Smoothing histogram (use a weight-operator)
{
    sum = 0 ;
    for( int j = 0 ; j < WIDTH ; j++ )                // Compute the sum in the neighbourhood of one gray level
        if( i+j < GRAY_LEVELS )
            sum += lHistogram[i+j] ;
        else
            sum += lHistogram[i] ;
    if( (lHistogram[i] = sum / WIDTH ) > maxgray )
        maxgray = lHistogram[i] ;
}
Figure 2-2 C++ code for histogram smoothing
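For completeness, a minimal sketch of how the histogram itself can be built is given below; the plain array-based interface is an assumption and the sketch does not reproduce the CHistogram class.

// Sketch: build the gray level histogram h(x) of a 512x512 image with 256 gray levels.
void BuildHistogram( const unsigned char image[512][512], long lHistogram[256] )
{
    for( int x = 0 ; x < 256 ; x++ )           // clear all bins
        lHistogram[x] = 0 ;
    for( int i = 0 ; i < 512 ; i++ )
        for( int j = 0 ; j < 512 ; j++ )
            lHistogram[ image[i][j] ]++ ;      // one count per pixel; the bins sum to M*N
}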

2.3 Image Segmentation

So far I have described some methods for the enhancement of the image. After that the analysis of the image content can begin, which means I shall attempt to find out what is in the image. As mentioned in section 1.2, this particular task is of overwhelming importance for further processing. Image segmentation is the process by which a computation translates the original image description into a description in terms of objects, or a process that partitions a digital image into disjoint regions (objects). Because of its great importance, I have developed three methods for this process. Each of them will be described in the next sections. There are several ways to segment an image. The techniques can be grouped as follows:

1. image segmentation by thresholding
2. gradient-based methods
3. region techniques
   3.1. region growing
   3.2. pyramidal segmentation
4. mixed techniques

The three methods I developed are as follows:

1. image segmentation by thresholding with multiple thresholds
2. pyramidal segmentation
3. a mixed technique combining region and thresholding techniques

In order to make the implementation clearer, each method was implemented in an individual class. There is an abstract class CSegmentation which provides the general interface for all the segmentation methods, and three particular classes, CThresholdingSegmentation, CPyramidalSegmentation and CMixedSegmentation, all derived from the abstract one, each implementing one of the techniques mentioned above. How to add a new segmentation algorithm can be read in the Technical specifications section.

2.3.1 Image Segmentation by Thresholding

This is a point-oriented method because it does not make use of the regional relationships which exist between adjoining pixels; only the brightness value (or gray level) of one pixel is used. Thresholding is a useful technique for scenes containing solid objects upon a contrasting background. It is computationally simple and never fails to define disjoint regions with closed boundaries. If the object differs from its background by some property other than gray level (texture, etc.), one can first use an operation that converts that property to gray level, or use a pyramidal segmentation technique. The original method assigns all pixels with brightness value below the threshold to one value (0 for example) and all pixels with brightness value at or above the threshold to a second value (1 for example). Therefore the original image will be transformed into a binary image with one value for the background and another value for the objects. Clearly, the main problem with this method is the correct choice of the threshold value. The exact value of the threshold gray level can have a considerable effect on the boundary position and overall size of the extracted objects. Automatic threshold selection is possible with histogram techniques. Let us suppose that we have an image containing an object on a contrasting dark background. The image histogram is a bimodal gray level histogram (see Figure 2-3). The two peaks correspond to the relatively large numbers of pixels inside and outside the object. The dip between the peaks corresponds to the relatively few points around the edge of the object and is commonly used to establish the threshold gray level.

Figure 2-3 The bimodal histogram (h(x) versus brightness value x)

All the pixels with gray level below the threshold T will be assigned to 0 (background) and all pixels with gray level above T will receive the value 1 (the object). Note that increasing the threshold T by a small amount causes only a slight decrease in area if the threshold corresponds to the dip in the histogram. The dip between the peaks corresponds to a local minimum of the continuous function h(x) (the histogram function). A generalised version of the method described above uses more than one threshold in order to segment an image. A picture which contains objects with different colours (or brightness values, or gray levels) has a multimodal histogram; the number of peaks corresponds to the number of different object colours. In this case each dip between two peaks becomes a valid threshold. Evidently, instead of the values 0 and 1, a set of labels will be used for the processed image. The number of distinct labels will be equal to number_of_thresholds + 1. Figure 2-4 represents a histogram with 3 peaks, so that 2 thresholds will be used to segment the original image. The numbers in the small black boxes represent the values used to label the objects from the original image. All pixels with gray level below T1 will be labelled with 0, pixels between T1 and T2 receive the value 1 and pixels with gray level above T2 are labelled with 2.

Figure 2-4 Thresholding selection (histogram h(x) versus brightness x with three peaks; the thresholds T1 and T2 lie at the dips, and the boxed values 0, 1, 2 are the labels assigned to the three gray level ranges)
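A minimal sketch of this labelling step is shown below. It is not the Segmentate1 code itself; it assumes the thresholds are already available as an ascending array, as produced by the local-minima search described next.

// Sketch: label a pixel with the index of the interval, between thresholds, into which it falls.
// 'thresholds' holds nThresholds ascending values; the labels range from 0 to nThresholds.
unsigned char LabelPixel( unsigned char gray, const int* thresholds, int nThresholds )
{
    int label = 0 ;
    while( label < nThresholds && gray >= thresholds[label] )   // count the thresholds below the gray level
        label++ ;
    return (unsigned char)label ;                               // below T1 -> 0, between T1 and T2 -> 1, ...
}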

The problem consists of finding the local minima of the histogram viewed as a continuous function. Once the thresholds are ascertained, the image segmentation algorithm is trivial. In order to find the thresholds, three methods are implemented in the class CHistogram: GoesUp, GoesDown and ComputeThreshold. The idea is to find all descending slopes and all ascending grades of the histogram. Afterwards a threshold is set at half the distance between the end of a slope and the start of the next grade. Note that flat sections are skipped. A slope consists of a descending sequence of h(x) values, whereas a grade is formed from an ascending sequence of h(x) values. The method CHistogram :: GoesUp finds the grades of the histogram. The C++ code of this method is shown in Figure 2-5, where pos is the starting position and total will contain the total number of pixels included in this grade. The function returns the width of the grade (the number of gray levels in the grade). The code for CHistogram :: GoesDown is similar. After one slope and one grade have been determined it is possible to compute the threshold value.


int CHistogram :: GoesUp( int pos, long* total )
{
    int i = pos ;
    *total = 0 ;
    while( *(lHistogram+pos) < *(lHistogram+pos+1) )
        if( ++pos >= GRAY_LEVELS )
            break ;
        else
            *total += lHistogram[pos-1] ;
    return pos-i+1 ;
}
Figure 2-5 C++ code for grades extraction

The method CHistogram :: ComputeThreshold computes all thresholds after one slope and one grade have been found. In order to eliminate artifacts (undesirable threshold values), a lower limit (the parameter smooth passed to the method) is applied to the length of slopes and grades. Figure 2-6 shows the code of this method, taken from the improc.cpp source file. After the computation of the array of local minima, the method CThresholdingSegmentation :: Segmentate1 performs the segmentation process. The declaration of this method looks like:

BOOL ThresholdingSegmentation1( POINT& start, POINT& end, int*& pArray ) ;

where start is the top-left corner of the region to be segmented, end represents the bottom-right corner of the region to be segmented and, after the segmentation, pArray will receive precisely the array of local minima computed in CHistogram :: ComputeThreshold. The function starts with the histogram computation, followed by the determination of the local minimum array and, finally, the segmentation of the image. Specifying a particular region of the image to be segmented offers the possibility of segmenting the entire image with different sets of thresholds (local minima in the histogram), one per subregion, or of segmenting just a region of the image. The method CThresholdingSegmentation :: Segmentate segments the entire image by calling ThresholdingSegmentation1 and passing the top-left and bottom-right corners of the image as parameters. More information about gray level segmentation and thresholding segmentation can be found in [2], [3], and [4].

2.3.2 Pyramidal Segmentation

Pyramidal segmentation belongs to the third class of segmentation techniques, namely regional techniques, which bring a new aspect into the segmentation process. Usually these techniques work well where the first two classes of algorithms fail. A common characteristic of all algorithms in this class is that the neighbouring pixels contribute to the final state of one pixel. The most common techniques of this class are region growing and pyramidal segmentation. I have implemented the second one and therefore describe it in the next pages. In [5], the article Node Linking Strategies in Pyramids for Image Segmentation by J. Cibulskis and C.R. Dyer improves the technique described by P.J. Burt in the paper The Pyramid as a Structure for Efficient Computation from the same book. Note that this method dates from around 1981. The technique iteratively refines segmentations, alternating with updates of the property measurements taken over the current partition. An overlapped pyramid data structure is used, in which each level stores an image at half the resolution of the image at the level below it. A 512 by 512 (2^n by 2^n, where n = 9) original image defines the bottom level (level 0) of the pyramid. Above this level are defined successively reduced-resolution versions of size 2^(n-1) by 2^(n-1), ..., 2 by 2, and 1 by 1. The neighbourhood of a node is specified so that a node at level k > 0 has sixteen children at level k-1 (the level below it), corresponding to a 4 by 4 block of nodes. (A node which is on the border of an array at any level has its missing children defined by reflecting the top and bottom rows and the left and right columns of the image outward.) Each node at level k < n also has four parent nodes at level k+1, so that the parent and child relations are inverses. Finally, each node has eight siblings at its own level, defined as the eight nearest neighbours of the node.
Note that a pair of horizontally or vertically adjacent nodes at one level have child neighbourhoods overlapping by 50%.


Each node has an associated property, namely the average gray level of its selected children. Image segmentation is defined by a grouping process which merges nodes at a given level by having each node select a single best parent node from its four candidate parent neighbours. Best is defined as the parent with the gray level most similar to the given node's gray level. It can be said that each node is linked to its most similar parent. The node's property (gray level) can then be updated using the node's current subset of children which are linked to it. This process of alternately updating property values and reselecting most similar parents is iterated, allowing nodes to merge into sets of trees based on gray level similarity.

void CHistogram :: ComputeThreshold( int smooth )
{
    int down, up ;
    long area ;
    noOfLocMin = 0 ;
    pLocmin[noOfLocMin++] = 0 ;
    for( int start = 0 ; start < GRAY_LEVELS; start++ )
    {
        // Extract a slope from histogram
        down = GoesDown( start, &area ) ;
        // If the width of the region is greater than a specified value, and area is bigger
        // than a threshold limit
        if( down > smooth && area > 512 )
        {
            for( int start1 = start + down - 1 ; start1 < GRAY_LEVELS; start1++ )
            {
                // Extract a grade from histogram
                up = GoesUp( start1, &area ) ;
                // If the width of the region is greater than a specified value,
                // and area is bigger than a threshold limit, then add a new
                // local minimum to the list
                if( up > smooth && area > 512 )
                {
                    pLocmin[noOfLocMin++] = start + down + ( start1 - start - down ) / 2 ;
                    start = start1 + up ;
                    break ;
                }
            }
        }
    }
    pLocmin[noOfLocMin++] = 255 ;
}
Figure 2-6 C++ code for building the thresholds array

The final step of the algorithm consists of propagating the property values of all nodes from a specified level down to all their children. The process is repeated until level 0 is reached. One form of the basic procedure is shown in Figure 2-7.


void PyramidalSegmentation( int StartLevel )
{
    // Initial property values are computed for each node of the pyramid by averaging
    // gray levels of node's children
    for( each node n of the pyramid )
        ComputeProperty( n ) ;

    // Build the links between nodes in pyramid
    do
    {
        // Update best parent link for all nodes
        for( each node n of the pyramid )
            if( (m = MostSimilarParent( n )) != n.parent )
                n.parent = m ;

        // Update property value using only children linked to the current node
        // (Also a gray level averaging technique is used)
        for( each node n of the pyramid )
            Update( n.property ) ;
    } while( parents are updated ) ;

    // Obtain final segmentation by tracing trees of linked nodes down pyramid starting
    // from a given level StartLevel
    LabelAllDescendands( StartLevel ) ;
}
Figure 2-7 An outline of pyramidal segmentation algorithm

In Figure 2-8, the original image is a one-dimensional image of a step edge in a noisy environment, and the pyramid obtained after the first step is shown. The process is then iterated until a stable state is reached.

[Figure data: upper row of node gray values 51 49 43 48 39 44 49 55 56 56 56 55 53 55 53; lower row of node gray values 50 46 38 34 38 54 50 58 58 50 58 66 50 58 46 54; the links from each node to its most similar parent are not reproduced here]
Figure 2-8 Node linking in standard pyramidal segmentation
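As an illustration of the linking step (the MostSimilarParent call in Figure 2-7), a minimal sketch is given below; the flat arrays and the function signature are assumptions made for the sketch and do not mirror the project's pyramid data structures.

// Sketch: among the (up to) four candidate parents, pick the one whose gray level
// is closest to the gray level of the current node.
#include <cstdlib>

int MostSimilarParent( const unsigned char* parentGray, const int* candidates, int nCandidates,
                       unsigned char nodeGray )
{
    int best = candidates[0] ;
    int bestDiff = abs( (int)parentGray[best] - (int)nodeGray ) ;
    for( int c = 1 ; c < nCandidates ; c++ )
    {
        int diff = abs( (int)parentGray[candidates[c]] - (int)nodeGray ) ;
        if( diff < bestDiff )                  // keep the parent with the most similar gray level
        {
            bestDiff = diff ;
            best = candidates[c] ;
        }
    }
    return best ;                              // index of the best parent in the level above
}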

The linking procedure described above is guaranteed to converge to a final solution. The proof can be found in [5], pages 111-112. In the effort to speed up the algorithm, J. Cibulskis and C.R. Dyer improved the original version in the following areas:
- node-linking sequence
- pyramid initialisation
- root and leaf marking


From the proof of convergence it follows that a node need not begin to compute its property value or best parent link until all of its descendants have stabilised to their final values. Thus, for this form of the linking process, a strictly one-pass, bottom-up procedure is all that is needed. Beginning at level 1, each level is processed until all of its nodes attain stable property values and all of the nodes at the next lower level attain stable best parent links. Level-by-level processing is enforced since a node's final property value and best parent link depend only on the properties and links of its descendants. Clearly, this form of node sequencing does not affect the guarantee of convergence, but it does not provide the same solution as the original method. These differences are rarely noticeable. Another improvement over the original version concerns the pyramid initialisation. Originally, each node's initial property value was computed by a simple averaging process using a subset of the node's sixteen children. In order to speed up the algorithm, a simple average of the stabilised property values of all the children of a node is used; in this way only the bottom level of the pyramid needs to be initialised before the node linking process starts. In earlier work it has been observed that if we force each node in the pyramid (up to a prespecified level) to link to its best parent, then in many cases a node which has no good parents is forced to merge with a region that has vastly different properties from those of the given node. Other problems with forced linking include:
- final segmentation trees exist for each node at the designated level, no matter how many pixel populations exist in the original image
- the height of the segmentation tree depends on the image size, not on the size of the region being represented
- small regions will be merged with other regions of the same type into one conglomerate segmentation tree for all the pixels in a given population

To overcome these kinds of artifacts, methods have been developed which associate a weight with each node's parent link. If a node is not linked to any of its parents, then this node is a root node of a segmentation tree in the pyramid. The root marking has a crucial importance for the success of all pyramid linking algorithms. Marking a node corresponds to making a decision about what the best level of resolution is for a particular image patch. I would like to define an operator which marks a node as the root of a segmentation tree whenever it detects that a particular property becomes local. That is, at the level where a part of a region becomes line-like or spot-like, that feature is locally detected by a node in the pyramid, which is then made the root of a tree for that patch of the region. Care must be taken, since marking roots too liberally means regions will be fragmented into many pieces, while marking roots too conservatively will cause separate regions to be merged into a single segmentation tree. Rather than adjusting weights, I have tried to define operators which can be applied to each level of resolution in order to detect spot-like and line-like features. A node is marked as a root if, after its level has stabilised, its final property value is significantly different from most of its eight siblings' values.
I use an operator which marks a node as a root if it has at least four leaf neighbours or if the number of non-leaf neighbours with a property value similar to its own is less than or equal to three. The similarity between node property values is defined using a similarity threshold value: if the absolute value of the difference between two property values is less than or equal to the specified threshold, then the two properties are said to be similar. The choice of the threshold is very important and it can only be determined experimentally. The method CPyramidalSegmentation :: Root implements the operator described above, and its code is shown in Figure 2-9.

BOOL CPyramidalSegmentation :: Root ( level& l, int leveldim, int lin, int col )
{
    int nLeafNodes = 0 ;            // Number of leaf nodes
    int nSimilar = 0 ;              // Number of similar nodes
    int nSimilarThreshold = 100 ;   // The threshold for colour similarity

    // Checking all the 8 neighbours to see if they are leaves or not
    if( Leaf(l[(UINT)(lin-1)%leveldim][(UINT)(col-1)%leveldim]) )
        nLeafNodes++ ;
    else
    {
        if( abs( l[lin][col].color - l[UINT(lin-1)%leveldim][UINT(col-1)%leveldim].color) < nSimilarThreshold )
            nSimilar++ ;
    }
    // ...
    // And so on for all eight neighbours of current node
    // ...
    if( nLeafNodes >= 4 || nSimilar <= 3 )
        return TRUE ;
    return FALSE ;
}
Figure 2-9 C++ code for root marking algorithm


In the figure above, l is the level of the pyramid, leveldim is the level dimension, and lin, col are the line and column of the node being tested. The function returns TRUE if the node is a root and FALSE otherwise. The variable nSimilarThreshold represents the threshold used for the similarity check. In order to increase the performance of the pyramidal segmentation method, new forms of this operator should be investigated, or some modifications of the final comparison (nLeafNodes >= 4 || nSimilar <= 3) may give better results. Now we have all the elements needed to build the new form of the algorithm. A description of the algorithm in pseudocode follows, whilst the method CPyramidalSegmentation :: Segmentate implements it in C++ as indicated in Figure 2-10. In Figure 2-10, lev is the level where the tracing-down process will start and is given as a parameter to the algorithm. A node of the pyramid is described by the following data structure:

struct _element
{
    BYTE color ;    // the property value for the node, which is the average colour of its sons
    BYTE mark ;     // the node is marked or not (the node is marked if it is a leaf node or a root node)
                    // also, the father of this node is encoded in this field
    int children ;  // encodes the sixteen children of this node
};


1. Allocate memory for all pyramid levels and other local variables

2. Initialize level 0 of the pyramid with the original pixel brightness values (gray levels)

3. Node-linking process
   for( nLevel = 1 ; nLevel < lev + 1 ; nLevel++ )
   {
       if( nLevel > 1 )
           // Mark the 'root' and 'leaf' nodes at level 'nLevel-1'
       for( each node of level nLevel )
           // Initialize the property value (node's colour)
       BOOL over = FALSE ;        // TRUE if no updates in node linking are made
       while( !over )             // While the process is not stabilized (links are modified)
       {
           over = TRUE ;
           for( each node of level nLevel-1 )
               if( node is not marked )
               // Update the best parent-child link for this node
               {
                   Find the best parent of this node
                   if( it is different from the actual node's parent )
                   {
                       Delete the old parent-child link
                       Add the new parent-child link
                       over = FALSE ;
                   }
               }
           for( each node of level nLevel )
               // Update property value using only the node's children
       }
   }

4. Segmentation process (tracing the pyramid top-bottom, every child taking the property value of its parent)
   for( nLevel = lev ; nLevel > 0 ; nLevel-- )
       for( each node of the level nLevel )
           for( every child of this node )
               // Child's property value = parent's property value

5. Initialize the destination TBildArray with the new values obtained during the segmentation process

6. Deallocate the memory for all pyramid levels and other local variables
Figure 2-10 The final form of pyramidal segmentation

The color field of the _element structure represents the property value associated with each node of the pyramid and is the average value of the children's property values. First, it is initialised with the average colour of all the children nodes included in the 4 by 4 block (from node (2i-1, 2j-1) to node (2i+2, 2j+2)) from the level below (it is assumed that the current node has coordinates (i, j), where i denotes the line and j denotes the column in the current level). After the links have been established it contains the average value only for the children connected to the node. The mark field of the _element structure is organised as follows:

Bit 0 - mark flag (1 if the node is marked, otherwise 0)

Bit 1 - unused
Bit 2 - unused
Bit 3 - unused
Bit 4 - parent's code
Bit 5 - parent's code
Bit 6 - parent's code
Bit 7 - parent's code

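As an illustration only, one possible way of packing and reading the parent code in bits 4 to 7 of the mark field is sketched below; the exact encoding used in the project may differ.

// Sketch: the parent code occupies bits 4..7 of 'mark' (here an unsigned char, like the BYTE field);
// exactly one of these bits is set, parentIndex 0..3 corresponding to bits 4..7.
inline void SetParentCode( unsigned char& mark, int parentIndex )
{
    mark &= 0x0F ;                                          // clear the old parent code, keep bits 0..3
    mark |= (unsigned char)( 1 << ( 4 + parentIndex ) ) ;
}

inline int GetParentIndex( unsigned char mark )
{
    for( int p = 0 ; p < 4 ; p++ )
        if( mark & ( 1 << ( 4 + p ) ) )                     // find which of bits 4..7 is set
            return p ;
    return -1 ;                                             // no parent link (e.g. a root node)
}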

Each node has four possible parents to link with. The parents are situated one level above and, if the node has (i, j) coordinates, then the available parent coordinates are as follows:
- (i/2, j/2)
- ((UINT)(i/2+EvenOdd(i))%nLevelDim, j/2)
- (i/2, (UINT)(j/2+EvenOdd(j))%nLevelDim)
- ((UINT)(i/2+EvenOdd(i))%nLevelDim, (UINT)(j/2+EvenOdd(j))%nLevelDim)

Here nLevelDim is the dimension of the level above, % denotes the modulo operation and EvenOdd is a macro which returns +1 if the argument is odd and -1 if it is even. Bit 4 is set if the parent has coordinates (i/2, j/2), bit 5 is set if the parent's coordinates are ((UINT)(i/2+EvenOdd(i))%nLevelDim, j/2), bit 6 is for the parent coordinates (i/2, (UINT)(j/2+EvenOdd(j))%nLevelDim), and bit 7 is set if the parent coordinates are ((UINT)(i/2+EvenOdd(i))%nLevelDim, (UINT)(j/2+EvenOdd(j))%nLevelDim). Each node has sixteen children, which are encoded in the children field of the same structure. The values for the children are displayed in the next figure:

32768  16384  8192  4096
 2048   1024   512   256
  128     64    32    16
    8      4     2     1

where, if the node has coordinates (i, j) at level L, the first line of the table corresponds to line 2*i-1 at level L-1, the first column corresponds to column 2*j-1, the last line corresponds to line 2*i+2 and the last column corresponds to column 2*j+2 at level L-1. For example, if the node has two children at positions (2*i-1, 2*j) and (2*i+2, 2*j+2) at level L-1, then its children field is set to 16385 = 16384+1. In the source file the two-dimensional array tChildTable contains exactly the same values as in the table above and acts like a mask operator. Considerable improvement may be possible by using a combined top-down/bottom-up linking method or by modifying the Root function in order to mark the roots of the subtrees more precisely, which would mean a better and finer segmentation.

2.3.3 Combined Method

This method combines a quadtree smoothing with a subsequent thresholding segmentation. I have developed this technique starting from a similar approach which can be found in [6]. Clearly, the smoothing phase uses the regional information whereas the second step needs only the global information stored in the histogram. Consider an image defined as in the previous section. A quadtree of this image is defined recursively as

$q(i,j,k) = \frac{q(2i,2j,k-1) + q(2i+1,2j,k-1) + q(2i,2j+1,k-1) + q(2i+1,2j+1,k-1)}{4}, \qquad q(i,j,0) = \mathrm{image}(i,j)$


where 0 ≤ k ≤ n is the current level and 0 ≤ i, j < 2^(n-k) are the coordinates of a point in the image denoted by image. Hence a quadtree is based on 2 x 2 block averaging. The level just above the base consists of nodes representing non-overlapping 2 x 2 blocks of pixels in the original image, so that the size of this level is 256 x 256. This process can be repeated until the root node is reached, whose value is the mean gray level of the entire image. Any parent node has four sons. The gray level histograms of the upper levels (greater k) indicate an increase in class resolution; in other words, the histograms contain more distinct peaks than the original one. Figure 2-11 shows what happens:


Figure 2-11 Comparison of histograms obtained at different levels in quadtree smoothing process
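A minimal sketch of building one quadtree level by 2x2 averaging is shown below; the flat-array representation is an assumption and the sketch is not the project's CMixedSegmentation code.

// Sketch: compute level k of the quadtree from level k-1 by non-overlapping 2x2 averaging.
// 'lower' is dim x dim (row-major); 'upper' receives (dim/2) x (dim/2) values.
void BuildQuadtreeLevel( const unsigned char* lower, unsigned char* upper, int dim )
{
    int half = dim / 2 ;
    for( int i = 0 ; i < half ; i++ )
        for( int j = 0 ; j < half ; j++ )
        {
            int sum = lower[ (2*i)   * dim + 2*j ] + lower[ (2*i)   * dim + 2*j+1 ]
                    + lower[ (2*i+1) * dim + 2*j ] + lower[ (2*i+1) * dim + 2*j+1 ] ;
            upper[ i * half + j ] = (unsigned char)( sum / 4 ) ;   // q(i,j,k) as defined above
        }
}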

Quadtrees are useful in image segmentation because the averaging process which produces the next quadtree level reduces the variance of the signal within a single homogeneous region. Note, however, that the smoothing procedure also introduces a bias due to the merging of data from different regions. The stopping level for the smoothing process is given as a parameter to the segmenting procedure because it cannot be chosen automatically, given the large variety of images which may be segmented. The levels therefore satisfy the condition 0 ≤ k ≤ MAX_LEVEL; when level MAX_LEVEL is reached, the smoothing step ends. This step is followed by the thresholding phase. Using the histogram of the image obtained at the last smoothing level, the array which contains the threshold values is computed. For more information about the way this array is built, see the section Image Segmentation by Thresholding. The original image is then segmented in the standard way using the values from the array computed before. This algorithm is implemented in the method BOOL CMixedSegmentation :: Segmentate(). There are two additional parameters passed to the class constructor CMixedSegmentation :: CMixedSegmentation, namely level, which is the maximum level of the quadtree smoothing process (MAX_LEVEL), and smooth, which is passed on to the CHistogram :: ComputeThreshold method.

2.4 The Extraction of the Objects from a Segmented Image

The result of the segmentation process is an image which contains one or more objects, each of them having a particular gray level. This gray level is constant for an object and is named the object's colour. In one image there may exist more than one object with the same colour. The next step is to extract from the segmented image all the objects which lie in it. This means that each object has to receive a unique identifier; later an object can be selected using its identifier. The process of object extraction is subdivided into two distinct phases:

1. first, a matrix with the same dimensions as the original image is built, which contains at each position the identifier of the object
2. after that, using the matrix built in the previous step, a list of objects is constructed; each object has an entry in that list

The first phase of the object extraction process is implemented in the method ImageProcessing :: MarkObjects and is a two-step operation. In the first step, the matrix is initialised so that each position contains an object identifier. Because many aliases can appear in this step, a second step is applied in order to eliminate these aliases. First of all, memory is allocated for the matrix denoted by m_pObjMapImage. After that, the segmented image is scanned row by row, and each pixel is compared with the upper pixel and the left pixel. If the pixel's gray level is equal to the left pixel's gray level and different from the upper pixel's gray level, then the pixel is assigned to the same object as the left pixel (the corresponding element in the matrix receives the same value as the left one). Otherwise, if the pixel's gray level is equal to the upper pixel's gray level and different from the left pixel's gray level, it is assigned to the upper pixel's object. If the pixel's gray level is different from both the upper and the left pixel's gray levels, it is assigned to a new object (a new identifier, different from all previous ones, is stored in the matrix at this position). If the pixel's gray level is equal to both the left and the upper pixel's gray levels, then it is assigned, by default, to the left pixel's object and it will be taken into consideration again in the second step of this phase. The code is depicted in Figure 2-12. The index i refers to the current line and j to the current column in the m_pObjMapImage matrix.

From now on, each pixel of the segmented image belongs to an object, the one whose identifier is stored in the m_pObjMapImage matrix at the pixel's position. The object which includes the pixel above the current pixel is called the upper object; the left object is defined in the same way. As mentioned for the first step of this phase, if both the upper and the left pixel have the same gray level as the current pixel, the segmented image is traversed a second time and these situations are corrected. In this situation it is clear that the upper and left objects are one and the same object. Consequently, they must have the same identifier, in other words left_object_identifier = upper_object_identifier (remember that the current object's identifier is equal to the left object's identifier). One can say that these two objects are equivalent, and therefore a list of equivalences is built. This list is denoted EqList. It has as many entries as the number of objects found in the first step (nCurrentId). The list is first initialised with EqList[i] = i for all i, that is, every object is equivalent to itself. During the scanning of the image, if the pixel's gray level is equal to both the left and the upper pixel's gray levels and the upper object's identifier is different from the left object's identifier, we have to change the equivalence classes of the upper and left objects. The equivalence classes of the upper and left objects are also compared to see whether the objects are not already assigned to the same equivalence class. The following test checks the conditions described above:

if( left <= 0 && up <= 0 && m_pObjMapImage[i][j-1] != m_pObjMapImage[i-1][j] &&
    EqList[m_pObjMapImage[i][j-1]] != EqList[m_pObjMapImage[i-1][j]] )

In order to change the equivalence class of an object, the object with the greater identifier takes the identifier of the other object. For example, if it is found that the objects with identifiers 3 and 14 are equivalent, then EqList[14] = EqList[3]. When an equivalence class is modified, it is necessary to change the equivalence class value of all objects having the old class value to the new value. In the example given above, it is necessary to change the equivalence class of all objects having the value EqList[14] to the value EqList[3]. In this way, at the end, the equivalence list has the property that EqList[i] ≤ i for all i. This is also a good criterion for testing the correctness of the list building. After the construction of the equivalence list is completed, all the values in the m_pObjMapImage matrix are changed according to the new values of EqList. These operations are also implemented in the method ImageProcessing :: MarkObjects. At the end of this step m_nObjectsNo indicates the total number of objects extracted from the image. The code is taken directly from this function and is shown in Figure 2-13. Because the code for this operation is too long to fit on a single page, it was broken into three parts, illustrated in Figures 2-13-1, 2-13-2 and 2-13-3, but it is referred to as one single figure, Figure 2-13.


for( i = 0 ; i < 512 ; i++ )    // First step of marking process
{
    m_hSourceImage->get_line( i, lpSrcBuffer ) ;
    for( j = 0 ; j < 512 ; j++ )
    {
        // Compute the distance between current pixel colour and left pixel colour
        if( j > 0 )
            left = abs( (int)lpSrcBuffer[j-1] - (int)lpSrcBuffer[j] ) ;
        else
            left = MAXINT ;
        // Compute the distance between current pixel colour and up pixel colour
        if( i > 0 )
            up = abs( (int)lpPrevSrcBuffer[j] - (int)lpSrcBuffer[j] ) ;
        else
            up = MAXINT ;
        // Set the id for current pixel ( the id will be the id of the object
        // to which this pixel belongs )
        if( left <= 0 && up > 0 )
            m_pObjMapImage[i][j] = m_pObjMapImage[i][j-1] ;
        else if( up <= 0 && left > 0 )
            m_pObjMapImage[i][j] = m_pObjMapImage[i-1][j] ;
        else if( up > 0 && left > 0 )
            m_pObjMapImage[i][j] = nCurrentId++ ;
        else if( left <= 0 && up <= 0 )
            // This case will be taken into consideration again, in the second step
            m_pObjMapImage[i][j] = m_pObjMapImage[i][j-1] ;
    }
    for( j = 0 ; j < 512 ; j++ )    // Copy current buffer in previous buffer
        lpPrevSrcBuffer[j] = lpSrcBuffer[j] ;
}
Figure 2-12 C++ code of first step in objects marking process

UINT *EqList = new UINT [nCurrentId+1] ;    // Equivalence list (if EqList[i] = j then objects 'i' and 'j' are the same)
if( EqList == NULL )
{
    m_Error = MEMORY_ERROR ;
    delete [] lpSrcBuffer ;
    delete [] lpPrevSrcBuffer ;
    return FALSE ;
}
for( i = 0 ; i < nCurrentId+1 ; i++ )       // Initialize the equivalence list
    EqList[i] = i ;
Figure 2-13-1 C++ code of second step in objects marking process (I)


for( i = 0 ; i < 512 ; i++ )    // Second step of marking process, in which the equivalence classes for
                                // objects are established (two objects that are equivalent will be glued
                                // together into a single object)
{
    m_hSourceImage->get_line( i, lpSrcBuffer ) ;
    for( j = 0 ; j < 512 ; j++ )
    {
        // Compute the distance between current pixel colour and left pixel colour
        if( j > 0 )
            left = abs( (int)lpSrcBuffer[j-1] - (int)lpSrcBuffer[j] ) ;
        else
            left = MAXINT ;
        // Compute the distance between current pixel colour and up pixel colour
        if( i > 0 )
            up = abs( (int)lpPrevSrcBuffer[j] - (int)lpSrcBuffer[j] ) ;
        else
            up = MAXINT ;
        if( left <= 0 && up <= 0 && m_pObjMapImage[i][j-1] != m_pObjMapImage[i-1][j] &&
            EqList[m_pObjMapImage[i][j-1]] != EqList[m_pObjMapImage[i-1][j]] )
        // Change the equivalence class for this object
        {
            eqclass = EqList[m_pObjMapImage[i-1][j]] ;
            if( EqList[m_pObjMapImage[i][j-1]] != m_pObjMapImage[i][j-1] )
            {
                oldclass = EqList[m_pObjMapImage[i][j-1]] ;
                // oldv = maximum, newv = minimum
                if( eqclass < oldclass )
                {
                    oldv = oldclass ;
                    newv = eqclass ;
                }
                else
                {
                    newv = oldclass ;
                    oldv = eqclass ;
                }
                for( int k = 0; k < max(m_pObjMapImage[i][j-1] , m_pObjMapImage[i-1][j])+1 ; k++ )
                    // Modify the equivalence class for 'oldv' into 'newv'
                    if( EqList[k] == oldv )
                        EqList[k] = newv ;
            }
            else if( eqclass < EqList[m_pObjMapImage[i][j-1]] )
                EqList[m_pObjMapImage[i][j-1]] = eqclass ;
Figure 2-13-2 C++ code of second step in objects marking process (II)

            else
            {
                for( int k = 0; k < max(m_pObjMapImage[i][j-1] , m_pObjMapImage[i-1][j])+1 ; k++ )
                    // Modify the equivalence class for 'eqclass' into 'EqList[m_pObjMapImage[i][j-1]]'
                    if( EqList[k] == eqclass )
                        EqList[k] = EqList[m_pObjMapImage[i][j-1]] ;
            }
        }
        for( j = 0 ; j < 512 ; j++ )    // Copy the current buffer into previous buffer
            lpPrevSrcBuffer[j] = lpSrcBuffer[j] ;
}
for( i = 0 ; i < 512 ; i++ )            // Relabel all the pixels according to latest equivalence class
{
    for( j = 0 ; j < 512 ; j++ )
        m_pObjMapImage[i][j] = EqList[m_pObjMapImage[i][j]] ;
}
m_nObjectsNo = nCurrentId-1 ;
Figure 2-13-3 C++ code of second step in objects marking process (III)

The second phase, list building process, is implemented in the method ImageProcessing :: ListBuilding and has the steps illustrated in Figure 2-14.
1. allocate memory for a temporary array temp of pointers to CObject elements, based on the number of objects found in the previous phase (usually this number is greater than the real number of objects in the image), and initialise all the array's entries with NULL
2. initialise the objects counter to 0 (nObjNo = 0)
3. scan the matrix built in the previous phase and for each position specified by a pair (i, j) execute the following steps:
   3.1. if the entry in the temp array corresponding to m_ObjMapImage[i][j] (this is the object's identifier) is not NULL, jump to 3.4
   3.2. else, this being the first point which belongs to this object, allocate memory for a new object and store the returned address in the temp array at the m_ObjMapImage[i][j] entry
   3.3. increment the objects counter (nObjNo++)
   3.4. add the point of coordinates (i, j) to its object
4. now that we know the real number of objects in the image (nObjNo), allocate memory for m_pObjectList, namely nObjNo * sizeof( CObject ) bytes
5. copy all non-NULL entries from the temp array into m_pObjectList
6. free the memory for the temp array
Figure 2-14 The outline of the list of objects building phase

All variables starting with m_ are attributes of the class ImageProcessing; the rest are local variables. The class CObject is used to store information about an object and to operate with it, and has the following description (the complete declaration can be found in the file improc.h):

class CObject
{
private:
    int m_Error ;                        // Error code
    UINT m_nId ;                         // Object's identifier
    BYTE m_nColor ;                      // Object's colour
    CFeatures m_Features ;               // Object's features
    BOOL m_bComputed ;                   // TRUE if object's features were computed, otherwise FALSE
    UINT** m_pImage ;                    // Address of the matrix map which contains the object
    TIListImp <TPoint> m_lPerimeter ;    // Object's perimeter ( as a list of points )
public:
    // Public methods following
};

2.5 Measurement and Classification

After the objects have been isolated and extracted from the scene, they have to be measured and then classified into disjoint groups. The field of computer science which deals with this problem is named pattern recognition.

2.5.1 Features Extraction

In order to classify the objects into different classes, some object features first have to be computed. This section describes only the features used in this application. The first three of them (area, width and height, perimeter length) reflect the size of the object, while the others offer information about the shape of the object.

Area

The area measurement is simply the number of pixels inside (and including) the boundary, multiplied by the area of a single pixel. As this measure is given in pixels, the area of a pixel is taken equal to 1. This feature is computed during the extraction of the object from the image, in the function CObject :: AddPoint.

Height and Width

It is easy to compute the horizontal and vertical extent of an object while it is being extracted from the image. One needs only the minimum and the maximum line number and column number for this operation. Therefore, these two properties are also computed in CObject :: AddPoint. They reflect the dimensions of the boundary rectangle (the smallest rectangle which encloses the object entirely). The boundary rectangle has its edges parallel to the X and Y axes.
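A minimal sketch of this incremental bookkeeping is shown below; it only illustrates the idea and is not the actual CObject :: AddPoint implementation, whose members and types are different.

// Sketch: update area and bounding rectangle each time a point (row, col) is added to an object.
struct ObjectExtent
{
    long area ;                           // number of pixels added so far
    int minRow, maxRow, minCol, maxCol ;  // bounding rectangle of the object
};

void AddPointSketch( ObjectExtent& e, int row, int col )
{
    if( e.area == 0 )                     // the first point initialises the bounding rectangle
    {
        e.minRow = e.maxRow = row ;
        e.minCol = e.maxCol = col ;
    }
    else
    {
        if( row < e.minRow ) e.minRow = row ;
        if( row > e.maxRow ) e.maxRow = row ;
        if( col < e.minCol ) e.minCol = col ;
        if( col > e.maxCol ) e.maxCol = col ;
    }
    e.area++ ;                            // area measured in pixels
    // height = maxRow - minRow + 1 ; width = maxCol - minCol + 1
}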

Perimeter and Perimeter Length
Frequently the circumferential distance around the boundary is useful for classification purposes. The perimeter is the set of pixels which form the border of an object, and the perimeter length is simply the number of pixels included in the object's border. Like all the other features from now on, the perimeter and perimeter length are computed inside the method CObject :: Features.

Rectangularity
A measurement that reflects the rectangularity of an object is the rectangle fit factor

R = A_O / A_R

where A_O is the object's area and A_R is the area of its minimum enclosing rectangle. It represents how well an object fills its minimum enclosing rectangle. This parameter takes on a maximum value of 1.0 for rectangular objects, assumes the value π/4 for circular objects and becomes small for slender, curved objects. The rectangle fit factor is bounded between 0 and 1.

Circularity
A group of shape features are called circularity measures because they are minimised by the circular shape. Their magnitude tends to reflect the complexity of the boundary. The most commonly used circularity measure is

C = P² / A

where P represents the perimeter length and A is the object's area. This feature takes on its minimum value of 4π for a circular shape. More complex objects yield higher values. This measurement is roughly correlated with the subjective concept of complexity of the boundary.

Invariant Moments
The moments of a function are commonly used in probability theory. There is a class of shape features having several desirable properties that can be derived from moments.

Definition. The set of moments of a bounded function f(x, y) of two variables is defined by

M_jk = ∫_{−∞}^{∞} ∫_{−∞}^{∞} x^j y^k f(x, y) dx dy

where j and k take on all nonnegative integer values. As j and k take on all nonnegative values, they generate an infinite set of moments. The parameter j + k is called the order of the moment. The set {Mjk} is unique for the function f(x, y): only one function has that particular set of moments. For shape description purposes, suppose f(x, y) takes on the value 1 inside the object and 0 elsewhere. This silhouette function reflects only the shape of the object and ignores internal gray level details. There is only one zero-order moment,

M_00 = ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x, y) dx dy

and it is clearly the area of the object. There are two first-order moments, and so on. We can make all first- and higher-order moments invariant to object size by dividing them by M00.

Center of gravity
The coordinates of the center of gravity of the object are given by

x_C = M_10 / M_00        y_C = M_01 / M_00

The so-called central moments are computed using the center of gravity as the origin

μ_jk = ∫_{−∞}^{∞} ∫_{−∞}^{∞} (x − x_C)^j (y − y_C)^k f(x, y) dx dy


In our case (discrete case) the integration operation becomes a summation over the entire domain, so that the above formula becomes:

μ_jk = Σ_{x=0}^{M−1} Σ_{y=0}^{N−1} (x − x_C)^j (y − y_C)^k f(x, y)

where M and N are the width and height of the image. For Mjk a similar formula is obtained. These formulas are used in the function CObject :: Features for the computation of the mass center (= center of gravity) and of the rotation angle (as described below).

Angle of Rotation and Principal Axes
The angle of rotation θ that causes the second-order central moment μ11 to vanish may be obtained from

tan(2θ) = 2 μ_11 / (μ_20 − μ_02)

The coordinate axes x', y' at an angle θ from the x, y axes are called the principal axes of the object. The 90° ambiguity can be eliminated if we specify that μ20 < μ02 and μ30 > 0. An object with NE-SW orientation will have a positive rotation angle and an object with NW-SE orientation will have a negative rotation angle. For example, the code fragment in Figure 2-15 is taken from the CObject :: Features method and computes the rotation angle of an object using the formula given above.

    double miu11 = 0.0 , miu02 = 0.0 , miu20 = 0.0 ;
    for( int i = m_Features.pntInsertionPoint.x ;
         i < m_Features.pntInsertionPoint.x + m_Features.nHeight ; i++ )
        for( int j = m_Features.pntInsertionPoint.y ;
             j < m_Features.pntInsertionPoint.y + m_Features.nWidth ; j++ )
        {
            if( m_pImage[i][j] == m_nId )   // Point belongs to the object
            {
                miu11 += ( i - m_Features.pntMassCenter.x ) * ( j - m_Features.pntMassCenter.y ) ;
                miu02 += ( j - m_Features.pntMassCenter.y ) * ( j - m_Features.pntMassCenter.y ) ;
                miu20 += ( i - m_Features.pntMassCenter.x ) * ( i - m_Features.pntMassCenter.x ) ;
            }
        }
    if( miu20 == miu02 )
        m_Features.dRotation = 0.0 ;
    else
        m_Features.dRotation = -atan( 2 * miu11 / ( miu20 - miu02 ) ) / 2 ;
Figure 2-15 C++ code for the determination of rotation angle

More about feature selection can be found in [2].
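Taken together, the size and shape features above follow directly from their definitions once the area, perimeter length, boundary rectangle and first-order moments of the silhouette function are known. The fragment below is a hedged sketch with hypothetical names, not the application's actual code, showing that computation in one place.

    #include <cmath>

    // Sketch with assumed inputs: area, perimeter length, boundary rectangle size
    // and the first-order moments of the silhouette function f(x, y) in {0, 1}.
    struct ShapeFeatures
    {
        double rectangleFit ;   // R = A_O / A_R: 1.0 for rectangles, pi/4 for ellipses
        double circularity ;    // C = P^2 / A: minimum 4*pi for a circle
        double xC, yC ;         // center of gravity
    } ;

    ShapeFeatures ComputeShapeFeatures( double area, double perimeterLength,
                                        double boxWidth, double boxHeight,
                                        double M00, double M10, double M01 )
    {
        ShapeFeatures f ;
        f.rectangleFit = area / ( boxWidth * boxHeight ) ;
        f.circularity  = perimeterLength * perimeterLength / area ;
        f.xC = M10 / M00 ;      // for a silhouette function, M00 equals the area
        f.yC = M01 / M00 ;
        return f ;
    }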

2.5.2 Classification. Maximum Likelihood Classification

Once all features have been computed, the next step can start. This phase, called object classification, must assign a selected object to one of the available classes. At the beginning of this section I describe one of the most widely used techniques for supervised classification, and at the end I explain in detail the implementation used in this application.

2.5.2.1 Maximum Likelihood Classification

Maximum likelihood classification is the most common supervised classification method. It is developed in the following in a statistically acceptable manner.

Bayes Classification
Let the available classes into which an object can be classified be represented by ωi, i = 1, ..., M, where M is the total number of classes. In trying to determine the class or category to which a pixel at a location x belongs, it is strictly the conditional probabilities p(ωi | x), i = 1, ..., M that are of interest. The position vector x is a column vector containing the object's features. Classification is performed according to the following classification rule:

x ∈ ωi   if   p(ωi | x) > p(ωj | x)   for all j ≠ i          (*)

i.e., the pixel at x belongs to class ωi if p(ωi | x) is the largest. This approach is called Bayes classification.

The Maximum Likelihood Decision Rule
Despite its simplicity, the p(ωi | x) are unknown. Suppose, however, that sufficient training data is available for each class. It can be used to estimate a probability distribution for each class that describes the chance of finding a pixel from class ωi, say, at the position x. Later the form of this distribution function will be made more specific; for the moment it will be kept in general terms and represented by the symbol p(x | ωi) for each i = 1, ..., M. In other words, for a pixel at a location x a set of probabilities can be computed that give the relative likelihoods that the pixel belongs to each available class. The desired p(ωi | x) and the available p(x | ωi), estimated from the training data, are related by Bayes' theorem:

p(ωi | x) = p(x | ωi) p(ωi) / p(x)

where p(ωi) is the probability that class ωi occurs in the image. If, for example, 30% of the objects happen to belong to class ωi, then p(ωi) = 0.3. p(x) is the probability of finding a pixel from any class at location x. It is of interest to note in passing that

p(x) = Σ_{i=1}^{M} p(x | ωi) p(ωi)

although p(x) itself is not important in the following. The p(ωi) are called a priori or prior probabilities, since they are the probabilities with which the class membership of a pixel could be guessed before classification. By comparison, the p(ωi | x) are called posterior probabilities. The classification rule (*) becomes:

x ∈ ωi   if   p(x | ωi) p(ωi) > p(x | ωj) p(ωj)   for all j ≠ i          (**)

where p(x) has been removed as a common factor. The rule of (**) is more acceptable than that of (*), since the p(x | ωi) are known from the training data and it is conceivable that the p(ωi) are also known or can be estimated from the analyst's knowledge of the image. Since I do not have any information about these values, I assume that all p(ωi) are equal, so these terms can also be factored out of (**), which becomes:

x ∈ ωi   if   p(x | ωi) > p(x | ωj)   for all j ≠ i          (***)

Because p(x | ωi) is usually a Gaussian distribution, it is mathematically convenient to use in (***) the definition gi(x) = ln(p(x | ωi)), where ln is the natural logarithm, so that the final form is:

x ∈ ωi   if   gi(x) > gj(x)   for all j ≠ i          (****)

This is the decision rule used in maximum likelihood classification; the gi(x) are referred to as discriminant functions.

Decision Surfaces
As a means of assessing the capabilities of the maximum likelihood decision rule, it is of value to determine the essential shapes of the surfaces that separate one class from another. These surfaces, albeit implicit, can be devised in the following manner. The classes


are defined by those regions where their discriminant functions are the largest. Clearly these regions are separated by surfaces where the discriminant functions of adjoining classes are equal. The ith and the jth classes are therefore separated by the surface gi(x) − gj(x) = 0. This is referred to as a decision surface since, if all the surfaces separating the classes are known, decisions about the class membership of an object can be made on the basis of its position relative to the complete set of surfaces. As mentioned before, Gaussian distribution functions are used; these yield quadratic discriminant functions, and therefore the decision surfaces implemented by the maximum likelihood classification are quadratic and take the form of parabolas, circles and ellipses.

Thresholds
It is implicit in the foregoing development that every object will be classified into one of the available classes ωi, irrespective of how small the actual probabilities of class membership are. Poor classification can result, as indicated in Figure 2-16:


Figure 2-16 Use of thresholds to remove poor classification

Such situations can arise if, for example, not enough training data was available. In situations such as these it is sensible to apply thresholds to the decision process in the manner shown in the figure above: objects whose probabilities fall below the threshold for all classes are not classified. In the figure, x values between T1 and T2, below T0 or above T3 are not classified. In practice, thresholds are applied to the discriminant functions and not to the probability distributions, since the latter are never computed. With the incorporation of a threshold, the decision rule therefore becomes:

x ∈ ωi   if   gi(x) > gj(x) for all j ≠ i   and   gi(x) > Ti

where Ti is the threshold seen to be significant for class ωi.

More detailed information about maximum likelihood classification and about other classification methods can be found in [1] and [2].
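As a hedged illustration of rule (****) with thresholds, the sketch below (hypothetical names; the concrete two-class version actually used in the application follows in the next section) evaluates the discriminant functions of all M classes and returns the winner only if it exceeds its threshold.

    // Sketch: maximum likelihood decision with per-class thresholds.
    // g[i] holds g_i(x) = ln p(x | omega_i), T[i] the threshold of class i.
    // Returns the winning class index, or -1 if the object stays unclassified.
    int Classify( const double* g, const double* T, int M )
    {
        int best = -1 ;
        for( int i = 0 ; i < M ; i++ )
        {
            if( g[i] <= T[i] )                  // below threshold: cannot win
                continue ;
            if( best < 0 || g[i] > g[best] )    // rule (****): largest discriminant wins
                best = i ;
        }
        return best ;
    }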

2.5.2.2 Implementation
In this application there are two classes: rectangle objects and ellipse objects. All other objects are not classified and are denoted by the unknown class. The distribution functions used have the form of a Gaussian distribution:

p(x) = e^(−(x − m)² / σ²)

where m is the mean value and σ² is the variance of the distribution. Giving particular values to these two parameters, the distribution functions for the rectangle and ellipse classes are obtained; x is replaced with the rectangle fit factor computed for each object.


Therefore two probability distribution functions p1 (for rectangle class) and p2 (for ellipse class) are used for classification:

p1(x) = e^(−(x − 1)² / 0.08)
p2(x) = e^(−(x − π/4)² / 0.11)

where m1 = 1, σ1 = 0.28, m2 = π/4, σ2 = 0.34. The values for σ were determined experimentally, whereas the values for m1 and m2 are the known particular values of the rectangle fit factor for the two classes. Using these distributions, the discriminant functions for the two classes become:

g1(x) = −(x − 1)² / 0.08
g2(x) = −(x − π/4)² / 0.11

where x denotes the rectangle fit factor of the object. The threshold value used with these two functions is T1 = T2 = 0.1024 (in the code below the comparison is carried out on the positive quantities −g1(x) and −g2(x), which is equivalent). Now the code for object classification can be written; it looks as in Figure 2-17 (taken from the TObjInfoDialog :: SetupWindow method):

    double dProbEll  = ( m_Features.dRectangularity - M_PI/4.0 ) *
                       ( m_Features.dRectangularity - M_PI/4.0 ) / 0.11 ,
           dProbRect = ( m_Features.dRectangularity - 1.0 ) *
                       ( m_Features.dRectangularity - 1.0 ) / 0.08 ;

    if( dProbEll < dProbRect && dProbEll <= 0.1024 &&
        m_Features.dCircularity >= 7 && m_Features.dCircularity <= 18 )
    {   // Ellipse object
        m_nType = ELLIPSE ;
        GetModule()->LoadString( IDSTRING_ELLIPSE, buff, 63 ) ;
    }
    else if( dProbEll > dProbRect && dProbRect <= 0.1024 )
    {   // Rectangle object
        m_nType = RECTANGLE ;
        GetModule()->LoadString( IDSTRING_RECTANGLE, buff, 63 ) ;
    }
    else
    {   // Unknown object
        m_nType = UNKNOWN ;
        GetModule()->LoadString( IDSTRING_UNKNOWN, buff, 63 ) ;
    }
where with x is denoted rectangle fit factor of the object. The threshold values used with this two functions are T1 = T2 = 0.1024. Now, the code for objects classification can be written and it will look as in Figure 2-17 (taken from TObjInfoDialog :: SetupWindow method): double dProbEll = (m_Features.dRectangularity-M_PI/4.0)* (m_Features.dRectangularity-M_PI/4.0) / 0.11, dProbRect = (m_Features.dRectangularity-1.0)* (m_Features.dRectangularity-1.0) / 0.08 ; if( dProbEll < dProbRect && dProbEll <= 0.1024 && m_Features.dCircularity >= 7 && m_Features.dCircularity <= 18 ) // Ellipse object { m_nType = ELLIPSE ; GetModule()->LoadString(IDSTRING_ELLIPSE, buff, 63 ) ; } else if( dProbEll > dProbRect && dProbRect <= 0.1024 ) // Rectangle object { m_nType = RECTANGLE ; GetModule()->LoadString(IDSTRING_RECTANGLE, buff, 63 ) ; } else // Unknown object { m_nType = UNKNOWN ; GetModule()->LoadString(IDSTRING_UNKNOWN, buff, 63 ) ; }
Figure 2-17 C++ code for objects classification

In order to improve the classification performance for the ellipse class, the circularity factor computed for each object is also used. It was noted that this takes the value 4π (about 12.566) for circular objects, therefore a neighbourhood around this value is used to test the circularity. It was observed that there is a tendency to classify objects with a small area (small objects) into the ellipse class. This is due to the computation of the rectangle fit factor, which takes values around 0.75 for these objects: usually the difference between the boundary rectangle and the object is about two or three pixels, and this small difference yields a rectangle fit factor of about 0.75, which is very close to π/4, so that the object is classified as an ellipse. Anyway, for small objects it is hard to distinguish between a rectangle and an ellipse even for the human eye.


2.6 Operations Involving Two Objects

After the objects have been identified and extracted from an image and all their features have been computed, one can operate with these objects. In fact this was the aim of my work. I have implemented two operations which involve two different objects: the union of two objects (the join operation) and the computation of various distances between two objects.

2.6.1 The Join of Two Objects

Because of the uncertainty of the segmentation process, the result of the segmentation is sometimes not what the user would like to obtain. If the result is far away from the desired classification it is recommended to use another segmentation technique. But if only a few extracted objects do not correspond to the user's wishes, then it is possible to modify the result of the automatic segmentation by using the object-joining facility. Usually the case is this: an object is segmented into two or more distinct objects because of its very uneven illumination. To eliminate this artifact, the union of two objects was the first operation involving more than one object to be implemented. The join (union) of two objects creates a new object which has the identifier of the first selected one and includes both objects. The method BOOL CImageProcessing :: JoinObjects( CObject* o1, CObject* o2 ) takes as parameters two pointers to CObject (the selected objects) and performs the join of the two selected objects. After testing the correctness of the passed parameters (not NULL, not pointing to the same location), it deletes the object o2 from the m_pObjMapImage matrix by replacing all occurrences of o2->m_nId in the matrix m_pObjMapImage with o1->m_nId. Finally it calls CObject :: Join on the o1 object to update its features according to the new, enlarged object. Note that no new object is created: the first one is modified while the second is erased from the matrix.
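A minimal sketch of this relabelling is shown below; the simplified signature and names are assumptions, and the real method additionally validates its parameters and updates the features via CObject :: Join.

    typedef unsigned int UINT ;

    // Sketch: merge object id2 into id1 by relabelling the object map.
    // objMap, height and width stand for m_pObjMapImage and the image size.
    void JoinSketch( UINT** objMap, int height, int width, UINT id1, UINT id2 )
    {
        for( int i = 0 ; i < height ; i++ )
            for( int j = 0 ; j < width ; j++ )
                if( objMap[i][j] == id2 )   // every pixel of the second object...
                    objMap[i][j] = id1 ;    // ...now belongs to the first one
        // afterwards the features of object id1 must be recomputed (CObject :: Join)
    }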

2.6.2 Distances Between Objects

After the user gets the desired result, it is possible to make further measurements on the obtained objects. A useful class of measurements is the various distances which can be computed between two objects. First, I have to mention that all distances are given in pixels; to convert them into centimetres (or another real-world metric) one needs to know the dimension of one pixel and the magnification of the optical lenses. Let us suppose that we have two objects, o1 and o2, rotated by θ1 and θ2 degrees. The first computed distance is the Euclidean distance between the mass centres of the two objects. If o1 has its mass centre in the point m1 and o2 in m2, then the distance is given by:

d = √( (m1.x − m2.x)² + (m1.y − m2.y)² )


This distance is denoted by the word Distance in the distances message box. Other useful distances are those between the mass centres along the x and y axes. The formulas for these are

dx = | m1.x − m2.x |
dy = | m1.y − m2.y |


where | | denotes the absolute value of the argument. These distances are indicated by the corresponding words DistanceX and DistanceY in the same message box. Now let us take the rotation angles of the objects into account. The distance between two parallel lines containing the mass centres of the objects and rotated by θ1 degrees is named Distance1 and is computed using the following formula:

d1 = | (m2.y − m1.y) cos(θ1) − (m2.x − m1.x) sin(θ1) |


The distance Distance2 is similar to Distance1, but θ2 is used instead of θ1, so that

d2 = | (m2.y − m1.y) cos(θ2) − (m2.x − m1.x) sin(θ2) |


The computation of these distances is implemented in the method TMDISegmentedImage :: EvLButtonUp, after the objects have been selected.
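The four distances can be summarised in a short helper; this is a hedged sketch with hypothetical names, and the angles are assumed to be supplied in radians, while the dialog reports them in degrees.

    #include <cmath>

    // Sketch: the four distances between two objects with mass centres
    // (x1, y1), (x2, y2) and rotation angles theta1, theta2 (in radians).
    struct ObjectDistances { double d, dx, dy, d1, d2 ; } ;

    ObjectDistances ComputeDistances( double x1, double y1, double theta1,
                                      double x2, double y2, double theta2 )
    {
        ObjectDistances r ;
        r.dx = std::fabs( x1 - x2 ) ;                       // DistanceX
        r.dy = std::fabs( y1 - y2 ) ;                       // DistanceY
        r.d  = std::sqrt( r.dx * r.dx + r.dy * r.dy ) ;     // Distance
        // distance between the parallel lines through the mass centres,
        // rotated by theta1 and theta2 respectively (Distance1, Distance2)
        r.d1 = std::fabs( ( y2 - y1 ) * std::cos( theta1 ) - ( x2 - x1 ) * std::sin( theta1 ) ) ;
        r.d2 = std::fabs( ( y2 - y1 ) * std::cos( theta2 ) - ( x2 - x1 ) * std::sin( theta2 ) ) ;
        return r ;
    }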


3. Implementations. Technical Specifications and Other Details About the Implementation

Short description of the new features of the application
In this part of my work I describe the new features added to the application and how to use them. First of all it is necessary to load at least one image. If there is more than one image, the segmentation is applied to the image contained in the active window. The segmentation process is started by the Objekt Klassifizirung command from the Bildverarbeitung menu. After that the Segmentation dialog box is displayed and the user has the opportunity to select the desired segmentation technique. For each technique it is possible to tune some parameters which control the quality and accuracy of the segmentation. For thresholding segmentation the user is prompted to choose the smooth level, which is in fact the only parameter passed to the CHistogram :: ComputeThreshold method. Small values for this argument yield many objects in the final segmented image; greater values reduce the number of objects. Recommended values are around 4 (± 1). Figure 3-1 shows the differences obtained with different smooth level values: the left image was segmented using a smooth level equal to 3 and the right image was obtained using the value 4.

Figure 3-1 The influence of the smooth level in thresholding segmentation

For pyramidal segmentation it is necessary to specify the number of pyramid levels. Valid values are in the range 1 to 8. Level 0 would mean that the pyramid has just the bottom level, which contains the original image, so that in fact no pyramidal segmentation takes place. As the images are 512x512 and 512 = 2^9, it follows that the 9th level would contain just one element and again no real segmentation would be performed. If the argument is 1, the pyramid has two levels (0 and 1): the bottom level and one level above it. A value of 2 indicates that there are two levels above the base, and so on. If one increases the number of levels, the number of operations increases as well and the segmentation time grows proportionally. It is clear that small values produce many objects whereas greater values reduce the total number of objects in the final segmented image. If the results do not please the user, the method CPyramidalSegmentation :: Root can be modified in order to obtain a better root marking. Figure 3-2 illustrates the influence of the level parameter on the segmentation process: the left image was segmented using a pyramid with 6 levels and the right picture is the result of the segmentation using a pyramid with 8 levels. It can be noticed that in the first case the number of extracted objects is greater than in the second.


Figure 3-2 Different level values in pyramidal segmentation
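To make the relation between the level parameter and the amount of work concrete, a small hedged sketch (not part of the application) prints the side length of each pyramid level for a 512x512 image; every level halves the resolution, so level 9 holds a single element.

    #include <cstdio>

    // Sketch: side length of each level of an image pyramid built over a
    // 512 x 512 image; level k has (512 / 2^k) x (512 / 2^k) elements.
    int main()
    {
        int side = 512 ;
        for( int level = 0 ; level <= 9 ; level++ )
        {
            std::printf( "level %d: %d x %d elements\n", level, side, side ) ;
            side /= 2 ;
        }
        return 0 ;
    }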

For mixed segmentation there are two parameters which control the quadtree smoothing phase and the thresholding segmentation. Tree level specifies the number of levels for the smoothing process; its range is from 0 to 9. A value of 0 means that no smoothing is done, whereas 9 computes the mean gray level of the entire image. Greater values produce images with a better histogram (with very well separated peaks). For this parameter I suggest values of about 3. The second argument, smooth level, has the same significance as in the case of thresholding segmentation and is used when the array of thresholds is computed for the smoothed image (at the highest level).

After the OK button is pressed, the image segmentation process starts; an hourglass cursor indicates that it is working. At the end of the segmentation process, the segmented image is displayed. In this image each object is painted with a different colour, so that it is easier to recognise a particular object. Now one can access the Objekt verbinden, Entfernung and Information options from the Bildverarbeitung menu. Only one of these options can be checked at a time; by default the Information mode is selected. After selecting one of these options, one object (for the Information option) or two objects (for the others) are expected to be selected from the segmented image. An object is selected by simply clicking with the left mouse button inside the desired object.

If the Information option is activated, after an object is selected a dialog box with all the properties of the object is displayed. The Object Information dialog box contains the following fields:
- Left & Top specify the top-left corner of the smallest rectangle which encloses the object (in pixel coordinates relative to the origin, which is situated in the top-left corner of the image)
- Width specifies the width of this rectangle (in pixels)
- Height is the height of the enclosing rectangle (in pixels)
- Mass Centre contains the coordinates of the mass centre of the object
- Circularity represents the circularity feature computed for this object
- Rectangle fit factor represents the rectangularity feature of this object
- Area is the total number of pixels belonging to the object
- Perimeter length is the number of pixels on the object's perimeter
- Object class identifies the class to which the object belongs (ellipse, rectangle); if the object was not classified the class is denoted by the word unknown
- Rotation angle specifies the rotation angle of the object relative to the horizontal axis
- More... gives more information about an object according to the object's class: for rectangle objects the width and the height of the rectangle are shown, whereas for the ellipse class the radii in the X and Y directions are computed.

When the Entfernung mode is activated, after the selection of two objects a message box with all four distances is shown to the user; more about the meaning of the displayed data can be read in the section Distances Between Objects. When the option Objekt verbinden is checked, after the selection of two objects they are joined together, so that they have the same colour (the colour of the first selected object).

My files

Here are all the files added to the SPG project by me:
IMPROC.CPP   - all the image processing operations
IMPROC.H     - the header for IMPROC.CPP
SEGDLG.CPP   - the code for the Segmentation dialog box
SEGDLG.H     - the header for SEGDLG.CPP
SEGIMAGE.CPP - the operations on the segmented image (object joining, object info, distances)
SEGIMAGE.H   - the header for SEGIMAGE.CPP
OBJINFO.CPP  - information about a selected object (including object classification)
OBJINFO.H    - the header for OBJINFO.CPP
OBJINFO.RC   - all resources added in the project
OBJINFO.RH   - the header for OBJINFO.RC


In order to make it as easy as possible to find code portions in my source files, Table 3-1 lists, for each class, the file(s) which implement it.

CLASS                        FILE(S)
CHistogram                   improc.h + improc.cpp
CImageProcessing             improc.h + improc.cpp
CMixedSegmentation           improc.h + improc.cpp
CObject                      improc.h + improc.cpp
CPyramidalSegmentation       improc.h + improc.cpp
CSegmentation                improc.h
CThresholdingSegmentation    improc.h + improc.cpp
TEllipseDialog               objinfo.h + objinfo.cpp
TMDISegmentedImage           segimage.h + segimage.cpp
TObjInfoDlg                  objinfo.h + objinfo.cpp
TRectangleDialog             objinfo.h + objinfo.cpp
TSegmentationDialog          segdlg.h + segdlg.cpp
Table 3-1 Classes and files

My signature
If I modified or added something in files other than those specified above, those sections begin with the following comment line:
// Added by Daniel Pop
These files are T_BILD.CPP, T_BILD.H and BILDSHOW.CPP.

File header
Each source file (.cpp or .h) begins with a short summary of all the functions, methods and classes included in it. With this short description it is quicker and easier to find a specific method or function, and the entire content of the file can be seen at one glance; knowing what a file contains is no longer a problem.

Messages and other user interface elements
Although all messages are displayed in English, it will be very easy to translate them into German because the resource script file objinfo.rc contains them all. Besides the messages, this file also contains all the dialog boxes implemented by me. All identifiers of the objects from objinfo.rc can be found in the objinfo.rh file. Only the three new options included in the BILDMENU menu are stored in the spg.rc file. Figure 3-3 presents the code added by Resource Workshop for these additional options.

Implementations. Technical Specifications and Other Details About the Implementation

32

    BILDMENU MENU
    {
        POPUP "&Bildverarbeitung"
        {
            MENUITEM "&Filtern",               CM_FILTERN
            MENUITEM "&Kantenextraktion",      CM_KANTEN_EXTR
            MENUITEM "&Markanter Punkt",       CM_MARKANTER_PUNKT
            MENUITEM "&Farbmischung",          CM_FARBEN_MISCHEN
            MENUITEM SEPARATOR
            MENUITEM "Bild klonen",            CM_BILD_KLONEN
            MENUITEM "Bildanzeige drehen...",  CM_BILD_ROTIEREN
            MENUITEM SEPARATOR
            MENUITEM "3D-Bild erzeugen",       CM_DREI_D_BILD
            MENUITEM "Objekt Klassifizirung",  CM_BILDVERARBEITUNGJOINOBJECT_KLASSIFIZIRUNG
            MENUITEM SEPARATOR
            MENUITEM "Objekt verbinden",       CM_BILDVERARBEITUNGJOIN, GRAYED
            MENUITEM "Entfernung",             CM_BILDVERARBEITUNGDISTANCE, GRAYED
            MENUITEM "Information",            CM_BILDVERARBEITUNGINFORMATIONS, GRAYED
        }
    }
Figure 3-3 Bildverarbeitung menu

Adding a new segmentation algorithm
If you need to implement a new segmentation algorithm, you have to create a new class derived from CSegmentation and override all the pure virtual methods of this class. The CSegmentation class has the following description:

    class CSegmentation
    {
    protected:
        // Some protected data ...
    public:
        CSegmentation( TBildArray* hSrc, TBildArray* hDest ) ;
        virtual ~CSegmentation() ;
        int GetError() ;
        virtual BOOL Segmentate() = 0 ;
    } ;

The only pure virtual method is CSegmentation :: Segmentate, which performs the segmentation algorithm and therefore has to be overridden in each derived class. After that you must introduce a new type identifier for your segmentation and add a new line in the ImageProcessing :: Segmentation method that instantiates an object of your own class.
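A hedged skeleton for such a derived class is shown below. The class name, the type identifier and the algorithm body are placeholders of my own; the fragment assumes the project's headers (CSegmentation, TBildArray, BOOL, TRUE) are available.

    // Sketch: skeleton of a new segmentation class derived from CSegmentation.
    // Only the structure is shown; the actual algorithm replaces the TODO part.
    class CMySegmentation : public CSegmentation
    {
    public:
        CMySegmentation( TBildArray* hSrc, TBildArray* hDest )
            : CSegmentation( hSrc, hDest ) { }

        virtual BOOL Segmentate()
        {
            // TODO: read the source image, compute the segmented result and
            //       write it into the destination image passed to the constructor
            return TRUE ;   // report success
        }
    } ;

    // In ImageProcessing :: Segmentation a new branch would then instantiate it,
    // e.g. (hypothetical type identifier):
    //   case SEGMENTATION_MY_METHOD :
    //       pSeg = new CMySegmentation( hSource, hDestination ) ;
    //       break ;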


4. Test and Results

In order to test all the methods implemented in this application, a number of images were taken using the camera system and the three segmentation methods were applied to them, choosing convenient parameter values for each one. Figure 4-1 gives an example of a real-case image together with the corresponding segmented images.

Figure 4-1 Original image (a) and segmented images obtained using different segmentation algorithms (b, c, d)

In Figure 4-1, (a) is the original image, (b) is the segmented image obtained using thresholding segmentation with a smooth factor equal to 4, (c) is the result of pyramidal segmentation using 8 levels, and (d) was obtained using the mixed technique with 2 levels and a smooth factor equal to 4. The thresholding and mixed methods required almost the same time (7 and 6 seconds respectively), whereas the pyramidal segmentation needed considerably longer (52 seconds). The tests were made on a Pentium 120 MHz computer with 16 MB of RAM. It can be seen that the mixed segmentation gives the result closest to what the user would expect. The thresholding segmentation includes a few more artifacts, such as false objects identified because of their gray level. In the pyramidally segmented image too many objects were extracted; maybe the Root method should be modified in order to mark the roots of the subtrees more rigorously.


The information about a classified object is displayed to the user in a dialog box. As an example, Figure 4-2 shows the information window presented by the program for the object marked with A in Figure 4-1(d).

Figure 4-2 Information about an object

Regarding the tests related to the classification problem, it was observed that for objects with dimensions larger than 25 pixels the correct class is computed. For small objects it was observed that very frequently the ellipse class is chosen as the final classification. In this case the characteristics of an ellipse (the radii along X and Y) and the characteristics of a rectangle (width and height) are related by radius-X = width / 2 and radius-Y = height / 2, so that if a mistaken classification is made it is easy to compute the real characteristics of the object. The source of these possible mistakes is explained in section 2.5.2.2.


5. Final Conclusions and Further Improvements

The central aim of this work is to extract individual objects from an image. This problem is known as image segmentation. In recent years many methods have been developed for this task; the most recent approaches are in the domain of parallel computing and distributed systems, where some interesting results have been obtained. From the large variety of available techniques for image segmentation, I tried to select the method which suits this particular application best. Three ways are available to do this. In the first place I recommend the mixed technique. For some images the thresholding method also gives very good results. The pyramidal method is a little more expensive in terms of computing time, and its root marking function should be tuned when the real situation is known (when the images are taken under the final illumination conditions).

The classification routine can be extended in order to cover more object classes. For this purpose it would be necessary to compute more features of an object. A direct neural network approach is not so easy to implement because the object sizes differ from one object to another. With some modifications to the standard method (e.g. pattern resizing and scaling, which means that the objects would be scaled before a comparison with the memorised patterns takes place) or by using a feature vector for each object, a neural network approach to object classification would be possible.

Another interesting problem is the boundary rectangle computed for each object. At the moment it is parallel to the X and Y axes. It would be of interest to align this rectangle with the principal axes of the object. For example, if a rectangular object is rotated by 45°, the rectangle fit factor would be 0.5 and the object would not be classified as a rectangle; but if the boundary rectangle is parallel to the object's principal axes, the rectangle fit factor will be 1.0 and a correct classification is made. In the case of ellipse objects, the radii along the principal axes would also be very easy to compute if the boundary rectangle were aligned to these axes.

The implementation files contain many remarks for each program section. Almost every line has a small comment that describes its action. Each function or method has a comment header in which the meaning of the parameters and of the returned value, as well as the functionality of the method, are described. I hope that all these remarks will help the programmers who may later wish to modify my sources.

Acknowledgments
Last but not least I would like to thank the entire staff of the Chair of Production Systems and Process Control, especially Dipl.-Ing. Matthias Dohmen, for his support during my work here.


REFERENCES
1. Richards, John A. - Remote Sensing Digital Image Analysis, Springer-Verlag, Berlin, 1994
2. Castleman, Kenneth R. - Digital Image Processing, Prentice-Hall, New Jersey, 1979
3. Jähne, Bernd - Digitale Bildverarbeitung, Springer-Verlag, Berlin, 1991
4. Bässmann, H., Besslich, Ph. W. - Bildverarbeitung Ad Oculos, Springer-Verlag, Berlin, 1993
5. Rosenfeld, A. - Multiresolution Image Processing and Analysis, Springer-Verlag, Berlin, 1984
6. Wilson, Roland, Spann, Michael - Image Segmentation and Uncertainty, Research Studies Press, Letchworth, Hertfordshire, 1988
7. Institute for Automation Research, Chair of Production Systems and Process Control - Annual Report 1995, Bochum, 1995


TABLE OF FIGURES

Figure 2-1 Comparison of simple averaging and median filtering
Figure 2-2 C++ code for histogram smoothing
Figure 2-3 The bimodal histogram
Figure 2-4 Thresholding selection
Figure 2-5 C++ code for grades extraction
Figure 2-6 C++ code for thresholdings array building
Figure 2-7 An outline of pyramidal segmentation algorithm
Figure 2-8 Node linking in standard pyramidal segmentation
Figure 2-9 C++ code for root marking algorithm
Figure 2-10 The final form of pyramidal segmentation
Figure 2-11 Comparison of histograms obtained at different levels in quadtree smoothing process
Figure 2-12 C++ code of first step in objects marking process
Figure 2-13-3 C++ code of second step in objects marking process (III)
Figure 2-14 The outline of the list of objects building phase
Figure 2-15 C++ code for the determination of rotation angle
Figure 2-16 Use of thresholds to remove poor classification
Figure 2-17 C++ code for objects classification
Figure 3-1 The influence of the smooth level in thresholding segmentation
Figure 3-2 Different level values in pyramidal segmentation
Figure 3-3 Bildverarbeitung menu
Figure 4-1 Original image (a) and segmented images obtained using different segmentation algorithms (b, c, d)
Figure 4-2 Information about an object
