
# IMAGE PROCESSING AND PATTERN RECOGNITION
BEG476CO
Year: IV, Semester: II

Teaching Schedule (Hours/Week): Theory 3, Tutorial -, Practical 3/2

Examination Scheme:
  Internal Assessment: Theory 20, Practical* 25
  Semester Final: Theory** 80
  Total: 125
* Continuous assessment
** Duration: 3 hours

Course Objectives: To provide the knowledge of image processing and pattern recognition and their applications.

1. Introduction to Digital Image Processing (4 hrs)
   1.1 Digital image representation
   1.2 Digital image processing: problems and applications
   1.3 Elements of visual perception
   1.4 Sampling and quantization, relationships between pixels

2. Two-Dimensional Systems (5 hrs)
   2.1 Fourier transform and Fast Fourier Transform
   2.2 Other image transforms and their properties: Cosine transform, Sine transform, Hadamard transform, Haar transform

3. Image Enhancement and Restoration (8 hrs)
   3.1 Point operations: contrast stretching, clipping and thresholding, digital negative, intensity level slicing, bit extraction
   3.2 Histogram modeling: equalization, modification, specification
   3.3 Spatial operations: averaging, directional smoothing, median filtering, spatial low pass, high pass and band pass filtering, magnification by replication and interpolation

4. Image Coding and Compression (4 hrs)
   4.1 Pixel coding: run length, bit plane
   4.2 Predictive and inter-frame coding

5. Introduction to Pattern Recognition and Images (3 hrs)

6. Recognition and Classification (5 hrs)
   6.1 Recognition classification
   6.2 Feature extraction
   6.3 Models
   6.4 Division of sample space

7. Grey Level Features, Edges and Lines (6 hrs)
   7.1 Similarity and correlation
   7.2 Template matching
   7.3 Edge detection using templates
   7.4 Edge detection using gradient models
   7.5 Model fitting, line detection, problems with feature detectors

8. Segmentation (3 hrs)
   8.1 Segmentation by thresholding
   8.2 Regions for edges, line and curve detection

9. Frequency Approach and Transform Domain (3 hrs)

10. Advanced Topics (4 hrs)
   10.1 Neural networks and their application to pattern recognition
   10.2 Hopfield nets
   10.3 Hamming nets, perceptron

Laboratory: Laboratory exercises using image processing and pattern recognition packages.

Reference books:
1. K. Castleman, Digital Image Processing, Prentice Hall of India Pvt. Ltd., 1996.
2. A. K. Jain, Fundamentals of Digital Image Processing, Prentice Hall of India Pvt. Ltd., 1995.
3. R. C. Gonzalez and P. Wintz, Digital Image Processing, Addison-Wesley Publishing, 1987.
4. Sing-Tze Bow, Pattern Recognition and Image Processing, M. Dekker, 1992.
5. M. James, Pattern Recognition, BSP Professional Books, 1987.
6. P. Monique, Fundamentals of Pattern Recognition, M. Dekker, 1989.

Chapter 1: Introduction to Digital Image Processing

An image is digitized to convert it to a form which can be stored in a computer's memory or on some form of storage media such as a hard disk or CD-ROM. This digitization procedure can be done by a scanner, or by a video camera connected to a frame grabber board in a computer. Once the image has been digitized, it can be operated upon by various image processing operations.

Image processing operations can be roughly divided into three major categories: image compression, image enhancement and restoration, and measurement extraction. Image compression is familiar to most people; it involves reducing the amount of memory needed to store a digital image. Image defects which could be caused by the digitization process or by faults in the imaging set-up (for example, bad lighting) can be corrected using image enhancement techniques. Once the image is in good condition, measurement extraction operations can be used to obtain useful information from the image.

Digital image representation

Vector images
One way to describe an image using numbers is to declare its contents using the position and size of geometric forms and shapes like lines, curves, rectangles and circles; such images are called vector images. A vector image is resolution independent: you can enlarge or shrink the image without affecting the output quality. Vector images are the preferred way to represent fonts, logos and many illustrations.

Coordinate system
We need a coordinate system to describe an image. The coordinate system used to place elements in relation to each other is called user space, since these are the coordinates the user uses to define and position elements. The coordinate system used for many examples in this subject has the origin in the upper left, with the x axis extending to the right and the y axis extending downwards.

Bitmap images
Bitmap (or raster) images are digital photographs; they are the most common form used to represent natural images and other graphics rich in detail. Bitmap images are how graphics are stored in the video memory of a computer. The term bitmap refers to how a given pattern of bits in a pixel maps to a specific color. A bitmap image takes the form of an array, where the value of each element, called a pixel (picture element), corresponds to the color of that portion of the image. Each horizontal line in the image is called a scan line. The letter 'a' might be represented in a 12x14 matrix as depicted in Figure 3; the values in the matrix depict the brightness of the pixels. Larger values correspond to brighter areas whilst lower values are darker.

Sampling
When measuring the value for a pixel, one takes the average color of an area around the location of the pixel. A simplistic model is sampling a square; this is called a box filter. A more physically accurate measurement is to calculate a weighted Gaussian average (giving the value exactly at the pixel coordinates a high weight, and lower weight to the area around it). When perceiving a bitmap image the human eye blends the pixel values together, recreating an illusion of the continuous image it represents.

Raster dimensions
The number of horizontal and vertical samples in the pixel grid is called the raster dimensions; it is specified as width x height.

Resolution
Resolution is a measurement of sampling density. The resolution of a bitmap image gives the relationship between pixel dimensions and physical dimensions. The most often used measurement is ppi, pixels per inch.

Megapixels
Megapixels refer to the total number of pixels in the captured image; an easier metric is the raster dimensions, which give the number of horizontal and vertical samples in the sampling grid. An image with a 4:3 aspect ratio and dimensions 2048x1536 pixels contains a total of 2048x1536 = 3,145,728 pixels, approximately 3 million; thus it is a 3 megapixel image.

Table 1.1. Commonly encountered raster dimensions

| Dimensions | Megapixels | Name | Comment |
|------------|------------|------|---------|
| 640x480 | 0.3 | VGA | VGA |
| 720x576 | 0.4 | CCIR 601 DV PAL | Dimensions used for PAL DV and PAL DVDs |
| 768x576 | 0.4 | CCIR 601 PAL | Full PAL with square sampling grid ratio |
| 800x600 | 0.4 | SVGA | |
| 1024x768 | 0.8 | XGA | The currently (2004) most common computer screen dimensions |
| 1280x960 | 1.2 | | |
| 1600x1200 | 2.1 | UXGA | |
| 1920x1080 | 2.1 | 1080i HDTV | Interlaced, high resolution digital TV format |
| 2048x1536 | 3.1 | 2K | Typically used for digital effects in feature films |
| 3008x1960 | 5.3 | | |
| 3088x2056 | 6.3 | | |
| 4064x2704 | 11.1 | | |

Scaling / Resampling
When we need to create an image with different dimensions from what we have, we scale the image. Another name for scaling is resampling: resampling algorithms try to reconstruct the original continuous image and create a new sample grid from it.

Scaling an image down
The process of reducing the raster dimensions is called decimation. This can be done by averaging the values of the source pixels contributing to each output pixel.

Scaling an image up
When we increase the image size we actually want to create sample points between the original sample points in the original raster. This is done by interpolating the values in the sample grid, effectively guessing the values of the unknown pixels.

Sample depth
The values of the pixels need to be stored in the computer's memory, which means that the data must ultimately end up in a binary representation. The spatial continuity of the image is approximated by the spacing of the samples in the sample grid; the values we can represent for each pixel are determined by the sample format chosen.

8 bit
A common sample format is 8-bit integers. 8-bit integers can only represent 256 discrete values (2^8 = 256), thus brightness levels are quantized into these levels.

12 bit
For high dynamic range images (images with detail both in shadows and highlights), 8 bits and its 256 discrete values do not provide enough precision to store an accurate image. Some digital cameras operate with more than 8-bit samples internally; higher end cameras (mostly SLRs) also provide RAW images that often are 12 bit (2^12 = 4096 values).

16 bit
The PNG and TIFF image formats support 16-bit samples. Many image processing and manipulation programs perform their operations in 16 bit when working on 8-bit images to avoid quality loss in processing.

Floating point
Some image formats used in research and by the movie industry store floating point values: both "normal" 32-bit floating point values and a special format called half, which uses 16 bits per sample. Floating point is useful as a working format because quantization and computational errors are kept to a minimum until the final render.

Floating point representations often include HDR, High Dynamic Range. High Dynamic Range images are images that include sampling values that are whiter than white (higher values than 255 for a normal 8-bit image). HDR allows representing the light in a scene with a greater degree of precision than LDR, Low Dynamic Range, images.

Colors
The most common way to model color in computer graphics is the RGB color model; this corresponds to the way both CRT monitors and LCD screens/projectors reproduce color. Each pixel is represented by three values: the amount of red, green and blue. Thus an RGB color image will use three times as much memory as a gray-scale image of the same pixel dimensions.

Palette / Indexed images
It was earlier common to store images in a palettized mode. This works similarly to a paint-by-numbers strategy: we store just the number of the palette entry used for each pixel, and for each palette entry we store the amount of red, green and blue light.

Digital image processing: problems and applications

Applications include:
- Computer vision
- Face detection
- Feature detection
- Lane departure warning systems
- Non-photorealistic rendering
- Medical image processing
- Microscope image processing
- Morphological image processing
- Remote sensing

Computer Vision
Computer vision is the science and technology of machines that see. As a scientific discipline, computer vision is concerned with the theory for building artificial systems that obtain information from images. The image data can take many forms, such as a video sequence, views from multiple cameras, or multi-dimensional data from a medical scanner.

As a technological discipline, computer vision seeks to apply the theories and models of computer vision to the construction of computer vision systems. Examples of applications of computer vision systems include systems for:
- Controlling processes (e.g. an industrial robot or an autonomous vehicle).
- Detecting events (e.g. for visual surveillance or people counting).
- Organizing information (e.g. for indexing databases of images and image sequences).
- Modeling objects or environments (e.g. industrial inspection, medical image analysis or topographical modeling).
- Interaction (e.g. as the input to a device for computer-human interaction).

Computer vision can also be described as a complement (but not necessarily the opposite) of biological vision. In biological vision, the visual perception of humans and various animals is studied, resulting in models of how these systems operate in terms of physiological processes. Computer vision, on the other hand, studies and describes artificial vision systems that are implemented in software and/or hardware. Interdisciplinary exchange between biological and computer vision has proven increasingly fruitful for both fields.

Sub-domains of computer vision include scene reconstruction, event detection, tracking, object recognition, learning, indexing, ego-motion and image restoration.

Face Detection
Face detection is a computer technology that determines the locations and sizes of human faces in arbitrary (digital) images. It detects facial features and ignores anything else, such as buildings, trees and bodies.

Feature Detection
In computer vision and image processing the concept of feature detection refers to methods that aim at computing abstractions of image information and making local decisions at every image point about whether there is an image feature of a given type at that point or not. The resulting features will be subsets of the image domain, often in the form of isolated points, continuous curves or connected regions.

Lane Departure Warning System
A lane departure warning system (LDW) is a mechanism designed to warn a driver when the vehicle begins to move out of its lane (unless a turn signal is on in that direction) on freeways and arterial roads.

Non-photorealistic Rendering
Non-photorealistic rendering (NPR) is an area of computer graphics that focuses on enabling a wide variety of expressive styles for digital art. In contrast to traditional computer graphics, which has focused on photorealism, NPR is inspired by artistic styles such as painting, drawing, technical illustration, and animated cartoons. NPR has appeared in movies and video games in the form of "toon shaders," as well as in architectural illustration and experimental animation. An example of a modern use of this method is cel-shaded animation.

Medical Image Processing
Medical imaging refers to the techniques and processes used to create images of the human body (or parts thereof) for clinical purposes (medical procedures seeking to reveal, diagnose or examine disease) or medical science (including the study of normal anatomy and physiology).

As a discipline and in its widest sense, it is part of biological imaging and incorporates radiology (in the wider sense), radiological sciences, endoscopy, (medical) thermography, medical photography and microscopy (e.g. for human pathological investigations). Measurement and recording techniques which are not primarily designed to produce images, such as electroencephalography (EEG) and magnetoencephalography (MEG), but which produce data that can be represented as maps (i.e. containing positional information), can be seen as forms of medical imaging.

Microscope Image Processing
Microscope image processing is a broad term that covers the use of digital image processing techniques to process, analyze and present images obtained from a microscope. Such processing is now commonplace in a number of diverse fields such as medicine, biological research, cancer research, drug testing, metallurgy, etc. A number of manufacturers of microscopes now specifically design in features that allow the microscopes to interface to an image processing system.

Mathematical Morphology
Mathematical morphology (MM) is a theory and technique for the analysis and processing of geometrical structures, based on set theory, lattice theory, topology, and random functions. MM is most commonly applied to digital images, but it can be employed as well on graphs, surface meshes, solids, and many other spatial structures.

Topological and geometrical continuous-space concepts such as size, shape, convexity, connectivity, and geodesic distance can be characterized by MM on both continuous and discrete spaces. MM is also the foundation of morphological image processing, which consists of a set of operators that transform images according to the above characterizations.
Remote Sensing
In the broadest sense, remote sensing is the small- or large-scale acquisition of information about an object or phenomenon by the use of recording or real-time sensing devices that are not in physical or intimate contact with the object (such as by way of aircraft, spacecraft, satellite, buoy, or ship). In practice, remote sensing is the stand-off collection, through the use of a variety of devices, of information about a given object or area. Thus, Earth observation or weather satellite collection platforms, ocean and atmospheric observing weather buoy platforms, monitoring of a pregnancy via ultrasound, Magnetic Resonance Imaging (MRI), Positron Emission Tomography (PET), and space probes are all examples of remote sensing. In modern usage, the term generally refers to the use of imaging sensor technologies, including but not limited to instruments aboard aircraft and spacecraft, and is distinct from other imaging-related fields such as medical imaging.

Fundamental steps in digital image processing

Image acquisition

Image enhancement
It is among the simplest and most appealing areas of digital image processing. Basically, the idea behind enhancement techniques is to bring out detail that is obscured, or simply to highlight certain features of interest in an image.

Image restoration
It is an area that also deals with improving the appearance of an image. However, unlike enhancement, which is subjective, image restoration is objective, in the sense that restoration techniques tend to be based on mathematical or probabilistic models of image degradation.

Color image processing

Wavelets
Wavelets are the foundation for representing images in various degrees of resolution, in particular for image data compression and for pyramidal representation, in which images are subdivided successively into smaller regions.

Compression
As the name implies, it deals with techniques for reducing the storage required to save an image, or the bandwidth required to transmit it.

Morphological processing
It deals with tools for extracting image components that are useful in the representation and description of shape.

Segmentation
These procedures partition an image into its constituent parts or objects.

Representation and Description
This almost always follows the output of a segmentation stage, which usually is raw pixel data. Description, also called feature selection, deals with extracting attributes that result in some quantitative information of interest or are basic for differentiating one class of objects from another.

Elements of visual perception
This section introduces visual design variables. The visual design variables available to you comprise the visual language you use to create your application's interface. Some variables, such as texture, have limited availability on today's systems and are not addressed. However, the variables described can provide a wide variety of elements to create an infinite number of unique visual designs. The visual design variables are:

- Color
- Shape
- Size
- Contrast
- Anomaly

Color
The computer display image is formed of pixels radiating light toward the user. As a consequence, such an image covers a wider range of contrasts than reflective images like printed pages and paintings. Computer display images range from black to bright, luminous white. In addition, the color contrasts that are possible are much stronger than in traditional reflective media.

If you do not have a graphic designer available for consultation, consider the following principles for using color:

Think of color in terms of hue, saturation, and value. Color is three dimensional. The color of a pixel on the screen is specified by three numbers: a level of red, green, and blue (RGB). A more intuitive, useful representation of color uses three defining dimensions: hue, saturation, and value (HSV). Developers often offer HSV versions of their color settings as a standard. The following list provides a brief description of these three dimensions:

Hue
Along the hue dimension, colors on a color wheel (or in a rainbow) vary from red to orange to yellow to green.

Saturation
Along the saturation dimension, color varies from gray to colorful. A greenish gray and a vivid green have the same hue (green) but different levels of saturation. The greenish gray has a low saturation; it is green watered down with gray.

Value
Along the value dimension, colors vary from dark to light. A dark red and a light red (pink) have the same hue (red) but different values (a dark one and a light one).

Shape
The shape of interface elements can convey meaning. For example, an organic, rounded shape appears more inviting than a pointed shape. Introducing different shapes provides visual interest and gains attention. Because of today's system implementation techniques, you can incorporate a variety of shapes in icons. Note that selected emphasis can emphasize different icon shapes. A change in the shape of an element can also convey meaning. For instance, a trash can might change shape to show that it contains trash, or a folder icon might open when an object is dragged over it to indicate that the object can be dropped there.

Size
Element size is relative; that is, the terms large and small are relative to what you are comparing. Elements in the interface can vary in size depending on different factors, such as meaning. You can also reflect the relative status of elements by contrasts in the size of elements, as described in Contrast. When designing your application's interface, keep in mind the various elements that will be required. For example, consider the appearance and size of window icons, the variety of monitors (including size and resolution), and the proportion of elements as they appear on the screen.

Contrast
You can use color, shape, and size alone or in concert to express the concept of contrast. Contrast ranges far beyond simple opposites, for example, mild or severe, vague or obvious, simple or complex. Use contrast to differentiate elements. For example, two elements can be similar in some aspects and different in others; their differences become emphasized when the elements are presented with contrast. In other words, one element might look small by itself, but it might appear large compared to small elements next to it. An element can have a similar shape to the elements around it and can have a contrasting color.

The following list describes contrast in conjunction with other design elements:

Color contrast
Some examples of color contrast would be light and dark, brilliant and dull, and warm and cool. Elements can be the same hue but have contrasting saturation: one would be light, the other dark. The opposite is true as well; that is, elements can have the same saturation with different hues (light red, light blue, or light green) but still be perceived as a group because they all have the same saturation and value. Pastel elements in an image otherwise dominated by dark, saturated colors could be perceived as a group.

Shape contrast
Contrasting shapes are complicated because you can describe a shape in many ways. Geometric shapes contrast with organic shapes, but two geometric shapes can contrast if one is angular and the other is not. Other common cases of shape contrast include:
o Curvilinear or rectilinear
o Planar or linear
o Mechanical or calligraphic
o Symmetrical or asymmetrical
o Simple or complex
o Abstract or representational
o Undistorted or distorted

Size contrast
Contrast of size is straightforward. That is, size contrasts such as big or small forms and long and short lines are familiar concepts and usually easily recognized.

Anomaly
An anomaly is a kind of contrast: the irregular introduced into a regular pattern. If a pattern of any kind is maintained and then broken, an anomaly is created. In other words, there is contrast between anomaly and regularity because regularity is the observation of, and anomaly is the departure from, a certain kind of discipline or pattern. The element might be similar in many aspects to other elements, but an anomaly in one aspect gives it emphasis. Two examples are a set of elements that are all green with one yellow element, or a set of rectangular elements with one circle.

In design, use an anomaly only when necessary. It must have a definite purpose, which can be one of the following:
- To attract attention
- To relieve monotony
- To transform or break up regularity

Image Sampling and Quantization
The output of most sensors is a continuous voltage waveform whose amplitude and spatial behavior are related to the physical phenomenon being sensed. To create a digital image, we need to convert the continuous sensed data into digital form. This involves two processes: sampling and quantization.

The spatial and amplitude digitization of f(x,y) is called:
- image sampling when it refers to the spatial coordinates (x,y), and
- gray-level quantization when it refers to the amplitude.

(Fig.: a continuous image is converted by sampling and quantization into a digital image.)

Important terms for future discussion:
- Z: the set of integers
- R: the set of real numbers

Sampling: partitioning the xy plane into a grid; the coordinate of the center of each grid cell is a pair of elements from the Cartesian product Z x Z (Z^2). Z^2 is the set of all ordered pairs of elements (a,b) with a and b being integers from Z.

f(x,y) is a digital image if:
- (x,y) are integers from Z^2, and
- f is a function that assigns a gray-level value (from R) to each distinct pair of coordinates (x,y) [quantization].

Gray levels are usually integers, so Z replaces R.

The digitization process requires decisions about:
- values for N and M (where N x M is the size of the image array), and
- the number of discrete gray levels allowed for each pixel.

Usually, in DIP these quantities are integer powers of two: N = 2^n, M = 2^m and G = 2^k, where G is the number of gray levels.

Another assumption is that the discrete levels are equally spaced between 0 and L-1 in the gray scale.

If b is the number of bits required to store a digitized image, then b = N x M x k (if M = N, then b = N^2 k).
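
A quick numerical check of the storage formula b = N x M x k. The image size and bit depth used here are illustrative values, not taken from the text.

```python
# Storage required for a digitized image: b = N x M x k bits.
# Assumed example: a square 1024 x 1024 image with 256 gray levels (k = 8).

def storage_bits(N, M, k):
    """Number of bits needed to store an N x M image with k bits per pixel."""
    return N * M * k

N = M = 1024
k = 8                      # 2**8 = 256 gray levels
b = storage_bits(N, M, k)
print(b, "bits =", b // 8, "bytes =", b / (8 * 1024**2), "MiB")
# 8388608 bits = 1048576 bytes = 1.0 MiB
```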

How many samples and gray levels are required for a good approximation? The resolution (the degree of discernible detail) of an image depends on the number of samples and the number of gray levels: the more these parameters are increased, the closer the digitized array approximates the original image. But storage and processing requirements increase rapidly as a function of N, M, and k.

Different versions (images) of the same object can be generated by:
- varying the numbers N and M,
- varying k (the number of bits), or
- varying both.

Conclusions:
- The quality of an image increases as N and k increase.
- Sometimes, for fixed N, the quality can be improved by decreasing k (increased contrast).
- For images with large amounts of detail, only a few gray levels may be needed.

Chapter 2: Two-Dimensional Systems

Fourier transform and Fast Fourier Transform
The Fourier Transform is an important image processing tool which is used to decompose an image into its sine and cosine components. The output of the transformation represents the image in the Fourier or frequency domain, while the input image is the spatial domain equivalent. In the Fourier domain image, each point represents a particular frequency contained in the spatial domain image.

The Fourier Transform is used in a wide range of applications, such as image analysis, image filtering, image reconstruction and image compression.

How it works
As we are only concerned with digital images, we will restrict this discussion to the Discrete Fourier Transform (DFT). The DFT is the sampled Fourier Transform and therefore does not contain all frequencies forming an image, but only a set of samples which is large enough to fully describe the spatial domain image. The number of frequencies corresponds to the number of pixels in the spatial domain image, i.e. the image in the spatial and Fourier domains are of the same size.

For a square image of size N x N, the two-dimensional DFT is given by:

F(k,l) = Σ_{a=0}^{N-1} Σ_{b=0}^{N-1} f(a,b) e^{-j2π(ka/N + lb/N)}

where f(a,b) is the image in the spatial domain and the exponential term is the basis function corresponding to each point F(k,l) in the Fourier space. The equation can be interpreted as: the value of each point F(k,l) is obtained by multiplying the spatial image by the corresponding basis function and summing the result.

The basis functions are sine and cosine waves with increasing frequencies, i.e. F(0,0) represents the DC component of the image, which corresponds to the average brightness, and F(N-1,N-1) represents the highest frequency.

In a similar way, the Fourier image can be re-transformed to the spatial domain. The inverse Fourier transform is given by:

f(a,b) = (1/N^2) Σ_{k=0}^{N-1} Σ_{l=0}^{N-1} F(k,l) e^{j2π(ka/N + lb/N)}

The two-dimensional DFT can be computed as a series of one-dimensional DFTs, first along one dimension and then along the other. Even with this saving, the ordinary one-dimensional DFT has N^2 complexity. This can be reduced to N log2 N if we employ the Fast Fourier Transform (FFT) to compute the one-dimensional DFTs. This is a significant improvement, in particular for large images. There are various forms of the FFT and most of them restrict the size of the input image that may be transformed, often to N = 2^n where n is an integer. The mathematical details are well described in the literature.

The Fourier Transform produces a complex-number valued output image which can be displayed as two images, either the real and imaginary parts or the magnitude and phase. In image processing, often only the magnitude of the Fourier Transform is displayed, as it contains most of the information about the geometric structure of the spatial domain image. However, if we want to re-transform the Fourier image into the correct spatial domain after some processing in the frequency domain, we must make sure to preserve both the magnitude and phase of the Fourier image.

The Fourier domain image has a much greater range of values than the image in the spatial domain. Hence, to be sufficiently accurate, its values are usually calculated and stored as floating point values.
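
A minimal NumPy sketch of the 2-D DFT described above, using the library FFT routines. The synthetic test image is an arbitrary example, not from the text.

```python
import numpy as np

# 2-D DFT of an N x N image via the FFT; magnitude and phase are separated
# for display, and the inverse transform recovers the spatial-domain image.

N = 64
x, y = np.meshgrid(np.arange(N), np.arange(N))
image = np.sin(2 * np.pi * 4 * x / N) + 0.5 * np.cos(2 * np.pi * 8 * y / N)

F = np.fft.fft2(image)            # complex-valued Fourier image F(k, l)
F_shifted = np.fft.fftshift(F)    # move the DC component F(0,0) to the centre for display

magnitude = np.abs(F_shifted)     # usually what is displayed (often as log(1 + |F|))
phase = np.angle(F_shifted)       # must be kept if the image is to be reconstructed

reconstructed = np.fft.ifft2(F).real    # inverse transform back to the spatial domain
print(np.allclose(reconstructed, image))   # True
```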

Cosine transform
In mathematics, the Fourier cosine transform is a special case of the continuous Fourier transform, arising naturally when attempting to transform an even function. In particular, a DCT (discrete cosine transform) is a Fourier-related transform similar to the discrete Fourier transform (DFT), but using only real numbers. DCTs are equivalent to DFTs of roughly twice the length, operating on real data with even symmetry (since the Fourier transform of a real and even function is real and even), where in some variants the input and/or output data are shifted by half a sample. There are eight standard DCT variants, of which four are common.

Sine transform
In mathematics, the Fourier sine transform is a special case of the continuous Fourier transform, arising naturally when attempting to transform an odd function. The discrete sine transform (DST) is a Fourier-related transform similar to the discrete Fourier transform (DFT), but using a purely real matrix. It is equivalent to the imaginary parts of a DFT of roughly twice the length, operating on real data with odd symmetry (since the Fourier transform of a real and odd function is imaginary and odd), where in some variants the input and/or output data are shifted by half a sample.

A related transform is the discrete cosine transform (DCT), which is equivalent to a DFT of real and even functions. See the DCT article for a general discussion of how the boundary conditions relate the various DCT and DST types.

Hadamard transform
The Hadamard transform (also known as the Walsh-Hadamard transform, Hadamard-Rademacher-Walsh transform, Walsh transform, or Walsh-Fourier transform) is an example of a generalized class of Fourier transforms. It is named for the French mathematician Jacques Solomon Hadamard, the German-American mathematician Hans Adolph Rademacher, and the American mathematician Joseph Leonard Walsh. It performs an orthogonal, symmetric, involutary, linear operation on 2^m real numbers (or complex numbers, although the Hadamard matrices themselves are purely real).

The Hadamard transform can be regarded as being built out of size-2 discrete Fourier transforms (DFTs), and is in fact equivalent to a multidimensional DFT of size 2 x 2 x ... x 2. It decomposes an arbitrary input vector into a superposition of Walsh functions.

The Hadamard transform H_m is a 2^m x 2^m matrix, the Hadamard matrix (scaled by a normalization factor), that transforms 2^m real numbers x_n into 2^m real numbers X_k. We can define the Hadamard transform in two ways: recursively, or by using the binary (base-2) representation of the indices n and k.

Recursively, we define the Hadamard transform H_0 by the identity H_0 = 1, and then define H_m for m > 0 by:

H_m = (1/√2) [ H_{m-1}   H_{m-1} ;  H_{m-1}   -H_{m-1} ]

where the factor 1/√2 is a normalization that is sometimes omitted. Thus, other than this normalization factor, the Hadamard matrices are made up entirely of 1 and -1.

Equivalently, we can define the Hadamard matrix by its (k,n)-th entry by writing k = Σ_j k_j 2^j and n = Σ_j n_j 2^j, where the k_j and n_j are the binary digits (0 or 1) of k and n, respectively. In this case, we have:

(H_m)_{k,n} = (1/2^{m/2}) (-1)^{Σ_j k_j n_j}

This is exactly the multi-dimensional DFT, normalized to be unitary, if we regard the inputs and outputs as multidimensional arrays indexed by the n_j and k_j, respectively. Some examples of the Hadamard matrices follow:

H_0 = (1)

H_1 = (1/√2) [ 1   1 ;  1   -1 ]

(This H_1 is precisely the size-2 DFT. It can also be regarded as the Fourier transform on the two-element additive group Z/(2).)
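
A small sketch of the recursive construction of the normalised Hadamard matrix and its use as a transform. The test vector is an arbitrary example.

```python
import numpy as np

# Recursive construction of the normalised Hadamard matrix H_m:
# H_0 = [1]; H_m is built from two copies of H_(m-1) with a 1/sqrt(2) factor.

def hadamard(m):
    if m == 0:
        return np.array([[1.0]])
    H = hadamard(m - 1)
    return (1.0 / np.sqrt(2)) * np.block([[H, H], [H, -H]])

H2 = hadamard(2)                           # 4 x 4 matrix (2^m with m = 2)
x = np.array([1.0, 2.0, 3.0, 4.0])
X = H2 @ x                                 # Walsh-Hadamard transform of a length-4 vector

print(np.allclose(H2 @ H2.T, np.eye(4)))   # True: orthogonal (and symmetric)
print(np.allclose(H2 @ X, x))              # True: with this normalisation, applying it twice gives x back
```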

The rows of the Hadamard matrices are the Walsh functions.

Haar Transform
Probably the simplest useful energy compression process is the Haar transform. In one dimension, it transforms a 2-element vector (x(1), x(2))^T into (y(1), y(2))^T using y = Tx, where

T = (1/√2) [ 1   1 ;  1   -1 ]

Thus y(1) and y(2) are simply the sum and difference of x(1) and x(2), scaled by 1/√2 to preserve energy. Note that T is an orthogonal matrix because its rows are orthogonal to each other (their dot product is zero) and they are normalized to unit magnitude; therefore T^(-1) = T^T. Hence we may recover x from y using x = T^T y.

In two dimensions, x and y become 2 x 2 matrices. We may transform first the columns of x, by premultiplying by T, and then the rows of the result, by postmultiplying by T^T. Hence

Y = T X T^T

and to invert:

X = T^T Y T

To show more clearly what is happening, if

X = [ x11   x12 ;  x21   x22 ]

then

Y = (1/2) [ x11+x12+x21+x22   x11-x12+x21-x22 ;  x11+x12-x21-x22   x11-x12-x21+x22 ]
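
A minimal sketch of the 2 x 2 Haar transform Y = T X T^T and its inverse. The sample pixel block is an arbitrary example.

```python
import numpy as np

# 2x2 Haar transform of an image block and its exact inverse.

T = (1.0 / np.sqrt(2)) * np.array([[1.0,  1.0],
                                   [1.0, -1.0]])

X = np.array([[100.0, 102.0],
              [ 96.0,  98.0]])    # a 2x2 block of pixel values (made up)

Y = T @ X @ T.T        # forward transform: average plus horizontal/vertical/diagonal detail
X_back = T.T @ Y @ T   # inverse transform

print(Y)
print(np.allclose(X_back, X))   # True: the transform is perfectly invertible
```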

Chapter 3: Image Enhancement and Restoration

Point operations
The simplest image filters are point operations, where the new value of a pixel is determined only by the original value of that single pixel alone.

1. Contrast Stretching
Contrast stretching (often called normalization) is a simple image enhancement technique that attempts to improve the contrast in an image by 'stretching' the range of intensity values it contains to span a desired range of values, e.g. the full range of pixel values that the image type concerned allows.

Before the stretching can be performed it is necessary to specify the upper and lower pixel value limits over which the image is to be normalized. Often these limits will just be the minimum and maximum pixel values that the image type concerned allows. For example, for 8-bit graylevel images the lower and upper limits might be 0 and 255. Call the lower and upper limits a and b respectively.

The simplest sort of normalization then scans the image to find the lowest and highest pixel values currently present in the image. Call these c and d. Then each pixel P is scaled using the following function:

P_out = (P_in - c) * ((b - a) / (d - c)) + a

Values below 0 are set to 0 and values above 255 are set to 255.

The problem with this is that a single outlying pixel with either a very high or very low value can severely affect the value of c or d, and this could lead to very unrepresentative scaling.
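
A minimal sketch of the contrast stretching formula above with [a, b] = [0, 255]. The small test image is an arbitrary example.

```python
import numpy as np

# Contrast stretching: P_out = (P_in - c) * (b - a) / (d - c) + a.

def contrast_stretch(img, a=0, b=255):
    c, d = img.min(), img.max()                   # lowest and highest values present
    out = (img.astype(float) - c) * (b - a) / (d - c) + a
    return np.clip(out, a, b).astype(np.uint8)    # values outside [a, b] are clipped

img = np.array([[ 60,  70,  80],
                [ 90, 100, 110],
                [120, 130, 140]], dtype=np.uint8)

print(contrast_stretch(img))
# The original range 60..140 is stretched to cover the full 0..255 range.
```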

2. Clipping and Thresholding Thresholding an image is the process of making all pixels above a certain thresh old level white, others black. When changing the brightness of an image, a constant is added or subtracted from the luminnance of all sample values. This is equivalent to shifting the content s of the histogram left (subtraction) or right (addition). 3. Digital Negative The Negative of an image with gray levels in the range (0, L-1) is obtained by u sing the negative transformation is given by the expression: S = L -1 r. Where r is the gray level value of specified pixel point and S be the equivalent negative value of r. 4. Intensity Level Slicing

4. Intensity Level Slicing

Gray level slicing
Grey level slicing is the spatial domain equivalent to band-pass filtering. A grey level slicing function can either emphasize a group of intensities and diminish all others, or it can emphasize a group of grey levels and leave the rest alone. Highlighting a specific range of gray levels in an image is often desired. Applications include enhancing features such as masses of water, crop regions, or certain elevation areas in satellite imagery. Another application is enhancing flaws in x-ray images.

There are two main approaches:
- highlight a range of intensities while diminishing all others to a constant low level;
- highlight a range of intensities but preserve all others.

The figure illustrates the intensity level slicing process. The left figures show a transformation function that highlights a range [A,B] while diminishing all the others. The right figures highlight a range [A,B] but preserve all the others.

Bit Plane Slicing
Bit plane slicing is a new way of looking at an image. In bit plane slicing the image is considered to be a stack of binary images. The images closest to the bottom are least significant and the images on top are most significant. Instead of highlighting intensity ranges, highlighting the contribution made to the total image appearance by a specific bit might be desired. Imagine that the image is composed of eight 1-bit planes, ranging from plane 0 for the least significant bit to plane 7 for the most significant bit.

Bit-plane slicing reveals that only the five highest order bits contain visually significant data. Also, note that plane 7 corresponds exactly with an image thresholded at gray level 128.

Fig: A model of bit planes
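
A minimal sketch of bit-plane slicing for an 8-bit image. The tiny test image is an arbitrary example.

```python
import numpy as np

# Decompose an 8-bit image into eight binary planes (plane 0 = LSB, plane 7 = MSB)
# and rebuild it; keeping only the top five planes preserves most visible detail.

img = np.array([[200,  50],
                [128, 255]], dtype=np.uint8)

planes = [(img >> i) & 1 for i in range(8)]       # planes[0] = LSB ... planes[7] = MSB

reconstructed = sum((planes[i].astype(np.uint16) << i) for i in range(8)).astype(np.uint8)
print(np.array_equal(reconstructed, img))         # True

top_five = sum((planes[i].astype(np.uint16) << i) for i in range(3, 8)).astype(np.uint8)
print(top_five)                                   # image rebuilt from planes 3..7 only
```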

Histogram Processing

1. The Histogram
The histogram shows us how many times a particular grey level appears in an image. Image histograms consist of peaks and low plains, where peaks represent many pixels concentrated in a few grey levels while plains account for a small number of pixels distributed over a wider range of grey levels.

The histogram of a digital image with gray levels in the range [0, L-1] is a discrete function

p(r_k) = n_k / n

where r_k is the k-th gray level, n_k is the number of pixels in the image with that gray level, n is the total number of pixels in the image, and k = 0, 1, 2, ..., L-1. p(r_k) gives an estimate of the probability of occurrence of gray level r_k. A plot of this function for all values of k provides a global description of the appearance of an image. The following are the histograms of four basic types of images.

2. Histogram Equalization
Histogram equalization is a process by which an image which has very low contrast (signified by a grouping of large peaks in a small area of the image's histogram) can be modified to bring out details not previously visible. The histogram is really just a probability density function (PDF), and the idea behind histogram equalization is to get this PDF as close to uniform as possible. The transfer function for histogram equalization is proportional to the cumulative histogram.

Suppose that the pixel values are continuous quantities that have been normalized so that they lie in the interval [0, 1]. The variable r represents the gray level in the image to be enhanced, with r = 0 representing black and r = 1 representing white. For any r in the interval [0, 1], consider a transformation of the form

s = T(r)

The transformation produces a level s for every pixel value r in the original image. It is assumed that this transformation function satisfies the following two conditions:
(a) T(r) is single-valued and monotonically increasing in the interval 0 < r < 1; and
(b) 0 < T(r) < 1 for 0 < r < 1.

The inverse transformation from s to r is denoted by

r = T^{-1}(s),  0 < s < 1

The assumption is that T^{-1}(s) also satisfies conditions (a) and (b) with respect to the variable s. The gray levels in an image may be viewed as random quantities in the interval [0, 1]. In the continuous case, the original and the transformed gray levels can be characterized by their probability density functions p_r(r) and p_s(s) respectively. Using probability theory, if p_r(r) and T(r) are known and T^{-1}(s) satisfies condition (a), then

p_s(s) = p_r(r) dr/ds,  where r = T^{-1}(s)    (1)

3. Histogram Specification
As mentioned before, histogram equalization attempts to produce an image with a uniform PDF of pixel intensities from an image with poor contrast. This is useful, but not always the best type of transform. There is another technique called histogram specification in which an image's histogram is transformed according to a desired function.
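
A minimal sketch of discrete histogram equalization: the mapping is proportional to the cumulative histogram, the discrete analogue of s = T(r). The random low-contrast test image is an arbitrary example.

```python
import numpy as np

# Histogram equalization of an 8-bit image via the cumulative distribution.

def equalize(img, L=256):
    hist = np.bincount(img.ravel(), minlength=L)      # n_k for each gray level
    p = hist / img.size                               # p(r_k) = n_k / n
    cdf = np.cumsum(p)                                # cumulative distribution
    lut = np.round((L - 1) * cdf).astype(np.uint8)    # s_k = (L-1) * sum of p(r_j), j <= k
    return lut[img]

rng = np.random.default_rng(0)
img = rng.integers(80, 140, size=(64, 64), dtype=np.uint8)   # low-contrast image
out = equalize(img)
print(img.min(), img.max(), "->", out.min(), out.max())      # the gray levels are spread out
```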

Spatial operations

1. Image Subtraction
The technique is basic: subtracting the grey level intensities of one image from another in a pixel-wise fashion. Image subtraction is an effective way to watch changes between images. When one image is slightly different from another, the pixel-wise difference between the two images will be exactly what is different between them, while that which is similar between the two images will be subtracted out.

The difference between two images f(x,y) and h(x,y) is expressed as:

g(x,y) = f(x,y) - h(x,y)

and involves computing the difference between all pairs of corresponding pixels of f and h. Image subtraction has numerous applications in enhancement and segmentation.

A classical application of subtraction for enhancement can be found in the area of medical imaging called mask mode radiography or digital subtraction angiography. In clinical practice, Digital Subtraction Angiography (DSA) is a powerful technique for the visualization of blood vessels. A sequence of X-ray projection images is taken to show the passage of a bolus of injected contrast material through one or more vessels of interest. The example presents an image taken prior to injection (the mask image) and an image containing contrasted vessels (the live or contrast image); background structures are then largely removed by means of subtraction. Note that the vessels are clearly visible in the DSA image.

2. Image Averaging
The technique is also basic: averaging the grey level values for a series of images. Image averaging is an effective way to remove noise from a series of noisy images. This does, however, rely on the images being corrupted with additive, uncorrelated, zero-mean noise. The idea of image averaging follows simply from these constraints: if the noise is zero mean over a large number of samples, it will average to zero.

Consider a noisy image g(x,y) formed by the addition of noise n(x,y) to an original image f(x,y):

g(x,y) = f(x,y) + n(x,y)

The assumption is that the noise at each pixel is uncorrelated and has zero average value (white noise). It is a simple statistical problem to show that if an image is formed by averaging M different noisy images, then the expected value of the average approaches the original image f.

3. Spatial Filtering
Spatial filtering is a form of FIR filtering. The filter is actually a mask of weights arranged in a rectangular pattern (see the figure of a 3 x 3 spatial mask). The process is one of sliding the mask along the image and performing a multiply-and-accumulate operation on the pixels covered by the mask.

Example: Given that the mask above is placed on top of nine pixels in an image denoted as z1...z9, what is the filtered value of the pixel z5? (Assume that the pixels are arranged in such a way that w1 covers z1, and so on.)
Solution: w1z1 + w2z2 + ... + w9z9

The following types of filtering operators are possible:
- Lowpass filter - rejects all but the low-frequency components of a signal.
- Highpass filter - rejects all but the high-frequency components of a signal.
- Bandpass filter - rejects all but a particular range of frequency components of a signal.

1. Smoothing Filters
a. Lowpass Spatial Filtering: The simplest form of lowpass filtering is uniform neighborhood averaging. This is accomplished using a spatial mask of all ones. Since this could lead to values which exceed the maximum gray level intensity, the mask is scaled down by the number of coefficients in the filter. The effect of lowpass filtering is to make edges (high frequency image components) more diffuse and of lower contrast.

Example (5 x 5 averaging mask):

1/25 *
1 1 1 1 1
1 1 1 1 1
1 1 1 1 1
1 1 1 1 1
1 1 1 1 1

When using this filter, a single pixel with a very unrepresentative value can significantly affect the mean value of all the pixels in its neighborhood. When the filter neighborhood straddles an edge, the filter will interpolate new values for pixels on the edge and so will blur that edge. This may be a problem if sharp edges are required in the output.
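
A minimal sketch of lowpass (neighborhood averaging) filtering with a 3 x 3 mask of ones scaled by 1/9; borders are left unchanged for simplicity. The test image is an arbitrary example.

```python
import numpy as np

# Sliding a 3x3 averaging mask over the image: each output pixel is the mean
# of its 3x3 neighborhood (a multiply-and-accumulate with all weights = 1/9).

def average_filter(img):
    out = img.astype(float)
    for i in range(1, img.shape[0] - 1):
        for j in range(1, img.shape[1] - 1):
            out[i, j] = img[i-1:i+2, j-1:j+2].mean()
    return out

img = np.array([[10, 10, 10, 10, 10],
                [10, 10, 10, 10, 10],
                [10, 10, 250, 10, 10],    # a single bright outlier
                [10, 10, 10, 10, 10],
                [10, 10, 10, 10, 10]], dtype=float)

print(average_filter(img))
# The outlier is smeared over its whole 3x3 neighborhood, illustrating the blurring effect.
```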

b. Median Filtering: Median filtering is done one neighborhood at a time; however, the mask that it uses is not a linear function. A median filter replaces the pixel in question with the median of the neighborhood. This is useful in removing noise from a single image: the median filter removes large noise spikes from the signal (image). The median is calculated by first sorting all the pixel values from the surrounding neighborhood into numerical order and then replacing the pixel being considered with the middle pixel value.

123 125 126 130 140
122 124 126 127 135
118 120 150 125 134
119 115 119 123 133
111 116 110 120 130

Neighborhood values: 115, 119, 120, 123, 124, 125, 126, 127, 150
Median value: 124

As can be seen, the central pixel value of 150 is rather unrepresentative of the surrounding pixels and is replaced with the median value. The median is a more robust average than the mean, so a single very unrepresentative pixel in a neighborhood will not affect the median value significantly. Since the median value must actually be the value of one of the pixels in the neighborhood, the median filter does not create new unrealistic pixel values when the filter straddles an edge. For this reason the median filter is much better at preserving sharp edges than the mean filter.

2. Sharpening Filters
a. Highpass Spatial Filtering: The effect that highpass filters have on an image is exactly opposite that of lowpass filters. The primary goal of highpass filtering is to highlight detail or to enhance detail lost due to blurring or faults in image acquisition. This is achieved using a mask having a positive value in its center location and negative coefficients in the rest. The most basic highpass filter is of the form shown below:

1/9 *
-1 -1 -1
-1  8 -1
-1 -1 -1

Note that the sum of the filter coefficients is zero; this means that when the mask is over an area of constant intensity the result will be zero or very small. This is another way of saying that the filter diminishes the low frequency terms of the image. The result is that edges are highlighted and the background is diminished.

A highpass filtered image may also be computed as the difference between the original image and a lowpass filtered version of the image:

Highpass = Original - Lowpass

If the original image is multiplied by an amplification factor A, a high boost or frequency enhanced image will result:

High boost = A * Original - Lowpass
           = (A - 1) * Original + Original - Lowpass
           = (A - 1) * Original + Highpass
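
A minimal sketch of the two filters just described, the median filter and the basic highpass mask, applied to the 5 x 5 example image used above (borders are ignored for brevity).

```python
import numpy as np

# Median filtering (nonlinear) and highpass sharpening (linear mask) on a small image.

def median_filter(img):
    out = img.copy()
    for i in range(1, img.shape[0] - 1):
        for j in range(1, img.shape[1] - 1):
            out[i, j] = np.median(img[i-1:i+2, j-1:j+2])   # middle value of the 3x3 neighborhood
    return out

def highpass(img):
    mask = np.array([[-1, -1, -1],
                     [-1,  8, -1],
                     [-1, -1, -1]]) / 9.0                  # coefficients sum to zero
    out = np.zeros_like(img, dtype=float)
    for i in range(1, img.shape[0] - 1):
        for j in range(1, img.shape[1] - 1):
            out[i, j] = np.sum(mask * img[i-1:i+2, j-1:j+2])
    return out

img = np.array([[123, 125, 126, 130, 140],
                [122, 124, 126, 127, 135],
                [118, 120, 150, 125, 134],
                [119, 115, 119, 123, 133],
                [111, 116, 110, 120, 130]], dtype=float)

print(median_filter(img)[2, 2])   # 124.0: the unrepresentative 150 is replaced by the median
print(highpass(img)[2, 2])        # large response at the noise spike, near zero in flat areas
```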

b. Derivative Filters: Spatial domain derivative filters are masks which approximate the derivative on a discrete image. These filters, like highpass filters, tend to highlight edges and sharpen the image. There are different masks which approximate the derivative, such as the Roberts cross or Prewitt operators. Averaging is analogous to integration and tends to blur details in an image; differentiation can be expected to have the opposite effect and thus sharpens an image. The most common method of differentiation in image processing applications is the gradient.

For a function f(x,y), the gradient of f at coordinate (x,y) is defined as the vector:

∇f = [ ∂f/∂x , ∂f/∂y ]^T

The magnitude of this vector is:

|∇f| = [ (∂f/∂x)^2 + (∂f/∂y)^2 ]^(1/2)

This magnitude is the basis for various approaches to image differentiation. Consider a 3 x 3 neighborhood with pixels z1...z9 and z5 at the center. The magnitude equation can be approximated at point z5 in a number of ways; one approximation uses the cross differences:

|∇f| ≈ [ (z5 - z9)^2 + (z6 - z8)^2 ]^(1/2)

Absolute values can be used instead of the squares and square root:

|∇f| ≈ |z5 - z9| + |z6 - z8|

This pair of differences can be implemented with the 2 x 2 Roberts cross-gradient operators.
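
A minimal sketch of gradient-based edge detection using the Roberts cross approximation with absolute values, as described above. The step-edge test image is an arbitrary example.

```python
import numpy as np

# Roberts cross gradient magnitude: |z5 - z9| + |z6 - z8| at every pixel.

def roberts_magnitude(img):
    img = img.astype(float)
    gx = img[:-1, :-1] - img[1:, 1:]      # z5 - z9 (diagonal difference)
    gy = img[:-1, 1:] - img[1:, :-1]      # z6 - z8 (anti-diagonal difference)
    return np.abs(gx) + np.abs(gy)        # absolute values instead of squares and square root

img = np.zeros((6, 6))
img[:, 3:] = 100.0                        # a vertical step edge

print(roberts_magnitude(img))
# The response is large along the edge and zero in the flat regions.
```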

Band reject filter
The best way to implement a band reject filter is to sum together the outputs of a low pass and a high pass filter. Some points are worth noting:
- Although both filters have identical 3 dB points, there is much more rejection of unwanted signals in the stop band with the low pass summed with the high pass than there is with the notch filter, with the single exception of the center frequency.
- The performance increase that comes with summing low pass and high pass filter outputs comes at the expense of an additional opamp, the opamp that performs the summing function.
- Higher order low pass and high pass filters will improve the performance of the band reject filter.
- The farther apart the passbands are, the better the performance of the band reject filter.

Band pass filter
A band pass filter passes only a particular range of frequency components; it can be obtained, for example, by cascading a low pass and a high pass filter whose passbands overlap.

Magnification

Replication
The simplest zooming algorithms use replication to increase the size of an array. This is called nearest-neighbor or zero-order-hold interpolation: each sample is simply replicated to give rise to a new row and column. This technique, although computationally simple, may result in objectionable and highly visible edge artifacts and unnatural blockiness in the image, especially at higher magnification factors.

Interpolation
The functions Interpolate and Decimate change the dimensions of an array by integer factors. Interpolation of order 2 doubles the size of a 1-D array and quadruples the size of an image: a new value is inserted between every pair of elements, and linear filtering (for example, a simple two-point linear interpolator) is used to calculate the new value. Linear interpolation generally gives satisfactory results with a minimum of computational overhead and is therefore the default choice. The default filter is a linear (triangular) filter whose width depends on the magnification or minification factor k.
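
A sketch of zooming a small array both by replication (nearest neighbour) and by simple linear interpolation. The 3 x 3 test array and the factor of 2 are assumed example values.

```python
import numpy as np

# Zoom by pixel replication (zero-order hold) and by two-point linear interpolation.

def zoom_replicate(img, factor):
    return np.repeat(np.repeat(img, factor, axis=0), factor, axis=1)

def zoom_linear(img, factor):
    rows = np.linspace(0, img.shape[0] - 1, img.shape[0] * factor)
    cols = np.linspace(0, img.shape[1] - 1, img.shape[1] * factor)
    # interpolate along columns for each row, then along rows for each new column
    tmp = np.array([np.interp(cols, np.arange(img.shape[1]), r) for r in img.astype(float)])
    return np.array([np.interp(rows, np.arange(img.shape[0]), c) for c in tmp.T]).T

img = np.array([[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]], dtype=float)

print(zoom_replicate(img, 2))   # blocky: every sample simply duplicated
print(zoom_linear(img, 2))      # smoother: new samples lie between the originals
```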

Chapter 4: Image Coding and Compression

Coding redundancy
Usually in a digital image the number of bits used for the representation of each pixel is constant for all pixels, regardless of the value of the pixel and the frequency of occurrence of that value in the image. In general, coding redundancy is present in an image if the possible values are coded in such a way that they use more code symbols than absolutely necessary. Coding redundancy is almost always present in images that are coded without taking into consideration the frequency of occurrence of each value. The most popular method for reducing coding redundancy is variable length coding: it reduces the average number of bits used per pixel in the image, and thus the redundant code.

Interpixel redundancy
In many coding techniques designed for compression one usually assumes that there is no relationship between closely located pixels in the image. Obviously this is not true: if there were no relation between neighboring pixels, the image would fail to represent any type of information. As a matter of fact, the relation is often so strong that the value of one pixel can be approximated reasonably well from the values of the neighboring pixels. Interpixel redundancy refers to this redundant information included in each pixel. Suppose that a pixel a represents the information set A and another neighboring pixel b represents the information set B; the interpixel redundancy might be defined as the intersection between the two sets A and B. Interpixel redundancy occurs in large quantities in natural images such as portraits and landscapes.

Psychovisual redundancy
It is known that the human eye does not respond to all visual information with equal sensitivity; some information is simply of less relative importance. This information is referred to as psychovisually redundant and can be eliminated without introducing any significant difference to the human eye. The reduction of redundant visual information has practical applications in image compression. Since the reduction of psychovisual redundancy results in quantitative loss of information, this type of reduction is referred to as quantization. The most common technique for quantization is the reduction of the number of colors used in the image, i.e. color quantization. Since some information is lost, color quantization is an irreversible process, so compression techniques that use such a process are lossy. It should be noted that even though this method of compression is lossy, in situations where such a compression technique is acceptable the compression can be very effective and reduce the size of the image considerably.

Pixel Coding

1. Run-length encoding
Run-length encoding (RLE) is a very simple form of data compression in which runs of data (that is, sequences in which the same data value occurs in many consecutive data elements) are stored as a single data value and count, rather than as the original run. This is most useful on data that contains many such runs: for example, relatively simple graphic images such as icons, line drawings, and animations. It is not recommended for use with files that don't have many runs, as it could potentially double the file size.

For example, consider a screen containing plain black text on a solid white background. There will be many long runs of white pixels in the blank space, and many short runs of black pixels within the text. Let us take a hypothetical single scan line, with B representing a black pixel and W representing white:

WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW

If we apply the run-length encoding (RLE) data compression algorithm to the above hypothetical scan line, we get the following:

12W1B12W3B24W1B14W

Interpret this as twelve W's, one B, twelve W's, three B's, etc. The run-length code represents the original 67 characters in only 18 (see the sketch at the end of this section). On the other hand:

WBWBWBWBWBWBWB

would become:

1W1B1W1B1W1B1W1B1W1B1W1B1W1B

Here the encoded data is longer than the original. Of course, the actual format used for the storage of images is generally binary rather than ASCII characters like this, but the principle remains the same. Even binary data files can be compressed with this method; file format specifications often dictate repeated bytes in files as padding space. However, newer compression methods such as DEFLATE often use LZ77-based algorithms, a generalization of run-length encoding that can take advantage of runs of strings of characters (such as BWWBWWBWWBWW).

2. Bit Plane Coding
Let I be an image where every pixel value is n bits long. Express every pixel in binary using n bits and form from I n binary matrices (called bitplanes), where the i-th matrix consists of the i-th bits of the pixels of I.

Example: let I be the following 2x2 image where the pixels are 3 bits long:

101 110
111 011

The corresponding 3 bitplanes are:

Plane 0 (LSB):  1 0     Plane 1:  0 1     Plane 2 (MSB):  1 1
                1 1               1 1                     1 0

Bitplane coding: code the bitplanes separately, using RLE (flatten each plane row-wise into a 1-D array), Golomb coding, or any other lossless compression technique.
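
A minimal run-length encoder and decoder reproducing the scan line example above.

```python
import re

# Run-length encoding of a character stream: each run is stored as count + value.

def rle_encode(s):
    runs, count = [], 1
    for prev, cur in zip(s, s[1:]):
        if cur == prev:
            count += 1
        else:
            runs.append(f"{count}{prev}")
            count = 1
    runs.append(f"{count}{s[-1]}")
    return "".join(runs)

def rle_decode(encoded):
    return "".join(int(n) * ch for n, ch in re.findall(r"(\d+)(\D)", encoded))

# the hypothetical scan line from the text: 12 W, 1 B, 12 W, 3 B, 24 W, 1 B, 14 W
line = 12 * "W" + "B" + 12 * "W" + 3 * "B" + 24 * "W" + "B" + 14 * "W"
code = rle_encode(line)
print(code)                        # 12W1B12W3B24W1B14W  (18 characters instead of 67)
print(rle_decode(code) == line)    # True
```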

Predictive Encoding
Predictive coding operates by reducing the interpixel redundancy of closely located pixels. This type of coding extracts and stores only the new information included in each new pixel, defined as the difference between the actual value of the pixel and its predicted value; the prediction is based on the known values of the closely located pixels. The prediction error can be defined as:

e(x,y) = f(x,y) - g(x,y)

where f(x,y) is the actual value of the pixel at location (x,y) and g(x,y) is the predicted value of that pixel. The obtained values are then variable length coded to form the compressed stream. It is obvious that predictive coding assumes that the value of each pixel can be predicted from past values. The greatest compression ratios are obtained by a predictor function with a constant output: for example, if e(x,y) = c for all pixels in all given images, the image can be compressed to only three values describing the size of the image (two integers) and c. It should be noted that there is no optimal predictive function for all given images. In practical cases a coder might have access to several different predictive functions and choose one using a heuristic method; in such cases the choice of the function must be included in the compressed data stream. Predictive coding is particularly effective in natural images where there is a great quantity of interpixel redundancy present.

Interframe Coding
In video compression, interframe coding is the coding of the differences between frames. Interframe coding often provides substantial compression because in many motion sequences only a small percentage of the pixels are actually different from one frame to another. However, it depends entirely on the content. With interframe coding, a video sequence is made up of keyframes that contain the entire image. In between the keyframes are delta frames, which are encoded with only the incremental differences. Depending on the compression method, a new keyframe is generated based on a set number of frames or when a certain percentage of pixels in the material have changed.

Difference Coding
This type of coding represents a data stream by storing only the first value; every other value is represented by the difference between it and the preceding value. For example, the stream [34 23 45 76 8] will be represented by [34 -11 22 31 -68] (see the sketch at the end of this section). Although this does not seem to reduce the size of the data, in most cases it produces a sequence of values in which some values occur more often than others, mainly due to the presence of interpixel redundancy. When such a data stream is obtained, it can be encoded using compression methods such as variable length coding.

Variable Length Coding
One approach to the reduction of coding redundancy is to assign a shorter code to the most frequently occurring symbols in the data stream, and a longer code to the least frequent ones. This concept is probably best illustrated through an example. Imagine a data stream formed from the integer set 1 to 3. If the frequency of occurrence of each symbol is not taken into consideration, the data stream must be represented by at least 2 bits per symbol. Suppose the data stream to be compressed is [1 1 2 3 1 1]. In a fixed length code approach the data stream will be represented by [00 00 01 10 00 00]. By assigning a variable length code to each symbol (for example 1 -> 1, 2 -> 00, 3 -> 01), the stream can be represented by [1 1 00 01 1 1], reducing the number of bits per symbol to about 1.33 and achieving a compression ratio of 3:2.

There are many methods for assigning a variable length code, all of which are based on the Huffman table described in the next section.
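
A minimal sketch of difference (predictive) coding reproducing the 1-D example from the Difference Coding paragraph above.

```python
# Difference coding: store the first value, then each value minus its predecessor.

def difference_encode(values):
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]

def difference_decode(diffs):
    out = [diffs[0]]
    for d in diffs[1:]:
        out.append(out[-1] + d)     # rebuild each value from the running total
    return out

stream = [34, 23, 45, 76, 8]
encoded = difference_encode(stream)
print(encoded)                                 # [34, -11, 22, 31, -68]
print(difference_decode(encoded) == stream)    # True
```
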
There are many methods for assigning variable-length codes, all of which are based on the Huffman table described in the next section.

Huffman coding

One of the most popular, and by all means most effective, methods of reducing coding redundancy by variable-length coding is the use of the Huffman table, developed by D. A. Huffman in 1952.

This method is statistical and produces lossless compression. The Huffman method produces the smallest average number of code symbols per source symbol. The Huffman approach can be divided into two main steps:

1. Source reduction.
2. Code assignment.

The input to the Huffman method is a list of all source symbols and the probabilities of their occurrence. In the first step, the two source symbols with the lowest probabilities are combined into one symbol that replaces them in the next source reduction. The frequency of the new symbol is the sum of the frequencies of the two component symbols. This process is repeated until only two symbols remain.

The next step is to assign an optimal code to each symbol. This starts with the two compound symbols obtained in the last reduction of step 1; clearly the optimal codes to assign to two symbols are 0 and 1. Each compound symbol is then decomposed into its two component symbols, starting with the symbol with the smallest frequency. Each of these new symbols inherits the code assigned to its compound, with one more bit appended to distinguish the two components. This process is repeated until no compound symbols remain in the symbol list. Note that after each symbol decomposition the list is sorted in descending order. The Huffman process is illustrated by the examples shown in fig. 1 (not reproduced here).

Although the process produces an optimal variable-length code for each symbol, there are (naturally!) some drawbacks. First of all, the process performs L-2 source reductions and L-2 code assignments for the general case of L symbols. This might not be desirable for full-color images with 24 bits per pixel (16,777,216 colors). Secondly, the compressed data cannot be decoded without the code table, so either the whole generated table or the table of symbol frequencies must be included with the compressed data; in the latter case the whole table must be regenerated, which increases decompression time. To partially overcome these problems, several variants of Huffman coding have been developed. The most usual is truncated Huffman coding, in which only the L most frequent symbols are Huffman coded and the remaining symbols are coded with a fixed-length code.
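A minimal sketch of the two steps (source reduction and code assignment) using a priority queue. The four-symbol alphabet and its probabilities are illustrative, not taken from the text.

```python
import heapq

def huffman_code(freqs):
    """Build a Huffman code table from a {symbol: probability} mapping."""
    # Each heap entry is (probability, tie_breaker, tree); a tree is either a
    # symbol or a pair of subtrees.  Popping the two least probable entries and
    # pushing their combination is one "source reduction" step.
    heap = [(p, i, s) for i, (s, p) in enumerate(freqs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p1, _, t1 = heapq.heappop(heap)
        p2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (p1 + p2, counter, (t1, t2)))
        counter += 1

    # Code assignment: walk the tree, appending 0 on one branch and 1 on the other.
    codes = {}
    def assign(tree, prefix):
        if isinstance(tree, tuple):
            assign(tree[0], prefix + "0")
            assign(tree[1], prefix + "1")
        else:
            codes[tree] = prefix or "0"
        return codes

    _, _, root = heap[0]
    return assign(root, "")

print(huffman_code({"a": 0.4, "b": 0.3, "c": 0.2, "d": 0.1}))
# frequent symbols get shorter codes, e.g. {'a': '0', 'b': '10', 'd': '110', 'c': '111'}
```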
Arithmetic Coding

Arithmetic coders produce near-optimal output for a given set of symbols and probabilities (the optimal value is -log2(P) bits for each symbol of probability P; see the source coding theorem). Compression algorithms that use arithmetic coding start by determining a model of the data, basically a prediction of what patterns will be found in the symbols of the message. The more accurate this prediction is, the closer to optimality the output will be.

Example: a simple, static model for describing the output of a particular monitoring instrument over time might be:

60% chance of symbol NEUTRAL
20% chance of symbol POSITIVE
10% chance of symbol NEGATIVE
10% chance of symbol END-OF-DATA. (The presence of this symbol means that the stream will be 'internally terminated', as is fairly common in data compression; the first and only time this symbol appears in the data stream, the decoder will know that the entire stream has been decoded.)

Models can of course handle other alphabets than the simple four-symbol set chosen for this example. More sophisticated models are also possible: higher-order modeling changes its estimation of the current probability of a symbol based on the symbols that precede it (the context), so that in a model for English text, for example, the percentage chance of "u" would be much higher when it follows a "Q" or a "q". Models can even be adaptive, so that they continuously change their prediction of the data based on what the stream actually contains. The decoder must have the same model as the encoder.

A simplified example

Now we discuss how a sequence of symbols is encoded. As a motivating example, consider the following simple problem: we have a sequence of three symbols, A, B, and C, each equally likely to occur. Simple block encoding would use 2 bits per symbol, which is wasteful: one of the four bit combinations is never used.

Instead, we represent the sequence as a rational number between 0 and 1 in base 3, where each digit represents a symbol. For example, with A = 0, B = 1 and C = 2, the sequence "ABBCAB" becomes 0.011201 in base 3. We then encode this ternary number using a fixed-point binary number of sufficient precision to recover it, such as 0.001011001 in binary; this is only 9 bits, 25% smaller than the naive block encoding. This is feasible for long sequences because there are efficient, in-place algorithms for converting the base of arbitrarily precise numbers. Finally, knowing the original string had length 6, we can simply convert back to base 3, round to 6 digits, and recover the string.
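A small sketch of this simplified scheme, using exact fractions: the message is read as a base-3 fraction, and binary digits are emitted until rounding back to 6 ternary digits recovers the message. It illustrates only the toy example above, not a full arithmetic coder; the names are illustrative.

```python
from fractions import Fraction

ALPHABET = "ABC"            # A=0, B=1, C=2, all equally likely
message = "ABBCAB"
n = len(message)

# Encode: read the message as the base-3 fraction 0.011201...
value = Fraction(0)
for i, ch in enumerate(message, start=1):
    value += Fraction(ALPHABET.index(ch), 3 ** i)

# Emit binary digits of that fraction until a decoder, which only sees the
# truncated binary value, can round back to the original 6 ternary digits.
bits, approx, place = "", Fraction(0), Fraction(1, 2)
while True:
    if approx + place <= value:
        approx += place
        bits += "1"
    else:
        bits += "0"
    place /= 2

    # Decode attempt: round the binary approximation to the nearest multiple
    # of 3**-n and read off its base-3 digits.
    nearest = int(approx * 3 ** n + Fraction(1, 2))
    digits = []
    for _ in range(n):
        nearest, d = divmod(nearest, 3)
        digits.append(d)
    decoded = "".join(ALPHABET[d] for d in reversed(digits))
    if decoded == message:
        break

print(bits, len(bits), "bits")   # 001011001 -> 9 bits vs 12 for block coding
print(decoded)                   # ABBCAB
```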

LZW Coding (Lempel-Ziv-Welch)

The Lempel-Ziv-Welch algorithm uses a dynamically generated dictionary and encodes strings by a reference to the dictionary. The intention is that the dictionary reference should be shorter than the string it replaces; as you will see, LZW achieves this goal for all strings longer than one character.

The LZW dictionary is not an external dictionary that lists all known symbol strings. Instead, the dictionary is initialized with an entry for every possible byte. Other strings are added as they are built from the input stream. The code word for a string is simply the next available value at the time it is added to the dictionary.

Based on the discussion above, encoding consists of the following steps:

Step 1. Initialize the dictionary to contain one entry for each byte. Initialize the encoded string with the first byte of the input stream.
Step 2. Read the next byte from the input stream.
Step 3. If the byte is an EOF, go to step 6.
Step 4. If concatenating the byte to the encoded string produces a string that is in the dictionary: concatenate the byte to the encoded string and go to step 2.
Step 5. If concatenating the byte to the encoded string produces a string that is not in the dictionary: add the new string to the dictionary, write the code for the encoded string to the output stream, set the encoded string equal to the new byte, and go to step 2.
Step 6. Write out the code for the encoded string and exit.

Example 1: The string "this_is_his_thing" is encoded as follows:

New Byte   Encoded String   New Code    Code Output
t          t                None        None
h          h                256 (th)    t
i          i                257 (hi)    h
s          s                258 (is)    i
_          _                259 (s_)    s
i          i                260 (_i)    _
s          is               None        None
_          _                261 (is_)   258 (is)
h          h                262 (_h)    _
i          hi               None        None
s          s                263 (his)   257 (hi)
_          s_               None        None
t          t                264 (s_t)   259 (s_)
h          th               None        None
i          i                265 (thi)   256 (th)
n          n                266 (in)    i
g          g                267 (ng)    n
None       None             None        g

In the example above, a 17-character string is represented by 13 code words. Any actual compression would depend on the size of the code words; in this example code words could be as short as 9 bits, while typically code words are 12 to 16 bits long. Of course, a typical input stream is also longer than 17 characters.

Example 2: Decode the stream 't' 'h' 'i' 's' '_' 258 '_' 257 259 256 'i' 'n' 'g' produced by the previous example.

Input Code   Encoded String   Added Code   String Output
t            t                None         t
h            h                256 (th)     h
i            i                257 (hi)     i
s            s                258 (is)     s
_            _                259 (s_)     _
258          is               260 (_i)     is
_            _                261 (is_)    _
257          hi               262 (_h)     hi
259          s_               263 (his)    s_
256          th               264 (s_t)    th
i            i                265 (thi)    i
n            n                266 (in)     n
g            g                267 (ng)     g

The decoded string matches the original string, so the decoding was done correctly. One of the nice properties of LZW is that the decoder does not require any additional information from the encoder: there is no need to include the extra information commonly required by statistical algorithms like Huffman coding and arithmetic coding, so the space savings is never offset by extra data cost.
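A minimal sketch of the encoder loop described in steps 1 to 6 above, byte-oriented and with the dictionary initialized to the 256 single bytes. The function name is illustrative.

```python
def lzw_encode(data: bytes):
    """LZW encoding as described above: the dictionary starts with every
    single byte and grows as new strings are seen in the input."""
    dictionary = {bytes([i]): i for i in range(256)}
    next_code = 256
    current = bytes([data[0]])          # the "encoded string" of step 1
    output = []
    for byte in data[1:]:
        candidate = current + bytes([byte])
        if candidate in dictionary:
            current = candidate         # step 4: keep extending the match
        else:
            output.append(dictionary[current])   # step 5: emit and grow dictionary
            dictionary[candidate] = next_code
            next_code += 1
            current = bytes([byte])
    output.append(dictionary[current])  # step 6: flush the last match
    return output

codes = lzw_encode(b"this_is_his_thing")
print(codes)   # 13 codes: t h i s _ 258 _ 257 259 256 i n g (letters shown as byte values)
```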

Chapter 5
Introduction to pattern recognition and images

Human Perception

Humans have developed highly sophisticated skills for sensing their environment and taking actions according to what they observe, e.g.:
recognizing a face
understanding spoken words
reading handwriting
distinguishing fresh food from its smell
We would like to give similar capabilities to machines.

Introduction to pattern recognition

What is a pattern? A pattern is an entity, vaguely defined, that could be given a name, e.g.:
fingerprint image
handwritten word
human face
speech signal
DNA sequence
...

What is pattern recognition? Pattern recognition is the study of how machines can observe the environment, learn to distinguish patterns of interest, and make sound and reasonable decisions about the categories of the patterns.

Pattern recognition techniques are concerned with the theory and algorithms of putting abstract objects, e.g., measurements made on physical objects, into categories. Typically the categories are assumed to be known in advance, although there are techniques to learn the categories (clustering). Methods of pattern recognition are useful in many applications such as information retrieval, data mining, document image analysis and recognition, computational linguistics, forensics, biometrics and bioinformatics.

Many of the topics concern statistical classification methods. They include generative methods, such as those based on Bayes decision theory and the related techniques of parameter estimation and density estimation, and discriminative methods, such as nearest-neighbor classification and support vector machines. Artificial neural networks, classifier combination and clustering are other major components of pattern recognition. Applications of pattern recognition techniques are demonstrated by projects in fingerprint recognition, handwriting recognition and handwriting verification.

The stages of a typical pattern recognition system are:

1. Data acquisition and sensing: measurements of physical variables. Important issues: bandwidth, resolution, sensitivity, distortion, SNR, latency, etc.
2. Pre-processing: removal of noise in the data; isolation of patterns of interest from the background.
3. Feature extraction: finding a new representation in terms of features.
4. Model learning and estimation: learning a mapping between features and pattern groups and categories.
5. Classification: using features and learned models to assign a pattern to a category.
6. Post-processing: evaluation of confidence in decisions; exploitation of context to improve performance; combination of experts.

The design cycle of a pattern recognition system involves the following steps:

1. Data collection: collecting training and testing data.
How can we know when we have an adequately large and representative set of samples?

2. Feature selection:
Domain dependence and prior information
Computational cost and feasibility
Discriminative features: similar values for similar patterns, different values for different patterns
Invariant features with respect to translation, rotation and scale
Robust features with respect to occlusion, distortion, deformation, and variations in the environment

3. Model selection:
Domain dependence and prior information
Definition of design criteria
Parametric vs. non-parametric models
Handling of missing features
Computational complexity
Types of models: templates, decision-theoretic or statistical, syntactic or structural, neural, and hybrid
How can we know how close we are to the true model underlying the patterns?

4. Training: how can we learn the rule from data?
Supervised learning: a teacher provides a category label or cost for each pattern in the training set.
Unsupervised learning: the system forms clusters or natural groupings of the input patterns.
Reinforcement learning: no desired category is given, but the teacher provides feedback to the system, such as whether the decision is right or wrong.

5. Evaluation:
How can we estimate the performance with the training samples?
How can we predict the performance with future data?
Problems of overfitting and generalization.

Chapter 6
Recognition and classification

Image Analysis Techniques

Classification

Statistical classification is a procedure in which individual items are placed into groups based on quantitative information on one or more characteristics inherent in the items (referred to as traits, variables, characters, etc.) and based on a training set of previously labeled items.

Formally, the problem can be stated as follows: given training data {(x1, y1), ..., (xn, yn)}, produce a classifier h: X -> Y which maps an object x in X to its classification label y in Y. For example, if the problem is filtering spam, then x is some representation of an email and y is either "Spam" or "Non-Spam". Statistical classification algorithms are typically used in pattern recognition systems.

Note: in community ecology, the term "classification" is synonymous with what is commonly known (in machine learning) as clustering. See that article for more information about purely unsupervised techniques.

The second problem is to consider classification as an estimation problem, where the goal is to estimate a function of the form

P(class | x) = f(x; θ),

where the feature vector input is x and the function f is typically parameterized by some parameters θ. In the Bayesian approach to this problem, instead of choosing a single parameter vector θ, the result is integrated over all possible values of θ, with each θ weighted by how likely it is given the training data D:

P(class | x) = ∫ P(class | x, θ) P(θ | D) dθ

The third problem is related to the second, but here the aim is to estimate the class-conditional probabilities P(x | class) and then use Bayes' rule to produce the class probability as in the second problem.

Some models of classifiers

1. Linear Classifier

In the field of machine learning, the goal of classification is to group items that have similar feature values into groups. A linear classifier achieves this by making a classification decision based on the value of a linear combination of the features (a small sketch follows the classifier models below).

If the input feature vector to the classifier is a real vector x, then the output score is

y = f(w · x) = f(Σj wj xj),

where w is a real vector of weights and f is a function that converts the dot product of the two vectors into the desired output. The weight vector w is learned from a set of labeled training samples. Often f is a simple function that maps all values above a certain threshold to the first class and all other values to the second class. A more complex f might give the probability that an item belongs to a certain class.

For a two-class classification problem, one can visualize the operation of a linear classifier as splitting a high-dimensional input space with a hyperplane: all points on one side of the hyperplane are classified as "yes", while the others are classified as "no".

A linear classifier is often used in situations where the speed of classification is an issue, since it is often the fastest classifier, especially when x is sparse. However, decision trees can be faster. Linear classifiers also often work very well when the number of dimensions of x is large, as in document classification, where each element of x is typically the count of a word in a document (see document-term matrix). In such cases, the classifier should be well-regularized.

2. A quadratic classifier is used in machine learning to separate measurements of two or more classes of objects or events by a quadric surface. It is a more general version of the linear classifier. Quadratic discriminant analysis (QDA) is closely related to linear discriminant analysis (LDA), where it is assumed that there are only two classes of points (so y ∈ {0, 1}) and that the measurements are normally distributed. Unlike LDA, however, in QDA there is no assumption that the covariance of each of the classes is identical. When the normality assumption is true, the best possible test for the hypothesis that a given measurement is from a given class is the likelihood ratio test. Suppose the means of the two classes are μ(y=0) and μ(y=1) and the covariances are Σ(y=0) and Σ(y=1). Then the likelihood ratio is

Likelihood ratio = [ |2π Σ(y=1)|^(-1/2) exp(-(1/2)(x - μ(y=1))ᵀ Σ(y=1)^(-1) (x - μ(y=1))) ] / [ |2π Σ(y=0)|^(-1/2) exp(-(1/2)(x - μ(y=0))ᵀ Σ(y=0)^(-1) (x - μ(y=0))) ] < t

for some threshold t. After some rearrangement, it can be shown that the resulting separating surface between the classes is a quadratic.
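The sketch referred to above: a minimal two-class linear classifier that scores an input with a dot product and thresholds it. The weights and threshold are illustrative values, not learned from data.

```python
import numpy as np

def linear_classify(x, w, threshold=0.0):
    # Score is the linear combination of the features; which side of the
    # hyperplane the score falls on (relative to the threshold) decides the class.
    score = float(np.dot(w, x))
    return ("yes" if score > threshold else "no"), score

w = np.array([0.8, -0.5, 0.3])                            # illustrative "learned" weights
print(linear_classify(np.array([1.0, 0.2, 0.4]), w))      # ('yes', 0.82)
print(linear_classify(np.array([0.1, 1.0, 0.0]), w))      # ('no', -0.42)
```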
Feature Extraction

In pattern recognition and in image processing, feature extraction is a special form of dimensionality reduction. When the input data to an algorithm is too large to be processed and is suspected to be notoriously redundant (much data, but not much information), the input data is transformed into a reduced representation set of features (also called a feature vector). Transforming the input data into the set of features is called feature extraction. If the features are carefully chosen, it is expected that the feature set will extract the relevant information from the input data, so that the desired task can be performed using this reduced representation instead of the full-size input.

Feature extraction involves simplifying the amount of resources required to describe a large set of data accurately. When performing analysis of complex data, one of the major problems stems from the number of variables involved. Analysis with a large number of variables generally requires a large amount of memory and computation power, or a classification algorithm which overfits the training sample and generalizes poorly to new samples. Feature extraction is a general term for methods of constructing combinations of the variables to get around these problems while still describing the data with sufficient accuracy.

Best results are achieved when an expert constructs a set of application-dependent features. Nevertheless, if no such expert knowledge is available, general dimensionality reduction techniques may help. These include:

Principal components analysis
Semidefinite embedding
Multifactor dimensionality reduction

Nonlinear dimensionality reduction
Isomap
Kernel PCA
Latent semantic analysis
Partial least squares

1. Principal component analysis (PCA) is a vector space transform often used to reduce multidimensional data sets to lower dimensions for analysis. Depending on the field of application, it is also named the discrete Karhunen-Loève transform (KLT), the Hotelling transform or proper orthogonal decomposition (POD). PCA is the simplest and most useful of the true eigenvector-based multivariate analyses, because its operation reveals the internal structure of the data in an unbiased way. If a multivariate dataset is visualized as a set of coordinates in a high-dimensional data space (one axis per variable), PCA supplies the user with a 2D picture, a shadow of this object when viewed from its most informative viewpoint. This dimensionally reduced image of the data is the ordination diagram of the first two principal axes of the data, which, when combined with metadata (such as gender, location, etc.), can rapidly reveal the main factors underlying the structure of the data. PCA is especially useful for taming collinear data: where multiple variables are co-correlated (which is routine in multivariate data), regression-based techniques are unreliable and can give misleading outputs, whereas PCA combines all collinear data into a small number of independent (orthogonal) axes, which can then safely be used for further analyses. (A small sketch of PCA follows this list of techniques.)

2. In computer science, semidefinite embedding (SDE) or maximum variance unfolding (MVU) is an algorithm that uses semidefinite programming to perform non-linear dimensionality reduction of high-dimensional vectorial input data. Non-linear dimensionality reduction algorithms attempt to map high-dimensional data onto a low-dimensional Euclidean vector space. Maximum variance unfolding is a member of the manifold learning family, which also includes algorithms such as isomap and locally linear embedding. In manifold learning, the input data is assumed to be sampled from a low-dimensional manifold that is embedded inside a higher-dimensional vector space. The main intuition behind MVU is to exploit the local linearity of manifolds and create a mapping that preserves local neighborhoods at every point of the underlying manifold.

MVU creates a mapping from the high-dimensional input vectors to some low-dimensional Euclidean vector space in the following three steps:

1. A neighborhood graph is created. Each input is connected with its k nearest input vectors (according to the Euclidean distance metric) and all k nearest neighbors are connected with each other. If the data is sampled well enough, the resulting graph is a discrete approximation of the underlying manifold.
2. The neighborhood graph is "unfolded" with the help of semidefinite programming. Instead of learning the output vectors directly, the semidefinite program aims to find an inner product matrix that maximizes the pairwise distances between any two inputs that are not connected in the neighborhood graph.
3. The low-dimensional embedding is finally obtained by applying multidimensional scaling to the learned inner product matrix.

The steps of applying semidefinite programming followed by a linear dimensionality reduction step to recover a low-dimensional embedding into a Euclidean space were first proposed by Linial, London, and Rabinovich in a now classical article.
3. Multifactor dimensionality reduction (MDR) is a data mining approach for detecting and characterizing combinations of attributes or independent variables that interact to influence a dependent or class variable. MDR was designed specifically to identify interactions among discrete variables that influence a binary outcome and is considered a nonparametric alternative to traditional statistical methods such as logistic regression.

4. Perhaps the principal method amongst those that provide a mapping from the high-dimensional space to the embedded space is kernel PCA. This method provides non-linear principal component analysis (PCA) by applying the kernel trick.

Kernel PCA first (implicitly) constructs a higher-dimensional space in which there are a large number of linear relations between the dimensions. Subsequently, the low-dimensional data representation is obtained by applying traditional PCA.

5. In statistics, isomap is one of several widely used low-dimensional embedding methods, in which geodesic distances on a weighted graph are incorporated into classical scaling (metric multidimensional scaling). It is used for computing a quasi-isometric, low-dimensional embedding of a set of high-dimensional data points. It is one of the representative isometric mapping methods, and it extends metric multidimensional scaling (MDS) by considering Dijkstra's geodesic distances (shortest paths) on a weighted graph instead of Euclidean distances. The algorithm provides a simple method for estimating the intrinsic geometry of a data manifold based on a rough estimate of each data point's neighbors on the manifold. It is highly efficient and generally applicable to a broad range of data sources and dimensionalities.

6. Kernel principal component analysis (kernel PCA) is an extension of principal component analysis (PCA) using techniques of kernel methods. Using a kernel, the originally linear operations of PCA are done in a reproducing kernel Hilbert space with a non-linear mapping.

7. Latent semantic analysis (LSA) is a technique in natural language processing, in particular in vectorial semantics, for analyzing relationships between a set of documents and the terms they contain by producing a set of concepts related to the documents and terms.

8. In statistics, the method of partial least squares regression (PLS regression) bears some relation to principal component analysis; instead of finding the hyperplanes of maximum variance, it finds a linear model describing some predicted variables in terms of other observable variables.
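The PCA sketch referred to in item 1 above: a minimal version that centers the data, takes the eigenvectors of the covariance matrix, and projects onto the two leading axes. The random data and variable names are purely illustrative.

```python
import numpy as np

def pca_2d(data):
    """Project an (n_samples, n_features) array onto its first two principal axes."""
    centered = data - data.mean(axis=0)        # remove the mean of each variable
    cov = np.cov(centered, rowvar=False)       # covariance matrix of the features
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigh: the covariance matrix is symmetric
    order = np.argsort(eigvals)[::-1]          # sort axes by explained variance
    components = eigvecs[:, order[:2]]         # the two most informative directions
    return centered @ components               # the 2D "shadow" of the data

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
X[:, 1] = 0.9 * X[:, 0] + 0.1 * X[:, 1]        # make two variables nearly collinear
print(pca_2d(X).shape)                         # (100, 2)
```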
Chapter 10

1.1 What is a Neural Network?

An Artificial Neural Network (ANN) is an information processing paradigm that is inspired by the way biological nervous systems, such as the brain, process information. The key element of this paradigm is the novel structure of the information processing system. It is composed of a large number of highly interconnected processing elements (neurons) working in unison to solve specific problems. ANNs, like people, learn by example. An ANN is configured for a specific application, such as pattern recognition or data classification, through a learning process. Learning in biological systems involves adjustments to the synaptic connections that exist between the neurons. This is true of ANNs as well.

1.2 Why use neural networks?

Neural networks, with their remarkable ability to derive meaning from complicated or imprecise data, can be used to extract patterns and detect trends that are too complex to be noticed by either humans or other computer techniques. A trained neural network can be thought of as an "expert" in the category of information it has been given to analyse. This expert can then be used to provide projections given new situations of interest and to answer "what if" questions.

Other advantages include:

1. Adaptive learning: an ability to learn how to do tasks based on the data given for training or initial experience.
2. Self-organization: an ANN can create its own organization or representation of the information it receives during learning time.
3. Real-time operation: ANN computations may be carried out in parallel, and special hardware devices are being designed and manufactured which take advantage of this capability.
4. Fault tolerance via redundant information coding: partial destruction of a network leads to a corresponding degradation of performance. However, some network capabilities may be retained even with major network damage.

Human and Artificial Neurons - investigating the similarities

1.3 How the Human Brain Learns?

Much is still unknown about how the brain trains itself to process information, so theories abound. In the human brain, a typical neuron collects signals from others through a host of fine structures called dendrites. The neuron sends out spikes of electrical activity through a long, thin strand known as an axon, which splits into thousands of branches. At the end of each branch, a structure called a synapse converts the activity from the axon into electrical effects that inhibit or excite activity in the connected neurons. When a neuron receives excitatory input that is sufficiently large compared with its inhibitory input, it sends a spike of electrical activity down its axon. Learning occurs by changing the effectiveness of the synapses so that the influence of one neuron on another changes.

1.4 From Human Neurons to Artificial Neurons

We construct neural networks by first trying to deduce the essential features of neurons and their interconnections. We then typically program a computer to simulate these features. However, because our knowledge of neurons is incomplete and our computing power is limited, our models are necessarily gross idealizations of real networks of neurons.

1.5 A simple neuron

An artificial neuron is a device with many inputs and one output. The neuron has two modes of operation: the training mode and the using mode. In the training mode, the neuron can be trained to fire (or not) for particular input patterns. In the using mode, when a taught input pattern is detected at the input, its associated output becomes the current output. If the input pattern does not belong to the taught list of input patterns, the firing rule is used to determine whether to fire or not.

1.6 Pattern Recognition - an example

An important application of neural networks is pattern recognition. Pattern recognition can be implemented by using a feed-forward (figure 1) neural network that has been trained accordingly. During training, the network is trained to associate outputs with input patterns. When the network is used, it identifies the input pattern and tries to output the associated output pattern. The power of neural networks comes to life when a pattern that has no output associated with it is given as input. In this case, the network gives the output that corresponds to a taught input pattern that is least different from the given pattern.

Figure 1

For example, the network of figure 1 is trained to recognize the patterns T and H. The associated output patterns are all black and all white respectively. If we represent black squares with 0 and white squares with 1, then the truth tables for the 3 neurons after generalization are:

Top neuron
X11: 0 0 0 0 1 1 1 1
X12: 0 0 1 1 0 0 1 1
X13: 0 1 0 1 0 1 0 1
OUT: 0 0 1 1 0 0 1 1

Middle neuron
X21: 0 0 0 0 1 1 1 1
X22: 0 0 1 1 0 0 1 1
X23: 0 1 0 1 0 1 0 1
OUT: 1 0/1 1 0/1 0/1 0 0/1 1

Bottom neuron
X31: 0 0 0 0 1 1 1 1
X32: 0 0 1 1 0 0 1 1
X33: 0 1 0 1 0 1 0 1
OUT: 1 0 1 1 0 0 1 0

From the tables the following associations can be extracted (the example input patterns referred to here were shown in figures that are not reproduced):

For the first example input, it is obvious that the output should be all black, since the input pattern is almost the same as the 'T' pattern.
For the second example, it is obvious that the output should be all white, since the input pattern is almost the same as the 'H' pattern.
For the third example, the top row is 2 errors away from the T and 3 from an H, so the top output is black. The middle row is 1 error away from both T and H, so its output is random. The bottom row is 1 error away from T and 2 away from H, therefore its output is black. The total output of the network is still in favor of the T shape.

Architecture of Neural Networks

1.7 Feed-forward networks

Feed-forward ANNs (figure 1) allow signals to travel one way only, from input to output. There is no feedback (loops), i.e. the output of any layer does not affect that same layer. Feed-forward ANNs tend to be straightforward networks that associate inputs with outputs. They are extensively used in pattern recognition. This type of organization is also referred to as bottom-up or top-down.

Figure 2

1.8 Feedback networks

Feedback networks can have signals traveling in both directions by introducing loops in the network. Feedback networks are very powerful and can get extremely complicated. They are dynamic: their 'state' changes continuously until they reach an equilibrium point, and they remain at the equilibrium point until the input changes and a new equilibrium needs to be found. Feedback architectures are also referred to as interactive or recurrent, although the latter term is often used to denote feedback connections in single-layer organizations.

1.9 Network layers

The commonest type of artificial neural network consists of three groups, or layers, of units: a layer of "input" units is connected to a layer of "hidden" units, which is connected to a layer of "output" units (see figure 2). The activity of the input units represents the raw information that is fed into the network. The activity of each hidden unit is determined by the activities of the input units and the weights on the connections between the input and the hidden units. The behavior of the output units depends on the activity of the hidden units and the weights between the hidden and output units.

This simple type of network is interesting because the hidden units are free to construct their own representations of the input. The weights between the input and hidden units determine when each hidden unit is active, and so by modifying these weights, a hidden unit can choose what it represents.

We also distinguish single-layer and multi-layer architectures. The single-layer organization, in which all units are connected to one another, constitutes the most general case and has more potential computational power than hierarchically structured multi-layer organizations. In multi-layer networks, units are often numbered by layer, instead of following a global numbering. A small sketch of a layered forward pass is given below.
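The sketch mentioned above: a minimal three-layer feed-forward computation (input -> hidden -> output). The weight values, layer sizes and the sigmoid activation are illustrative choices, not prescribed by the text.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(x, w_hidden, w_output):
    # Hidden activity depends only on the inputs and the input->hidden weights;
    # output activity depends only on the hidden units and the hidden->output weights.
    hidden = sigmoid(w_hidden @ x)
    return sigmoid(w_output @ hidden)

rng = np.random.default_rng(1)
w_hidden = rng.normal(size=(4, 9))     # 9 inputs (e.g. a 3x3 pattern), 4 hidden units
w_output = rng.normal(size=(2, 4))     # 2 output units
x = np.array([0, 1, 0, 0, 1, 0, 0, 1, 0], dtype=float)
print(forward(x, w_hidden, w_output))  # activities of the two output units
```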
1.10 Hopfield nets

A Hopfield net is a form of recurrent artificial neural network invented by John Hopfield. Hopfield nets serve as content-addressable memory systems with binary threshold units. They are guaranteed to converge to a local minimum, but convergence to one of the stored patterns is not guaranteed.

The units in Hopfield nets are binary threshold units, i.e. the units only take on two different values for their states, and the value is determined by whether or not the unit's input exceeds its threshold. Hopfield nets can either have units that take on values of 1 or -1, or units that take on values of 1 or 0. The two possible definitions for unit i's activation, ai, are therefore:

(1) ai = 1 if Σj wij sj > θi, and -1 otherwise;
(2) ai = 1 if Σj wij sj > θi, and 0 otherwise;

where wij is the strength of the connection weight from unit j to unit i, sj is the state of unit j, and θi is the threshold of unit i.

Training a Hopfield net involves lowering the energy of states that the net should "remember". This allows the net to serve as a content-addressable memory system, that is to say, the network will converge to a "remembered" state if it is given only part of that state. The net can be used to recover from a distorted input the trained state that is most similar to that input. This is called associative memory because it recovers memories on the basis of similarity. For example, if we train a Hopfield net with five units so that the state (1, 0, 1, 0, 1) is an energy minimum, and we give the network the state (1, 0, 0, 0, 1), it will converge to (1, 0, 1, 0, 1). Thus, the network is properly trained when the energies of the states which the network should remember are local minima.
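A minimal sketch of a 0/1 Hopfield net of the kind described above, storing the single pattern (1, 0, 1, 0, 1) and recovering it from the distorted input (1, 0, 0, 0, 1). The Hebbian-style weight rule and the zero thresholds are illustrative choices; the update itself follows definition (2) above.

```python
import numpy as np

pattern = np.array([1, 0, 1, 0, 1])

# Store the pattern with a simple Hebbian-style rule on its +/-1 version.
bipolar = 2 * pattern - 1
W = np.outer(bipolar, bipolar).astype(float)
np.fill_diagonal(W, 0.0)                   # no self-connections
theta = np.zeros(len(pattern))             # all thresholds zero in this toy example

def recall(state, sweeps=5):
    s = state.copy()
    for _ in range(sweeps):
        for i in range(len(s)):            # asynchronous updates, one unit at a time
            # rule (2): a_i = 1 if sum_j w_ij s_j > theta_i, else 0
            s[i] = 1 if W[i] @ s > theta[i] else 0
    return s

print(recall(np.array([1, 0, 0, 0, 1])))   # -> [1 0 1 0 1], the stored pattern
```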

1.11 Perceptrons

The most influential work on neural nets in the 60's went under the heading of 'perceptrons', a term coined by Frank Rosenblatt. The perceptron (figure 3) turns out to be an MCP model (a neuron with weighted inputs) with some additional, fixed, pre-processing. Units labeled A1, A2, Aj, Ap are called association units, and their task is to extract specific, localized features from the input images. Perceptrons mimic the basic idea behind the mammalian visual system. They were mainly used in pattern recognition, even though their capabilities extended a lot further.

Figure 3

In 1969 Minsky and Papert wrote a book in which they described the limitations of single-layer perceptrons. The impact of the book was tremendous and caused a lot of neural network researchers to lose interest. The book was very well written and showed mathematically that single-layer perceptrons could not do some basic pattern recognition operations, like determining the parity of a shape or determining whether a shape is connected. What they did not realize, until the 80's, is that given the appropriate training, multilevel perceptrons can do these operations.
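A minimal sketch of a single perceptron decision rule with the classic learning update (weights nudged toward misclassified examples). The weights, learning rate and the tiny AND-style dataset are illustrative only and are not taken from the text.

```python
import numpy as np

def perceptron_train(samples, labels, epochs=20, lr=0.1):
    # weights[0] acts as the bias; every input is extended with a constant 1.
    weights = np.zeros(samples.shape[1] + 1)
    X = np.hstack([np.ones((len(samples), 1)), samples])
    for _ in range(epochs):
        for x, target in zip(X, labels):
            predicted = 1 if weights @ x > 0 else 0
            weights += lr * (target - predicted) * x   # classic perceptron update
    return weights

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])                    # a linearly separable (AND-like) task
w = perceptron_train(X, y)
print([1 if w @ np.r_[1, x] > 0 else 0 for x in X])   # [0, 0, 0, 1]
```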

1.12 Hamming Net

Although the network works in an analog way, it processes binary signals; on the other hand, those signals can be noisy and can have continuous values, not only zeros and ones. The picture referred to here (not reproduced) shows the Hamming network. It can be divided into two basic sections:

the input layer: a layer built of neurons, all of which are connected to all of the network inputs;
the output layer, called the MaxNet layer: the output of each neuron of this layer is connected to the input of each neuron of this layer, and in addition every neuron of this layer is connected to exactly one neuron of the input layer.

It is easy to see that both layers have the same number of neurons.

How does the Hamming network work? The input layer neurons are programmed to identify a fixed number of patterns; the number of neurons in this layer matches the number of those patterns (M neurons for M patterns). The outputs of these neurons realize a function which measures the similarity of the input signal to a given pattern.

The output layer is responsible for choosing the pattern which is most similar to the test signal. In this layer, the neuron with the strongest response suppresses the responses of the other neurons (this usually happens after a few calculating cycles, and there must be a 0 on the x(i) inputs during these cycles). A 1-of-M code should be used on the output, pointing at the network's answer (1 of the M patterns is recognized). To achieve this, a proper transfer function has to be used; in our case the best function was