
Smart Cameras: A Review

Yu Shi*, Serge Lichman

Interfaces, Machines And Graphical ENvironments (IMAGEN) National Information and Communications Technology Australia (NICTA) Australian Technology Park, Bay 15 Locomotive Workshop Eveleigh, NSW 1430, Australia *Corresponding Author. Tel.: +61 2 8374 5565; Fax: +61 2 8374 5527. E-mail Addresses: yu.shi@nicta.com.au, serge.lichman@nicta.com.au

Abstract

Smart cameras are cameras that can perform tasks far beyond simply taking photos and recording videos. Thanks to purpose-built intelligent image processing and pattern recognition algorithms, smart cameras can detect motion, measure objects, read vehicle number plates, and even recognize human behaviors. They are essential components for building active and automated control systems for many applications, and they will play a significant role in our daily life in the near future. This paper aims to provide a first comprehensive review of smart camera technologies and applications. We analyze the reasons behind the recent rapid growth of smart cameras, discuss their different categories and review their system architectures. We also examine their intelligent algorithms, features and applications. Finally, we conclude with a discussion of design issues, challenges and future technological directions.

Keywords: smart cameras, pattern recognition, machine vision, computer vision, video surveillance, embedded systems.

1. Introduction

What is a smart camera? Different researchers and camera manufacturers offer different definitions.

There does not seem to be a well-established and agreed-upon definition in either the video surveillance or machine vision industries, probably the two most active and advanced application areas for smart cameras at present. For the purpose of this paper, we define a smart camera as a vision system whose primary function is to produce a high-level understanding of the imaged scene and generate application-specific data to be used in an autonomous and intelligent system. The idea behind smart cameras is to convert data to knowledge by processing information where it becomes available, and to transmit only results at a higher level of abstraction. A smart camera is smart because it performs application-specific information processing (ASIP), the goal of which is usually not to provide better-quality images for human viewing but to understand and describe what is happening in the images for the purpose of better decision-making in an automated control system. For example, a motion-triggered surveillance camera captures video of a scene, detects motion in a region of interest, and raises an alarm when the detected motion satisfies certain criteria. In this case, the ASIP is motion detection and alarm generation.

The important differences between a smart camera and normal cameras, such as consumer digital cameras and camcorders, lie in two aspects. The first is camera system architecture. A smart camera usually has a special image processing unit containing one or more high-performance microprocessors to run intelligent ASIP algorithms, whose primary objective is not to improve image quality but to extract information and knowledge from images. The image processing hardware in normal cameras is usually simpler and less powerful, with the main aim being to achieve good visual image quality. The other main difference is the primary camera output. A smart camera outputs either the features extracted from the captured images or a high-level description of the scene, which is fed into an automated control system, while for normal cameras the primary output is a processed version of the captured images for human consumption. For this reason, normal video cameras have large output bandwidth requirements (in direct proportion to the resolution of the image sensor used), while a smart camera can have very low output data bandwidth requirements (in the simplest case it can be just one bit, with 1 meaning there is motion and 0 meaning there is no motion, for example). These differences are illustrated in Figure 1.

[Figure 1 block diagrams. (a) Normal camera: image sensing → image processing → image/video output generation and communication → video to a TV or digital display for human consumption. (b) Smart camera: image sensing → application-specific information processing → application-specific data generation and communication → metadata to an automated control system for decision making.]

Figure 1: Differences between a normal camera (a) and a smart camera (b).
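The motion-triggered ASIP described above can be sketched in a few lines: compare successive grayscale frames and emit a single bit rather than a video stream. This is a minimal illustration under assumed frame formats and thresholds (the function name and parameters are ours, not from any particular camera), not a production algorithm:

```python
# Minimal sketch of a motion-detection ASIP: the camera compares successive
# grayscale frames and emits one bit (1 = motion, 0 = no motion) instead of
# streaming raw pixels. Frames are 2-D lists of 8-bit intensities; the
# thresholds are illustrative assumptions.

def motion_bit(prev_frame, curr_frame, pixel_thresh=25, count_thresh=10):
    """Return 1 if enough pixels changed between two frames, else 0."""
    changed = 0
    for row_a, row_b in zip(prev_frame, curr_frame):
        for a, b in zip(row_a, row_b):
            if abs(a - b) > pixel_thresh:   # per-pixel intensity change
                changed += 1
    return 1 if changed >= count_thresh else 0

# A static scene produces 0; a moving bright patch produces 1.
static = [[10] * 8 for _ in range(8)]
moved = [row[:] for row in static]
for r in range(2, 6):
    for c in range(2, 6):
        moved[r][c] = 200                   # 16 changed pixels

assert motion_bit(static, static) == 0
assert motion_bit(static, moved) == 1
```

Note how the output of the whole pipeline is a single integer, which is precisely why the smart camera's output bandwidth can be orders of magnitude below the raw pixel rate.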

Smart cameras can exist where a camera is not expected to be. A good example is the ubiquitous optical mouse for the PC. Most optical mice contain a miniature digital video camera inside the mouse casing. They work by shining a bright light onto the surface below and using the camera to take up to 1 500 pictures per second of that surface. An intelligent image processing circuit inside the mouse performs image enhancement and calculates the mouse motion from the image difference between successive frames. This motion is then used to displace the mouse cursor on the screen. The optical mouse is a good example of a smart camera in three respects: firstly, it is a stand-alone device with camera and processing in a single embedded package; secondly, the camera is used not to take pictures or video for human consumption, but to produce a feature vector (a motion vector in the x and y directions) representing the displacement of the object (the mouse in this case); thirdly, it shows that smart cameras are not restricted to a niche market, but can be adopted ubiquitously.

Strictly speaking, a smart camera is a stand-alone, self-contained device that integrates image sensing, ASIP and communications in one single box. It is designed for a specific type of application (for example, surveillance or industrial machine vision). However, other types of vision systems are often referred to as smart cameras as well, such as PC-based smart cameras. We'll analyze these different types of smart cameras in section 3. The term smart camera in this paper covers both stand-alone smart cameras and other types of smart cameras, as described in section 3.1, unless specified otherwise.

The advent of smart cameras can be traced back to the early 1990s, when PCs became popular and video frame grabbers became available. Early solid-state CCD (Charge-Coupled Device) cameras of the mid-1970s were analog cameras.
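The displacement estimation performed inside an optical mouse, as discussed above, can be sketched as exhaustive block matching between successive frames. This minimal pure-Python version (function names, frame sizes and search range are our assumptions, not actual mouse firmware) picks the shift that minimizes the mean absolute difference over the overlapping region:

```python
# Illustrative sketch of optical-mouse motion estimation: find the (dx, dy)
# shift between two tiny grayscale frames by exhaustive block matching.
# Parameters and frame format are assumptions for illustration only.

def estimate_shift(frame_a, frame_b, max_shift=2):
    """Return the (dx, dy) shift minimizing mean absolute difference."""
    h, w = len(frame_a), len(frame_a[0])
    best = None
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            sad = n = 0
            for y in range(h):
                for x in range(w):
                    ys, xs = y + dy, x + dx
                    if 0 <= ys < h and 0 <= xs < w:   # overlap only
                        sad += abs(frame_a[y][x] - frame_b[ys][xs])
                        n += 1
            score = sad / n
            if best is None or score < best[0]:
                best = (score, dx, dy)
    return best[1], best[2]

# A bright surface feature that moves one pixel to the right between frames
# yields the displacement (1, 0); identical frames yield (0, 0).
a = [[0] * 8 for _ in range(8)]
b = [[0] * 8 for _ in range(8)]
a[3][3] = 255
b[3][4] = 255
assert estimate_shift(a, b) == (1, 0)
assert estimate_shift(a, a) == (0, 0)
```

Real sensors use highly optimized hardware correlators rather than nested loops, but the principle, matching successive frames to recover a small motion vector, is the same.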
Later, digital signal processing (DSP) technologies pushed analog CCD cameras into the digital era with enhanced image quality, but the output of most of these cameras was still analog (e.g. NTSC/PAL signals). Frame grabbers allowed CCD cameras with analog output to be connected to computers and their video digitized for versatile processing. This marked the beginning of smart camera systems, with the camera performing image capture and the computer carrying out intelligent processing tasks such as motion detection and shape recognition. The first applications were in the areas of industrial machine vision and surveillance. The real interest in and growth of smart cameras started in the late 1990s and early 2000s, spurred by factors such as technological advancements in chip manufacturing, embedded system design, and the coming-of-age of CMOS (Complementary Metal Oxide Semiconductor) image sensors. Market demands from surveillance and machine vision also played significant roles. Advanced smart camera systems often integrate the latest technologies in image sensors, optics, imaging systems, embedded systems, computer vision, video analysis, communication and networking.

The heart of a smart camera is its intelligent ASIP algorithms and the hardware that runs them. Image feature extraction and pattern recognition are probably among the most widely used algorithms in smart cameras. In a way, a smart camera can be thought of as an image feature extractor or a visual pattern recognizer. Research in computer vision, image understanding and pattern recognition has yielded many algorithms and solutions that can be used by smart cameras. However, the performance and robustness of ASIP algorithms when deployed in cameras operating under real-world conditions are among the most important issues facing the development and commercialization of new smart cameras.

In the remainder of this paper, we analyze the main reasons behind the rapid growth of smart cameras (section 2), review system architectures of different smart cameras (section 3), review state-of-the-art smart camera systems and ASIP algorithms for some applications (section 4), and finally discuss some design issues and conclude with thoughts on technical challenges and future technological directions (section 5).


2. The Rapid Growth of Smart Cameras

2.1 Coming of Age of CMOS Image Sensors

The advent of CMOS image sensors (CIS) in the late 1990s played an important role in the development of smart camera technology and systems, and has the potential to make smart cameras smaller, cheaper and more pervasive. Compared to CCDs, CIS have several advantages which make them excellent candidates for the smart camera front-end. These include smaller size, cheaper manufacturing cost, lower power consumption, the ability to build a camera-on-a-chip, the ability to integrate intelligent processing circuits onto the sensor chip, and significantly simplified camera system design.

Most CIS are manufactured using the same process by which semiconductor chips (CPUs, memories, etc.) are made. This means that many semiconductor manufacturers can make CIS, which drives up competition and reduces cost. CCD sensors, by contrast, are made using a special chip manufacturing process, and there are only a few manufacturers in the world, mostly in Japan. CCD-based camera chipsets usually include at least three or four chips: a CCD pixel array, CDS (Correlated Double Sampling) circuitry, a timing generator, and an ADC (Analog-to-Digital Converter). In the case of CIS, all these functions can be integrated onto one single chip, making it a real camera-on-a-chip with light in and pixels out. This greatly simplifies camera system design and reduces cost. Compared with the CCD chipset, there are many more sources from which a CIS can be purchased, even a single item at a time, which is very difficult to achieve in the case of CCDs. All this makes it much easier for researchers, students and camera manufacturers alike to develop smart cameras of their own.

Probably the most important advantage of CIS over CCD lies in the ability to place the image sensor array and intelligent image processing circuits side by side on the same chip. This makes a single-chip smart camera possible. One example is a vision-based single-chip fingerprint reader with an on-chip CIS, processing circuitry performing pattern matching, and a memory storing templates of one or several user fingerprints for real-time comparison and identification [1]. A recent market survey by Gartner Dataquest [2] estimated that there are about 40 suppliers of CIS world-wide, and that the global CIS market would increase from $3.2 billion in 2005 to $5.6 billion by 2008. The survey showed that automotive, medical imaging and surveillance applications are among the emerging markets for CIS products.


2.2 Research in Computer Vision and Pattern Recognition

What makes a camera smart is the intelligent ASIP - the application-specific information processing built into the camera system. Advances in academic and industrial research in real-time image processing and understanding, pattern recognition, machine learning, computer vision and video communication continue to provide a large library of intelligent algorithms for use by smart cameras in different applications. As an example, Intel's OpenCV (Open Source Computer Vision) Library [3] has been very popular with academic researchers and students working on smart camera projects. Every year, numerous international journals, conferences and workshops give researchers world-wide forums to present their innovative work in areas such as computer vision and pattern recognition. A lot of the work presented can be seen as embryos of future smart cameras. Recently, the first international conferences and workshops focusing on the design of embedded vision systems have been held.


2.3 Embedded System Technologies

A stand-alone smart camera is essentially an embedded vision system. Compared with PC-based systems, an embedded system is usually subject to many constraints on the design, implementation and production of the device which encapsulates it, such as low power, limited resources, real-time processing and low cost. An embedded vision system is even more challenging to design due to video processing's insatiable demand for computing power and memory resources. In the last decade, embedded vision systems have made great progress thanks to the increasing affordability of powerful processors and memory chips, the availability of real-time operating systems, low-complexity intelligent algorithms and the coming-of-age of system development software and tools.

Functional integration seems to be a trend in consumer electronics and ICT (Information and Communications Technology). For example, many cellular phones now come with a camera and can play music and receive radio. Some webcams have built-in intelligence such as face tracking. Functional integration can seemingly make a normal camera smart. For example, a camera with an integrated voice/sound detection component can take a picture of the surrounding area when a human voice is detected, or take a picture in the direction from which a gunshot has been detected [4].


2.4 Socio-Economic Drivers

Thanks to Moore's law, semiconductor chips and computer hardware continue to shrink in size, fall in cost and gain in performance. This has driven down the prices of cameras, frame grabbers and computers, making smart camera systems, especially PC-based systems, more affordable for research and development on one hand and for the market and end-users on the other. As hardware constraints (cost-wise) are lifted, software developers have more freedom to write "smarter" algorithms.

One of the most significant developments in the surveillance and security industries in the last several years has been the wide use of CCTV (Closed Circuit Television) cameras and their impact on crime, terrorist attacks, and the general public. It is noticeable that after the 9/11 attacks in the US, video surveillance has received more attention not only from the academic community, but also from industry and governments. The terrorist attacks on the London Underground in mid-2005 and the successful use of CCTV by police in identifying the perpetrators have intensified the discussion about a new generation of intelligent video surveillance systems based on smart cameras. In fact, surveillance and security demands are an important driving force behind the ever-increasing scale of academic and industrial research into advanced vision algorithms such as object tracking and identification, and human behavior analysis.

2.5 Market Demands and Analysis

2.5.1 Digital Video Surveillance

The first generation of CCTV cameras (1980s-1990s) consisted mostly of analog cameras with limited functionality and high cost. Digital CCTV cameras and the use of DVRs (Digital Video Recorders) represent the second generation (2G, 1990s-now). Digital CCTV cameras built using CCD and CMOS image sensors provide better video quality, some intelligent functions such as motion detection and electronic PTZ (Pan-Tilt-Zoom), and networking. The 2G CCTV systems have become mass-market products, fuelled by improved affordability and society's increasing concerns over safety and security. According to estimates made in 2004 by the market research firm Datamonitor [5], digital video surveillance is a high-growth segment within the overall surveillance market, estimated at 55% CAGR (Compound Annual Growth Rate) between 2003 and 2007. In dollar terms, between 2003 and 2007 the market will grow from US$1.3bn to US$7.4bn globally.

However, the 2G CCTV systems are not smart enough to help prevent crimes or terror attacks, even though they have proved very useful in post-event identification of crime perpetrators. The 2G CCTV systems are mostly not automated systems and rely strongly on trained security personnel to perform image analysis, object tracking and identification. The increasing number of cameras makes real-time analysis by security personnel difficult. Network bandwidth is another important issue affecting the real-time processing needed for crime prevention. The intelligent video surveillance system (IVSS), also called the third-generation CCTV system, will try to provide solutions to these problems. Smart cameras will be one of the fundamental building blocks of the IVSS, making it possible to build and deploy automated, distributed and intelligent multi-sensory surveillance systems capable of tracking humans and suspect objects, analyzing human behaviors, and so on. Many market research firms have predicted significant growth in intelligent video systems and smart cameras.
For example, the market researcher Frost & Sullivan [6] has forecast that the US$153.7 million video surveillance software market will see a healthy CAGR of 23.4% from 2004 to 2011, reaching US$670.7 million.
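As a sanity check on the quoted growth figures, compounding the base values at the stated CAGRs does reproduce the forecast end values (a rough arithmetic check only; the figures themselves are from [5] and [6]):

```python
# Compound annual growth rate (CAGR) check for the quoted market forecasts.
def project(value, cagr, years):
    """Compound a starting value at the given annual growth rate."""
    return value * (1 + cagr) ** years

# Datamonitor [5]: US$1.3bn in 2003 at 55% CAGR over 4 years -> ~US$7.5bn,
# consistent with the quoted US$7.4bn for 2007.
surveillance_2007 = project(1.3, 0.55, 4)

# Frost & Sullivan [6]: US$153.7m in 2004 at 23.4% CAGR over 7 years
# -> ~US$670m, consistent with the quoted US$670.7m for 2011.
software_2011 = project(153.7, 0.234, 7)

assert abs(surveillance_2007 - 7.5) < 0.1
assert 665 < software_2011 < 675
```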


2.5.2 Industrial Machine Vision

Industrial machine vision is probably the birthplace of smart cameras, at least in terms of the systematic use of commercial smart cameras. It is also one of their most active playgrounds. Most machine vision smart cameras are stand-alone cameras. The demand for these cameras has been steadily increasing over the years. The major end-user industries are robotics, semiconductors, electronics, pharmaceuticals, manufacturing, food, plastics and printing. The tasks these smart cameras usually perform include bar-code reading, part inspection, flaw detection, surface inspection, dimensional measurement, assembly verification, print verification, object sorting, OCR (optical character recognition) and maintenance.

A recent survey on machine vision products by the Europe-based market research firm IMS Research [7] found that smart cameras are rapidly accounting for a greater share of machine vision market revenue. Demand for smart cameras is primarily driven by the increasing demand for better production efficiency and quality control in industries such as manufacturing and medicine/pharmaceuticals. The survey revealed that while sales of more traditional PC-based products (cameras and frame grabbers) have fallen, sales of smart cameras and compact vision systems have continued to grow. The survey predicts that the machine vision market in Europe will grow at an average rate of 11.6% per year to 2006. The highest levels of growth, approaching 20%, are forecast for the smart sensor and smart camera product groups, resulting in a more than doubling in dollar value. The same trend has also been forecast by the same company for the Asia-Pacific market [8]. The annual market study by the AIA (Automated Imaging Association) estimated the 2003 North American machine vision smart camera market at about US$57 million, growing at 15% per year in revenue and 20% per year in units [9].


2.5.3 Other Significant Markets

Other important markets for smart cameras include ITS (Intelligent Transport Systems), automobiles, HCI (Human-Computer Interaction), medical/healthcare, games, toys, video conferencing and biometrics.

3. Review of Smart Camera System Architectures

In recent years, smart cameras have attracted considerable attention from academic and industrial research and development (R&D) organizations. However, to the best of the authors' knowledge, a systematic approach to analyzing smart cameras has yet to be agreed upon. In this section we first present one approach to classifying smart camera systems and provide an analysis of their system architectures, followed by a review of some R&D activities on the design of smart cameras as embedded systems.


3.1 Classification of Smart Cameras

Smart cameras can come in different system and physical configurations. Figure 2 shows one proposed classification of different types of vision systems and smart cameras.

[Figure 2 tree: Vision Systems divide into Embedded Vision Systems (comprising Stand-alone Smart Cameras, including Single-Chip Smart Cameras, and Non-Stand-alone Smart Cameras), PC-based Vision Systems (PC-based Smart Cameras), Network-based Vision Systems (Networked Smart Cameras), and Hybrid Vision Systems (Other Types of Smart Cameras).]

Figure 2: One proposed classification of vision systems and smart cameras.

As shown in Figure 2, stand-alone smart cameras are a subset of embedded vision systems. Non-stand-alone embedded smart cameras are sometimes called compact vision systems. Compact vision systems are usually composed of general-purpose cameras connected to an external embedded processing unit in a separate box that provides ASIP and communication/networking functionality. Single-chip smart cameras can be thought of as a special case of smart cameras because they require special system design considerations and are usually used in carefully targeted applications. Non-stand-alone smart cameras can be thought of as virtual smart cameras because, from the user's point of view, the cameras are smart, even though the ASIP which makes them smart may be performed by an external unit, such as a hardware accelerator board, a local PC or a networked PC.

A PC-based smart camera, consisting of a general-purpose video camera, a frame grabber of some sort and a PC whose CPU performs the ASIP, is a very common and inexpensive platform for researchers, academics and students conducting research on smart cameras. Sometimes a normal camera is connected to a PCI (Peripheral Component Interconnect) processing board within a PC. In this case, the PCI board may perform most of the ASIP and output generation, while the PC provides a flexible operator interface or additional processing power. This kind of system is a special case of both a compact vision system and a PC-based system. A digital CCTV surveillance system with intelligent features is an example of a network-based smart camera system, and the next generation of distributed intelligent video surveillance systems will be an exciting test ground for smart cameras, especially stand-alone smart cameras. Hybrid vision systems may give rise to some special types of smart cameras. This category may also include smart camera systems that need some kind of human intervention to help provide high-accuracy data output.

3.2 Analysis of Different Types of Smart Cameras

3.2.1 Common Characteristics

The common basic components of a normal digital video camera (consumer, professional or industrial) include optics, a solid-state image sensor (CCD or CMOS), image processor(s) and supporting hardware, an output generator, and communication ports. The main tasks performed by the image processor(s) are color interpolation, color correction and saturation, gamma correction, image enhancement and camera control such as white balance and exposure control. The output generator can be an NTSC/PAL encoder providing standard TV-compatible output, a video compression engine providing compressed video streams for communication over a network, or a digital video output generator such as a Firewire encoder. Communication ports such as Ethernet or RS232 provide the basis for networked camera functionality, or for camera configuration and firmware upgrades through a PC, respectively. A smart camera typically contains all of the above essential components of a normal camera, with the following differences:

A smart camera has a distinct and powerful signal processing unit to perform image feature extraction and/or pattern analysis based on application-specific requirements; and

A smart camera has an output generator that produces a coded representation of the image features and/or results of the pattern matching, or in some cases control signals for other devices (e.g. an alarm-triggering signal) or actions (e.g. sending a picture of the number plate of a speeding car to police).

System architecture design for smart cameras often involves significant systems engineering effort. Clear application requirements and specifications are crucial to a successful design. The software architecture, hardware architecture and, for network-based systems, network architecture need to be jointly designed to maximize resource usage and efficiency, and to reduce cost and time to completion. More detailed design considerations are discussed in section 5.1.


3.2.2 Stand-alone Smart Cameras

A stand-alone smart camera integrates image capture, ASIP and application-specific output generation into a single device casing. A stand-alone smart camera may look very much like a normal industrial camera or a CCTV camera. While the primary function of a normal camera is to provide raw video for monitoring and recording, a smart camera is usually designed to perform specific, repetitive, high-speed and high-accuracy tasks in industries such as machine vision and surveillance. Most industrial machine vision cameras are stand-alone smart cameras. While a normal video camera may cost anywhere between US$50 and US$2 000, a machine vision smart camera can cost between US$1 000 and US$6 000 per unit [10] and beyond, depending on functionality and the level of customization.

Many pattern recognition techniques involve two types of processing tasks: data-intensive tasks such as image enhancement and feature extraction, and math-intensive tasks such as statistical pattern matching. While data-intensive tasks require high-speed hardware to deal with high pixel volume and high frame rates, math-intensive tasks often require high-performance processors to deal with issues such as pipelining and floating-point arithmetic. For demanding applications, the camera hardware architecture may be based on a heterogeneous multi-processor platform, with one or more processors capable of parallel processing (e.g. an FPGA - Field Programmable Gate Array) performing data-intensive tasks, and a DSP and/or a RISC (Reduced Instruction Set Computer) processor performing math-intensive tasks. A smart camera built for a face detection and recognition application by Broers et al. [11] is such an example. The system employs an FPGA and a parallel processor, Xetal, working in SIMD (Single Instruction Multiple Data) mode to perform data-intensive operations such as face detection. A high-performance DSP, TriMedia, with a VLIW (Very Long Instruction Word) core is used to run high-level programs such as face recognition. The system architecture is shown in Figure 3.
[Figure 3 block diagram: image sensor and AFE/ADC blocks connect, via a system data bus, to a data-intensive processing block (FPGA + Xetal) and a math-intensive processing block (TriMedia), alongside camera control, a memory system, and communications/network interfaces.]

Figure 3: A stand-alone smart camera system architecture for face recognition [11].
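The data-intensive/math-intensive split in Figure 3 can be mimicked in software: a uniform per-pixel stage (the kind of work mapped to the FPGA/Xetal) reduces the image to a small feature vector that a sequential stage (the kind of work left to the TriMedia) then analyzes. The blob example and function names below are purely illustrative assumptions, not the actual algorithms of [11]:

```python
# Two-stage sketch of a heterogeneous smart camera pipeline:
# stage 1 applies the same cheap operation to every pixel (data-intensive,
# naturally SIMD/FPGA work); stage 2 reduces the result to a tiny feature
# vector (math-intensive, DSP/RISC work). Names and thresholds are assumed.

def pixel_stage(img, thresh=128):
    """Data-intensive stage: identical thresholding op on every pixel."""
    return [[1 if p > thresh else 0 for p in row] for row in img]

def feature_stage(mask):
    """Math-intensive stage: reduce the mask to (area, centroid_y, centroid_x)."""
    pts = [(y, x) for y, row in enumerate(mask)
           for x, v in enumerate(row) if v]
    if not pts:
        return (0, None, None)
    area = len(pts)
    cy = sum(y for y, _ in pts) / area
    cx = sum(x for _, x in pts) / area
    return (area, cy, cx)

# A 2x2 bright blob reduces to a 3-number feature vector.
img = [[0] * 6 for _ in range(6)]
for y in range(2, 4):
    for x in range(2, 4):
        img[y][x] = 255

assert feature_stage(pixel_stage(img)) == (4, 2.5, 2.5)
```

The key architectural point is the data reduction between the stages: the first stage touches every pixel, while the second operates on a handful of numbers, which is why the two stages suit very different processors.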


3.2.3 Single-Chip Smart Cameras

Single-board or single-chip smart cameras are a special kind of stand-alone smart camera. Single-chip smart cameras take advantage of the integration capability of CMOS image sensors by building intelligent ASIP circuits onto the image sensor chip, potentially relieving the host computer of cumbersome pixel processing tasks and minimizing data transfer between camera and computer. In some cases, pixel-level ADC and processing can be achieved [12], which can lead to a brand new level of signal and image processing methodologies. Single-chip smart cameras make it possible to design very efficient, very small, low-power and low-cost cameras (when produced in large volumes). As examples, the VISoc single-chip smart camera [13] integrates a 320x256-pixel CMOS image sensor, a RISC processor, a vision co-processor and I/O onto a single chip, fabricated in a 0.35 µm process on an area of about 36 mm², with a typical power dissipation of about 1 W at 3.3 V and 60 MHz. Moorhead et al. [14] designed a smart CMOS camera chip which integrates an edge detection mechanism directly into the sensor array. Lee et al. [15] reported the design of a 30 frames/second VGA-format CMOS image sensor with an embedded massively parallel processor for real-time skin-tone detection.

In some applications a single-chip smart camera can bring distinct advantages. For example, Shigematsu et al. argue that, compared with conventional multi-chip fingerprint readers, a fingerprint reader based on a single-chip smart camera can be much smaller, allowing much simplified integration into mobile devices such as mobile phones, low in cost, and more secure [1]. The main disadvantage of the single-chip smart camera lies in the cost of chip design and manufacturing, unless a large volume of units can be produced to justify the initial capital investment. Nevertheless, a single-chip smart camera is a smart sensor that has the potential to make vision systems pervasive, especially when connected to wireless sensor networks.
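As an illustration of the kind of in-sensor processing reported in [14], a minimal gradient-based edge detector can be sketched in a few lines. This is pure Python with assumed thresholds and function names; real on-chip implementations perform the equivalent operation in circuitry next to the pixel array:

```python
# Minimal gradient-magnitude edge detector, illustrating the per-pixel
# neighborhood operation that a smart sensor can compute in the array itself.
# Threshold and function name are illustrative assumptions.

def edge_map(img, thresh=50):
    """Binary edge map from horizontal + vertical central differences."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = img[y][x + 1] - img[y][x - 1]   # horizontal gradient
            gy = img[y + 1][x] - img[y - 1][x]   # vertical gradient
            out[y][x] = 1 if abs(gx) + abs(gy) > thresh else 0
    return out

# A vertical step edge produces a column of edge pixels at the boundary.
img = [[0] * 4 + [200] * 4 for _ in range(6)]
edges = edge_map(img)
assert all(edges[y][3] == 1 and edges[y][4] == 1 for y in range(1, 5))
assert all(edges[y][1] == 0 for y in range(1, 5))
```

Note that each output pixel depends only on its four neighbors, which is exactly why this class of operation maps so well onto per-pixel or per-column circuits on the sensor die.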


3.2.4 Embedded System based Smart Cameras

This category of smart cameras most often consists of a camera (usually a general-purpose one) and an external embedded processing unit connected to it. For example, an embedded-system-based smart camera could be a general-purpose camera connected to a high-performance video processing board, which is itself connected to a PC, either through a PCI slot or through an RS232 port. This kind of configuration is not too different from a PC-based system. Many 2G digital CCTV systems with some intelligent features belong to this category. The need for a dedicated, embedded processing unit in this type of smart camera stems from the fact that a PC, while flexible and versatile, is far from adequate for intensive image and video processing and pattern recognition tasks, particularly when high-resolution, high-frame-rate and low-latency processing is required. Another advantage of this kind of system is that once proof of concept is achieved and end-users are identified, it is easier for the system to be converted into a stand-alone smart camera if required. Smart cameras used in robotics and automotive applications can also be classified into this category. These cameras may share computing resources such as a processor and memory with other embedded devices in the robot or vehicle.


3.2.5 PC and Network based Smart Cameras

PC-based smart camera systems are probably most popular within the academic research environment, as a first step to conducting computer vision and pattern recognition research, and building first prototype for proof-of-concepts. It is a very simple and inexpensive configuration, as the prices for general purpose video cameras and PCs continue to fall. Most often, a general purpose camera is connected to a PC through either a frame grabber or a communication port such as USB, Firewire, CameraLink, or Ethernet. This type of system relies on the PCs CPU to perform image analysis, feature extraction and pattern

recognition tasks. The availability of various vision processing libraries for PC platforms makes this kind of system very popular. PCs also provide a more flexible environment for building user interfaces. USB cameras, Firewire cameras and network cameras allow digital images to be transferred directly from camera to a PC or an embedded processing hardware, avoiding signal integrity loss caused by DAC (digital to analog conversion) inside many CCTV cameras and ADC by frame grabbers. For highresolution cameras, Firewire cameras are starting to become popular and affordable, but CameraLink remains dominant, especially for high bandwidth and high performance applications. The 2G CCTV system is a network based video surveillance system (NVSS). An NVSS with built-in intelligent surveillance features can be loosely considered as a network of virtual smart cameras. An NVSS is composed of four main layers: a CCTV camera (sensor) layer, a network layer, a central computer layer and a trained security personnel layer (Figure 4). As discussed in section 2.5.1, in most of the currently deployed NVSSes, the ASIP tasks such as object tracking and identification and threat detection are typically performed mostly by trained security personnel. However, human monitoring of surveillance video is a very labor-intensive task. It is generally agreed that watching video feeds requires a higher level of visual attention than most every day tasks. Specifically vigilance, the ability to hold attention and to react to rarely occurring events, is extremely demanding and prone to error due to lapses in attention. A recent study by the US National Institute of Justice found that, after only 20 minutes of watching and evaluating monitor screens, the attention of most individuals will degenerate to well below acceptable levels [16]. 
The next generation of video surveillance systems, intelligent video surveillance systems (IVSS), will try to solve these problems by providing automated video surveillance and crime preemption abilities. The IVSS will seek a re-distribution of ASIP tasks among the four layers of the NVSS, notably shifting processing load from security personnel to central computers or DVRs (in the short term) and, probably more importantly, to the surveillance cameras themselves, that is, introducing (stand-alone) smart cameras to replace passive or "dumb" CCTV cameras (in the mid and long term). The use of smart cameras would greatly reduce the bandwidth problem caused by the increasing number of cameras present in the system and enhance surveillance system performance, as sending raw pixels over the network is less efficient than sending intermediate analysis results. Smart cameras can also help decentralize the overall surveillance system, which can lead to improved fault tolerance and the realization of more surveillance tasks than with traditional cameras [17].

Figure 4: Four layers of a network based video surveillance system (NVSS): the sensor layer (cameras 1 to N), the network layer, the central computer (server) layer and the security personnel layer.


Research in Smart Cameras as Embedded Systems

Video processing is notoriously hungry for computational horsepower, memory and other resources. Smart cameras as embedded systems have to meet the insatiable demands of video processing on one hand, and the challenging demands of embedded systems, such as real-time operation, robustness and reliability under real-world conditions, on the other. This has made smart cameras a leading-edge application for embedded systems research [18]. Recently there has been a significant increase in research into building smart cameras as embedded systems. The first IEEE workshop on Embedded Computer Vision (ECV05) was held in June 2005 [19]. The workshop addressed issues such as how to design smart algorithms that efficiently utilize embedded hardware, how to meet real-time constraints in an embedded environment, and verification methods for mission-critical embedded vision systems. In particular, the workshop discussed the suitability of FPGAs for embedded vision systems. Apart from numerous research groups working on developing smart cameras for video surveillance, there are a number of academic research groups around the world dedicated to building smart cameras as embedded systems. One prominent group is the Embedded Systems Group in Princeton University's Department of Electrical Engineering [18]. This group has developed an embedded smart camera system that can detect people and analyze their movement in real time. They are also working on a VLSI (Very Large Scale Integration) smart camera. An interesting research activity involving the design of stand-alone smart cameras is the SmartCam project at the University of Technology Eindhoven [20]. This project investigates multi-processor based smart camera system architectures and addresses the critical issue of determining correct camera architectural parameters for a given application domain. Another project bearing the same name is being undertaken by the University of Technology in Graz, Austria [17]. That project aims to develop distributed smart cameras for traffic surveillance applications. They also investigate various issues involved in building smart cameras as embedded systems, such as resource-aware dynamic task allocation to support real-time requirements. Many industry research groups and companies are involved in smart camera research for machine vision, especially in Germany, Japan and the US. There exist some very informative and useful journals and web portals for the machine vision world, such as IEEE Transactions on Pattern Analysis and Machine Intelligence, Advanced Imaging Magazine [21], Machine Vision Resources [22] and Machine Vision Online [23]. A search on the USPTO (US Patent and Trademark Office) website reveals many patents filed or issued in relation to the concept and embodiment of smart cameras as embedded systems. For example, patent #6,985,780, filed in Aug 2004 under the title of Smart Camera [24], claims a camera system that includes an image sensor and a processing module at the imaging location that processes the captured images prior to sending the results to a host computer. The processing module can perform tasks such as image feature extraction and filtering, convolution and deconvolution methods, correction of parallax and perspective image error, and image compression.

Review of ASIP Algorithms for Smart Cameras and State-of-the-Art Systems

If cameras are extensions of the human eye, smart cameras are pushing the boundary of possibility to become extensions of the human brain as well. What makes a camera smart is the intelligent and application specific information processing (ASIP) algorithms that are built into the software architecture of the camera system. In this section we first explore some common characteristics of intelligent algorithms for smart cameras. We then review several categories of algorithms as applied to machine vision, surveillance and other prominent applications, and some state-of-the-art smart camera systems in use in these application areas.


Common Characteristics of Algorithms for Smart Cameras

The primary function of a smart camera is to conduct autonomous analysis of the content of an image or video and achieve a high-level understanding of what is happening in the scene. One of the most commonly adopted approaches is image processing-based pattern recognition, which is a branch of artificial intelligence. Pattern recognition assumes that the image may contain one or more objects and that each object belongs to one of several predetermined types or classes. Given a digitized image containing several objects, the pattern recognition process consists of three main phases, each including several processing tasks: signal-level processing (image enhancement, image segmentation); feature-level processing (feature extraction, feature measurement and tracking); and object-level processing (object classification and estimation).

This is illustrated in Figure 5. Also shown in Figure 5 is a semantic-level processing component, which is central to the output or action side of smart cameras. The main tasks at this level include possible joint analysis of inputs from additional cameras, other sensory and database inputs, data fusion, event description and control signal generation. It should be noted that some tasks at different levels or phases may intersect each other during processing.
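As a toy illustration of these processing levels, the sketch below runs a crude threshold segmentation (signal level), extracts area and centroid features (feature level), and classifies the segmented region (object level). All function names, thresholds and the synthetic frame are illustrative assumptions, not taken from any system reviewed here.

```python
import numpy as np

def signal_level(frame, threshold=128):
    """Image segmentation: a crude intensity threshold yields a binary mask."""
    return frame > threshold

def feature_level(mask):
    """Feature extraction: measure area and centroid of the segmented region."""
    ys, xs = np.nonzero(mask)
    area = xs.size
    if area == 0:
        return {"area": 0, "centroid": None}
    return {"area": area, "centroid": (xs.mean(), ys.mean())}

def object_level(features, min_area=50):
    """Classification: decide whether the region is a plausible object."""
    return "object" if features["area"] >= min_area else "noise"

# A synthetic 100x100 frame with one bright 20x20 square.
frame = np.zeros((100, 100), dtype=np.uint8)
frame[40:60, 30:50] = 200

mask = signal_level(frame)
features = feature_level(mask)
label = object_level(features)
print(label, features["area"])   # the 400-pixel square is classified as "object"
```

In a real smart camera the signal- and feature-level steps would typically run in hardware, with only the compact feature record passed on, which is exactly the data reduction the text describes.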
Figure 5: General processing flow of algorithms for pattern recognition and smart cameras.

Image segmentation at the signal level is essential to all subsequent processing tasks, aiming to divide an image into distinct parts, each having a common characteristic. Image segmentation can be based on color, texture, shape and motion. Feature extraction is crucial to pattern recognition. This is where the segmented regions or objects are measured. A measurement is the value of some quantitative property of an object. A feature is a function of one or more measurements, computed so that it quantifies some significant characteristic of the object. This drastically reduced amount of information (compared to the original image) represents all the knowledge upon which the subsequent classification decision must be based. Object classification outputs a decision regarding the class to which each object belongs. Each object is recognized as being of one particular type, and the recognition is implemented as a classification process [25]. For simple applications, not all these levels and tasks need be implemented. For example, the camera in an optical mouse only performs signal- and feature-level processing tasks. On the other hand, for a particular processing task, different applications can have quite different requirements on the camera's performance, robustness and reliability. For example, the requirements for robustness of processing tasks at all levels are much higher for video surveillance monitoring human movement and behaviors than for industrial machine vision cameras performing parts inspection or sorting. Tasks at the signal and feature levels are usually data-intensive and are well suited for hardware-based implementation to meet speed demands. Tasks at the object level can be math-intensive and may need high performance processor(s) to complete. Stand-alone smart cameras built on a multi-processor architecture would have one processor, such as a DSP or an FPGA, to perform tasks at the signal and feature levels, and a high performance DSP or RISC microprocessor to perform statistical object classification. When designing smart cameras as embedded systems for demanding applications such as surveillance and automobiles, several important and challenging issues need to be addressed, such as the development of low-complexity, low-cost algorithms suitable for hardware implementation, and software and hardware co-design, in order to map algorithmic requirements to hardware resources. These issues are further discussed in section 5.1.
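To make the measurement/feature distinction concrete, here is a small numpy sketch under illustrative assumptions (the function names and the test shape are hypothetical): area and perimeter are measurements, while compactness, a function of both, is a feature that characterizes shape independently of size.

```python
import numpy as np

def measurements(mask):
    """Two measurements of a binary region: area and perimeter (in pixels)."""
    area = int(mask.sum())
    # Boundary pixels: object pixels with at least one 4-neighbour outside.
    padded = np.pad(mask, 1)
    interior = (padded[:-2, 1:-1] & padded[2:, 1:-1] &
                padded[1:-1, :-2] & padded[1:-1, 2:])
    perimeter = int((mask & ~interior).sum())
    return area, perimeter

def compactness(area, perimeter):
    """A feature computed from measurements: 1.0 for a disc, larger otherwise."""
    return perimeter ** 2 / (4 * np.pi * area)

mask = np.zeros((64, 64), dtype=bool)
mask[20:40, 20:40] = True          # a 20x20 square
a, p = measurements(mask)
print(a, p, round(compactness(a, p), 2))
```

The classifier then sees two or three numbers per region instead of thousands of pixels, which is the drastic information reduction the paragraph above refers to.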


Application: Intelligent Video Surveillance Systems (IVSS)

Current Research in Algorithms for IVSS

Video surveillance in dynamic scenes, especially of humans and vehicles, is currently one of the most active research topics in computer vision and pattern recognition. The IEEE and IEE have organized many workshops and conferences on intelligent visual surveillance in the last several years and have published special journal issues focusing solely on visual surveillance or on human motion/behavior analysis. Hu et al. [26] and Valera et al. [27] recently conducted excellent surveys of the various algorithms and techniques under research and development for video surveillance. They also reviewed some high profile IVSS systems. Some comments in this section are derived from their papers.

For video surveillance, image segmentation most often starts with motion detection, which aims at segmenting regions corresponding to moving objects from the rest of an image. Background modeling is indispensable to motion detection. 3-D models can provide more realistic background descriptions but are more costly; 2-D models currently have more applications due to their simplicity. However, all modeling techniques need ways to reduce the effect of unfavorable factors such as illumination variation, moving shadows and so on. Promising techniques for motion segmentation include simple background subtraction, temporal differencing, and more complex optical flow methods. Skin-color based segmentation can be very useful when human subjects are close enough to the camera and lighting is consistent. Once segmentation has provided isolated objects, feature extraction and measurements can be performed on each object. Simple algorithms for feature extraction include image moments, which can provide geometrical features of the objects. For gesture and behavior recognition, promising algorithms for feature extraction include MEF (Most Expressive Features), extracted by Karhunen-Loeve projection, and MDF (Most Discriminative Features), extracted by multivariate discriminant analysis [28]. Since it is sometimes not easy to specify features explicitly, in some applications where the image size is small enough, the whole image or a transformed image is taken as the feature vector. Examples of algorithms for object classification are shape-based classification and motion-based classification. After motion detection and object classification, video surveillance systems generally track moving objects from one frame to another. Promising algorithms for object tracking can be classified into four categories: region-based tracking, active contour based tracking, feature based tracking, and model based tracking.
Particle filters have recently become a major way of tracking moving objects. Human behavior understanding and personal identification are among the most challenging tasks facing IVSS systems for high-end security applications. Behavior understanding involves the analysis and recognition of motion patterns, and the production of high-level descriptions of actions and interactions. Promising approaches and algorithms for behavior understanding include dynamic time warping, finite-state machines, HMMs (Hidden Markov Models) and time-delay neural networks. Personal identification is of increasing importance for many security applications. The human face and gait are now regarded as the main biometric features that can be used for personal identification in video surveillance systems. While face recognition research and development has made much progress in recent years, research on gait recognition is still in its infancy.
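The simple background subtraction and temporal differencing techniques mentioned above can be sketched in a few lines of numpy. The thresholds, learning rate and synthetic frames below are illustrative assumptions rather than values from any surveyed system.

```python
import numpy as np

def background_subtract(frame, background, threshold=25):
    """Foreground mask: pixels that deviate from the background model."""
    return np.abs(frame.astype(int) - background.astype(int)) > threshold

def update_background(frame, background, alpha=0.05):
    """Running average keeps the model tracking slow illumination change."""
    return (1 - alpha) * background + alpha * frame

def temporal_difference(frame, prev_frame, threshold=25):
    """Foreground mask from frame-to-frame change only."""
    return np.abs(frame.astype(int) - prev_frame.astype(int)) > threshold

# Synthetic example: static background, one bright blob moving 3 pixels right.
background = np.full((60, 80), 50, dtype=np.uint8)
prev_frame = background.copy();  prev_frame[10:20, 10:20] = 200
frame = background.copy();       frame[10:20, 13:23] = 200

fg = background_subtract(frame, background)
td = temporal_difference(frame, prev_frame)
bg_next = update_background(frame, background.astype(float))
print(fg.sum(), td.sum())   # whole blob (100 px) vs. only the changed pixels (60 px)
```

Note how temporal differencing recovers only the leading and trailing edges of the moving blob, while background subtraction recovers the whole object, which is why the two are often combined in practice.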


State-of-the-Art IVSSes

A number of high-profile IVSSes have been reported in recent years. These systems, some deployed in real-world applications, applied various pattern recognition techniques described in previous sections and provided features such as people tracking, behavior recognition, detection of unattended objects and so on. Examples are the real-time visual surveillance system W4 [29], the Pfinder system developed by Wren et al. [30], the single-person tracking system developed at TI by Olson et al. [31], and a system at CMU (Carnegie Mellon University) [32] that can monitor activities over a large area using multiple cameras connected by a network. A few IVSSes based on the use of stand-alone smart cameras have also been reported. The V2 system developed by Christensen and Alblas [33] is a surveillance system that avoids the disadvantages of a centralized computer server by moving many of the processing tasks directly to the camera, making the system a group of smart cameras connected across the network. Event detection and storage of event video can be performed autonomously by the camera; thus, normally, it is only necessary to communicate with a central point when significant events occur. The VSAM project described by Collins [34, 35] is a multi-camera surveillance system composed of a network of smart sensors that are independent and autonomous vision modules. These vision sensors are capable of detecting and tracking objects, classifying the moving objects into semantic categories such as human or vehicle, and identifying simple human movements such as walking. Desurmont et al. [36] developed a smart network camera system with three smart cameras to perform people tracking and counting in shopping malls. Their system uses web services standards and XML-based metadata to implement inter-camera and camera-to-host coordination. Fleck et al.
[37] designed a smart camera that contains an FPGA and a PowerPC processor to perform face tracking and people tracking, using particle filters on HSV (Hue, Saturation, Value) color distributions. The camera outputs the approximated PDF (probability distribution function) of the target state to a host computer.


Application: Industry Machine Vision

While advanced algorithms for surveillance smart cameras are mostly still at the research and development stage, due to their high complexity and the high level of robustness required for real-world applications, smart cameras for industrial machine vision have long established their place in the market as mature players. Most machine vision cameras are stand-alone, autonomous smart cameras, where communication with a PC or other central control unit is needed only for camera configuration, firmware upgrading or, in some cases, output data collection. Most algorithms implemented in these cameras follow a processing flow similar to that described in Figure 5. One important reason for the relative maturity of machine vision smart cameras, compared with smart cameras for surveillance, is that the application requirements for machine vision cameras are much less restrictive. In other words, many pattern recognition algorithms or techniques have a much better chance of performing with satisfactory robustness and reliability in machine vision than in surveillance applications. This is because machine vision cameras mainly operate under conditions such as: indoor use, so good and consistent lighting can be more easily guaranteed; minimal occlusion; a static and known background, so unusual feature detection is simpler; a limited set of object patterns to be recognized; and no need for human movement tracking and recognition.

There are many proven software packages on the market that can be customized or directly implemented for programmable machine vision cameras. Most of these packages target specific industry sectors, but some are general purpose, including a few powerful up-market libraries such as the Halcon library [38]. The Halcon library provides algorithms that include shape-based matching to find objects based on ROI (region of interest) modeling, blob analysis, metrology (both 1D and 3D), edge detection, edge and line extraction, contour processing, template matching, and color processing. Thanks to advancements in embedded system technologies and the improved affordability of processing power, functionality that was once available only in PC-based systems is migrating down to the smart camera level. Artificial intelligence is one such functionality. Pulnix America's ZiCAM camera, for example, makes use of a hardware neural network to eliminate the need for programming to execute image-understanding algorithms [39]. It can learn what is required for a machine vision application and, once taught, operates as a stand-alone smart camera. Wintriss Engineering manufactured a smart camera which sports a microprocessor, a DSP and multiple FPGAs with up to 130,000 gates [40]. The company offers both area- and line-scan versions of their smart cameras, with the line scan version able to perform imaging-related processing on 5 150 pixel lines at 40 MHz. One such camera uses an FPGA to perform image sensor control and pixel correction, and uses the compute power in the camera head to run real-time digital filters, lighting correction, streak correction and input/output capability. Ultimately, geometric and photometric flaws are discriminated based on connectivity analysis, all performed within the camera.
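The connectivity analysis mentioned above can be illustrated with a toy connected-component labeling routine: label each 4-connected blob in a binary image and report its size, the kind of intermediate result a flaw-discrimination step would consume. The grid, names and connectivity choice below are illustrative assumptions.

```python
from collections import deque

def label_blobs(grid):
    """Label 4-connected blobs in a binary grid; return labels and blob sizes."""
    rows, cols = len(grid), len(grid[0])
    labels = [[0] * cols for _ in range(rows)]
    sizes = {}
    next_label = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] and not labels[r][c]:
                next_label += 1
                queue = deque([(r, c)])
                labels[r][c] = next_label
                size = 0
                while queue:                      # breadth-first flood fill
                    y, x = queue.popleft()
                    size += 1
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols and
                                grid[ny][nx] and not labels[ny][nx]):
                            labels[ny][nx] = next_label
                            queue.append((ny, nx))
                sizes[next_label] = size
    return labels, sizes

image = [[1, 1, 0, 0, 1],
         [1, 0, 0, 0, 1],
         [0, 0, 1, 1, 1],
         [0, 0, 0, 0, 0]]
_, sizes = label_blobs(image)
print(sizes)   # two blobs: one of 3 pixels, one of 5
```

A flaw detector might then simply flag any blob whose size or shape falls outside the tolerances learned for good parts.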

4.4 Application: Intelligent Transport Systems and Automobiles

4.4.1 ITS Applications

There is growing awareness of and interest in using smart cameras in Intelligent Transport Systems (ITS) and the automobile industry. The IEEE very recently (June 2005) organized an international workshop on Machine Vision for Intelligent Vehicles [41]. Generally speaking, the application and algorithmic requirements for ITS are quite similar to those of IVSS. The requirements can be quite different for automobile applications, however, where high-speed imaging and processing are often needed, imposing a higher level of demand on both hardware and software. Increased robustness is also required for car-mounted cameras to deal with varying weather conditions, speeds, road conditions and car vibrations. CMOS image sensors can overcome problems such as large intensity contrasts due to weather conditions or road lights, as well as blooming, an inherent weakness of existing CCD image sensors [42]. A number of successful applications of smart camera systems for ITS have been reported in the literature. The VIEWS system at the University of Reading [43] is a 3D model-based vehicle tracking system. Kumar et al. [44] described a real-time rule-based behavior-recognition system for traffic videos. The system, which is capable of detecting potential accident situations and is designed for existing camera setups on road networks, will be useful for better traffic rule enforcement by detecting and signaling improper behaviors. Beymer et al. [45] presented a smart camera-based monitoring system for measuring traffic parameters. The system captures video from cameras placed on poles or other structures looking down at traffic. Once the video is captured, digitized and processed by the on-site smart camera, it is transmitted in summary form to a transportation management centre for computing multi-site statistics such as travel times. Bramberger et al. [42] described an embedded smart camera for stationary vehicle detection.
They discussed the mapping of high-level algorithms to

embedded system components. Dimitropoulos et al. [46] described a network of smart cameras deployed at an airport to detect and track aircraft; each camera can autonomously detect aircraft traffic in multiple locations within its field of view. A camera data fusion module fuses data from multiple cameras to determine the location and size of the aircraft. Other ITS applications for smart cameras include vehicle behavior in parking lots, vision-based vehicle speed measurement, red-light intrusion detection at traffic lights, and vehicle number plate recognition. Some authors have expressed the need to integrate smart traffic surveillance systems with existing traffic control systems to develop the next generation of advanced traffic control and management systems [47].


Automobile Applications

Intelligent vehicles will form an integral part of the next generation of ITS technology. Smart camera-powered intelligent vehicles will have the comprehensive capability of monitoring the vehicle environment, including the driver's state and attention inside the vehicle as well as detecting roads and obstacles outside the vehicle, so as to provide assistance to drivers and avoid accidents in emergencies. However, building and integrating smart cameras into vehicles is not an easy task: on one hand, the algorithms require considerable computing power to work reliably in real time and under a wide range of lighting conditions; on the other hand, the cost must be kept low, the package size must be small and the power consumption must be low [48]. Applications of smart cameras in intelligent vehicles include lane departure detection, cruise control, parking assistance, blind-spot warning, driver fatigue detection, occupant classification and identification, obstacle and pedestrian detection, intersection-collision warning, and overtaking vehicle detection. Below are a few examples. Stein [49] described a single smart camera-based adaptive cruise control system for intelligent vehicles. In a paper on obstacle detection using stereo vision, Ruichek [68] focused on a multilevel- and neural-network-based stereo-matching method for real-time road obstacle detection with linear cameras for use in vehicles. Xu et al. [50] addressed the problem of pedestrian detection and tracking with night vision using a single infrared video camera installed on the vehicle. The EyeQ is a single-chip smart camera processor developed by Mobileye [51]. It has been fabricated using 0.18 μm CMOS technology, operating at 120 MHz. It integrates two 32-bit RISC ARM946E CPUs, four Vision Computing Engines, a multichannel DMA (Direct Memory Access) controller and several peripherals, and is designed for computationally

intensive applications for real-time visual recognition and scene interpretation for use in intelligent vehicle systems.


Other Application Areas

Other important applications for smart cameras include HCI, medical imaging, robotics, games and toys. Optical mice, which embed a simple smart camera, are widely used. Smart cameras performing gesture recognition will play an important role in the development of multimodal user interfaces. Bonato et al. [52] presented an FPGA-based smart vision system for mobile robots capable of performing real-time human gesture recognition. The RVT system developed by Leeser et al. [53], based on FPGA processing, allows surgeons to see live retinal images with vasculature highlighted in real time during surgery.

Smart Camera Design Considerations and Future Directions

In this final section we discuss design considerations for smart cameras as embedded systems, identify several key issues that need to be addressed by the design and research community, and speculate on the future directions of smart camera research and development.

5.1 Design Considerations

5.1.1 Design and Development Process

Figure 6 shows a typical design and development process for smart cameras as embedded systems (excluding single-chip smart cameras). As shown in Figure 6, the process can be iterative, especially if the initial application specification was not complete from the end user's point of view.

Figure 6: Design and development process for smart cameras as embedded systems: project definition; application requirements specification; system architecture design; proof of concept (algorithm and hardware); algorithm conversion; embedded system integration and debugging; field test evaluation; and, once requirements are met, engineering prototyping / manufacturing.

The system architecture design stage decides on software and hardware architectures, based on performance, deadline and cost criteria. Algorithmic design and timing design suitable for the targeted hardware platform also need to be defined. The mapping between algorithm requirements and hardware resources is an important issue. The proof-of-concept stage may use a PC platform for research and algorithm development. Usually a COTS (Commercial Off-The-Shelf) general purpose camera is used at this stage. Hardware components need to be acquired, integrated and tested. However, this is not needed if, during the architecture design stage, a third party camera development platform or hardware accelerator unit for video processing is identified as an appropriate hardware platform (see section 5.1.6 for examples of smart camera development platforms). The algorithm conversion stage includes tasks such as converting floating-point arithmetic to fixed-point arithmetic, consideration of low-power and low-complexity versions, and implementation using an HDL (Hardware Description Language). The embedded system integration stage results in a prototype smart camera using an embedded hardware platform running embedded versions of the algorithms.


System Architecture and Design Methodology

System architecture design will depend heavily on application requirements, which can be very simple (e.g. an optical mouse) or very complex (e.g. face recognition). It also has to consider many factors such as hardware platform, cost, time to market, flexibility, and so on. Generally speaking, a heterogeneous, multi-processor architecture can be ideal for smart camera development. For example, such an architecture may consist of an FPGA or a DSP as a data processor to tackle image segmentation and feature extraction, and a high-performance DSP or media processor to tackle math-intensive tasks such as statistical pattern classification. This kind of system allows better exploitation of pipelining and parallel processing, which are essential to achieving high frame rates and low latency. Some authors have reported work on the impact of hardware system architecture on the level of implementable pipelining and parallel processing for smart cameras [54, 55]. Some initial work has been reported on design methodology for embedded vision systems [56, 57].


Embedded Processors

There are generally four main families of embedded processors that can be used for smart cameras: microcontrollers, ASICs (Application Specific Integrated Circuits), DSPs (Digital Signal Processors) and PLDs (Programmable Logic Devices) such as the FPGA. Microcontrollers are cheap but have limited processing power and are generally not suited to building demanding smart cameras. ASICs are powerful and power-efficient, but the design cost and risk are high, and they are viable only when volume is high and time-to-market requirements allow. DSPs are relatively cheap and powerful at image and video processing, but for demanding applications usually more than one DSP is needed. DSP-based solutions can be cost-effective for medium-volume production. Recently a new class of DSP processors, called media processors, has come into the vision market. Media processors try to provide a good trade-off between flexibility and cost-effectiveness. They typically have a high-end DSP core employing SIMD (Single Instruction Multiple Data) and VLIW (Very Long Instruction Word) architectures, married on-chip with typical multimedia peripherals such as video ports, networking support, and other fast data ports [58]. Examples of media processors are Philips' TriMedia, TI's DM64x and ADI's (Analog Devices, Inc.) Blackfin. The FPGA has recently emerged as a very good hardware platform candidate for embedded vision systems such as smart cameras. One of its most important advantages is the ability to exploit the inherently parallel nature of many vision algorithms. FPGAs used to be employed mainly as glue logic between processors and peripherals, but the introduction of on-chip hardware multipliers and dual-port memory has made them excellent options for DSP applications. The integration of microprocessors into FPGA chips (such as Xilinx Virtex-II Pro and Virtex-4 chips) has made them true system-on-a-chip solutions. These features, together with continuous improvements in cost and the maturity of design tools, have made FPGAs very competitive against DSPs and media processors for many types of embedded vision system design.
In fact, an increasing number of publications on smart cameras as embedded systems have employed FPGAs as the sole processor, or as a data-intensive processor in front of a DSP or a media processor in a powerful heterogeneous multi-processor architecture [59]. Sen et al. [56] have recently proposed a design methodology for effectively and efficiently implementing computer vision algorithms on FPGAs to build smart cameras. A study comparing the relative performance of various image processing routines running on a DSP, a PowerPC, an Intel Pentium 4 and an FPGA was published on Alacron's web site [60], in which the FPGA solution was found to offer a distinct advantage. However, a more standardized performance evaluation mechanism to help processor selection is much needed.

How should one choose between DSPs, media processors, ASICs and FPGAs? Kisacanin [58] proposed a practical way to help processor selection based on intended production volume, cost and development flexibility. He argued that ASICs may be suitable for high volumes of over 1,000,000 units, DSPs or media processors for medium volumes between 10,000 and 100,000 units, while for low volumes of under 10,000 units, FPGAs can be a viable candidate.
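The volume-based rule of thumb above can be encoded directly; the function name and the handling of the boundary and gap cases (e.g. volumes between 100,000 and 1,000,000) are assumptions, since the cited heuristic only names the three ranges.

```python
def suggest_processor(volume):
    """Rough processor-family suggestion by intended production volume,
    following the volume bands attributed to Kisacanin [58]."""
    if volume > 1_000_000:
        return "ASIC"                       # high volume justifies NRE cost
    if volume >= 10_000:
        return "DSP or media processor"     # medium volume
    return "FPGA"                           # low volume, flexible

for v in (5_000, 50_000, 2_000_000):
    print(v, "->", suggest_processor(v))
```

In practice, of course, power budget, required flexibility and time to market weigh alongside volume, as the surrounding discussion makes clear.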


Algorithms Development and Conversion

Algorithm development for embedded systems is quite different from that for PC-based platforms. Basically, it can be a lot more demanding and challenging, especially if FPGA or ASIC processors are targeted. Usually, when designing applications for an ASIC or FPGA, one has to understand the chip architecture so that algorithms can be executed efficiently and effectively. Nowadays, behavioral or algorithmic synthesizers do exist to let designers set aside the device architecture and focus on functionality, but they come at a cost in efficiency in terms of chip area (gate count) and power consumption. Therefore, it is always important to gain an intimate knowledge of the architecture of whichever device is targeted, be it ASIC, FPGA or DSP. This knowledge also helps in designing parallel and pipelined processing, which can be a very important and effective video processing technique. Converting floating-point arithmetic to fixed-point and eliminating divisions as much as possible (by using hardware multipliers and look-up tables, for example) are other design considerations for algorithm conversion.
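The two conversion techniques just mentioned, fixed-point arithmetic and replacing division with a reciprocal look-up table, can be sketched as follows. The Q8.8 format, table size and function names are illustrative assumptions; real conversions choose formats per signal after range analysis.

```python
FRAC_BITS = 8                      # Q8.8: 8 integer bits, 8 fractional bits
ONE = 1 << FRAC_BITS

def to_fixed(x):
    """Convert a float to Q8.8 fixed point."""
    return int(round(x * ONE))

def fixed_mul(a, b):
    """Fixed-point multiply: integer multiply, then rescale by a shift."""
    return (a * b) >> FRAC_BITS

# Reciprocal LUT: division by a small integer d becomes a multiply by table[d],
# which maps well onto hardware multipliers and block RAM.
RECIP = [0] + [to_fixed(1.0 / d) for d in range(1, 256)]

def fixed_div(a, d):
    """Approximate a / d without a divide instruction."""
    return fixed_mul(a, RECIP[d])

a = to_fixed(3.5)
print(fixed_mul(a, to_fixed(2.0)) / ONE)   # exactly 7.0 in Q8.8
print(fixed_div(a, 7) / ONE)               # ~0.5, within LUT quantization error
```

The small error in the division result illustrates the accuracy/resource trade-off that range analysis and fraction-bit selection must manage during conversion.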


Other Factors

Memory System - Smart cameras need flexible memory models to meet requirements such as scalable frame buffers that cope with increasing image sensor resolutions. As a smart camera may integrate different types of processors, the memory system should support a potentially complex processing pipeline and parallelism in order to meet the application's real-time requirements. For single-chip smart cameras, care needs to be taken at the design stage to conserve memory [54].

Communication Protocols - There are currently too many data output protocols for cameras, such as FireWire, CameraLink, GigE and USB. FireWire is maturing, but CameraLink remains the bandwidth leader and is very popular with machine vision users. Unfortunately, this variety of digital interfaces increases confusion in the market and puts pressure on camera vendors to support multiple versions of their cameras with different interfaces.
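The pipelined, parallel frame handling such a memory system must support can be sketched with classic double buffering: capture writes one buffer while processing reads the other, so the two stages overlap. Everything below (buffer count, frame size, the stand-in capture source and vision kernel) is an assumption for illustration:

```python
# A double-buffered capture/process pipeline, the simplest form of the
# frame-buffer parallelism a smart-camera memory system must support.
# All sizes and the fake sensor/kernel below are illustrative.

import threading, queue

W, H = 64, 48
free = queue.Queue()            # buffers available to the capture stage
ready = queue.Queue()           # filled buffers awaiting processing
for _ in range(2):              # two buffers -> classic double buffering
    free.put(bytearray(W * H))

def capture(n_frames):
    for i in range(n_frames):
        buf = free.get()        # blocks if processing falls behind
        for j in range(len(buf)):
            buf[j] = (i + j) % 256      # stand-in for sensor DMA
        ready.put(buf)
    ready.put(None)             # end-of-stream marker

def process(results):
    while (buf := ready.get()) is not None:
        results.append(sum(buf))        # stand-in for a vision kernel
        free.put(buf)           # hand the buffer back to capture

results = []
t = threading.Thread(target=capture, args=(8,))
t.start()
process(results)
t.join()
```

In a real smart camera the "threads" are separate hardware units (sensor interface, DSP, FPGA pipeline) and the queues become ping-pong buffers in on-chip or external memory, which is where the scalability pressure from rising sensor resolutions appears.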


Smart Camera Development Platforms

There have been a number of commercially available programmable smart camera platforms on which developers can design and prototype smart cameras for applications such as machine vision, biometrics, HCI and surveillance. Philips has introduced the INCA (INtelligent CAmera) series of programmable cameras [61], which integrate CMOS image sensors of various resolutions and a highly flexible dual-core processing unit: a Xetal processor for computation-intensive signal processing such as feature extraction, together with a high-performance TriMedia DSP core for math-intensive processing tasks such as pattern recognition. The camera comes with an application development kit allowing fast prototyping. One application has been designed for face recognition [62], in which the Xetal is used for face detection and the TriMedia for face recognition. Sony has recently released a smart camera development system, the XCI-SX1, which integrates an SXGA CCD image sensor (15 frames per second at SXGA, 34 fps at 640x480 resolution) and an AMD Geode GX533 400 MHz processor running the MontaVista Linux operating system [63]. The camera platform is designed to provide OEMs, systems integrators and vision tool manufacturers with a rugged, robust component, combining imager, intelligence and interface in a single plug-in module that is simple to set up and easy to integrate. The IQeye3 IP camera from IQinvision Inc., powered by a 250 MIPS PowerPC CPU, is a platform for smart IP network camera development [64]. Some signal processing tool development companies provide multi-processor development systems that can serve as excellent development platforms for smart cameras. For example, Hunt Engineering [65] provides the HERON development platform, based on a Xilinx FPGA and a TI (Texas Instruments) DSP, with expansion capabilities to integrate video capture, IP cores, and additional DSPs and/or FPGAs for creating scalable smart camera architectures. Lyrtech provides similar development systems in its SignalMaster series of products [66]. These systems generally provide flexible communication ports and drivers.


Key Issues or Challenges

System Design - The proprietary nature of smart cameras can limit choices of hardware, such as imagers, I/O, lighting, lenses and the communications format. Smart cameras can therefore lack the expandability and flexibility of PC-based systems. On the other hand, smart cameras do not have as many software applications and libraries as already exist for PC/frame-grabber-based systems. In terms of design methodology, easy integration of intellectual property into the design tools and flow can help foster product differentiation. Other important system-level issues include smart camera operating systems and development tools.

CMOS Image Sensors - Dynamic range is still one of the key aspects where CMOS image sensors lag behind CCDs. Improvement in this area could lead to more low-cost smart cameras using CMOS image sensors for machine vision and surveillance applications.

Algorithm Development - Many intelligent pattern recognition algorithms work well in laboratory conditions but fail when deployed in real-world conditions (occlusion, lighting changes, unfavourable weather) and embedded system environments (scant resources, low power, low cost). Robustness and low complexity are among the key issues facing researchers developing algorithms for smart cameras in surveillance, ITS and automobile applications.

Performance Evaluation - This is a very significant challenge in smart surveillance systems. Evaluating the performance of video analysis systems requires significant amounts of annotated data. Annotation is typically a very expensive and tedious process, and can itself contain significant errors. All of these issues make performance evaluation a significant challenge [16].

Standards Development - There is a need for the development of smart camera standards. The European Machine Vision Association (EMVA, [67]) has recently launched an initiative (the EMVA 1288 Standard) to define a unified method to measure, compute and present specification parameters for smart cameras and image sensors used in machine vision applications. More needs to be done in this respect.
Single Chip Smart Cameras - Single-chip smart cameras are an attractive concept, but their manufacturing cost can be high because the feature size used to make digital processors and memory often differs from that used to make image sensors, which may require relatively large pixels to collect light efficiently. Therefore, for applications where physical space and power consumption are not extremely restrictive, it probably still makes sense to design the smart camera with a multi-chip approach and a separate image sensor chip. Separating the sensor and the processor also makes sense at the architectural level, given the well-understood and simple interface between the sensor and the computation engine [54].
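The performance-evaluation challenge raised above ultimately reduces to scoring detector output against annotated ground truth. A minimal frame-level version (ours, for illustration; real evaluations also score spatial overlap and event boundaries) looks like this:

```python
# Frame-level evaluation of detector output against hand-annotated
# ground truth: precision and recall over per-frame detection flags.
# Sketch only; the data below is hypothetical.

def evaluate(predicted, annotated):
    """Return (precision, recall) for per-frame detection flags."""
    tp = sum(1 for p, a in zip(predicted, annotated) if p and a)
    fp = sum(1 for p, a in zip(predicted, annotated) if p and not a)
    fn = sum(1 for p, a in zip(predicted, annotated) if a and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical detector output vs. hand-annotated ground truth:
pred = [1, 1, 0, 1, 0, 0, 1, 0]
gt   = [1, 0, 0, 1, 1, 0, 1, 0]
```

The expensive part is not this computation but producing the `gt` vector: every frame of every test sequence must be annotated by hand, which is why annotation cost and annotation error dominate the evaluation problem.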


Future Directions

The demand for smart cameras will steadily increase in traditional industries such as surveillance and industrial machine vision, and may also come from new industry and market segments such as healthcare, entertainment and education. Research interest and economic and social factors will drive continuous technological and product development. Based on the discussions above, we can discern the following future directions for smart camera systems and technologies.

At the system design level, continuous effort will be made to develop a research strategy and design methodology for smart cameras as embedded systems, and likewise to develop libraries and tools that facilitate algorithm implementation on DSPs and FPGAs. Research on general and optimal architectures for smart cameras and on real-time operating systems for smart cameras will be undertaken, and the issue of too many digital camera interfaces (FireWire, CameraLink, etc.) will be addressed.

At the ASIP algorithm development level, in order to improve the performance and robustness of existing techniques, research should address issues such as occlusion handling, fusion of 2D and 3D tracking, anomaly detection and behavior prediction, combination of video surveillance and biometric personal identification, and multi-sensory data fusion [26]. Multi-modal, multi-sensory augmented video surveillance systems have the potential to provide improved performance and robustness. Such systems should be adaptable enough to adjust automatically to changes in the environment such as lighting, scene geometry or scene activity. Work on distributed (or networked) IVSS should not be confined to computer vision laboratories, but should involve telecommunication companies and network service providers, and should take system engineering issues into account.

In the machine vision arena, smart cameras will offer more and more functionality. The trend of distributing machine vision across the entire production line, at points before value is added, will continue. Neural network techniques appear to have become a key paradigm in machine vision, used either to correctly segment an image under a wide variety of operational conditions or to classify the detected object. Stereo and 3D-vision applications are also increasingly widespread, and another trend is to utilize machine vision in the non-visible spectrum. New product developments will introduce smart camera-based digital imaging systems into existing consumer and industry products, to increase their value and create new products.

Standards development is a further direction. One area that may need standardization is the metadata format that facilitates integration and communication between the different cameras, sensors and modules in a distributed and augmented video surveillance system. New communication protocols may also be needed for better communication between different smart camera products.

Acknowledgements

The authors would like to thank Dr. Xing Zhang of STMicroelectronics and Dr. Julien Epps of National ICT Australia for their many valuable comments and corrections to parts of this paper.

References

[1] S. Shigematsu, H. Morimura: A Single-Chip Fingerprint Sensor and Identifier. IEEE Journal of Solid-State Circuits, Vol. 34, No. 12, December 1999. pp. 1852-1859.
[2] M. LaPedus: CMOS Image Sensors Market Consolidates. http://www.eet.com/news/semi/showArticle.jhtml?articleID=177102846.
[3] Intel Open Source Computer Vision Library. http://www.intel.com/technology/computing/opencv/index.htm.
[4] Chicago Pairing Surveillance Cameras with Gunshot Recognition Systems. http://www.securityinfowatch.com/online/CCTV--and--Surveillance/Chicago-Pairing-Surveillance-Cameras-withGunshot-Recognition-Systems/4628SIW427.
[5] Marketresearch.com: Global Digital Video Surveillance Markets. http://www.marketresearch.com/product/display.asp?productid=1032291&xs=r.
[6] Frost & Sullivan: Video Surveillance Software Emerges as Key Weapon in Fight Against Terrorism. http://www.prnewswire.co.uk/cgi/news/release?id=151696.
[7] Smart Products Can See The Future. http://www.imsresearch.com/members/pr.asp?X=103.
[8] Smart Cameras Drive Machine Vision Growth. Advanced Imaging Journal, October 2005. p. 8.
[9] Machine Vision Online: JAI PULNiX Forms New Smart Camera Business Unit. http://www.machinevisiononline.org/public/articles/archivedetails.cfm?id=1990.

[10] W. Hardin: Smart Cameras: The Last Step in Machine Vision Evolution? http://www.machinevisiononline.org/public/articles/archivedetails.cfm?id=389.
[11] H. Broers, R. Kleihorst, M. Reuvers, B. Krose: Face Detection and Recognition on a Smart Camera. Proceedings of ACIVS 2004, Brussels, Belgium, Aug. 31 - Sept. 3, 2004.
[12] Pixim Digital Pixel System Technology Backgrounder. http://www.pixim.com/html/tech_about.htm.
[13] L. Albani, P. Chiesa, D. Covi, G. Pedegani, A. Sartori, M. Vatteroni: VISoc: A Smart Camera SoC. Proceedings of the 28th European Solid-State Circuits Conference, pp. 367-370, Firenze, Italy, September 2002.
[14] T.W.J. Moorhead, T.D. Binnie: Smart CMOS Camera for Machine Vision Applications. Image Processing and Its Applications, Conference Publication No. 465, IEE, 1999. pp. 865-869.
[15] M.S. Lee, R. Kleihorst, A. Abbo, E. Cohen-Solal: Real-time Skin-tone Detection with a Single-chip Digital Camera. Proc. of 2001 Int'l Conference on Image Processing, Vol. 3, 7-10 Oct. 2001. pp. 306-309.
[16] A. Hampapur, L. Brown, J. Connel, S. Pankanti, A. Senior, Y. Tian: Smart Surveillance: Applications, Technologies and Implications. 4th IEEE Pacific-Rim Conference on Multimedia, 15-18 December 2003, Singapore.
[17] SmartCam - Design and Implementation of an Embedded Smart Camera. http://www.iti.tu-graz.ac.at/de/research/smartcam/smartcam.html.
[18] W. Wolf, B. Ozer, T. Lu: Smart Cameras as Embedded Systems. IEEE Computer, 35(9):48-53, Sep. 2002.
[19] The First IEEE Workshop on Embedded Computer Vision. http://www.scr.siemens.com/ecv05/.
[20] SmartCam: Devices for Embedded Intelligent Cameras. http://www.stw.nl/projecten/E/ees5411.html.
[21] Advanced Imaging. http://www.advancedimagingpro.com/.
[22] Machine Vision Resources. http://www.eeng.dcu.ie/~whelanp/resources/r_references.html.
[23] Machine Vision Online. http://www.machinevisiononline.org/.
[24] USPTO. http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&u=/netahtml/searchadv.htm&r=1&p=1&f=G&l=50&d=ptxt&S1=(smart+AND+camera).TTL.&OS=ttl/(smart+and+camera)&RS=TTL/(smart+AND+camera).
[25] K.R. Castleman: Digital Image Processing. 1st edition, Prentice Hall, New Jersey, 1996.
[26] W. Hu, T. Tan, L. Wang, S. Maybank: A Survey on Visual Surveillance of Object Motion and Behaviors. IEEE Transactions on Systems, Man and Cybernetics, Vol. 34, No. 3, August 2004. pp. 334-352.
[27] M. Valera, S.A. Velastin: Intelligent Distributed Surveillance Systems: A Review. IEE Proc.-Vis. Image Signal Process., Vol. 152, No. 2, April 2005. pp. 192-204.
[28] Y. Wu, T.S. Huang: Vision-Based Gesture Recognition: A Review. Lecture Notes in Computer Science, Volume 1739, 1999. pp. 103-114.

[29] I. Haritaoglu, D. Harwood, L.S. Davis: Real-time Surveillance of People and Their Activities. IEEE Trans. Pattern Anal. Machine Intell., Vol. 22, pp. 809-830, Aug. 2000.
[30] C.R. Wren, A. Azarbayejani, T. Darrell, A.P. Pentland: Pfinder: Real-time Tracking of the Human Body. IEEE Trans. Pattern Anal. Machine Intell., Vol. 19, pp. 780-785, July 1997.
[31] T. Olson, F. Brill: Moving Object Detection and Event Recognition Algorithms for Smart Cameras. In Proc. DARPA Image Understanding Workshop, 1997, pp. 159-175.
[32] A.J. Lipton, H. Fujiyoshi, R.S. Patil: Moving Target Classification and Tracking from Real-time Video. In Proc. IEEE Workshop on Applications of Computer Vision, 1998, pp. 8-14.
[33] M. Christensen, R. Alblas: V2 - Design Issues in Distributed Video Surveillance Systems. Denmark, 2000, pp.186.
[34] R.T. Collins, A.J. Lipton, H. Fujiyoshi, T. Kanade: Algorithms for Cooperative Multisensor Surveillance. Proc. IEEE, 89(10), 2001, pp. 1456-1475.
[35] R.T. Collins, A.J. Lipton, T. Kanade, H. Fujiyoshi, D. Duggins, Y. Tsin, D. Tolliver, N. Enomoto, O. Hasegawa, P. Burt, L. Wixson: A System for Video Surveillance and Monitoring. Carnegie Mellon Univ., Pittsburgh, PA, Tech. Rep. CMU-RI-TR-00-12, 2000.
[36] X. Desurmont, B. Lienard, J. Meessen, J.F. Delaigle: Real-Time Optimization for Integrated Network Camera. Proc. of SPIE - Real-Time Imaging 2005, San Jose, CA, January 2005.
[37] S. Fleck, W. Strasser: Adaptive Probabilistic Tracking Embedded in a Smart Camera. Proc. of the 2005 IEEE Computer Society Conference on CVPR.
[38] Halcon. http://www.mvtec.com/halcon/.
[39] Machine Vision Online: JAI PULNiX Forms New Smart Camera Business Unit. http://www.machinevisiononline.org/public/articles/archivedetails.cfm?id=1990.
[40] Are Smart Cameras Smart Enough? http://www.machinevisiononline.org/public/articles/archivedetails.cfm?id=493.
[41] MVIV05. http://www.scr.siemens.com/mviv05/.
[42] M. Bramberger, R.P. Pflugfelder, A. Maier, B. Rinner, B. Strobl, H. Schwabach: A Smart Camera for Traffic Surveillance. Proceedings of the First Workshop on Intelligent Solutions in Embedded Systems (WISES), June 2003.
[43] T.N. Tan, G.D. Sullivan, K.D. Baker: Model-based Localization and Recognition of Road Vehicles. Int. J. Comput. Vis., Vol. 29, No. 1, pp. 22-25, 1998.
[44] P. Kumar, S. Ranganath, H. Weimin, K. Sengupta: Framework for Real-Time Behavior Interpretation from Traffic Video. IEEE Transactions on ITS, March 2005, Vol. 6, No. 1. pp. 43-54.
[45] D. Beymer, P. McLauchlan, B. Coifman, J. Malik: A Real-time Computer Vision System for Measuring Traffic Parameters. Proc. IEEE Conf. on Computer Vision and Pattern Recognition, pp. 495-502.

[46] K. Dimitropoulos, N. Grammalidis, D. Simitopoulos, N. Pavlidou, M. Strintzis: Aircraft Detection and Tracking Using Intelligent Cameras. IEEE Int'l Conference on Image Processing, Vol. 2, 11-14 Sept. 2005. pp. 594-597.
[47] C. Nwagboso: User Focused Surveillance Systems Integration for Intelligent Transport Systems. In Regazzoni, C.S., Fabri, G., and Vernazza, G. (Eds.): Advanced Video-based Surveillance Systems (Kluwer Academic Publishers, Boston, 1998), Chapter 1.1, pp. 8-12.
[48] G. Stein: A Computer Vision System on a Chip: A Case Study from the Automobile Domain. First IEEE Workshop on Embedded Computer Vision, June 2005.
[49] G. Stein, O. Mano, A. Shashua: Vision-based ACC with a Single Camera: Bounds on the Range and Range Rate Accuracy. IEEE Intelligent Vehicles Symposium, June 2003, Columbus, OH.
[50] F. Xu, X. Liu, K. Fujimura: Pedestrian Detection and Tracking with Night Vision. IEEE Transactions on ITS, March 2005, Vol. 6, No. 1. pp. 63-71.
[51] EyeQ: System-on-a-chip. http://www.mobileye.com/eyeQ.shtml.
[52] V. Bonato, A.K. Sanches, M.M. Fernandes, J.M.P. Cardoso, E.D.V. Simoes, E. Marques: A Real Time Gesture Recognition System for Mobile Robots. International Conference on Informatics in Control, Automation, and Robotics, August 25-28, Setúbal, Portugal, 2004, INSTICC, pp. 207-214.
[53] M. Leeser, S. Miller, H. Yu: Smart Camera Based on Reconfigurable Hardware Enables Diverse Real-time Applications. Proc. of the 12th Annual IEEE Symposium on Field-Programmable Custom Computing Machines.
[54] W. Wolf, B. Ozer, T. Lu: VLSI Systems for Embedded Video. Proc. IEEE Computer Society Annual Symposium on VLSI, 2002.
[55] W. Wolf, T. Lv, B. Ozer: An Architecture Design Study for a High Speed Smart Camera. Proceedings of the 4th Workshop on Media and Streaming Processors, Istanbul, Turkey, 2002.
[56] M. Sen, I. Corretjer, F. Haim, S. Saha, J. Schlessman, S.S. Bhattacharyya, W. Wolf: Computer Vision on FPGAs: Design Methodology and Its Application to Gesture Recognition. Proc. of the 2005 IEEE CVPR.
[57] W. Caarls, P. Jonker, H. Corporaal: Benchmarks for SmartCam Development. Proceedings of ACIVS 2003 (Advanced Concepts for Intelligent Vision Systems), Ghent, Belgium, September 2-5, 2003.
[58] B. Kisacanin: Examples of Low-Level Computer Vision on Media Processors. Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.
[59] W.J. MacLean: An Evaluation of the Suitability of FPGAs for Embedded Vision Systems. Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.
[60] The Future of High Performance Machine Vision. http://www.alacron.com/NEWS/FUTURE.HTM.
[61] Philips Industrial Vision Products. http://www.apptech.philips.com/industrialvision/products.htm.

[62] R. Kleihorst, M. Reuvers, B. Krose, H. Broers: A Smart Camera for Face Recognition. Proc. 2004 International Conference on Image Processing (ICIP'04). pp. 2849-2852.
[63] Sony Introduces First in Smart Camera Series. http://news.sel.sony.com/pressrelease/5915.
[64] IQinvision Smart Camera Systems. IQeye300 Series. http://www.iqeye.com/iqeye300.html.
[65] Hunt Engineering. http://www.hunteng.co.uk/.
[66] Lyrtech. http://www.lyrtech.com/index.php/.
[67] Standard for Measurement and Presentation of Specifications for Machine Vision Sensors and Cameras. http://emva.org/home/content/blogcategory/135/164/.
[68] Y. Ruicheck: Multilevel- and Neural-network-Based Stereo-Matching Method for Real-Time Obstacle Detection Using Linear Cameras. IEEE Transactions on ITS, March 2005, Vol. 6, No. 1. pp. 54-62.

About the Authors

YU SHI is a Senior Researcher with National ICT Australia in Sydney, Australia. He received his B.Eng. in 1982 from the National University of Defense Technology in Changsha, Hunan, China. He later obtained his M.Eng. and PhD in signal processing and biomedical engineering in 1988 and 1992, respectively, in Toulouse, France. He also completed post-doctoral research at Oxford Brookes University in England in the late 1990s. His main research interests are embedded vision systems, FPGA-based design and applications, multimodal user interfaces and web services.

SERGE LICHMAN is a Senior Research Engineer with National ICT Australia in Sydney, Australia. He received his M.Eng. in Electrical Engineering in 1988 from the Odessa State Polytechnic University in Ukraine. His 12 years of experience in image and signal processing for commercial software and hardware have given him practical skills across the full product development life cycle, from research to deployment. His work has led to several publications.