2012 - 2013
A Project Report on
Intelligent Traffic Analysis & Monitoring System
submitted in partial fulfilment of the requirements for the degree of
BACHELOR OF ENGINEERING
in
COMPUTER SCIENCE AND ENGINEERING
by
Mayank Darbari
1RV09CS058
Shruti V Kamath
1RV09CS102
R. V. College of Engineering
(Autonomous Institution Affiliated to VTU)
CERTIFICATE
Certified that the project work entitled Intelligent Traffic Analysis & Monitoring
System, carried out by Mr. Mayank Darbari and Ms. Shruti V Kamath, USNs
1RV09CS058 and 1RV09CS102, bona fide students of R.V. College of Engineering,
Bangalore, is in partial fulfilment for the award of the degree of Bachelor of Engineering
in Computer Science and Engineering of the Visvesvaraya Technological University,
Belgaum, during the year 2012-2013. It is certified that all corrections/suggestions
indicated for internal assessment have been incorporated in the report deposited in the
departmental library. The project report has been approved as it satisfies the academic
requirements in respect of the project work prescribed for the said degree.
Dr. N. K. Srinath
Head of Department,
Department of CSE,
R.V.C.E, Bangalore 59
Dr. B. S. Satyanarayana
Principal,
R.V.C.E,
Bangalore 59
1.____________________
__________________
2.____________________
__________________
DECLARATION
We, Mayank Darbari and Shruti V Kamath, students of Eighth Semester B.E. in the
Department of Computer Science and Engineering, R.V. College of Engineering,
Bangalore, hereby declare that the project entitled Intelligent Traffic Analysis &
Monitoring System has been carried out by us and submitted in partial fulfilment of
the course requirements for the award of the degree of Bachelor of Engineering in
Computer Science and Engineering of Visvesvaraya Technological University,
Belgaum, during the academic year 2012-2013. The matter embodied in this report has
not been submitted to any other university or institution for the award of any other
degree or diploma.
Mayank Darbari
1RV09CS058
Shruti V Kamath
1RV09CS102
ACKNOWLEDGMENT
Any achievement, be it scholastic or otherwise, does not depend solely on individual effort
but on the guidance, encouragement and cooperation of intellectuals, elders and friends. A
number of people, in their own capacities, have helped us in carrying out this project work.
We would like to take this opportunity to thank them all.
First and foremost we would like to thank Dr. B. S. Satyanarayana, Principal, R.V.C.E,
Bengaluru, for his moral support towards completing our project work.
We would like to thank Dr. N. K. Srinath, Head of Department, Computer Science &
Engineering, R.V.C.E, Bengaluru, for his valuable suggestions and expert advice.
We deeply express our sincere gratitude to our project mentor Dr. Rajashree Shettar, Associate
Dean & Professor, Department of CSE, R.V.C.E, Bengaluru, for her able guidance, regular
encouragement and assistance throughout this project.
We thank our parents and all the faculty members of the Department of Computer Science &
Engineering for their constant support and encouragement.
Last but not least, we would like to thank our peers and friends who provided us with
valuable suggestions to improve our project.
Mayank Darbari
1RV09CS058
8th Sem, CSE
Shruti V Kamath
1RV09CS102
8th Sem, CSE
ABSTRACT
The recent progress in the field of computer science has led to falling hardware costs and
growing computing power, making vision-based technologies prominent and popular
solutions for surveillance and control systems. Visual vehicle surveillance is gaining
prominence in traffic analysis and monitoring. It is useful for criminal investigation and for
traffic police departments searching video content in order to analyse the objects in a video.
Finding a vehicle in these videos, whether because of a car crash, speeding, a truck in a
no-truck zone, or a particular type of vehicle that the user may be interested in, is a common
case. A user would have to rewind the video to look for an event which happened at a
previous time. This would be a very labour-intensive and tedious process, and events could
be overlooked due to human error, if there is no effective content-based method for indexing
and retrieval. This project involves a coherent system consisting of object detection, object
tracking, classification of objects and their indexing from vehicle surveillance videos, along
with vehicle speed-limit warning and determination of traffic flow direction, traffic density
and accident detection in real time.
The objects are detected using the optical flow method and tracked using Kalman
Filtering. The extracted objects are classified using 10 features, including shape-based
features such as area, height, width, compactness factor, elongation factor, skewness,
perimeter, orientation, aspect ratio and extent. A comparative analysis is presented in this
project for the classification of objects (car, truck, auto, human, motorcycle, none) based on
Multi-class SVM (one vs. all), Back-propagation, and Adaptive Hierarchical Multi-class
SVM. The system has different components for traffic analysis and monitoring, such as
determining the type of each detected object, with a count for each computed in real time.
The traffic flow direction is determined as top-to-bottom or left-to-right, or their reverse. The
traffic density is determined as low, medium or high. Two algorithms are used for accident
detection: one based on motion direction and the other in which a cumulative set of features
is calculated to detect an accident.
The results obtained for the classification methods have an accuracy of 92% for
Multi-class SVM (one vs. all), 87.8% for Adaptive Hierarchical Multi-class SVM, and 82%
for Back-propagation. Using the trained Multi-class SVM (one vs. all) classifier, the
objects are classified in real time. In addition, objects are indexed using object type, size,
colour and the frames in which they appear in their respective videos.
ITAMS, Feb-May 2013

TABLE OF CONTENTS

LIST OF FIGURES

LIST OF TABLES
Chapter 1
Introduction
Detecting and recognizing moving vehicles in traffic scenes for traffic surveillance,
traffic control, and road traffic information systems is an emerging research area for
Intelligent Transportation Systems. With falling hardware costs and growing computing
power, vision-based technologies have become popular solutions for traffic surveillance and
control systems. Visual vehicle surveillance videos are widely used by the police for criminal
investigation, by traffic monitoring systems, and for detection of abnormal activities and
events such as accidents. Finding a vehicle in these videos, whether because of a car crash,
speeding, a truck in a no-truck zone, or a particular type of vehicle that the user may be
interested in, is a common case. Analysis of traffic, such as the direction of flow and the
density of traffic, is also an important aspect of traffic monitoring systems. A user would
have to rewind the video to look for an event which happened at a previous time. This would
be a very labour-intensive and tedious process, and events could be overlooked due to human
error, if there is no effective content-based method for indexing and retrieval. The proposed
traffic monitoring system greatly reduces this human effort.
Visual vehicle surveillance is one of the fastest growing segments of the security
industry. Some of the prominent commercial vehicle surveillance systems include the IBM
S3 [1] and the Hitachi Data Systems Solutions for Video Surveillance [2]. These systems not
only provide the capability to automatically monitor a scene but also the capability to manage
surveillance data and perform event based retrieval. In India, Aftek, Logica and Traffline are
some of the widely used traffic systems. The most recent research work in visual vehicle
surveillance includes real-time vehicle detection by parts [3], integrated lane and vehicle
detection and tracking [4] and occluded vehicle recognition and tracking [5].
The proposed system is a smart surveillance system which works for both real-time
and prerecorded traffic videos. Moving objects are detected and tracked in the given input
video. Detected objects are classified by type using a robust selection of features.
Functionalities related to event detection and traffic analysis, such as accident detection,
speed estimation, traffic flow direction determination and traffic density estimation, are also
implemented. In the case of large prerecorded videos, to save time and
resources, shots are detected using the colour-histogram method, which computes the
histogram difference between consecutive frames and compares it against a set threshold.
The frames where the difference exceeds the threshold are identified as shot boundaries.
There is great redundancy among the frames within the same shot; therefore, certain frames
that best reflect the shot contents are selected as key frames to succinctly represent the shot.
For detection of objects, the Optical Flow Model is used, and the objects detected in the
video are tracked by Kalman Filtering.
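The histogram-difference shot detection described above can be sketched as follows. This is a minimal illustrative sketch, not the project's MATLAB implementation: frames are assumed to be flat lists of grey values (0-255), the bin count and the threshold value are illustrative assumptions.

```python
def histogram_diff(frame_a, frame_b, bins=32):
    """Sum of absolute differences between normalised grey-level histograms."""
    def hist(frame):
        h = [0] * bins
        for px in frame:            # frame: flat list of grey values 0-255
            h[px * bins // 256] += 1
        n = float(len(frame))
        return [c / n for c in h]
    ha, hb = hist(frame_a), hist(frame_b)
    return sum(abs(a - b) for a, b in zip(ha, hb))

def detect_shots(frames, threshold=0.5):
    """Return the frame indices where the histogram difference to the
    previous frame exceeds the threshold, i.e. shot boundaries."""
    return [i for i in range(1, len(frames))
            if histogram_diff(frames[i - 1], frames[i]) > threshold]
```

Key frames can then be picked from each detected shot, e.g. the frame closest to the shot's mean histogram.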
Detected objects are classified using Multi-class Support Vector Machine and
Back-propagation algorithms, based on a number of features, including shape-based features.
Initially the Multi-class SVM (one vs. all), the Adaptive Hierarchical Multi-class SVM
decision-tree approach [6] and the Back-propagation algorithm are trained using samples of
the objects to be detected. The trained models are then used to predict objects among a
predetermined number of classes based on object type, and the classification results of these
algorithms are compared. The system contains separate modules for accident detection,
traffic density measurement, traffic flow direction determination and speed estimation.
The proposed system also has a querying module, which can be used to query
detected objects using their dimension, colour or type. The frames in which only a particular
queried object appears can be viewed as well.
1.1 Definitions
The following are the definitions relevant to this project:
maintenance and high flexibility in traffic monitoring and, thus, becomes one of the most
popular techniques used in visual vehicle surveillance for traffic control.
Surveillance and monitoring systems often require on-line segmentation of all moving
objects in a video sequence. Background subtraction is a simple approach to detect moving
objects in video sequences. The basic idea is to subtract the current frame from a background
image and to classify each pixel as foreground or background by comparing the difference
with a threshold [14]. Morphological operations followed by a connected-component
analysis are used to compute all active regions in the image. In practice, several difficulties
arise during background subtraction, and several methods have been proposed in [15] to deal
with them. Most works, however, rely on statistical models of the background; a Gaussian
Mixture Model [16] may be used to detect objects. Another set of algorithms is based on
spatio-temporal segmentation of the video signal. These methods try to detect moving
regions taking into account not only the temporal evolution of the pixel intensities
and colour but also their spatial properties.
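The basic subtract-and-threshold idea from [14] can be sketched in a few lines. This is a toy illustration on 2-D lists of grey values; the difference threshold and the running-average learning rate `alpha` are illustrative assumptions, not values from this project.

```python
def segment_foreground(frame, background, threshold=30):
    """Mark a pixel as foreground (1) when |frame - background| exceeds
    the threshold, else background (0)."""
    return [[1 if abs(f - b) > threshold else 0
             for f, b in zip(frow, brow)]
            for frow, brow in zip(frame, background)]

def update_background(background, frame, alpha=0.05):
    """Running-average background model update: blend a little of the
    current frame into the stored background each step."""
    return [[(1 - alpha) * b + alpha * f for b, f in zip(brow, frow)]
            for brow, frow in zip(background, frame)]
```

In a real system the binary mask would then be cleaned with morphological operations before connected-component analysis.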
Most commercial surveillance systems rely on background modelling for the detection of
moving objects, in particular vehicles. However, they fail to handle crowded scenes, as
multiple objects close to each other are often merged into a single motion blob.
Environmental factors such as shadow effects, rain, snow, etc. also cause issues for object
segmentation. Various models and methods have been proposed for appearance-based object
detection, in particular vehicle detection. Examples include the seminal work of Viola and
Jones [17] and many extensions using different features, such as edgelets and strip features,
as well as different boosting algorithms like Real Adaboost and GentleBoost. Support vector
machines with histograms of oriented gradients have also been a popular choice for object
detection. In earlier work, Schneiderman et al. in [18], showed good vehicle detection results
using statistical learning of object parts.
A survey of the research in object tracking [19] discusses new research in the area
of moving object tracking as well as in the field of computer vision. There, research on object
tracking is classified as point tracking, kernel tracking and contour tracking according to
the representation method of the target object. In the point tracking approach, statistical
filtering methods are used to estimate the state of the target object. In the kernel tracking
approach, various estimation methods are used to find the region corresponding to the target
object. Contour tracking is based on the fact that when an object arbitrarily deforms, each
contour point can
move independently. A vehicle tracking system [20] has also been developed to deal
with night-time traffic videos. The Kalman filter, a type of kernel tracking, and the particle
filter, a type of contour tracking, are the most popular tracking methods. The Kalman filter
uses a series of measurements observed over time to produce estimates that are more accurate
than a single measurement. The particle filter uses a sophisticated model estimation technique
based on simulation. In the proposed system, Kalman Filtering is used for object tracking due
to its predictive nature and its ability to handle occlusions.
Methods for occlusion handling in object detection generally rely on object part
decomposition and modelling. In this project, however, these methods are not well suited due
to the low-resolution vehicle images. Video-based occlusion handling from the tracking
perspective has been addressed by Senior [21], but it assumes objects are initially far apart
before the occlusion occurs. Large-scale learning is an emerging research topic in computer
vision. Recent methods have been proposed to deal with a large number of object classes and
large amounts of data. In contrast, this approach deals with large-scale feature selection,
showing that a huge amount of local descriptors over multiple feature planes coupled with
parallel machine learning algorithms can handle occlusion effectively to an extent. The most
difficult problem associated with vehicle tracking is the occlusion effect among vehicles. To
solve this problem, an algorithm referred to as the spatio-temporal Markov random
field [22] has been developed for traffic images at intersections. This algorithm models the
tracking problem by determining the state of each pixel in an image and how such states
transit along both the image axes and the time axis.
To feed the classifier with information to use for classification, mathematical measures
called features need to be extracted from the objects to be classified. Feature extraction in
terms of supervised learning [23] can be described as: given a set of candidate features, select
the subset of features most suitable for the classification algorithm to be used. Features are
generally divided with respect to the shape and texture of the object. Shape features are based
on the object's geometry, captured by both the boundary and the interior region. Texture
features, on the other hand, depend on the grayscale values of the interior. Some of the latest
feature extraction methods include the SIFT descriptor [24], SURF descriptor [25], GLOH
features [26] and HOG features [27]. The features used in this project are a combination of
shape-based and texture-based features: area, perimeter, height, width, orientation,
compactness, extent, skewness, elongation and aspect ratio.
A semi real-time vehicle tracking algorithm [28] to determine the speed of the vehicles in
traffic from traffic cam video has also been developed. This method involves object feature
identification, detection, and tracking in multiple video frames. Speed calculations are made
based on the calibrated pixel distances. Optical flow images have been computed and used
for blob analysis to extract features representing moving objects. Some challenges exist in
distinguishing among vehicles in a uniform flow of traffic when the objects are too close, are
in low contrast with one another, or travel at the same or nearly the same speed. In the
absence of ground truth for the actual speed of the tracked vehicles, accuracy cannot be
determined.
Extensive research has also been done to address the problem of estimating the
traffic on the road. Most methods for traffic density estimation either use pre-learnt
models [29] (built by training the system on pre-classified samples and known object shapes
before using it for general classification) or use temporal data [30] available over time to
separate the vehicles from the background. The problem with these techniques is that they
fail when untrained examples appear, such as accidents or changes in weather conditions.
The other works mostly use the temporal data gathered over time through these cameras to
estimate the traffic.
A new approach to describing traffic scenes, including vehicle collisions and vehicle
anomalies [31] at intersections, by video processing and motion-statistics techniques has
been developed. Detecting and analysing accident events is done by observing partial vehicle
trajectories and motion characteristics. Hwang et al. in [32] propose a method which
generates and evolves structure of dynamic Bayesian network to deal with uncertainty and
dynamic properties in the real world using a genetic algorithm. The effectiveness of the
generated dynamic Bayesian network structure is evaluated in terms of the evolution process
and its accuracy in the domain of traffic accident detection. Jung Lee in [33] considers a
video image detector system using tracking techniques that overcome shadows, occlusions
and the absence of lighting at night. It derives traffic information (volume count, speeds and
occupancy time) under kaleidoscopic environments, and proposes an accident detection
system. A system
which uses a Hidden Markov Model [34] has also been developed. The system learns various
event behaviour patterns of each vehicle in the HMM chains and then, using the output from
the tracking system, identifies current event chains. The current system can recognize
bumping, passing, and jamming. However, by including other event patterns in the training
set, the system can be extended to recognize those other events, e.g., illegal U-turns or
reckless driving. Two methods are implemented for accident detection in this project.
The first is based on the accident detection module from [35], with additional features in our
method such as the change in the bounding box. The second method utilizes a block-matching
technique for motion estimation as described in [36]. Here motion vectors are taken into
consideration and depending on their changes, occurrence of an accident is determined.
Through this module, we demonstrate the effectiveness of these two simple methods for
accident detection in traffic video sequences.
An approach for visual detection and attribute-based classification of vehicles in
crowded surveillance scenes is explained in [37]. Large-scale processing is addressed along
two dimensions: 1) large scale indexing, and 2) learning vehicle detectors with large-scale
feature selection, using a feature pool containing millions of feature descriptors. This method
for vehicle detection also explicitly models occlusions and multiple vehicle types (e.g., buses,
trucks, SUVs, cars), while requiring very few manual labelling. Artificial Neural Networks
have been widely used for classification if objects. Bayesian Networks have also been
extensively used for object classification. Some of the most popular classification techniques
for vehicles include SVM [38], Back-propagation and Adaboost [39]. An Adaptive
Hierarchal Multi-class SVM method [6] has also been discussed.
For indexing purposes [40], the vehicles are tracked over time, and each vehicle is given
a unique ID. In addition, the location and the bounding box of each vehicle are output for
each frame. This data format includes the ID, the first frame in which the vehicle appears, the
vehicle's position and size for each frame until it disappears, and its average size and type.
These data for each detected vehicle are saved as a text file, and the vehicle itself is stored in
a metadata repository with its ID as its name. In the proposed approach, a similar scheme
with additional parameters such as vehicle dimensions is used for indexing and retrieval.
Each detected object is given a video ID, a position ID, a file ID, a type ID and a colour ID.
These IDs for a detected object, along with its image and dimensions, are stored in a .mat file
for querying purposes.
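The per-object index record described above can be sketched as a plain data structure. This is an illustrative Python sketch (the project itself stores records in MATLAB .mat files); the field names and example values are assumptions chosen to mirror the IDs listed in the text.

```python
from dataclasses import dataclass, field

@dataclass
class VehicleRecord:
    video_id: int
    file_id: int
    position_id: int      # the object's unique ID within its video
    type_id: str          # e.g. 'car', 'truck'
    colour_id: str        # dominant colour label
    first_frame: int      # first frame the object appears in
    frames: list = field(default_factory=list)  # all frames it appears in
    width: int = 0        # bounding-box dimensions in pixels
    height: int = 0

# Building the metadata repository as records are produced by the tracker.
index = []
index.append(VehicleRecord(video_id=1, file_id=7, position_id=3,
                           type_id='car', colour_id='red',
                           first_frame=120, frames=[120, 121, 122],
                           width=64, height=48))
```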
After thorough research, the following algorithms were concluded to be best suited for
this project: the Optical Flow Model [41] for object detection, and Kalman Filtering [42] for
tracking objects, including in real time. A comparative analysis of the classification
algorithms Multi-class SVM (one vs. all), Adaptive Hierarchical Multi-class SVM and
Back-propagation
is presented for categorizing detected objects. The approaches mentioned above have been
incorporated for accident detection and the traffic analysis module.
1.3 Motivation
CCTV (Closed-Circuit Television) cameras are becoming increasingly common and
widespread due to the increasing traffic in most cities. Used for traffic management, the
cameras allow operators to monitor traffic conditions visually and the police to detect
offences such as speeding or banned vehicles. The large number of cameras makes it
impractical for each to be monitored at all times by an operator, and as such it is common for
an event of interest (e.g. an accident) to be neglected. A surveillance monitoring and analysis
system benefits traffic and police authorities and helps them serve society better. For
example, if an accident occurs and is detected, the appropriate authorities can be notified in
real time so that quick action can be taken. Another example is that illegal vehicles on the
road, such as a truck in a no-truck zone, may be detected and flagged in the video in real
time. As manual monitoring of multiple CCTV cameras is impractical, a content-based
indexing and retrieval system is required to make this tedious process easy. With suitable
processing and analysis it is possible to extract a great deal of useful traffic information from
the videos, e.g. the number, type and speed of vehicles using the road. With Computer Vision
being a new and upcoming field in Computer Science, and visual vehicle surveillance a
relatively new area of exploration, a system using its principles and methods has been
developed. The main goal of this system is to build a coherent traffic analysis and monitoring
system which will contribute to a better-run city.
implemented. The project tries to develop and combine algorithms which lead to an efficient
system in terms of detection, tracking and classification of objects. The system should work
in real time as well as for prerecorded videos. The system should have a querying module
which can extract objects from a video based on their dimension, colour and type. It should
also be able to show the frames in which a particular queried object appears.
1.5 Objectives
The objectives of this project are as follows:
1.6 Scope
This project presents a novel vehicle surveillance video indexing and retrieval system
based on object type measurement. The system works for real-time video sequences as well
as prerecorded videos. First, all moving objects are detected from the videos using the
Optical Flow Model, followed by Kalman Filtering for tracking the detected objects. Then
each moving object is segmented, and its features are extracted. Both the vehicle image and
its features are stored in the metadata repository. During retrieval, when the user has selected
a particular type of vehicle, the system returns the most qualified vehicles without
re-processing the videos. The video clip which contains the vehicle selected by the user is
then replayed, and the trajectory is depicted on the frame simultaneously. Experimental
results show that this system is
an effective approach for video surveillance and interactive indexing and retrieval. The
system also reports the density of traffic and the speed of vehicles, detects accidents, and
determines the traffic flow direction in a chosen video, all of which are commonly required
for traffic analysis.
1.7 Methodology
Preprocessing of a surveillance video, that is, shot boundary detection and key frame
extraction, is done to reduce the redundant frames, if necessary. If the video duration is short,
we proceed directly to the next step. To detect the objects in the video, the Optical Flow
Model is used. Optical flow is the distribution of apparent velocities of movement of
brightness patterns in an image, and can arise from the relative motion of objects and the
viewer. Optical flow reflects the image changes due to motion during a time interval. The
optical flow field is represented in the form of a velocity vector: the length of the vector
determines the magnitude of the velocity, and the direction of the vector determines the
direction of motion.
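The velocity-vector representation above amounts to converting the horizontal and vertical flow components at each pixel into a magnitude and a direction. A minimal sketch, assuming the flow components `u` and `v` are given as 2-D lists (how they are estimated is a separate step):

```python
import math

def flow_to_polar(u, v):
    """Convert per-pixel flow components (u = horizontal, v = vertical)
    into per-pixel speed (magnitude) and direction (angle in radians)."""
    mag = [[math.hypot(ux, vx) for ux, vx in zip(urow, vrow)]
           for urow, vrow in zip(u, v)]
    ang = [[math.atan2(vx, ux) for ux, vx in zip(urow, vrow)]
           for urow, vrow in zip(u, v)]
    return mag, ang
```

Thresholding the magnitude field then yields the moving-object regions.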
After objects are detected, they are tracked using Kalman Filtering. This algorithm uses a
series of measurements of the position of the object detected in the frames observed over
time, containing noise and other inaccuracies, and produces estimates of unknown variables
that tend to be more precise than those based on a single measurement alone. Thus, the
position estimate in the next frame is determined. The weights are then updated once the
position in the next frame is known (it becomes the present frame). Higher weights are given
to those object tracks with higher certainty of a detection belonging to that track, and vice
versa. The predicted tracks are assigned to the detections using an assignment algorithm
called the Hungarian algorithm, so that the most optimal tracks are obtained. A bounding box
and the trajectory for each object are drawn.
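The predict/update cycle described above can be illustrated with a one-dimensional constant-velocity Kalman filter (one such filter per coordinate of a track). This is a textbook sketch, not the project's MATLAB tracker; the process and measurement noise values `q` and `r` are illustrative assumptions.

```python
class Kalman1D:
    """Constant-velocity Kalman filter for one coordinate of an object track.
    State: [position, velocity]; measurement: a noisy position."""
    def __init__(self, pos=0.0, vel=0.0, q=0.01, r=1.0):
        self.x = [pos, vel]                      # state estimate
        self.P = [[1.0, 0.0], [0.0, 1.0]]        # state covariance
        self.q, self.r = q, r                    # process / measurement noise

    def predict(self, dt=1.0):
        # x' = F x with F = [[1, dt], [0, 1]];  P' = F P F^T + Q
        px, v = self.x
        self.x = [px + v * dt, v]
        P = self.P
        self.P = [[P[0][0] + dt * (P[1][0] + P[0][1]) + dt * dt * P[1][1] + self.q,
                   P[0][1] + dt * P[1][1]],
                  [P[1][0] + dt * P[1][1],
                   P[1][1] + self.q]]
        return self.x[0]                         # predicted position

    def update(self, z):
        # Measurement of position only (H = [1, 0]); gain K = P H^T / (H P H^T + r)
        s = self.P[0][0] + self.r
        k0, k1 = self.P[0][0] / s, self.P[1][0] / s
        y = z - self.x[0]                        # innovation
        self.x = [self.x[0] + k0 * y, self.x[1] + k1 * y]
        P = self.P
        self.P = [[(1 - k0) * P[0][0], (1 - k0) * P[0][1]],
                  [P[1][0] - k1 * P[0][0], P[1][1] - k1 * P[0][1]]]
        return self.x[0]                         # corrected position
```

In the full system, each predicted position is matched to a detection via the Hungarian algorithm before `update` is called.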
When the vehicles come closer to the camera, or somewhere midway in the case of two-way
traffic, the vehicles are extracted. Features are extracted from each detected object, namely
aspect ratio, height, width, elongation, perimeter, area, compactness, extent, skewness and
orientation. These features are then used to train the Multi-class SVM (one vs. all). Around
1000 samples were taken to train the Multi-class SVM (one vs. all). These samples were
manually labelled into the following classes: Car, Bike, Truck/Bus, Human, Auto and Junk.
The Multi-class SVM (one vs. all) model was trained using a Gaussian Radial Basis Function
(RBF) kernel, and 500 samples were used for testing the trained classifier.
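The one-vs.-all decision rule itself is simple to sketch: one binary classifier is trained per class ("this class vs. the rest"), and at prediction time the class whose classifier scores highest wins. The sketch below assumes hypothetical linear scorers only to keep it self-contained; the project's classifiers use an RBF kernel, but the argmax rule is the same.

```python
def one_vs_all_predict(feature_vec, classifiers):
    """classifiers: dict mapping class name -> (weights, bias).
    Each entry is a binary 'class vs. rest' scorer; the class whose
    decision value is highest wins."""
    def score(w, b, x):
        return sum(wi * xi for wi, xi in zip(w, x)) + b
    return max(classifiers, key=lambda c: score(*classifiers[c], feature_vec))
```

For six classes (Car, Bike, Truck/Bus, Human, Auto, Junk) the dictionary would hold six trained scorers.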
Adaptive hierarchical Multi-class SVM is used to train and test the samples mentioned
above as well. The training and testing is in the form of a binary tree with RBF kernel being
used.
These results were compared to the results obtained using Back-propagation. The algorithm
used was the Levenberg-Marquardt Back-propagation algorithm with an input layer of 10
nodes, a hidden layer of 12 nodes and an output layer of 6 nodes.
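The 10-12-6 network shape can be illustrated with a forward pass: 10 feature inputs, a 12-node hidden layer, and 6 output nodes (one per class). This is a sketch with random weights purely to show the layer dimensions; the project trains the weights with Levenberg-Marquardt, and the tanh/softmax choices here are illustrative assumptions.

```python
import math
import random

def mlp_forward(x, w1, b1, w2, b2):
    """Forward pass of a 10-12-6 network: tanh hidden layer, softmax output."""
    hidden = [math.tanh(sum(wij * xi for wij, xi in zip(row, x)) + b)
              for row, b in zip(w1, b1)]
    logits = [sum(wij * hi for wij, hi in zip(row, hidden)) + b
              for row, b in zip(w2, b2)]
    m = max(logits)                       # stabilised softmax
    exps = [math.exp(l - m) for l in logits]
    s = sum(exps)
    return [e / s for e in exps]          # probabilities over the 6 classes

random.seed(0)
w1 = [[random.uniform(-0.5, 0.5) for _ in range(10)] for _ in range(12)]
b1 = [0.0] * 12
w2 = [[random.uniform(-0.5, 0.5) for _ in range(12)] for _ in range(6)]
b2 = [0.0] * 6
probs = mlp_forward([0.1] * 10, w1, b1, w2, b2)   # one 10-feature sample
```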
These three classification techniques were compared, and since Multi-class SVM (one
vs. all) gave the best results, it was incorporated into the real-time detection, tracking and
classification module. When a video input is given to this module, the features of the detected
objects are extracted, and based on these features and the trained Multi-class SVM (one vs.
all), their classes are predicted as one of those mentioned above.
The traffic analysis module contains the following three functionalities, namely Speed
Estimation, Traffic Flow Direction Determination & Traffic Density Estimation. For speed
estimation in a particular frame, the change in pixels per second of the object between two
consecutive frames (the previous frame and the current frame) is calculated. For traffic flow
estimation, the video frame is divided into blocks. For each block, a motion vector is
calculated and the motion between the video frames is estimated. This estimation is done
using a block matching method by moving a block of pixels over a search region. Traffic
Density is calculated by counting the number of vehicles per frame and using appropriate
threshold values. The density is determined to be low, medium or high.
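The speed and density computations above reduce to a few lines. The sketch below gives speed in pixels per second from centroid displacement (calibration to real-world units is a separate step), and maps a per-frame vehicle count to a density label; the count thresholds are illustrative assumptions, not the project's tuned values.

```python
def vehicle_speed_px(prev_centroid, curr_centroid, fps):
    """Speed in pixels/second from the centroid displacement between
    two consecutive frames, given the video frame rate."""
    dx = curr_centroid[0] - prev_centroid[0]
    dy = curr_centroid[1] - prev_centroid[1]
    return (dx * dx + dy * dy) ** 0.5 * fps

def traffic_density(vehicle_count, low_max=5, medium_max=12):
    """Map a per-frame vehicle count to a density label via thresholds."""
    if vehicle_count <= low_max:
        return 'low'
    if vehicle_count <= medium_max:
        return 'medium'
    return 'high'
```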
Accident detection for an object is done by calculating the change in speed, area,
position, bounding box size and orientation of a particular vehicle across the consecutive
frames it is present in. These feature changes are then summed and compared with a
threshold; if the sum exceeds the threshold, an accident is signalled. An accident is also
detected by observing the random change in direction of motion vectors in the video when an
accident occurs. The results of both methods are compared as well.
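The first, cumulative-feature method can be sketched as a weighted sum of absolute per-frame feature changes compared against a threshold. The feature keys, unit weights and threshold below are illustrative assumptions; in practice each term would be scaled to comparable units before summing.

```python
def accident_index(prev, curr, weights=None):
    """Weighted sum of absolute changes in a vehicle's features between
    two consecutive frames. prev/curr: dicts with keys 'speed', 'area',
    'x', 'y', 'box_w', 'box_h', 'orientation'."""
    keys = ['speed', 'area', 'x', 'y', 'box_w', 'box_h', 'orientation']
    weights = weights or {k: 1.0 for k in keys}
    return sum(weights[k] * abs(curr[k] - prev[k]) for k in keys)

def is_accident(prev, curr, threshold=50.0):
    """Signal an accident when the cumulative change exceeds the threshold."""
    return accident_index(prev, curr) > threshold
```

A sudden stop, for instance, produces large simultaneous changes in speed, bounding box and orientation, pushing the index over the threshold.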
Chapter 1: The introduction, methodology and objectives of this project are described in this
chapter. The difficulty is not only to track and detect the objects, but also to associate the data
to different objects. This includes a number of problems such as classification, validation and
occlusion handling. Complete separate modules for accident detection and traffic analysis are
also being developed.
Chapter 2: This chapter describes software requirements specification, which enlists all
necessary requirements that are required for the project development.
Chapter 3: This chapter explains the high level design of the project. It explains about the
input data, the output and the transformations necessary to visualize the results. This chapter
also describes the system architecture and the data flow diagrams.
Chapter 4: This chapter gives a detailed design and theoretical description of the various
algorithms used in this project, be it for tracking, detection or classification.
Chapter 5: This chapter describes the languages and the environment used for developing
and implementing the project. An overview of MATLAB is given, along with its coding
standards, syntax and limitations.
Chapter 6: This chapter has a detailed overview of the various tests performed on the
system.
Chapter 7: In this chapter the results of the different aspects of the system are tabulated and
presented. The errors and the accuracy of the algorithms used are investigated and then the
influences of different parameters are tested.
Chapter 8: Finally, this chapter concludes the report and outlines future work.
Chapter 2
Overall Description
This section gives a brief description of the proposed system.
Classification Module: This module classifies the detected object into six classes, namely
car, bike, truck/bus, human, auto and junk, based on the features of the object. This module
uses three algorithms for classification: Multi-class SVM (one vs. all), Adaptive Hierarchical
Multi-class SVM and Back-propagation. This module works for both real-time and
pre-recorded videos.
Accident Detection Module: This module checks the video for accidents and signals if an
accident occurs. This module implements two approaches for accident detection, one based
on motion vectors of the detected object, and the other based on calculation of overall
accident index for all the detected objects.
Traffic Analysis Module: This module has three functionalities: Traffic Flow Direction
Estimation, Traffic Density Estimation and Vehicle Speed Estimation.
Querying Module: This module can be used to query detected objects based on their
dimension, colour and type. This module also shows the frames in which a particular object
appears.
2.1.4 Constraints
1. The software only works for videos captured by stationary cameras.
2. Foreground blobs vary according to the quality of the video, and are not well defined in
certain cases even after applying morphological operations.
3. Occlusions are only partially handled in certain situations: because of the position of the
stationary camera, the vehicles do not diverge while they are captured.
4. Managing huge amounts of training data for Multi-class SVM and Back-propagation
is tedious.
5. The height of the camera is required to get real world coordinates for the objects.
2.2 Specific Requirements
2.2.3 Supportability
The supportability or maintainability of the system being built, including the coding
standards, naming conventions, class libraries, maintenance access and maintenance utilities,
can be enhanced by implementing it like a real-world application, with the use of optimal
high-end hardware and software. One or two lines of documentation must be provided along
with each function to indicate what it is trying to achieve. Documentation must be provided
for every module.
2.2.7 Interfaces
The interface developed is user friendly and very easy to understand. Even a new user
can easily understand the complete functionalities of the application. A Graphical User
Interface (GUI) is implemented for this purpose.
2.3 Concluding Remarks
This chapter describes the software requirements specification, which lists all requirements
necessary for the development of the project. It gives a brief module-wise description of the
project, and specifies the software, hardware and design requirements of the project.
Chapter 3
High Level Design
3.1 Design Constraints
This section addresses the issues that need to be discussed or resolved before the design of
the system can begin.
3.2 Architectural Strategies
This section gives a description about the architectural strategies adopted for the
development of this project. It mentions the programming language used, the future plans for
the project, how data is stored and managed and describes the software model of the proposed
system.
3.3 System Organization
This section gives a description about the organization of the various system
components and the order in which the input to the system is processed. Fig 3.1 demonstrates
the framework for the project.
The detected objects are classified under a predetermined number of classes, based on the
type of the object. Certain features such as dimensions and colour are used for querying
vehicles from a given surveillance video.
The system also contains modules for Accident Detection and Traffic Analysis. The
Traffic Analysis module consists of three functionalities, namely Traffic Density
Measurement, Traffic Flow Direction Determination and Vehicle Speed Estimation.
The system contains a Querying module as well. This module can be used to query
detected objects based on their dimension, colour and type. This module also shows the
frames in which a particular object appears.
3.4 Data Flow Diagrams
A data flow diagram (DFD) is a graphical representation of the flow of data through
an information system, modelling its process aspects.
visualization of data processing (structured design). A DFD shows what kinds of information
will be input to and output from the system, where the data will come from and go to, and
where the data will be stored. The DFDs have been drawn using the SmartDraw software.
3.5 Concluding Remarks
This chapter explains the high level design of the project. It explains about the input
data, the output and the transformations necessary to visualize the results. This chapter also
describes the system architecture and the data flow diagrams.
Chapter 4
Detailed Design
This chapter describes each system component in detail. Fig 4.1 shows the
organization of the various system components. First, object detection is done using the
Optical Flow Model, followed by tracking, which is achieved through Kalman Filtering.
After this, features are extracted and used to train the Support Vector Machine and the
Back-propagation Algorithm, which classify objects into predetermined categories.
Modules for accident detection and traffic analysis are integrated after the features have been
extracted. A module for Querying is present as well.
4.1 Pre-processing
Pre-processing may be required for pre-recorded video sequences. Shot boundary
detection [16] is performed to extract a scene of interest from a given input video. This is
followed by key frame extraction [16] if the resultant video has redundant frames.
Key frame extraction aims to retain the frames that capture the salient content of the video
and avoid as much redundancy as possible. The implemented algorithm adopts the colour
feature to extract key frames. The following algorithm has been used in this project:
1. Choose the first frame as the standard frame, which is compared with the following
frames.
2. Get the corresponding pixel values in both frames one by one, and compute their
differences.
3. Add all the differences from step 2 together. The sum is the difference between
the two frames.
4. Finally, if the sum is larger than a set threshold, select frame (i+1) as a key frame;
frame (i+1) then becomes the standard frame. Repeat steps 1 to 4 until no more frames
can be captured.
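The steps above can be sketched as follows, shown here in Python for illustration (the project itself is implemented in MATLAB); the frame representation and threshold are illustrative assumptions:

```python
# A minimal sketch of the key-frame selection algorithm above, assuming
# frames are given as 2D lists of grey-level pixel values.
def frame_difference(f1, f2):
    # Steps 2-3: sum of absolute pixel-wise differences between two frames.
    return sum(abs(a - b) for row1, row2 in zip(f1, f2)
               for a, b in zip(row1, row2))

def extract_key_frames(frames, threshold):
    # Step 1: the first frame is the initial standard frame (and a key frame).
    key_frames = [0]
    standard = frames[0]
    for i in range(1, len(frames)):
        # Step 4: a frame that differs enough from the standard becomes
        # a key frame and the new standard frame.
        if frame_difference(standard, frames[i]) > threshold:
            key_frames.append(i)
            standard = frames[i]
    return key_frames
```
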
4.2 Object Detection
Object detection is performed using the optical flow model. Under the brightness constancy
assumption, the spatial image gradients Ix and Iy, the temporal gradient It and the velocity
components u and v satisfy the optical flow constraint equation 4.2.1:
Ixu + Iyv + It = 0
(4.2.1)
As this equation is under-constrained, there are several methods to solve for u and v. The
Horn-Schunck Method is used in this project.
By assuming that the optical flow is smooth over the entire image, the Horn-Schunck
method computes an estimate of the velocity field.
1. Compute Ix and Iy using the Sobel convolution kernel: [-1 -2 -1; 0 0 0; 1 2 1], and its
transposed form for each pixel in the first image.
2. Compute It between images 1 and 2 using the [-1 1] kernel.
3. Assume the previous velocity to be 0, and compute the average velocity for each pixel
using [0 1 0; 1 0 1; 0 1 0] as a convolution kernel.
4. Iteratively solve for u and v.
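The iterative update in step 4 can be sketched as follows, in Python for illustration (the project is implemented in MATLAB). This is a single-pixel illustration of the Horn-Schunck update rule: the gradient values and the smoothness weight alpha are illustrative, and the neighbourhood average is approximated by the previous estimate:

```python
def horn_schunck_update(u_avg, v_avg, Ix, Iy, It, alpha=1.0):
    # Horn-Schunck update: subtract the constraint-violation term,
    # scaled by the spatial gradients, from the neighbourhood averages.
    t = (Ix * u_avg + Iy * v_avg + It) / (alpha ** 2 + Ix ** 2 + Iy ** 2)
    return u_avg - Ix * t, v_avg - Iy * t

# Iterate from zero initial velocity (step 3 assumes the previous velocity is 0).
u, v = 0.0, 0.0
for _ in range(100):
    u, v = horn_schunck_update(u, v, Ix=1.0, Iy=0.0, It=-0.5)
```

With these gradients the iteration converges to (u, v) = (0.5, 0), which satisfies the constraint Ix*u + Iy*v + It = 0.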
Blob analysis, or blob detection, detects and analyses connected regions in a frame, through
which the vehicles are tracked. The optical flow vectors are stored as complex numbers. Their
squared magnitude is computed and used for thresholding the frames. Median filtering
is applied to remove speckle noise. Morphological operations are performed to remove small
objects and holes. A bounding box is drawn for blobs with an extent ratio above 0.4 and
of suitable size. The motion vectors are drawn for these detected vehicles as well.
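A simplified sketch of this blob-analysis step, in Python for illustration (the median filtering and morphological cleanup are omitted; the flow field, threshold and 4-connectivity are illustrative assumptions):

```python
# Threshold the squared magnitude of the complex flow field, label connected
# regions by flood fill, and keep blobs whose extent ratio exceeds 0.4.
def find_blobs(flow, threshold, min_extent=0.4):
    h, w = len(flow), len(flow[0])
    # Threshold on the squared magnitude of the complex flow vectors.
    mask = [[abs(flow[y][x]) ** 2 > threshold for x in range(w)] for y in range(h)]
    seen = [[False] * w for _ in range(h)]
    blobs = []
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not seen[y][x]:
                # Flood fill a 4-connected region.
                stack, pixels = [(y, x)], []
                seen[y][x] = True
                while stack:
                    cy, cx = stack.pop()
                    pixels.append((cy, cx))
                    for ny, nx in ((cy-1, cx), (cy+1, cx), (cy, cx-1), (cy, cx+1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            stack.append((ny, nx))
                ys = [p[0] for p in pixels]
                xs = [p[1] for p in pixels]
                box_area = (max(ys) - min(ys) + 1) * (max(xs) - min(xs) + 1)
                # Extent ratio = blob area / bounding-box area.
                if len(pixels) / box_area > min_extent:
                    blobs.append((min(ys), min(xs), max(ys), max(xs)))
    return blobs
```
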
4.3 Kalman Filtering
Kalman filtering is an algorithm that uses a series of measurements observed over time,
containing noise (random variations) and other inaccuracies, and produces estimates of
unknown variables that tend to be more precise than those based on a single measurement
alone. The filter is named after Rudolf (Rudy) E. Kálmán, one of the primary developers of
its theory [42] [43].
In this project, the filter uses a series of noisy measurements of the position of the object
detected in each frame, observed over time, and produces position estimates that tend to be
more precise than those based on a single measurement alone. Thus, the position estimate for
the next frame is determined. Then, when the position in the next frame becomes known (it
becomes the present frame), the weights are updated. Higher weights are given to those
object tracks with a higher certainty of the detection belonging to that track, and vice versa.
The predicted tracks are assigned to the detections using an assignment
algorithm called the Hungarian algorithm. Thus the most optimal tracks are obtained. The
bounding box and the trajectory for each object are drawn.
x = [p v]', where p = position and v = velocity
(4.3.1)
pk = pk-1 + vk-1*dt + (1/2)*a*dt^2
(4.3.2)
vk = vk-1 + a*dt
(4.3.3)
The above equations (4.3.1), (4.3.2) and (4.3.3) are used to estimate the position and
velocity of an object; they are simply the equations of kinematics, used for prediction of the
aforementioned values.
The algorithm works in a two-step process. In the prediction step, the Kalman filter
produces estimates of the current state variables, along with their uncertainties. Once the
outcome of the next measurement (necessarily corrupted with some amount of error,
including random noise) is observed, these estimates are updated using a weighted average,
with more weight being given to estimates with higher certainty. Because of the algorithm's
recursive nature, it can run in real time using only the present input measurements and the
previously calculated state; no additional past information is required.
From a theoretical standpoint, the main assumption of the Kalman filter is that the
underlying system is a linear dynamical system and that all error terms and measurements
have a Gaussian distribution (often a multivariate Gaussian distribution).
The weights are calculated from the covariance, a measure of the estimated
uncertainty of the prediction of the system's state. The result of the weighted average is a new
state estimate that lies in between the predicted and measured state, and has a better estimated
uncertainty than either alone. This process is repeated every time step, with the new estimate
and its covariance informing the prediction used in the following iteration. This means that
the Kalman filter works recursively and requires only the last "best guess", rather than the
entire history, of a system's state to calculate a new state.
A simple step-by-step guide for Kalman filtering is mentioned below [44] [45]:
1. Building a Model
First we have to check whether the Kalman filtering conditions fit the problem.
The two equations of Kalman Filter are as follows:
xk = Axk-1 + Buk + wk-1
(4.3.4)
zk = Hxk + vk
(4.3.5)
Each xk may be evaluated by using a linear stochastic equation (4.3.4). Any xk is a linear
combination of its previous value plus a control signal uk and a process noise.
The second equation (4.3.5) tells that any measurement value is a linear combination of the
signal value and the measurement noise. They are both considered to be Gaussian. The
process noise and measurement noise are statistically independent.
The entities A, B and H are in general form matrices. While these values may change
between states, most of the time, they are assumed to be constant.
If the problem fits into this model, the only thing left is to estimate the mean and standard
deviation of the noise functions wk-1 and vk. The better the noise parameters are estimated,
the better the estimates obtained.
2. Starting the Process
The next step is to determine the necessary parameters and the initial values if the model fits
the Kalman Filter.
There are two distinct sets of equations: Time Update (prediction) and Measurement Update
(correction), as presented in Table 4.1. Both equation sets are applied at each kth state.
Time Update (prediction):
xk- = Axk-1 + Buk
Pk- = APk-1A^T + Q
Measurement Update (correction):
Kk = Pk-H^T(HPk-H^T + R)^-1
xk = xk- + Kk(zk - Hxk-)
Pk = (I - KkH)Pk-
The Kalman Gain (Kk) evaluated is not needed for the next iteration step. The values
evaluated at the kth state are used as inputs to the (k+1)th state.
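The update cycle in Table 4.1 can be sketched for a scalar state, in Python for illustration (the project is implemented in MATLAB; A, H, Q, R and the measurement values below are illustrative, not taken from the report):

```python
def kalman_step(x, P, z, A=1.0, H=1.0, Q=1e-4, R=0.1):
    # Time update (prediction); no control input, so the B*u term is omitted.
    x_prior = A * x
    P_prior = A * P * A + Q
    # Measurement update (correction).
    K = P_prior * H / (H * P_prior * H + R)   # Kalman gain
    x_post = x_prior + K * (z - H * x_prior)
    P_post = (1 - K * H) * P_prior
    return x_post, P_post

# Smooth a series of noisy measurements of a roughly constant position.
x, P = 0.0, 1.0
for z in [1.1, 0.9, 1.05, 0.95, 1.0]:
    x, P = kalman_step(x, P, z)
```

The estimate settles between the predicted and measured values, and the covariance P shrinks as more measurements arrive.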
The detected object is tracked for a set number of frames; in this approach, the value is set
between 4 and 15. If the object disappears for this number of frames and then reappears, the
occlusion is handled appropriately; if it does not reappear, the object is assumed to have gone
out of sight.
4.4 Feature Extraction
The essential task for the video processing system will be to take an object region in a
video and classify it, thereby recognizing it. In other words, a collection of classes is
generated, namely junk, bike, truck/bus, car, human and auto, and then the
detected object in a video is taken and it is determined to which, if any, of the classes that
object belongs. Such a mechanism is called a classifier. To feed the classifier information to
use for classification, mathematical measurements (features) are extracted from that object.
When the object is close to the camera, it is captured such that the size of the bounding box is
the largest, so that the object is extracted in whole. The features selected for classification
should be stored in a feature vector. The following features were extracted from detected
objects in this system for classification:
1. Area:
Area is a scalar. It is the actual number of pixels in the region.
2. Extent:
Extent is a scalar. It is the proportion of the pixels in the bounding box that also belong to
the region. It is computed by dividing the area of the object (blob) by the area of the
bounding box.
3. Perimeter:
Perimeter is a scalar: the distance around the boundary of the region. It is computed by
calculating the distance between each adjoining pair of pixels around the border of the
region.
4. Aspect Ratio:
The Aspect Ratio is obtained by dividing the width of the bounding box, by the height.
Feb-May 2013
Page 34
Detailed Design
ITAMS
5. Height:
It is the height of the bounding box.
6. Width:
It is the width of the bounding box.
Centroid:
The centroid (centre of mass) (xc, yc) of a shape Q with area S is given by equation 4.4.1:
xc = (1/S) * sum over (x,y) in Q of x, yc = (1/S) * sum over (x,y) in Q of y
(4.4.1)
Central Moment:
The central moment of order pq for object (region) Q is defined by equation 4.4.2:
μpq = sum over (x,y) in Q of (x - xc)^p * (y - yc)^q
(4.4.2)
7. Compactness:
Compactness is given by equation 4.4.3:
compactness = perimeter^2 / area
(4.4.3)
8. Elongation:
Elongation, also known as elongatedness, is calculated as shown in equation 4.4.4:
elongation = area / (2d)^2, where 2d is the maximum thickness of the region
(4.4.4)
9. Orientation:
The orientation of an object can be defined as the angle between the x-axis and the principal
axis, the axis around which the object can be rotated with minimum inertia. It is a scalar: the
angle (in degrees, ranging from -90 to 90) between the x-axis and the major axis of the
ellipse that has the same second moments as the region. It is given by equation 4.4.5:
θ = (1/2) * arctan(2*μ11 / (μ20 - μ02))
(4.4.5)
10. Skewness:
Skewness is a measure of symmetry, or more precisely, the lack of symmetry. A distribution,
or data set, is symmetric if it looks the same to the left and right of the center point. The
skewness for a normal distribution is zero, and any symmetric data should have skewness
near zero. Negative values for the skewness indicate data that are skewed left and positive
values for the skewness indicate data that are skewed right. By skewed left, it is meant that
the left tail is long relative to the right tail. It is given by the equation 4.4.6:
skewness = ((1/N) * sum over i of (xi - x̄)^3) / s^3
(4.4.6)
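Several of the features above can be sketched for a binary object mask, in Python for illustration (the project uses MATLAB; the mask representation and returned feature names are illustrative):

```python
# Extract shape features from a binary mask given as a 2D list of 0/1 values.
def shape_features(mask):
    pixels = [(x, y) for y, row in enumerate(mask)
              for x, v in enumerate(row) if v]
    area = len(pixels)                         # feature 1: pixel count
    xs = [p[0] for p in pixels]
    ys = [p[1] for p in pixels]
    width = max(xs) - min(xs) + 1              # feature 6: bounding-box width
    height = max(ys) - min(ys) + 1             # feature 5: bounding-box height
    extent = area / (width * height)           # feature 2: blob / box area
    aspect_ratio = width / height              # feature 4: width / height
    xc = sum(xs) / area                        # centroid (eq. 4.4.1)
    yc = sum(ys) / area
    # Central moment of order pq (eq. 4.4.2), shown here for mu_11.
    mu_11 = sum((x - xc) * (y - yc) for x, y in pixels)
    return {"area": area, "extent": extent, "aspect_ratio": aspect_ratio,
            "centroid": (xc, yc), "mu_11": mu_11}
```
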
4.5 Accident Detection
Having discussed the advantages of video-based traffic surveillance systems, a new paradigm
can be added to the application of video surveillance systems if accidents can be detected at
traffic intersections and reported to the concerned authorities so that necessary action can be
taken.
An important stage in automatic vehicle crash monitoring systems is the detection of
vehicles in each video frame and accurately tracking the vehicles across multiple frames.
With such tracking, vehicle information such as speed, change in speed and change in
orientation can be determined to facilitate the process of crash detection. The region of
interest which incorporates the road is taken into consideration.
1. The first step of the process is the frame extraction step, in which frames are extracted
from the video camera input.
2. The second step of the process is the vehicle detection step. Here the already stored
background frame is subtracted from the input frame to detect the moving regions in
the frame. The difference image is further thresholded to detect the vehicle regions in
the frame. Hence the vehicles in each frame are detected.
3. In the third step, low-level features such as the area, centroid, orientation, luminance
and colour of the extracted vehicle regions are computed. Also, for each region
detected in the frame at time t, a similarity index is computed with all of the regions
detected in the frame at time t+1 using human-vision-based model analysis.
4. In the tracking stage, the Euclidean distance is computed between the low-level
features of each vehicle in frame n and all the other vehicles detected in frame n+1.
This Euclidean distance vector is combined with the already computed similarity
index for a particular vehicle region in frame n. Tracking is done based on the
minimum distance between vehicle regions.
5. In the next step, the centroid position of a tracked vehicle in each frame is computed,
and based on this information and the frame rate, the speed of the tracked vehicle is
computed in terms of pixels/second.
6. Since the position of the video camera is fixed, camera parameters such as the focal
length and the pan and tilt angles remain constant.
7. From all this information, the pixel coordinates of the vehicle in each frame are
converted to real-world coordinates. By this conversion, the speed of the vehicle in
terms of km/hr is computed.
8. Based on the velocity information, position and low-level features of the tracked
vehicle, suitable thresholds are defined to determine the occurrence of accidents.
2. The Velocity, Area, Position, Bounding Box size and Orientation indexes are
calculated.
3. The Overall Accident Index is calculated as the sum of the individual indexes, and
the occurrence of an accident is identified.
OverallAccidentIndex = VelocityIndex + AreaIndex + PositionIndex + BoundingBoxIndex + OrientationIndex
(4.5.1)
The differences in area, position, bounding box, orientation and speed of the object across
consecutive frames are used to calculate the indexes. Each computed feature difference is
compared with a threshold value and an index is set: if the computed value exceeds the
threshold, that particular index is set to 1, otherwise 0. When all 5 indexes mentioned above
have been calculated, they are summed up, and this new value, called the Overall Accident
Index, is compared against a predefined Accident Threshold. All these thresholds are set
based on the input video. In this system, the Accident Threshold is set to 3. If the Overall
Accident Index exceeds this value, an accident is signalled.
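The index-based check can be sketched as follows, in Python for illustration (the threshold values below are illustrative assumptions; only the Accident Threshold of 3 is taken from the report):

```python
# Feature differences between consecutive frames are thresholded into
# 0/1 indexes and summed into the Overall Accident Index (eq. 4.5.1).
THRESHOLDS = {"velocity": 20.0, "area": 500.0, "position": 30.0,
              "bbox": 400.0, "orientation": 25.0}
ACCIDENT_THRESHOLD = 3  # as set in this system

def overall_accident_index(feature_diffs):
    # Each index is 1 if its feature difference exceeds the threshold.
    return sum(1 for name, diff in feature_diffs.items()
               if diff > THRESHOLDS[name])

def accident_detected(feature_diffs):
    return overall_accident_index(feature_diffs) > ACCIDENT_THRESHOLD
```
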
Feb-May 2013
Page 38
Detailed Design
ITAMS
Assuming that the input to the block is frame k. The Block Matching block performs the
following steps:
1. The block subdivides this frame using the values entered for the Block size [height
width] and Overlap [r c] parameters. The Block size and Overlap used in this
implementation is [15 15] and [0 0] respectively.
2. For each subdivision or block in frame k+1, the Block Matching block establishes a
search region based on the value entered for the Maximum displacement [r c]
parameter. Here the Maximum displacement was set to [7 7].
3. The block searches for the new block location using either the Exhaustive or
Three-step search method.
In this project, the search method used is the Exhaustive search method. In the Exhaustive
search method, the block selects the location of the block of pixels in frame k+1 by moving
the block over the search region 1 pixel at a time. This process is computationally expensive
compared to the Three-step search method.
A Block matching criteria parameter is used to specify how the block measures the
similarity of the block of pixels in frame k to the block of pixels in frame k+1. The block
matching criterion used in this project is the Mean Square Error (MSE), which causes the
Block Matching block to estimate the displacement of the centre pixel of the block as the
(d1, d2) values that minimize the following MSE equation 4.5.2:
MSE(d1, d2) = (1/(N1*N2)) * sum over (n1,n2) in B of [s(n1, n2, k) - s(n1 + d1, n2 + d2, k+1)]^2
(4.5.2)
In equation 4.5.2, B is an N1xN2 block of pixels, and s(x,y,k) denotes a pixel location at
(x,y) in frame k.
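Exhaustive search under the MSE criterion can be sketched as follows, in Python for illustration (frames are 2D lists; the sizes and example data are illustrative, not the [15 15] blocks of the actual implementation):

```python
# Try every displacement in the search range one pixel at a time and keep
# the one that minimizes the mean square error (eq. 4.5.2).
def mse(frame_k, frame_k1, y, x, dy, dx, bh, bw):
    total = 0
    for n1 in range(bh):
        for n2 in range(bw):
            diff = frame_k[y + n1][x + n2] - frame_k1[y + dy + n1][x + dx + n2]
            total += diff * diff
    return total / (bh * bw)

def best_displacement(frame_k, frame_k1, y, x, bh, bw, max_d):
    best, best_err = (0, 0), float("inf")
    h, w = len(frame_k1), len(frame_k1[0])
    for dy in range(-max_d, max_d + 1):
        for dx in range(-max_d, max_d + 1):
            # Skip displacements that move the block outside the frame.
            if not (0 <= y + dy and y + dy + bh <= h and
                    0 <= x + dx and x + dx + bw <= w):
                continue
            err = mse(frame_k, frame_k1, y, x, dy, dx, bh, bw)
            if err < best_err:
                best, best_err = (dy, dx), err
    return best

# Example: a 2x2 bright block moves from (2,2) to (3,4) between frames.
frame_k = [[0] * 6 for _ in range(6)]
frame_k1 = [[0] * 6 for _ in range(6)]
for r in range(2):
    for c in range(2):
        frame_k[2 + r][2 + c] = 9
        frame_k1[3 + r][4 + c] = 9
```
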
The Velocity output parameter is used to specify the block's output. Here the
parameter was chosen as Horizontal and vertical components in complex form. This makes
the block output the optical flow matrix where each element is of the form u+jv. The real part
of each value is the horizontal velocity component and the imaginary part of each value is the
vertical velocity component. If the directions of the motion vectors are random, that is,
spread in different directions, an accident is detected. In other words, when the counts of the
motion vectors in the four approximate directions are almost the same, the vectors are
pointing in different directions due to a crash, signalling an accident.
4.6 Traffic Analysis
Video traffic analysis is the capability of automatically analysing traffic videos to
detect and determine properties of traffic, such as traffic flow, the speed of vehicles and the
density of traffic, in real time rather than from a single image. This section has the following
three components: Traffic Flow Direction Estimation, Traffic Density Estimation and
Vehicle Speed Estimation.
The speed of a vehicle is estimated from the centroid position (x,y) of the vehicle in It and
It+1. Let Xi denote a particular vehicle detected in It and Xj denote the same vehicle detected
in It+1, assuming the correspondence between the vehicles is determined using the vehicle
tracking step. The speed of a particular vehicle region Xi is given by equation 4.6.1:
Speed(Xi) = sqrt((xi - xj)^2 + (yi - yj)^2) * frame_rate * scale_factor
(4.6.1)
The scale factor that relates pixel distance to real-world distance was found to be
approximately 100 pixels to 120 m; therefore a scale factor of 0.8 is used.
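The speed estimate of eq. 4.6.1 can be sketched as follows, in Python for illustration (the frame rate below is an illustrative value; the scale factor 0.8 is the one quoted above):

```python
import math

# Centroid displacement between consecutive frames, scaled by the frame
# rate and the pixel-to-real-world scale factor (eq. 4.6.1).
def vehicle_speed(c1, c2, frame_rate, scale_factor=0.8):
    # Pixel distance travelled between frames It and It+1.
    pixel_dist = math.hypot(c2[0] - c1[0], c2[1] - c1[1])
    # Pixels per second, converted with the scale factor.
    return pixel_dist * frame_rate * scale_factor
```
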
4.7 Object Classification
Once the objects are detected and tracked, they are categorised into one of the following
six categories:
1. Car
2. Bike
3. Bus/Truck
4. Human
5. Auto
6. Junk
The algorithms used for classification are Multi-class SVM (one vs. all), Adaptive
Hierarchical Multi-class SVM and the Back-propagation Algorithm.
wxi + b >= +1, for yi = +1 [a]
wxi + b <= -1, for yi = -1 [b]
These two constraints can be combined as:
yi(wxi + b) >= 1 [c]
In these equations x is a vector point and w is the weight vector. So to separate the data,
yi(wxi + b) - 1 should always be greater than or equal to zero. Among all possible
hyperplanes, SVM selects the one where the distance of the hyperplane from the closest data
points is as large as possible. If the training data is good, every test vector is located within
radius r of a training vector, and the chosen hyperplane is located as far as possible from the
data. This desired hyperplane, which maximizes the margin, also bisects the line between
the closest points on the convex hulls of the two datasets. These conditions are expressed by
[a], [b] and [c].
SVM can be represented as:
SVM classification, equation 4.7.1:
min over f, ξi of (1/2)||f||K^2 + C * sum from i=1 to n of ξi
(4.7.1)
In its dual form, the problem is solved for the multipliers αi, equation 4.7.2:
min over αi of (1/2) * sum over i,j of αi*αj*yi*yj*K(xi, xj) - sum over i of αi
subject to 0 <= αi <= C, for all i; sum over i of αi*yi = 0
(4.7.2)
Variables ξi are called slack variables, and they measure the error made at point (xi, yi).
Training an SVM becomes quite challenging when the number of training points is large.
Kernel: If the data is linearly separable, a separating hyperplane may be used to divide the
data. However, it is often the case that the data is far from linear and the datasets are
inseparable. To allow for this, kernels are used to non-linearly map the input data to a
high-dimensional space.
Feature Space: Transforming the data into feature space makes it possible to define a
similarity measure on the basis of the dot product. If the feature space is chosen suitably,
pattern recognition can be easy.
Kernel Functions: The idea of the kernel function is to enable operations to be performed in
the input space rather than the potentially high dimensional feature space. Hence the inner
product does not need to be evaluated in the feature space. The goal of the function is to
perform mapping of the attributes of the input space to the feature space. The kernel function
plays a critical role in SVM and its performance. It is based upon reproducing Kernel Hilbert
Spaces, equation 4.7.3:
K(xi, xj) = Φ(xi) · Φ(xj)
(4.7.3)
If K is a symmetric positive definite function satisfying Mercer's conditions, then the
kernel represents a legitimate inner product in the feature space. A training set that is not
linearly separable in the input space may be linearly separable in the feature space. This is
called the kernel trick. The kernel trick allows SVMs to form nonlinear boundaries.
The kernel function used in this project is the Gaussian Radial Basis Function, as represented
by equation 4.7.4:
K(xi, xj) = exp(-||xi - xj||^2 / (2σ^2))
(4.7.4)
SVM is a useful technique for data classification. Even though Neural Networks are
considered easier to use, they sometimes produce unsatisfactory results. A classification task
usually involves training and testing data consisting of some data instances. Each instance in
the training set contains one target value and several attributes. The goal of SVM is to
produce a model which predicts the target value of data instances in the testing set, given
only the attributes.
Classification in SVM is an example of Supervised Learning. Known labels help
indicate whether the system is performing correctly. This information points to a desired
response, validating the accuracy of the system, or can be used to help the system learn to
act correctly. A step in SVM classification involves identifying the features which are
intimately connected to the known classes. This is called feature selection or feature
extraction. Feature selection and SVM classification together have a use even when
prediction of unknown samples is not necessary: they can be used to identify the key feature
sets involved in whatever processes distinguish the classes.
The Multi-class SVM (one vs. all) method has been used. Here, for a training set
consisting of n classes, n classifiers are constructed during training; classifier i takes the
samples of class i as positive samples and the rest as negative samples.
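The one-vs-all scheme can be sketched as follows, in Python for illustration; the linear classifiers here are trained with a simple hinge-loss subgradient rule, which is a simplification of the SVM training the report describes, and the data, learning rate and epoch count are illustrative:

```python
# Train one linear classifier per class: class c positive, all others negative.
def train_one_vs_all(X, y, n_classes, epochs=300, lr=0.01):
    models = []
    for c in range(n_classes):
        w, b = [0.0] * len(X[0]), 0.0
        for _ in range(epochs):
            for xi, yi in zip(X, y):
                t = 1.0 if yi == c else -1.0
                margin = t * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
                if margin < 1:  # hinge-loss subgradient step on violations
                    w = [wj + lr * t * xj for wj, xj in zip(w, xi)]
                    b += lr * t
        models.append((w, b))
    return models

def predict(models, xi):
    # Pick the class whose classifier gives the highest score.
    scores = [sum(wj * xj for wj, xj in zip(w, xi)) + b for w, b in models]
    return scores.index(max(scores))

# Example: three well-separated clusters (illustrative data).
X = [[0.0, 0.0], [1.0, 0.0], [8.0, 0.0], [9.0, 0.0], [0.0, 8.0], [0.0, 9.0]]
y = [0, 0, 1, 1, 2, 2]
models = train_one_vs_all(X, y, 3)
```
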
The Multi-class SVM method in [26] uses a top-down approach for training and
testing; it has been implemented here using a decision tree approach. The dataset is
divided into two clusters using the k-means clustering algorithm (k=2), with the mean of
each class given as input to the clustering algorithm. SVM, being a binary classification
algorithm, is trained with one cluster given as positive samples and the other as negative
samples. This is continued for each cluster until only one class remains. For testing, a
binary decision tree is built and a top-down approach is used: starting from the root node,
the input descends into the left or right sub-tree until it reaches a leaf, which is the class it
belongs to. The classifier for training has 5 internal nodes.
The major strengths of SVM are that training is relatively easy and that the problem is
convex, so there are no local optima, unlike in neural networks. It scales relatively well to
high-dimensional data, and the trade-off between classifier complexity and error can be
controlled explicitly. Its weaknesses include the need for a good kernel function.
4.7.2 Back-propagation
Back-propagation [48], an abbreviation for "backward propagation of errors", is a
common method of training artificial neural networks. From a desired output, the network
learns from many inputs, similar to the way a child learns to identify a dog from examples of
dogs. It is a supervised learning method. It requires a dataset of the desired output for many
inputs, making up the training set. It is most useful for feed-forward networks (networks that
have no feedback, or simply, that have no connections that loop). Back-propagation requires
that the activation function used by the artificial neurons (or "nodes") be differentiable.
The sigmoid equation (4.7.5) is what is typically used as a transfer function between
neurons. It is similar to the step function, but is continuous and differentiable. One useful
property of this transfer function is the simplicity of computing its derivative.
f(x) = 1 / (1 + e^-x)
(4.7.5)
A neuron is the smallest unit in any neural network. A multi-input neuron has multiple
inputs xi, each of which is assigned a weight wi. The output of the neuron is obtained by
applying the transfer function f to the weighted sum of the inputs, together with a bias b.
The following equation (4.7.6) defines the neuron in figure 4.2:
y = f(sum over i of wi*xi + b)
(4.7.6)
a. Notation: Following are the notations used to explain the Back-propagation Algorithm:
b. Error Calculation:
Given a set of training data points tk and output layer output Ok, the error can be written as
shown in equation 4.7.7:
E = (1/2) * sum over k of (tk - Ok)^2
(4.7.7)
Let the error of the network for a single training iteration be denoted by E. It is required to
calculate ∂E/∂w, the rate of change of the error with respect to the given connective weight,
so that it can be minimized. Two cases are considered: the node is an output node, or it is in
a hidden layer.
For an output node k receiving input Oj from node j, the gradient is given by equation 4.7.8:
∂E/∂wjk = δk * Oj
(4.7.8)
where,
δk = Ok(1 - Ok)(Ok - tk)
For a hidden node j receiving input Oi from node i, the gradient is given by equation 4.7.9:
∂E/∂wij = δj * Oi, where δj = Oj(1 - Oj) * sum over k of δk*wjk
(4.7.9)
e. The Algorithm:
1. Run the network forward with your input data to get the network output.
2. For each output node k, compute the delta term as shown in equation 4.7.10:
δk = Ok(1 - Ok)(Ok - tk)
(4.7.10)
3. For each hidden node j, compute the delta term as shown in equation 4.7.11:
δj = Oj(1 - Oj) * sum over k of δk*wjk
(4.7.11)
4. Update the weights and biases as shown in equations 4.7.12 and 4.7.13:
wij = wij - η*δj*Oi
(4.7.12)
bj = bj - η*δj
(4.7.13)
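The four steps can be sketched for a tiny one-hidden-layer network, in Python for illustration (the project uses MATLAB; network sizes, initial weights, data and learning rate below are illustrative):

```python
import math

def sigmoid(x):
    # Transfer function (eq. 4.7.5).
    return 1.0 / (1.0 + math.exp(-x))

def train_step(x, t, w_h, b_h, w_o, b_o, lr=0.5):
    # 1. Forward pass through hidden layer and single output node.
    h = [sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b)
         for ws, b in zip(w_h, b_h)]
    o = sigmoid(sum(w * hi for w, hi in zip(w_o, h)) + b_o)
    # 2. Output delta (eq. 4.7.10 form).
    d_o = o * (1 - o) * (o - t)
    # 3. Hidden deltas (eq. 4.7.11 form), using the pre-update output weights.
    d_h = [hi * (1 - hi) * d_o * w for hi, w in zip(h, w_o)]
    # 4. Update weights and biases (eqs. 4.7.12, 4.7.13).
    w_o = [w - lr * d_o * hi for w, hi in zip(w_o, h)]
    b_o -= lr * d_o
    w_h = [[w - lr * dj * xi for w, xi in zip(ws, x)]
           for ws, dj in zip(w_h, d_h)]
    b_h = [b - lr * dj for b, dj in zip(b_h, d_h)]
    return w_h, b_h, w_o, b_o, 0.5 * (t - o) ** 2

# Train on a single input-target pair and watch the error decrease.
w_h, b_h, w_o, b_o = [[0.1, 0.2], [0.3, 0.1]], [0.0, 0.0], [0.2, 0.4], 0.0
errors = []
for _ in range(50):
    w_h, b_h, w_o, b_o, err = train_step([1.0, 0.0], 1.0, w_h, b_h, w_o, b_o)
    errors.append(err)
```
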
Feb-May 2013
Page 47
Detailed Design
ITAMS
The classification result with the highest accuracy is taken as the trained classifier for
classifying the moving objects in real time. The objects are captured when close to the
camera, and the features mentioned in section 4.4 are calculated and passed to the trained
classifier to determine the category each object belongs to.
4.8 Dimension Estimation
To estimate the dimensions of the vehicles, the width and height of their bounding boxes
were considered. To get the length of the car, Pythagoras' Theorem was applied as shown in
Fig 4.4: the length was obtained by dividing the height of the bounding box by the cosine of
the angle of orientation, as shown in equation 4.8.1.
length = bounding_box_height / cos(θ)
(4.8.1)
Similarly, if the orientation of the car is less than 85 degrees, the width of the bounding box
is considered to be the width of the car; otherwise the width is approximated by the length of
the car.
Fig 4.4: Actual length of vehicle when at an angle.
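The calculation of eq. 4.8.1 can be sketched as follows, in Python for illustration (the bounding-box height and angle values are illustrative):

```python
import math

# Recover the vehicle length from the bounding-box height and the
# orientation angle (eq. 4.8.1).
def vehicle_length(bbox_height, orientation_deg):
    # Divide the bounding-box height by the cosine of the orientation angle.
    return bbox_height / math.cos(math.radians(orientation_deg))
```
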
4.9 Colour Estimation
Once the vehicles have been detected, their colours are estimated based on the following
method. The dominant colour for each vehicle is extracted, allowing the user to search for
vehicles based on five colours: black, white, red, grey and blue. The dominant colour is
computed by initially converting each input video frame from its original RGB space into
(hue, saturation, luminance) HSL space, and
then quantizing the HSL space into six colours. Before this quantization is done, a convex
hull is drawn over the car image and the pixels (extracted blob) falling under it are
considered. This ensures only the pixels within the contour of the object are considered for
colour estimation. Low values of luminance are closer to black and higher values are closer to
white. An approximate hue and saturation value range is associated for the colour required.
Starting with lightness value of the centroid pixel of the blocks, a histogram is computed for
the values using the approximate ranges. The values which satisfy a particular range of
luminance for a colour are considered for comparing with saturation range, and those
satisfying the saturation range of the same colour, are considered for comparison with hue
range of that particular colour. The colour corresponding to the bin which receives the
majority of votes is then assigned as the dominant colour of the object belonging to a
specified track.
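The voting scheme above can be sketched as follows. This is a hedged Python illustration: the conversion uses Python's colorsys module (which orders the result as hue, lightness, saturation), and the range-table values are assumptions for illustration, not the ranges used in the project.

```python
import colorsys

# Illustrative hue/saturation/lightness ranges for the five searchable
# colours; all values are in [0, 1] and are assumptions, not the
# project's actual ranges.
COLOUR_RANGES = {
    # name: (h_min, h_max, s_min, s_max, l_min, l_max)
    'black': (0.0, 1.0, 0.0, 1.0, 0.0, 0.2),
    'white': (0.0, 1.0, 0.0, 0.3, 0.8, 1.0),
    'grey':  (0.0, 1.0, 0.0, 0.3, 0.2, 0.8),
    'red':   (0.0, 0.08, 0.3, 1.0, 0.2, 0.8),
    'blue':  (0.5, 0.7, 0.3, 1.0, 0.2, 0.8),
}

def dominant_colour(rgb_pixels):
    """Vote each blob pixel into a colour bin: convert RGB to HSL, then
    test lightness, saturation and hue ranges in turn, as in section 4.9;
    the bin with the majority of votes is the dominant colour."""
    votes = dict.fromkeys(COLOUR_RANGES, 0)
    for r, g, b in rgb_pixels:
        h, l, s = colorsys.rgb_to_hls(r / 255, g / 255, b / 255)
        for name, (h0, h1, s0, s1, l0, l1) in COLOUR_RANGES.items():
            if l0 <= l <= l1 and s0 <= s <= s1 and h0 <= h <= h1:
                votes[name] += 1
                break
    return max(votes, key=votes.get)
```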
4.10 Indexing
Indexing of extracted objects is done in the following manner. Each detected object is
given a video ID, a position ID, file ID, type ID and colour ID. These IDs for a detected
object along with its image and dimension are stored in a .mat file. The frames in which an
object appears in its respective video are also stored. As more and more vehicles are
detected, they are added to the .mat file. The Junk class, as classified by the algorithm, has
been left out, and a few cases where occlusion was not handled were manually pruned.
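The per-object record described above can be sketched as follows; this is a hypothetical Python stand-in for the .mat index, and the field names mirror the IDs named in the text but are assumptions.

```python
# Hypothetical in-memory stand-in for the .mat index of section 4.10.
index = []

def add_detected_object(video_id, position_id, file_id, type_id,
                        colour_id, image, dimension, frames):
    """Store one detected object's record, including the frame numbers
    in which the object appears in its video."""
    index.append({
        'video_id': video_id,
        'position_id': position_id,
        'file_id': file_id,
        'type_id': type_id,
        'colour_id': colour_id,
        'image': image,
        'dimension': dimension,
        'frames': frames,
    })

# Example record with made-up values
add_detected_object(1, 4, 2, 'car', 'red', image=None,
                    dimension=(4.2, 1.7), frames=[10, 11, 12])
```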
4.11 Querying
The proposed system also has a querying module which can be used to query objects
based on their dimension, colour or type. The .mat file created during the indexing of
detected objects is used for querying. The user may view the frames in which a particular
queried object appears.
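The querying module's behaviour can be illustrated with a small sketch; the records and field names below are hypothetical stand-ins for the indexed .mat data, not the project's actual storage.

```python
# Hypothetical indexed records (field names assumed for illustration).
records = [
    {'type_id': 'car', 'colour_id': 'red', 'dimension': (4.2, 1.7),
     'frames': [10, 11, 12]},
    {'type_id': 'bike', 'colour_id': 'blue', 'dimension': (1.9, 0.7),
     'frames': [40, 41]},
]

def query(records, **criteria):
    """Return the records whose fields match every given criterion,
    e.g. query(records, colour_id='red', type_id='car')."""
    return [r for r in records
            if all(r.get(key) == value for key, value in criteria.items())]

# The user may then view the frames in which a queried object appears.
hits = query(records, colour_id='red', type_id='car')
frames_to_view = [r['frames'] for r in hits]
```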
Chapter 5
Implementation
5.1 Programming Language Selection
In developing any application, the programming language plays a significant role. The
choice of the programming language has to be made based on the requirements of the
application at hand. In this application, MATLAB programming [49] is used as it is very
convenient to develop video processing applications with it, owing to its wide variety of
available toolkits and libraries. However, only basic library functions have been used, such
as those for arithmetic calculations and display, along with certain functions from the
Computer Vision and Image Processing toolboxes. The implementation of the algorithms has
been done stepwise.
2. A reserved word cannot be used as the name of a file. For example, while.m is not a
valid file name because while is one of MATLAB's
reserved words.
3. When an m-file function is declared, the m-file must have the same name as the function
or MATLAB will not be able to run it. For example, for a function called 'factorial':
a. function Y = factorial(X)
b. it must be saved as "factorial.m" in order to use it.
1. Managing Folders:
The Current Folder Browser provides a few features to make managing separate
folders easier. The tree view shows multiple projects under a root directory. Having a top-down
hierarchical view makes it easy to move files between project directories. The address bar is
used for quickly switching back and forth between project directories. This allows keeping
only one of these folders on the MATLAB search path at a time. If there is a nested
folder structure of useful functions that needs to be accessed (for example a hierarchical tools
directory), Add with Subfolders from the context menu can be used to quickly add a whole
directory tree to the MATLAB search path.
2. Write Header Comments:
Having comment lines in files enables the functions, scripts, and classes to participate
in functions like help and lookfor. When a directory is supplied to the help function, it
displays a list of the functions in that directory. The display may be customized with a Contents.m
file, which can be generated with the Contents Report.
1. Relational Operators
Greater than, less than, greater than or equal to, and less than or equal to are given by >, <,
>= and <= respectively. All of them return a value of true or false.
2. Boolean Operators
The boolean operators are & (boolean AND), | (boolean OR) and ~ (boolean NOT/negation).
A value of zero means false, any non-zero value (usually 1) is considered true.
The negation operation in MATLAB is given by the symbol ~, which turns any FALSE
values into TRUE and vice versa.
The NOT operator has precedence over the AND and OR operators in MATLAB unless the
AND or OR expressions are enclosed in parentheses.
3. Declaring Strings
MATLAB can also manipulate strings. They should be enclosed in single quotes:
>> fstring = 'hello'
fstring =
hello
4. Displaying values of String Variables
To display the value of a string, the semicolon is omitted, as is standard in
MATLAB.
To display a string in the command window in combination with other text, one
way is to use array notation combined with either the 'display' or the 'disp' function:
>> fstring = 'hello';
>> display( [ fstring 'world'] )
helloworld
MATLAB does not put a space between the two strings. If a space is required, it must be
added explicitly.
5. Comparing Strings
Unlike with numeric arrays, strings will not be compared correctly with the relational
operators, because these treat the string as an array of characters. To compare strings, the
strcmp function is used as follows:
>> string1 = 'a';
>> strcmp(string1, 'a')
ans = 1
>> strcmp(string1, 'A')
ans = 0
6. Anonymous functions
An anonymous function can be created at the command line or in a script:
>>f = @(x) 2*x^2-3*x+4;
>>f(3)
ans = 13
To make an anonymous function of multiple variables, use a comma-separated list to declare
the variables:
>>f = @(x,y) 2*x*y;
>>f(2,2)
ans = 8
7. Function Handles
A function handle passes an m-file function into another function. The functionality it offers
is similar to that of function pointers in C++.
To pass an m-file to a function, first the m-file must be written:
function xprime = f(t,x)
xprime = x;
Methods: Special functions that implement operations that are usually performed
only on instances of the class.
Events: Messages that are defined by classes and broadcast by class instances when
some specific action occurs.
Attributes: Values that modify the behavior of properties, methods, events, and
classes.
Objects: Instances of classes, which contain actual data values stored in the objects'
properties.
Subclasses: Classes that are derived from other classes and that inherit the methods,
properties, and events from those classes (subclasses facilitate the reuse of code
defined in the superclass from which they are derived).
Superclasses: Classes that are used as a basis for the creation of more specifically
defined classes (i.e., subclasses).
Packages: Folders that define a scope for class and function naming.
5.3.5 Comments
Comment lines begin with the character '%', and anything after a '%' character is
ignored by the interpreter. The % character itself only tells the interpreter to ignore the
remainder of the same line.
In the MATLAB Editor, commented areas are printed in green by default, so they
should be easy to identify. There are two useful keyboard shortcuts for adding and removing
chunks of comments. Select the code to be commented or uncommented, and then press Ctrl-R
(Cmd-/ for Mac) to place one '%' symbol at the beginning of each line and Ctrl-T (Cmd-T for
Mac) to do the opposite.
1. Common uses
Comments are useful for explaining what function a certain piece of code performs, especially
if the code relies on implicit or subtle assumptions or otherwise performs subtle actions. For
example,
% Calculate the net force, assuming acceleration is constant
% and a frictionless environment.
force = mass * acceleration
It is common and highly recommended to include as the first lines of text a block of
comments explaining what an M file does and how to use it. MATLAB will output the
comments leading up to the function definition or the first block of comments inside a
function definition when this is typed:
>> help functionname
All of MATLAB's own functions written in MATLAB are documented this way as well.
Comments can also be used to identify authors, references, licenses, and so on. Such text is
often found at the end of an M file, though it can also be found at the beginning. Finally,
comments can be used to aid in debugging.
5.4 Computer Vision System Toolbox
Computer Vision System Toolbox provides algorithms and tools for the design and simulation
of computer vision and video processing systems. The toolbox includes methods for feature
extraction, motion detection, object detection, object tracking, stereo vision, video
processing, and video analysis. Tools include video file I/O, video display, drawing graphics,
and compositing. Capabilities are provided as MATLAB functions, MATLAB System
objects, and Simulink blocks. For rapid prototyping and embedded system design, the system
toolbox supports fixed-point arithmetic and C code generation.
Key Features:
Feature detection, including FAST, Harris, Shi & Tomasi, SURF, and MSER
detectors.
Motion estimation, including block matching, optical flow, and template matching.
Video processing, video file I/O, video display, graphic overlays, and compositing.
5.5 Image Acquisition Toolbox
Image Acquisition Toolbox enables the user to acquire images and video from
cameras and frame grabbers directly into MATLAB. The user can detect hardware
automatically and configure hardware properties. Advanced workflows let the user trigger
acquisition while processing in-the-loop, perform background acquisition, and synchronize
sampling across several multimodal devices. With support for multiple hardware vendors and
industry standards, the user can use imaging devices ranging from inexpensive Web cameras
to high-end scientific and industrial devices that meet low-light, high-speed, and other
challenging requirements.
Key Features:
Support for industry standards, including DCAM, Camera Link, and GigE Vision.
Support for common OS interfaces for webcams, including DirectShow, QuickTime,
and video4linux2.
Interactive tool for rapid hardware configuration, image acquisition, and live video
previewing.
5.6 Neural Network Toolbox
Neural Network Toolbox provides tools for modeling complex
nonlinear systems that are not easily modeled with a closed-form equation. The toolbox can
be used to design, train, visualize, and simulate neural networks. The user can use Neural
Network Toolbox for applications such as data fitting, pattern recognition, clustering, time-series prediction, and dynamic system modeling and control.
Key Features:
Parallel computing and GPU support for accelerating training (using Parallel
Computing Toolbox).
Simulink blocks for building and evaluating neural networks and for control systems
applications.
5.7 GUIs in MATLAB
A MATLAB GUI is a figure window to which the user adds user-operated controls.
The user can select, size, and position these components as desired. Using callbacks, the
user can make the components respond when the user clicks or manipulates them with
keystrokes.
The user can build MATLAB GUIs in two ways:
Use GUIDE, the MATLAB GUI Development Environment, to lay out the GUI interactively.
Create code files that generate GUIs as functions or scripts (programmatic GUI
construction).
The first approach starts with a figure that the user populates with components from within
a graphical layout editor. GUIDE creates an associated code file containing callbacks for the
GUI and its components. GUIDE saves both the figure (as a FIG-file) and the code file.
Opening either one also opens the other to run the GUI.
In the second, programmatic GUI-building approach, the user creates a code file that
defines all component properties and behaviors; when a user executes the file, it creates a
figure, populates it with components, and handles user interactions. The figure is not
normally saved between sessions because the code in the file creates a new one each time it
runs.
5.8 Concluding Remarks
This chapter describes the language and the environment used for developing and
implementing the project. An overview of MATLAB is given, and its coding
standards, syntax and limitations are defined.
Chapter 6
SOFTWARE TESTING
The user may select any one of the modules from the User Interface, and then can
choose an input video for it.
The different functionalities that may be chosen are Speed Estimation, Traffic
Direction Determination, Density Estimation and Classification of the detected objects.
6.1 Testing Environment
Testing was done on a Dell XPS laptop, connected to a power supply (not running on battery).
6.2 Unit Testing
For the unit testing of modules, the videos viptraffic.avi and t.avi were chosen, which have a
total of 35 cars in them. The videos were separately given as input to each of the modules, and
the results were tabulated as shown below:
Detection Test
Expected Output: 35 detections
Actual Output: 35 detections
Remarks: Success

Tracking Test
Expected Output: 35 objects tracked
Actual Output: 35 objects tracked
Remarks: Success
Estimated speeds (Kilometers/Hour): 91.34, 72, 112.88, 21.23, 7.94, 9.21, 9.21, 6.78, 13.85, 11.38

Speed Estimation Test
Expected Output: 10 speed estimations
Actual Output: 10 speed estimations
Remarks: Success
Density Estimation Test
Expected Output: Low Density
Actual Output: Low Density
Remarks: Success

Traffic Direction Test
Expected Output: North to South
Actual Output: North to South
Remarks: Success
Object Classification Test
Expected Output: 35 Accurate Classifications
Actual Output: 32 Accurate Classifications
Remarks: Accuracy 91.42%

Accident Detection Test
Expected Output: Accident Signalled
Actual Output: Accident Signalled
Remarks: Success
Colour Estimation Test
Expected Output: 35 Accurate Estimations
Actual Output: 30 Accurate Estimations
Remarks: Accuracy 85.71%
6.2.9 Summary
A summary of all the unit tests is shown in Table 6.10.
Unit Test                  Remarks
Detection Test             Accuracy 100%
Tracking Test              Accuracy 100%
Speed Estimation Test      Success
Density Estimation Test    Success
Traffic Direction Test     Success
Classification Test        Accuracy 91.42%
Accident Detection Test    Success
Colour Estimation Test     Accuracy 85.71%
6.3
The Intelligent Traffic Analysis & Monitoring System has been abbreviated as
ITAMS.
Software Testing
6.4 Concluding Remarks
This chapter gives a detailed overview of the various unit tests, system tests and GUI tests performed on the system.
Chapter 7
The training data, along with the test array representing the features is given as input
to the SVM, Adaptive Hierarchal Multi-class SVM and Back-propagation Algorithm. An
appropriate kernel is selected for the SVM algorithms. In the present work the Gaussian
kernel is selected. The results of the classification are stored in a .mat file, which is later used
to calculate accuracy and precision, using the testing group array for actual values.
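The Gaussian (RBF) kernel selected for the SVMs is K(x, y) = exp(-||x - y||^2 / (2 * sigma^2)). A minimal Python sketch follows; sigma is a free parameter whose default below is purely illustrative.

```python
import math

def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel: exp(-||x - y||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2 * sigma ** 2))

# Identical feature vectors give the maximum similarity of 1.0;
# similarity decays smoothly as the vectors move apart.
gaussian_kernel([1.0, 2.0], [1.0, 2.0])
```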
7.1 Evaluation Metric
The confusion matrix was used as the evaluation metric. A confusion matrix is a specific
table layout that allows visualization of the performance of an algorithm. Each column of the
matrix represents the instances in a predicted class, while each row represents the instances in
an actual class. The name stems from the fact that it makes it easy to see if the system is
confusing two classes (i.e. commonly mislabelling one as another).
[Table 7.1 Confusion Matrix for Multi-class SVM (one vs. all): rows give the actual class and columns the predicted class over Car, Bike, Bus/Truck, Human, Auto and Junk; legible entries include Car 199, 22; Bike 86; Bus/Truck 15; Human 10; Junk 148.]
[Confusion matrix: rows give the actual class and columns the predicted class over Car, Bike, Bus/Truck, Human, Auto and Junk; legible entries include Car 171, 12; Bike 82; Bus/Truck 13; Junk 22, 161.]
Table 7.2 Confusion Matrix for Adaptive Hierarchal SVM (Accuracy = 87.80%)
[Table 7.3 Confusion Matrix for the Back-propagation Algorithm: rows give the actual class and columns the predicted class over Car, Bike, Bus/Truck, Human, Auto and Junk; legible entries include Car 178, 28; Bike 75; Bus/Truck 10; Junk 19, 145.]
7.2 Experimental Dataset
The dataset consists of 17 videos in total, of which 13 are used for creating the
training data, and 4 are used for creating the testing data. The reason for using so many
videos is that they provide features of objects detected from various orientations and during
different lighting conditions. This leads to stronger training of the machine learning
algorithms, and hence leads to better predicted results.
Training Set - 1016 Samples
Testing Set - 500 Samples
7.3 Performance Analysis
The accuracy and precision for each of the three classification algorithms
implemented in this project, i.e. Multi-class SVM (one vs. all), Adaptive Hierarchal Multi-class SVM and Back-propagation, are calculated in this section.
7.3.1 Accuracy
Accuracy is the overall correctness of the model and is calculated as the sum of correct
classifications divided by the total number of classifications. It is described by equation
(7.3.1), and accuracy of the classification algorithms is shown in Table 7.4.
Accuracy = (number of correct classifications) / (total number of classifications)    (7.3.1)

Table 7.4 Accuracy of the classification algorithms
S/No.  Classification Algorithm               Accuracy   Percentage
1      Multi-class SVM (one vs. all)          460/500    92%
2      Adaptive Hierarchal Multi-class SVM    439/500    87.80%
3      Back-propagation                       410/500    82.20%
7.3.2 Precision
Precision is a measure of the accuracy provided that a specific class has been predicted. It is
defined by equation (7.3.2), where tp and fp are the numbers of true positive and false positive
predictions for the considered class.
Precision = tp/(tp + fp)
(7.3.2)
In the confusion matrices above, the precision would be calculated as shown in Table 7.5.
Table 7.5 Precision for each class (C = Car, B = Bike, T = Bus/Truck, H = Human, A = Auto, J = Junk)
               Multi-class SVM   Adaptive Hierarchal   Back-propagation
               (one vs. all)     Multi-class SVM       Algorithm
Precision_C    96.60%            83.00%                86.40%
Precision_B    97.72%            93.18%                85.22%
Precision_T    93.75%            81.25%                60.25%
Precision_H    90.90%            81.81%                27.27%
Precision_A    66.66%            100%                  0%
Precision_J    84.90%            91.47%                82.38%
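Equations 7.3.1 and 7.3.2 can be verified on a toy confusion matrix. The 2x2 matrix below is made up for illustration (it is not the project's data); rows are actual classes and columns predicted classes, as defined in section 7.1.

```python
# Toy confusion matrix (rows = actual class, columns = predicted class);
# the values are illustrative only.
confusion = [
    [50,  5],   # actual class 0
    [10, 35],   # actual class 1
]

def accuracy(m):
    """Sum of the diagonal over the total number of classifications (7.3.1)."""
    correct = sum(m[i][i] for i in range(len(m)))
    total = sum(sum(row) for row in m)
    return correct / total

def precision(m, j):
    """tp / (tp + fp) for class j (7.3.2): the diagonal entry of column j
    over the sum of column j (all instances predicted as class j)."""
    tp = m[j][j]
    column_sum = sum(row[j] for row in m)
    return tp / column_sum

accuracy(confusion)      # (50 + 35) / 100 = 0.85
precision(confusion, 0)  # 50 / (50 + 10)
```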
7.4 Concluding Remarks
On comparing the results of the Multi-class SVM (one vs. all), the Adaptive Hierarchal
Multi-class SVM and the Back-propagation Algorithm for the same dataset, it was found that
the accuracy was highest for Multi-class SVM (one vs. all) at 92%, followed by the Adaptive
Hierarchal Multi-class SVM with 88% and the Back-propagation Algorithm with 82%. Hence
we infer from the results that the features used for classification are most suitable for
Multi-class SVM (one vs. all).
Chapter 8
CONCLUSION
The goal of this project is to develop an Intelligent Traffic Analysis & Monitoring
System, which is capable of operating in real-time, as well as with pre-recorded video
sequences, with a good performance rate. The proposed system provides an efficient and
interesting object-based video indexing and retrieval approach. All concepts are implemented
in MATLAB using our own code, with only minimal use of built-in functions.
The performance of the detection and tracking algorithms was found to be 100% for
three test videos used for testing. For the three classification algorithms implemented in this
project, it was found that the accuracy was highest for Multi-SVM, 92%, followed by
Adaptive Hierarchal Multi-SVM with 88% and then the Back-propagation Algorithm, 82%.
Classification of objects is done in real time providing the count for each type of object.
The system also has two additional modules for Accident Detection and Traffic
Analysis. The Traffic Analysis module consists of the following functionalities: Traffic
Density Measurement, Traffic Flow Direction Determination and Speed Estimation.
For the testing of the accident detection module, two videos were used, and it was
found that the accidents were detected accurately. The traffic analysis module was also
tested, and it was found that the density, traffic flow and speed estimations were accurate. An
algorithm for object colour estimation has also been implemented, which provided an
accuracy of around 85%.
The system also has a Querying module, which is able to query objects based on their
dimension, colour and type. It also has the capability to display the frames in which a
particular queried object appears. The accuracy of this module was found to be 100%.
A graphical user interface has also been implemented. It enables fast search and retrieval
of interesting vehicles based on colour, size or type; shows the detection, tracking and count
of objects of each type in real time; plays back the video clip containing a selected vehicle;
and supports analysis of the implemented algorithms. The proposed system
demonstrates its strong ability in digital surveillance.
8.1
8.2 Future Enhancements
While this project exploits the manipulation of the various parameters, some features
may affect the optimal classification of objects more than others. As part of our future
enhancements, we aim to find these features and optimize them so as to find the most
accurate solution for classification.
Furthermore, it would be worthwhile to run this system with a feed from a greater
variety of cameras, as well as using moving cameras. Most likely, this would aid in complete
handling of occlusion and would lead to improved detection and classification results.
Data storage should be as efficient as possible, in spite of having a large number of
training samples.
References
[1] Arun Hampapur, S3-R1: The IBM Smart Surveillance System-Release 1, IBM T.J.
Watson Research Centre, New York, U.S.A, 2006.
[2] Scott Bradley and Peter DeCoursey, Hitachi Data Systems Solutions for Video
Surveillance, Hitachi Data Systems Corporation, 2011.
[3] Sayanan Sivaraman and Mohan M. Trivedi, "Vehicle Detection by Independent Parts for
Urban Driver Assistance," IEEE Transactions on Intelligent Transportation Systems,
Anchorage, Alaska, U.S.A., 2013.
[4] Sayanan Sivaraman and Mohan M. Trivedi, "Integrated Lane and Vehicle Detection,
Localization, and Tracking: A Synergistic Approach," IEEE Transactions on Intelligent
Transportation Systems, Anchorage, Alaska, U.S.A., 2013.
[5] Eshed Ohn-Bar, Sayanan Sivaraman, and Mohan M. Trivedi, Partially Occluded Vehicle
Recognition and Tracking in 3D, IEEE Intelligent Vehicles Symposium, San Diego,
California, 2013.
[6] Song Liu, Haoran Yi, Liang-Tien Chia, and Deepu Rajan, Adaptive Hierarchical
Multi-class SVM Classifier for Texture-based Image Classification, IEEE International
Conference on Multimedia & Expo, NTU, Singapore, 2005.
[7] Linda G. Shapiro and George C. Stockman, Computer Vision, Prentice Hall, 2001,
ISBN 0-13-030796-3.
[8] Alan Hanjalic, Content-based Analysis of Digital Video, Kluwer Academic Publishers,
2004, ISBN 1-4020-8115-4.
[9] Xiaochao Yao, Object Detection and Tracking for Intelligent Vehicle Systems,
University of Michigan-Dearborn Technology and Engineering, 2006.
[10] Emilio Maggio and Andrea Cavallaro, Video Tracking: Theory and Practice, Wiley,
2011, ISBN 978-0-470-74964-7.
[11] M. Bennamoun and G.J. Mamic, Object Recognition: Fundamentals and Case Studies,
Springer-Verlag, 2002, ISBN 1-85233-398-7.
[12] Zheng Rong Yang, Machine Learning Approaches to Bioinformatics, World Scientific
Publishing Co. Pte. Ltd., 2010, ISBN 978-981-4287-30-2.
[13] Gaurang Panchal, Amit Ganatra, Parth Shah and Devyani Panchal, Determination of
Over-Learning and Over-Fitting Problem in Back Propagation Neural Network, International
Journal on Soft Computing, Vol. 2(2), 2011.
[14] Adam Kirk, James F. O'Brien and David A. Forsyth, Skeletal Parameter Estimation from
Optical Motion Capture Data, University of California, Berkeley, 2008.
[15] J. Komala Lakshmi and M. Punithavalli, A Survey on Performance Evaluation of
Object Detection Techniques in Digital Image Processing, International Journal of
Computer Science, pp. 562-568, Vol. 7(6), 2010.
[16] Shruti V Kamath, Mayank Darbari and Rajashree Shettar, Content Based Indexing and
Retrieval from Vehicle Surveillance Videos Using Gaussian Mixture Method, International
Journal of Computer Engineering & Technology, Vol. 4(1), pp. 420-429, 2013.
[17] Paul Viola and Michael Jones, Robust Real-time Object Detection, International
Workshop on Statistical and Computing Theories of Vision, Vancouver, Canada, 2001.
[18] H. Schneiderman and T. Kanade, Object Detection Using the Statistics of Parts,
International Journal of Computer Vision, Vol. 56(3), pp. 151-177, 2004.
[19] Kinjal A Joshi and Darshak G Thakore, A Survey on Moving Object Detection and
Tracking in Video Surveillance System, International Journal of Soft Computing and
Engineering, Vol. 2(3), 2012.
[20] Hsu-Yung Cheng, Po-Yi Liu and Yen-Ju Lai, Vehicle Tracking in Daytime and
Nighttime Traffic Surveillance Videos, 2nd International Conference on Education
Technology and Computer (ICETC), Taiwan, 2010.
[21] Andrew Senior, Arun Hampapur, Ying-li Tian, Lisa Brown, Sharath Pankanti and Ruud
Bolle, Appearance Models for Occlusion Handling, IEEE Workshop on Performance
Evaluation of Tracking and Surveillance, New York, U.S.A., 2001.
[22] Kamijo, S., Matsushita, Y., Ikeuchi, K. and Sakauchi, M., Traffic Monitoring and
Accident Detection at Intersections, IEEE Intelligent Transportation Systems, 2000.
[23] Luis Carlos Molina, Lluís Belanche and Àngela Nebot, Feature Selection Algorithms:
A Survey and Experimental Evaluation, Universitat Politècnica de Catalunya, Departament
de Llenguatges i Sistemes Informàtics, Barcelona, Spain.
[24] David Lowe, "Object recognition from local scale-invariant features," Proceedings of the
International Conference on Computer Vision, Vol. 2, pp. 1150-1157, 1999.
[25] Herbert Bay, Andreas Ess, Tinne Tuytelaars, Luc Van Gool, "SURF: Speeded Up
Robust Features," Computer Vision and Image Understanding, Vol. 110, No. 3, pp. 346-359,
2008.
[26] Krystian Mikolajczyk and Cordelia Schmid, "A performance evaluation of local
descriptors," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 10(27),
pp. 1615-1630, 2005.
[27] N. Dalal and B. Triggs, Histograms of Oriented Gradients for Human Detection,
Computer Vision and Pattern Recognition, 2005.
[28] Mehrube Mehrubeoglu and Lifford McLauchlan, Determination of vehicle speed in
traffic video, SPIE Digital Library, 2009.
[29] Deng-Yuan Huang, Chao-Ho Chen, Wu-Chih Hu, Shu-Chung Yi and Yu-Feng Lin,
Feature-Based Vehicle Flow Analysis and Measurement for a Real-Time Traffic
Surveillance System, Journal of Information Hiding and Multimedia Signal Processing, Vol.
3(3), 2012.
[30] Erhan Ince, Measuring traffic flow and classifying vehicle types: A surveillance video
based approach, Turkish Journal on Electrical Engineering & Comp Science, Vol. 19(4),
2011.
[31] Ö. Aköz and M.E. Karsligil, Severity Detection of Traffic Accidents at Intersections
based on Vehicle Motion Analysis and Multiphase Linear Regression, Annual Conference
on Intelligent Transportation Systems, Madeira Island, Portugal, 2010.
[32] Ju-Won Hwang, Young-Seol Lee and Sung-Bae Cho, Structure Evolution of Dynamic
Bayesian Network for Traffic Accident Detection, IEEE Congress on Evolutionary
Computation, Seoul, Korea, 2011.
[33] In Jung Lee, An Accident Detection System on Highway Using Vehicle Tracking
Trace, IEEE International Conference on ICT Convergence, South Korea, 2011.
[34] Shunsuke Kamijo, Yasuyuki Matsushita, Katsushi Ikeuchi and Masao Sakauchi, Traffic
Monitoring and Accident Detection at Intersections, IEEE Transactions on Intelligent
Transportation Systems, Vol. 1(2), 2000.
[35] Logesh Vasu, An Effective Step to Real-Time Implementation of Accident Detection
System Using Image Processing, Master's Thesis, Oklahoma State University, 2010.
[36] Aroh Barjatya, Block Matching Algorithm for Motion Estimation, DIP Spring Project,
2004.
[37] Feris, R.S., Siddiquie, B., Petterson, J., Yun Zhai, Datta, A., Brown, L.M. and Pankanti,
S. Large-Scale Vehicle Detection, Indexing, and Search in Urban Surveillance Videos,
IEEE Transactions on Multimedia, 2012.