NUMBER
PLATE
EXTRACTION
CONTENTS

ABSTRACT
CHAPTER-1
1. INTRODUCTION
CHAPTER-2
2. BUSINESS PROBLEM AND SOLUTIONS
2.1 Existing System
2.2 Proposed Solution
2.3 System Requirements
2.4 Functional Requirements
2.5 Non-Functional Requirements
2.6 User Requirements
2.7 Input and Output Requirements
2.8 System Architecture
CHAPTER-3
3. LITERATURE SURVEY
3.1 Machine Learning
3.2 Types of Machine Learning
3.3 Testing and Validating
3.4 Methodology
3.5 Implementation
3.5.1 CV2 for Image Classification
3.5.2 Pytesseract
3.5.3 Numpy
3.6 PIL
CHAPTER-4
4. DESIGN
4.1 Introduction to UML
4.2 UML Diagrams
4.2.1 Use Case Diagram
4.2.2 Class Diagram
4.2.3 Architecture Diagram
IMPLEMENTATION
Output Images
CHAPTER-5
5. CONCLUSION AND FUTURE ENHANCEMENTS
REFERENCES
ABSTRACT
Automatic Number Plate Extraction is an image processing technology used to identify the
license plates of vehicles. The main objective is to design a machine learning model that can
extract the vehicle number from a given image. This technology is mainly useful to the Police
department: it can detect traffic violators, track vehicle thefts, and authenticate the owner of a
vehicle. It can also be used by private and government organizations at entry and exit points
for security control. The system first detects the vehicle and then extracts the number plate,
which is then checked against the vehicle database so that the required information about the
vehicle and its owner can be obtained.
CHAPTER-1
INTRODUCTION
Most number plate localization algorithms merge several procedures, resulting in long
computational time; this may be reduced by applying fewer and simpler algorithms. The
results are highly dependent on image quality, since the reliability of the procedures degrades
severely for complex, noisy pictures that contain a lot of detail. Unfortunately, the various
procedures barely offer a remedy for this problem; precise camera adjustment is the only
solution. This means the car must be photographed so that the environment is excluded as
much as possible and the number plate appears as large as possible. Adjusting the size is
especially difficult for fast cars, since the optimum moment of exposure can hardly be
guaranteed.
Number Plate Localization on the Basis of Edge Finding: these algorithms rely on the
observation that number plates usually appear as high-contrast areas in the image
(black-and-white or black-and-yellow).
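Edge-based localization can be illustrated in miniature. The sketch below is a simplified, assumption-laden illustration (a real system would use 2-D gradients such as Sobel filters on a full image): it scores each horizontal band of a toy grayscale image by counting strong intensity transitions, and picks the band with the most edges as the plate candidate.

```python
def edge_density(row, thresh=60):
    """Count strong intensity transitions (edges) in one image row."""
    return sum(1 for a, b in zip(row, row[1:]) if abs(a - b) > thresh)

def best_band(image, window=3):
    """Return the start index of the horizontal band with the highest
    edge density -- number plates show up as high-contrast bands."""
    scores = [sum(edge_density(r) for r in image[i:i + window])
              for i in range(len(image) - window + 1)]
    return max(range(len(scores)), key=scores.__getitem__)

# Toy image: flat background rows plus a high-contrast "plate" band.
flat = [100] * 20
plate = [0, 255] * 10
image = [flat, flat, plate, plate, plate, flat, flat]
print(best_band(image))  # the band starting at row 2 wins
```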
CHAPTER-2
BUSINESS PROBLEM AND SOLUTIONS
2.1 Existing System
The present system is a brute-force method in which a Traffic Constable manually notes down
the license plates, which is a time-consuming process, and extra maintenance is required to
safeguard the data.
Since this is human work, it may also lead to manual errors while noting down the
registration number from license plates.
We cannot even provide an image as proof when required to prove a traffic violator guilty.
2.2 Proposed Solution
Our system takes an image as input and processes it to extract the license plate by character
segmentation using an SVM; the output is a string of 10 characters.
Capacity: the ability of the system to hold a large amount of input data.
A dataset containing up to 20 pictures, each about 4x4 cm, with a good resolution of at
least 420 px.
The standard output is 10 characters, which can then be used to obtain details of the
vehicle.
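A minimal sketch of how these input/output requirements could be enforced before and after processing. The function names, the interpretation of "420 px" as a minimum for both dimensions, and the alphanumeric check are assumptions for illustration, not part of any existing codebase.

```python
MIN_SIDE = 420   # minimum acceptable resolution, per the input requirements
PLATE_LEN = 10   # standard output length, per the output requirements

def valid_input(width, height):
    """Reject images below the required resolution before processing."""
    return width >= MIN_SIDE and height >= MIN_SIDE

def valid_output(plate):
    """The extracted plate should be exactly 10 alphanumeric characters."""
    return len(plate) == PLATE_LEN and plate.isalnum()

print(valid_input(640, 480), valid_output("TS09EA1234"))
```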
Step 3 : Extract the feature vector of each normalized candidate.

2.8 System Architecture
Fig: Camera (input image) -> License Plate Extraction -> Character Segmentation -> OCR
Character Recognition -> Output Text
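The pipeline above can be sketched as a chain of functions. The bodies here are trivial stand-ins for the real computer vision stages described in Chapter 3; only the wiring between the stages is the point.

```python
def extract_plate(camera_image):
    """Stand-in for license plate extraction (crop the plate region)."""
    return camera_image["plate"]

def segment_chars(plate):
    """Stand-in for character segmentation (split into single characters)."""
    return list(plate)

def recognize(chars):
    """Stand-in for OCR character recognition."""
    return "".join(chars)

def anpr_pipeline(camera_image):
    """Camera image -> plate extraction -> segmentation -> OCR -> text."""
    return recognize(segment_chars(extract_plate(camera_image)))

print(anpr_pipeline({"plate": "TS09EA1234"}))
```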
CHAPTER-3
LITERATURE SURVEY
3.1 Machine Learning
Machine Learning is the science (and art) of programming computers so they can learn from
data. Here is a slightly more general definition: [Machine Learning is the] field of study that
gives computers the ability to learn without being explicitly programmed.
For example, the spam filter is a Machine Learning program that can learn to flag spam
given examples of spam emails (e.g., flagged by users) and examples of regular (non-spam,
also called "ham") emails. The examples that the system uses to learn are called the training
set. Each training example is called a training instance (or sample). In this case, the task T is
to flag spam for new emails, the experience E is the training data and the performance
measure P needs to be defined; for example, the ratio of correctly classified emails can be
used. This particular performance measure is called accuracy and it is often used in
classification tasks.
1. First look at what spam typically looks like. It can be noticed that some words or phrases
(such as "4U," "credit card," "free," and "amazing") tend to come up a lot in the subject line.
A few other patterns can be noticed in the sender's name and the email's body.
2. Write a detection algorithm for each of the patterns identified; the program flags emails as
spam if a number of these patterns are detected.
3. Test the program, and repeat steps 1 and 2 until it is good enough.
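Steps 1 and 2 can be sketched as a tiny rule-based filter; the pattern list and the two-hit threshold are illustrative assumptions.

```python
# Patterns identified by hand in step 1 (illustrative list).
SPAM_PATTERNS = ("4U", "credit card", "free", "amazing")

def flag_spam(subject, min_hits=2):
    """Step 2: flag an email when enough known patterns appear."""
    s = subject.lower()
    hits = sum(p.lower() in s for p in SPAM_PATTERNS)
    return hits >= min_hits

print(flag_spam("Amazing offer: FREE credit card"))  # several patterns match
print(flag_spam("Minutes of the weekly meeting"))    # no patterns match
```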
Since the problem is not trivial, the program will likely become a long list of complex rules -
pretty hard to maintain. In contrast, a spam filter based on Machine Learning techniques
automatically learns which words and phrases are good predictors of spam by detecting
unusually frequent patterns of words in the spam examples compared to the ham examples
(Figure 1-2). The program is much shorter, easier to maintain, and most likely more accurate.
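A minimal sketch of the learning-based alternative: count word frequencies in spam and ham examples and keep the words that are unusually frequent in spam. This is an assumption-level toy (whitespace tokens, an arbitrary frequency ratio, and spam-only words always qualifying), not a real spam filter.

```python
from collections import Counter

def learn_spam_words(spam, ham, ratio=2.0):
    """Learn which words are unusually frequent in spam vs. ham.

    A word qualifies when it appears more than `ratio` times as often
    in spam as in ham (so spam-only words always qualify).
    """
    spam_counts = Counter(w for msg in spam for w in msg.lower().split())
    ham_counts = Counter(w for msg in ham for w in msg.lower().split())
    return {w for w, c in spam_counts.items() if c > ratio * ham_counts.get(w, 0)}

words = learn_spam_words(["free credit card", "free offer now"],
                         ["credit report", "offer for team"])
print(sorted(words))  # "free" is learned; "credit" and "offer" are not
```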
3.2 Types of Machine Learning
Machine Learning systems can be classified according to:
1. Whether or not they are trained with human supervision (supervised, unsupervised, semi-
supervised, and Reinforcement Learning)
2. Whether or not they can learn incrementally on the fly (online versus batch learning)
3. Whether they work by simply comparing new data points to known data points, or instead
detect patterns in the training data and build a predictive model, much like scientists do
(instance-based versus model-based learning).
3.3 Testing and Validating
The only way to know how well a model will generalize to new cases is to actually try it out
on new cases. One way to do that is to put the model in production and monitor how well it
performs. This works well, but if the model is horribly bad, users will complain - not the best
idea.
A better option is to split the data into two sets: the training set and the test set. As these
names imply, train the model using the training set, and test it using the test set. The error rate
on new cases is called the generalization error (or out-of-sample error), and by evaluating the
model on the test set, an estimation of this error can be obtained. This value tells how well the
model will perform on instances it has never seen before.
If the training error is low (i.e., the model makes few mistakes on the training set) but the
generalization error is high, it means that the model is overfitting the training data.
So evaluating a model is simple enough: just use a test set. But how to decide between two
models? One option is to train both and compare how well they generalize using the test set.
Now suppose the linear model generalizes better, but some regularization must be applied to
avoid overfitting. The question is: how to choose the value of the regularization
hyperparameter? One option is to train 100 different models using 100 different values for
this hyperparameter. Suppose the best hyperparameter value produces the model with the
lowest generalization error, say just 5% error, and this model is launched into production;
unfortunately it does not perform as well as expected and produces 15% errors.
What just happened? The problem is that the generalization error was measured multiple
times on the test set, and the model and hyperparameters were adapted to produce the best
model for that particular set. This means the model is unlikely to perform as well on new
data. A common solution to this problem is to have a second holdout set called the validation
set. Train multiple models with various hyperparameters using the training set, select the
model and hyperparameters that perform best on the validation set, and run a single final test
against the test set to get an estimate of the generalization error. To avoid "wasting" too much
training data in validation sets, a common technique is to use cross-validation: the training
set is split into complementary subsets, and each model is trained against a different
combination of these subsets and validated against the remaining parts. Once the model type
and hyperparameters have been selected, a final model is trained using these hyperparameters
on the full training set, and the generalization error is measured on the test set.
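The holdout scheme described above can be sketched as a single split function; the 20%/20% ratios and the fixed seed are arbitrary illustrative choices.

```python
import random

def split_data(data, test_ratio=0.2, val_ratio=0.2, seed=42):
    """Shuffle once, then carve off a test set and a validation set;
    the remainder is the training set."""
    data = list(data)
    random.Random(seed).shuffle(data)  # fixed seed for reproducibility
    n_test = int(len(data) * test_ratio)
    n_val = int(len(data) * val_ratio)
    return (data[n_test + n_val:],        # training set
            data[n_test:n_test + n_val],  # validation set
            data[:n_test])                # test set

train, val, test = split_data(range(100))
print(len(train), len(val), len(test))  # 60 20 20
```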
Prepare the Data for Machine Learning Algorithms:
It is time to prepare the data for Machine Learning algorithms. Instead of doing this
manually, functions should be written to do it, for several good reasons:
1. This allows the transformations to be reproduced easily on any dataset (e.g., the next time
a fresh dataset is obtained).
2. Gradually, a library of transformation functions is built up that can be reused in future
projects.
3. These functions can be used in the live system to transform new data before feeding it to
the algorithms.
4. This makes it possible to easily try various transformations and see which combination of
transformations works best.
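A minimal sketch of such reusable transformation functions; the min-max scaler is just one illustrative transformation, and the `prepare` helper is an assumed name, not an existing API.

```python
def scale_minmax(values):
    """One reusable transformation: rescale features to the [0, 1] range."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def prepare(dataset, transforms):
    """Apply the same sequence of transformations to any dataset,
    so the pipeline can be reused on fresh data or in a live system."""
    for transform in transforms:
        dataset = transform(dataset)
    return dataset

print(prepare([2, 4, 6], [scale_minmax]))  # [0.0, 0.5, 1.0]
```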
3.4 Methodology
Image processing techniques have to cope with increasingly complex problems and must
adapt to human vision. Since vision is complex, machine learning has emerged as a key
component of intelligent computer vision programs when adaptation is needed (e.g., face
recognition). With the advent of image datasets and benchmarks, machine learning and
image processing have recently received a lot of attention.
The primary purpose of this special issue is to increase the awareness of image processing
researchers to the impact of machine learning algorithms. The special issue discusses
problems and their proposed solutions currently under research by the community.
B) Image Recognition:
An image recognition algorithm takes an image as input and outputs what the image
contains. In other words, the output is a class label. An image recognition algorithm knows
the contents of an image only by being trained to learn the differences between classes. To
find number plates in images, an image recognition algorithm should be trained with
thousands of images of number plates and thousands of background images that do not
contain number plates. Needless to say, such an algorithm can only understand the
objects/classes it has learned.
Image processing steps:
Original image -> Top-hat image -> Threshold image -> Box image -> Dilate image ->
Segmented image
3.5 Implementation
Data Collection:
Number plate recognition starts with the acquisition of images from an image source,
desirably a surveillance camera. The image acquisition technique determines the quality of
the captured number plate image with which the detection algorithm has to work: the better
the quality of the acquired images, the higher the accuracy. Preprocessing prepares the image
for better feature extraction; it can be considered the stage that gets the vehicle image ready
for pattern recognition and image processing. The choice of preprocessing strategy to be
applied to a vehicle image depends upon the sort of application for which the image is being
used.
Data Pre-Processing:
Data Integration : Data with different representations are put together and conflicts within
the data are resolved.
Data Reduction : This step aims to present a reduced representation of the data in a data
warehouse.
Pre-Processing : Pre-processing is the first step in number plate recognition. It consists of the
following major stages: 1. Binarization, 2. Noise Removal.
Binarization : The input image is initially processed to improve its quality and prepare it for
the next stages of the system. First, the system converts RGB images to gray-level images.
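Binarization can be sketched without any imaging library: convert each RGB pixel to a gray level, then threshold it. The luminosity weights and the threshold of 128 are conventional choices for illustration, not values taken from this project.

```python
def to_gray(pixel):
    """Luminosity conversion from an (R, G, B) pixel to one gray level."""
    r, g, b = pixel
    return int(0.299 * r + 0.587 * g + 0.114 * b)

def binarize(rgb_image, thresh=128):
    """RGB -> gray -> binary: bright pixels become 255, dark become 0."""
    return [[255 if to_gray(p) >= thresh else 0 for p in row]
            for row in rgb_image]

# One row: a light pixel and a dark pixel.
print(binarize([[(200, 200, 200), (10, 10, 10)]]))  # [[255, 0]]
```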
Noise Removal : In this stage, noise is removed from the image while preserving its
sharpness. After successful localization of the number plate, we proceed with Optical
Character Recognition, which involves segmentation, feature extraction and number plate
recognition.
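One common way to remove noise while preserving sharpness is a median filter; the 1-D sketch below shows the idea on a single pixel row (a real implementation, such as OpenCV's `cv2.medianBlur`, operates on 2-D images).

```python
def median_filter(row, k=3):
    """Replace each pixel with the median of its k-neighbourhood --
    this removes salt-and-pepper noise without blurring edges much."""
    half = k // 2
    out = []
    for i in range(len(row)):
        window = sorted(row[max(0, i - half):i + half + 1])
        out.append(window[len(window) // 2])
    return out

# A single noisy spike (255) in a flat row is removed entirely.
print(median_filter([10, 10, 255, 10, 10]))  # [10, 10, 10, 10, 10]
```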
Accuracy Measures:
Accuracy is a common yet interesting concept used in project management. It is defined as
how close the measured values are to the intended target (value) or, simply, an assessment of
correctness. Thus, if the measurements are accurate, the values are close to the target.
Accuracy is often mistaken for precision. It is important to note that while accuracy is being
close to the target value, precision refers to repeated measurements that are clustered together
but not necessarily close to the target value. If there is little scatter in the values, they are
considered precise even if they are far from the desired target. While accurate data can be
precise, the two do not mean the same thing, or the other way around.
Precision:
There are two measurement systems used in project management - accuracy and precision.
While accuracy is defined as the closeness of the measured value to a known standard,
precision is defined as the measure of exactness. The level of the increment used in every
interval determines how precise the measurements are; precision is achieved with a greater
number of increments.
Risk Data Quality Assessment : The risk data quality assessment is a project management
technique used to evaluate the degree to which data about risks is useful for risk
management. This technique also involves analyzing the degree to which the risk is
understood, and it looks into the accuracy, reliability, quality and integrity of the data
concerning the risk.
3.5.1 CV2 for Image Classification
OpenCV (imported in Python as cv2) is an open-source computer vision library.
The library has more than five hundred optimized algorithms. It is used around the world,
with forty thousand people in the user group. Uses range from interactive art, to mine
inspection, and advanced robotics.
The library is mainly written in C/C++, which makes it portable to specific platforms such
as digital signal processors. Wrappers for languages such as Python, Ruby and Java (via
JavaCV) have been developed to encourage adoption by a wider audience.
The recent releases have interfaces for C++. It focuses mainly on real-time image processing.
OpenCV is a cross-platform library, which can run on Linux, Mac OS and Windows. To date,
OpenCV is the best open source computer vision library that developers and researchers can
think of.
With OpenCV 3.3, we can utilize pre-trained networks with popular deep learning
frameworks. The fact that they are pre-trained implies that we don’t need to spend many
hours training the network — rather we can complete a forward pass and utilize the output to
make a decision within our application.
OpenCV does not (and does not intend to be) to be a tool for training networks — there are
already great frameworks available for that purpose. Since a network (such as a CNN) can be
used as a classifier, it makes logical sense that OpenCV has a Deep Learning module that we
can leverage easily within the OpenCV ecosystem.
Aleksandr Rybnikov, the main contributor for this module, has ambitious plans for this
module so be sure to stay on the lookout and read his release notes (in Russian, so make sure
you have Google Translation enabled in your browser if Russian is not your native language).
It’s my opinion that the dnn module will have a big impact on the OpenCV community, so
let’s get the word out.
Configure your machine with OpenCV 3.3
Installing OpenCV 3.3 is on par with installing other versions. The same install tutorials can
be utilized — just make sure you download and use the correct release.
Simply follow these instructions for macOS or Ubuntu, making sure to use
the opencv and opencv_contrib releases for OpenCV 3.3. If you opt for the macOS +
homebrew install instructions, be sure to use the --HEAD switch (among the others
mentioned) to get the bleeding-edge version of OpenCV.
If you’re using virtual environments (highly recommended), you can easily install OpenCV
3.3 alongside a previous version. Just create a brand new virtual environment (and name it
appropriately) as you follow the tutorial corresponding to your system.
Keras is currently not supported (since Keras is actually a wrapper around backends such as
TensorFlow and Theano), although I imagine it’s only a matter of time until Keras is directly
supported given the popularity of the deep learning library.
3.5.2 Pytesseract
Python-tesseract is an optical character recognition (OCR) tool for Python. That is, it will
recognize and "read" the text embedded in images.
3.5.3 Numpy
Numpy is the core library for scientific computing in Python. It provides a high-performance
multidimensional array object, and tools for working with these arrays.
Arrays
A numpy array is a grid of values, all of the same type, and is indexed by a tuple of
nonnegative integers. The number of dimensions is the rank of the array; the shape of an
array is a tuple of integers giving the size of the array along each dimension.
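For example (assuming NumPy is installed), the rank and shape of an array can be inspected directly:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])  # a rank-2 array of integers

print(a.ndim)   # number of dimensions (the rank): 2
print(a.shape)  # size along each dimension: (2, 3)
print(a.dtype)  # all elements share one type
```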
3.6 PIL
The Python Imaging Library supports a wide variety of image file formats. To read files from
disk, use the open() function in the Image module. You don’t have to know the file format to
open a file. The library automatically determines the format based on the contents of the file.
To save a file, use the save() method of the Image class. When saving files, the name
becomes important. Unless you specify the format, the library uses the filename extension to
discover which file storage format to use.
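A short sketch of this open/save behaviour (assuming Pillow is installed), using an in-memory buffer instead of a disk file: since a buffer has no filename extension, the format must be given explicitly on save, but is auto-detected from the file contents on open.

```python
import io
from PIL import Image

img = Image.new("RGB", (64, 32), color=(255, 255, 255))  # a blank test image

buf = io.BytesIO()
img.save(buf, format="PNG")  # no filename, so the format is stated explicitly
buf.seek(0)

reopened = Image.open(buf)   # format detected from the contents, not a name
print(reopened.format, reopened.size)
```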
CHAPTER-4
DESIGN
4.1 Introduction to UML
The Unified Modeling Language allows the software engineer to express an analysis model
using a modeling notation that is governed by a set of syntactic, semantic and pragmatic
rules.
A UML system is represented using five different views that describe the system from
distinctly different perspectives. Each view is defined by a set of diagrams. For example, the
analysis representation describes a usage scenario from the end-users' perspective.
In this view, the structural and behavioral parts of the system are represented as they are to
be built.
4.2.1 Use Case Diagram
Use case diagrams are used to gather the requirements of a system, including internal and
external influences. These requirements are mostly design requirements. So, when a system
is analysed to gather its functionalities, use cases are prepared and actors are identified.
Fig: Use Case Diagram
Our software system can be used to support a library environment by creating a Digital
Library, where several license plate images are converted into electronic form for access by
users. For this purpose, the printed plates must be recognized before they are converted into
electronic form. The resulting electronic documents are accessed by users such as the police
and the general public for reading and retrieving information.
4.2.2 Class Diagram
The class diagram describes the attributes and operations of a class and also the constraints
imposed on the system. Class diagrams are widely used in the modeling of object-oriented
systems because they are the only UML diagrams that can be mapped directly to object-
oriented languages. The class diagram shows a collection of classes, interfaces, associations,
collaborations and constraints. It is also known as a structural diagram.
UML diagrams like the activity diagram and sequence diagram can only give the sequence
flow of the application, but the class diagram is a bit different, which makes it the most
popular UML diagram in the coder community.
The class diagram gives a clear picture of all the processes involved in the background to
carry out the recognition process. It shows all the classes that operate in the background and
gives a clear view of how they relate to one another to help recognize the characters on the
plates. The class diagram contains all the attributes involved in each class or method. It also
gives a clear idea of the entire processing of the image - how the image is processed in order
to recognize the characters.
4.2.3 Architecture Diagram
Software architecture concerns the high-level structure of a software system's abstraction,
using decomposition and composition, with architectural styles and quality attributes. A
software architecture design must conform to the major functionality and performance
requirements of the system, as well as satisfy non-functional requirements such as reliability,
scalability, portability, and availability.
IMPLEMENTATION
Code Snippets:
Import Statements:-
import os
from PIL import Image

Re-sizing Images:-

outPath = r"N:\ML\ANPR\RDS"  # Resized Images Folder
inPath = r"N:\ML\ANPR\RAW"   # source images folder (assumed name)
for name in os.listdir(inPath):  # resize every image to an assumed fixed size
    Image.open(os.path.join(inPath, name)).resize((640, 480)).save(os.path.join(outPath, name))
Scaling:-
Output:
Fig 2 : Processed Image
Fig 4 : Input Image
Fig 6 : Output Extracted Number Plate
Performance measures:
Many metrics can be used to measure whether or not a program is learning to perform its task
more effectively. For supervised learning problems, many performance metrics measure the
number of prediction errors. There are two fundamental causes of prediction error: a model's
bias and its variance. Assume that there are many training sets that are all unique, but equally
representative of the population. A model with a high bias will produce similar errors for an
input regardless of the training set it was trained with; the model biases its own assumptions
about the real relationship over the relationship demonstrated in the training data. A model
with high variance, conversely, will produce different errors for an input depending on the
training set that it was trained with. A model with high bias is inflexible, but a model with
high variance may be so flexible that it models the noise in the training set. That is, a model
with high variance overfits the training data, while a model with high bias underfits the
training data. It can be helpful to visualize bias and variance as darts thrown at a dartboard.
Each dart
is analogous to a prediction from a different dataset. A model with high bias but low variance
will throw darts that are far from the bull's eye, but tightly clustered. A model with high bias
and high variance will throw darts all over the board; the darts are far from the bull's eye and
each other.
Limitations:
The limitations of the study are those characteristics of design or methodology that impacted
or influenced the interpretation of the findings from your research. They are the constraints
on generalizability, applications to practice, and/or utility of findings that are the result of the
ways in which you initially chose to design the study and/or the method used to establish
internal and external validity.
* Any blurred image that cannot be properly gray-scaled prevents detection of the license plate.
* Low-resolution images cannot be resized, because resizing reduces the resolution further.
CHAPTER-5
CONCLUSION AND FUTURE ENHANCEMENTS
This project presents a recognition method in which the vehicle plate image is obtained by a
digital camera and processed to get the number plate information. A rear image of a vehicle
is captured and processed using various algorithms. Further, we plan to study the
characteristics of the automatic number plate system for better performance.
ANPR can be further exploited for vehicle owner identification, vehicle model identification,
traffic control, vehicle speed control and vehicle location tracking.
Most ANPR systems focus on processing one vehicle number plate, but in real time there can
be more than one vehicle number plate in the captured images.
In [5], multiple vehicle number plate images are considered for ANPR, while most other
systems take offline vehicle images from an online database such as [78] as input, so the
exact results may deviate from the results shown in Table 1 and Table 2.
To segment multiple vehicle number plates, a coarse-to-fine strategy could be helpful,
together with the addition of more features.
References
1) https://www.semanticscholar.org/paper/Survey-on-Automatic-Number-Plate-Recognition-(ANR)-Sonavane-Soni/044beb18e386503aa5551fa73169ea432062f1b5
2) https://www.irjet.net/archives/V4/i4/IRJET
3) https://ieeexplore.ieee.org/abstract/document
4) http://ictactjournals.in/paper/IJIVP