e-Yantra
Contents

1 Introduction
2 Vision
    2.1
    2.2 Eyes vs Webcam
    2.3 Models of Vision
    2.4 Gestalt Principles
        2.4.1 Proximity
        2.4.2 Similarity
        2.4.3 Closure
        2.4.4 Common Motion
        2.4.5 Symmetry
        2.4.6 Continuity
3 Software
    3.1 Introduction
    3.2 Installation
    3.3 Python
    3.4 Numpy
    3.5 OpenCV
    3.6 Debugging
4 Images
    4.1 Representation
    4.2 Properties

II
5 Fundamental Programs
    5.1 Structure of a Program
    5.2 Image from file
    5.3 Image from Camera
    5.4 Video from Camera
    5.5 Experiments
    5.6 Debugging
6
    6.1 Colourspaces
    6.2 Thresholding
    6.3 Zooming, Rotating and Panning
    6.4 Experiments
    6.5 Debugging
7 User Interfaces
    7.1 Shapes
    7.2 Buttons
    7.3 Experiments
    7.4 Debugging
8 Drawing on air
    8.1 System
    8.2 Light Pen
    8.3 Steps
        8.3.1 Step 1
        8.3.2 Step 2
        8.3.3 Step 3
        8.3.4 Step 4
        8.3.5 Step 5
        8.3.6 Step 6
        8.3.7 Step 7
        8.3.8 Step 8
    8.4 Experiments
    8.5 Debugging
Bibliography
    Books
    Articles
Index
1. Introduction
Computer Vision is simply the pursuit of teaching a machine to see as humans do. The task seems trivial at the outset, given that we have made so many advances in computing. However, the problem lies in the fact that while we humans certainly use our vision to understand the world around us, the exact way in which we make inferences from the images we see is still not very clear.
For example, let me ask you to take the trouble to pick up a pencil and draw a cube.
2. Vision

2.2 Eyes vs Webcam
Since our goal is to emulate human vision using machines, we must understand the differences between a machine and a human. Let us therefore take our systems to be a human and a machine, and compare the three parts (Sense, Analyze, Control) of each system.
From the previous description, it is clear that the sensor in the case of a human is the human eye, and in the case of a machine it is the camera or, as it is colloquially called, the webcam. The two sensors differ in resolution, receptors, focus, and binocular/stereo vision.
The difference in algorithm cannot be stated precisely, as we do not completely know how humans see, but computer vision is progressing along many paths, a few broad fields of which are listed here. The classification is not mutually exclusive, and many of these fields overlap.
Mapping, Localization, Depth maps
Object detection
Object identification
Image/scene retrieval
Augmented Reality
Image segmentation
Scene Understanding
Action segmentation and recognition
Feature detectors
We can define the output loosely to be the kinds of inferences that can be drawn from an image or video: for example, given a class, to count the number of students attending, to construct a 3D map of the seen world, or to identify an object in the scene. In such a comparison, if you will excuse blatant arrogance, we may say that the outputs of machine vision are often good for specific applications, but no general-purpose system exists that makes all the above inferences in a manner comparable to human vision. However, it is not only a guess, but also a hope, that the future might tip the scale.
2.3
Models of Vision
In order to understand how we have arrived at present day algorithms for computer vision, we
must first know how human vision works. The model that we have today is vastly different from
those we constructed earlier, and a reading of these will help us understand both the strengths and
weaknesses that these models offer.
Emission Theory - Eye emits stuff that interacts with the outer world, perhaps modeled on
the sense of touch.
Intro-mission - Stuff representative of objects enters the eye, perhaps modeled on the sense
of smell.
Unconscious inference (Helmholtz) - Vision is learned from past experience
Gestalt theory - Visual system automatically groups elements into patterns:
Proximity
Similarity
Closure
Symmetry
Common Motion
Continuity
Computational Models - Based upon study of brain functions, uses methods such as machine
learning, neural networks, etc.
2.4 Gestalt Principles

2.4.1 Proximity
This image indicates the Law of Proximity. The birds flock closely together, causing viewers to
perceive them as a group.
2.4.2
Similarity
This image indicates the Law of Similarity. Even though the blue shapes are arranged uniformly,
the triangle made up of blue circles stands out from the rest of the figure.
2.4.3
Closure
This image indicates the Law of Closure. In this very famous logo of WWF, we can see the panda
clearly, even though there are no outlines to specify the head or the body.
2.4.4
Common Motion
This image indicates the Law of Common Motion.
2.4.5 Symmetry
This image indicates the Law of Symmetry. Earlier, we discussed that objects that are close by are grouped together, but here we see three matched pairs of parentheses, even though the differing types of brackets are closer to one another than the matching pairs are.
2.4.6
Continuity
This image indicates the Law of Continuity. Even though, by the law of similarity, we should see
two bent curves touching, we instead see two smooth curves intersecting.
Therefore, we can see that these Gestalt Laws can be used as a foundation upon which we can
build ways to see and understand objects in an image. However, note that they do not point to the
same inference, but instead these laws sometimes compete to form different interpretations of the
same image.
3. Software

3.1 Introduction
In the previous chapter, we saw that the webcam is the Sense part of the system: it retrieves an image for us to analyze. This does not define what exactly an image is. In order to define an image, we must first learn how the image is represented. There are many ways to represent an image; for example, the representation of an image taken by an MRI may differ from one taken by the webcam. Since we will be using the webcam, we will use a representation that is intuitive, widely used and pliable: a matrix of numbers. In order to elaborate on this representation, let us first inspect how the webcam senses/captures the image. A webcam has an array of cells which are arranged in sets of 3 (refer to fig. xx)¹. This choice of three is modeled on the human eye, which can loosely be said to be sensitive to 3 colours: red, green and blue. Therefore, each cell on the array of the webcam has a receptor for blue light, green light and red light. These are each given a value from 0 to 255 based on the amount of that particular light falling on the cell². Therefore, the matrix of numbers that we spoke about is an n*m*3 dimensioned array, with n being the height, m being the width, and 3 for the 8-bit values corresponding to blue, green and red (BGR). Please refer to fig. xx in order to get a better picture. We will be using the library called OpenCV and the language Python to obtain and process this matrix in the rest of the tutorial.
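The matrix representation described above can be sketched directly in NumPy. A minimal sketch, with illustrative sizes and values rather than data from an actual webcam:

```python
import numpy as np

# A tiny image, 2 pixels high and 3 pixels wide, with 3 channels
# (blue, green, red) of 8 bits each -- the n*m*3 array described above.
img = np.zeros((2, 3, 3), dtype=np.uint8)

img[0, 0] = (255, 0, 0)   # top-left pixel: pure blue (BGR order)
img[1, 2] = (0, 0, 255)   # bottom-right pixel: pure red

print(img.shape)          # (2, 3, 3): n rows, m columns, 3 channels
print(img[0, 0])          # [255   0   0]
```

The same layout, with larger n and m, is exactly what cv2.imread and cap.read() hand back later in the tutorial.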
3.2
Installation
In this part, we will install the following three pieces of software:
Python
Numpy
OpenCV
¹ This is a simplification. In commercial CCDs, each group consists of four pixels: one red, one blue and two green (the human eye is more sensitive to green than to either red or blue).
² Assuming the colour is represented in 8 bits.
3.3 Python
Please follow the steps given below:
Download Python from the following link: https://www.python.org/ftp/python/2.7.6/python-2.7.6.msi
Double-click the downloaded file to commence installation
In order to configure the environment variables,
Right-click on My Computer
Click on Properties
Click on Advanced System Settings to open the System Properties dialog box
Under System Properties, select the Advanced tab
Click on Environment Variables
Under System Variables, search for the variable Path
Add C:/Python27;C:/Python27/Scripts; at the start of the textbox
Click on OK
In order to verify your installation,
Open Command Prompt and type python and press enter
You should see the following prompt (fig 3.1):
3.4 Numpy
Download Numpy from the following link: http://sourceforge.net/projects/numpy/files/NumPy/1.7.1/numpy-1.7.1-win32-superpack-python2.7.exe/download
Double-click the downloaded file to commence installation
In order to verify your installation,
Open Command Prompt and type python and press Enter
At the python prompt, type import numpy and press Enter
You should see the following prompt (fig 3.2):
3.5 OpenCV
Download OpenCV from the following link: http://sourceforge.net/projects/opencvlibrary/files/opencv-win/2.4.9/opencv-2.4.9.exe/download
Double-click the downloaded file to commence installation
Navigate to the folder opencv/build/python/2.7
Copy the file cv2.pyd to C:/Python27/lib/site-packages
In order to verify your installation,
3.6 Debugging
4. Images

4.1 Representation
From the earlier section, we learned that one way to represent images is an n*m*3 dimensioned matrix holding the colours blue, green and red. However, there are other representations as well, which are defined in OpenCV. A partial list follows and will be elaborated upon in chapter xx
BGR
RGB
Grayscale
Binary
HSV
YUV
4.2 Properties
The properties of such representations of images are as follows
Width - This is the number of columns in the image matrix
Height - This is the number of rows in the image matrix
Channels - This determines the kind of information stored in each pixel of the image matrix.
For example, a BGR image has 3 channels, blue, green and red, whereas a grayscale image
has only one channel, the grayness value.
Depth - This is the type of information of each pixel of the image matrix. For example, it could be 8 bits, and therefore have a value from 0 to 255, or it could be 16 bits, which would cover the numbers from 0 to (2^16 - 1)
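All four properties can be read off the matrix itself. A short sketch, where the zero array stands in for an image loaded with cv2.imread:

```python
import numpy as np

# Stand-in for a loaded BGR image (8-bit, 480 rows x 640 columns)
img = np.zeros((480, 640, 3), dtype=np.uint8)

height, width, channels = img.shape          # rows, columns, channels
depth_bits = img.dtype.itemsize * 8          # bytes per value * 8 bits

print(height, width, channels)  # 480 640 3
print(depth_bits)               # 8: each value ranges from 0 to 255
```

A grayscale image would instead have shape (height, width), i.e. a single channel.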
5. Fundamental Programs

5.1 Structure of a Program
The structure of a program can be derived from the Sense-Analyze-Control model. Every program that we henceforth write will largely have the same structure, as represented in fig. xx below
5.2 Image from file
#############################################
## Import OpenCV
import numpy
import cv2
#############################################

#############################################
## Read the image
img = cv2.imread('gandalf.jpeg')
#############################################

#############################################
## Do the processing
# Nothing
#############################################

#############################################
## Show the image
cv2.imshow('image', img)
#############################################

#############################################
## Close and exit
cv2.waitKey(0)
cv2.destroyAllWindows()
#############################################
--------------- explanation of statements ---------------
import numpy - This command asks Python to use the Numpy library to manipulate matrices. Since the images we use are stored as matrices, we can then use numpy functions as numpy.<function>. Alternately, we can use the command import numpy as np to shorten the name of the module, so that we can call numpy functions as np.<function>
import cv2 - This command asks Python to use the OpenCV library (also called a module). Once we have imported cv2, Python understands that the functions we use are defined in the OpenCV library, and we can proceed to use them as cv2.<function>
cv2.waitKey(t) - This command waits up to t milliseconds for the user to press a key, and if the user does press a key, it returns the ASCII code of the key pressed. If we pass 0 as the time parameter, it waits indefinitely until the user presses a key.
cv2.destroyAllWindows() - This command asks Python to close all the open windows. This is how we exit the program.
5.3 Image from Camera
--------------- explanation of statements ---------------
cv2.VideoCapture(i) -> cap - This command tells Python to set up an instance of the VideoCapture class and assigns it to the variable cap. In other words, we need to tell Python where we are getting our images from, in this case the number assigned to the camera we need to use. For example, cap = cv2.VideoCapture(0) will make Python use the first camera it finds (usually the laptop camera) whenever we read images using cap.read().
cap.read() -> ret, frame - This is the command we use to retrieve images from the source we have named as our capture. For example, if we have used the command cap = cv2.VideoCapture(0), then our capture is named cap, and cap.read() will return frame, a matrix that stores an image taken from the camera, and ret, which tells us whether the capture was successful.
cap.release() - This command releases the camera that we initialized with the cv2.VideoCapture command. In the absence of this command, we will get errors when we run the program again and try to initialize the same camera without first releasing it.
5.4 Video from Camera
- - - - - - - - - - - - - - - explanation of statements - - - - - - - - - - - - - - - - - - - - F
5.5 Experiments
Try: showing multiple files; different types of files; multiple windows; different waitKey times; catching the key index returned by waitKey; different cameras.
5.6 Debugging
6.1 Colourspaces
#############################################
## Import OpenCV
import numpy
import cv2
#############################################

#############################################
## Read the image
img = cv2.imread('gandalf.jpeg')
#############################################

#############################################
## Do the processing
print img.shape
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
print gray.shape
#############################################

#############################################
## Show the image
cv2.imshow('image', gray)
#############################################

#############################################
## Close and exit
cv2.waitKey(0)
cv2.destroyAllWindows()
#############################################
cv2.cvtColor(image, cv2.COLOR_BGR2HSV) - This command converts the image from the BGR colourspace to HSV; other conversion codes follow the same pattern.
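For BGR to grayscale in particular, the conversion is a weighted sum of the three channels, Y = 0.299 R + 0.587 G + 0.114 B (the weights OpenCV documents for COLOR_BGR2GRAY). A NumPy sketch of the same computation, on a hand-made array rather than a real image:

```python
import numpy as np

# Illustrative 1x2 BGR image: one pure-blue pixel, one white pixel
img = np.array([[[255, 0, 0], [255, 255, 255]]], dtype=np.uint8)

b = img[:, :, 0].astype(np.float64)
g = img[:, :, 1].astype(np.float64)
r = img[:, :, 2].astype(np.float64)

# Weighted sum, rounded back to 8-bit values
gray = np.rint(0.299 * r + 0.587 * g + 0.114 * b).astype(np.uint8)

print(gray)  # blue contributes little to brightness: [[ 29 255]]
```

This is why a bright blue object can look surprisingly dark in a grayscale image.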
6.2 Thresholding
Thresholding is a technique used to create a binary image from an existing image by assigning one of two values to each pixel, based on a comparison between the intended threshold value and the pixel value. For example, suppose the image we are thresholding is a grayscale image. Then each of the pixels has a value ranging from 0 to 255. Let us assume that the intended threshold value is 150. Then those pixels in the image that have a value equal to or below 150 are set to 0, and those above are set to 255. Thus the resulting image has pixels that are either black (0) or white (255).
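The rule just described can be written out directly in NumPy; gray here is a hand-made stand-in for a grayscale image:

```python
import numpy as np

gray = np.array([[0, 100, 150],
                 [151, 200, 255]], dtype=np.uint8)

# THRESH_BINARY with threshold 150 and maxval 255:
# pixels above 150 become 255, all others become 0
binary = np.where(gray > 150, 255, 0).astype(np.uint8)

print(binary)
# [[  0   0   0]
#  [255 255 255]]
```

The cv2.threshold call in the program below performs this comparison for the whole image at once, and its other type flags (TRUNC, TOZERO, ...) just change what is assigned on each side of the threshold.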
#############################################
## Import OpenCV
import numpy
import cv2
#############################################

#############################################
## Read the image
img = cv2.imread('lion.jpg')
#############################################

#############################################
## Do the processing
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # We need a grayscale image to do the thresholding
ret, thresh1 = cv2.threshold(gray, 150, 255, cv2.THRESH_BINARY)
ret, thresh2 = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY_INV)
ret, thresh3 = cv2.threshold(gray, 127, 255, cv2.THRESH_TRUNC)
ret, thresh4 = cv2.threshold(gray, 127, 255, cv2.THRESH_TOZERO)
ret, thresh5 = cv2.threshold(gray, 127, 255, cv2.THRESH_TOZERO_INV)
#############################################

#############################################
## Show the image
cv2.imshow('image thresh1', thresh1)
cv2.imshow('image thresh2', thresh2)
cv2.imshow('image thresh3', thresh3)
cv2.imshow('image thresh4', thresh4)
cv2.imshow('image thresh5', thresh5)
cv2.imshow('original', img)
#############################################

#############################################
## Close and exit
cv2.waitKey(0)
cv2.destroyAllWindows()
#############################################
cv2.threshold(src, thresh, maxval, type) - Takes src as input and returns the thresholded image. The types of thresholding include THRESH_BINARY, THRESH_BINARY_INV, THRESH_TRUNC, THRESH_TOZERO and THRESH_TOZERO_INV
# inRange program
cv2.inRange - This command is used to apply thresholding when there is more than one channel, e.g. HSV or BGR images.
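inRange keeps a pixel only when every one of its channels falls inside its bounds. A NumPy equivalent, on a made-up 3-channel array with illustrative bounds:

```python
import numpy as np

# Three made-up "HSV" pixels in a 1x3 image
hsv = np.array([[[5, 5, 240], [5, 5, 100], [90, 5, 240]]], dtype=np.uint8)

lower = np.array([0, 0, 230])
upper = np.array([20, 10, 255])

# A pixel passes only if ALL of its channels lie within [lower, upper]
inside = (hsv >= lower) & (hsv <= upper)
mask = np.where(inside.all(axis=2), 255, 0).astype(np.uint8)

print(mask)  # [[255   0   0]]: only the first pixel is within range
```

The result is a single-channel binary mask, which is exactly what the LED-tracking program later uses to isolate the bright spot.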
6.3 Zooming, Rotating and Panning
6.4 Experiments

6.5 Debugging
7. User Interfaces

7.1 Shapes
#############################################
## Import OpenCV
import numpy as np
import cv2
#############################################

#############################################
## Create the image
img = np.zeros((500, 500, 3), np.uint8)
#############################################

#############################################
## Do the processing
# Draw a line
cv2.line(img, (10, 10), (490, 10), (255, 0, 0), 5)
# Draw a rectangle
cv2.rectangle(img, (20, 20), (480, 80), (0, 255, 0), 3)
# Draw a circle
cv2.circle(img, (150, 150), 50, (0, 0, 255), -1)  # Filled
cv2.circle(img, (350, 150), 50, (0, 0, 255), 3)   # Outline
# Draw an ellipse
cv2.ellipse(img, (250, 200), (200, 100), 0, 0, 180, (100, 100, 0), 5)
# Draw a polygon
pts = np.array([[200, 400], [300, 400], [250, 450]], np.int32)
print pts
pts = pts.reshape((-1, 1, 2))
print pts
cv2.polylines(img, [pts], True, (0, 255, 255), 2)
#############################################

#############################################
## Show the image
cv2.imshow('image', img)
#############################################

#############################################
## Close and exit
cv2.waitKey(0)
cv2.destroyAllWindows()
#############################################
--------------- explanation of statements ---------------
cv2.line
cv2.circle
cv2.rectangle
cv2.ellipse

7.2 Buttons
cv2.setMouseCallback
7.3 Experiments

7.4 Debugging
8. Drawing on air

8.1 System

8.2 Light Pen
(Figure: the light pen, (a) On and (b) Off.)
8.3 Steps

8.3.1 Step 1
As a first step, we need to get the video feed from the camera. Therefore, we will use our
camVideo.py as the template. As usual, please choose the correct video channel when using the
VideoCapture function.
#############################################
## Import OpenCV
import numpy
import cv2
# Initialize camera
cap = cv2.VideoCapture(1)
#############################################

#############################################
## Video Loop
while(1):
    ## Read the image
    ret, frame = cap.read()
    ## Do the processing
    ## Show the image
    cv2.imshow('image', frame)
    ## End the video loop
    if cv2.waitKey(1) == 27:  ## 27 ASCII for escape key
        break
#############################################

#############################################
## Close and exit
# close camera
cap.release()
cv2.destroyAllWindows()
#############################################
8.3.2 Step 2
Next, we need to choose the colourspace that is appropriate for our application. We can expect changes in lighting conditions, and we are tracking a certain colour; therefore, we choose the HSV colourspace.
#############################################
## Import OpenCV
import numpy
import cv2
# Initialize camera
cap = cv2.VideoCapture(1)
#############################################

#############################################
## Video Loop
while(1):
    ## Read the image
    ret, frame = cap.read()
    ## Do the processing
    img = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    ## Show the image
    cv2.imshow('image', img)
    ## End the video loop
    if cv2.waitKey(1) == 27:  ## 27 ASCII for escape key
        break
#############################################

#############################################
## Close and exit
# close camera
cap.release()
cv2.destroyAllWindows()
#############################################
8.3.3 Step 3
In this step, we introduce the concept of functions in Python. We simply call the function findPoint
to convert the image from BGR to HSV and return the converted image. Therefore, the output we
shall see will be the same as Step 2.
#############################################
## Import OpenCV
import numpy
import cv2
# Initialize camera
cap = cv2.VideoCapture(1)
#############################################

#############################################
## Finding the point (LED)
def findPoint(img):
    ## Convert to HSV
    img = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
    return img
#############################################
## Video Loop
while(1):
    ## Read the image
    ret, frame = cap.read()
    ## Do the processing
    output = findPoint(frame)
    ## Show the image
    cv2.imshow('image', output)
    ## End the video loop
    if cv2.waitKey(1) == 27:  ## 27 ASCII for escape key
        break
#############################################

#############################################
## Close and exit
# close camera
cap.release()
cv2.destroyAllWindows()
#############################################
8.3.4 Step 4
Now that we have a suitable image to work with, we need to find the position of the LED. To do that, we first reduce the image to show only the LED, by thresholding. Please use filterFind.py to find the appropriate threshold for the colour of the LED that you are using. The values used here are meant for a red LED. Note that we now return the mask as the output of the function findPoint.
#############################################
## Import OpenCV
import numpy
import cv2
# Initialize camera
cap = cv2.VideoCapture(1)
#############################################

#############################################
## Finding the point (LED)
def findPoint(img):
    ## Convert to HSV
    hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
    ## Define thresholds
    lower = numpy.array([0, 0, 230])
    upper = numpy.array([20, 10, 255])
    ## Threshold the image
    mask = cv2.inRange(hsv, lower, upper)
    return mask
#############################################
## Video Loop
while(1):
    ## Read the image
    ret, frame = cap.read()
    ## Do the processing
    output = findPoint(frame)
    ## Show the image
    cv2.imshow('image', output)
    ## End the video loop
    if cv2.waitKey(1) == 27:  ## 27 ASCII for escape key
        break
#############################################

#############################################
## Close and exit
# close camera
cap.release()
cv2.destroyAllWindows()
#############################################
8.3.5 Step 5
Now we have a binary image in which the LED portion of the image appears as a blob. We are looking for a position (x, y) that tells us which point on the screen the LED indicates. Therefore, we can use contours to find the central point of the blob. In this step, let us display all the possible contours, so that we can later eliminate those that result from noise.
#############################################
## Import OpenCV
import numpy
import cv2
# Initialize camera
cap = cv2.VideoCapture(1)
#############################################

#############################################
## Finding the point (LED)
def findPoint(img):
    ## Convert to HSV
    hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
    ## Define thresholds
    lower = numpy.array([0, 0, 200])
    upper = numpy.array([30, 10, 255])
    ## Threshold the image
    mask = cv2.inRange(hsv, lower, upper)
    ## Find the blob of red
    contours, hierarchy = cv2.findContours(mask, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
    cv2.drawContours(img, contours, -1, (0, 0, 255), 3)  # -1: draw all contours
    return img
#############################################
## Video Loop
while(1):
    ## Read the image
    ret, frame = cap.read()
    ## Do the processing
    output = findPoint(frame)
    ## Show the image
    cv2.imshow('image', output)
    ## End the video loop
    if cv2.waitKey(1) == 27:  ## 27 ASCII for escape key
        break
#############################################

#############################################
## Close and exit
# close camera
cap.release()
cv2.destroyAllWindows()
#############################################
8.3.6 Step 6
We see that the earlier step returns many contours. We must now choose the contour that indicates the LED. To do so, we can use the contour properties of area and moments. First, we dismiss those contours that are very small, i.e. have a low area. Then we pick the biggest of the remaining contours, on the assumption that it will be bigger than those that result from noise.
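The selection loop in the program below boils down to a max-by-area search. A small self-contained sketch, where a plain area function stands in for cv2.contourArea and toy point lists stand in for real contours:

```python
def biggest_contour(contours, area):
    # Keep the contour with the largest area -- the same maxA/maxC
    # bookkeeping as the loop in the program below.
    maxA = 0
    maxC = None
    for cnt in contours:
        a = area(cnt)
        if a > maxA:
            maxC = cnt
            maxA = a
    return maxC

# Toy "contours": lists of points; len() stands in for cv2.contourArea
contours = [[(0, 0)], [(1, 1), (2, 2), (3, 3)], [(5, 5), (6, 6)]]
print(biggest_contour(contours, len))  # [(1, 1), (2, 2), (3, 3)]
```

With real contours this is simply biggest = max(contours, key=cv2.contourArea), guarded by a check that the list is non-empty.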
#############################################
## Import OpenCV
import numpy
import cv2
# Initialize camera
cap = cv2.VideoCapture(1)
#############################################

#############################################
## Finding the point (LED)
def findPoint(img):
    ## Convert to HSV
    hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
    ## Define thresholds
    lower = numpy.array([0, 0, 200])
    upper = numpy.array([30, 10, 255])
    ## Threshold the image
    mask = cv2.inRange(hsv, lower, upper)
    ## Find the blob of red
    contours, hierarchy = cv2.findContours(mask, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
    ## Find blob with biggest area
    if len(contours) > 0:
        maxA = 0
        maxC = []
        for cnt in contours:
            area = cv2.contourArea(cnt)
            if area > maxA:
                maxC = cnt
                maxA = area
        cv2.drawContours(img, [maxC], -1, (0, 0, 255), 3)  # draw the biggest contour
    return img
#############################################
## Video Loop
while(1):
    ## Read the image
    ret, frame = cap.read()
    ## Do the processing
    output = findPoint(frame)
    ## Show the image
    cv2.imshow('image', output)
    ## End the video loop
    if cv2.waitKey(1) == 27:  ## 27 ASCII for escape key
        break
#############################################

#############################################
## Close and exit
# close camera
cap.release()
cv2.destroyAllWindows()
#############################################
8.3.7 Step 7
Now that we have found the blob we need, we must find its centre. This is done in OpenCV using contour moments. Once we find the centre, we return it from the same function, findPoint, and draw circles around it to show where this point is.
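Moments reduce to sums over the blob: m00 is the number of pixels (the area), m10 and m01 are the sums of the x and y coordinates, and the centre is (m10/m00, m01/m00). A NumPy sketch on a tiny hand-made mask (this computes image moments of the mask directly, whereas cv2.moments in the program below computes moments of the contour, the same idea applied to the blob's outline):

```python
import numpy as np

# 5x5 mask with a 2x2 blob occupying columns 2-3 of rows 1-2
mask = np.zeros((5, 5), dtype=np.uint8)
mask[1:3, 2:4] = 255

ys, xs = np.nonzero(mask)
m00 = len(xs)             # number of blob pixels (area)
cx = int(xs.sum() / m00)  # m10 / m00
cy = int(ys.sum() / m00)  # m01 / m00

print((cx, cy))  # (2, 1): int() truncates the true centre (2.5, 1.5)
```

The int() truncation matches the program below, and the guard there against m00 == 0 avoids dividing by zero when the mask is empty.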
#############################################
## Import OpenCV
import numpy
import cv2
# Initialize camera
cap = cv2.VideoCapture(1)
canvas = numpy.zeros((480, 640, 3), numpy.uint8)
#############################################

#############################################
## Finding the point (LED)
def findPoint(img):
    ## Convert to HSV
    hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
    ## Define thresholds
    lower = numpy.array([0, 0, 200])
    upper = numpy.array([30, 10, 255])
    ## Threshold the image
    mask = cv2.inRange(hsv, lower, upper)
    ## Find the blob of red
    contours, hierarchy = cv2.findContours(mask, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
    # contours = []
    ## Default return value
    (cx, cy) = (0, 0)
    ## Find blob with biggest area
    if len(contours) > 0:
        maxA = 0
        maxC = []
        for cnt in contours:
            area = cv2.contourArea(cnt)
            if area > maxA:
                maxC = cnt
                maxA = area
        ## Find the center of that blob
        M = cv2.moments(maxC)
        if M['m00'] != 0:
            cx = int(M['m10'] / M['m00'])
            cy = int(M['m01'] / M['m00'])
    return mask, (cx, cy)
#############################################
## Video Loop
while(1):
    ## Read the image
    ret, frame = cap.read()
    ## Do the processing
    mask, outxy = findPoint(frame)
    ## Draw the point returned
    cv2.circle(canvas, outxy, 5, (0, 0, 255), 2)
    cv2.circle(canvas, outxy, 20, (255, 0, 255), 2)
    cv2.circle(canvas, outxy, 40, (0, 255, 255), 2)
    # cv2.line(canvas, oldPos, pos, colour, t)
    ## Show the image
    cv2.imshow('orig', frame)
    cv2.imshow('image', canvas)
    ## End the video loop
    if cv2.waitKey(1) == 27:  ## 27 ASCII for escape key
        break
#############################################

#############################################
## Close and exit
# close camera
cap.release()
cv2.destroyAllWindows()
#############################################
8.3.8 Step 8
Finally, we need to draw the points that we found earlier. This completes our sample application of using Computer Vision to draw on air.
#############################################
## Import OpenCV
import numpy
import cv2
# Initialize camera
cap = cv2.VideoCapture(1)
# canvas = numpy.zeros((480, 640, 3), numpy.uint8)
#############################################

#############################################
## Finding the point (LED)
def findPoint(img):
    global oldPos
    ## Convert to HSV
    hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
    ## Define thresholds
    lower = numpy.array([0, 0, 200])
    upper = numpy.array([30, 10, 255])
    ## Threshold the image
    mask = cv2.inRange(hsv, lower, upper)
    ## Find the blob of red
    contours, hierarchy = cv2.findContours(mask, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
    ## Default return value
    (cx, cy) = (0, 0)
    ## Find the blob with the biggest area
    ## (maxA starts at 400 so that tiny specks are ignored)
    if len(contours) > 0:
        maxA = 400
        maxC = None
        for cnt in contours:
            area = cv2.contourArea(cnt)
            if area > maxA:
                maxC = cnt
                maxA = area
        if maxC is not None:
            ## Find the centre of that blob
            M = cv2.moments(maxC)
            if M['m00'] != 0:
                cx = int(M['m10'] / M['m00'])
                cy = int(M['m01'] / M['m00'])
    return mask, (cx, cy)
#############################################

#############################################
## Video loop
while True:
    ## Read the image
    ret, frame = cap.read()
    ## Do the processing
    mask, outxy = findPoint(frame)
    ## Draw the point returned
    cv2.circle(frame, outxy, 5, (0, 0, 255), 3)
    cv2.circle(frame, outxy, 20, (255, 0, 255), 3)
    cv2.circle(frame, outxy, 40, (0, 255, 255), 3)
    ## Show the image
    cv2.imshow('image', frame)
    ## End the video loop
    if cv2.waitKey(1) == 27:  ## 27 is ASCII for the escape key
        break
#############################################

#############################################
## Close and exit
# close camera
cap.release()
cv2.destroyAllWindows()
#############################################
8.4 Experiments
8.5 Debugging