
OCR in 3D Orientation Using a Web Cam

Piyush Goel
BSc (Hons) Artificial Intelligence
with Mathematics (Industry)
2003/2004

The candidate confirms that the work submitted is their own and the appropriate credit has been given
where reference has been made to the work of others.

I understand that failure to attribute material which is obtained from another source may be
considered as plagiarism.

(Signature of student)____________________________
Summary

The aim of the project was to develop a software program to rectify perspective distortion in
document images so as to make them readable by an OCR package. The objectives of the project
included understanding the limitations of OCR packages and the complexities of using a web cam to
capture document images. Furthermore, it required investigating and understanding the research done
in the field so far and to ultimately devise an application that implemented existing or new
algorithm(s) to produce a solution for the concerned problem. This report details, in a schematic and
systematic manner, various stages of the project that were carried out in order to fulfil the low-level
objectives and achieve the overall goal. In the end the report evaluates the project and the developed
system and makes suggestions for taking this work further.

Acknowledgements

First of all I would like to express my utmost gratitude to my parents for all their love and support
which made it possible for me to pursue my studies at Leeds.

I would then like to thank Dr. Andy Bulpitt, my project supervisor, for giving his invaluable time to provide me
with ideas throughout the project and to help me develop the system. I would also like to
thank him for providing the necessary equipment and software to carry out the project.

Finally, I would like to thank all my friends for their support and help at all
times and for sharing wonderful moments with me.

Table of Contents

Summary .....................................................................................................................................i
Acknowledgements .....................................................................................................................ii
Table of Contents.......................................................................................................................iii
List of Figures.............................................................................................................................ii

Chapter 1 : Introduction.............................................................................................................1
1.1 PROJECT AIM ......................................................................................................................1
1.2 MOTIVATION .......................................................................................................................1
1.3 OBJECTIVES ........................................................................................................................1
1.4 MINIMUM REQUIREMENTS ...................................................................................................1
1.5 PROJECT METHODOLOGY .....................................................................................................2
1.5.1 The Waterfall model.................................................................................................2
1.5.2 The Prototype Model................................................................................................3
1.5.3 The Incremental Model.............................................................................................3
1.5.4 The Spiral Model......................................................................................................3
1.5.5 Chosen methodology with justification .......................................................................4
1.6 PROJECT SCHEDULE.............................................................................................................4
1.7 REPORT OUTLINE ................................................................................................................5

Chapter 2 : Background Research..............................................................................................6


2.1 INTRODUCTION....................................................................................................................6
2.2 OCR PRECINCTS..................................................................................................................6
2.3 INTRICACIES OF A WEB CAM ...............................................................................................7
2.4 SYSTEM CONSTRAINTS ........................................................................................................8
2.5 PREVIOUS WORK .................................................................................................................8
2.5.1 Introduction .............................................................................................................8
2.5.2 Extraction of illusory clues........................................................................................9
2.5.3 Using Projection Profiles........................................................................................13
2.5.4 Conclusions on previous work .................................................................................17
2.6 BASIC TECHNIQUES INVOLVED ........................................................................................... 17
2.6.1 Image warping .......................................................................................................18
2.6.2 Quadrilateral-to-Rectangle Mapping .......................................................................19
2.6.3 Interpolation ..........................................................................................................21
2.7 CONCLUSIONS ................................................................................................................... 23
2.8 REQUIREMENTS SPECIFICATION.......................................................................................... 24


2.8.1 Functional System Requirements .............................................................................24


2.8.2 Non-Functional System Requirements......................................................................24
2.9 SUMMARY OF BACKGROUND RESEARCH ............................................................................. 25

Chapter 3 : The Design.............................................................................................................26


3.1 INTRODUCTION.................................................................................................................. 26
3.2 SEGMENTATION .................................................................................................................. 26
3.2.1 Thresholding ..........................................................................................................27
3.2.2 Edge Detection.......................................................................................................28
3.2.3 Determining the page corners..................................................................................29
3.2.4 Linear Regression...................................................................................................29
3.2.5 Mathematics Involved .............................................................................................30
3.3 QUADRILATERAL-TO-RECTANGLE MAPPING........................................................................ 31
3.4 IMPLEMENTATION TECHNOLOGY........................................................................................ 31
3.4.1 Java.......................................................................................................................31
3.4.2 MATLAB................................................................................................................32
3.4.3 Chosen Technology.................................................................................................32
3.5 DESIGN SUMMARY ............................................................................................................ 33

Chapter 4 : Implementation and Testing ..................................................................................34


4.1 INTRODUCTION.................................................................................................................. 34
4.2 COMPONENTS IMPLEMENTATION AND TESTING ................................................................... 35
4.2.1 Thresholding ..........................................................................................................35
4.2.2 Edge Detection.......................................................................................................37
4.2.3 Regression Fitting ..................................................................................................37
4.2.4 Corner Detection....................................................................................................38
4.2.5 Quadrilateral-to-Rectangle Mapping .......................................................................38
4.3 SYSTEM TESTING............................................................................................................... 40
4.4 TEST PARAMETERS ............................................................................................................ 41
4.5 TEST DATA ....................................................................................................................... 42
4.6 TEST RESULTS ................................................................................................................... 42
4.7 RESULTS ANALYSIS ........................................................................................................... 43
4.8 SUMMARY......................................................................................................................... 44

Chapter 5 : Evaluation..............................................................................................................45
5.1 INTRODUCTION.................................................................................................................. 45
5.2 EVALUATION OF THE PROJECT............................................................................................ 45
5.3 EVALUATION OF THE SYSTEM ............................................................................................ 46


5.3.1 Against functional requirements ..............................................................................46


5.3.2 Against non-functional requirements........................................................................46
5.4 FUTURE WORK.................................................................................................................. 47
5.5 EVALUATION SUMMARY.................................................................................................... 48

Chapter 6 : Conclusion.............................................................................................................49
RECAPITULATION OF THE PROJECT PROCESS ............................................................................ 49
References.................................................................................................................................50
Appendix A: Project Reflection................................................................................................52
Appendix B: Project Schedule ..................................................................................................54
Appendix C: OCR Evaluation ..................................................................................................57
Appendix D: Test Samples........................................................................................................59
Appendix E: The Process..........................................................................................................65

List of Figures

Figure 1.1: Project Methodology: A hybrid of realistic waterfall and incremental models......................................4

Figure 2.1: Shows vanishing points of an image of a cubical box...................................................................................9

Figure 2.2: Shows the different types of illusory clues (A,B,C,D,E)............................................................................. 10

Figure 2.3 : Associations between elongated and compact blobs respectively......................................................... 10

Figure 2.4: Association network......................................................................................................................................... 12

Figure 2.5: Shows the different types of vertical associations. ..................................................................................... 12

Figure 2.6: Left image shows the dense network of vertical association..................................................................... 13

Figure 2.7: Confidence measures for projection profiles............................................................................................... 14

Figure 2.8: Geometry involved in line spacing................................................................................................................ 15

Figure 2.9: Quadrilateral-to-rectangle mapping using transformation M................................................................. 20

Figure 2.10 : Bilinear Interpolation................................................................................................................................... 22

Figure 3.1: Histograms......................................................................................................................................................... 27

Figure 3.2: Iterative threshold method.............................................................................................................................. 28

Figure 3.3: A diagram showing a regression line fitting a set of points...................................................................... 29

Figure 4.1 : Implementation and Testing process............................................................................................................ 35

Figure 4.2: Thresholded image with a threshold of 145............................................................................................... 36

Figure 4.3 : Edge Detection image produced from Figure 4.1(b) ................................................................................ 37

Figure 4.4 : Depicts corner detection in the edge image................................................................................................ 38

Figure 4.5 : Corrected version of the mapping shown in Figure 2.8 ........................................................................... 39

Figure 4.6: Interpolation Methods.................................................................................................................................... 40

Figure 4.7: Images rotated along (a) x-axis +30°, (b) x-axis -30°, (c) y-axis +30° and (d) y-axis -30°............... 41

Figure 4.8: Illustrating perspective.................................................................................................................................... 44

Chapter 1 : INTRODUCTION

1.1 PROJECT AIM

The aim of the project is to correct perspective distortion in document images captured using a
web cam, making the text recognisable by an existing package capable of performing OCR.

1.2 MOTIVATION

The motivation for the project is the increasing use of digital web cameras for the purpose of
capturing text documents to replace the bulky and unwieldy flatbed scanners. Limited research
performed in the field of correcting perspective distortion in text document images is another
motivating factor for the project.

1.3 OBJECTIVES

The objectives of the project that need to be completed in order to achieve the overall aim are:
• to understand the limitations of OCR packages
• to understand the complexities of using a web cam to capture document images
• to investigate the research done in the field so far
• to analyse the different approaches taken to solve the problem
• to devise an application that implements existing or new algorithm(s) to produce a solution
for the concerned problem, and
• to test and evaluate the accuracy of the developed system

1.4 MINIMUM REQUIREMENTS

The minimum requirements of the system are as follows:


1. Understand the problems confronted while performing OCR on images captured using a web
cam depending upon several aspects such as distortions, varied lighting conditions and text
font size.
2. Understand the research done to date to resolve perspective distortions in captured images.
3. Produce at least one algorithm to correct perspective distortion in images captured using a
web cam.
4. OCR should perform successfully on images captured under normal lighting conditions.
5. OCR will be evaluated for different degrees of document rotation out of the image plane.

1.5 PROJECT METHODOLOGY

Avison [1] states that a methodology comprises phases that in turn consist of sub-phases, which
guide the builders of the system in choosing the techniques that may be apposite at each step of
the project, and which assist in the management, control and evaluation of the project. This section presents various
project methodologies or development cycles that are employed to structure the development process
for a project. The different types of methodologies discussed below include the Waterfall model, the
Prototype model, the Incremental model and the Spiral model.

1.5.1 The Waterfall model

Generally, Royce [2] is credited with the waterfall model. This type of software development life cycle
has been described by NASA [3] as having the following steps:

1. Document system concept


2. Identify and analyse the system requirements
3. Divide the system into sub parts (Architectural design)
4. Design the subdivision of system (Detailed design)
5. Code the components and test them separately
6. Test the system as a whole
7. Install the system and operate it

This is an idealistic model in which the methodology only flows forwards, i.e., at any stage of the
development, a previous stage cannot be changed. It assumes that the initial user needs, system
requirements, etc., cannot change at a later stage and have to be laid out perfectly at the beginning of
the project, which is the major drawback of the approach. A realistic waterfall model embraces


feedback loops that allow development to revert from one stage of the process to the previous one.
This introduces flexibility into the project along with the added advantages of the waterfall model, the
biggest being the setting of intermediate milestones and deadlines to ensure that the final deadline is met
within the allocated time and budget.

1.5.2 The Prototype Model

In the prototype model, all the known initial requirements are first gathered from the customer.
A design is then quickly drawn up on the basis of these requirements and a prototype is developed
from that design. This prototype is then evaluated by the customer. At this stage, if
sufficient information is not yet known to begin the development of the actual desired product, additional
iterations through the prototype lifecycle are made to update the requirements and to
improve the prototype. When enough is known to commence the development of the actual product,
the prototype is discarded and the product itself is engineered. This type of lifecycle is very useful
when the customer understands his requirements only vaguely and the developers are not well acquainted
with the development environment. Heusser and Sait [4] describe the limitations of the approach as
the inability to set milestones and the difficulty of foreseeing potential problems in the project.

1.5.3 The Incremental Model

The incremental model slightly modifies the waterfall model. The waterfall model finishes each stage
of the process before proceeding to the next phase. The incremental model does the same up to the
design phase, after which, in the implementation stage, it divides the project into smaller modules that
can be developed and tested separately. This approach reduces the complexity of developing the
product to some extent, since the development of smaller entities is simpler than that of the whole.

1.5.4 The Spiral Model

The spiral model is a combination of the waterfall and prototype models [5]. It is a cyclic model
comprising four stages: planning, risk analysis, development, and evaluation. The model starts by
gathering the initial system requirements. The project then moves to the risk analysis stage, where
any potential conditions that may affect the project adversely are determined, after which the project
may either be abandoned due to high risk or kept alive. If the project is decided to be


continued, it enters the engineering stage, where a prototype is built on the basis of the
initial requirements. The customer then evaluates the prototype and provides feedback on it.
Subsequently, the project re-enters the planning stage. A new prototype is generated on the basis of
the feedback after performing the risk analysis, and the spiral continues until a stable prototype is
reached that matches the customer's requirements. This method of project management is used
when the application is highly complex, involves a big risk factor, and the customer does not have
exhaustive knowledge of the requirements to begin with.

1.5.5 Chosen methodology with justification

The project at hand has a set of minimum requirements that are subject to change at a later stage, though
not substantially; it involves a complex problem that is difficult to design for, but the risk factor is low and
meeting the schedule is a high priority. Given the analysis of the individual approaches in the previous
section, and taking the project aspects mentioned into account, a combination of the
realistic waterfall model and the incremental model has been chosen as the methodology for the
project lifecycle. This would ensure that the deadlines are met on time and the development of the
product does not become overwhelmingly complex. Figure 1.1 shows a diagrammatic view of the
project methodology.

Figure 1.1: Project Methodology: A hybrid of realistic waterfall and incremental models

1.6 PROJECT SCHEDULE


Figure B.1 in Appendix B shows the planned project schedule where the blue dates represent
the milestones to be achieved as per the rules laid down by the School of Computing, and the black
dates represent the targeted completion dates set by the author.


1.7 REPORT OUTLINE

The sections of this report closely mirror the stages of the proposed methodology and are summarised
as follows:
• Chapter 1 states the aim, motivation and minimum requirements of the project. It also discusses
various project methodologies and justifies the one chosen for this project. Based on the chosen
methodology it states the proposed project schedule and outlines the structure of the report.

• Chapter 2 discusses the limitations of OCR packages and web cameras and states the system
constraints set on the basis of these limitations. It also gives an introduction to the problem
being solved, details the previous work done to solve the problem and draws conclusions from
that work. Moreover, it details the techniques involved in producing the solution. Lastly,
it defines the success criteria for the project.

• Chapter 3 details the design components of the system that need to be built, in the order of
their occurrence in the system. It also provides a justification for the proposed design by
comparing it to the defined success criteria. Finally, it evaluates various technologies for the
development of the system.

• Chapter 4 explicates the implementation of the distortion correction program. It mentions the
problems encountered in the development phase and the rectifications made. Lastly, it justifies
the implementation against the design and the success criteria.

• Chapter 5 evaluates the project and the developed system against the requirements set at prior
stages. It also presents ideas on future work and further enhancements possible to add to this
project.

• Chapter 6 presents a recap to the whole project process and makes conclusions about the project
methodology and the project outcome.

• The References and Appendices follow. These sections contain additional information that
complements the text presented in the report but is not essential for understanding the project.

Chapter 2 : BACKGROUND RESEARCH

2.1 INTRODUCTION

This chapter looks at the aspects of the project that need to be understood before any attempt is made to
add to the existing knowledge. Firstly, the chapter describes the limitations of OCR packages and the complexities
of using web cameras to capture images. Subsequently, it states the setup under which the experiment
will be carried out, based on the scope defined by the intricacies of OCR and web cameras. After that
the chapter introduces the problem and looks into the attempts that have been made to devise a
solution to the concerned challenge. It then discusses the fundamental techniques that will be
invariably used in the process of developing a solution followed by a conclusion based on the content
of the entire chapter.

2.2 OCR PRECINCTS

OCR packages have a number of limitations with regard to their capabilities and the conditions under
which they can perform effectively. Most OCR packages assume that images are captured
using flatbed scanners, which rules out several distortions: perspective distortion cannot creep in,
because the document lies flat on the scanner glass; shadow effects across the document are prevented,
because the scanner light floods the document with ample illumination; large skew angles are avoided,
because the document is aligned with the edges of the scanner bed; and there are no lens distortions, in
contrast to images captured using webcams. OCR packages therefore do not perform well on
documents exhibiting the above-mentioned distortions.

An OCR package called ‘PageCam’ has been used to perform document recognition in this project.
This is due to the fact that this package was available at the School of Computing, and acquiring another
package from the market was beyond the financial scope of the project. It is promoted in [6] as
an OCR package that caters to problems such as poor lighting conditions, low-resolution images, blur,
noise and compression artefacts. It comes complementary with the “Philips PCVC750K Web
Camera”, and hence the Philips camera has been used to capture images for this project.
Nicel [7] collected various images using the Philips camera and performed OCR using PageCam.


He deemed OCR to be successful if more than seventy percent of the characters of the text were
recognised correctly in an image. Table C.1 in Appendix C illustrates the dataset collected to
evaluate the performance of PageCam and the corresponding results. The camera was kept at a
distance of sixteen inches from the document and all the images were captured in bright illumination
and had a resolution of 640x480.
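The seventy-percent success criterion can be made concrete with a small character-accuracy check. The helper below is purely illustrative (it is not part of PageCam) and assumes a simple position-wise comparison between the ground-truth text and the OCR output:

```python
def ocr_successful(ground_truth, recognised, threshold=0.70):
    """Return True if the OCR output meets the success criterion.

    Counts position-wise character matches between the ground-truth
    text and the recognised text, then checks whether the fraction of
    correctly recognised characters exceeds the threshold (70%).
    """
    if not ground_truth:
        return False
    matches = sum(1 for g, r in zip(ground_truth, recognised) if g == r)
    return matches / len(ground_truth) > threshold
```

For example, recognising "hello w0rld" against the ground truth "hello world" gives 10 of 11 characters correct (about 91%), which would count as a success under this criterion.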

As can be seen from Table C.1, the OCR package produced varied results on the given dataset. It
recognised text with a font size of 14 points or larger but could not do so for font sizes of 12 points or
smaller. Recognition was carried out effectively for documents that were skewed up to ten degrees and
failed completely for those skewed more than that. The package fared well with images oriented
differently and recognised text with great precision. The OCR successfully identified text in only two
instances of images containing perspective distortion, and those two instances contained very
little distortion. OCR could not cope well with illumination differences across the document.

2.3 INTRICACIES OF A WEB CAM

Web cams are a cost effective way of capturing digital images but they have their own limitations. All
the aspects that affect the performance of the OCR package are the majority of issues that occur when
acquiring images using a web cam.

In contrast to the advantages of flatbed scanners mentioned in section 2.2, images captured with
webcams suffer from shadows due to illumination variations. The images may contain skew because
there is no marked area serving as the “bed”. The images may also contain perspective distortion if the
document is held up in front of the camera rather than laid on a flat surface with the camera directly
above the document. The images may, furthermore, have poor picture quality due to the low resolution
of the webcam, which makes small text very difficult to recognise correctly as the letters become
indistinct. Moreover, due to the wide angle of the camera lens, lens distortions such as
barrel and pincushion distortion may creep in and reduce the efficiency of OCR.

These defects need to be corrected before OCR is applied, since most OCR
packages require image quality comparable to that obtained using a flatbed scanner, which a
normal web cam does not deliver.


2.4 SYSTEM CONSTRAINTS

Based on the discussion in sections 2.2 and 2.3, we can now decide what areas need to be
dealt with before OCR is performed on images captured using a webcam. Although there are many
issues to address, due to time constraints it will not be possible to solve most of them
within the time frame set for this project. This project will therefore deal with one such area: the
correction of perspective distortion caused by the rotation of either the camera or the document, which
results in the image plane and the document plane being non-parallel. For this purpose, other
constraints need to be set so that they do not hinder the OCR process. We shall make the following
assumptions for the entirety of the project:

• Image capturing will be done in uniform and optimum lighting conditions.

• Document will contain black text on white background.

• The text font size of the document will be greater than or equal to fourteen points.

• There will be no skew present in the captured image.

• The document will be captured in the correct (upright) orientation.

• Document will contain only textual data.

2.5 PREVIOUS WORK

2.5.1 Introduction

Efforts to read text pages using OCR engines have been made for a long time. These efforts have
dealt largely with the issue of resolving skew in document images captured using flatbed
scanners and even web cams. However, even though the problem of perspective distortion in
document images captured using webcams has been recognised, it has not been addressed greatly until
now. Little research is evident in this field, where the plane of the paper is not fronto-parallel with the
camera view.

The following sections account for the methods that have been implemented to date to solve the
problem of perspective in document images to make it possible for the OCR package to read the text.
But before these methods are looked at, let us understand what the basic motive of these approaches is.


At the top level it is understood that the methods discussed below would all aim at removing the
perspective distortion in document images, but if we go a level below this, the main objective of all
these methods is to determine the vanishing points. What are vanishing points and what is their
relevance to the problem?

Vanishing Points
As we look at a pair of straight railway tracks, we observe that at a point far away the tracks appear
to merge into one point. This point, at which real-world parallel lines seem to converge in the 2D
plane, is called a vanishing point. In other words, when real-world 3-dimensional scenes are
projected onto a 2-dimensional image plane, parallel lines are depicted as lines that meet at a
vanishing point. Figure 2.1 shows an example of vanishing points in an image: a cuboid whose sides
are parallel in the real world but appear to converge at vanishing points in the image.

Figure 2.1: Shows vanishing points of an image of a cubical box

Vanishing points provide important information for determining the orientation of objects in an
image. Once these points are detected in a particular image, they are used to form intersecting
quadrilaterals that determine the geometry of the plane. How this is done is discussed in more detail
later in the report.

2.5.2 Extraction of illusory clues

Pilu [8] has prescribed an approach in which illusory [9] clues are used to determine the skew and
perspective distortions in the image. These are clues which are not directly evident in the image but
can be extracted as they correspond to linear features that arise due to linear arrangement of text. Pilu
has mentioned five such clues:
(a) “vertical illusory clues” which emerge from the apparent vertical lines due to alignment of
text.
(b) “vertical hard lines” that correspond to actual vertical document edges.

(c) “horizontal illusory clues” that originate from the arrangement of letters and words in text
lines.
(d) “horizontal hard lines” which correspond to actual horizontal document edges, and
(e) “quadrilateral” that corresponds to either illusory quadrilaterals formed from the vertical and
horizontal illusory clues or to actual quadrilaterals in the text.
These clues are respectively shown in Figure 2.2.

Figure 2.2: Shows the different types of illusory clues (A,B,C,D,E) (Source: [8])

Horizontal and vertical hard lines can be found using either the Hough transform (consult [10] for
details on the Hough transform) or edge detectors such as the Sobel and Canny edge detectors,
which will be discussed in later sections of this report. The illusory vertical and horizontal clues are
more complicated to detect. The image is first turned into blobs of text, either "compact" (made of
characters) or "elongated" (comprising words or portions of lines), depending on the font size and
resolution of the image. A blob is classified as elongated if its major axis is more than three times
longer than its minor axis. Different types of blobs are shown in Figure 2.3.

Figure 2.3: (a), (b) show compact blobs and the associations between them; (c), (d) show elongated blobs
and their associations with elongated and compact blobs respectively. (Source: [8])

A “Pairwise saliency measure” is calculated for all different blobs using two blob saliency features,
“relative minimum distance” and “blob dimension ratio”. Blob dimension ratio (BDR) and relative
minimum distance (RMD) are given as:

(Equation 2.1)
and

(Equation 2.2)

where DijMIN is the minimum distance between two blobs Bi and Bj, and AxMIN and AxMAX
respectively represent the minor and major axes of a blob Bx. The pairwise saliency measure is a
probability measure which, for two compact blobs, is given by:

(Equation 2.3)

where N(x, µx, σx) denotes a Gaussian distribution in x with mean µx and standard deviation σx,
and for one or two elongated blobs it is given by:

(Equation 2.4)

where αij is the angle between the horizontal axis of one elongated blob and either the centre of a
compact blob or the horizontal axis of another elongated blob. This pairwise saliency measure is then
used to form curvilinear arcs between blobs that represent the same line of text. To do this, a greedy
path-growing approach is adopted: random seeds (blobs) are selected, and arcs are formed in one
particular direction (right or left) by joining blobs that satisfy the minimum required saliency
measure, until a blob is found which does not fit with the previous ones. Then, starting at the same
seed, arcs are formed in the opposite direction in the same way, ensuring that blobs which have
already been used are not used again. In this way an association network is formed in which the
blobs act as nodes and the curvilinear arcs between them form their associations. Horizontal lines are
then fitted to these arcs in order to determine the exact angle of the direction of the linear groups,
using linear regression, which is discussed in detail in section 3.2.4. The association network thus
formed is depicted in Figure 2.4.

Figure 2.4: A:Original binary image, B:Association network, C:the extracted curvilinear groups, D:fitted
horizontal lines (Source: [8])
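Pilu's greedy path-growing step can be illustrated with a much-simplified sketch: here blobs are reduced to centroid coordinates, and the saliency test is abstracted into a caller-supplied compatible predicate (both are simplifications of the actual measures in equations 2.3 and 2.4, and the chain order is by x-coordinate):

```python
def grow_chains(blobs, compatible):
    # Greedy path growing: take each unused blob as a seed, extend the chain
    # to the right and then to the left, always taking the nearest blob that
    # the `compatible` predicate accepts, never reusing a blob.
    used = set()
    chains = []
    for seed in range(len(blobs)):
        if seed in used:
            continue
        used.add(seed)
        chain = [seed]
        for direction in (1, -1):
            current = seed
            while True:
                candidates = [j for j in range(len(blobs))
                              if j not in used
                              and direction * (blobs[j][0] - blobs[current][0]) > 0
                              and compatible(blobs[current], blobs[j])]
                if not candidates:
                    break
                nxt = min(candidates,
                          key=lambda j: abs(blobs[j][0] - blobs[current][0]))
                if direction == 1:
                    chain.append(nxt)
                else:
                    chain.insert(0, nxt)
                used.add(nxt)
                current = nxt
        chains.append(chain)
    return chains
```

With blob centroids on two rows and a predicate that accepts only small vertical offsets, each text line comes out as one chain.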

The fitted horizontal lines are used to find the horizontal vanishing point. The homography
calculated from this vanishing point can be used to correct the skew in the document. Once the skew
has been corrected using the horizontal clues, it becomes easier to find the vertical clues, as a rough
idea of the vertical direction is then known. To find the vertical illusory clues the same blobs are
used, but a different sort of association network is formed, in which associations are made between
blobs that lie in the near-vertical direction. Figure 2.5 depicts these associations.

Figure 2.5: Shows the different types of vertical associations. (Source: [8])

The associations are refined by rejecting those that are highly unlikely to represent the vertical
clues, rather than choosing them by saliency measures as was done for the horizontal clues. Pilu
mentions four rejection rules that are applied to the initial dense network:
(a) "longest of two overlapping associations.
(b) left-end-to-right-end associations (and vice versa), since they can't be formed from a justified
paragraph.
(c) associations that are at too much of an angle from the vertical direction.

(d) associations of blobs of two different heights as they are most unlikely to form the part of
same paragraph.”

Using these rules, the dense network is reduced to a network of relevant associations, although quite
a few insignificant ones remain. Ultimately, as in the case of the horizontal clues, a greedy
split-and-merge policy is applied to bracket all the clues together into near-vertical groups that form
the vertical clues. Figure 2.6 shows examples of the detected vertical clues.

Figure 2.6: Left image shows the dense network of vertical association and the right image shows the
selected vertical clues. (Source: [8])

The vertical and horizontal lines thus formed are used to find the horizontal and vertical vanishing
points. A homography is computed using the two vanishing points which is then used to warp the
original image based on a transformation that maps quadrilaterals to rectangles. This technique is
discussed in detail in section 2.6.2.

The table below analyses the pros and cons of the method described above:

Pros 1. This method can correct perspective distortion in text documents even when
there are no page borders present in the document image.

Cons 1. The method does not work efficiently when there is only one vertical illusory
clue available. This occurs when the text document is left justified, right
justified or centrally justified.

2.5.3 Using Projection Profiles

Projection profiles are graphs of the number of black or white pixels along a particular axis in a
binary image (one containing only black and white pixels). Such graphs contain well-defined peaks
and troughs corresponding to the pixel counts along the given axis. These projection profiles
are used to determine both the skew and the perspective in an image. The idea is that if the lines of
text are parallel to a particular axis, then these text lines will produce peaks when a profile is created
by projecting along the axis perpendicular to it.
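As an illustration of the idea, a horizontal projection profile of a binary image can be computed by counting the black pixels in each row; rows crossed by a text line give peaks, and the inter-line gaps give troughs. A minimal sketch:

```python
def horizontal_profile(binary):
    # binary: 2D list of pixels, 0 = black (text), 255 = white (background).
    # Returns each row's black-pixel count; rows crossed by a text line
    # produce peaks in this profile.
    return [sum(1 for p in row if p == 0) for row in binary]
```

A vertical profile is obtained the same way by counting down each column instead.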

Projection profiles have proved to be an effective tool for determining the skew in fronto-parallel
document images, as discussed in [11]: various profiles are created for different probable angles of
skew, and the one with minimum entropy (a measure of least error) is marked as the angle of skew.
Clark and Mirmehdi in [12], however, have utilised the technique of projection profiles to determine
the perspective distortion in a text image. In their approach, they form a circular sample space of all
the possible vanishing points for a text region and generate projection profiles from the viewpoint of
each of these points. The basic idea is that all the parallel text lines point towards the vanishing
point, so the projection taken from the actual vanishing point will have a distinct peak for every text
line, since each line contributes a high number of black pixels. A "confidence measure" is calculated
for each projection; it is high for a point whose projection profile contains many distinct peaks. The
point with the highest confidence measure is chosen as the horizontal vanishing point of the text
document plane. Confidence measures for all possible vanishing points are plotted as shown in
Figure 2.7(b).

Figure 2.7: (a) Sample space, (b) confidence measures for projection profiles, white arrow represents a
probable vanishing point and black cross is not a vanishing point. (Source: [12])

Once the horizontal vanishing point is found, the justification of the document is determined, i.e.,
whether the document is left, centre, right or fully justified. This is done by first marking the
starting points, the mid-points and the end-points of all the text lines. Lines are then fitted to each of
these three sets of points using the RANSAC method (for details on the RANSAC method consult
[13]). The error associated with each of these fittings determines the
justification of the document. If the line fitting the starting points of the text line has the least error
associated with it, then the document is said to be left justified. Similarly, it is said to be centrally
justified if the line joining the mid-points has the least error, and right justified if the line fitting the
end points produces the smallest error. However, if the errors associated with the lines fitting the
starting and ending points of the text lines are found to be nearly equal then the document is marked
as fully justified. In this case the two lines fitting the left margin points and the right margin points are
used to determine the vertical vanishing point as the point of their intersection. But if the document is
not fully justified, then further work needs to be done. The line with the least error is referred to as
the baseline. The image is rotated to make the baseline perpendicular to one axis, so that from here
onwards the problem is simplified to considering only the two-dimensional (y,z) plane.
Figure 2.8 shows the image after making the baseline vertical.
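The report uses RANSAC for the margin-line fits; as a simpler illustration of the principle of comparing fitting errors, an ordinary least-squares fit of a line x = my + c to a set of margin points can be sketched as follows (a simplification, not the robust method used in [12]):

```python
def fit_line_error(points):
    # Least-squares fit of x = m*y + c to points (x, y) running down the page;
    # returns the mean squared residual. The margin (start, mid or end points)
    # whose fitted line has the smallest error indicates the justification.
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxy = sum(x * y for x, y in points)
    syy = sum(y * y for _, y in points)
    m = (n * sxy - sx * sy) / (n * syy - sy * sy)
    c = (sx - m * sy) / n
    return sum((x - (m * y + c)) ** 2 for x, y in points) / n
```

A straight left margin gives a near-zero error for the starting points, so the document would be classed as left justified.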

Figure 2.8: Geometry involved in line spacing (Source: [12])

In the figure, P is the bottom of the paragraph and the text lines are at regular intervals of Q. Hence
the nth line is given as (from the bottom, P):
L(n) = P + nQ (Equation 2.5)

and the projection of this line in the image plane is given as:

y(n) = f (Py + nQy) / (Pz + nQz) (Equation 2.6)
where f is the camera's focal length, and Py, Pz and Qy, Qz are respectively the y and z components
of P and Q.

Clark and Mirmehdi state that, without changing the nature of the projection, the scene can be scaled
about the focal point O to make Pz equal to f, which gives the effect of the paragraph touching the
image plane. This makes Py = y(0), which leads to the following formula from equation 2.6:

y(n) = U (1 + nV) / (1 + nW) (Equation 2.7)
where U = y(0), V = Qy / Py and W = Qz / Pz. This makes the process independent of the focal length
of the camera and hence any camera can be used in the experiment.
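Assuming the projection takes the closed form y(n) = U(1 + nV)/(1 + nW) implied by these substitutions (consistent with equation 2.13, since y(n) tends to UV/W as n grows), the model can be sketched and checked numerically:

```python
def y(n, U, V, W):
    # Image-plane position of the n-th text line for the assumed model
    # y(n) = U(1 + nV) / (1 + nW); as n grows this tends to the horizon UV/W.
    return U * (1 + n * V) / (1 + n * W)

# Illustrative values only: U = y(0), V = Qy/Py, W = Qz/Pz.
U, V, W = 100.0, 0.2, 0.3
horizon = U * V / W  # altitude of the horizon, as in equation 2.13
```

The apparent line spacing |y(n+1) − y(n)| shrinks as n increases, which is exactly the perspective foreshortening the method exploits.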

The position of the nth line is given by,


Xn = y(n) (Equation 2.8)

and the line spacing at position Xn is given as,


Yn = y(n+1) − y(n) (Equation 2.9)

Defined in this way, any lines in the image that are not consistently spaced will stand out as having
unusual spacing, but the inconsistency will not propagate through the rest of the lines.

Substituting equation 2.7 into equation 2.9 gives:

(Equation 2.10)

and substituting equation 2.7 into equation 2.8 and then using equation 2.10 gives:

(Equation 2.11)

Different values of V and W are substituted into the above equation, seeking the values that fit the
lines with least error. However, many unwanted minima may be produced due to the complexity of
equation 2.11, and hence it is approximated as follows:

(Equation 2.12)

This ensures that values for V and W close to the actual minima are obtained. These values are
finally substituted into equation 2.7 to get the altitude of the horizon, which is given by:

y(∞) = UV/W (Equation 2.13)

The rotation made earlier to make the baseline upright is then reversed; the resulting point
corresponds to the vertical vanishing point.

The vertical and horizontal vanishing points found in this way are used to find the lines bounding
the paragraph. These lines intersect each other, forming a quadrilateral enclosing the paragraph.

This quadrilateral, like in the previous method discussed in section 2.5.2, will be mapped onto a
rectangle to produce an undistorted image.

The table below summarises the pros and cons of the described method.

Pros 1. Can correct perspective distortion in text documents when there are no page
borders present in the document image.

2. Efficiently corrects the distortion even if the paragraphs are not fully justified.

3. The method is independent of the focal length and other internal features of
the camera being used.

Cons 1. The method requires images taken using high-resolution cameras in order to make
the undistorted document suitable for OCR.

2. It is computationally expensive.

2.5.4 Conclusions on previous work

The sections above discussed two methods to remove perspective distortion from text document
images along with other techniques involved in doing so. From the past research it has been realised
that apart from the above mentioned two methods, there has been no other research done in the field
of correcting perspective distortion in document images. Both the approaches mentioned here are
novel approaches to solve the problem however the second method, i.e., using projection profiles
together with line spacings, takes an edge over the method of finding and using illusory clues. This is
due to the fact that the former can perform even when the document is not fully justified, whereas the
latter cannot. However, the implementation of the approaches mentioned is beyond the scope of this
project in terms of the time allocated for it. This project will therefore try to implement parts of the
methods discussed above to solve the problem.

2.6 BASIC TECHNIQUES INVOLVED

A basic understanding of the problem, together with the notions gathered from the previous work,
suggests that some basic techniques will inevitably be used in this project. This section throws light
on some of these procedures.

2.6.1 Image warping

The whole idea of the project is to remove a geometric distortion from the image. Once the
geometry is known, 'image warping' will be implemented to create an undistorted image. Image
warping is a burgeoning field of image processing which deals with geometric alterations of images.
The increasing availability of powerful computers and advanced graphics stations has broadened the
vistas of image warping, for example to create special effects in real-time video.

Image warping involves mapping a set of control points from the reference image I(x, y) onto a set of
points in the target image I’(x’, y’). This method is also referred to as forward mapping. This
transformation can be represented in the form of equations as follows:

x' = a1x² + a2y² + a3xy + a4x + a5y + a6
y' = a7x² + a8y² + a9xy + a10x + a11y + a12 (Equation 2.14)

There are twelve unknown coefficients in this pair of equations, which together represent a quadratic
warp. In such a case, six control points would be required in both images to determine the unknown
coefficients [14]: substituting the coordinates of these six control points into equation 2.14 gives
twelve equations, which is sufficient to find the twelve unknowns. A quadratic warp induces
complex distortions in the image being warped, for example polynomial curves. Similarly, warps of
different degrees are used to induce different types of distortion; for instance, cubic warps are used
to remove the pincushion and barrel distortions induced by the camera lens.

In this project, once the text region or the borders of the document page are identified, quadrilateral-
to-rectangle mapping can be used to warp the original image to give the undistorted text image. This
warping can be done using a transformation that would map the identified quadrilateral, which either
represents the border of a text grouped as a paragraph or the actual page boundaries, onto a rectangle
in a target image so that the undesired perspective distortion is removed. Such mapping has been
discussed by Kim et al [15] in their work and is detailed in the next section.

2.6.2 Quadrilateral-to-Rectangle Mapping

Quadrilateral-to-rectangle mapping implements the basic fundamentals of perspective geometry and
vanishing points to perform a transformation from a quadrilateral in the source image I(x, y) to a
rectangle in the target image I'(x', y'). This perspective transform has eight degrees of freedom, i.e.,
it involves determining eight unknown coefficients, which can be found using four corresponding
points in the source and target images. As Kim et al have mentioned, a general planar perspective
transform can be written in matrix form as,

(Equation 2.15)

and for the perspective transformation M, the forward transformations will be,

(Equation 2.16)

To perform the required quadrilateral-to-rectangle mapping, a unit-square-to-quadrilateral mapping
[10] is scaled, translated and reversed. The unit-square-to-quadrilateral mapping is performed
between the points (0,0), (1,0), (0,1), (1,1) in the reference image and (x0', y0'), (x1', y1'),
(x2', y2'), (x3', y3') in the target image. The perspective transformation is given by:

(Equation 2.17)
Where,

(Equation 2.18)

(Equation 2.19)
and,

(Equation 2.20)

(Equation 2.21)

This transformation can then be applied to map a rectangle with coordinates (x0, y0), (x1, y1),
(x2, y2), (x3, y3) to a quadrilateral with coordinates (x0', y0'), (x1', y1'), (x2', y2'), (x3', y3') in the
target image. The intended transformation is shown in Figure 2.9.

Figure 2.9: Quadrilateral-to-rectangle mapping using transformation M. (Source: [16])

This rectangle-to-quadrilateral mapping can be obtained by taking the unit-square-to-quadrilateral
mapping and scaling and translating it as follows:

(Equation 2.22)
Here,

(Equation 2.23)

(Equation 2.24)
and,

x’ = u’/w’ and y’ = v’/w’ (Equation 2.25)

Finally, this transformation can then be used to find the quadrilateral-to-rectangle mapping by
reversing the mapping as:

(Equation 2.26)

This type of image warping is widely used to correct perspective distortion and is hence of great
relevance to this project.
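The eight coefficients of a unit-square-to-quadrilateral map can also be obtained in closed form, following Heckbert's well-known construction. The sketch below assumes the corner ordering (0,0)→p0, (1,0)→p1, (1,1)→p2, (0,1)→p3 (which differs from the ordering quoted above, so it is an assumption of this sketch); the inverse, quadrilateral-to-rectangle direction of equation 2.26 would be obtained by inverting the resulting transform:

```python
def square_to_quad(p0, p1, p2, p3):
    # Coefficients of the projective map taking the unit-square corners
    # (0,0)->p0, (1,0)->p1, (1,1)->p2, (0,1)->p3 (Heckbert's closed form).
    (x0, y0), (x1, y1), (x2, y2), (x3, y3) = p0, p1, p2, p3
    sx, sy = x0 - x1 + x2 - x3, y0 - y1 + y2 - y3
    dx1, dx2 = x1 - x2, x3 - x2
    dy1, dy2 = y1 - y2, y3 - y2
    den = dx1 * dy2 - dx2 * dy1
    g = (sx * dy2 - dx2 * sy) / den
    h = (dx1 * sy - sx * dy1) / den
    a, b, c = x1 - x0 + g * x1, x3 - x0 + h * x3, x0
    d, e, f = y1 - y0 + g * y1, y3 - y0 + h * y3, y0
    def warp(u, v):
        # Projective division by the homogeneous coordinate, as in eq. 2.25.
        w = g * u + h * v + 1.0
        return ((a * u + b * v + c) / w, (d * u + e * v + f) / w)
    return warp

# Map the unit square onto an example page quadrilateral.
warp = square_to_quad((0, 0), (4, 0), (5, 3), (1, 4))
```

Each unit-square corner maps exactly onto the corresponding quadrilateral corner, which is a useful sanity check on the coefficients.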

2.6.3 Interpolation

In forward mapping, each point of the reference image is mapped onto the target image. However,
depending on the nature of the transformation, two or more points from the reference image may
map onto the same point in the target image, while some points in the target image may not be
mapped onto at all. This creates holes in the target image at the points which are never mapped onto,
and this missing data needs to be filled in (interpolated) somehow to produce an image without
holes. The remedy to this problem is to perform a backward mapping, in which an inverse mapping
is used to trace each point (x', y') of the target image back to a point (x, y) in the reference image.
Let T be the transformation applied to the initial image; then x and y are given as:

x = x' * inv(T)
and y = y' * inv(T) (Equation 2.27)

where inv(T) represents the inverse of mapping T.

This way it can be ensured that each of the target image points has some value corresponding to the
original image. Nonetheless, a point in the target image may trace back to non-integer coordinates,
i.e., x and y may have fractional parts. Such points cannot be read directly from the initial image,
since image points are represented by integer coordinates, so x and y are rounded to the nearest
integers, meaning that the pixel value of the integer point closest to (x, y) is mapped onto the point
(x', y'). This process is known as nearest-neighbour interpolation.
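The backward mapping with nearest-neighbour interpolation described above can be sketched as follows, where inv_t is assumed to be a function implementing the inverse transformation of equation 2.27:

```python
def backward_map(src, width, height, inv_t):
    # Backward mapping with nearest-neighbour interpolation: for every target
    # pixel (x', y'), trace back through inv_t to a source location and round
    # it to the nearest source pixel; out-of-range traces are left at 0.
    out = [[0] * width for _ in range(height)]
    for yp in range(height):
        for xp in range(width):
            x, y = inv_t(xp, yp)
            xi, yi = int(round(x)), int(round(y))
            if 0 <= yi < len(src) and 0 <= xi < len(src[0]):
                out[yp][xp] = src[yi][xi]
    return out
```

Because every target pixel is visited exactly once, the output can contain no holes, unlike forward mapping.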

A more accurate method of approximating the value of a target image pixel is bilinear interpolation.
In this technique, the target pixel value is taken as a weighted combination of the four nearest pixels
in the reference image, where the weight of each pixel depends on its distance from the traced-back
point. Figure 2.10 illustrates this.

Figure 2.10 : Bilinear Interpolation (Source: [14])

The bilinear interpolation function can be given as:


I'(x', y') = I(x1, y1) + [I(x2, y2) − I(x1, y1)] dx + [I(x4, y4) − I(x1, y1)] dy

           + [I(x3, y3) + I(x1, y1) − I(x4, y4) − I(x2, y2)] dx dy (Equation 2.28)

where dx = x − x1 and dy = y − y1.
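Equation 2.28 can be sketched directly in code; for brevity the sketch assumes the traced-back point lies strictly inside the image so that all four neighbours exist (a real implementation would clamp at the borders):

```python
def bilinear(img, x, y):
    # img: 2D list indexed img[row][col]; sample at real-valued (x, y),
    # x along columns and y along rows, following equation 2.28.
    x1, y1 = int(x), int(y)
    dx, dy = x - x1, y - y1
    p11 = img[y1][x1]          # I(x1, y1)
    p21 = img[y1][x1 + 1]      # I(x2, y2): neighbour to the right
    p12 = img[y1 + 1][x1]      # I(x4, y4): neighbour below
    p22 = img[y1 + 1][x1 + 1]  # I(x3, y3): diagonal neighbour
    return (p11 + (p21 - p11) * dx + (p12 - p11) * dy
            + (p22 + p11 - p12 - p21) * dx * dy)
```

At the centre of four pixels the result is simply their average, as expected.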

Bilinear interpolation is computationally more expensive than nearest-neighbour interpolation, but
the results produced are much better. In particular, when images are rotated using nearest-neighbour
interpolation, the rotated image has a blocky look with jagged edges.

2.7 CONCLUSIONS

On the basis of the background research, and considering the complexity of the methods involved, a
basic idea can be formed of the method the system will follow in order to produce the desired
output. The method will involve finding and using the hard horizontal and vertical clues, as
mentioned in section 2.5.2, to find the vertical (or horizontal) vanishing point of the document page.
The lines found this way will be used to form a quadrilateral representing the document page. The
project will then try to find the corners of the page from the quadrilateral formed. The techniques
discussed in section 2.6 will then be applied to calculate a transformation that maps the points of the
quadrilateral onto a rectangle, with interpolation in mind. The transformation calculated will be
applied to the original image in order to produce an undistorted image.

The initial overview of the system design given here states that the system will aim at finding the
page borders in order to correct the distortion. For this purpose the images of the document will have
to be taken from further away, and hence the initial assumption in section 2.4 of a fourteen-point
text font size has to be revised. The camera was placed at various distances from the document, and
an optimum distance of nineteen inches was reached at which the whole document could be captured
while leaving some room for the rotations that would be made to the paper in order to induce
perspective distortion. Subsequently, the same document was captured in various font sizes with the
camera at this optimum distance of nineteen inches. The OCR responded best when the font size was
thirty-two points, and hence the initial choice of fourteen points is changed to this new value.

As mentioned in the Mid-Project Report, due to a delayed start and the difficulties encountered in
comprehending the complex previous work, the background research took longer than planned in the
project schedule of section 1.6. To accommodate this, a new project schedule has been drawn up, as
shown in Figure B.2 of Appendix B. Here the time allocated to the design phase has been reduced,
because the background reading has already given a basic idea of the probable design of the system.

2.8 REQUIREMENTS SPECIFICATION

A set of requirements for the system acts as a blueprint for the system to be developed. These
requirements are not only important for understanding the problem that needs to be solved, but also
provide guidelines against which the system can be evaluated at various stages of the project. There
are two types of system requirements: functional requirements and non-functional requirements.
Fulfilled functional requirements indicate that the system works; fulfilled non-functional
requirements indicate that the system is of workable quality.

2.8.1 Functional System Requirements

Functional requirements are the "must have" requirements of the system. They lay out the most
basic tasks and functions the system is expected to perform. Listed below are the functional
requirements of the project at hand:
• The system must correct perspective distortion in a document image whose page is rotated up
to 30° out of the image plane.
• The system should increase the accuracy of the OCR in reading a text document with
perspective distortion.
• The OCR must be able to recognise at least seventy percent of the characters in the document
image given the constraints mentioned in sections 2.4 and 2.7.

2.8.2 Non-Functional System Requirements

"Quality is not an afterthought, it has to be built in right from the beginning" – EurIng Peter Jesty

Non-functional requirements present a schematic and viable approach to building quality into the
system. According to the ISO 9126 standard [16], six quality characteristics determine the
workability of a system and state the attributes which can be used to evaluate the final product:
functionality, reliability, usability, efficiency, maintainability and portability. These are broad
categories, each of which is further divided into many attributes. An overview of these
characteristics is given below:

• Functionality defines a set of attributes that bear on the existence of functions that fulfil the
stated needs of the user.
• Reliability defines a set of attributes that determine the capability of the system to maintain its
performance under a given set of conditions.
• Usability defines a set of attributes that bear on the effort required to use the system.
• Efficiency defines a set of attributes that bear on the amount of resources used by the system to
deliver a given level of performance.
• Maintainability defines a set of attributes that bear on the effort required to make changes to
the system.
• Portability defines a set of attributes that bear on the capability of the system to work in a new
environment.

All these attributes need to be built into the system to ensure that a good-quality product is
produced, and these characteristics will be used to evaluate the system once it is developed. The first
characteristic, functionality, coincides with the functional requirements of the system, so all the
quality characteristics except this one will be considered when evaluating the system against the
non-functional requirements.

2.9 SUMMARY OF BACKGROUND RESEARCH

This chapter detailed the knowledge required to understand the problem in order to devise its
possible solutions. It first evaluated existing OCR packages and recognised the limitations of using
web cameras for capturing images; based on these limitations, the system scope was set. A basic
introduction to the problem was given and the motive of the previous work was explained. The work
recognised in this area was then detailed, including the approaches that have been followed to
correct perspective distortion along with the techniques complementing these methods, such as
rectangle-to-quadrilateral mapping and interpolation. These methods were evaluated and conclusions
were drawn to present the basic ideas that will be used to produce a solution to the problem.
Following the conclusions, amendments were made to the system constraints. Finally, the chapter
discussed the needs that the system is required to fulfil in order to be a success. The following
chapter discusses a detailed design of the system based on these basic ideas.

Chapter 3: THE DESIGN

3.1 INTRODUCTION

Design is the third phase of the project according to the project methodology. This chapter discusses
the design of the system in detail. First it gives an overview of the design, and then it discusses the
various steps the system will be required to perform in order to meet the final goal. At every stage of
the design, it discusses the techniques involved in the development of that component. Lastly, the
chapter compares the implementation technologies that could be employed to construct the system
and justifies the chosen technology.

The system will be provided with an initial grey-level image containing the undesired perspective
distortion. To achieve the main goal of removing the perspective distortion in the document image,
the overall design of the system can be sub-divided into the following objectives:

• Original input image will be provided to the system.

• Segmentation of the features of interest from the original image will be performed.

• The corners of the document page will be detected.

• The corners of the quadrilateral representing the document page will be mapped on to a
rectangle to correct the perspective distortion.

• The missing data in the output image will be interpolated.

• The output image will be passed to the OCR package.

The following sections discuss the above mentioned objectives in detail.

3.2 SEGMENTATION

In most vision applications, it is very useful to separate the parts of the image we are interested in
from the unwanted ones. This is known as segmentation. Thresholding is a very
convenient technique for performing segmentation. It is effective in cases where the foreground of
the image, usually the region of interest, has a grey level intensity different to that of the
background.
Thresholding an image produces a binary output image. In this project, a binary image will first be
produced, after which the edges of the document will need to be found; this can be done using edge
detectors. These techniques are discussed in detail below.

3.2.1 Thresholding

In thresholding, a certain grey level value is set as a limiting value. All the pixels in the initial image
with grey level intensity below this limiting value are set to 0 (the minimum grey level intensity,
representing black) in the target image, and the pixels with grey level intensity above this value are
set to 255 (the maximum grey level intensity, signifying white). This results in the desired binary
image, with bright white foreground pixels on a dark background or vice versa. In more complex
situations, multiple threshold values can be set to determine the bands of intensities that need to be
mapped onto white or black.

However, the choice of the threshold (limiting value) is critical in this process. It can be determined
by looking at image histograms. A histogram is a graph showing the number of pixels at each grey
level intensity. An 8-bit greyscale image has 256 grey level intensities, so a histogram for such an
image gives the number of pixels at each of these 256 intensities. An image with two distinct grey
level intensities will produce a histogram with two well-defined peaks. In such a case, the grey level
intensity at the lowest point of the trough between the two peaks can be chosen as the threshold,
which effectively separates the desired regions of the image. If there are more than two peaks, two
or more threshold values can be chosen to determine the ranges of grey levels to be segmented. On
the other hand, if the grey level intensities in the image are not distinct, overlapping peaks are
produced in the histogram. Figure 3.1 shows the three types of histograms with T1 and T2 as
optimal thresholds.

Figure 3.1: Histograms with (a) two well defined peaks, (b) more than two peaks, (c) overlapping peaks.
[18]


When there are overlapping peaks in the histogram, adaptive thresholding is used to determine the
threshold value. In this process, the original image is divided into smaller regions and a local
threshold is set for each region. The basic idea is that for naturally captured images, under different
lighting conditions, smaller regions of the image have a more consistent gradient, i.e., there is less
variation in grey level intensity over these smaller regions.

The system constraints mentioned in section 2.4 state that uniform lighting will be present during the
experiment. The project assumptions also include that black text will be present on a white
background; hence global thresholding will be sufficient to accurately segment the desired region,
i.e., the page, from the rest of the image. However, this threshold needs to be calculated
automatically during the experiment, according to the image. This can be done using the iterative
threshold method.
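As an illustrative sketch (in Python/NumPy rather than the project's MATLAB, and with a made-up toy image), the iterative threshold method can be written as:

```python
import numpy as np

def iterative_threshold(image, tol=0.5):
    """Start from the global mean grey level; repeatedly reset the
    threshold to the midpoint of the mean grey levels of the two
    classes it induces, until it moves by no more than tol."""
    t = image.mean()
    while True:
        below = image[image < t]
        above = image[image >= t]
        new_t = 0.5 * (below.mean() + above.mean())
        if abs(new_t - t) <= tol:
            return new_t
        t = new_t

# Bimodal toy image: a dark cluster near 50 and a bright cluster near 200.
img = np.array([[50.0, 52.0, 198.0, 200.0],
                [48.0, 51.0, 202.0, 199.0]])
t = iterative_threshold(img)
```

For this toy image the method settles roughly midway between the two clusters, cleanly separating them.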


Figure 3.2: (a) A grey level image (b) Corresponding thresholded image using iterative threshold method.

3.2.2 Edge Detection

Edge detection is widely used in image processing. It works by locating the places in an image
where the colour (for colour images) or grey level (for greyscale images) changes suddenly.
Elements of interest such as solid objects, shapes and shadows generally produce variations in
colour or grey level intensity, and finding their edges is essential in order to identify and label them
as separate regions; this is what edge detection is used for. Various edge detectors have been
developed, notably the Prewitt, Roberts, Sobel and Canny edge detectors. The simplest and most
widely used is the Sobel edge detector, which will be used for this project.

After the original text image has been thresholded, the borders of the document page need to be
identified. This can be done using the Sobel edge detector.
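As an illustration of what the Sobel detector computes, a minimal Python/NumPy sketch follows (the step-edge test image is a made-up example; the project itself relies on MATLAB's built-in facilities):

```python
import numpy as np

# Sobel kernels approximating the horizontal and vertical derivatives.
KX = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
KY = KX.T

def correlate2d(image, kernel):
    """Naive 'valid' 2-D correlation, sufficient for a small demo."""
    m, n = kernel.shape
    h, w = image.shape
    out = np.zeros((h - m + 1, w - n + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + m, j:j + n] * kernel)
    return out

def sobel_magnitude(image):
    gx = correlate2d(image, KX)
    gy = correlate2d(image, KY)
    return np.hypot(gx, gy)

# Vertical step edge: left half black, right half white.
img = np.zeros((5, 6))
img[:, 3:] = 255.0
mag = sobel_magnitude(img)
```

The gradient magnitude is large only where the window straddles the black-to-white step, which is exactly how the page border shows up as an edge.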


3.2.3 Determining the page corners

The next step after the basic image processing and segmentation of the page borders is to use the
borders to determine the page corners. This can be achieved by finding the equations of the lines
representing the edges of the page and subsequently finding the corners as the intersection points of
these lines. To do this, we can use a technique called linear regression.

3.2.4 Linear Regression

Regression is a method used to predict a dependent variable from information available about one or
more independent variables. In linear regression, the dependent variable is a linear function of one or
more independent variables. For example, when two distinct points are given in a 2D plane there is
exactly one line that passes through them, but when there are three or more points in the plane, there
may not exist a single straight line that contains them all. In such a case, regression is used to fit the
line that best fits the given set of points. This is usually done by the least-squares method. When a
single line is drawn close to all the points, the points not lying on the line will lie either above or
below it. The distances of these points from the line give the error of estimation for each point. The
best fit would be the line that minimises this error; but, since some errors will be negative (points
lying above the line) and some positive (points lying below the line), the errors are squared and the
sum of their squares is minimised to determine the best-fitting line. Figure 3.3 shows an example of
regression line fitting.

Figure 3.3: A diagram showing a regression line fitting a set of points (Source: [17])
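As a concrete sketch of least-squares line fitting (Python/NumPy for illustration; the five sample points are hypothetical edge-pixel coordinates, and in the project this role is played by MATLAB's backslash operator):

```python
import numpy as np

def fit_line(points):
    """Least-squares fit of y = m*x + c: build the design matrix
    [x 1] and solve for (m, c) minimising the sum of squared errors."""
    pts = np.asarray(points, dtype=float)
    A = np.column_stack([pts[:, 0], np.ones(len(pts))])
    (m, c), *_ = np.linalg.lstsq(A, pts[:, 1], rcond=None)
    return m, c

# Five roughly collinear sample points near the line y = 2x + 1.
m, c = fit_line([(0, 1.1), (1, 2.9), (2, 5.2), (3, 6.8), (4, 9.0)])
```

The recovered slope and intercept land close to the underlying line even though none of the points sits exactly on it.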


The following explains how regression applies to the problem at hand. As mentioned in Section 2.4,
there will be no skew present in the document page; hence the equation of the top edge of the
document page can be taken as y = <value>, where <value> is the y coordinate of the centre pixel of
the top edge. Similarly, the equation of the bottom edge of the page is determined in the same way,
where <value> is the y coordinate of the centre pixel of the bottom edge of the page.

However, due to the perspective in the document, the left and the right edges of the page will be
slanted and will require regression to determine their equations. Firstly, coordinates of five equally
spaced points will be taken from the left edge of the page. Then, using regression, a line that best fits
these points will be determined. This line will represent the left edge of the document. In a similar
way, a line that represents the right edge of the document will be determined.

After this has been done, four equations representing the four edges of the page will be present. These
equations can then be solved in pairs to find the corners of the page. The next section details the
mathematics involved in doing so.

3.2.5 Mathematics Involved

If there are two unknown variables, then a system of two equations containing these variables is
required and sufficient to determine them. This is based on basic linear algebra principles as
mentioned by Hoffman and Kunze [19].
Let two linear equations be given by:
y = m1 x + c1 (Equation 3.1)
and,
y = m2 x + c2 (Equation 3.2)
where x and y need to be determined from the known values of m1, c1, m2 and c2.
From Equation 3.1, x can be written as:
x = (y - c1) / m1 (Equation 3.3)
This value of x can be substituted into Equation 3.2 to give:
y = m2 ((y - c1) / m1) + c2 (Equation 3.4)
which can be solved for y, giving y = (m1 c2 - m2 c1) / (m1 - m2).
This value of y can then be substituted back into Equation 3.1 to get the value of x.


The above mathematics can be used to find the page corners by selecting pairs of equations from the
four equations that were determined using regression.
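The corner computation reduces to a small solver; an illustrative Python sketch with hypothetical slopes and intercepts:

```python
def intersect(m1, c1, m2, c2):
    """Intersection of y = m1*x + c1 and y = m2*x + c2 (Equations 3.1
    and 3.2); the lines must not be parallel (m1 != m2)."""
    x = (c2 - c1) / (m1 - m2)
    return x, m1 * x + c1

# Hypothetical slanted left edge meeting a horizontal top edge y = 5:
corner = intersect(2.0, 1.0, 0.0, 5.0)
```

Applied to the four edge equations in pairs (left with top, left with bottom, and so on), this yields the four page corners.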

The next step after locating the coordinates of the corners of the page would be to map these
coordinates onto a rectangle. The following section discusses this aspect.

3.3 QUADRILATERAL-TO-RECTANGLE MAPPING

This part of the project design implements the technique discussed in section 2.6.2. The corner points
of the page found in the previous section form a quadrilateral. This quadrilateral, when mapped onto
a rectangle, will produce an undistorted, upright image of the document. As part of this mapping,
backward mapping using either the nearest-neighbour or the bi-linear interpolation scheme
(discussed in section 2.6.3) will be implemented in order to fill the missing points in the output
image. Which scheme is finally used will be decided after implementing both strategies and
evaluating the quality of the output and the processing times.
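For illustration, the quadrilateral-to-rectangle mapping can be expressed as a projective transform whose eight coefficients are solved from the four corner correspondences. The following Python/NumPy sketch uses hypothetical corner coordinates; the project's actual implementation follows Kim et al. [15] in MATLAB:

```python
import numpy as np

def quad_to_rect_homography(quad, rect):
    """Solve for the 8 coefficients of the projective map sending the 4
    quad corners to the 4 rectangle corners. Corners must be listed in
    the same cyclic order in both lists."""
    A, b = [], []
    for (x, y), (u, v) in zip(quad, rect):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    coeffs = np.linalg.solve(np.array(A, float), np.array(b, float))
    return np.append(coeffs, 1.0).reshape(3, 3)

def apply(H, x, y):
    """Apply the homography to a point, dividing out the projective term."""
    u, v, w = H @ np.array([x, y, 1.0])
    return u / w, v / w

# Hypothetical detected page corners (cyclic order) and target rectangle.
quad = [(10, 5), (90, 15), (85, 95), (5, 90)]
rect = [(0, 0), (100, 0), (100, 100), (0, 100)]
H = quad_to_rect_homography(quad, rect)
```

By construction, each quadrilateral corner is carried exactly onto the corresponding rectangle corner.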

Following the steps above will produce an output image with the perspective distortion corrected,
and this image will be passed on to the OCR package to perform character recognition.

3.4 IMPLEMENTATION TECHNOLOGY

The proposed design can be put into action using various programming languages. A few of these
potential technologies have been discussed and evaluated below:

3.4.1 Java

Java is a platform-independent object-oriented programming language invented by Sun Microsystems
around 1991 and officially released in 1995 [21]. It has entities called classes, and these classes
comprise methods that process a particular input and provide an output on the basis of some
calculations. Objects are instances of classes. There are many built-in class libraries in Java that
contain classes along with their methods, which can be used directly to build complex programs.


Java is especially renowned for web-based technologies built using small programs called applets.
These applets are downloaded to a Java-enabled computer on which the application is to be run.

The pearsoneduc imaging library contains many image processing classes. These classes along with
standard Java classes make it possible to perform image manipulation in Java.

3.4.2 MATLAB

MATLAB is an acronym for MATrix LABoratory, a language built by Prof. Cleve B. Moler, an
expert in numerical analysis, at the University of New Mexico [21]. As the name suggests, it is a
language that deals fundamentally in the manipulation of matrices.

Based on this language, The MathWorks have built the programming environment MATLAB, which
is a very potent tool for matrix computations. It has numerous in-built functions with the help of
which complicated programs can be created with great ease. Moreover, it is a powerful tool for 2D
and 3D graphics, and the Image Processing Toolbox makes it possible to perform image
manipulation while maintaining the simplicity of MATLAB programming.

3.4.3 Chosen Technology

Both technologies mentioned above are capable of performing image processing using their in-built
class libraries and packages. However, MATLAB code is far more succinct than analogous code
written in Java. Moreover, image processing involves a great deal of matrix manipulation, and
MATLAB is better at carrying out calculations with matrices. Apart from these factors, the author is
already familiar with the Java programming language through the module "Object-Oriented
Programming (SO21)" studied in the second year at the University of Leeds, hence working in
MATLAB would add to the author's knowledge. Considering all these factors, MATLAB has been
chosen as the programming language to implement the design discussed in this chapter.

“For the purpose of an engineer or scientist, MATLAB has the most features and is the best
development program in its class.” – IEEE Spectrum Magazine.


3.5 DESIGN SUMMARY

The chapter first presented the overall design of the system and then explained the design of the
system in detail. It discussed the procedures and tools involved in the development of various design
components. Lastly, the chapter compared technologies that will be employed to implement the
system and justified the chosen technology. The next chapter gives a comprehensive discussion on
how the proposed design is executed using the chosen development tool.

Chapter 4: IMPLEMENTATION AND TESTING

4.1 INTRODUCTION

Following the incremental waterfall model, the next step after a feasible design has been produced is
to implement the design and then to test it. This chapter details the execution of the prescribed
design, presenting a systematic implementation of the design components along with their testing.
According to the project process mentioned in Section 1.5.5, the implementation of the system is
broken down into smaller parts that are separately implemented and tested. The chosen programming
language is MATLAB, and the following sections discuss how each component is implemented in
that language.

Before these components are implemented, test data needs to be collected with which each of the
components will be tested individually, so as to ensure that each component works well before the
system is integrated as a whole. Figure D.1 shows the apparatus used in the project to capture
images. The setup consists of a web camera mounted on top of the monitor and a TV antenna as the
platform on which the document pages to be captured are placed. The flexibility of the antenna
allows the page to be rotated out of the image plane. The environment in which images are captured
has to be kept constant in order to ensure that other factors do not impede the performance of the
system. For all experiments the following constraints will apply:

1. The camera will be placed 19 inches from the document plane (as mentioned in section 2.7).

2. The camera will be focused on the centre of the document page.

3. Illumination will remain constant.

4. Image resolution will be kept at 640x480.

5. The font size will be 32 pts (as mentioned in section 2.7).

6. OCR package used will be ‘Page Cam’.

A number of images were captured using the standard apparatus of the project. The images were
captured at different angles of orientation of the page (0º - 50º) from the camera axis. The degree of
rotation was measured using a protractor and a compass. Appendix D shows this sample test data.
This data set is not exhaustive and a complete set will be collected at a later stage to test the integrated
system.

In accordance with the design mentioned in the previous chapter, the implementation is divided into
the following phases, and each of these components is tested before the next one is developed.

• Thresholding

• Edge Detection

• Regression fitting

• Corner detection

• Quadrilateral-to-Rectangle Mapping

Figure 4.1 shows a diagrammatic view of the implementation and testing process.

Figure 4.1: Implementation and Testing process

4.2 COMPONENTS IMPLEMENTATION AND TESTING

4.2.1 Thresholding


The design specifies the global iterative threshold method to segment the page from the rest of the
image. The optimal threshold of the image was calculated using the isodata method [22] in
MATLAB. This optimal threshold was then used to produce a thresholded image by mapping pixels
with grey level below the threshold to 0 and those above it to 255. The results are shown in Figure
4.2, which illustrates a greyscale image thresholded using this method; the calculated threshold value
was 145.


Figure 4.2: (a) Original greyscale image rotated at 30º from the camera axis, (b) Thresholded image with
a threshold of 145.

This method was tested on all the images in the collected test data. The calculated threshold and the
thresholded image produced were found to be optimal in each case. Table 4.1 summarises the test
results for each of the test images.

Table 4.1: Test results for iterative thresholding

Test Image   Calculated Optimal Threshold   Outcome
Image 1      145                            Success
Image 2      146                            Success
Image 3      146                            Success
Image 4      147                            Success
Image 5      149                            Success


4.2.2 Edge Detection

The edges of the page were detected using the Sobel edge detector, which is implemented in
MATLAB's in-built edge function. This method was incorporated in the program and tested on the
thresholded images produced in the previous phase. Figure 4.3 illustrates the effect of performing
edge detection on the thresholded image produced in Figure 4.2(b).

Figure 4.3: Edge detection image produced from Figure 4.2(b)

Each of the test images produced good edges, where a good edge is defined as a continuous edge
without any breaks.

4.2.3 Regression Fitting

As mentioned in the design, a set of points first needs to be detected to which a regression line will
be fitted. To collect such points on the left and right edges, the following pseudocode was executed,
where m and n are the dimensions of an (m x n) image:

for each vertical position = (m/2-40), (m/2-20), m/2, (m/2+20), (m/2+40)
    Scan the image from left to right
    Mark the first white pixel encountered as the left edge pixel
    Mark the last white pixel encountered as the right edge pixel
    Append the pixels found to the respective arrays
End for
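The scanning step can be sketched as follows (Python/NumPy for illustration; the synthetic 200x200 binary image is made up, and the row offsets follow the pseudocode):

```python
import numpy as np

def edge_points(binary):
    """For five rows around the vertical centre, record the first and
    last white pixel in each row as left/right edge points (x, y)."""
    m, _ = binary.shape
    rows = [m // 2 + d for d in (-40, -20, 0, 20, 40)]
    left, right = [], []
    for r in rows:
        cols = np.flatnonzero(binary[r] == 255)
        if cols.size:                   # skip rows with no white pixels
            left.append((cols[0], r))   # first white pixel in the row
            right.append((cols[-1], r)) # last white pixel in the row
    return left, right

# Synthetic binary image with a white block standing in for the page.
img = np.zeros((200, 200), dtype=np.uint8)
img[40:160, 50:150] = 255
left, right = edge_points(img)
```

The collected points then feed directly into the regression fit for the left and right edges.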


This code saves the points of the left and right edges in respective arrays. These points are then used
to fit a regression line. The in-built backslash operator ( \ ) in MATLAB performs the regression,
finding equations of the form y = m x + c for the left and right edges of the page.

The top and bottom edges of the page are represented by equations of the form y = c, since they are
parallel to the x-axis (discussed in Chapter 3, Section 3.2.4). The following pseudocode finds the
respective values of c for the top and bottom edges of the page.

At the horizontal centre of the image, scan from top to bottom
    Mark the first white pixel encountered as the top edge pixel
    Mark the last white pixel encountered as the bottom edge pixel

4.2.4 Corner Detection

Based on the mathematics discussed in Section 3.2.5, the corners of the page were determined by
writing simple code that used the equations found earlier using regression. Figure 4.4 demonstrates
corner detection in the image shown in Figure 4.2; the red circles mark the detected corner points.
Note that in this example the top-left corner is not shown, as it lies outside the image.

Figure 4.4: (a) Image from Figure 4.2; (b) corner detection in the edge image

4.2.5 Quadrilateral-to-Rectangle Mapping


The final step after the detection of the corner points of the page was to map them onto a rectangle so
that an undistorted image is produced. This was implemented based on the mathematics and
principles mentioned in section 2.6.2, along with the mapping techniques discussed in section 2.6.3.

The implementation of the quadrilateral-to-rectangle mapping technique took longer than had been
expected and planned. This was due to inconsistencies in the referenced work mentioned in section
2.6.2, which were identified only after a great deal of implementation, testing and deliberation. The
mapping shown in section 2.6.2, which was taken from Kim et al. [15], did not give the points in
cyclic order, nor was it mentioned anywhere in their work that the points need to be in a cyclic order.
This was eventually realised through trial and error; Figure 4.5 shows the correct mapping.

Figure 4.5: Corrected version of the mapping shown in Figure 2.8

Moreover, Equation 2.10 as given in the text was incorrect; the corrected version is given as
Equation 4.1 below:

(Equation 4.1)

These corrections were later found to be in accordance with another paper on the topic by Heckbert
[23]. The corrected version of the code was then implemented, and it succeeded in mapping a
quadrilateral onto a rectangle. However, due to this delay the project schedule had to be revised;
Figure B.3 shows the revised plan.

As mentioned in the design (section 3.3), both interpolation schemes were implemented and tested
here. It was observed that the two MATLAB implementations took almost equal computational
time, but bilinear interpolation produced a smoother image than nearest-neighbour
interpolation. Figures 4.6(a) and (b) show the same part of two images produced using the two
interpolation methods. It can be clearly seen that the corresponding marked areas are smoother in
4.6(b); hence the bilinear method was chosen as the final implementation scheme. Figure 4.6(c)
shows the final undistorted image produced using bi-linear interpolation and marks the part used to
compare the interpolation schemes.

Figure 4.6: (a) Nearest-neighbour interpolation, (b) bi-linear interpolation, and (c) final image
produced after mapping the original image shown in Figure 4.2(a) onto a rectangle using the
bi-linear method.

The code was then tested on all the images in the test data, and in every case a rectangular,
undistorted image was produced from the given distorted image.
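As an illustrative sketch of backward mapping with bi-linear interpolation (Python/NumPy; the inverse_map used here is just a half-pixel shift standing in for the inverse of the actual quadrilateral-to-rectangle transform, and the 2x2 source "image" is made up):

```python
import numpy as np

def bilinear_sample(image, x, y):
    """Interpolate image at a real-valued (x, y) from its 4 neighbours."""
    h, w = image.shape
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    dx, dy = x - x0, y - y0
    return ((1 - dx) * (1 - dy) * image[y0, x0] + dx * (1 - dy) * image[y0, x1]
            + (1 - dx) * dy * image[y1, x0] + dx * dy * image[y1, x1])

def backward_map(src, out_shape, inverse_map):
    """For every output pixel, look up its source position and
    interpolate, so no holes appear in the output image."""
    out = np.zeros(out_shape)
    for r in range(out_shape[0]):
        for c in range(out_shape[1]):
            x, y = inverse_map(c, r)
            if 0 <= x < src.shape[1] and 0 <= y < src.shape[0]:
                out[r, c] = bilinear_sample(src, x, y)
    return out

src = np.array([[0.0, 100.0], [100.0, 200.0]])
out = backward_map(src, (2, 2), lambda c, r: (c + 0.5, r + 0.5))
```

Because every output pixel is sampled from the source rather than the other way round, the interpolated output has no missing points.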

4.3 SYSTEM TESTING

According to Deutsch [24], software development encompasses a sequence of building tasks which
provide considerable opportunity for human error, owing to the inability of humans to perform and
communicate perfectly. Hence, to certify that the produced system maintains quality standards, it is
essential to test it for such errors. Boehm [25] suggests that system testing both verifies and validates
the system: verification checks whether the product is being built right, whereas validation checks
whether the right product is being built.

All the individual components of the system have been implemented and tested positively. This
section aims to test the system as a whole using the OCR package. This would involve passing the
undistorted images produced by the system to the OCR engine in order to recognise the text. The
criteria for success in testing trace back to the functional requirements of the system mentioned in
section 2.8.1: the system is said to have passed testing if it satisfies all the functional requirements.

According to the functional requirements, the success criteria are as follows:


• For the given input image with perspective distortion, the system produces an output image
free of the distortion. The system does this for images with at least up to 30° rotation out of
the image plane.
• The OCR accuracy is increased by the system.
• The OCR package recognises at least 70% of the characters in the improved text document
image.

4.4 TEST PARAMETERS

All the conditions mentioned in section 4.1 will remain applicable for the rest of the duration of the
testing process as well.

There are two parameters involved in the performance of the system:


1. The axis of rotation: taking the camera axis to be the z-axis, the document can be rotated
about either the x-axis or the y-axis.
2. Degree of rotation: about each axis, the document can be rotated by different amounts, in
both the positive and negative directions.

Figure 4.7 illustrates examples of such rotations.


Figure 4.7: Images rotated along (a) x-axis +30°, (b) x-axis -30°, (c) y-axis +30° and (d) y-axis -30°.


The contents of the document pages could be treated as a test variable, i.e., documents with
non-textual content such as pictures could be considered for testing the system. However, the
algorithm used in the system is independent of the content of the page and depends only on the page
borders; hence testing with documents containing non-textual information is not required.

4.5 TEST DATA

Test data was collected by varying the parameters mentioned in the previous section. Images were
captured by rotating the document page in the range -50° to +50° about the x-axis and -40° to +40°
about the y-axis. Appendix D contains samples of the test data, and Appendix E contains an example
of the whole process on an image rotated +20° about the y-axis.

4.6 TEST RESULTS

The table below shows the test results when the system was tested with the test data.

Legend: Red = failure, Green = success

Serial No.   Rotation axis   Degree of Rotation   OCR before correction (%)   OCR after correction (%)
1            x               +10°                 100                         100
2            x               +20°                  96                          97
3            x               +30°                  96                          97
4            x               +40°                  77                          88
5            x               +45°                  70                          75
6            x               +50°                  61                          66
7            x               -10°                 100                         100
8            x               -20°                  97                          98
9            x               -30°                  96                          98
10           x               -40°                  76                          85
11           x               -45°                  69                          73
12           x               -50°                  63                          66
13           y               +10°                  98                         100
14           y               +20°                  96                          98
15           y               +30°                  64                          97
16           y               +35°                  45                          81
17           y               +40°                  15                          72
18           y               +45°                  10                          57
19           y               -10°                  98                         100
20           y               -20°                  94                          98
21           y               -30°                  62                          96
22           y               -35°                  50                          84
23           y               -40°                  17                          74
24           y               -45°                  12                          58

4.7 RESULTS ANALYSIS

The system test results show successes and failures at various points. When the document was
rotated about the x-axis, the OCR package performed fairly well on the original images, but the
system-produced images improved its performance in every case. The system produced positive
results up to 45°, where more than seventy per cent of characters were recognised by the OCR
engine. When the document page was rotated about the y-axis, the system showed a drastic
improvement in text recognition: on the original images, with perspective distortion, the OCR
package's accuracy plummeted for rotations of more than 20° out of the image plane, whereas the
images produced by the system succeeded for rotations of up to 40° out of the image plane.


The sudden fall in OCR performance for images rotated about the y-axis was due to the fact that in
these images (as shown in Figure 4.8), the lines of text towards the top and bottom of the page were
at a large angle to the horizontal. The OCR package is incapable of reading lines with more than ±2°
of rotation about the z-axis [7]. The images produced by the system, after correcting the perspective
distortion, had all the lines parallel to the horizontal and hence gave good results with the OCR.

Figure 4.8: Illustrating perspective

System testing shows that the program satisfies all the functional requirements and thus is marked as a
success.

4.8 SUMMARY

This chapter presented a systematic implementation of the various components of the system, along
with their individual testing. It then detailed the process of testing the integrated system, starting
with the importance of system testing. Subsequently, the parameters that affect the performance of
the system were described, followed by the test data collected by varying them. The chapter then
presented the test results gathered by running the system on the collected data set. Finally, the test
results were analysed and conclusions drawn. The next chapter evaluates the project and the
developed system.

Chapter 5 : EVALUATION

5.1 INTRODUCTION

This chapter evaluates the project as a whole and the developed software program (the system). The
project is evaluated against the minimum requirements mentioned in section 1.4, and the system
against the functional and non-functional requirements discussed in section 2.8. The chapter then
discusses possible enhancements to the system and the scope for future work in this field, and closes
with a summary of the evaluation.

5.2 EVALUATION OF THE PROJECT

The minimum requirements were set out in the mid-project report; they were not modified and are
restated as-is in section 1.4 of this report. The criterion for the success of the project was to fulfil
these minimum requirements. The first minimum requirement states that the problems confronted in
performing OCR and the limitations of using web cams to capture document images need to be
understood; this has been fulfilled by sections 2.2 and 2.3 of this report. The second minimum
requirement was to acquire and develop an understanding of the previous work done on correcting
perspective distortion for OCR; this has been met and is detailed in sections 2.5 and 2.6. The next
requirement was to produce and implement an algorithm to correct the perspective distortion. The
design of such an algorithm is discussed in Chapter 3 and its implementation in Chapter 4. During
the implementation, various problems were encountered, as described in Chapter 4; the major one
was correcting the formulae and concepts in the past work that was used as part of the system. The
identification and correction of such errors was beyond the expected scope of the project and
demonstrates extended effort in coping with the impediments encountered. Minimum requirements 4
and 5 state that, under normal lighting conditions, OCR should perform well for documents rotated
at various degrees out of the image plane; this has been fulfilled and is detailed in section 4.3 of the
report.


The above evaluation shows that all the minimum requirements were met, and hence the project is
evaluated as a success.

5.3 EVALUATION OF THE S YSTEM

The success criteria for the system were laid down in the requirement specifications in section 2.8.
These included two types of requirements, functional and non-functional, and the system will be
evaluated separately against each.

5.3.1 Against functional requirements

The first functional requirement stated that the system should be able to correct perspective distortion
in images rotated up to 30° out of the image plane. The test results given in section 4.6 show that the
system could correct perspective distortion in images rotated up to ±45° out of the image plane about
the x-axis and up to ±40° about the y-axis. The system has therefore not only satisfied the
requirement but exceeded it.

The second functional requirement stated that the system should increase the accuracy of the OCR
package. This is again clearly evident from the test results.

The third functional requirement asserted that at least seventy percent of the characters in the
produced undistorted document image should be recognised. Once again, this can be seen directly
from the test outcomes.

5.3.2 Against non-functional requirements

As mentioned in section 2.8.2, the system will be evaluated for the non-functional requirements
beginning from the second quality characteristic laid down by ISO 9126.

Reliability
System constraints were set as mentioned in section 2.4 and later in section 4.1. While operated
within these constraints, the system has been found to work well, as seen in the previous section.
The system is not required to work outside these settings, and hence this characteristic is not
evaluated beyond them.


Usability
Setting up the environment to capture images takes time. However, once this is done, the system
itself is operated by typing a single command at the command prompt, which requires very little
effort. Hence the system is found to be easy to use.

Efficiency
From the tests carried out, it was noted that the OCR package takes about 6-7 seconds to perform
OCR on the text documents, and the correction algorithm takes a similar amount of time to correct
the image. The trade-off between time required and quality achieved is perfectly acceptable for the
results obtained; hence the program is classified as efficient.

Maintainability
The program has been implemented in MATLAB, which is itself an easy-to-understand language.
Moreover, the code produced is modular, and all the modules have been commented appropriately so
that they can be understood even by a novice programmer. Hence the code is easy to maintain for
anyone reading it for the first time, and the system is classified as maintainable.

Portability
The system is built in MATLAB and must be executed within it; hence it is as portable as the
package itself. The official MATLAB website [26] states that MATLAB works on various platforms
such as Windows, Linux, Unix, Macintosh and Solaris, which constitute the majority of platforms
used today. As the system is as portable as the package itself, it is categorised as highly portable.

The system satisfies all the applicable non-functional requirements and is therefore evaluated
positively.

5.4 FUTURE WORK

The project has presented an approach that begins to explore the problem of performing OCR on
images suffering from perspective distortion, and it provides a good foundation for prospective
students wanting to take the research further.


Nowadays, handheld devices with built-in cameras, such as PDAs and mobile phones, are becoming
increasingly popular due to their portability and cost effectiveness. The project could be extended to
apply perspective-distortion correction to images captured with such devices. These devices pose
greater problems, the biggest being poor image resolution. Thouin and Chang [27] have presented an
approach to restoring low-resolution images for the purpose of OCR by producing strongly bimodal
images.
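The Thouin and Chang method itself is beyond the scope of this report, but the notion of a 'strongly bimodal' image can be illustrated with a simple isodata-style threshold, the same family of technique as the isodata thresholding referenced in this project [22]. The following Python sketch (not part of the delivered MATLAB system) forces a low-contrast document patch to pure black and white:

```python
def isodata_threshold(pixels, eps=0.5):
    """Iteratively choose a threshold as the midpoint of the two class means."""
    t = sum(pixels) / len(pixels)  # start from the global mean
    while True:
        lo = [p for p in pixels if p <= t]
        hi = [p for p in pixels if p > t]
        if not lo or not hi:
            return t
        new_t = (sum(lo) / len(lo) + sum(hi) / len(hi)) / 2
        if abs(new_t - t) < eps:
            return new_t
        t = new_t

def binarise(image, t):
    """Force a strongly bimodal (pure black/white) image."""
    return [[0 if p <= t else 255 for p in row] for row in image]

# Low-contrast example: dark text (~60-75) on a grey background (~175-190).
image = [[60, 70, 180, 190], [65, 75, 175, 185]]
t = isodata_threshold([p for row in image for p in row])
print(binarise(image, t))  # → [[0, 0, 255, 255], [0, 0, 255, 255]]
```

On genuinely low-resolution camera images the restoration problem is harder than this, since text strokes blur into the background before thresholding; that is precisely the gap Thouin and Chang address.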

More realistically, images may be captured under poor lighting conditions, introducing lighting
variations such as shadows into the document image. This is another area of the application that needs
to be explored before it can be applied to real-world scenarios.

The system developed does not deal with images containing skew distortion. Nevertheless, the
background chapter describes two existing approaches to restoring document images with both
perspective and skew distortions, and the system could be further developed to implement these
techniques.
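As an illustration of the kind of technique involved, skew can often be estimated with a projection-profile search: rotate the text pixels through candidate angles and keep the angle at which the horizontal profile is most sharply peaked. The sketch below (in Python, not part of the delivered MATLAB system) scores each candidate angle by the sum of squared row counts, which is largest when the text lines collapse into few, dense rows:

```python
import math

def profile_energy(points, angle_deg):
    """Sum of squared row counts after rotating by angle_deg.

    Peaks when the rotation brings the text lines back to horizontal.
    """
    a = math.radians(angle_deg)
    rows = {}
    for x, y in points:
        r = round(-x * math.sin(a) + y * math.cos(a))  # rotated y-coordinate
        rows[r] = rows.get(r, 0) + 1
    return sum(c * c for c in rows.values())

def estimate_skew(points, search=range(-45, 46)):
    """Skew angle (whole degrees) maximising the profile energy."""
    return max(search, key=lambda a: profile_energy(points, a))

# Synthetic "text": two horizontal lines of pixels, then skewed by 10 degrees.
a = math.radians(10)
text = [(x, y) for y in (0, 20) for x in range(50)]
skewed = [(x * math.cos(a) - y * math.sin(a),
           x * math.sin(a) + y * math.cos(a)) for (x, y) in text]
print(estimate_skew(skewed))  # → 10
```

This is essentially the idea behind projection-profile skew estimators such as those surveyed by Messelodi and Modena [11]; real document images would first be thresholded so that only text pixels enter the point set.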

The project could also evolve to handle distortions due to different document orientations and varied
paper qualities such as glossy, wrinkled or curved media. Furthermore, the project assumes black text
on a white background; this could be extended to cater for multi-coloured documents.

5.5 EVALUATION SUMMARY

The evaluation of the project against the success criteria has shown that the overall project and system
were a success. The goals set initially have all been met, and further enhancements to the system have
been suggested. It is difficult to compare the results with other approaches, as the particular
combination of system constraints, test images and OCR package used here has not been replicated by
other authors and can greatly affect the results. Moreover, details of the tests performed by other
authors are not given in the published papers and are not available elsewhere, which makes a direct
comparison impossible.

Chapter 6: CONCLUSION

RECAPITULATION OF THE PROJECT PROCESS

The investigation and development process of the project closely followed the stages of a hybrid of
the waterfall and incremental models. First, comprehensive background research was performed to
understand the work done on correcting perspective distortion in document images and the
complexities involved in using webcams and OCR packages. Further research identified the basic
image-processing techniques needed to carry out such a task. Given the constraints, a complete
implementation of the complex algorithms used to solve the problem was (and is) not feasible, so the
important components were identified and chosen to build the system.

The next phase was to lay out the success criteria against which the completed system would be
evaluated, and this was done through the requirements specification. The following stage was to
propose a design independent of the implementation technology; this also involved evaluating the
various tools that could be used to develop the system and justifying the one chosen. The
implementation phase followed the design phase, adhering to the project methodology: the
implementation was modularised into parts that were developed and tested separately, and once all the
modules had tested positively they were integrated into one system, which was then tested as a whole.
The system was finally evaluated against the success criteria laid out at the beginning of the process,
and suggestions for future work were presented for students looking to extend research in this field.

The methodology chosen for the project was appropriate, as it combined the timeliness of the
waterfall model, the flexibility of the realistic model and the reduction in implementation complexity
offered by the incremental approach at the implementation stage. This is an ideal methodology for a
project of this type.

The project and the system finally satisfied all the requirements stated at the early stages and thus are
considered to be successful in achieving their aim.

References

[1] Avison D. and Fitzgerald G., (1995), Information Systems Development: Methodologies,
Techniques and Tools, 2nd Edition, McGraw-Hill.

[2] Royce W., (1970), Managing the development of large software systems: concepts and
techniques, in: Proceedings IEEE WESCON, pp: 1-9.

[3] The Standard Waterfall Model for Systems Development, URL:
http://asd-www.larc.nasa.gov/barkstrom/public/The_Standard_Waterfall_Model_For_Systems_Development.htm
[13/01/2004].

[4] General Idea of Iterative Models, URL:
http://www.csis.gvsu.edu/~heusserm/CS/CS641/FinalSpiralModel97.ppt [17/01/2004].

[5] The Spiral Model, URL:
http://searchvb.techtarget.com/sDefinition/0,290660,sid8_gci755347,00.html [18/04/2004].

[6] PageCam, URL: http://www.pagecam.com/ [18/12/2003].

[7] Nicel D., (2003), OCR from a web camera, Leeds: University of Leeds, School of Computer
Studies, pp. 6-8.

[8] Pilu M., (2001), Extraction of illusory linear clues in perspectively skewed documents, in:
Proceedings of the 2001 IEEE Computer Society Conference, vol. 1, pp. I-363 – I-368.

[9] Bruce V., Green P. R, (1991), Visual Perception: Physiology, Psychology and Ecology, 2nd
Edition, Psychology Press.

[10] Sonka M., Hlavac V. and Boyle R., (1993), Image Processing, Analysis and Machine Vision,
Chapman & Hall.

[11] Messelodi S. and Modena C.M., (1999), Automatic identification and skew estimation of text
lines in real scene images, Pattern Recognition, Vol. 32, No. 5, pp. 791-810.

[12] Clark P. and Mirmehdi M., (2001), On the recovery of oriented documents from single
images, in: Proceedings of ACIVS 2002.

[13] Bolles R. and Fischler M., (1981), A RANSAC-based approach to model fitting and its
application to finding cylinders in range data, in: Proceedings of the 7th International Joint
Conference on Artificial Intelligence, pp. 637-643.


[14] Efford N., (2000), Digital Image Processing: a practical introduction using Java, 1st Edition,
Pearson Education Ltd.

[15] Kim D., Jang B. and Hwang C., (2002), A planar perspective image matching using point
correspondences and rectangle-to-quadrilateral mapping, in: Proceedings of the Fifth IEEE
Southwest Symposium on Image Analysis and Interpretation, 7-9 April 2002, pp. 87-91.

[16] Software Project Management, URL:
http://www.comp.leeds.ac.uk/se22/lectures/9QualityAndDesignC.pdf [13/03/04].

[17] Linear Regression, URL:
http://people.hofstra.edu/faculty/Stefan_Waner/RealWorld/calctopic1/regression.html
[18/03/2004].

[18] Thresholding, URL: http://www.cee.hw.ac.uk/hipr/html/threshld.html [16/03/2004].

[19] Hoffman K. and Kunze R., (1961), Linear Algebra, Englewood Cliffs: Prentice-Hall.

[20] Java, URL: http://philip.greenspun.com/wtr/dead-trees/53008.htm [26/04/04].

[21] MATLAB, URL: http://ccrma-www.stanford.edu/~jos/matlab/What_is_Matlab.html
[12/04/2004].

[22] Isodata method, URL:
http://www.mathworks.com/matlabcentral/fileexchange/loadCategory.do?objectType=category&objectId=26
[26/03/04].

[23] Heckbert P., (1989), Fundamentals of Texture Mapping and Image Warping, Master's thesis,
UCB/CSD 89/516, CS Division, U.C. Berkeley, pp. 17-20.

[24] Deutsch M., (1982), Software verification and validation: realistic project approaches,
Prentice Hall.

[25] Boehm B., (1979), Software Engineering: R&D trends and defense needs, in: Research
Directions in Software Technology, MIT Press, Cambridge.

[26] MATLAB requirements, URL:
http://www.mathworks.com/products/matlab/requirements.html [20/04/04].

[27] Thouin P. and Chang C., (2000), A method for restoration of low-resolution document
images, International Journal on Document Analysis and Recognition (IJDAR), Springer-Verlag,
vol. 2, no. 4, pp. 200-210.

Appendix A: Project Reflection

The purpose of this section is to reflect on the project experience, the knowledge I have gained and
the lessons I have learnt, in order to advise future students undertaking this type of project. The
project falls within a branch of Artificial Intelligence known as document image processing. A few of
my friends who had done projects in the past suggested that I take a project I was really interested in.
After pursuing modules such as AR11 – 'Introduction to Artificial Intelligence', AI21 – 'Image and
Speech Processing' and AI31 – 'Computer Vision', I had developed an understanding of, and keen
interest in, Artificial Intelligence, and Computer Vision in particular. I also had an interest in
programming, and the combination of these two aspects led me to the project I chose.

While choosing a project, I came across this project's short description and read the first line:
"The project will investigate how you might perform Optical Character Recognition from a document
held in front of a camera in ordinary lighting conditions." My first impression was that I had to
develop an OCR tool capable of reading text. I found this extremely challenging and was stimulated
to take the project. After I undertook it and discussed it in further detail with my supervisor, I
understood the real problem to be solved and, as can be seen from the project, it was quite different
from my perception. This shook my interest initially, but I regained it after realizing the challenge of
the real problem. The lesson I learnt, and the advice I would give, is that it is very important to
understand the problem properly before deciding to undertake a project and before starting to develop
a solution.

Owing to my options and interests, I had elected modules that constituted fifty credits in my first
semester and thirty in my second, apart from the forty-credit project. This proved to be a great
advantage. Due to prior commitments to other modules and graduate job applications (which take a
lot of time, if you are planning to take up a job), I was late in realizing the need to start the project
early in my first semester, and by the end of December I had done little even to write up in the mid-
project report. But by the end of the first semester I had completed fifty credits of modules, which left
me with only three modules in the second semester and far more time to spend on the project. Many
other students had more credits in their second semester and an equal amount of project work to do,
so it was easier for me to cope than for them, and I had much more time to write up the report at the
end. At the beginning of the first semester it seems there are six months to do the project, and even if
you make a start you will not feel any need to hurry. Thus I would recommend, firstly, taking up more
credits in the first semester than the second; secondly, and most importantly, recognizing how tight
the time is and managing it throughout the final year, leaving room for unexpected demands
(coursework, interviews, sickness and relatives!).

Consult past years' reports. Reading good reports gives a better idea of how to structure the report and
of the general concepts used in projects, and comparing good reports with not-so-good ones reveals
the do's and don'ts. But the idea is not to spend too much time over this!

I added to my technical skills by learning a new programming language, MATLAB. I had never
programmed in MATLAB before this project, so it was a good learning experience: initially I had
problems learning the language, but I gradually picked it up and it worked out well.

Overall, this project has been a great learning experience. I am now more confident about undertaking
responsibility for large-scale projects, which may soon be required of me when I enter the corporate
world.

Appendix B: Project Schedule

Figure B.1: Gantt Chart of the Project Schedule


Figure B.2: Revised Project Schedule 1


Figure B.3: Revised Project Schedule 2

Appendix C: OCR Evaluation

Table C.1: Test results of ‘PageCam’ (source: [12])


Table C.1 (continued…)

Appendix D: Test Samples

The first five pictures in this appendix were used at the implementation stage to test the individual
components. The rest of the images were used to test the integrated system.

Figure D.1: +10° along x-axis

Figure D.2: +20° along x-axis


Figure D.3: +30° along x-axis

Figure D.4: +40° along x-axis


Figure D.5: +45° along x-axis

Figure D.6: +50° along x-axis


Figure D.7: +10° along y-axis

Figure D.8: +20° along y-axis


Figure D.9: +30° along y-axis

Figure D.10: +35° along y-axis


Figure D.11: +40° along y-axis

Other test images, which were rotated in the negative direction, were mirror images of those given
above.

Appendix E: The Process

Figure E.1: Thresholded image for Figure D.8 which was rotated +20° along y-axis

Figure E.2: Edge Detected image


Figure E.3: Detected corners in the image

Figure E.4: Final Image produced using backward mapping and bilinear interpolation
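The backward mapping and bilinear interpolation used to produce Figure E.4 can be summarised in code. The sketch below is in Python rather than the MATLAB of the actual system; `H_inv` stands for the inverse of whatever homography has been estimated from the detected corners, and pixels that map outside the source image are filled with white:

```python
def bilinear_sample(img, x, y):
    """Sample img at real-valued (x, y) using bilinear interpolation."""
    h, w = len(img), len(img[0])
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
    bot = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
    return top * (1 - fy) + bot * fy

def warp(img, H_inv, out_w, out_h):
    """Backward map: for each output pixel, sample the source at H_inv * (x, y, 1)."""
    out = []
    for y in range(out_h):
        row = []
        for x in range(out_w):
            # homogeneous transform of the output coordinate into the source image
            sx = H_inv[0][0] * x + H_inv[0][1] * y + H_inv[0][2]
            sy = H_inv[1][0] * x + H_inv[1][1] * y + H_inv[1][2]
            sw = H_inv[2][0] * x + H_inv[2][1] * y + H_inv[2][2]
            sx, sy = sx / sw, sy / sw
            if 0 <= sx <= len(img[0]) - 1 and 0 <= sy <= len(img) - 1:
                row.append(bilinear_sample(img, sx, sy))
            else:
                row.append(255)  # outside the source image: white background
        out.append(row)
    return out

img = [[0, 100], [50, 150]]
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(warp(img, identity, 2, 2))  # → [[0.0, 100.0], [50.0, 150.0]]
```

Iterating over destination pixels and sampling the source (backward mapping) avoids the holes that forward mapping leaves in the output, which is why it is the standard choice for this kind of rectification.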

