Abstract OCR

INTRODUCTION
1.1 PROBLEM DEFINITION:
While computers can process data very quickly, entering data into them is still slow and tedious, and it has long been considered the real bottleneck in data processing. Key punching is not only slow and cumbersome but also prone to keying errors. License Plate Recognition (LPR) provides data capture plus preprocessing capabilities such as editing, verifying, sorting, merging, reformatting and balancing, offering significant advantages over punch card equipment. A complete LPR system consists of a scanner, the recognition component, and the LPR software that interacts with the other components to store the computerized document in the computer. Most LPR systems use a combination of hardware (specialized circuit boards) and software to recognize characters, although some inexpensive systems do it entirely in software. The process of inputting the material into the computer begins with the scanner taking a picture of the printed material. Then, during the recognition process, the picture is analyzed for layout, fonts, text and graphics. Finally, the picture of the document is converted into an electronic format that can be edited with application software. License Plate Recognition is the translation of optically scanned bitmaps of printed or written text characters into character codes, such as ASCII. This is an efficient way to turn hard copy materials into data files that can be edited and otherwise manipulated on a computer.
In layman's terms, it converts graphical or visual data, which is in human-readable form, into text in a computer-readable format that can be manipulated by the computer or edited in a word processing program. The LPR software provides a means of reading printed characters on documents and converting them into digital codes that can be read into a computer as actual text rather than just a picture.
License Plate Recognition (LPR) is one of the most popular areas of research in pattern recognition because of its immense application potential. There has been particular interest over the last decade in the recognition of printed characters. Recognizing printed characters is itself a challenging problem, since the same character varies in appearance with changes of font or the introduction of noise. The basic problem in character recognition is to assign the image of a given character to its symbolic class. The classes in the Roman script are the upper and lower case characters, the ten digits, and special symbols such as the period, comma, dollar and pound signs. Pattern recognition algorithms are used to match the basic shape and features of a given character against a stored database of features of the expected characters. Most practical License Plate Recognition systems involve many different tasks, which require either integrated treatment of entire documents or treatment of isolated words or characters. A complete text reading system includes the following major tasks: analysis of the document into its constituents, such as photographs, graphics and text; segmentation of the text into columns, paragraphs, lines, words and characters; recognition of the segmented characters; and ambiguity resolution, which might involve returning to previous stages of the segmentation and recognition procedure. Other tasks include preprocessing of the input image (gray scale normalization, noise elimination), post-processing of the derived text (spelling verification or correction, sometimes incorporating customized lexicons), and the unavoidable interaction with human operators.
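The preprocessing and segmentation tasks listed above can be illustrated with a minimal Python sketch. It is not the method of any particular LPR system: it assumes a fixed brightness threshold for binarization and a simple vertical projection profile for splitting a single text line into characters. The names `binarize` and `segment_columns` are hypothetical and exist only for this illustration.

```python
from typing import List, Tuple

# A grayscale image as rows of pixel values, 0 (black) to 255 (white).
Image = List[List[int]]


def binarize(gray: Image, threshold: int = 128) -> Image:
    """Map each pixel to 1 (ink) if darker than the threshold, else 0."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def segment_columns(binary: Image) -> List[Tuple[int, int]]:
    """Return (start, end) column ranges, end exclusive, that contain ink.

    Assumption: characters on the line are separated by at least one
    completely blank column, which is not true for touching characters.
    """
    if not binary:
        return []
    width = len(binary[0])
    # Vertical projection: number of ink pixels in each column.
    profile = [sum(row[x] for row in binary) for x in range(width)]
    spans: List[Tuple[int, int]] = []
    start = None
    for x, count in enumerate(profile):
        if count > 0 and start is None:
            start = x                      # a character begins here
        elif count == 0 and start is not None:
            spans.append((start, x))       # a blank column ends it
            start = None
    if start is not None:
        spans.append((start, width))       # character touches the right edge
    return spans


if __name__ == "__main__":
    gray = [
        [255, 0, 0, 255, 255, 0, 255],
        [255, 0, 255, 255, 255, 0, 0],
    ]
    print(segment_columns(binarize(gray)))
```

Each returned column range can then be cropped out and passed to the recognition stage. A fixed threshold is the simplest choice and would likely need to be replaced by an adaptive one for unevenly lit scans.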
1.3 PURPOSE:
To give an idea of the power of LPR, let us look at a real-world example. Imagine a police department that has all its criminal records stored in vast file cabinets. Although scanning millions of pages would be an expensive and time-consuming undertaking, the benefits are huge. Once the LPR system has converted the pages into computer-readable text, a detective, for example, could search through the entire history in a few seconds. Manually finding a particular record might not be too difficult, but imagine a detective trying to search for all the crimes committed at a certain intersection between 8 and 8:30. This example only scratches the surface of the power of searchable text, and it is only one reason that many companies and institutions are spending millions of dollars to apply LPR to their legacy data.
Consider another example in which someone wishes to digitize a novel, say Harry Potter, overnight. They could stay up all night typing and still not finish, or they could use a high-end scanner and, in minutes, scan all of author J. K. Rowling's works into a computer using LPR technology. Similarly, there are many more applications that require LPR software for quick and easy conversion of material into electronic form so that it can be conveniently manipulated.
1.4 SCOPE OF PROJECT:

The existing LPR software has certain limitations, which are as follows:
Using text from a source with a font size below 12 pt, or from a fuzzy copy, results in more errors.
Except for tab stops and paragraph marks, most document formatting is lost during text scanning (bold, italics and underline are sometimes recognized).
The output from a finished text scan is a single-column editable text file. This text file always requires spell checking and proofreading, as well as reformatting to the desired final layout.
Scanning plain text files or printouts from a spreadsheet usually works, but the text must be imported into a spreadsheet and reformatted to match the original.
It does not perform well in recognizing handwriting or fonts that resemble handwritten text.
It cannot recognize mathematical formulae and some special characters.
We shall try to overcome these limitations and also include the following features:
Template matching in the LPR engine.
Ability to limit the character set used in LPR.
Support for input from TIFF, BMP, and JPEG image files.
Support for color and grayscale images.
A verifier that compares suspected errors to the original image.
Output in ASCII, Word and other file formats.
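Three of the planned features (template matching, a restricted character set, and flagging suspected errors for the verifier) can be sketched together in Python. This is an illustrative sketch under stated assumptions, not the project's actual engine: the 3x3 bitmaps in `TEMPLATES`, the pixel-agreement `similarity` score, the `recognize` function and the 0.8 cut-off are all hypothetical choices made for the example.

```python
from typing import Dict, List, Optional, Set, Tuple

Glyph = List[List[int]]  # binary bitmap: 1 = ink, 0 = background

# Hypothetical 3x3 templates, for illustration only. A real engine would
# store larger bitmaps, typically several per character and font.
TEMPLATES: Dict[str, Glyph] = {
    "I": [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
    "L": [[1, 0, 0], [1, 0, 0], [1, 1, 1]],
    "O": [[1, 1, 1], [1, 0, 1], [1, 1, 1]],
    "7": [[1, 1, 1], [0, 0, 1], [0, 0, 1]],
}


def similarity(a: Glyph, b: Glyph) -> float:
    """Fraction of pixel positions on which two same-sized bitmaps agree."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("bitmaps must have the same dimensions")
    total = sum(len(row) for row in a)
    same = sum(pa == pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return same / total


def recognize(
    glyph: Glyph,
    allowed: Optional[Set[str]] = None,
    min_score: float = 0.8,
) -> Tuple[str, float, bool]:
    """Return (best character, score, suspect flag) for one segmented glyph.

    `allowed` limits the candidate character set; None means all templates.
    The suspect flag is True when the best score is below `min_score`, so
    that a verifier can show the original image to a human operator.
    """
    candidates = {
        ch: tpl for ch, tpl in TEMPLATES.items()
        if allowed is None or ch in allowed
    }
    if not candidates:
        raise ValueError("no templates match the allowed character set")
    scores = [(ch, similarity(glyph, tpl)) for ch, tpl in candidates.items()]
    best_char, best_score = max(scores, key=lambda pair: pair[1])
    return best_char, best_score, best_score < min_score


if __name__ == "__main__":
    noisy_l = [[1, 0, 0], [1, 0, 0], [1, 1, 0]]  # an "L" with one pixel lost
    print(recognize(noisy_l))
    print(recognize(noisy_l, allowed={"7", "O"}))
```

Restricting `allowed` narrows the search to the characters that can actually occur in a given field, which reduces confusion between similar shapes. When the restriction excludes the true character, the low score marks the result as suspect instead of silently accepting it.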
