Assignment 1: Machine Learning

Introduction to Handwriting Recognizer or OCR
Assignment 1
Presented to
Meritorious Professor Dr. Aqil Burni
Head of Actuarial Sciences
Institute of Business Management
Machine Learning Adnan Alam Khan Std_18090@iobm.edu.pk
Machine learning is about designing algorithms that allow a computer to learn. Learning does not
necessarily involve consciousness; rather, it is a matter of finding statistical regularities or other
patterns in the data. Thus, many machine learning algorithms barely resemble how a human
might approach a learning task. However, learning algorithms can give insight into the relative
difficulty of learning in different environments. The computational analysis of machine learning
algorithms and their performance is a branch of theoretical computer science known as
computational learning theory. A computer system learns from data, which represent some past
experiences of an application domain. The types of machine learning are as follows:
Supervised learning: the algorithm generates a function that maps inputs to desired
outputs. One standard formulation of the supervised learning task is the classification problem: the
learner is required to learn (to approximate the behavior of) a function that maps a vector into one
of several classes by looking at several input-output examples of the function.
Unsupervised learning: the algorithm models a set of inputs; labeled examples are not available.
Semi-supervised learning: the algorithm combines both labeled and unlabeled examples to generate an
appropriate function or classifier.
Reinforcement learning: the algorithm learns a policy of how to act given an observation of
the world. Every action has some impact on the environment, and the environment provides
feedback that guides the learning algorithm.
Transduction: similar to supervised learning, but the algorithm does not explicitly construct a
function; instead, it tries to predict new outputs based on training inputs, training outputs, and new inputs.
Learning to learn: the algorithm learns its own inductive bias based on previous experience.
Supervised learning deals largely with classification. The main algorithm types are:
Linear Classifiers
Logistic Regression
[Equation garbled in the source; only the fragment Y = f( ... ) is recoverable.]
Naive Bayes Classifier
Perceptron
Support Vector Machine
Quadratic Classifiers
K-Means Clustering (strictly an unsupervised clustering method)
Boosting
Neural Networks
Bayesian Networks
Our focus: learn a target function that can be used to predict the values of a discrete class
attribute, e.g., approved or not approved, high-risk or low-risk.
A credit card company receives thousands of applications for new cards. Each application contains
information about an applicant,
age
Marital status
annual salary
outstanding debts
credit rating
etc.
Problem: decide whether an application should be approved, that is, classify applications into two
categories, approved and not approved.
Learn a classification model from the data

Use the model to classify future loan applications into
Yes (approved) and
No (not approved)
What is the class of the following case/instance?
Accuracy = (number of correct classifications) / (total number of test cases)
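The accuracy measure (correct classifications divided by the total number of test cases) can be computed directly from predicted and true labels. The following is a minimal MATLAB sketch; the vectors trueLabels and predLabels hold made-up illustrative values, not data from this assignment.

```matlab
% Hypothetical labels for five test cases (1 = approved, 0 = not approved).
trueLabels = [1 0 1 1 0];
predLabels = [1 0 0 1 0];

% Accuracy = number of correct classifications / total number of test cases.
numCorrect = sum(predLabels == trueLabels);
accuracy   = numCorrect / numel(trueLabels);
fprintf('Accuracy = %d/%d = %.0f%%\n', numCorrect, numel(trueLabels), 100*accuracy);
```

Comparing the two vectors element by element keeps the computation independent of how the predictions were produced.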
Given
a data set D,
a task T, and
a performance measure M,
a computer system is said to learn from D to perform the task T if, after learning, the system's
performance on T improves as measured by M.
In other words, the learned model helps the system perform T better than it would with no learning.
Data: Loan application data
Task: Predict whether a loan should be approved or not.
Performance measure: accuracy.
No learning: classify all future applications (test data) to the majority class (i.e., Yes):
Accuracy = 9/15 = 60%.
We can do better than 60% with learning.
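The "no learning" baseline can be reproduced in a few lines. This is only a sketch: the label vector is reconstructed from the counts quoted in the text (9 Yes out of 15, so 6 No is assumed), and the variable names are illustrative.

```matlab
% 15 loan applications: 9 labeled Yes (1) and, by assumption, 6 labeled No (0).
labels = [ones(9,1); zeros(6,1)];

% "No learning" baseline: always predict the majority class.
majorityClass = mode(labels);
baselinePred  = repmat(majorityClass, size(labels));
baselineAcc   = mean(baselinePred == labels);
fprintf('Baseline accuracy = %.0f%%\n', 100*baselineAcc);
```

Any learned model is worth keeping only if its accuracy on test data exceeds this baseline.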
Decision Tree:
Handwriting Recognizer or OCR:

In general, machine learning develops algorithms for making predictions (in the statistical sense)
from data.
Machine learning is easily confused with (a) statistics or (b) data mining: (a) explains the data,
while (b) focuses on the task you have to solve. In other words, machine learning lies between (a) and (b).
Explanation of data in ML: data consists of data instances, each represented as a feature vector.
Technically, the features are chosen for a specific task. ML is about generalization.
Machine learning consists of (a) classification, (b) clustering, and (c) regression.
Classification: deciding which group the data belongs to; the training phase results in a classification model.
Clustering: finding which groups are present in the data, that is, which data points are close to each other.
Regression: predicting a continuous numeric value for each data point, which can also be used to rank data points.
No ML method works with 100% precision (that is, with a guaranteed chance of success).
After binary data comes trinary (three-valued) data.
SVM: Does not require much training data, but its training is expensive; about a million instances
would be the upper bound. Note that it requires parameter tuning.
Decision Tree:
Understanding Handwriting Recognition in 6 easy steps:
Step 1: Developing features
Center, right, left, and up, shown in red.
Step 2: Overlaying the features on the numbers, then removing the numbers.

After removing the numbers, the remaining features are as shown above.
Step 3: Arranging the features as follows.
Step 4: Adding fillers or blanks
Step 5: Confining features
Step 6: Decision Tree.
One of the most important trends in databases is the increased use of parallel evaluation techniques.
Supervised learning is one of the main branches of machine learning.
Supervised learning (machine learning) takes a known set of input data and known responses to the
data, and seeks to build a predictor model that generates reasonable predictions for the response to
new data.
For example, suppose you want to predict if someone will have a heart attack within a year. You
have a set of data on previous people, including their ages, weight, height, blood pressure, etc. You
know if the previous people had heart attacks within a year of their data measurements. So the
problem is combining all the existing data into a model that can predict whether a new person will
have a heart attack within a year.
[Figure: Known Data and Known Responses feed a Model; the Model yields the Predicted Response and Predicted Data.]
Supervised learning splits into two broad categories:

Classification for responses that can have just a few known values, such as 'true' or 'false'.
Classification algorithms apply to nominal, not ordinal response values.
Regression for responses that are a real number, such as miles per gallon for a particular car.
It can be hard to decide whether you have a classification problem or a regression problem. In
that case, create a regression model first; regression models are often more computationally
efficient.
While many statistical algorithms for supervised learning exist, most use the same
basic workflow for obtaining a predictor model:
1. Prepare Data
2. Choose an Algorithm
3. Fit a Model
4. Choose a Validation Method
5. Examine Fit; Update Until Satisfied
6. Use Fitted Model for Predictions
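The six workflow steps can be walked through on a toy problem. The sketch below is illustrative only: the data are made up, least-squares linear regression is an assumed choice of algorithm, and a simple hold-out split stands in for the validation method.

```matlab
% 1. Prepare data: hypothetical predictor x and response y (roughly y = 2x + 1).
x = (1:10)';
y = 2*x + 1 + 0.1*[1 -1 1 -1 1 -1 1 -1 1 -1]';

% 2. Choose an algorithm: linear regression by least squares (assumed here).
% 4. Choose a validation method (decided before fitting): hold out the last 3 rows.
trainIdx = 1:7;
testIdx  = 8:10;

% 3. Fit a model on the training rows; the column of ones models the intercept.
Xtrain = [ones(numel(trainIdx),1) x(trainIdx)];
beta   = Xtrain \ y(trainIdx);

% 5. Examine the fit on the held-out rows.
Xtest = [ones(numel(testIdx),1) x(testIdx)];
rmse  = sqrt(mean((Xtest*beta - y(testIdx)).^2));
fprintf('Hold-out RMSE = %.3f\n', rmse);

% 6. Use the fitted model to predict the response for a new observation.
yNew = [1 11] * beta;
fprintf('Predicted response at x = 11: %.2f\n', yNew);
```

If the hold-out error were unsatisfactory, step 5 would send you back to step 2 with a different algorithm or different settings.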
Prepare Data
All supervised learning methods start with an input data matrix, usually called X in this
documentation. Each row of X represents one observation. Each column of X represents one
variable, or predictor. Represent missing entries with NaN (not a number) values in X. The
supervised learning algorithms can handle NaN values, either by ignoring them or by ignoring any
row with a NaN value.
You can use various data types for response data Y. Each element in Y represents the response to
the corresponding row of X. Observations with missing Y data are ignored.
For regression, Y must be a numeric vector with the same number of elements as the number of
rows of X.
For classification, Y can be any of these data types. The table also contains the method of including
missing entries.
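Ignoring incomplete observations, as described for X and Y, can also be done by hand before fitting. The following sketch uses made-up values; X, Y, and completeRows are illustrative names, and dropping rows is only one of the two strategies mentioned.

```matlab
% Hypothetical data: 5 observations (rows) and 2 predictors (columns).
X = [25 50; 40 NaN; 31 42; NaN 38; 52 90];
Y = [1; 0; NaN; 1; 0];

% Keep only the rows where no predictor and no response is missing.
completeRows = ~any(isnan(X), 2) & ~isnan(Y);
Xclean = X(completeRows, :);
Yclean = Y(completeRows);
fprintf('Kept %d of %d observations\n', size(Xclean,1), size(X,1));
```

Filtering X and Y with the same logical index keeps each response aligned with its row of predictors.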
Choose an Algorithm:
There are tradeoffs between several characteristics of algorithms, such as:
Speed of training
Memory utilization
Predictive accuracy on new data
Transparency or interpretability, meaning how easily you can understand the reasons an algorithm
makes its predictions
Characteristics of Algorithms
* SVM prediction speed and memory usage are good if there are few support vectors, but can be
poor if there are many support vectors. When you use a kernel function, it can be difficult to
interpret how SVM classifies data, though the default linear scheme is easy to interpret.
** Naive Bayes speed and memory usage are good for simple distributions, but can be poor for
kernel distributions and large data sets.
*** Nearest Neighbor usually has good predictions in low dimensions, but can have poor predictions
in high dimensions. For linear search, Nearest Neighbor does not perform any fitting. For kd-trees,
Nearest Neighbor does perform fitting. Nearest Neighbor can have either continuous or categorical
predictors, but not both.
Pairwise Distance
Categorizing query points based on their distance to points in a training dataset can be a simple yet
effective way of classifying new points. You can use various metrics to determine the distance,
described next. Use pdist2 to find the distance between a set of data points and a set of query points.
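To make the idea concrete, the sketch below computes Euclidean distances by hand and applies a 1-nearest-neighbor rule. It deliberately avoids pdist2 so that it runs without the Statistics Toolbox; with the toolbox, pdist2(query, train) should give the same distance matrix. All points and labels are made up for illustration.

```matlab
% Hypothetical training points (rows) with class labels, and two query points.
train  = [1 1; 2 1; 8 9; 9 8];
labels = [1; 1; 2; 2];
query  = [1.5 1.2; 8.5 8.5];

% Euclidean distance from every query point to every training point.
% D(i,j) is the distance between query(i,:) and train(j,:).
D = zeros(size(query,1), size(train,1));
for i = 1:size(query,1)
    diffs  = train - repmat(query(i,:), size(train,1), 1);
    D(i,:) = sqrt(sum(diffs.^2, 2))';
end

% 1-nearest-neighbor rule: each query point takes the label of its closest training point.
[~, nearest] = min(D, [], 2);
predicted = labels(nearest);
disp(predicted')
```

Swapping the distance formula inside the loop is enough to try a different metric with the same classification rule.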
Handwriting Recognizer or OCR

MATLAB Code
% This code was developed by Adnan Alam Khan for the Machine Learning
% course (Ph.D. Computer Science).
clc;            % Clear the command window.
close all;      % Close all open figures.
clear all;      % Erase all existing variables.
workspace;      % Make sure the workspace panel is showing.
format long g;
format compact;
fontSize = 22;
cho = 0;
possibilityy = 3;   % Menu index of the EXIT option.
while cho ~= possibilityy
    cho = menu('HAND WRITING RECOGNIZER', 'UPLOAD HAND WRITTEN IMAGE', 'CONVERSION', 'E X I T');
    %||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
    if cho == 1
        clc;
        %---- Selection of image ----
        h = waitbar(0, 'P l e a s e   w a i t . . . ');
        for i = 1:1000   % computation here
            waitbar(i/1000)
        end
        close(h);
        clc;
        [namefileA, pathname] = uigetfile('*.*', 'Select Image ');
        if ~isequal(namefileA, 0)   % uigetfile returns 0 when the dialog is cancelled
            [imagen, mapL] = imread(strcat(pathname, namefileA));
            %figure('Tag','Plotting Figure');
            imshow(imagen);
        else
            warndlg('Image must be selected .', ' Warning ')
        end
    end
    if cho == 2
        % Guard: conversion needs an image loaded through menu option 1.
        if ~exist('imagen', 'var')
            warndlg('Image must be selected .', ' Warning ')
            continue
        end
        imshow(imagen);
        title('INPUT IMAGE')
        % Convert to gray scale
        if size(imagen,3) == 3   % RGB image
            imagen = rgb2gray(imagen);
        end
        % Convert to BW
        threshold = graythresh(imagen);
        imagen = ~im2bw(imagen, threshold);
        % Remove all objects containing fewer than 30 pixels
        imagen = bwareaopen(imagen, 30);
        % Storage matrix for the word read from the image
        word = [ ];
        re = imagen;
        % Open text.txt as file for writing
        fid = fopen('text.txt', 'wt');
        % Load templates
        global templates;   % Declare the global before loading into it.
        load templates;
        % Compute the number of letters in the template file
        num_letras = size(templates, 2);
        while 1
            % Fcn 'lines' separates the lines in the text
            [fl, re] = lines(re);
            imgn = fl;
            % Label and count connected components
            [L, Ne] = bwlabel(imgn);
            for n = 1:Ne
                [r, c] = find(L == n);
                % Extract letter
                n1 = imgn(min(r):max(r), min(c):max(c));
                % Resize letter (same size as template)
                img_r = imresize(n1, [42 24]);
                % Uncomment line below to see letters one by one
                %imshow(img_r);pause(0.5)
                % Call fcn to convert image to text
                letter = read_letter(img_r, num_letras);
                % Letter concatenation
                word = [word letter];
            end
            fprintf(fid, '%s\n', word);   % Write 'word' in text file (upper)
            % Clear 'word' variable
            word = [ ];
            % When the sentences finish, break the loop
            if isempty(re)   % See variable 're' in Fcn 'lines'
                break
            end
        end
        fclose(fid);
        % Open 'text.txt' file
        winopen('text.txt');
        fprintf('Computational Intelligence Project\nMade by:\n Adnan Alam Khan Std_18090@iobm.edu.pk\n Institute of Business Management 2015\n');
        % clear all;   % Left disabled: it would also erase the loop variable cho.
    end
    if cho == 3
        %clc;
        button = questdlg('Ready to quit?', 'Exit Dialog', 'Yes', 'No', 'No');
        switch button
            case 'Yes'
                disp(' Characters are: ');
                %display(NumberOfOnes)
                disp('Exiting MENU.................');
                disp('......................................');
                close all;
                %break ;
            case 'No'
                cho = 0;   % Not quitting: reset the choice so the menu is shown again.
        end
    end
end
clear all;
