
Deep learning overview

Nguyen Quang Uy

During the presentation

Please ask questions whenever you have them.


Outline
This presentation provides an introduction to machine
learning and deep learning.
Concept of Machine Learning

Artificial Neural network

Deep learning


Machine Learning Concept


Machine learning:

Methods that can automatically detect patterns in data


Use the uncovered patterns to predict future data.


Machine Learning System


a) Training phase: Data Collection -> Features Extraction/Selection -> Learning -> Learnt Model

b) Testing/Deploying: New Data -> Features Extraction/Selection -> Learnt Model -> Decision

Datasets
Often in the form of tables

Samples: A record/item in the dataset

Features: A column in the dataset representing a property
of the object of the learning problem.

Iris flower


Data set: Iris


sepal length   sepal width   petal length   petal width   Class
5.1            3.5           1.4            0.2           Iris-setosa
7.0            3.2           4.7            1.4           Iris-versicolor
6.3            3.3           6.0            2.5           Iris-virginica
..


Classification Problem

Given a dataset in which all samples have already been labeled with one of several classes.

Find the class label for new samples.


Training data:

F1     F2     F3     F4     Class
5.1    3.5    1.4    0.2    1
7.0    3.2    4.7    1.4    0
6.3    3.3    6.0    2.5    1
2.1    1.3    6.2    5.7    1
1.0    1.8    2.5    2.6    0

New samples:

F1     F2     F3     F4     Class
2.2    1.8    3.7    4.6    ?
3.5    2.9    4.8    5.2    ?

K-Nearest Neighbor Classifiers

Learning by analogy: Tell me who your friends are and I'll tell you who you are.


K-Nearest Neighbor Algorithm

To determine the class of a new sample:

Calculate the distance between the new sample and all examples in the training set.
Select the K nearest examples to the new sample in the training set.
Assign the new sample to the most common class among its K nearest neighbors.
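A minimal sketch of this algorithm in NumPy (the function name and the small dataset, taken from the F1..F4 table above, are only illustrative):

```python
import numpy as np

def knn_predict(X_train, y_train, x_new, k=3):
    """Classify x_new by majority vote among its k nearest training samples."""
    # Euclidean distance from the new sample to every training sample
    distances = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))
    # Indices of the k closest training samples
    nearest = np.argsort(distances)[:k]
    # Most common class label among those k neighbors
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]

# Illustrative data in the spirit of the F1..F4 table above
X_train = np.array([[5.1, 3.5, 1.4, 0.2],
                    [7.0, 3.2, 4.7, 1.4],
                    [6.3, 3.3, 6.0, 2.5]])
y_train = np.array([1, 0, 1])
print(knn_predict(X_train, y_train, np.array([2.2, 1.8, 3.7, 4.6]), k=3))
```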


K-Nearest Neighbor Example


Paper tissue dataset

X1 (Acid durability) (seconds)   X2 (Strength) (kg/m2)   Y = Classification
...                              ...                     Bad
...                              ...                     Bad
...                              ...                     Good
...                              ...                     Good

Classify a new paper tissue with X1 = 3 and X2 = 7, using K = 3.


K-Nearest Neighbor Example


Paper tissue dataset

X1 (Acid durability) (seconds)   X2 (Strength) (kg/m2)   Distance to the new sample   Y = Classification
...                              ...                     16                           Bad
...                              ...                     25                           Bad
...                              ...                     ...                          Good
...                              ...                     13                           Good

Classify the new paper tissue with X1 = 3 and X2 = 7, using K = 3.
Since K = 3 and two out of the three closest samples are Good, the new sample is classified as Good.

K-Nearest Neighbor Algorithm

There are several key issues that affect the performance of kNN:

The choice of K

The distance metric

The approach to combining the class labels of the K nearest neighbors.


Performance Measure

A popular measure for classification is accuracy:

Accuracy = (Number of correctly classified samples) / (Total number of samples)
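A small illustrative helper computing this measure (names and data are only examples):

```python
import numpy as np

def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class matches the true class."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return (y_true == y_pred).mean()

# Example: 4 out of 5 predictions are correct -> accuracy = 0.8
print(accuracy([1, 0, 1, 1, 0], [1, 0, 1, 0, 0]))
```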


Overfitting and model selection

Overfitting: When model A is better than model B on the training data, but B is better than A on the testing data, A is said to be overfitted.

Model selection: The problem of selecting a model that will perform well on unseen data.


No free lunch theorem

No free lunch theorem (David Wolpert and William Macready, 1997).

Averaged over all possible problems, the performance of all algorithms is the same, and no better than random guessing.


When to apply machine learning

Human expertise is absent.

Humans are unable to explain their expertise

Speech recognition, Face recognition

The problem size is too vast

Robotics on Mars

Calculating webpage ranks, matching ads to Google pages

Solution changes with time

Network traffic monitoring


Outline
This presentation provides an introduction to machine
learning and deep learning.
Concept of Machine Learning

Artificial Neural network

Deep learning


Artificial Neural Network


This learning model is inspired by the biological neural network.

A biological neural network is a series of interconnected neurons that interact with each other to process information.


Artificial Neural Network

An Artificial Neural Network is a system composed of many simple processing elements operating in parallel:

Each element of the network is a node called a unit.

Units are connected by links, and each link has a numeric weight.
The activation function of a node defines the output of that node given a set of inputs.


Activation function
The activation function defines the output of a node given a set of inputs.

It is used to transform the inputs into a different domain where they may be more easily separable.


Popular activation functions
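The slide shows these functions graphically; as an illustration, here are three commonly used activation functions written in NumPy (this particular set is an assumption, not necessarily the ones pictured):

```python
import numpy as np

def sigmoid(x):
    # Squashes any real input into (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # Squashes input into (-1, 1), zero-centered
    return np.tanh(x)

def relu(x):
    # Keeps positive inputs, zeroes out negative ones
    return np.maximum(0.0, x)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(sigmoid(x), tanh(x), relu(x))
```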


Numerical Example

net_h1 = 0.15*0.05 + 0.2*0.1 + 0.35 = 0.3775

out_h1 = 1/(1 + e^(-net_h1)) = 1/(1 + e^(-0.3775)) = 0.593
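The same computation, reproduced in NumPy using the weights, bias and inputs from the slide:

```python
import numpy as np

# Hidden unit h1: weights w1 = 0.15, w2 = 0.2, bias b1 = 0.35, inputs i1 = 0.05, i2 = 0.1
w1, w2, b1 = 0.15, 0.2, 0.35
i1, i2 = 0.05, 0.1

net_h1 = w1 * i1 + w2 * i2 + b1          # 0.3775
out_h1 = 1.0 / (1.0 + np.exp(-net_h1))   # sigmoid -> about 0.593
print(net_h1, out_h1)
```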


Multilayer Perceptron

An MLP consists of multiple layers of nodes in a directed graph, with each layer fully connected to the next one.

A network with one hidden layer is the most popular MLP structure.


Training MLP

Training means finding the parameters (weights) for which the objective function is optimal.

We need:

An objective function

A method for adjusting the parameters


Cost function

The cost function is the objective function that we want to optimize by choosing the model parameters.
One popular cost function for neural networks is the cross-entropy cost function:

J(θ) = -[ y·log h_θ(x) + (1-y)·log(1 - h_θ(x)) ]

where y is the target value for input x, and h_θ(x) is the output of the model given x.
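A small NumPy version of this cost, averaged over a batch of samples (the averaging and the clipping that avoids log(0) are implementation choices, not part of the slide):

```python
import numpy as np

def cross_entropy(y, h):
    """Binary cross-entropy J = -[y*log(h) + (1-y)*log(1-h)], averaged over samples."""
    y, h = np.asarray(y, dtype=float), np.asarray(h, dtype=float)
    eps = 1e-12                     # avoid log(0)
    h = np.clip(h, eps, 1.0 - eps)
    return -np.mean(y * np.log(h) + (1.0 - y) * np.log(1.0 - h))

# Example: targets y and model outputs h(x)
print(cross_entropy([1, 0, 1], [0.9, 0.2, 0.6]))
```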


Parameters Estimation

Find the parameters for which the cost function is minimal.

We select θ such that
J(θ) = -[ y·log h_θ(x) + (1-y)·log(1 - h_θ(x)) ]
is minimal.


Gradient Descent Algorithm


0. Start at x_k with k = 0, and select a learning rate α.
1. Compute a search direction p_k = ∇J(x_k).
2. Update x_{k+1} = x_k - α·p_k.
3. Check for convergence (stopping criteria), e.g. ∇J(x_k) ≈ 0.
4. Set k = k + 1 and repeat steps 1 to 4.
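A minimal sketch of these steps in NumPy, applied to a toy objective (the function names and the example objective are illustrative):

```python
import numpy as np

def gradient_descent(grad_J, x0, alpha=0.1, tol=1e-6, max_iter=1000):
    """Minimize J by repeatedly stepping against its gradient grad_J."""
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        p = grad_J(x)                # step 1: search direction (gradient)
        x = x - alpha * p            # step 2: update
        if np.linalg.norm(p) < tol:  # step 3: convergence check
            break
    return x

# Example: J(x) = x^2 has gradient 2x, minimum at x = 0
print(gradient_descent(lambda x: 2 * x, x0=[5.0]))
```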


Gradient Descent Algorithm


(Illustration of gradient descent steps, following the search direction p_k = ∇J(x_k).)


Mini-batch Gradient Descent Algorithm


To increase the convergence speed, we often update the parameters of the model after a mini-batch of training samples.
This approach is referred to as mini-batch gradient descent.
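A possible sketch of one epoch of mini-batch updates (the least-squares example gradient is only an illustration):

```python
import numpy as np

def minibatch_updates(X, y, theta, grad_fn, alpha=0.1, batch_size=32):
    """One epoch of mini-batch gradient descent: update after each small batch."""
    idx = np.random.permutation(len(X))        # shuffle the training set
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]  # indices of one mini-batch
        theta = theta - alpha * grad_fn(theta, X[batch], y[batch])
    return theta

# Example: least-squares gradient for a linear model y ≈ X @ theta
grad_fn = lambda theta, Xb, yb: 2 * Xb.T @ (Xb @ theta - yb) / len(Xb)
X = np.random.randn(100, 3)
y = X @ np.array([1.0, -2.0, 0.5])
theta = np.zeros(3)
for epoch in range(20):
    theta = minibatch_updates(X, y, theta, grad_fn, alpha=0.1)
print(theta)  # approaches [1.0, -2.0, 0.5]
```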


MLP model selection

Several aspects need to be considered when using an MLP:

The way to initialize the weights and biases.

The number of neurons in the hidden layers (often chosen relative to the number of inputs).

The learning rate.

The stopping criteria.


MLP Initialization

For biases:

Can initialize all to zero.

For weights:

Should not be the same for all.

Should not be zero if the activation is tanh.
The common recipe is to initialize w_ij uniformly from [-a, a], where

a = sqrt( 6 / (H_k + H_{k-1}) )

and H_k, H_{k-1} are the numbers of units in the two connected layers.
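A small sketch of this recipe in NumPy, assuming the formula above is the uniform range a = sqrt(6 / (H_k + H_{k-1})):

```python
import numpy as np

def init_weights(h_prev, h_curr, rng=np.random.default_rng(0)):
    """Initialize a weight matrix uniformly in [-a, a] with a = sqrt(6/(h_prev + h_curr))."""
    a = np.sqrt(6.0 / (h_prev + h_curr))
    return rng.uniform(-a, a, size=(h_prev, h_curr))

W1 = init_weights(4, 10)   # e.g. 4 inputs -> 10 hidden units
b1 = np.zeros(10)          # biases can all start at zero
print(W1.shape, W1.min(), W1.max())
```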


How do we pick the learning rate α?

Stochastic gradient descent will converge if

Σ_{t=1}^{∞} α_t = ∞   and   Σ_{t=1}^{∞} α_t² < ∞

where α_t is the learning rate at the t-th update.

If α is a constant, the algorithm is not guaranteed to converge.


How do we pick the learning rate α?

Decreasing strategies: the learning rate α_t is decreased over time, for example

α_t = α_0 / (1 + δ·t)        or        α_t = α_0 / (1 + t)^δ

where δ is a constant, usually selected in [0.5, 1].

It is often better to use a fixed learning rate for the first few updates.
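A possible schedule in this spirit, with a fixed warm-up followed by decay (the exact decay formula here is an assumption, not necessarily the one on the slide):

```python
def learning_rate(t, alpha0=0.1, delta=0.75, warmup=10):
    """Fixed learning rate for the first `warmup` updates, then a decaying rate."""
    if t < warmup:
        return alpha0
    return alpha0 / (1.0 + delta * (t - warmup))

for t in [0, 5, 10, 50, 500]:
    print(t, learning_rate(t))
```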

When to stop Backpropagation?

Some common criteria are:

A fixed number of epochs.

Stop when the error can no longer be reduced.
Early stopping: stop training when the validation error increases (with some look-ahead).
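A sketch of early stopping with a look-ahead (patience) window; the train_epoch and validation_error callables are assumed to be provided by the caller:

```python
def train_with_early_stopping(train_epoch, validation_error, max_epochs=200, patience=10):
    """Stop when the validation error has not improved for `patience` epochs."""
    best_err, best_epoch = float("inf"), 0
    for epoch in range(max_epochs):
        train_epoch()                          # one pass over the training data
        err = validation_error()               # error on a held-out validation set
        if err < best_err:
            best_err, best_epoch = err, epoch  # remember the best result so far
        elif epoch - best_epoch >= patience:   # no improvement for `patience` epochs
            break
    return best_err
```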


Neural network for digit recognition

If the output at position k is the greatest, then the network will recognize the input as digit k.


Universal approximation theorem


Theorem (Cybenko 1989): A feed-forward network with a
single hidden layer containing a finite number of neurons
can approximate any continuous function.
In other words, a set of weights exists that can produce
the targets from the inputs. However, the problem is
finding them.


Outline
This presentation provides an introduction to machine
learning and deep learning.
Concept of Machine Learning

Artificial Neural network

Deep learning


DL appears in The New York Times


Scientists See Promise in Deep-Learning Programs
John Markoff
November 23, 2012


DL is a core focus at Microsoft Research


Leading researchers

Hinton at Google and the University of Toronto

LeCun at Facebook

Andrew Ng at Stanford University


Successful applications

Deep learning is a powerful methodology well suited to training deep and large networks for big data applications.
Successful applications of deep networks have already been demonstrated in a large variety of domains:

Computer vision: Facebook image tagging

Natural Language Processing: Google Translate

Speech Recognition: Google Docs (voice to text)


Multilayer Neural Network

Can we use a multilayer neural network with a lot of layers?

The answer is yes, and people have tried this, but without much success.


Problems with many-layer neural networks


When training an MLP with many hidden layers, the gradient descent algorithm is not suitable since:

Too many parameters

A network with 1000 inputs, two hidden layers of 500 nodes, and 10 outputs has about 1000*500 + 500*500 + 500*10 = 755,000 weights
Computationally expensive
Gradients decay quickly (the vanishing gradient problem).


What is the novelty of DL?
1. What exactly is deep learning?
2. Why is it generally better than other methods on image, speech, and certain other types of data?


Deep Learning Overview

Deep learning is the training of neural networks with many layers, with some important modifications to:

Network structure

Training algorithms


Deep Learning Objective

Many layers work together to build an improved feature space

The first layer learns 1st-order features (e.g. edges)

The 2nd layer learns higher-order features (combinations of first-layer features, combinations of edges, etc.)


Why is deep learning great?

Data (feature) representation is important

Features are problem dependent

Feature engineering requires a lot of domain knowledge

Automatically learning the data representation is desirable

Deep learning is one such method


Why does deep learning work?

Biological plausibility, e.g. the visual cortex


Why does deep learning work?

Håstad's proof: problems that can be represented with a polynomial number of nodes using k layers may require an exponential number of nodes with k-1 layers (e.g. parity).

*Håstad (2014), "On the correlation of parity and small-depth circuits", SIAM Journal on Computing.


Deep learning categorization

Deep networks for supervised learning

Convolutional neural network

Deep-structured CRFs

Deep networks for unsupervised learning

Restricted Boltzmann machine

Deep autoencoder

Hybrid deep networks

Combine the two, e.g. using unsupervised methods as optimization or regularization aids for supervised learning


Deep learning framework

There are several frameworks for implementing deep learning.
In our systems we use:

Python

Numpy

Theano


Thank you!
