
Machine Learning

Alexey Badalov, 2016-05

Atari 2600
In 2015, an AI learned to play 49 different video
games with no instructions, just by trying the
controls and looking at the screen and the score.

Video Pinball

Boxing

Breakout

Stargunner

AlphaGo
In 2016, the AlphaGo AI defeated Lee Se-dol, the
world's second-best Go player, an achievement
comparable to Deep Blue's victory over Garry
Kasparov in chess 20 years earlier.

Image from The Verge

definition
Learning is the acquisition of knowledge or skills
through study, experience, or being taught.
Oxford Dictionary

Learning is a change in probability of response.
B. F. Skinner

Learning is improving performance in a task with
experience.
Tom Mitchell

reinforcement learning
You change the world from one state to another
through your actions, and sometimes you get
rewarded for it.
The meaning of life is to get the maximum total
reward.
Reinforcement learning algorithms can gradually
explore the different states and learn the long-term
reward values of different actions.
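Gradually exploring states and learning long-term reward values can be sketched with a tiny tabular Q-learning loop. This is a minimal illustration, not the algorithm from the talk: the five-state world and all parameter values below are invented.

```python
import random

random.seed(0)  # make the run repeatable

# Toy world: states 0..4 in a row; stepping right from state 3 into
# state 4 earns a reward of 1, every other move earns 0.
N_STATES, GOAL = 5, 4
ACTIONS = [-1, +1]  # step left, step right

def step(state, action):
    nxt = min(max(state + action, 0), GOAL)
    return nxt, (1.0 if nxt == GOAL else 0.0)

# Q[s][a] is the current estimate of the long-term reward of
# taking action a in state s.
Q = [[0.0, 0.0] for _ in range(N_STATES)]
alpha, gamma, epsilon = 0.5, 0.9, 0.1

for _ in range(500):
    s = 0
    while s != GOAL:
        # Explore a random action occasionally, otherwise exploit.
        if random.random() < epsilon:
            a = random.randrange(2)
        else:
            a = Q[s].index(max(Q[s]))
        s2, r = step(s, ACTIONS[a])
        # Nudge the estimate toward reward + discounted future value.
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2

# After training, stepping right should look better than stepping
# left in every non-goal state.
print([row.index(max(row)) for row in Q[:GOAL]])
```

Because the discount factor gamma is below 1, states closer to the reward end up with higher values; this is how the long-term worth of an action propagates backwards from the reward.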

human visual system

convolutional neural
networks
Work analogously to the primary visual cortex:
successive layers extract higher-level features.
Networks can be many layers deep, but finding the
right depth and structure is not an exact science.
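The layer-by-layer feature extraction can be illustrated in plain Python. This is a hedged sketch: the kernels below are hand-picked toy "feature detectors", whereas a real convolutional network learns its kernels from data.

```python
def conv2d(image, kernel):
    """Slide a small kernel over the image (valid padding, stride 1),
    applying a ReLU non-linearity to each output value."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(image) - kh + 1):
        row = []
        for j in range(len(image[0]) - kw + 1):
            s = sum(image[i + di][j + dj] * kernel[di][dj]
                    for di in range(kh) for dj in range(kw))
            row.append(max(s, 0))  # ReLU
        out.append(row)
    return out

# A 6x6 input with a vertical edge down the middle.
img = [[1, 1, 1, 0, 0, 0] for _ in range(6)]

# Layer 1: a vertical-edge detector (a simple low-level feature).
edge = conv2d(img, [[1, -1], [1, -1]])

# Layer 2: combines layer-1 responses into a broader pattern,
# a stand-in for a "higher-level" feature.
feature = conv2d(edge, [[1, 1], [1, 1]])
```

Each layer sees only the previous layer's output, so later layers respond to progressively larger and more abstract patterns, which is the analogy with successive stages of the visual cortex.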

Atari 2600

the human benchmark

1 person
game tester, but not a professional gamer
2 hours to practice for each game
no sound
no pause, no save/load

The AI reached at least 75% of his score on 29 of the 49 games.

teaching the AI
38 days' worth of recorded games were split into
SARSA fragments, which were used to train a
neural network.

S: state before the reward
A: action leading to the reward
R: reward
S': state after the reward
A': the following action

The state is 4 consecutive video frames.
The action is some combination of Atari 2600 controls.
The reward is either -1, 0, or 1, depending on which
way the score changes.

The network learned to recognize situations in
which to press specific buttons to keep increasing
the score.
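Each (S, A, R, S', A') fragment drives one update of the agent's value estimate. With a lookup table standing in for the neural network (a simplification; the two-state table and all numbers below are made up), the update looks like this:

```python
alpha = 0.1   # learning rate
gamma = 0.99  # discount: how much future reward matters

def sarsa_update(Q, s, a, r, s2, a2):
    """Nudge Q[s][a] toward the observed reward plus the discounted
    value of the action actually taken next (hence S, A, R, S', A')."""
    target = r + gamma * Q[s2][a2]
    Q[s][a] += alpha * (target - Q[s][a])

# Hypothetical two-state, two-action table.
Q = {0: {0: 0.0, 1: 0.0}, 1: {0: 0.0, 1: 0.5}}
sarsa_update(Q, s=0, a=1, r=1.0, s2=1, a2=1)
print(Q[0][1])  # moved from 0.0 toward 1.0 + 0.99 * 0.5
```

The deep network learns the same quantity, but generalizes across the enormous space of screen states instead of storing one entry per state.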

AlphaGo

Go
Players take turns placing stones on the board.
Stones that get surrounded are removed from the
board.
The goal is to capture the most territory and stones.
On the order of 10^170 different states, compared to
10^47 in chess.
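The Go figure can be sanity-checked with a quick upper bound: each of the 19 x 19 = 361 points is empty, black, or white.

```python
import math

# Upper bound on Go board configurations: 3 choices per point,
# 361 points, so at most 3^361 = 10^(361 * log10(3)) configurations.
exponent = 361 * math.log10(3)
print(round(exponent))  # 172; legal positions are fewer, ~10^170
```

The bound of about 10^172 overcounts because it includes illegal positions; the accepted count of legal positions is roughly 10^170, matching the figure above.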

teaching the AI
SARSA fragments based on 29,400,000 positions
from 160,000 games were used to teach two neural
networks. 50 GPUs worked for a month.

S: state before the reward
A: action leading to the reward
R: reward
S': state after the reward
A': the following action

The state is a carefully designed set of parameters
about each board position, such as stone colour,
turns since the last move, legality, number of
liberties, etc.
The action is a position for placing a stone.
The reward is -1 if the fragment comes from a losing
game and 1 if it comes from a winning game.

differences from Atari 2600


2 neural networks:
- a value network that judges the value of different
board positions at a glance, serving to replace
human intuition
- a policy network that, as with the Atari 2600
games, learns to recognize situations on the board
and the moves that will bring the maximum reward
Monte Carlo Tree Search: a forward-looking
algorithm that evaluates moves by repeatedly
playing against itself using the value and policy
neural networks.
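The core Monte Carlo idea, judging a move by the outcomes of many simulated games played from it, can be sketched on a toy game. This is flat Monte Carlo evaluation, not full MCTS (no search tree, no value or policy networks), and the game of Nim here is only an illustration.

```python
import random

random.seed(1)  # repeatable playouts

# Toy game (Nim): a pile of stones, players alternately remove 1-3;
# whoever takes the last stone wins.
def moves(pile):
    return [m for m in (1, 2, 3) if m <= pile]

def rollout(pile, to_move):
    """Finish the game with uniformly random moves; return the winner."""
    while True:
        pile -= random.choice(moves(pile))
        if pile == 0:
            return to_move
        to_move = 1 - to_move

def best_move(pile, playouts=2000):
    """Score each legal move for player 0 by the fraction of random
    playouts player 0 goes on to win after making it."""
    scores = {}
    for m in moves(pile):
        wins = 0
        for _ in range(playouts):
            if pile - m == 0:
                wins += 1  # taking the last stone wins immediately
            elif rollout(pile - m, to_move=1) == 0:
                wins += 1
        scores[m] = wins / playouts
    return max(scores, key=scores.get)

# From a pile of 5, leaving the opponent a multiple of 4 is the
# strong move, and random playouts already favour it.
print(best_move(5))
```

AlphaGo's search follows the same principle but grows a tree of positions, biases playouts with the policy network, and blends rollout results with the value network's judgement instead of relying on random play alone.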

the match
The game lasted 3.5 hours.
Lee Se-dol is the world's second-best player. He
says he does not consider AlphaGo a superior
player; a large factor in his loss was the novelty
of playing against a non-human opponent.

1202 CPUs
176 GPUs
100+ scientists

1 human brain
1 cup of coffee
