Atari 2600
In 2015, an AI learned to play 29 different video
games at a human level with no instructions, just by
trying the controls and watching the screen and the score.
Video Pinball
Boxing
Breakout
Stargunner
AlphaGo
In 2016, the AlphaGo AI defeated Lee Se-dol, the
world's second-best Go player, an achievement
comparable to Deep Blue's victory over Garry
Kasparov in chess 20 years earlier.
definition
Learning is the acquisition of knowledge or skills
through study, experience, or being taught.
Oxford Dictionary
reinforcement learning
You change the world from one state to another
through your actions, and sometimes you get
rewarded for it.
The meaning of life is to get the maximum total
reward.
Reinforcement learning algorithms can gradually
explore the different states and learn the long-term
reward values of different actions.
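The idea of exploring states and learning long-term reward values can be sketched with tabular Q-learning, one common reinforcement learning algorithm. The corridor world, the function name `q_learning`, and all parameter values below are illustrative assumptions, not something taken from the systems discussed in these slides.

```python
import random

def q_learning(n_states=5, episodes=2000, alpha=0.1, gamma=0.9,
               epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical corridor of n_states cells.

    Actions: 0 = step left, 1 = step right. Entering the last cell
    gives a reward of 1 and ends the episode; every other step gives 0.
    """
    rng = random.Random(seed)
    # q[state][action] is the estimated long-term reward of an action.
    q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        state = 0
        while state != n_states - 1:
            if rng.random() < epsilon:
                action = rng.randrange(2)  # explore: try a random action
            else:
                best = max(q[state])       # exploit: best known action
                action = rng.choice([a for a in (0, 1) if q[state][a] == best])
            next_state = max(0, state - 1) if action == 0 else state + 1
            reward = 1.0 if next_state == n_states - 1 else 0.0
            # Move the estimate towards reward + discounted future value.
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q

q = q_learning()
policy = ["right" if right > left else "left" for left, right in q[:-1]]
print(policy)
```

Only the last step is ever rewarded, yet the learned values for "right" end up higher in every cell, because each update passes a discounted share of the future reward back to earlier states.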
convolutional neural networks
Work analogously to the primary visual cortex:
successive layers extract higher-level features.
Can be many levels deep, but finding the correct
depth and structure is not an exact science.
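As a rough illustration of how successive layers extract higher-level features, the sketch below applies two tiny hand-written filters in sequence: the first responds to vertical edges, the second to vertically aligned edge responses. A real network learns its filters from data; the image, the kernels, and the helper names `conv2d` and `relu` are assumptions made for this example.

```python
def conv2d(image, kernel):
    """Slide the kernel over the image (no padding) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

def relu(feature_map):
    """Keep positive responses, zero out the rest."""
    return [[max(0, v) for v in row] for row in feature_map]

# A 4x6 image: dark on the left, bright on the right.
image = [[0, 0, 0, 1, 1, 1] for _ in range(4)]

# Layer 1: responds where brightness increases from left to right.
edges = relu(conv2d(image, [[-1, 1]]))

# Layer 2: responds where two edge responses are stacked vertically,
# i.e. a short vertical line segment.
segments = relu(conv2d(edges, [[1], [1]]))

print(edges)
print(segments)
```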
Atari 2600
1 person
game tester, but not a professional gamer
2 hours to practice for each game
no sound
no pause, no save/load
teaching the AI
38 days' worth of recorded games were split into
SARSA fragments, which were used to train a
neural network.
S state before the reward
A action leading to the reward
R reward
S state after the reward
The state is 4 consecutive video frames.
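Splitting a recorded game into such fragments might look like the sketch below. The `Fragment` type, the function name, and the convention that a state is the 4 most recent frames up to the moment of the action are assumptions for illustration; the slides do not specify the exact data layout.

```python
from collections import namedtuple

# One training fragment: state before, action, reward, state after.
Fragment = namedtuple("Fragment", ["state", "action", "reward", "next_state"])

def split_into_fragments(frames, actions, rewards, stack=4):
    """Cut one recorded game into fragments.

    frames[t] is the screen at step t, actions[t] is the action taken
    at step t and rewards[t] is the score change it produced, so a
    recording has one more frame than it has actions.
    """
    fragments = []
    for t in range(stack - 1, len(actions)):
        # Each state is the `stack` most recent frames.
        state = tuple(frames[t - stack + 1 : t + 1])
        next_state = tuple(frames[t - stack + 2 : t + 2])
        fragments.append(Fragment(state, actions[t], rewards[t], next_state))
    return fragments

# Hypothetical recording: 7 frames, 6 actions, 6 rewards.
frames = ["f0", "f1", "f2", "f3", "f4", "f5", "f6"]
actions = ["left", "left", "fire", "right", "fire", "left"]
rewards = [0, 0, 1, 0, 10, 0]
fragments = split_into_fragments(frames, actions, rewards)
print(fragments[0])
```

Stacking several frames lets a single state carry motion information, such as the direction a ball is travelling, which one still image cannot show.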
AlphaGo
Go
Players take turns placing stones on the board.
Stones that get surrounded are removed from the
board.
The goal is to capture the most territory and stones.
On the order of 10^170 different states, compared to
10^47 in chess.
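A quick sanity check of these magnitudes, assuming the standard 19x19 board (not stated on the slide): if every point could independently be empty, black, or white, there would be 3^361 arrangements, which is a loose upper bound because it also counts illegal positions.

```python
BOARD_SIZE = 19                       # assumption: standard board
points = BOARD_SIZE * BOARD_SIZE      # 361 intersections

# Every point is empty, black, or white: a loose upper bound.
upper_bound = 3 ** points
upper_bound_digits = len(str(upper_bound))

go_states = 10 ** 170                 # order of magnitude from the slide
chess_states = 10 ** 47               # order of magnitude from the slide
ratio = go_states // chess_states

print(upper_bound_digits)
print(len(str(ratio)) - 1)            # exponent of the ratio
```

The upper bound has 173 digits, so it sits a little above the 10^170 estimate, and the ratio shows that Go has about 10^123 times as many states as chess.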
teaching the AI
SARSA fragments based on 29,400,000 positions
from 160,000 games were used to teach two neural
networks. 50 GPUs worked for a month.
S state before the reward
A action leading to the reward
R reward
The state is a carefully designed set of
parameters about each board position,
such as stone colour, turns since last
move, legality, number of liberties, etc.
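To make the idea of per-position parameters concrete, here is a minimal sketch that encodes a small board into a few feature grids: stone colour relative to the player to move, and the number of liberties of each stone's group. The board notation, the function names, and the choice of features are assumptions for illustration; turns since last move and legality are left out, and this is not AlphaGo's actual encoding.

```python
def liberties_of(board, row, col):
    """Count the liberties (adjacent empty points) of the group at (row, col)."""
    colour = board[row][col]
    size = len(board)
    group, liberties, stack = set(), set(), [(row, col)]
    while stack:  # flood fill over connected stones of the same colour
        r, c = stack.pop()
        if (r, c) in group:
            continue
        group.add((r, c))
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < size and 0 <= nc < size:
                if board[nr][nc] == ".":
                    liberties.add((nr, nc))
                elif board[nr][nc] == colour:
                    stack.append((nr, nc))
    return len(liberties)

def encode(board, to_play):
    """Turn a board into feature grids, one value per point."""
    size = len(board)
    points = [(r, c) for r in range(size) for c in range(size)]
    features = {name: [[0] * size for _ in range(size)]
                for name in ("own", "opponent", "empty", "liberties")}
    for r, c in points:
        stone = board[r][c]
        if stone == ".":
            features["empty"][r][c] = 1
        else:
            key = "own" if stone == to_play else "opponent"
            features[key][r][c] = 1
            features["liberties"][r][c] = liberties_of(board, r, c)
    return features

# Hypothetical 4x4 position: B = black, W = white, . = empty.
board = [".B..",
         "BWB.",
         ".W..",
         "...."]
features = encode(board, to_play="B")
print(features["liberties"])
```

A group whose liberty count drops to zero is surrounded and would be removed from the board, which is why liberties are a natural hand-designed input.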
the match
The game lasted 3.5 hours.
Lee Se-dol is the world's second-best player. He
says he does not consider AlphaGo a superior
player, but a large factor in his loss was the novelty
of playing against a non-human opponent.
1202 CPUs
176 GPUs
100+ scientists
1 human brain
1 cup of coffee