Key Papers in Deep RL

Josh Achiam, OpenAI

The papers highlighted in blue are more important---even if you skip all of the others, you need these on your radar.

1. Model-Free RL
a. Deep Q-Learning
i. Original Deep Q-Learning Paper (​https://www.cs.toronto.edu/~vmnih/docs/dqn.pdf​)
ii. Deep Recurrent Q-Learning (​https://arxiv.org/abs/1507.06527​)
iii. Dueling Architectures (​https://arxiv.org/abs/1511.06581​)
iv. Deep Double Q-Learning (​https://arxiv.org/abs/1509.06461​)
v. Prioritized Replay (​https://arxiv.org/abs/1511.05952​)
vi. Rainbow (​https://arxiv.org/abs/1710.02298​)
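
As a quick orientation for this subsection: the quantity shared by all of the DQN-family methods above is the one-step TD target. The sketch below is a minimal illustration in plain numpy, where q_online and q_target are hypothetical callables returning a vector of action values for a state; it shows the standard DQN target and the Double DQN variant that decouples action selection from evaluation.

import numpy as np

def dqn_target(q_target, r, s_next, done, gamma=0.99):
    # Standard DQN target: y = r + gamma * max_a' Q_target(s', a'), with no bootstrap at terminal states.
    bootstrap = 0.0 if done else gamma * float(np.max(q_target(s_next)))
    return r + bootstrap

def double_dqn_target(q_online, q_target, r, s_next, done, gamma=0.99):
    # Double DQN: the online network selects the action, the target network evaluates it.
    a_star = int(np.argmax(q_online(s_next)))
    bootstrap = 0.0 if done else gamma * float(q_target(s_next)[a_star])
    return r + bootstrap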

b. Policy Gradients
i. A2C / A3C (​https://arxiv.org/abs/1602.01783​)
ii. TRPO (​https://arxiv.org/abs/1502.05477​)
iii. TRPO+GAE (​https://arxiv.org/abs/1506.02438​)
iv. ACKTR (​https://arxiv.org/abs/1708.05144​)
v. PPO (https://arxiv.org/abs/1707.06347, https://arxiv.org/abs/1707.02286)
vi. ACER (​https://arxiv.org/abs/1611.01224​)
vii. Soft Actor-Critic (​https://arxiv.org/abs/1801.01290​)
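
For readers new to this subsection, the estimator underlying A2C/A3C, TRPO, and PPO is the likelihood-ratio policy gradient. Below is a minimal REINFORCE-style sketch in plain numpy for a linear-softmax policy; the episode format and the constant baseline are illustrative assumptions, not taken from any single paper above.

import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def reinforce_gradient(theta, episode, gamma=0.99):
    # theta: (n_actions, n_features) weights of a linear-softmax policy.
    # episode: list of (state_features, action, reward) tuples.
    grad = np.zeros_like(theta)
    # Reward-to-go returns G_t, computed backwards through the episode.
    G, returns = 0.0, []
    for _, _, r in reversed(episode):
        G = r + gamma * G
        returns.append(G)
    returns.reverse()
    baseline = float(np.mean(returns))  # constant baseline to reduce variance
    for (phi, a, _), G_t in zip(episode, returns):
        pi = softmax(theta @ phi)
        # grad log pi(a|s) for a linear-softmax policy is (1[a'=a] - pi(a')) * phi.
        for a_prime in range(theta.shape[0]):
            indicator = 1.0 if a_prime == a else 0.0
            grad[a_prime] += (indicator - pi[a_prime]) * phi * (G_t - baseline)
    return grad  # ascend: theta += learning_rate * grad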

c. Deterministic Policy Gradients
i. Original DPG Paper (​http://proceedings.mlr.press/v32/silver14.pdf​)
ii. DDPG (​https://arxiv.org/abs/1509.02971​)
iii. TD3 (​https://arxiv.org/abs/1802.09477​)
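
A one-line summary of how TD3 stabilizes DDPG's critic: bootstrap from the smaller of two target critics, at a noise-smoothed target-policy action. A minimal sketch in numpy, where q1_targ, q2_targ, and pi_targ stand in for the target networks:

import numpy as np

def td3_target(q1_targ, q2_targ, pi_targ, r, s_next, done,
               gamma=0.99, act_limit=1.0, sigma=0.2, noise_clip=0.5):
    # Target policy smoothing: add clipped Gaussian noise to the target action.
    a_next = pi_targ(s_next)
    noise = np.clip(sigma * np.random.randn(*np.shape(a_next)), -noise_clip, noise_clip)
    a_next = np.clip(a_next + noise, -act_limit, act_limit)
    # Clipped double-Q learning: bootstrap from the smaller of the two target critics.
    q_min = min(float(q1_targ(s_next, a_next)), float(q2_targ(s_next, a_next)))
    return r + (0.0 if done else gamma * q_min)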

d. Distributional RL
i. C51 (​https://arxiv.org/abs/1707.06887​)
ii. QR-DQN (​https://arxiv.org/abs/1710.10044​)
iii. IQN (​https://arxiv.org/abs/1806.06923​)
iv. Dopamine (code library) (​paper link​, ​code link​)

e. Policy Gradients with Action-Dependent Baselines
i. Q-Prop (​https://arxiv.org/abs/1611.02247​)
ii. Stein Control Variates (​https://arxiv.org/abs/1710.11198​)
iii. Mirage of Action-Dependent Baselines (​https://arxiv.org/abs/1802.10031​)

f. Path-Consistency Learning
i. Original PCL Paper (​https://arxiv.org/abs/1702.08892​)
ii. Trust PCL (​https://arxiv.org/abs/1707.01891​)

g. Other Directions for Combining Policy Learning and Q-Learning
i. PGQ (​https://arxiv.org/abs/1611.01626​)
ii. Reactor (​https://arxiv.org/abs/1704.04651​)
iii. Interpolated Policy Gradients (​this link is way too long​)
iv. Equivalence Between PG and SQL (​https://arxiv.org/abs/1704.06440​)

h. Evolution Algorithms
i. Evolution Strategies (https://arxiv.org/abs/1703.03864)

2. Exploration
a. Intrinsic Motivation
i. VIME (​https://arxiv.org/abs/1605.09674​)
ii. Count-Based
1. Original pseudocounts paper (​https://arxiv.org/abs/1606.01868​)
2. Neural Density Models (​https://arxiv.org/abs/1703.01310​)
3. Hashing (​https://arxiv.org/abs/1611.04717​)
4. EX2 (​https://arxiv.org/abs/1703.01260​)
iii. Self-Supervised Prediction (​https://arxiv.org/abs/1705.05363​)
iv. Large-Scale Study of Curiosity (​https://arxiv.org/abs/1808.04355​)
v. Random Network Distillation (​https://arxiv.org/abs/1810.12894​)
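
The count-based entries above (pseudocounts, neural density models, hashing) share a simple recipe: give the agent extra reward for visiting rarely seen states. A toy sketch, assuming states can be hashed or discretized into a key (the beta / sqrt(N(s)) bonus form follows the hashing paper; the state_key argument is a hypothetical discretization):

import math
from collections import defaultdict

class CountBonus:
    # Tabulates visit counts over hashed/discretized states and returns an
    # exploration bonus of beta / sqrt(N(s)), to be added to the environment reward.
    def __init__(self, beta=0.01):
        self.beta = beta
        self.counts = defaultdict(int)

    def bonus(self, state_key):
        self.counts[state_key] += 1
        return self.beta / math.sqrt(self.counts[state_key])
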
b. Unsupervised RL
i. Variational Intrinsic Control (​https://arxiv.org/abs/1611.07507​)
ii. Diversity is All You Need (​https://arxiv.org/abs/1802.06070​)
iii. Variational Option Discovery Algorithms (​https://arxiv.org/abs/1807.10299​)

3. Transfer and Multitask RL
a. Progressive Networks (​https://arxiv.org/abs/1606.04671​)
b. Universal Value Function Approximators (​http://proceedings.mlr.press/v37/schaul15.pdf​)
c. RL + Unsupervised Auxiliary Tasks (​https://arxiv.org/abs/1611.05397​)
d. Intentional / Unintentional Agent (​https://arxiv.org/abs/1707.03300​)
e. PathNet (​https://arxiv.org/abs/1701.08734​)
f. Mutual Alignment Transfer Learning (​https://arxiv.org/abs/1707.07907​)
g. Learning an Embedding Space for Transfer (​https://openreview.net/pdf?id=rk07ZXZRb​)
h. Hindsight Experience Replay (​https://arxiv.org/abs/1707.01495​)
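
Hindsight Experience Replay (item h) has a concrete core: replay each transition again with the goal replaced by a goal the agent actually achieved later in the episode, and the reward recomputed for that substituted goal. A minimal relabeling sketch under an assumed transition format; the dictionary keys and the "future" sampling strategy here are illustrative, not a definitive implementation:

import random

def her_relabel(episode, compute_reward, k=4):
    # episode: list of dicts with keys 'obs', 'action', 'next_obs',
    # 'achieved_goal' (the goal state reached at next_obs), and 'goal'.
    # compute_reward(achieved_goal, goal) recomputes the (typically sparse) reward.
    relabeled = []
    for t, tr in enumerate(episode):
        future = episode[t:]  # 'future' strategy: goals achieved later in the same episode
        for _ in range(min(k, len(future))):
            new_goal = random.choice(future)['achieved_goal']
            relabeled.append({
                'obs': tr['obs'],
                'action': tr['action'],
                'next_obs': tr['next_obs'],
                'goal': new_goal,
                'reward': compute_reward(tr['achieved_goal'], new_goal),
            })
    return relabeled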

4. Hierarchy
a. Feudal Networks (​https://arxiv.org/abs/1703.01161​)
b. Strategic Attentive Writer (​https://arxiv.org/abs/1606.04695​)
c. Data-Efficient Hierarchical RL (​https://arxiv.org/abs/1805.08296​)

5. Fast Memory
a. Model-Free Episodic Control (​https://arxiv.org/abs/1606.04460​)
b. Neural Episodic Control (​https://arxiv.org/abs/1703.01988​)
c. Neural Map (​https://arxiv.org/abs/1702.08360​)
d. MERLIN (​https://arxiv.org/abs/1803.10760​)
e. Relational RNNs (​https://arxiv.org/abs/1806.01822​)

6. Model-Based
a. Learned Model
i. Imagination-Augmented Agents (​https://arxiv.org/abs/1707.06203​)
ii. Model-Based Plus Model-Free Fine-tuning (​https://arxiv.org/abs/1708.02596​)
iii. Model-Based Value Expansion (​https://arxiv.org/abs/1803.00101​)
iv. Stochastic Ensemble Value Expansion (​https://arxiv.org/abs/1807.01675​)
v. Model Ensemble TRPO (​hyperlink​)
vi. MB-MPO (​https://arxiv.org/abs/1809.05214​)
vii. World Models (​https://arxiv.org/abs/1809.01999​)
b. Given Model
i. AlphaZero (​https://arxiv.org/abs/1712.01815​)
ii. Expert Iteration (​https://arxiv.org/abs/1705.08439​)

7. Meta-RL
a. RL^2: Fast RL via Slow RL (​https://arxiv.org/abs/1611.02779​)
b. Learning to Reinforcement Learn (​https://arxiv.org/abs/1611.05763​)
c. MAML (​https://arxiv.org/abs/1703.03400​)
d. SNAIL (​link to openreview​)
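
Of the meta-RL entries above, MAML (item c) has the most compact core loop: adapt a copy of the parameters on each task with a few gradient steps, then update the meta-parameters so that this adaptation works well. The sketch below uses the first-order approximation (FOMAML), which drops the second-order term; theta is a numpy parameter vector and loss_grad(theta, task) is a hypothetical function returning the gradient of the task loss at theta.

def fomaml_update(theta, tasks, loss_grad, inner_lr=0.01, outer_lr=0.001, inner_steps=1):
    # First-order MAML: the outer gradient is approximated by the task-loss
    # gradient evaluated at the adapted parameters (d phi / d theta is ignored).
    meta_grad = 0.0
    for task in tasks:
        phi = theta
        for _ in range(inner_steps):                    # inner loop: adapt to the task
            phi = phi - inner_lr * loss_grad(phi, task)
        meta_grad = meta_grad + loss_grad(phi, task)    # outer-loop gradient contribution
    return theta - outer_lr * meta_grad / len(tasks)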

8. Scaling RL
a. Accelerated Methods for Deep RL (​https://arxiv.org/abs/1803.02811​)
b. IMPALA (​https://arxiv.org/abs/1802.01561​)
c. Distributed Prioritized Experience Replay (​https://openreview.net/forum?id=H1Dy---0Z​)
d. R2D2 (​https://openreview.net/forum?id=r1lyTjAqYX​)
e. RLlib---Distributed RL with Ray (https://arxiv.org/abs/1712.09381)

9. RL in the Real World
a. Benchmarking Deep RL in the Real World (​https://arxiv.org/abs/1809.07731​)
b. Learning Dexterity (​https://arxiv.org/abs/1808.00177​)
c. QT-Opt (​https://arxiv.org/abs/1806.10293​)

10. Safety
a. Concrete Problems in AI Safety (​https://arxiv.org/abs/1606.06565​)
b. Learning from Human Preferences (​https://arxiv.org/abs/1706.03741​)
c. Constrained Policy Optimization (​https://arxiv.org/abs/1705.10528​)
d. Safe Exploration in Continuous Action Spaces (​https://arxiv.org/abs/1801.08757​)
e. Trial Without Error (​https://arxiv.org/abs/1707.05173​)
f. Leave No Trace (​https://arxiv.org/abs/1711.06782​)

11. Imitation Learning and Inverse Reinforcement Learning
a. MaxEnt IRL Thesis (​link​)
b. Guided Cost Learning (​https://arxiv.org/abs/1603.00448​)
c. GAIL (​https://arxiv.org/abs/1606.03476​)
d. DeepMimic (​link​)
e. VAIL (​https://arxiv.org/abs/1810.00821​)
f. One-Shot High-Fidelity Imitation Learning (​https://arxiv.org/abs/1810.05017​)

12. Bonus: Classic Papers in RL Theory or RL Review
(Not necessarily deep RL, but foundational nonetheless!)

a. Policy Gradient Methods for RL with Function Approximation (link)
b. TD Learning with Function Approximation (​link​)
c. RL of Motor Skills with Policy Gradients (​link​)
d. Approximately Optimal Approximate RL (​link​)
e. A Natural Policy Gradient (​link​)
f. Algorithms for Reinforcement Learning (Szepesvari) (​link​)

Other:
● Unicorn: Continual Learning (​https://arxiv.org/pdf/1802.08294.pdf​)
● Learning by Playing (​https://arxiv.org/pdf/1802.10567.pdf​)
