The papers highlighted in blue are more important: even if you skip all of the others, you need these on your radar.
1. Model-Free RL
a. Deep Q-Learning
i. Original Deep Q-Learning Paper (https://www.cs.toronto.edu/~vmnih/docs/dqn.pdf)
ii. Deep Recurrent Q-Learning (https://arxiv.org/abs/1507.06527)
iii. Dueling Architectures (https://arxiv.org/abs/1511.06581)
iv. Deep Double Q-Learning (https://arxiv.org/abs/1509.06461)
v. Prioritized Replay (https://arxiv.org/abs/1511.05952)
vi. Rainbow (https://arxiv.org/abs/1710.02298)
b. Policy Gradients
i. A2C / A3C (https://arxiv.org/abs/1602.01783)
ii. TRPO (https://arxiv.org/abs/1502.05477)
iii. TRPO+GAE (https://arxiv.org/abs/1506.02438)
iv. ACKTR (https://arxiv.org/abs/1708.05144)
v. PPO (https://arxiv.org/abs/1707.06347, https://arxiv.org/abs/1707.02286)
vi. ACER (https://arxiv.org/abs/1611.01224)
vii. Soft Actor-Critic (https://arxiv.org/abs/1801.01290)
d. Distributional RL
i. C51 (https://arxiv.org/abs/1707.06887)
ii. QR-DQN (https://arxiv.org/abs/1710.10044)
iii. IQN (https://arxiv.org/abs/1806.06923)
iv. Dopamine (code library) (paper link, code link)
f. Path-Consistency Learning
i. Original PCL Paper (https://arxiv.org/abs/1702.08892)
ii. Trust PCL (https://arxiv.org/abs/1707.01891)
h. Evolutionary Algorithms
i. Evolution Strategies (https://arxiv.org/abs/1703.03864)
2. Exploration
a. Intrinsic Motivation
i. VIME (https://arxiv.org/abs/1605.09674)
ii. Count-Based
1. Original pseudocounts paper (https://arxiv.org/abs/1606.01868)
2. Neural Density Models (https://arxiv.org/abs/1703.01310)
3. Hashing (https://arxiv.org/abs/1611.04717)
4. EX2 (https://arxiv.org/abs/1703.01260)
iii. Self-Supervised Prediction (https://arxiv.org/abs/1705.05363)
iv. Large-Scale Study of Curiosity (https://arxiv.org/abs/1808.04355)
v. Random Network Distillation (https://arxiv.org/abs/1810.12894)
b. Unsupervised RL
i. Variational Intrinsic Control (https://arxiv.org/abs/1611.07507)
ii. Diversity is All You Need (https://arxiv.org/abs/1802.06070)
iii. Variational Option Discovery Algorithms (https://arxiv.org/abs/1807.10299)
4. Hierarchy
a. Feudal Networks (https://arxiv.org/abs/1703.01161)
b. Strategic Attentive Writer (https://arxiv.org/abs/1606.04695)
c. Data-Efficient Hierarchical RL (https://arxiv.org/abs/1805.08296)
5. Fast Memory
a. Model-Free Episodic Control (https://arxiv.org/abs/1606.04460)
b. Neural Episodic Control (https://arxiv.org/abs/1703.01988)
c. Neural Map (https://arxiv.org/abs/1702.08360)
d. MERLIN (https://arxiv.org/abs/1803.10760)
e. Relational RNNs (https://arxiv.org/abs/1806.01822)
6. Model-Based
a. Learned Model
i. Imagination-Augmented Agents (https://arxiv.org/abs/1707.06203)
ii. Model-Based Plus Model-Free Fine-tuning (https://arxiv.org/abs/1708.02596)
iii. Model-Based Value Expansion (https://arxiv.org/abs/1803.00101)
iv. Stochastic Ensemble Value Expansion (https://arxiv.org/abs/1807.01675)
v. Model Ensemble TRPO (hyperlink)
vi. MB-MPO (https://arxiv.org/abs/1809.05214)
vii. World Models (https://arxiv.org/abs/1809.01999)
b. Given Model
i. AlphaZero (https://arxiv.org/abs/1712.01815)
ii. Expert Iteration (https://arxiv.org/abs/1705.08439)
7. Meta-RL
a. RL^2: Fast RL via Slow RL (https://arxiv.org/abs/1611.02779)
b. Learning to Reinforcement Learn (https://arxiv.org/abs/1611.05763)
c. MAML (https://arxiv.org/abs/1703.03400)
d. SNAIL (link to openreview)
8. Scaling RL
a. Accelerated Methods for Deep RL (https://arxiv.org/abs/1803.02811)
b. IMPALA (https://arxiv.org/abs/1802.01561)
c. Distributed Prioritized Experience Replay (https://openreview.net/forum?id=H1Dy---0Z)
d. R2D2 (https://openreview.net/forum?id=r1lyTjAqYX)
e. RLlib: Distributed RL with Ray (https://arxiv.org/abs/1712.09381)
10. Safety
a. Concrete Problems in AI Safety (https://arxiv.org/abs/1606.06565)
b. Learning from Human Preferences (https://arxiv.org/abs/1706.03741)
c. Constrained Policy Optimization (https://arxiv.org/abs/1705.10528)
d. Safe Exploration in Continuous Action Spaces (https://arxiv.org/abs/1801.08757)
e. Trial Without Error (https://arxiv.org/abs/1707.05173)
f. Leave No Trace (https://arxiv.org/abs/1711.06782)
Other:
● Unicorn: Continual Learning (https://arxiv.org/pdf/1802.08294.pdf)
● Learning by Playing (https://arxiv.org/pdf/1802.10567.pdf)