Deep Learning

Review - Decision Transformer & SPIRAL

This week’s Deep Learning Paper Reviews are Decision Transformer: Reinforcement Learning via Sequence Modeling and SPIRAL: Self-supervised Perturbation-Invariant Representation Learning for Speech Pre-Training.

Decision Transformer: Reinforcement Learning via Sequence Modeling

What’s Exciting About this Paper

Traditionally, Reinforcement Learning (RL) problems are solved using Temporal Difference (TD) learning. Decision Transformer instead casts RL as a sequence modeling task, where the goal is to model the next action given both the current and past states, actions, and rewards, along with the desired cumulative future reward (the return-to-go) from the current time step. This method trains the agent to identify the optimal action to take given its full history and the goal (future cumulative reward) it is asked to achieve.
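
A minimal PyTorch sketch of this framing is shown below. The layer sizes, the continuous-action head, and the per-timestep token interleaving are illustrative assumptions for exposition, not the paper’s exact architecture or hyperparameters.

```python
import torch
import torch.nn as nn

class MinimalDecisionTransformer(nn.Module):
    """Toy return-conditioned sequence model: embed (return-to-go, state, action)
    triplets per timestep, interleave them into one sequence, and predict the next
    action with a causally masked transformer."""

    def __init__(self, state_dim, act_dim, d_model=128, n_heads=4, n_layers=3, max_len=64):
        super().__init__()
        self.act_dim = act_dim
        self.embed_rtg = nn.Linear(1, d_model)           # scalar return-to-go per step
        self.embed_state = nn.Linear(state_dim, d_model)
        self.embed_action = nn.Linear(act_dim, d_model)
        self.embed_time = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.predict_action = nn.Linear(d_model, act_dim)

    def forward(self, rtg, states, actions, timesteps):
        # rtg: (B, T, 1), states: (B, T, state_dim), actions: (B, T, act_dim), timesteps: (B, T)
        t_emb = self.embed_time(timesteps)
        r = self.embed_rtg(rtg) + t_emb
        s = self.embed_state(states) + t_emb
        a = self.embed_action(actions) + t_emb
        # interleave into (R_1, s_1, a_1, R_2, s_2, a_2, ...): shape (B, 3T, d_model)
        tokens = torch.stack([r, s, a], dim=2).reshape(r.size(0), -1, r.size(-1))
        mask = nn.Transformer.generate_square_subsequent_mask(tokens.size(1))
        hidden = self.encoder(tokens, mask=mask)
        # the action at step t is read off the hidden state of the state token s_t
        return self.predict_action(hidden[:, 1::3])
```

Training then reduces to supervised learning on offline trajectories: feed in the recorded returns-to-go, states, and actions, and minimize, for example, the mean-squared error between predicted and logged actions.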

Key Findings

The authors evaluated the Decision Transformer (DT) on Atari and OpenAI Gym benchmarks by training on trajectories collected from expert and mediocre policies. They found that DT outperforms behavior cloning and is competitive with state-of-the-art TD learning techniques. In addition, the authors found that DT is particularly capable in tasks requiring long-term credit assignment: on the Key-to-Door task, DT significantly outperformed both baselines despite being trained only on random rollouts.

Our Takeaways

Decision Transformer (DT) is one of the first successful attempts at casting RL as a sequence modeling problem, and it achieves state-of-the-art performance.

The policy learned by DT is often superior to the policy used to generate the offline data because DT can “stitch together” segments of suboptimal trajectories into better ones.
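
A practical detail behind this return-conditioned behavior is how the model is rolled out: at evaluation time the agent is conditioned on a desired return, which is decremented by each observed reward as the episode unfolds. The sketch below illustrates that loop for the toy model above; the function name, the classic four-tuple Gym step API, and the placeholder action handling are assumptions made for illustration.

```python
import torch

def rollout_with_target_return(env, model, target_return, max_steps=1000):
    """Illustrative evaluation loop: condition on a desired return, then decrement
    the remaining return-to-go by each observed reward (assumes the classic Gym API)."""
    states, actions, rtgs, timesteps = [], [], [], []
    state = env.reset()
    remaining = float(target_return)
    total_reward = 0.0
    for t in range(max_steps):
        states.append(torch.as_tensor(state, dtype=torch.float32))
        rtgs.append(torch.tensor([remaining], dtype=torch.float32))
        timesteps.append(t)
        actions.append(torch.zeros(model.act_dim))  # placeholder for the action about to be predicted
        with torch.no_grad():
            # a real implementation would also truncate the history to the model's context window
            preds = model(
                torch.stack(rtgs)[None],        # (1, T, 1)
                torch.stack(states)[None],      # (1, T, state_dim)
                torch.stack(actions)[None],     # (1, T, act_dim)
                torch.tensor(timesteps)[None],  # (1, T)
            )
        action = preds[0, -1]
        actions[-1] = action                    # record the action actually taken
        state, reward, done, _ = env.step(action.numpy())
        remaining -= reward                     # the return still to be achieved shrinks
        total_reward += reward
        if done:
            break
    return total_reward
```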

SPIRAL: Self-supervised Perturbation-Invariant Representation Learning for Speech Pre-Training

What’s Exciting About this Paper

SPIRAL draws inspiration from BYOL and offers an alternative speech pre-training method for ASR that claims to reduce training costs by 80% compared to wav2vec2 for the base model and 65% for the large model.
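
The core idea is a BYOL-style teacher-student setup: the student encodes a perturbed (for example, noise-augmented) utterance, a slowly updated teacher encodes the clean utterance, and the student is trained so its representation stays invariant to the perturbation. The sketch below shows that pattern in PyTorch; the function names, the EMA decay, and the simple MSE consistency loss are illustrative assumptions, not SPIRAL’s exact objective.

```python
import torch
import torch.nn.functional as F

def perturbation_invariance_step(student, teacher, clean_batch, perturb, optimizer, ema_decay=0.99):
    """Illustrative pre-training step: the student sees a perturbed utterance, the
    EMA teacher sees the clean one, and the student learns to match the teacher.
    (MSE stands in for the paper's actual objective; names here are assumptions.)"""
    student_out = student(perturb(clean_batch))      # (B, T, D) frame representations
    with torch.no_grad():
        teacher_out = teacher(clean_batch)           # targets, no gradient through the teacher
    loss = F.mse_loss(student_out, teacher_out)      # perturbation-invariance consistency loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # update the teacher as an exponential moving average of the student weights
    with torch.no_grad():
        for p_s, p_t in zip(student.parameters(), teacher.parameters()):
            p_t.mul_(ema_decay).add_(p_s, alpha=1 - ema_decay)
    return loss.item()
```

The teacher can be initialized as a copy of the student (for example, copy.deepcopy(student)) and is never updated by gradient descent, which keeps the student’s targets stable.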

Key Findings

The researchers use a gradual down-sampling strategy in the model to reduce the amount of computation required during training. They speculate that this removes redundancy in a way that is similar to quantization. The LL-60k SPIRAL model needed 500k training steps and 232 GPU days, while a standard wav2vec2 model needs 1000k training steps and 665.6 GPU days. They also showed that the SPIRAL model handles noisy inputs better by evaluating both models on the Amazon CHiME dataset.
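
Gradual down-sampling simply means the convolutional front-end reduces the frame rate in stages, so the expensive transformer layers see much shorter sequences. The snippet below illustrates the effect with made-up channel sizes and strides, not SPIRAL’s actual configuration.

```python
import torch
import torch.nn as nn

# Illustrative encoder front-end: each strided Conv1d halves the frame rate,
# so later (more expensive) layers operate on progressively shorter sequences.
gradual_downsampler = nn.Sequential(
    nn.Conv1d(80, 256, kernel_size=5, stride=2, padding=2), nn.GELU(),
    nn.Conv1d(256, 384, kernel_size=5, stride=2, padding=2), nn.GELU(),
    nn.Conv1d(384, 512, kernel_size=5, stride=2, padding=2), nn.GELU(),
)

features = torch.randn(8, 80, 1600)      # (batch, mel bins, frames): ~16 s at 100 frames/s
encoded = gradual_downsampler(features)  # (8, 512, 200): 8x fewer frames reach the transformer
print(encoded.shape)
```

Because self-attention cost grows quadratically with sequence length, an 8x reduction in frames translates into a large saving in compute per utterance.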

Our Takeaways

SPIRAL is an alternative self-supervised learning (SSL) pre-training method that, through aggressive down-sampling, allows for up to 80% training cost savings while still matching the performance of wav2vec2. Additionally, the noise perturbation mechanism the authors propose might be useful in other SSL pipelines.