Deep Learning

Review - SimCLS and RefSum - Summarization Techniques

This week’s Deep Learning research papers are SimCLS: A Simple Framework for Contrastive Learning of Abstractive Summarization and RefSum: Refactoring Neural Summarization.

What’s Exciting About These Papers

Natural Language Generation (NLG) tasks such as text summarization are among the hardest problems in Natural Language Processing (NLP) and Automatic Speech Recognition (ASR), largely because of the discrepancy between training and testing. During training, these models are typically optimized with Maximum Likelihood Estimation (MLE) under autoregressive teacher forcing, where the decoder conditions on the gold summary; at test time, the decoder conditions on its own previous predictions and selects the most probable token at each step to generate a summary.
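
To make the train/test mismatch concrete, here is a minimal sketch (assuming PyTorch and Hugging Face transformers, with the facebook/bart-large-cnn checkpoint as an illustrative choice) contrasting the teacher-forced MLE loss computed during training with the autoregressive decoding used at test time.

```python
import torch
from transformers import BartTokenizer, BartForConditionalGeneration

tokenizer = BartTokenizer.from_pretrained("facebook/bart-large-cnn")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-large-cnn")

article = "Some long news article text ..."
reference = "A human-written reference summary."

inputs = tokenizer(article, return_tensors="pt", truncation=True)
labels = tokenizer(reference, return_tensors="pt", truncation=True).input_ids

# Training view: the decoder sees the gold summary shifted right (teacher forcing)
# and the objective is token-level negative log-likelihood (MLE).
loss = model(**inputs, labels=labels).loss

# Test-time view: the decoder only sees its own previous predictions and picks
# the locally most probable token at each step (greedy decoding here).
with torch.no_grad():
    generated = model.generate(**inputs, max_length=60, num_beams=1)
print(tokenizer.decode(generated[0], skip_special_tokens=True))
```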

Although techniques such as beam search can mitigate this issue, the final evaluation metrics (ROUGE scores for summarization) often don’t map cleanly to the log-likelihood used to select the optimal beam. These papers introduce a “generate then evaluate” approach: a generative summarization model is stacked with a discriminative scoring model, so candidate summaries are first generated and then scored, allowing the system to select the candidate that best aligns with the final evaluation metric.
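
A sketch of that two-stage pipeline is below: diverse beam search proposes several candidate summaries, and a separate scorer reranks them. The `score_candidate` callable is a placeholder for any discriminative scorer (for example, the SimCLS-style scorer sketched later), not an API from either paper; the generation settings are illustrative.

```python
import torch
from transformers import BartTokenizer, BartForConditionalGeneration

tokenizer = BartTokenizer.from_pretrained("facebook/bart-large-cnn")
generator = BartForConditionalGeneration.from_pretrained("facebook/bart-large-cnn")

def generate_candidates(article: str, num_candidates: int = 8) -> list[str]:
    inputs = tokenizer(article, return_tensors="pt", truncation=True)
    with torch.no_grad():
        # Diverse beam search so the candidates are not near-duplicates.
        outputs = generator.generate(
            **inputs,
            max_length=80,
            num_beams=num_candidates,
            num_beam_groups=num_candidates,
            diversity_penalty=1.0,
            num_return_sequences=num_candidates,
        )
    return [tokenizer.decode(o, skip_special_tokens=True) for o in outputs]

def rerank(article: str, candidates: list[str], score_candidate) -> str:
    # Pick the candidate the discriminative scorer likes best, rather than
    # trusting the beam with the highest log-likelihood.
    return max(candidates, key=lambda c: score_candidate(article, c))
```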


Key Findings

RefSum set State-of-the-Art (SOTA) summarization performance on both the CNN/DailyMail and XSum datasets at the time of its publication by unifying its base and meta summarization systems. This showed that the highest-probability output from beam search isn’t always the best solution, and that “scoring models” can be used to rerank or stack candidates from either a single model or multiple models to boost downstream performance.
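
A quick oracle check of that observation, using the rouge_score package: compute ROUGE for every candidate against the reference and see how far the ROUGE-best candidate sits from the top beam. The `candidates` list is assumed to come from a generation step like the sketch above, ordered by beam-search score.

```python
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)

def rouge_avg(reference: str, candidate: str) -> float:
    # Average F1 across ROUGE-1/2/L for a single candidate.
    scores = scorer.score(reference, candidate)
    return sum(s.fmeasure for s in scores.values()) / len(scores)

def oracle_gap(reference: str, candidates: list[str]) -> tuple[int, float]:
    # Index of the ROUGE-best candidate and its gain over the top beam (index 0).
    rouges = [rouge_avg(reference, c) for c in candidates]
    best = max(range(len(rouges)), key=rouges.__getitem__)
    return best, rouges[best] - rouges[0]
```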

SimCLS advanced and simplified this idea, while also setting a new SOTA, by training its scoring model in a contrastive learning setting: BART generates candidate summaries, and RoBERTa is trained to judge how well each candidate summarizes the source text by comparing their embeddings.
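
Below is a minimal sketch of a SimCLS-style scorer, assuming RoBERTa from Hugging Face transformers: the document and each candidate are encoded separately, and the score is the cosine similarity of their first-token embeddings. The scorer is trained with a ranking loss so that candidates with higher ROUGE receive higher scores; the margin value and training details here are illustrative rather than the authors’ exact configuration.

```python
import torch
import torch.nn.functional as F
from transformers import RobertaTokenizer, RobertaModel

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
encoder = RobertaModel.from_pretrained("roberta-base")

def embed(text: str) -> torch.Tensor:
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    # First-token (<s>) hidden state as the sequence representation.
    return encoder(**inputs).last_hidden_state[:, 0, :]

def score(document: str, candidate: str) -> torch.Tensor:
    # Higher cosine similarity = the scorer thinks this candidate fits the document better.
    return F.cosine_similarity(embed(document), embed(candidate), dim=-1)

def ranking_loss(doc_emb: torch.Tensor, cand_embs_sorted_by_rouge: torch.Tensor,
                 margin: float = 0.01) -> torch.Tensor:
    # Candidates are pre-sorted so index i has higher ROUGE than index j > i; the loss
    # pushes the scorer to preserve that ordering, with a margin that grows with distance.
    scores = F.cosine_similarity(doc_emb, cand_embs_sorted_by_rouge, dim=-1)
    loss = scores.new_zeros(())
    for i in range(len(scores)):
        for j in range(i + 1, len(scores)):
            loss = loss + F.relu(scores[j] - scores[i] + (j - i) * margin)
    return loss
```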

Our Takeaways

These papers show that a relatively simple, pipelined multi-model approach to summarization can lead to the best downstream performance. “Generate then evaluate” approaches are an interesting area of research and could potentially yield SOTA performance in other generation tasks such as speech synthesis, image generation, and question answering.