Toward Robust Abstractive Multi-Document Summarization and Information Consolidation
Abstract
Humans can consolidate textual information from multiple sources and organize the content into a coherent summary. Can machines be taught to do the same? The most important obstacles facing multi-document summarization include excessive redundancy in source content, poorly understood sentence fusion, and the looming shortage of training data. In this talk, I will present our recent work tackling these issues by decoupling content selection from surface realization. I will describe a lightly supervised optimization framework that uses determinantal point processes (DPPs) for content selection. I will further present a new method that leverages DPPs to select self-contained summary segments, which are highlighted in the source documents to make it easier for users to navigate large amounts of text. Finally, I will discuss challenges and opportunities for driving forward research on abstractive multi-document summarization.