Picking Apart Story Salads

Abstract: During natural disasters and conflicts, information about what happened is often confusing, messy, and distributed across many sources. We would like to be able to automatically identify relevant information and assemble it into coherent narratives of what happened. To make this task accessible to neural models, we introduce Story Salads, mixtures of multiple documents that can be generated at scale. By exploiting the Wikipedia hierarchy, we can generate salads that exhibit challenging inference problems. Story salads give rise to a novel, challenging clustering task, where the objective is to group sentences from the same narratives. We demonstrate that simple bag-of-words similarity clustering falls short on this task and that it is necessary to take into account global context and coherence.
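
To make the task concrete, here is a minimal Python sketch of a two-document salad and a bag-of-words clustering baseline of the kind the abstract says falls short. It is an illustration only, not the authors' code or data pipeline: the function names (make_salad, cluster_two, accuracy), the toy sentences, and the choice of a simple 2-means procedure with cosine similarity are all assumptions made for this example.

```python
import math
import random
from collections import Counter


def make_salad(doc_a, doc_b, seed=0):
    """Mix the sentences of two documents and keep the gold source labels."""
    items = [(s, 0) for s in doc_a] + [(s, 1) for s in doc_b]
    random.Random(seed).shuffle(items)
    return [s for s, _ in items], [y for _, y in items]


def bow(sentence):
    """Bag-of-words vector: lowercased, whitespace-tokenized word counts."""
    return Counter(sentence.lower().split())


def cosine(u, v):
    """Cosine similarity between two Counter vectors (0.0 if either is empty)."""
    dot = sum(count * v[word] for word, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def cluster_two(sentences, iters=10):
    """Hypothetical baseline: 2-means over bag-of-words with cosine similarity."""
    vecs = [bow(s) for s in sentences]
    # Seed with the first sentence and the sentence least similar to it.
    far = min(range(len(vecs)), key=lambda i: cosine(vecs[0], vecs[i]))
    centroids = [Counter(vecs[0]), Counter(vecs[far])]
    assign = []
    for _ in range(iters):
        new_assign = [
            0 if cosine(v, centroids[0]) >= cosine(v, centroids[1]) else 1
            for v in vecs
        ]
        if new_assign == assign:
            break
        assign = new_assign
        # Recompute each centroid as the summed counts of its members.
        centroids = [Counter(), Counter()]
        for v, a in zip(vecs, assign):
            centroids[a].update(v)
    return assign


def accuracy(pred, gold):
    """Clustering accuracy for two clusters, invariant to swapping cluster ids."""
    agree = sum(p == g for p, g in zip(pred, gold))
    return max(agree, len(gold) - agree) / len(gold)


# Toy, made-up narratives (not drawn from the Story Salad data).
doc_a = [
    "The earthquake struck the coastal city before dawn.",
    "Rescue teams searched collapsed buildings for survivors.",
    "Officials said hundreds of people were displaced.",
]
doc_b = [
    "Heavy rain caused the river to flood the valley.",
    "Rescue teams evacuated families by boat.",
    "Officials said the flood damaged several bridges.",
]

sentences, gold = make_salad(doc_a, doc_b, seed=0)
pred = cluster_two(sentences)
print(accuracy(pred, gold))
```

The toy documents deliberately share surface vocabulary ("Rescue teams", "Officials said"), which is the kind of overlap that makes word-level similarity an unreliable signal and motivates models that use global context and coherence.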

Paper: Su Wang, Eric Holgate, Greg Durrett, and Katrin Erk. Picking Apart Story Salads. Proceedings of EMNLP.

Code: on GitHub

Story Salad Data: