What is Character Mining?
Character Mining is an ongoing project of the Emory NLP research group, started in 2015. The project currently focuses on generating document-level semantic representations, consisting of relations between entities and attributes, for multi-party dialogue. For the example below (excerpted from the famous TV show Friends), the project aims to build a semantic graph representing the whole dialogue:
|Doctor||I'm getting three separate heartbeats.|
|Phoebe||Three? You guys were worried I wouldn’t even have one!|
|Rachel||Well, so, are-are you sure that there are three?!|
|Phoebe||Oh my God! So I-I mean so in a few months I'm going to have three full grown babies just walkin’ around inside me?!|
The long-term goal of this project is to develop a machine comprehension system that understands human dialogue and answers questions about the context of that dialogue.
We introduce a new entity linking task, called character identification, that links mentions in multi-party dialogue to their referent entities. Mentions in this task are nominals that refer to humans, and entities are specific characters in the TV show. For example, mentions such as you, mom, and Ross are linked to specific characters in the show where applicable.
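To make the shape of the task concrete, here is a minimal Python sketch of how mentions and their links might be represented. Everything in it is an assumption made for illustration: the `Utterance` and `Mention` classes, their fields, and the `link_first_person` function are hypothetical and do not reflect the corpus format or the project's model. The linking rule is a deliberately trivial stand-in that resolves first-person pronouns to the current speaker and leaves every other mention unlinked.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class Utterance:
    speaker: str        # character who speaks the utterance
    tokens: List[str]   # tokenized text of the utterance


@dataclass(frozen=True)
class Mention:
    utterance_index: int           # position of the utterance in the dialogue
    token_index: int               # position of the mention token in the utterance
    entity: Optional[str] = None   # linked character, or None if unlinked


# Hypothetical word list for the toy rule below.
FIRST_PERSON = {"i", "me", "my", "myself"}


def link_first_person(dialogue: List[Utterance], mentions: List[Mention]) -> List[Mention]:
    """Toy linker: first-person pronouns resolve to the speaker, all else stays None."""
    linked = []
    for mention in mentions:
        utterance = dialogue[mention.utterance_index]
        token = utterance.tokens[mention.token_index].lower()
        entity = utterance.speaker if token in FIRST_PERSON else None
        linked.append(replace(mention, entity=entity))
    return linked


dialogue = [
    Utterance("Doctor", ["I'm", "getting", "three", "separate", "heartbeats", "."]),
    Utterance("Phoebe", ["Three", "?", "You", "guys", "were", "worried", "I",
                         "wouldn't", "even", "have", "one", "!"]),
]

# Two candidate mentions in Phoebe's utterance: "You" (token 2) and "I" (token 6).
mentions = [Mention(1, 2), Mention(1, 6)]
linked = link_first_person(dialogue, mentions)

for mention in linked:
    token = dialogue[mention.utterance_index].tokens[mention.token_index]
    print(token, "->", mention.entity)
```

Under this toy rule, "I" is linked to Phoebe while "You" stays unlinked. Resolving second-person and other mentions is where the task becomes difficult, since it requires context from the surrounding dialogue.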
- Character mining corpus: v1.0.
- Character identification demo (coming soon).
- Character Identification on Multiparty Conversation: Identifying Mentions of Characters in TV Shows, Henry Yu-Hsin Chen, Jinho D. Choi, Proceedings of the 17th Annual SIGdial Meeting on Discourse and Dialogue, SIGDIAL'16, Los Angeles, CA, 2016.