What is Character Mining?

Character Mining has been an ongoing project of the Emory NLP research group since 2015. The project currently focuses on generating document-level semantic representations that capture relations between entities and attributes in multi-party dialogue. For the example below (excerpted from the famous TV show Friends), the project aims to build a semantic graph representing the whole dialogue:

Doctor I'm getting three separate heartbeats.
Phoebe Three? You guys were worried I wouldn’t even have one!
Rachel Well, so, are-are you sure that there are three?!
Doctor Definitely.
Phoebe Oh my God! So I-I mean so in a few months I'm going to have three full grown babies just walkin’ around inside me?!

The long-term goal of this project is to develop a machine comprehension system that understands human dialogue and answers questions about its context.

Character Identification

We introduce a new entity linking task, called character identification, that links mentions in multi-party dialogue to their referent entities. In this task, mentions are nominals that refer to humans, and entities are specific characters in the TV show. In the example below, mentions (e.g., you, mom, Ross) are linked to specific characters in the show where applicable: