PhD Defence | Entity-centric Document Understanding: Entity Aspects and Salience
Chuan Wu’s thesis aims to enhance document understanding by using entity aspects and entity salience information. Wu completed his research under the supervision of Evangelos Kanoulas and Maarten de Rijke (both from the University of Amsterdam).
The volume of information has increased dramatically and has led to an information overload for the general public. Automated information processing techniques are widely used to improve understanding of textual documents. Textual documents usually describe stories centered around particular news, topics, or events. Entities often play an important role in formulating the main thread of the stories.
In his thesis, Wu first focuses on improving document understanding with entity aspects. Assuming that entities have multiple aspects, he learns entity-centric document representations that reflect those aspects. He proposes a neural network-based algorithm for entity aspect linking, which links entities in documents to their aspects in a knowledge base. He then devises a novel graphical model that incorporates entity salience information into a document generative process. Wu also presents a new dataset to support research on entity salience tasks such as entity salience detection and salient entity linking.