The Utility of Context When Extracting Entities From Legal Documents


Jonathan Donnelly, Adam Roegiest

Publication Date: October 2020


CIKM 2020



When reviewing documents for legal tasks such as Mergers and Acquisitions, lawyers need to identify and extract granular information, such as start dates and exit clauses. Inspired by previous work in Named Entity Recognition (NER), we investigate how NER techniques can be leveraged to aid lawyers in this review process. Because the target information occurs with extremely low prevalence in legal documents, we find that the traditional approach of tagging every sentence in a document is inferior to a two-stage approach: a first-pass layer identifies sentences that are likely to contain the relevant information, and the more traditional sentence-level sequence tagging is then run on those sentences. The traditional approach is worse both in effectiveness and in the amount of data required for training and prediction.
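The two-stage pipeline described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not specify the models, so `sentence_score` (the first-pass classifier), `tag_sentence` (the sequence tagger), the `threshold` value, and the toy stand-ins `toy_score` and `toy_tagger` are all hypothetical placeholders.

```python
from typing import Callable, List, Tuple

# A tagged sentence: list of (token, label) pairs; label names are illustrative only.
Tagged = List[Tuple[str, str]]


def two_stage_extract(
    sentences: List[str],
    sentence_score: Callable[[str], float],
    tag_sentence: Callable[[str], Tagged],
    threshold: float = 0.5,
) -> List[Tuple[int, Tagged]]:
    """Run the sequence tagger only on sentences that the first-pass
    classifier scores at or above `threshold` (hypothetical cutoff)."""
    results = []
    for idx, sentence in enumerate(sentences):
        # First pass: sentence-level relevance score for every sentence.
        if sentence_score(sentence) < threshold:
            continue  # most sentences are irrelevant, so they never reach the tagger
        # Second pass: token-level tagging on the candidate sentence only.
        results.append((idx, tag_sentence(sentence)))
    return results


# --- Hypothetical stand-ins for trained models (illustration only) ---
def toy_score(sentence: str) -> float:
    # Pretend classifier: flags sentences that mention a commencement.
    return 1.0 if "commence" in sentence.lower() else 0.0


def toy_tagger(sentence: str) -> Tagged:
    # Pretend tagger: labels a few month names as DATE, everything else as O.
    months = {"january", "february", "march"}
    return [(tok, "DATE" if tok.lower().strip(",.") in months else "O")
            for tok in sentence.split()]


if __name__ == "__main__":
    doc = [
        "This Agreement is governed by the laws of Ontario.",
        "The term shall commence on January 1, 2021.",
        "Notices must be delivered in writing.",
    ]
    for idx, tags in two_stage_extract(doc, toy_score, toy_tagger):
        print(idx, [t for t in tags if t[1] != "O"])
```

The design point the sketch is meant to show is that the tagger's cost and training data scale with the number of candidate sentences rather than with the full document, which matters when relevant sentences are rare.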

Moreover, we find that these entity-level models can be improved by training them on a balanced sample of relevant and non-relevant sentences. We also describe how our system is used in production, and how clients' usage of it means that deep learning architectures tend to be cost-inefficient, especially with respect to the time needed to train models.
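One simple way to build a balanced training sample for the entity-level model is to downsample so that relevant and non-relevant sentences appear in equal numbers. The sketch below assumes a 1:1 ratio achieved by random downsampling; the abstract does not state the exact ratio or sampling procedure, and the function name `balanced_sample` is hypothetical.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def balanced_sample(
    relevant: Sequence[T],
    non_relevant: Sequence[T],
    seed: int = 0,
) -> List[Tuple[T, int]]:
    """Return a shuffled list of (sentence, label) pairs with equal numbers of
    relevant (label 1) and non-relevant (label 0) sentences.

    Assumption: a 1:1 ratio, obtained by randomly downsampling whichever
    class is larger (in practice, usually the non-relevant sentences)."""
    rng = random.Random(seed)  # seeded for reproducible samples
    k = min(len(relevant), len(non_relevant))
    positives = rng.sample(list(relevant), k)
    negatives = rng.sample(list(non_relevant), k)
    data = [(s, 1) for s in positives] + [(s, 0) for s in negatives]
    rng.shuffle(data)
    return data


if __name__ == "__main__":
    pos = ["The term shall commence on January 1, 2021."]
    neg = [f"Boilerplate sentence {i}." for i in range(100)]
    train = balanced_sample(pos, neg, seed=42)
    print(len(train), sum(label for _, label in train))
```

Downsampling discards most non-relevant sentences, which also shrinks the training set and therefore training time; whether that trade-off holds for a given model is an assumption worth checking empirically.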
