Good way to build accurate contract provision extraction models?
tl;dr: If you wouldn’t trust the people who instructed a contract review system to accurately review contracts, can you trust results from the system they built? Yet some automated contract provision extraction systems are instructed by under-qualified people.
It’s quiz time.
Do you need to know how to do a task well to instruct a computer program to do that task well?
Is contract review simple enough that anyone can do it well? Would a random person off the street accurately identify contract provisions? How about someone with a consulting background who knows how to write regular expressions, a paralegal, or even a fresh law school graduate? How accurate would they be? How would they compare with someone who had spent years as a corporate lawyer at a leading law firm? Whom would you choose to extract data from contracts if cost were no object and you needed it done right?
Contract provision extraction systems apply the judgement of the people who instructed them to decide whether new text is a certain contract provision. If you wouldn’t trust the contract review judgements of the people who instructed the contract provision extraction system, how can you trust the system they helped build?
The last six posts in the Contract Review Software Buyer’s Guide have focused on the different technologies automated contract provision extraction system providers use to get their products to find contract provisions, and why these differences matter. This and the next post focus on why the people instructing contract provision extraction systems are also critical.
To recap, there are three main ways to build provision extraction models:
- Rules - People create rules (aka “policies”) describing contract provisions, and text captured by these manual rules is extracted. Rule-based extraction is relatively easy to set up and build out, but will give middling results on unfamiliar agreements and poor quality scans, and it is hard for builders to know their system’s accuracy on unfamiliar agreements.
- Comparison - Builders add provision examples to a database and set a similarity threshold (e.g., ≥85% similar to clauses in the database). Text that falls within the similarity threshold is extracted. Comparison technology itself is easy to build out, though assembling provision examples isn’t. Comparison-based systems will likely have middling results on unfamiliar agreements and poor quality scans, and it is hard for builders to know their system’s accuracy on unfamiliar agreements.
- Machine Learning - Specialized algorithms build probabilistic models from examples of contract provisions. We found it took a lot of effort to get machine learning technology to work well at learning sentence- to paragraph-length text, and it also took a lot of work to assemble sufficient provision examples. The pluses of this technological approach are (i) that once it works it gives robust models that perform well on unfamiliar agreements and poor quality scans, and (ii) builders can test their system’s accuracy on unfamiliar agreements.
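To make the first two approaches concrete, here is a minimal, hypothetical sketch in Python. The regex rule and the token-overlap similarity are toy stand-ins of my own invention: real rule-based systems use far more elaborate patterns, and real comparison-based systems use much richer text representations than bag-of-words overlap.

```python
import re

def rule_extract(text: str) -> bool:
    """Rule-based approach: flag text matching a hand-written pattern.
    This is a toy rule for governing-law clauses, for illustration only."""
    return bool(re.search(r"governing law|governed by the laws? of", text, re.IGNORECASE))

def tokens(text: str) -> set:
    """Lowercase word tokens with punctuation stripped."""
    return set(re.findall(r"[a-z]+", text.lower()))

def similarity(a: str, b: str) -> float:
    """Jaccard overlap between two clauses' token sets (a crude
    stand-in for whatever similarity measure a real system uses)."""
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def comparison_extract(candidate: str, examples: list, threshold: float = 0.85) -> bool:
    """Comparison-based approach: extract the candidate clause if it
    clears the similarity threshold against any stored example."""
    return any(similarity(candidate, ex) >= threshold for ex in examples)

# Example: a provision database with one assignment clause.
examples = [
    "This Agreement may not be assigned without the prior written "
    "consent of the other party."
]
candidate = ("The Agreement may not be assigned without prior written "
             "consent of the other party.")
```

With this database, `comparison_extract(candidate, examples)` returns `True`, since the near-identical wording clears the 85% threshold, while an unrelated clause like a payment term would not. Both sketches also show the weakness discussed above: a clause phrased in unfamiliar language, or mangled by a poor scan, can slip below the threshold or past the pattern entirely.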
Is Law School or Experience Writing Regular Expressions a Good Qualification for Building Provision Models?
The right underlying technology is a critical element of accurate contract provision extraction. But it’s not the only thing that matters. The technology is essentially a vehicle for transforming a builder’s guidance on what constitutes a given contract provision into a model of that provision, and the guidance itself is the other critical piece of what makes a contract provision extraction system accurate. In a rules-based system, “guidance” and “instruction” mean the rules used; in comparison and machine learning-based systems, they mean the provision examples used to stock the comparison database or compose the provision training set, respectively. Here’s the core takeaway from this post:
Contract provision extraction systems are only as good as the data they are trained or instructed with. Training data or rules are assembled by people. These people need to know what they are doing. Garbage in, garbage out.
You would think, given how important quality input data is, that automated contract abstraction providers would put a lot of effort into making sure the people training their systems were capable of doing this work well. Our sense is that this is often not the case. We know one provider used paralegals to assemble its provision training set, and we are fairly sure another heavily uses entry-level lawyers, while we think a third uses staff with a background in consulting and technology who know how to write extraction rules using regular expressions. Junior lawyers at the world’s best law firms regularly make mistakes in due diligence contract review (a big reason we built our system); this work is very hard. And we are not talking about other providers using junior lawyers from the world’s best firms. How well will a system work when built on the labor of people less qualified than those who already make mistakes at this work?
Here’s how we assemble the training and test data used in our system: A team member with years of experience as a corporate lawyer at a well-regarded Biglaw firm has inspected every single contract provision in our training set, in many cases multiple times. We then have our technology inspect their work, and typically manually re-review any discrepancies. And the experienced lawyer then re-reviews many contracts. I personally did this work day in, day out for more than the first year of our existence. We think this human element is a critical reason we are the only provider in our area to disclose our system’s accuracy on unfamiliar agreements.
Experienced contract reviewers are the foundation of accurate contract review software. We’ll cover a few related issues on this theme in the next post.