Kira Systems quickly reads agreements for user-specified provisions (e.g., term, assignment, change of control) and puts its findings into summary charts. We say our software finds 90% or more of nearly every provision covered. If you’re itching with curiosity over what our accuracy statement actually means, this is the post for you. I’ll cover the two parts of that statement: (i) “90% or more” and (ii) “nearly every provision covered.”
We, like others in the search area, assess our accuracy by recall and precision.
- Recall - Measures the extent to which relevant results are found. For example, in a set of documents with 50 assignment provisions, if a search system found 48 of the 50, recall would be 96%.
- Precision - Measures the extent to which only relevant results are found; irrelevant results hurt precision. Effectively, precision shows what share of the returned hits are relevant rather than junk. In the recall example above, if the system presented 300 provisions as assignment hits to turn up 48 actual assignment provisions, precision would be 16%.
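To make the arithmetic concrete, here is a minimal sketch of the two measures using the numbers from the assignment example. The function and variable names are illustrative only; they are not taken from our software.

```python
def recall(true_positives: int, total_relevant: int) -> float:
    """Share of the relevant items that the system actually found."""
    return true_positives / total_relevant


def precision(true_positives: int, total_returned: int) -> float:
    """Share of the returned items that are actually relevant."""
    return true_positives / total_returned


# Numbers from the assignment-provision example above.
found_relevant = 48    # actual assignment provisions among the hits
total_relevant = 50    # assignment provisions present in the documents
total_returned = 300   # everything the system presented as a hit

print(f"recall    = {recall(found_relevant, total_relevant):.0%}")    # 96%
print(f"precision = {precision(found_relevant, total_returned):.0%}")  # 16%
```

Note that the two measures share a numerator (the relevant hits) and differ only in what they divide by: everything that should have been found, versus everything that was returned.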
There is generally a tradeoff between recall and precision. In our problem of trying to find contract provisions, one way to get perfect recall would be to return entire contracts as results. We would never miss a provision following this strategy, but precision would be terrible. Alternatively, we could make extra sure the system only ever showed users relevant results, but this could come at the cost of missing some (perhaps atypical) provisions. Neither approach is right for our users. Lawyers care a lot about not missing provisions, especially in the large transactions regularly done at Biglaw firms. But it’s hard to see a system that only turned up the odd accurate provision (such as in the 16% precision example above) helping users work more efficiently.
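The tradeoff can be illustrated with a toy example. The sketch below assumes a search system that assigns each candidate a confidence score and returns everything at or above a cutoff. That design, the scores, and the labels are all made up for illustration and say nothing about how our system works internally.

```python
# Hypothetical (score, is_relevant) pairs; toy data for illustration only.
scored_hits = [
    (0.95, True), (0.90, True), (0.80, False), (0.70, True),
    (0.60, False), (0.40, True), (0.30, False), (0.10, False),
]


def precision_recall_at(threshold, hits):
    """Return (precision, recall) when only hits scoring >= threshold are shown."""
    returned = [relevant for score, relevant in hits if score >= threshold]
    true_positives = sum(returned)
    total_relevant = sum(relevant for _, relevant in hits)
    # Convention chosen here: if nothing is returned, count precision as 1.0.
    precision = true_positives / len(returned) if returned else 1.0
    recall = true_positives / total_relevant
    return precision, recall


# Lowering the cutoff returns more results: recall rises, precision falls.
for threshold in (0.85, 0.50, 0.00):
    p, r = precision_recall_at(threshold, scored_hits)
    print(f"cutoff {threshold:.2f}: precision {p:.0%}, recall {r:.0%}")
```

With this toy data, the strictest cutoff shows only relevant results but finds half of them, while a cutoff of zero (the "return everything" strategy) finds all of them at the cost of half the results being junk.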
In our view, our users are much more focussed on recall. They really care about finding all relevant results, and even use our system as a second-level check to make sure human reviewers don’t miss anything. That said, results need to be generally relevant to be useful. So what does that all mean?
- We skew our results towards recall but make sure to have decent precision.
- Our user interface makes eliminating false positives fast and easy.
- By “90% or more,” we refer only to recall. Try the system to see how solid our precision levels are!
≥90% on Nearly Every Provision Covered
Our usual policy is to not add a provision to the system unless the software finds at least 90% of its instances in testing. We have made a small number of exceptions where (i) recall was high, though below 90%, and (ii) we felt the provision would be useful to our users at the recall level we found. Again, nearly all of our provisions have recall of at least 90%. Consider whether your human-done diligence is that accurate (and ask how you would even know if it were). And, if you’re curious, try DiligenceEngine out and see how well it performs. Or get in touch - we can go into a lot more detail with you on this and other questions.
How Do We Know How Accurate Our System Is?
Stay tuned: this will be the subject of an upcoming post.