GPT-3 and Prospects for Legal Applications

Written by: Alexander Hudek

August 6, 2020


GPT-3 is the newest and largest variant of the GPT architecture by OpenAI, a company dedicated to broad artificial general intelligence research. Proponents and practitioners of new machine learning technologies can be both cautious and enthusiastic about the prospects for AI, depending on their perspective, but in this case the enthusiasm is apparent.

Beta testers have been populating Twitter with colorful examples of the various tasks that GPT-3 can do. It has been used to write poetry, prose, or stand-up comedy scripts; imitate legalese; create charts and graphs; and answer all kinds of natural-language questions. See the good list of examples on GitHub.

At Kira Systems, we are always looking at new AI breakthroughs through the lens of our customers’ applications and requirements. GPT-3 might well prove useful in the legal space, but there are also a number of aspects that will limit its usefulness in the short run.

What’s New About GPT-3?

The primary novel aspect of GPT-3 is its size. It has been trained on a substantial portion of the internet and is the largest language model trained to date by a large margin. At 175 billion parameters, GPT-3 is roughly ten times the size of the second-largest language model today, Microsoft’s Turing NLG, which has only 17 billion parameters. The actual underlying technology isn’t that different from previous language models. The only changes are to the training routines, modified to accommodate the large size.

The much larger size has also led to a second novel property: the ability to use a less common but simpler technique to adapt the model to different tasks. On its own, all a language model does is predict the next word given a series of previous words. Typically, a language model like GPT-3 would be adapted to a particular task, such as answering questions about or summarizing a document, by a technique called “fine tuning.” This is where you modify the architecture to get it to do a new task, then continue training it on 100-1000 new examples to “tune” it to the new task.
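To make "predict the next word given a series of previous words" concrete, here is a toy sketch that counts which word follows which in a tiny made-up corpus. It is only an illustration of the prediction task: GPT-3 is a large neural network, not a lookup of word pairs, and the function names and sample text below are assumptions invented for this example.

```python
from collections import Counter, defaultdict

def train_bigram(text):
    """Count, for every word, which words follow it in the text."""
    counts = defaultdict(Counter)
    words = text.lower().split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts

def predict_next(counts, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]

# Hypothetical miniature "training set" in a contract-like style.
corpus = (
    "the tenant shall pay rent . "
    "the tenant shall maintain the premises . "
    "the landlord shall repair the roof ."
)
model = train_bigram(corpus)

print(predict_next(model, "tenant"))   # most frequent word after "tenant"
print(predict_next(model, "escrow"))   # a word the model never saw
```

Even this crude model shows the pattern discussed later in the article: it does well on text that resembles what it has seen and has nothing to say about words it has not.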

Instead, with GPT-3, OpenAI uses a simpler technique that conditions the model towards mimicking a particular behaviour by giving it just a few examples. The weights of the model itself remain unchanged. The surprising result is that, thanks to its increased size, GPT-3 can often come close to or match state-of-the-art results that were achieved by fine tuning smaller models, without any fine tuning of its own. This is important because fine tuning is expensive in computing costs, particularly as the model grows in size. Training GPT-3 has been estimated to cost more than 4 million dollars’ worth of compute time, and that’s the low-side guess. Fine tuning doesn’t cost that much, but it is still costly.
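Conditioning the model with a few examples amounts to putting those examples into the text the model is asked to continue. The sketch below only assembles such a prompt as a string; it does not call any API. The function name, the "Clause:"/"Type:" layout, and the sample clauses are all assumptions made up for illustration, not a format prescribed by OpenAI.

```python
def build_few_shot_prompt(examples, query, instruction=""):
    """Assemble a few-shot prompt: worked examples followed by an open query."""
    parts = []
    if instruction:
        parts.append(instruction)
    for text, label in examples:
        parts.append(f"Clause: {text}\nType: {label}")
    # The final entry is left unfinished so the model's continuation is the answer.
    parts.append(f"Clause: {query}\nType:")
    return "\n\n".join(parts)

# Hypothetical labelled examples; a real application would use its own.
examples = [
    ("This Agreement shall be governed by the laws of Ontario.", "Governing Law"),
    ("Either party may terminate on thirty days written notice.", "Termination"),
]
prompt = build_few_shot_prompt(
    examples,
    query="Neither party may assign this Agreement without consent.",
    instruction="Classify each contract clause by type.",
)
print(prompt)
```

Note that the examples travel with every request, which is why the privacy of the priming examples comes up as a concern below.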

In fact, no one has tried fine tuning GPT-3 yet; when someone does, expect far better results than we see today. Without fine tuning, GPT-3 is not as good as smaller previous models on many tasks, and often only matches them on others. That’s not to detract from the achievement at all. As I said, being able to match state of the art without fine tuning is huge. But it does matter when we start to talk about practical applications of this technology.

On the question of “will this be useful in practice?”, the answer is yes in some tasks, no in others. For tasks where it can match state-of-the-art results, it may become easier to deploy, because you don’t have to do expensive fine tuning to get it working. That said, there are drawbacks that can limit its application.

Privacy considerations will be one barrier. Today, the only way to get access to GPT-3 is via an API from OpenAI. Unfortunately, this means that any organization that is highly privacy focused or requires deep integration in its own systems is unlikely to use it, including Kira Systems.

However, even if we had direct access to the model, there would still be privacy challenges. For context, our clients often have a legal obligation to delete data after they finish a project. If they’ve used that data to train a machine learning system, and that data is reconstructible from the trained model, they could get in serious trouble.

Even though GPT-3 can be adapted to new tasks without fine tuning, you still need to prime it with examples of what you want it to do. If you want to avoid priming it every time you use it, you must save the state of the primed model. It is an open question whether you can learn anything about the examples used to prime the model by inspecting its state.

If we had to fine tune the model, assuming it was practical from a computing cost standpoint, this problem would get even worse. It is well known that language models like GPT-3 can generate text that looks exactly like the documents they were trained on! In fact, most popular deep learning systems suffer from these privacy issues with regard to training data. This will be a barrier to using them on sensitive training data. There are potential solutions to this, such as differential privacy, with TensorFlow’s library being a good example, but work would need to be done to apply these solutions to popular models like GPT-3.
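For readers unfamiliar with differential privacy in training, the core idea can be sketched in a few lines: limit how much any single training example can move the model (clip its gradient), then add random noise before averaging. The code below is a minimal plain-Python sketch of that one step under assumed names and made-up numbers. It is not the API of TensorFlow's library, and a real system also needs privacy accounting, which is not shown here.

```python
import math
import random

def clip_gradient(grad, clip_norm):
    """Scale a gradient down so its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]

def private_mean_gradient(per_example_grads, clip_norm, noise_multiplier, rng):
    """Clip each example's gradient, sum, add Gaussian noise, then average."""
    clipped = [clip_gradient(g, clip_norm) for g in per_example_grads]
    dim = len(clipped[0])
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    # Noise is scaled to the clipping bound, i.e. to the most one example can contribute.
    sigma = noise_multiplier * clip_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [n / len(clipped) for n in noisy]

# Hypothetical per-example gradients for a two-parameter model.
grads = [[3.0, 4.0], [0.3, 0.4]]
rng = random.Random(0)  # fixed seed so the sketch is repeatable
noisy_update = private_mean_gradient(grads, clip_norm=1.0, noise_multiplier=1.1, rng=rng)
print(noisy_update)
```

The trade-off is visible even here: more noise means a single document is harder to recover from the trained model, but the update is also less accurate.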

Another barrier with GPT-3 and similar technology is the source of the training data. Particularly, for legal, we would want to know whether the model has been exposed to the sorts of contracts we would need it to work on. We can be certain that GPT-3 has not seen private data; that could mean that it will be less effective on the types of documents that are typically not publicly available. It’s also not clear if OpenAI even included EDGAR data, an important public source of contracts, when training GPT-3. This is not a fundamental limitation of the technology, but retraining or even continuing to train a model like GPT-3 to include more domain-specific data would be potentially expensive, and is not possible at all if OpenAI does not release the model itself. The majority of the data GPT-3 is trained on is also English. To be useful on other languages it might be necessary to further train it with more non-English data.

Bias is another potential barrier. GPT-3 is trained on a subset of the internet, and the internet has inherent biases. These could be racial biases, biases based on sex, or others. The amount of training data does mitigate this a bit, but it’s still a big issue and one of the main problems OpenAI is trying to overcome as they release the model for use by others. The usual solution would be to try to “fix” the training data by removing bias, but in this case that isn’t practical. OpenAI is instead exploring approaches that filter results to eliminate bias. You can be sure that this model is capable of generating some pretty ugly language! Bias can also come into play in ways other than ethics. For example, if it’s trained primarily on US legal sources, it might be biased towards US law, which could be a problem for UK or Canadian legal applications.

Finally, it’s worth keeping in mind that although this model often looks like it’s doing some pretty sophisticated reasoning, that can be an illusion. Take arithmetic for example. OpenAI tested its ability to add numbers and was amazed to find that it could! However, when given larger numbers, the model could suddenly no longer do arithmetic. It has likely just memorized addition tables for small numbers rather than truly learning to add. You can see some other examples of it failing as part of this article. One way to think of GPT-3 is that it has, in a way, simply “memorized” the internet. The implication is that you’ll get extremely impressive results in situations where it can just regurgitate something it has already seen, but in novel or rare cases it might fail. That’s not a property that you generally want in an AI system. It’s a form of overfitting in a way. You can partially address this by training it on additional data, but it’s worth remembering that it isn’t truly a general intelligence.
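The arithmetic example can be caricatured in code. The sketch below builds a "model" that has only memorized sums of small numbers and compares it with actually adding. This is a deliberately crude analogy with invented function names, not a description of how GPT-3 works internally; it only shows why success on familiar inputs says little about unfamiliar ones.

```python
def build_addition_table(limit):
    """Memorize a + b for every pair of numbers below `limit`."""
    return {(a, b): a + b for a in range(limit) for b in range(limit)}

def memorized_add(table, a, b):
    """Answer only if this exact pair was seen before; otherwise give up."""
    return table.get((a, b))

table = build_addition_table(100)  # "training data": all two-digit sums

print(memorized_add(table, 7, 8))        # seen during "training"
print(memorized_add(table, 1234, 5678))  # never seen, so no answer
print(1234 + 5678)                       # what truly learning addition gives you
```

On every pair in its table the lookup is indistinguishable from real addition, which is why testing only on familiar cases can make memorization look like reasoning.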

Generally, we see a few applications where GPT-3 could be valuable in the legal field:

  • Answering questions or finding results from public data. Legal research might be an area where it will make a big impact.
  • Generating text in situations where you have a human in the loop. Think Google’s auto-complete but even more sophisticated. That capability has potential applications in contract drafting, but the system would have to be trained on the right data.

Looking ahead, we see GPT-3 as a very important piece of work, but it won’t change most of the industry overnight. There are definitely a few tasks where it might have a bigger impact in the near term. Here at Kira we’re not planning on adopting GPT-3 any time soon due to the various shortcomings, particularly around privacy.


The original publication introducing GPT-3 can be found here

You can read more about Alex’s insights on GPT-3 in Artificial Lawyer and Legal IT Insider.
