ChatGPT and the future of law
In June, Sir Geoffrey Vos, Master of the Rolls and Head of Civil Justice in England & Wales, gave an address to attendees of the Law Society of Scotland’s Law and Technology Conference. The full text of the speech is here.
The tone is light, conversational, even humorous. However, make no mistake: Sir Geoffrey is a reformer with a vision of large-scale adoption of new technologies by the profession and the judiciary. His wide-ranging talk suggests that lawyers might use ChatGPT and similar technologies for drafting documents, document review, predicting case outcomes and assisting in settlement negotiations. Sir Geoffrey envisages “robot judges” in the courts. He sees a role for AI in delivering access to justice.
This article urges the Scottish profession to adopt a cautious approach to the use of AI (an umbrella term we use here to refer to systems built on the current data-driven machine learning paradigm). It explores why ChatGPT is not suited to transactional and litigation-related tasks, and argues that while AI-enabled legal technologies may hold some benefits for the legal profession, Scots lawyers should take time to understand their capabilities and limitations and should get involved in policy discussions about their deployment. The debate about the use of AI-enabled legal technologies is, after all, about nothing less than the future of law.
What is ChatGPT?
ChatGPT is an AI chatbot developed by OpenAI. According to OpenAI, it is “a conversational AI that can chat with you, answer follow-up questions, and challenge incorrect assumptions”. It is built on GPT-3.5, a large language model (“LLM”) which generates text by statistically predicting the word(s) most likely to follow the words entered by the user. For ChatGPT, OpenAI fine-tuned (customised) GPT-3.5 to be more conversational, allowing the system to respond to questions input by users.
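For readers curious about what “statistically predicting the next word” means in practice, the toy sketch below (in Python, and very much a simplification – it uses a simple word-pair count table, whereas GPT-3.5 is a neural network trained on vast quantities of text) shows a miniature “language model” that generates text purely from the frequencies with which words follow one another in its training text:

```python
import random
from collections import defaultdict, Counter

# A toy "training corpus". Real LLMs are trained on hundreds of
# billions of words scraped from the web, not on three sentences.
corpus = (
    "the court held that the contract was void "
    "the court held that the claim was time barred "
    "the tribunal held that the dismissal was unfair"
).split()

# Count which words follow which: a simple bigram model.
next_word_counts = defaultdict(Counter)
for current, following in zip(corpus, corpus[1:]):
    next_word_counts[current][following] += 1

def predict_next(word):
    """Sample the next word in proportion to how often it followed
    `word` in the training text. The model has no idea what the words
    mean; it only reflects their observed co-occurrence statistics."""
    counts = next_word_counts.get(word)
    if not counts:  # no observed continuation: stop generating
        return None
    words, weights = zip(*counts.items())
    return random.choices(words, weights=weights)[0]

# Generate text by repeatedly predicting the next word.
word = "the"
output = [word]
for _ in range(8):
    word = predict_next(word)
    if word is None:
        break
    output.append(word)
print(" ".join(output))  # e.g. "the court held that the claim was void ..."
```

ChatGPT operates on the same basic principle – producing plausible continuations from statistical patterns in text – albeit with a vastly more sophisticated model and vastly more data. At no stage does the system “know” whether what it produces is true.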
ChatGPT is not a search engine. It has no access to an external data source. The data “retained” in ChatGPT is derived from its training data. For the most part this is data that was scraped from the web, with or without authorisation.
What are the concerns about use of ChatGPT for legal work?
Confidentiality of input data
When you enter a query into the ChatGPT web interface you share that input with OpenAI, to allow ChatGPT to generate an output in response. By default, whatever you enter will also be used to train ChatGPT. Research suggests that bad actors could extract training data from the model using suitably crafted queries which cause it to reproduce, word for word, user inputs that were previously used as training data.
Users can opt out of having their inputs used as training data. However, OpenAI will still retain the inputs for 30 days and reserves the right to review them as required to “monitor abuse”. The OpenAI privacy policy, which applies where the user accesses ChatGPT through the web interface, also states that “We disclose this information [including your inputs] to our affiliates, vendors and service providers, law enforcement, and parties involved in Transactions.”
None of this is a problem (as regards confidentiality) if, say, you want to use ChatGPT to produce text for your website. Nor do issues about confidentiality arise if you ask ChatGPT to summarise a published case report or to produce a basic template for a particular kind of contract. It is a problem, however, if you want to ask ChatGPT to respond to inputs containing confidential or privileged information – the kind of information that would likely be relevant for drafting or reviewing documents, predicting case outcomes or assisting in settlement negotiations.
Personal data
UK data protection law (the UK GDPR, supplemented by the Data Protection Act 2018) obliges data controllers to put data processing agreements in place with those who process personal data on their behalf. OpenAI will only enter into a data processing agreement with business users who access its models through its APIs (application programming interfaces). Lawyers should therefore not input their clients’ personal data into ChatGPT via the web interface.
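By way of illustration only, accessing the models “through the API” means calling OpenAI’s service from your own software, under business terms, rather than typing into the public web interface. A minimal sketch of what that looks like is set out below; the client library calls and the model name are assumptions which vary between versions, and nothing here should be read as confirming the contractual or data-handling terms that apply to API traffic – those must be checked directly.

```python
# Minimal sketch of API (rather than web interface) access, using the
# OpenAI Python client. Client version, model name and applicable terms
# are assumptions to be verified; this is not a recommendation.
from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # illustrative model name
    messages=[
        {"role": "system", "content": "You are a legal drafting assistant."},
        {"role": "user", "content": "Draft a short confidentiality clause."},
    ],
)
print(response.choices[0].message.content)
```

Even with API access and a data processing agreement in place, the points made above about confidentiality still apply: whether and how inputs are retained or reviewed is a matter of the terms agreed with the provider, not of the code.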
Limited functionality
Sir Geoffrey suggests that ChatGPT might be used for document review. At the time of writing, the free version of ChatGPT does not allow documents to be uploaded – all input data must be typed or pasted into the chatbot interface. There is also a limit on the combined size of any given user input and the system’s response. These are significant practical limitations if, for example, you want to ask ChatGPT to review a document or predict the outcome of a case based on a summary of the facts. Realistically, lawyers are not going to cut and paste multiple pages of legal documents into the ChatGPT web interface. By contrast, subscribers to ChatGPT Plus have access to additional functionality, including the ability to upload documents directly to the ChatGPT interface.
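To make the size limit concrete, the rough sketch below shows the kind of manual chunking a user would need to do to push a long document through a size-limited chat interface piece by piece. It is an illustration only: the character budget is arbitrary, real models measure input in model-specific “tokens”, and the document name is hypothetical.

```python
# Rough illustration of why input size limits matter in practice.
# The character budget is arbitrary; real models count "tokens" and
# each model has its own limit.
MAX_CHARS_PER_MESSAGE = 8_000  # assumed budget for a single input

def chunk_document(text: str, limit: int = MAX_CHARS_PER_MESSAGE) -> list[str]:
    """Split a long document into pieces small enough to paste in one go,
    breaking on paragraph boundaries (oversized single paragraphs are not
    split further in this rough sketch)."""
    chunks, current = [], ""
    for paragraph in text.split("\n\n"):
        if current and len(current) + len(paragraph) + 2 > limit:
            chunks.append(current)
            current = ""
        current += paragraph + "\n\n"
    if current:
        chunks.append(current)
    return chunks

with open("long_lease_agreement.txt") as f:  # hypothetical document
    pieces = chunk_document(f.read())

print(f"{len(pieces)} separate inputs needed to review this one document")
```

Each chunk would then have to be reviewed in isolation, with no guarantee that the system treats the pieces consistently.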
ChatGPT’s outputs: the truth, the whole truth and nothing but the truth?
ChatGPT can produce basic templates for contracts or other legal documents. However, recall that at the heart of ChatGPT is a model which is trained to predict the next word or sequence of words that follow a textual input. OpenAI’s models are trained on data which is scraped from the internet. That data will no doubt include some publicly accessible legal data (OpenAI has not disclosed precise details of the content of the training data for ChatGPT). However, ChatGPT is a general purpose and not a domain-specific model. It is not fine-tuned with data specific to law, let alone Scots law. ChatGPT has no access to your contract playbook, templates, or databases of case law and knows nothing of your concerns other than what you tell it. It cannot think like a lawyer. How good can you expect it to be?
In fact, the outputs of the system are generally surprisingly good but, like all systems built on LLMs, ChatGPT makes up facts and produces output that is not supported by source material (even assuming the training data is accurate and appropriate). This is because its outputs are based on the statistical relationships between words in the data it was trained on, rather than on any understanding of, or sensitivity to, the real world. As Sir Geoffrey Vos himself notes, a New York lawyer called Steven Schwartz found that out to his cost when he relied on ChatGPT to prepare research for a brief – the system helpfully provided details of cases which did not exist. Schwartz was fined and admonished by the District Court for acting in bad faith. Although the ChatGPT interface now includes various disclaimers, the tendency of LLMs to make things up – a consequence of the way they generate text – remains a concern for lawyers.
Professional obligations and overreliance on AI
Scottish solicitors “must have the relevant legal knowledge and skill to provide a competent and professional service. They must be thorough and prepared in all their work” (Law Society of Scotland, Standards for solicitors, 6). Clearly a solicitor could not fulfil this obligation through wholesale reliance on the output of an AI-enabled system like ChatGPT. However, the risk of overreliance is real – the more convincing the output of such a system, the greater the risk. The temptation to rely on those outputs is surely considerable – after all, what is gained by using ChatGPT (or similar systems) if you have to check every aspect of its output from scratch? The system might offer an extra “pair of eyes” and a way of testing your own legal knowledge (is my own knowledge good enough to judge whether the system’s outputs are accurate?) – but can your practice afford the additional time involved in using it in that way?
Jurisdictional integrity
Scotland is a small jurisdiction, but one with a long-established and distinctive legal system. ChatGPT, as noted above, is a general-purpose LLM, trained on text taken from the broader internet, with no specific focus on law, much less the particularities of Scots law.
As the example of Steven Schwartz shows, use of ChatGPT creates a risk of reliance on false information. However, use of ChatGPT for legal work may give rise to a more subtle risk – that the outputs of the system may nudge the unwary to use or rely on legal concepts or terminology which are not germane to Scots law but are borrowed from other jurisdictions, notably the US, which are more strongly represented in the training data. There is a potential here for the subtle erosion of jurisdictional integrity.
Specific tasks for which ChatGPT is not well suited
Sir Geoffrey suggests that ChatGPT might assist lawyers in settlement negotiations and prediction of judgment. However, it is hard to see how ChatGPT (or any public, general purpose LLM) would be of much assistance to lawyers in either task. Since settlement negotiations are confidential, it is highly unlikely that such information would form part of the training data of ChatGPT.
Furthermore, most research on the prediction of case outcomes by AI systems does not involve prediction in the sense a lawyer would ordinarily understand it: the systems do not predict the outcome of a case which is still to be heard. Instead, the “prediction” involves identifying the outcome in an already-reported judgment, or inferring the outcome of a case from other elements of the reported judgment in that case.
The practical utility for lawyers is therefore extremely limited, because they can already identify the outcome of a reported case since it is right there in the judgment. In any event, recent research suggests that even on these limited “prediction of case outcome” tasks, general-purpose LLMs such as ChatGPT do not perform particularly well.
The “alignment problem”
Sir Geoffrey’s speech touches on the so-called AI “alignment” problem. This can refer to any or all of the following questions: whether an AI system behaves in a way that aligns with human morals or values; whether the system does what its human users intend it should do; and relatedly, whether the system’s output or behaviour conforms to human expectations. If, for example, ChatGPT generates outputs that have no basis in fact, or (as in the example Sir Geoffrey offers from the Schwartz case) fails to give an accurate response to “the question… [the user] had intended to ask”, these behaviours might be characterised as indicative of an “alignment” problem.
There are some problems with AI “alignment” rhetoric. First, it contains more than a hint of the idea that AI might some day soon become a “moral” actor, holding acceptable values and exhibiting desirable behaviour. This is associated with the idea that AI might achieve general intelligence (“AGI”), that is, the ability to carry out any task which a human is capable of performing. Even if this were possible and desirable – both issues which are highly debatable – we are nowhere near achieving AGI. ChatGPT’s outputs are generated by reference to statistical correlations in existing word sequences sourced from the internet. It does not “understand” its inputs or outputs, and has no conception of morally good or bad behaviour or values. If we input poorly phrased questions and obtain responses which fail to reflect our real intentions, the system is behaving just as one should expect!
Secondly, the solutions proposed in relation to the alignment problem are arguably as bad as or worse than the problem itself. AI systems of the kind we are discussing (i.e., machine learning systems including LLMs) are built on code, but their behaviours are not explicitly programmed. Indeed, the distinguishing characteristic of such AI systems is that they “learn” from their training data. So, it is no easy matter to design an AI system so that it behaves in a particular way, and it is impossible explicitly to code an AI system such that it holds a set of values – indeed, holding values is something only humans can do.
Current approaches to achieving “alignment” in systems like ChatGPT depend on fine-tuning the underlying LLM through a process known as reinforcement learning from human feedback (“RLHF”). In RLHF a second model – a “reward model” – is trained on a dataset of candidate outputs for the same input which human annotators have ranked according to their conformity with human preferences (e.g. accuracy, or freedom from toxic, harmful or biased content). The reward model is then used to fine-tune the behaviour of the initial model, by providing a signal to that model as to whether a given output is in line with those learned preferences.
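For the technically curious, the sketch below gives a highly simplified flavour of the reward model step: a linear scorer fitted to invented numerical “features” using an illustrative pairwise (Bradley-Terry style) objective. Real RLHF operates on neural networks over text and is far more involved; the point to notice is that the “preferences” exist only as statistical patterns in the annotators’ rankings.

```python
import math
import random

# Toy data: each pair holds (features of the output the annotator preferred,
# features of the output they rejected). The features are invented numbers
# standing in for properties of a piece of text.
ranked_pairs = [
    ([0.9, 0.1], [0.2, 0.8]),
    ([0.8, 0.3], [0.4, 0.9]),
    ([0.7, 0.2], [0.1, 0.7]),
]

weights = [0.0, 0.0]  # the "reward model": here just a linear scorer

def reward(features):
    """Score an output; higher means 'more preferred'."""
    return sum(w * x for w, x in zip(weights, features))

# Fit the reward model so that preferred outputs score higher than rejected
# ones (pairwise logistic loss, minimised by simple gradient descent).
learning_rate = 0.5
for _ in range(200):
    preferred, rejected = random.choice(ranked_pairs)
    margin = reward(preferred) - reward(rejected)
    grad_scale = -1.0 / (1.0 + math.exp(margin))  # d(loss)/d(margin)
    for i in range(len(weights)):
        weights[i] -= learning_rate * grad_scale * (preferred[i] - rejected[i])

# The fitted model encodes whatever regularities were in the rankings and
# nothing more. It is this learned signal that is then used to nudge the
# main model's behaviour during fine-tuning.
print(reward([0.9, 0.1]) > reward([0.2, 0.8]))  # True once fitted
```

Whether the resulting signal captures anything we would recognise as values, legal or otherwise, is precisely where the difficulties begin.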
This approach raises various questions. What data should be used for this process, and whose preferences or values should be used to nudge the behaviour of AI systems such as ChatGPT? Who is doing the annotating? If legal accuracy is a goal of the system, are the annotators qualified in the relevant domains? How can we assess whether the system will comply with these preferences once it is deployed? Why should we suppose that the myriad values that we humans might want to espouse (including the “principles upon which lawyers, courts and judges operate”) are somehow capable of being inferred from a dataset?
These are hard problems with social, philosophical, political, economic and technical dimensions. We should certainly recognise that ChatGPT and other AI-enabled systems do not always behave in ways that we might want or expect. This is a reason to educate ourselves about the limitations of these systems – and to be wary about solutions which carry their own limitations, introduce new ones, and potentially shift the logic and language of law in ways that might not be immediately apparent to us.
AI-enabled technologies in law and legal practice
ChatGPT is only one product among many that rely on LLMs. Other such products, tailored for lawyers, are already available. Some of these systems can access the internet or domain-specific datastores, and are designed to reduce unwanted behaviours and to tackle concerns about privacy, confidentiality, and data protection. However, all such systems share certain characteristics. First, they depend on their training data. Training data is the grist to the mill of these systems – it does not wholly determine their outputs (the choice of system architecture, algorithms, parameters and hyperparameters will all play a part) – but it both enables and constrains those outputs. Secondly, they cannot learn from anything other than their training data (or, in the case of LLMs that employ RLHF, from a reward model derived from ranked training data). Finally, these systems “learn” by finding patterns in the training data.
Law, however, is much more than the sum of legal data. There is a mismatch between that pale and reductive vision of law held by some proponents of data-driven legal technologies which finds law only in legal data – its case law (usually that of higher courts), its legislation, its books and journals, contracts and documents (and then only to the extent these are accessible, digitised, machine-readable) – and law in action, in its institutional settings, its procedures and practices, its engagement with and development in the messy world of people, facts, lives. Much of law is implicit, tacit, interstitial, not immediately to hand. Moreover, law changes, and must change, in response to shifts in society and the impetus of the demands of justice. In a very real sense law is always incipient, a work in progress, being simultaneously read and re-articulated. These aspects, fundamental to law, resist being captured in a dataset.
This is not to say that there is no place in law and legal practice for AI-enabled technologies, but that we lawyers need to understand the capabilities and limitations of these technologies and get involved in discussions and policymaking around their deployment. Otherwise, we risk waking up to find the tail wagging the dog: AI, with all its limitations, is interpolated into the institutions of law and, instead of simply producing outputs which we can independently assess as meaningful and useful or not, it comes to shape the very content and habits of law.
Conclusion
Sir Geoffrey Vos suggests that the uses to which ChatGPT and similar technologies should be put should be determined by professional codes of conduct or rules committees such as the Online Procedure Rules Committee in England & Wales. No doubt this is both helpful and appropriate. However, this is no niche issue but one which affects the profession as a whole.
Our hope is that the issues raised by these technologies – including their impacts on law, legal practice and citizens – will be discussed within firms, local and international bar associations, at conferences, law schools and in public forums. Do we want “robot judges”, even for so-called minor cases? If ChatGPT-style systems can provide meaningful guidance on settlement negotiations or predict the outcome of cases, will clients obtain such guidance direct? If so, should these systems be regulated? Who should solve the “alignment” problem and how? Whose values or morals should be baked into these systems? What real world problems will these systems actually solve? What problems might they create? If law, now, seems remote and inaccessible to some, will that situation be worsened or improved to the extent that law is intermediated by black-box technologies such as LLMs? Will we see one human-intermediated law for the rich, and an AI-intermediated law for the poor, or vice versa? These are live questions which engage rule of law issues and must therefore be of concern to the profession as a whole.
The views expressed in this article are those of the authors and may not represent the views of any of the organisations with which we are affiliated.