Blind spots and knowledge gaps
Many lawyers are sceptical of AI tools. They need solutions they can trust, and this is something that generative AI (GenAI) tools struggle to reliably deliver. There are very public examples of inaccurate citations, outdated laws, and fabricated judgments. These are more than minor glitches. Inaccurate and hallucinated content presents a serious barrier to adoption when it comes to deploying GenAI in a legal context – no matter how excited we are about this category of tech.

Before we explore how RAG can help, let’s first consider some of the circumstances in which LLMs might fail. GenAI models (such as GPT-5 and Claude) are trained on vast volumes of internet text and academic literature, but this process has limitations. The most significant in a legal context are:
These limitations are particularly exposed when an LLM is unsure of an answer. In that situation, instead of admitting uncertainty, the model typically generates a best-guess response. This leads to outputs that often sound entirely plausible but are factually incorrect or irrelevant. In legal work, this ‘hallucinated’ content is more than just inconvenient – it undermines client confidence, creates significant liability exposure, and feeds lawyers’ reluctance to use AI tools.
This is where RAG steps in. At its core, and as its name suggests, RAG combines three steps – retrieval, augmentation, and generation. In the following sections, we break down how RAG works and what it means for LegalTech providers, like Clarilis, who leverage GenAI in legal drafting.
1. Retrieval: Finding relevant information from internal and trusted sources
While ‘retrieval augmented generation’ sounds complex, the concept behind it is surprisingly simple. Rather than relying solely on what the model has learnt from training, RAG supplements the model’s knowledge by actively searching for and retrieving relevant, up-to-date information from trusted sources – whether this is a firm’s internal databases, precedent libraries, or verified external publications.
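To make this concrete, here is a deliberately simplified sketch of the retrieval step in Python. It scores a small in-memory precedent library by keyword overlap with the query; production systems typically rely on vector embeddings and a proper search index, and the document names, text, and scoring here are purely illustrative assumptions.

```python
# A deliberately simplified retrieval step: score passages by keyword overlap
# with the query. The precedent library and scoring are illustrative only.

PRECEDENT_LIBRARY = [
    {"source": "Project Alpha facility agreement (2023)",
     "text": "The Borrower represents that it holds all licences required to "
             "operate its business in the relevant industry."},
    {"source": "Know-how note: representations in facility agreements",
     "text": "Industry-specific representations commonly address licensing, "
             "regulatory approvals and sector-specific compliance."},
]

def retrieve(query: str, library: list[dict], top_k: int = 3) -> list[dict]:
    """Return the top_k passages sharing the most words with the query."""
    query_terms = set(query.lower().split())
    scored = [(len(query_terms & set(doc["text"].lower().split())), doc)
              for doc in library]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:top_k] if score > 0]

results = retrieve("industry-specific representations for a facility agreement",
                   PRECEDENT_LIBRARY)
```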
The effectiveness of this first ‘retrieval’ step will very heavily impact the overall usefulness of the tool. In particular:
2. Augmentation: Enriching the model by feeding it relevant context
Once relevant information has been retrieved, the next step is augmentation. This involves taking the results of the ‘retrieval’ step and providing them to the LLM (usually by inserting them into a prompt). This gives context to the model as it generates a response. In other words, rather than asking the model to respond based on what it has learnt from training (which may be outdated, incomplete, or irrelevant), augmentation enables the model to have additional, often highly relevant, information to hand (in the prompt) to help guide the answer. This significantly reduces the risk of hallucinated output and, in a drafting context, provides:
Beware, however: augmenting a prompt with relevant examples and information doesn’t guarantee that the LLM’s response is correct. The model is still generating content and might misinterpret the information it has been provided with. You still always need to check the outputs and the cited sources.
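In practical terms, the augmentation step is largely a matter of prompt assembly. The sketch below (continuing the illustrative example above) labels each retrieved passage with a numbered source and asks the model to cite those numbers; the prompt wording is an assumption for illustration, not a prescribed template.

```python
# Augmentation as prompt assembly: each retrieved passage is inserted into the
# prompt with a numbered source label, and the model is asked to cite those
# numbers in its answer.

def build_augmented_prompt(question: str, passages: list[dict]) -> str:
    context = "\n\n".join(
        f"[{i}] Source: {p['source']}\n{p['text']}"
        for i, p in enumerate(passages, start=1)
    )
    return (
        "Answer the question using only the numbered sources below, and cite "
        "the source number for each point you make.\n\n"
        f"{context}\n\nQuestion: {question}"
    )

prompt = build_augmented_prompt(
    "Suggest industry-specific representations for a facility agreement.",
    results,  # the passages returned by the retrieval sketch above
)
```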
3. Generation: Producing a response backed with sources
With relevant information retrieved and embedded into the prompt, the final step is generation. This focuses on how the LLM produces a response based on both its training and the enhanced context. One of the key benefits of using RAG is traceability. Because the response is based on identifiable, retrieved content rather than solely on general training data, it is not only more accurate and relevant but also relatively straightforward to link back to the documents used in the retrieval step. This offers an enhanced level of transparency, enabling lawyers to:
In summary, when tools are well configured to use RAG, the relevance and reliability of the output improve, reducing the need to cross-check against multiple systems or repositories. As well as saving time, this builds user confidence: lawyers are far more likely to trust AI-generated content when it is grounded in sources they already rely on.
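To round off the illustrative example, the sketch below shows how source numbers cited in a generated answer can be traced back to the documents surfaced at the retrieval step. The call_llm function is a hypothetical placeholder for whichever model provider is used; in a real tool it would call the chosen API.

```python
# Generation with traceability: cited source numbers in the answer are mapped
# back to the documents retrieved earlier, so a lawyer can check each point
# against its source. call_llm is a hypothetical stand-in for a real model API.

import re

def call_llm(prompt: str) -> str:
    # Placeholder: in practice this would call the chosen LLM provider.
    return "Consider a licensing and regulatory approvals representation [1]."

def map_citations(answer: str, passages: list[dict]) -> dict[int, str]:
    """Map each [n] citation in the answer back to its source document."""
    cited = {int(n) for n in re.findall(r"\[(\d+)\]", answer)}
    return {n: passages[n - 1]["source"]
            for n in sorted(cited) if 1 <= n <= len(passages)}

answer = call_llm(prompt)
print(answer)
print(map_citations(answer, results))
```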
How does RAG fit into the legal drafting process?
At Clarilis, we have launched, and continue to develop, AI tools to assist at every stage of the drafting process. One direction for AI-supported drafting is to leverage RAG to provide grounded suggestions for legal drafting. For example, if you need some industry-specific representations for a facility agreement, can you use RAG to have an LLM generate first-draft suggestions based on previous transaction precedents? There is certainly potential in this approach, but two key areas need careful consideration.
If an AI tool is drawing on a well-defined, regularly updated resource, configuring retrieval can be relatively straightforward. But legal content is rarely this simple. In most cases, the retrieval step will require significant configuration, experimentation, and ongoing tuning to ensure the AI is drawing from the right material, in the right way. This is because the quality of AI-generated output will only ever be as good as the retrieval method used to feed it. This depends on:
Sometimes, the most helpful role an AI tool can play is simply to find the right content quickly and direct the lawyer to it. For example, if your firm has a clear, documented position on what statements are appropriate to offer in a legal opinion, there is no value (and considerable risk) in an LLM rephrasing those statements in its own words. A direct extract (or a link to the relevant source) is preferable for both clarity and auditability. In contrast, in other scenarios (e.g. when summarising the key points of a recent regulatory change or drafting a first-pass clause based on internal precedents), it makes sense to ask AI to synthesise an answer.
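One way to picture this distinction is a simple routing rule, sketched below using the functions from the earlier examples. The verbatim_only flag is an assumed piece of document metadata, not a standard field: content flagged as a fixed, authoritative position is quoted verbatim with its source, while everything else goes through the usual retrieve-augment-generate flow.

```python
# An illustrative routing rule: fixed, authoritative positions are returned as
# direct extracts with their source, while other queries fall through to the
# retrieve-augment-generate flow sketched above. The verbatim_only flag is an
# assumed piece of metadata, not a standard field.

def respond(question: str, passages: list[dict]) -> str:
    top = passages[0]
    if top.get("verbatim_only"):
        # e.g. the firm's documented opinion wording: quote it, don't rephrase it.
        return f"From {top['source']}:\n{top['text']}"
    return call_llm(build_augmented_prompt(question, passages))
```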
Conclusion: Can RAG bridge the trust gap?
RAG is a fundamental building block of sophisticated, domain-specific AI tools. In a legal context it plays a key part in addressing one of the biggest challenges to the adoption of GenAI – lawyer trust. By grounding generative models in real, up-to-date, traceable and citable information, sourced directly from verified internal or external knowledge repositories, RAG can help to significantly reduce hallucinations. And this, in turn, increases lawyer confidence in AI outputs.
However, while RAG mitigates hallucinations, it doesn’t eliminate them entirely. Fundamentally, RAG depends on the quality of the underlying content. And, even when provided with the highest quality content, models will still generate language freely, which can lead them to misinterpret or misrepresent information. This reinforces the importance of legal expertise and human lawyer oversight both when curating knowledge bases and reviewing AI-generated drafting.