Enterprise RAG for Legal Documents
@ Anaqua — Senior Backend Engineer & AI Backend Lead
Transforming how legal professionals search millions of patents and IP documents using AI-powered semantic search
$ cat PROBLEM.md
Generic RAG Failed for Legal Documents
Legal and patent documents contain highly specialized terminology, complex document structures (claims, citations, legal provisions), and reference networks that generic RAG approaches couldn't handle effectively. Standard chunking strategies broke apart critical context, and general-purpose embeddings performed poorly on IP-specific vocabulary.
Key Challenges:
- Legal terminology confused standard embedding models — 'claim' in a patent means something entirely different from its everyday usage
- Document structure matters — splitting patents arbitrarily destroyed the relationship between claims and their dependent claims
- Citation networks are critical — a relevant document often references 20+ related patents that users also need
- Enterprise users expected search results in under 2 seconds with sub-100ms reranking
$ cat SOLUTION.md
Domain-Specific RAG with Structure-Aware Processing
We built a custom RAG pipeline specifically designed for legal and IP documents, respecting document structure and domain terminology while maintaining enterprise-grade performance.
Technical Approach:
Structure-Aware Chunking
Developed a document parser that understands patent and legal document formats. Chunks respect claim boundaries, keep citations intact, and maintain parent-child relationships between document sections.
Domain-Specific Embeddings
Fine-tuned embedding models on a corpus of 500K+ legal documents. The resulting embeddings correctly capture that 'prior art' is related to 'novelty' even though they share no words.
Citation-Aware Retrieval
Built a graph layer on top of vector search that follows citation chains. When retrieving a relevant patent, we also surface the most important documents it references.
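To make the idea concrete, here is a minimal sketch of citation-chain expansion as a recursive SQL query. The `patent_citations(citing_id, cited_id)` table, the depth limit, and the ranking are illustrative assumptions rather than the production schema:

```python
# Sketch: expand citation chains from an initial vector-search result set.
# Table and column names are hypothetical; requires psycopg (pip install "psycopg[binary]").
from typing import Iterable

import psycopg

CITATION_EXPANSION_SQL = """
WITH RECURSIVE chain AS (
    SELECT cited_id, 1 AS depth
    FROM patent_citations
    WHERE citing_id = ANY(%(seed_ids)s)
  UNION
    SELECT pc.cited_id, chain.depth + 1
    FROM patent_citations pc
    JOIN chain ON pc.citing_id = chain.cited_id
    WHERE chain.depth < %(max_depth)s
)
SELECT cited_id, MIN(depth) AS depth, COUNT(*) AS in_chain_citations
FROM chain
GROUP BY cited_id
ORDER BY depth ASC, in_chain_citations DESC
LIMIT %(limit)s;
"""

def expand_citations(conn: psycopg.Connection,
                     seed_ids: Iterable[str],
                     max_depth: int = 2,
                     limit: int = 20) -> list[tuple[str, int, int]]:
    """Follow citation edges from the retrieved patents and return referenced
    documents, ranked by proximity first and citation frequency second."""
    with conn.cursor() as cur:
        cur.execute(CITATION_EXPANSION_SQL,
                    {"seed_ids": list(seed_ids), "max_depth": max_depth, "limit": limit})
        return cur.fetchall()
```

Ranking by depth before in-chain citation count keeps directly cited documents ahead of second-hop discoveries.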
Hybrid Search Architecture
Combined semantic vector search with BM25 keyword matching. Legal professionals often search for exact patent numbers or legal terms that benefit from keyword precision.
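A simplified sketch of what such a hybrid query can look like with pgvector and PostgreSQL full-text search; the `patent_chunks` table, its column names, and the 0.7/0.3 weighting are illustrative assumptions:

```python
# Sketch of a hybrid query: pgvector cosine distance fused with Postgres
# full-text rank. %(query_embedding)s is a pgvector literal such as
# '[0.12, -0.03, ...]'; ts_rank_cd is Postgres's built-in ranking, standing
# in for true BM25 here. In practice both scores should be normalized before
# weighting.
HYBRID_SEARCH_SQL = """
WITH semantic AS (
    SELECT id, 1 - (embedding <=> %(query_embedding)s::vector) AS sim
    FROM patent_chunks
    ORDER BY embedding <=> %(query_embedding)s::vector
    LIMIT 50
),
keyword AS (
    SELECT id, ts_rank_cd(content_tsv, plainto_tsquery('english', %(query_text)s)) AS rank
    FROM patent_chunks
    WHERE content_tsv @@ plainto_tsquery('english', %(query_text)s)
    ORDER BY rank DESC
    LIMIT 50
)
SELECT id,
       COALESCE(s.sim, 0) * 0.7 + COALESCE(k.rank, 0) * 0.3 AS score
FROM semantic s
FULL OUTER JOIN keyword k USING (id)
ORDER BY score DESC
LIMIT 20;
"""
```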
$ cat tech-stack.json
🚀 Core Technologies
PGVector
Vector storage and similarity search
Why: Integrates with existing PostgreSQL infrastructure, supports hybrid queries, production-proven at scale
LangChain
LLM orchestration and retrieval pipelines
Why: Flexible abstractions for building complex RAG flows with multiple retrievers
Python / FastAPI
Backend API and processing pipelines
Why: Async support for high-throughput AI workloads, excellent ML ecosystem
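For illustration, a minimal async search endpoint in the shape this stack suggests; the route, models, and `hybrid_search` stub are hypothetical, not the production API:

```python
# Illustrative only: an async FastAPI search endpoint.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class SearchRequest(BaseModel):
    query: str
    top_k: int = 20

class SearchHit(BaseModel):
    document_id: str
    score: float

async def hybrid_search(query: str, top_k: int) -> list[dict]:
    """Stub standing in for the real retrieval pipeline (vector + keyword + rerank)."""
    return []

@app.post("/search", response_model=list[SearchHit])
async def search(req: SearchRequest) -> list[SearchHit]:
    # Awaiting the I/O-bound retrieval work keeps the event loop free to
    # serve other requests, which is the point of the async stack.
    hits = await hybrid_search(req.query, req.top_k)
    return [SearchHit(document_id=h["id"], score=h["score"]) for h in hits]
```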
$ cat ARCHITECTURE.md
The system follows a three-stage retrieval architecture:
[Architecture diagram: query understanding → hybrid retrieval → reranking]
System Components:
Document Ingestion Pipeline
Processes new documents through structure parsing, chunking, embedding generation, and citation extraction
Hybrid Retriever
Combines PGVector semantic search with PostgreSQL full-text search (BM25-style keyword ranking)
Citation Graph Service
Manages document relationships and performs graph traversal for related document discovery
Reranking Service
Cross-encoder model that reorders candidates for maximum relevance
$ man implementation-details
Structure-Aware Chunking Strategy
Patent documents have a specific structure that must be respected:
- Abstract: High-level summary, good for initial retrieval
- Claims: The legally binding scope (independent + dependent claims)
- Description: Detailed explanation with drawings references
- Prior Art: Citations to related patents and publications
Our chunking strategy:
- Never split claims — each claim becomes its own chunk with metadata linking to parent claims
- Preserve section context — every chunk includes section type metadata for filtering
- Overlapping windows for description — 512 tokens with 128 token overlap
- Citation extraction — all references extracted and stored in graph database
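A minimal sketch of these rules, under the simplifying assumptions that claims arrive pre-split and that whitespace tokens stand in for the real tokenizer; the dataclass, regexes, and function names are illustrative:

```python
# Sketch of the chunking rules: claims stay whole with a parent link,
# descriptions get overlapping windows, citations are extracted per chunk.
import re
from dataclasses import dataclass, field

@dataclass
class Chunk:
    section: str                      # "claims" or "description"
    text: str
    parent_claim: int | None = None   # set for dependent claims
    citations: list[str] = field(default_factory=list)

PATENT_REF = re.compile(r"US\s?\d{1,2},?\d{3},?\d{3}")   # e.g. "US 9,123,456"
DEPENDS_ON = re.compile(r"claim\s+(\d+)", re.IGNORECASE)

def chunk_claims(claims: list[str]) -> list[Chunk]:
    """Each claim becomes its own chunk; dependent claims keep a link to the
    claim they reference so context can be reassembled at query time."""
    chunks = []
    for claim in claims:
        dep = DEPENDS_ON.search(claim)
        chunks.append(Chunk(
            section="claims",
            text=claim,
            parent_claim=int(dep.group(1)) if dep else None,
            citations=PATENT_REF.findall(claim),
        ))
    return chunks

def chunk_description(text: str, window: int = 512, overlap: int = 128) -> list[Chunk]:
    """Overlapping windows over the description (word-level proxy for the
    512/128-token scheme described above)."""
    words = text.split()
    step = window - overlap
    return [Chunk(section="description", text=" ".join(words[i:i + window]))
            for i in range(0, max(len(words), 1), step)]
```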
Custom Embedding Model Training
Standard embedding models (OpenAI, Cohere) struggled with legal terminology. We fine-tuned a model using contrastive learning on patent pairs.
Training Data:
- 500K patent documents
- 50K manually labeled similar/dissimilar pairs from patent examiners
- Synthetic pairs from citation networks (cited patents are similar)
Key Improvements:
- Domain vocabulary: “anticipation” and “novelty rejection” are related
- Technical precision: Chemical formulas and patent numbers embedded correctly
- Citation context: “See US 9,123,456” correctly links to the referenced patent
The fine-tuned model improved retrieval accuracy from 67% to 85% on our test set.
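A sketch of the contrastive training setup using the sentence-transformers API; the base model, hyperparameters, example pairs, and output path are illustrative assumptions, not the production configuration:

```python
# Sketch of contrastive fine-tuning on similar/dissimilar patent pairs.
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

model = SentenceTransformer("all-MiniLM-L6-v2")   # placeholder base model

# label 1 = similar (examiner-labeled or citation-linked), 0 = dissimilar
train_examples = [
    InputExample(texts=["...claim anticipated by the prior art...",
                        "...novelty rejection under 35 U.S.C. 102..."], label=1),
    InputExample(texts=["...hydraulic braking system...",
                        "...method of brewing coffee..."], label=0),
]

train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=32)
train_loss = losses.ContrastiveLoss(model)

model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=1,
    warmup_steps=100,
    output_path="patent-embeddings-v1",   # hypothetical output name
)
```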
$ echo $RESULTS
50% Faster Search with Higher Relevance
Additional Outcomes:
- Legal professionals reported finding relevant prior art in a single search that previously required multiple attempts
- The citation graph feature was adopted as a core workflow by patent analysts
- System became a key factor in the RightHub → Anaqua acquisition
$ cat LESSONS_LEARNED.md
Domain Expertise Beats Generic Models
Investing time to understand legal document structure paid dividends. The 2 weeks spent interviewing patent attorneys about their search patterns directly informed our chunking strategy.
Hybrid Search is Non-Negotiable for Enterprise
Pure semantic search shines in demos but frustrates power users who know exactly what they're looking for. Hybrid search satisfies both exploratory and precision use cases.
Evaluation Requires Domain Experts
Standard IR metrics like NDCG weren't enough. We created a custom evaluation set with legal experts rating relevance, which revealed issues that automated metrics missed.
$ cat README.md
The Challenge: AI That Understands Legal Language
When I joined RightHub (later acquired by Anaqua), the company had a clear vision: bring AI-powered search to intellectual property management. The challenge? Legal documents are unlike any other content.
A patent attorney searching for prior art needs:
- Semantic understanding — finding relevant patents even when different terminology is used
- Structural precision — understanding that Claim 1 defines the core invention
- Citation awareness — knowing that Patent A citing Patent B means they’re related
- Speed — results in seconds, not minutes
Generic RAG solutions failed on all counts.
Our Approach: Domain-First Design
Rather than forcing legal documents into a generic RAG framework, we designed the system around how patent professionals actually work.
Understanding the Domain
I spent the first two weeks interviewing patent attorneys, watching them search, and understanding their mental models. Key insights:
- Claims are everything — The legal scope of a patent is defined entirely by its claims
- Citation networks are gold — Experienced searchers follow citation chains to find related art
- Exact matching still matters — When you know the patent number, you want exact results
- Context is critical — A dependent claim only makes sense in the context of the claim it depends on
Technical Deep-Dive: The Retrieval Pipeline
Our final architecture processes queries through multiple stages:
Stage 1: Query Understanding
- Entity extraction for patent numbers, company names, technical terms
- Query expansion using domain synonyms (e.g., “mobile device” → “smartphone”, “cellular phone”, “handheld device”)
- Intent classification (prior art search vs. freedom-to-operate vs. general research)
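A minimal sketch of this stage, with an intentionally tiny synonym table (the production list was curated with patent attorneys) and an illustrative patent-number pattern:

```python
# Sketch of Stage 1: extract patent numbers and expand domain synonyms
# before retrieval. All names and patterns here are illustrative.
import re

PATENT_NUMBER = re.compile(r"\b[A-Z]{2}\s?\d{1,2},?\d{3},?\d{3}\b")

DOMAIN_SYNONYMS = {
    "mobile device": ["smartphone", "cellular phone", "handheld device"],
    "prior art": ["anticipation", "novelty"],
}

def understand_query(query: str) -> dict:
    patent_numbers = PATENT_NUMBER.findall(query)
    expansions = [syn for term, syns in DOMAIN_SYNONYMS.items()
                  if term in query.lower() for syn in syns]
    return {
        "original": query,
        "patent_numbers": patent_numbers,   # routed to exact keyword match
        "expanded_terms": expansions,       # appended to the semantic query
    }

print(understand_query("prior art for mobile device charging, see US 9,123,456"))
# patent_numbers -> ['US 9,123,456']
# expanded_terms -> ['smartphone', 'cellular phone', 'handheld device', 'anticipation', 'novelty']
```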
Stage 2: Hybrid Retrieval
- Vector search using PGVector with custom legal embeddings
- BM25 keyword search for exact matching
- Citation graph traversal for related documents
- Score fusion to combine results
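One reasonable way to implement the fusion step is reciprocal rank fusion (RRF); whether this matches the exact production formula is an assumption:

```python
# Sketch: merge the vector, BM25, and citation-graph rankings with RRF.
from collections import defaultdict

def reciprocal_rank_fusion(ranked_lists: list[list[str]], k: int = 60) -> list[str]:
    """Combine several ranked result lists into one; k dampens the influence
    of any single list's top ranks."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

fused = reciprocal_rank_fusion([
    ["US9123456", "US8000001", "US7654321"],   # vector search order
    ["US8000001", "US9123456"],                # keyword search order
    ["US7654321", "US6111111"],                # citation-graph order
])
```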
Stage 3: Reranking
- Cross-encoder model for precise relevance scoring
- Diversity injection to avoid redundant results
- Explanation generation for why each result matched
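A sketch of the cross-encoder pass, using a public MS MARCO reranker as a stand-in for the production model; diversity injection and explanation generation are omitted here:

```python
# Sketch of Stage 3: score each (query, chunk) pair jointly and keep the best.
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")  # placeholder model

def rerank(query: str, candidates: list[dict], top_k: int = 10) -> list[dict]:
    """Rerank retrieval candidates (dicts with a 'text' field) by cross-encoder score."""
    pairs = [(query, c["text"]) for c in candidates]
    scores = reranker.predict(pairs)
    ranked = sorted(zip(candidates, scores), key=lambda x: x[1], reverse=True)
    return [dict(c, rerank_score=float(s)) for c, s in ranked[:top_k]]
```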
Results That Mattered
The system’s success was measured not just in technical metrics, but in user adoption:
- Power users reported finding relevant prior art in a single search that previously required 3-4 attempts
- Time savings translated to real cost reduction — patent searches that took hours now took minutes
- Confidence — attorneys trusted the AI results enough to cite them in legal filings
This success was a key factor in Anaqua’s decision to acquire RightHub.
Key Takeaways for RAG Practitioners
Invest in domain understanding before writing code — The chunking strategy that emerged from user interviews was nothing like what I’d have designed in isolation
Hybrid search is essential for enterprise — Don’t get seduced by pure semantic search. Power users need exact matching.
Build evaluation datasets with domain experts — Standard benchmarks won’t tell you if your legal RAG is working
Citation/reference networks are underutilized — If your domain has documents that reference each other, leverage those relationships
Fine-tuned embeddings are worth the effort — The jump from 67% to 85% accuracy justified the investment in custom training
Want to discuss building a RAG system for your domain? Let’s talk.
Related
Experience: Senior Backend Engineer & AI Lead at Anaqua
Technologies: LangChain, RAG Systems, PGVector, FastAPI, Python, PostgreSQL
Related Case Studies: Multi-LLM Orchestration | Agentic AI Knowledge Systems
Building an Enterprise RAG System?
Let's discuss how I can help solve your engineering challenges.