System Architecture

RAG retrieval for Australian case-law similarity

The application combines dense vector search, keyword retrieval, rank fusion, reranking, and a grounded explanation layer. The LLM explains retrieved cases; it does not decide what cases exist.

2,288

unique decisions indexed

25.9k

retrievable chunks

997

gold eval queries

Overview Flow

Query path and ingestion path

Next.js 16 · OpenNext Cloudflare · D1 · Vectorize · Workers AI

Query path

online request path

User query

Plain-language legal facts

Next.js UI

Search form and results view

POST /api/search

Validates request with Zod

Workers AI embedding

bge-base-en-v1.5 query vector

Hybrid retrieval

Vectorize semantic + D1 FTS5 keyword

RRF fusion

Rank-based fusion of the semantic and keyword lists

Case aggregation

Chunk hits collapse to cases

Workers AI reranker

Cross-encoder relevance scores

Grounded explanation

Gemini or Workers AI over retrieved cases

JSON response

Citations, excerpts, scores

Ingestion path

offline corpus path

Local corpus parquet

Downloaded judgments and metadata

sample/eval-set scripts

Select source cases and gold queries

JSONL ingest files

Batch payloads for upload

POST /api/internal/ingest

Secret-protected ingestion route

D1 storage

cases, chunks, FTS5 keyword index

Vectorize index

Chunk vectors with metadata

RAG Design

Small-to-big retrieval, then grounded explanation

Embed the user query

Workers AI bge-base-en-v1.5 turns the plain-language legal situation into a 768-dimensional query vector.

@cf/baai/bge-base-en-v1.5
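
A minimal sketch of the embedding step, assuming the Workers AI binding is passed in as an object with a `run` method. The `EmbeddingRunner` interface and the `embedQuery` helper are illustrative names, not taken from the application's code. The dimension check reflects the 768-dimensional vector described above.

```typescript
// Structural type for the part of the Workers AI binding used here (assumed shape).
interface EmbeddingRunner {
  run(
    model: string,
    input: { text: string[] }
  ): Promise<{ shape: number[]; data: number[][] }>;
}

const EMBEDDING_MODEL = "@cf/baai/bge-base-en-v1.5";
const EMBEDDING_DIMENSIONS = 768;

// Embeds one plain-language query and returns its vector.
async function embedQuery(ai: EmbeddingRunner, query: string): Promise<number[]> {
  const result = await ai.run(EMBEDDING_MODEL, { text: [query] });
  const vector = result.data[0];
  // Fail loudly if the model returns nothing or an unexpected dimension.
  if (!vector || vector.length !== EMBEDDING_DIMENSIONS) {
    throw new Error(`Expected a ${EMBEDDING_DIMENSIONS}-dimensional embedding`);
  }
  return vector;
}
```

In a Worker this would be called as `embedQuery(env.AI, userQuery)`, where `env.AI` is the name assumed for the binding.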

Run two first-stage searches

Vectorize returns semantic chunk matches while D1 FTS5 returns keyword/BM25 matches over the same chunk table.

Vectorize top-50 + D1 FTS5 top-50

Fuse ranks with RRF

Reciprocal Rank Fusion combines semantic and keyword lists by rank instead of mixing incompatible raw scores.

RRF k=60
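
The fusion step can be sketched as below. This is a generic Reciprocal Rank Fusion over lists of IDs, using the k=60 stated above. The function name `rrfFuse` and the sample IDs are illustrative, and the sketch assumes each input list contains an ID at most once.

```typescript
// Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank(d)),
// where rank starts at 1 for the top result of each list.
function rrfFuse(
  rankedLists: string[][],
  k = 60
): Array<{ id: string; score: number }> {
  const scores = new Map<string, number>();
  for (const list of rankedLists) {
    list.forEach((id, index) => {
      const rank = index + 1;
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  // Highest fused score first.
  return [...scores.entries()]
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}

// Hypothetical chunk IDs from the two first-stage searches.
const semanticHits = ["c1", "c2", "c3"];
const keywordHits = ["c3", "c1", "c4"];
const fused = rrfFuse([semanticHits, keywordHits]);
console.log(fused.map((f) => f.id)); // [ 'c1', 'c3', 'c2', 'c4' ]
```

Because only ranks are used, a cosine similarity from Vectorize and a BM25 score from FTS5 never have to be put on a common scale.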

Aggregate chunks to cases

The system searches at chunk level for recall, then keeps the best chunk for each case so lawyers see cases, not fragments.

20 case candidates
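
A minimal sketch of the chunk-to-case collapse, assuming each fused hit carries a case ID and a fused score. The `ChunkHit` shape and the `aggregateToCases` name are hypothetical. The default cap of 20 matches the candidate count above.

```typescript
interface ChunkHit {
  chunkId: string;
  caseId: string;
  score: number; // fused score from the previous step (assumed field)
}

// Keeps the best-scoring chunk per case, then returns the top cases.
function aggregateToCases(hits: ChunkHit[], maxCases = 20): ChunkHit[] {
  const bestByCase = new Map<string, ChunkHit>();
  for (const hit of hits) {
    const current = bestByCase.get(hit.caseId);
    if (!current || hit.score > current.score) {
      bestByCase.set(hit.caseId, hit);
    }
  }
  return [...bestByCase.values()]
    .sort((a, b) => b.score - a.score)
    .slice(0, maxCases);
}
```

The surviving chunk doubles as the passage handed to the reranker for that case.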

Rerank with a cross-encoder

Workers AI bge-reranker-base scores each candidate passage against the query and filters weak matches.

@cf/baai/bge-reranker-base

Explain only retrieved cases

Gemini Flash or Workers AI receives the reranked candidate set and produces grounded JSON explanations.

No retrieval in the LLM layer

Vector Search

Cloudflare Vectorize over judgment chunks

Each judgment is split into overlapping, paragraph-aware windows of roughly 512 to 1024 tokens. Every chunk is embedded with Workers AI and stored in Vectorize with filterable metadata for jurisdiction, court, year, and paragraph range.
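
One way to build such windows is sketched below. It rests on several assumptions: tokens are approximated by whitespace-separated words rather than the embedding model's tokenizer, overlap is measured in whole paragraphs, and only the upper bound is enforced. The `Chunk` shape and `chunkParagraphs` name are illustrative.

```typescript
interface Chunk {
  text: string;
  paraStart: number; // 1-based first paragraph in the chunk
  paraEnd: number; // 1-based last paragraph in the chunk
}

// Rough token estimate: whitespace-separated words (an approximation).
const countTokens = (s: string): number => s.split(/\s+/).filter(Boolean).length;

function chunkParagraphs(
  paragraphs: string[],
  maxTokens = 1024,
  overlapParas = 1
): Chunk[] {
  const sizes = paragraphs.map(countTokens);
  const chunks: Chunk[] = [];
  let start = 0;
  while (start < paragraphs.length) {
    let end = start;
    let tokens = sizes[start] ?? 0;
    // Always take one paragraph, then extend while the budget allows.
    while (
      end + 1 < paragraphs.length &&
      tokens + (sizes[end + 1] ?? 0) <= maxTokens
    ) {
      end += 1;
      tokens += sizes[end] ?? 0;
    }
    chunks.push({
      text: paragraphs.slice(start, end + 1).join("\n\n"),
      paraStart: start + 1,
      paraEnd: end + 1,
    });
    if (end === paragraphs.length - 1) break;
    // Step back by the overlap, but always advance at least one paragraph.
    start = Math.max(end + 1 - overlapParas, start + 1);
  }
  return chunks;
}
```

Splitting on paragraph boundaries keeps the stored paragraph range exact. A single paragraph longer than the budget is emitted as its own oversized chunk in this sketch.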

Embedding model

@cf/baai/bge-base-en-v1.5

Dimensions

768

First-stage topK

50 semantic + 50 keyword

Candidate cap

20 cases before rerank

Storage

D1 and Vectorize share the same chunk IDs

D1 cases

Case-level metadata: citation, title, court, jurisdiction, year, catchwords, source URL, and future R2 key.

D1 chunks

Paragraph-aware judgment chunks with passage text and paragraph ranges. This also powers case detail pages.

D1 FTS5

External-content full-text index synchronized from chunks by triggers. Used for BM25 keyword retrieval.

Vectorize

Chunk vectors with indexed metadata for jurisdiction, court, year, and paragraph ranges. Used for semantic retrieval.

Grounding

The LLM is fenced in

Zod validates public search requests, internal ingest payloads, and LLM output shape.
The LLM can only cite cases already returned by retrieval; hallucinated citations are dropped.
Excerpts are kept only if they are verbatim substrings of the retrieved passage.
Relevance scores come from the reranker, never from the explanation model.
If the explanation provider fails, the API falls back to deterministic retrieval-only results.
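
The citation and excerpt rules above can be sketched as a post-processing filter over the LLM output. All type and field names here (`RetrievedCase`, `LlmExplanation`, `groundExplanations`) are hypothetical, and the sketch assumes the output has already passed shape validation.

```typescript
interface RetrievedCase {
  caseId: string;
  citation: string;
  passage: string;
  rerankScore: number;
}

interface LlmExplanation {
  caseId: string;
  explanation: string;
  excerpt?: string;
}

interface GroundedResult {
  caseId: string;
  citation: string;
  score: number;
  explanation: string;
  excerpt: string | null;
}

function groundExplanations(
  retrieved: RetrievedCase[],
  llmOutput: LlmExplanation[]
): GroundedResult[] {
  const byId = new Map<string, RetrievedCase>();
  for (const c of retrieved) byId.set(c.caseId, c);

  const results: GroundedResult[] = [];
  for (const item of llmOutput) {
    const source = byId.get(item.caseId);
    if (!source) continue; // case was not retrieved: drop the citation
    // Keep the excerpt only if it appears verbatim in the retrieved passage.
    const excerpt =
      item.excerpt && source.passage.includes(item.excerpt) ? item.excerpt : null;
    results.push({
      caseId: source.caseId,
      citation: source.citation, // taken from retrieval, not from the LLM
      score: source.rerankScore, // score comes from the reranker only
      explanation: item.explanation,
      excerpt,
    });
  }
  return results;
}
```

Citation text and scores are copied from the retrieval side, so the explanation model can only contribute the explanation and a checked excerpt.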

Evaluation

997-query gold set

The evaluation uses open-australian-legal-qa questions whose source cases are present in the deployed corpus. The internal eval endpoint skips the explanation LLM and scores retrieval rankings directly.

Config	R@5	R@10	MRR	nDCG
Semantic only	0.930	0.950	0.878	0.895
Hybrid	0.975	0.992	0.944	0.956
Hybrid + rerank	0.978	0.989	0.953	0.962
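
For reference, the metrics in the table can be computed along these lines. The sketch assumes one gold source case per query and binary relevance, so the ideal DCG is 1. It also applies the same cutoff k to nDCG, which the table does not specify. The `EvalQuery` shape and `evaluate` name are illustrative.

```typescript
interface EvalQuery {
  goldCaseId: string;
  rankedCaseIds: string[]; // retrieval output, best first
}

function evaluate(queries: EvalQuery[], k: number) {
  if (queries.length === 0) {
    return { recallAtK: 0, mrr: 0, ndcgAtK: 0 };
  }
  let recall = 0;
  let reciprocalRanks = 0;
  let ndcg = 0;
  for (const q of queries) {
    const rank = q.rankedCaseIds.indexOf(q.goldCaseId) + 1; // 0 means not found
    if (rank >= 1) reciprocalRanks += 1 / rank;
    if (rank >= 1 && rank <= k) {
      recall += 1;
      // Single relevant item: DCG = 1 / log2(rank + 1), ideal DCG = 1.
      ndcg += 1 / Math.log2(rank + 1);
    }
  }
  const n = queries.length;
  return { recallAtK: recall / n, mrr: reciprocalRanks / n, ndcgAtK: ndcg / n };
}
```

With a single gold case per query, Recall@k reduces to the share of queries whose gold case appears in the top k.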