Retrieval for contracts that don't fit in a vector.
A typical M&A agreement runs 80–300 pages. A single embedding per document collapses indemnification, governing law, assignment, and termination into one averaged point — and the clause you actually need disappears into the mean.
ckem keeps the clause. Fine-grained ranking over passages, typed edges between definitions and their use, supersession modelled as a first-class relation. The retrieval engine returns the span — not a thirty-page document with the span buried inside.
The problem
Why flat retrieval gets contracts wrong.
Multi-topic by design.
A single agreement covers payment terms, IP assignment, limitation of liability, governing law, and termination — in one file. A single document embedding mashes them together. A query for “cap on liability” comes back averaged against the other twenty topics.
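The averaging effect is easy to see on toy numbers. The sketch below is an illustration only: the three-dimensional vectors are made up (one axis per topic), and mean pooling stands in for whatever a real single-vector document encoder does. It does not show ckem's own scoring.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def mean_vector(vectors):
    """Average a list of vectors component by component."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

# Hypothetical clause embeddings: each axis is one "topic".
clauses = {
    "payment terms": [1.0, 0.0, 0.0],
    "limitation of liability": [0.0, 1.0, 0.0],
    "governing law": [0.0, 0.0, 1.0],
}
query = [0.0, 1.0, 0.0]  # stands in for "cap on liability"

# One vector for the whole document versus one vector per clause.
document_vector = mean_vector(list(clauses.values()))
doc_score = cosine(query, document_vector)
clause_scores = {name: cosine(query, vec) for name, vec in clauses.items()}
best_clause = max(clause_scores, key=clause_scores.get)

print(f"document-level score: {doc_score:.3f}")
print(f"best clause: {best_clause} ({clause_scores[best_clause]:.3f})")
```

With only three topics the document-level score already falls well below the clause-level score. The more topics share the file, the further the averaged vector moves from any one of them.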
Defined terms drift from their use.
“Confidential Information” is defined on page 4 and used on pages 11, 23, and 47. ckem walks the typed graph from the use site back to the definition and returns both together — flat retrieval has to hope they land in the same top-K.
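As a rough picture of that walk, here is a minimal sketch over an in-memory edge list. Everything in it is hypothetical: the `Passage` class, the passage ids, the sample clause text, and the `"definition"` relation label are invented for illustration and are not ckem's actual schema or API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Passage:
    passage_id: str
    page: int
    text: str

# Invented sample passages: one definition and two places that use it.
passages = {
    "def-p4": Passage("def-p4", 4, '"Confidential Information" means all non-public information.'),
    "use-p11": Passage("use-p11", 11, "Recipient shall protect Confidential Information."),
    "use-p23": Passage("use-p23", 23, "Confidential Information must be returned on termination."),
}

# Typed edges as (source, relation, target). The relation name is an assumption.
edges = [
    ("use-p11", "definition", "def-p4"),
    ("use-p23", "definition", "def-p4"),
]

def expand_with_definitions(hit_id, passages, edges):
    """Return the matched passage followed by every definition it points to."""
    definitions = [
        passages[target]
        for source, relation, target in edges
        if source == hit_id and relation == "definition"
    ]
    return [passages[hit_id], *definitions]

result = expand_with_definitions("use-p23", passages, edges)
for passage in result:
    print(passage.page, passage.text)
```

The point of the sketch is the shape of the result: the use site on page 23 and the definition on page 4 come back together, regardless of how the definition would have ranked on its own.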
Amendments supersede.
Schedule A as amended in the side letter is not the same as Schedule A in the master agreement. ckem models supersession as a first-class edge — queries default to current, but you can pin a date and get the corpus as it stood. Originals stay retrievable with provenance.
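One way to picture date-pinned resolution is the sketch below. It is an assumption-laden toy, not ckem's data model: the `Version` class, the `supersedes` field, the ids, and the effective dates are all hypothetical, and it assumes supersession chains contain no cycles.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass(frozen=True)
class Version:
    version_id: str
    text: str
    effective: date
    supersedes: Optional[str] = None  # id of the version this one replaces

def resolve(versions, as_of=None):
    """Return the version in force on `as_of` (the current one if None)."""
    # Only versions already effective on the pinned date are candidates.
    candidates = [v for v in versions if as_of is None or v.effective <= as_of]
    if not candidates:
        return None
    # A candidate is live unless another candidate supersedes it.
    superseded = {v.supersedes for v in candidates if v.supersedes}
    live = [v for v in candidates if v.version_id not in superseded]
    return max(live, key=lambda v: v.effective)

# Hypothetical dates and ids for the Schedule A example.
schedule_a = [
    Version("master-schedule-a", "Schedule A (master agreement)", date(2021, 3, 1)),
    Version("side-letter-schedule-a", "Schedule A (as amended)", date(2023, 6, 15),
            supersedes="master-schedule-a"),
]

current = resolve(schedule_a)
pinned = resolve(schedule_a, as_of=date(2022, 1, 1))
print(current.version_id, pinned.version_id)
```

Nothing is deleted in this model. The original stays in the list and is simply not the live version once the side letter takes effect, which is what lets a pinned date recover it.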
The span is the answer.
“Yes, there's a cap” is not the answer. “Section 9.4: liability is capped at the fees paid in the twelve months preceding the claim” is. ckem returns the passage, not a pointer to the document that contains it.
What ckem does
Built for the structure of a contract.
- Passage-level scoring. Indexed and ranked at the clause level. The graph keeps the document boundary visible so the matched clause carries its section heading and parent agreement.
- Typed edges between clauses. References, definitions, supersession, derived_from. The retrieval engine returns the clause and its neighbours, not one without the other.
- Provenance survives every merge. Near-duplicate clauses (the same indemnification language across a portfolio of agreements) auto-merge with derived_from edges back to every original. Audit-grade by default.
- Your contracts never leave. Local sentence-transformer embeddings; no third-party embedding API in the default path. Self-host via Docker, run in your own AWS account with the included Terraform, or use the managed service. Same code path.
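To make the provenance point from the list above concrete, here is a minimal sketch of merging duplicate clauses while keeping a `derived_from` trail. It is a deliberate simplification and not ckem's merge logic: it treats clauses as duplicates only when they match exactly after normalization, whereas a real near-duplicate detector would presumably use a similarity threshold. The source ids and clause text are invented.

```python
import re
from collections import defaultdict

def normalize(text):
    """Lowercase, drop punctuation, and collapse whitespace."""
    text = re.sub(r"[^a-z0-9\s]", "", text.lower())
    return " ".join(text.split())

def merge_with_provenance(clauses):
    """Group (source_id, text) pairs and keep every source on the merged node."""
    groups = defaultdict(list)
    for source_id, text in clauses:
        groups[normalize(text)].append((source_id, text))
    merged = []
    for members in groups.values():
        merged.append({
            "text": members[0][1],  # keep the first wording seen
            "derived_from": [source_id for source_id, _ in members],
        })
    return merged

# Hypothetical clauses from three agreements.
clauses = [
    ("msa-acme#8.1", "Supplier shall indemnify Customer against third-party claims."),
    ("msa-globex#9.2", "Supplier shall indemnify  Customer against third-party claims"),
    ("nda-initech#4", "This Agreement is governed by the laws of New York."),
]

merged = merge_with_provenance(clauses)
for node in merged:
    print(node["derived_from"], node["text"])
```

The merged indemnification node still lists both agreements it came from, so a reviewer can trace the shared language back to each original.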
Measured on LegalBench-RAG
The benchmark, not a demo.
LegalBench-RAG is a 776-pair retrieval benchmark assembled by ZeroEntropy across four legal QA datasets: PrivacyQA, CUAD, MAUD, and ContractNLI. Every query has a ground-truth answer span inside a specific contract. We run ckem on the upstream corpus and judgement files verbatim — no question filtering, no held-out subsets we picked ourselves. The full benchmark write-up is on arXiv (2408.10343).
- Hit@10: 71.3%
- Recall@10 (span): 67.6%
- F1@10 (span): 16.2%
- Precision@10 (span): 9.4%
- PrivacyQA: 100.0% (ckem) vs. 42.5% (baseline)
- CUAD: 85.1% (ckem) vs. 51.0% (baseline)
- ContractNLI: 71.1% (ckem) vs. 56.8% (baseline)
- MAUD: 28.9% (ckem) vs. 13.2% (baseline)
Baseline: LegalBench-RAG paper (arXiv:2408.10343), Table 4 “Naive Method” Recall@16, the strongest non-proprietary number the authors publish. ckem uses K=10, a tighter budget. Hit@10 and Recall@K are not the same metric: they diverge when a query has multiple gold spans, because Hit credits any single match. More on reading the numbers below.
| Subset | What it tests |
|---|---|
| PrivacyQA | Privacy-policy questions; document-level retrieval. |
| CUAD | Commercial contract clauses across 41 categories. |
| ContractNLI | NDA hypotheses against contract text. |
| MAUD | M&A definitive agreements; deal-term retrieval. |
Reading the numbers: LegalBench-RAG's relevance judgements are span-level. Hit@10 credits the system when the right span lands inside one of the returned passages, which makes it a generous metric. Precision@10 is the stricter read: of the text in the 10 passages we returned, what fraction was actually relevant span.
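For readers who want the arithmetic, the sketch below shows one plausible way to compute these span-level metrics for a single query. It assumes character-offset spans within one document, reads "Hit" as at least one gold span falling wholly inside a returned passage, and uses made-up offsets. The exact definitions behind the published numbers may differ in detail.

```python
def covered_chars(spans):
    """Expand (start, end) character ranges into a set of offsets."""
    chars = set()
    for start, end in spans:
        chars.update(range(start, end))
    return chars

def span_metrics(gold_spans, retrieved_spans):
    """Character-level precision, recall, and F1, plus a binary hit."""
    gold = covered_chars(gold_spans)
    retrieved = covered_chars(retrieved_spans)
    overlap = gold & retrieved
    recall = len(overlap) / len(gold) if gold else 0.0
    precision = len(overlap) / len(retrieved) if retrieved else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    # Hit: at least one gold span sits wholly inside one returned passage.
    hit = any(
        r_start <= g_start and g_end <= r_end
        for g_start, g_end in gold_spans
        for r_start, r_end in retrieved_spans
    )
    return {"hit": float(hit), "recall": recall, "precision": precision, "f1": f1}

# Two returned passages of 500 characters each (hypothetical offsets).
retrieved = [(0, 500), (900, 1400)]

# One 50-character gold span, fully retrieved.
one_gold = span_metrics([(100, 150)], retrieved)

# Two gold spans, only one of them retrieved.
two_gold = span_metrics([(100, 150), (2000, 2050)], retrieved)

print(one_gold)
print(two_gold)
```

In the first case the gold span is fully recovered, yet precision is only 5% because the span is 50 characters out of 1,000 returned. In the second case Hit still scores 1 while recall drops to 50%, which is the multiple-gold-span gap between Hit and Recall.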
Ongoing work: MAUD is the long tail. M&A definitive agreements are dense with deal-specific terminology (escrow, MAE carve-outs, fiduciary outs) where the right clause and a near-twin from a different deal embed close together. Three workstreams target this directly: a contracts-domain LoRA on top of the Qwen3 encoder, a learned reranker trained on the LegalBench-RAG judgements themselves, and graph-aware traversal that follows cross-reference edges to retrieve the cited definition rather than just its mention. The per-subset numbers are published so the tradeoff stays visible as those land.
In practice
Where teams put ckem in their legal stack.
Diligence review.
An agent walks a target's contracts looking for change-of-control, assignment, exclusivity, and limitation clauses. ckem returns the span with its parent agreement and section heading; the agent reads the surrounding two clauses if it needs to.
Playbook compliance.
A new contract gets checked against the firm's standard positions. ckem indexes the playbook and the in-flight draft; the typed graph surfaces every place the draft deviates and links to the relevant playbook entry.
Cross-portfolio question answering.
“Which of our MSAs cap data-breach liability at less than three times annual fees?” That's a clause-level question across thousands of documents. Flat retrieval averages it into noise; the graph keeps the clauses ranked on their own merits.
Run ckem on your contracts.
Bring a folder of agreements and a labeled query set. We'll run ckem against your current retrieval and walk through the graph together over MCP.