ckem


Retrieval for contracts that don't fit in a vector.

A typical M&A agreement runs 80–300 pages. A single embedding per document collapses indemnification, governing law, assignment, and termination into one averaged point — and the clause you actually need disappears into the mean.

ckem keeps the clause. Fine-grained ranking over passages, typed edges between definitions and their use, supersession modelled as a first-class relation. The retrieval engine returns the span — not a thirty-page document with the span buried inside.
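Here is a toy sketch of why averaging hurts. It uses hand-made one-hot "topic" vectors instead of real embeddings, which keeps the arithmetic easy to follow. It compares a query against the mean of four clause vectors (one embedding per document) and against each clause on its own (per-passage scoring). It illustrates the arithmetic only and is not ckem's scoring code. Real sentence-transformer embeddings are dense, so the effect is less clean-cut.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def mean_vector(vectors):
    """Element-wise mean: the 'one embedding per document' pooling."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

# Illustrative 4-dimensional vectors: one axis per clause topic (not real embeddings).
passages = {
    "indemnification":         [1.0, 0.0, 0.0, 0.0],
    "governing_law":           [0.0, 1.0, 0.0, 0.0],
    "assignment":              [0.0, 0.0, 1.0, 0.0],
    "limitation_of_liability": [0.0, 0.0, 0.0, 1.0],
}
query = [0.0, 0.0, 0.0, 1.0]  # stands in for "cap on liability"

# Document-level: compare the query with the averaged document vector.
doc_score = cosine(query, mean_vector(list(passages.values())))

# Passage-level: score every clause separately and keep the best one.
best_passage, best_score = max(
    ((name, cosine(query, vec)) for name, vec in passages.items()),
    key=lambda pair: pair[1],
)

print(f"document-level score: {doc_score:.2f}")                 # document-level score: 0.50
print(f"best passage: {best_passage} ({best_score:.2f})")       # best passage: limitation_of_liability (1.00)
```

With n orthogonal topics, the mean-pooled cosine works out to 1/√n. So the averaged document scores 0.50 here while the liability clause scores 1.00. At twenty topics the document score would fall to about 0.22.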

The problem

Why flat retrieval gets contracts wrong.

  • Multi-topic by design.

    A single agreement covers payment terms, IP assignment, limitation of liability, governing law, and termination — in one file. A single document embedding mashes them together. A query for “cap on liability” comes back averaged against the other twenty topics.

  • Defined terms drift from their use.

    “Confidential Information” is defined on page 4 and used on pages 11, 23, and 47. ckem walks the typed graph from the use site back to the definition and returns both together — flat retrieval has to hope they land in the same top-K.

  • Amendments supersede.

    Schedule A as amended in the side letter is not the same as Schedule A in the master agreement. ckem models supersession as a first-class edge — queries default to current, but you can pin a date and get the corpus as it stood. Originals stay retrievable with provenance.

  • The span is the answer.

    “Yes, there's a cap” is not the answer. “Section 9.4: liability is capped at the fees paid in the twelve months preceding the claim” is. ckem returns the passage, not a pointer to the document that contains it.
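The definition walk described above can be sketched as follows. This assumes a hypothetical in-memory graph: clause ids map to text, and an edge of the illustrative kind "definition" points from a use site to the clause that defines the term. ckem's actual storage layout and edge names may differ.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Edge:
    src: str
    kind: str  # hypothetical edge kinds, e.g. "definition", "references", "supersedes"
    dst: str

# Hypothetical clause store: clause id -> text.
clauses = {
    "1.1": '"Confidential Information" means any non-public information disclosed by either party.',
    "7.2": "The Recipient shall not disclose Confidential Information to any third party.",
    "9.4": "Liability is capped at the fees paid in the twelve months preceding the claim.",
}
# Use site 7.2 depends on the definition in 1.1.
edges = [Edge("7.2", "definition", "1.1")]

def expand_with_definitions(hit_ids, edges):
    """Return each retrieved clause followed by every definition it (transitively) depends on."""
    out, seen = [], set()
    for hit in hit_ids:
        stack = [hit]
        while stack:
            node = stack.pop()
            if node in seen:
                continue  # guards against circular definitions
            seen.add(node)
            out.append(node)
            # Definitions may themselves use other defined terms, so keep walking.
            stack.extend(e.dst for e in edges if e.src == node and e.kind == "definition")
    return out

result = expand_with_definitions(["7.2"], edges)
for cid in result:
    print(cid, clauses[cid])
```

The use site and its definition come back together regardless of how far apart they sit in the document, which is the point flat top-K retrieval cannot guarantee.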

What ckem does

Built for the structure of a contract.

  • Passage-level scoring. Indexed and ranked at the clause level. The graph keeps the document boundary visible so the matched clause carries its section heading and parent agreement.
  • Typed edges between clauses. References, definitions, supersession, derived_from. The retrieval engine returns the clause and its neighbours, not one without the other.
  • Provenance survives every merge. Near-duplicate clauses (the same indemnification language across a portfolio of agreements) auto-merge with derived_from edges back to every original. Audit-grade by default.
  • Your contracts never leave. Local sentence-transformer embeddings; no third-party embedding API in the default path. Self-host via Docker, run in your own AWS account with the included Terraform, or use the managed service. Same code path.
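The supersession edges listed above can support a date-pinned view. Here is one way to sketch it under assumptions: each clause version carries an effective date, and a hypothetical `supersedes` edge list points from the newer version to the one it replaces. A version is in force on a date if it is already effective and nothing that supersedes it is effective yet. The ids, dates, and document names are illustrative.

```python
from datetime import date

# Hypothetical version records: version id -> (effective date, source document).
versions = {
    "schedA@master": (date(2021, 3, 1), "Master Agreement"),
    "schedA@side-letter": (date(2023, 6, 15), "Side Letter No. 1"),
}
# Hypothetical typed edges: (newer version, "supersedes", older version).
supersedes = [("schedA@side-letter", "supersedes", "schedA@master")]

def current_as_of(version_ids, as_of):
    """Versions in force on `as_of`: already effective and not yet superseded."""
    in_force = []
    for vid in version_ids:
        effective, _source = versions[vid]
        if effective > as_of:
            continue  # not yet effective on that date
        superseded = any(
            old == vid and kind == "supersedes" and versions[new][0] <= as_of
            for new, kind, old in supersedes
        )
        if not superseded:
            in_force.append(vid)
    return in_force

print(current_as_of(versions, date(2024, 1, 1)))  # ['schedA@side-letter']
print(current_as_of(versions, date(2022, 1, 1)))  # ['schedA@master']
```

The master version is never deleted. It simply stops being returned for dates after the side letter takes effect, which is what keeps originals retrievable with their provenance.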

Measured on LegalBench-RAG

The benchmark, not a demo.

LegalBench-RAG is a 776-pair retrieval benchmark assembled by ZeroEntropy across four legal QA datasets: PrivacyQA, CUAD, MAUD, and ContractNLI. Every query has a ground-truth answer span inside a specific contract. We run ckem on the upstream corpus and judgement files verbatim — no question filtering, no held-out subsets we picked ourselves. The full benchmark write-up is on arXiv (2408.10343).

ckem · overall (776 queries)
  • Hit@10: 71.3%
  • Recall@10 (span): 67.6%
  • F1@10 (span): 16.2%
  • Precision@10 (span): 9.4%

ckem vs. paper baseline (per subset): ckem Hit@10 vs. paper Recall@16
  • PrivacyQA: 100.0% vs. 42.5%
  • CUAD: 85.1% vs. 51.0%
  • ContractNLI: 71.1% vs. 56.8%
  • MAUD: 28.9% vs. 13.2%

Baseline: LegalBench-RAG paper (arXiv:2408.10343), Table 4 “Naive Method” Recall@16, the strongest non-proprietary number the authors publish. ckem uses K=10, a tighter budget. Hit@10 and Recall@K diverge when a query has multiple gold spans: Hit credits any single match, while Recall measures how much of the gold set was retrieved.

What each subset tests
  • PrivacyQA: privacy-policy questions; document-level retrieval.
  • CUAD: commercial contract clauses across 41 categories.
  • ContractNLI: NDA hypotheses against contract text.
  • MAUD: M&A definitive agreements; deal-term retrieval.

Reading the numbers: LegalBench-RAG's relevance judgements are span-level. Hit@10 credits the system when the right span lands inside one of the returned passages, a generous metric. Precision@10 is the stricter read: of the 10 passages we returned, what fraction actually contained a relevant span.
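These metrics can be sketched as follows, with some assumptions. Spans are character offsets within a named document. A returned passage counts as relevant if it overlaps any gold span. Scores are computed per query. The benchmark's own scorer may measure overlap differently (for example at character granularity), so treat this as an illustration of the definitions, not a reimplementation of LegalBench-RAG's evaluation.

```python
from typing import NamedTuple

class Span(NamedTuple):
    doc: str
    start: int  # character offset, inclusive
    end: int    # character offset, exclusive

def overlaps(a: Span, b: Span) -> bool:
    """True if the two spans share at least one character in the same document."""
    return a.doc == b.doc and a.start < b.end and b.start < a.end

def span_metrics(retrieved: list[Span], gold: list[Span], k: int = 10) -> dict[str, float]:
    top_k = retrieved[:k]
    # Precision side: which returned passages touch any gold span?
    relevant = [p for p in top_k if any(overlaps(p, g) for g in gold)]
    # Recall side: which gold spans are touched by any returned passage?
    found = [g for g in gold if any(overlaps(p, g) for p in top_k)]
    hit = 1.0 if found else 0.0
    precision = len(relevant) / k  # fixed denominator of K passages
    recall = len(found) / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"hit": hit, "precision": precision, "recall": recall, "f1": f1}

# Illustrative query with two gold spans; only the first one is retrieved.
gold = [Span("nda.txt", 100, 200), Span("nda.txt", 900, 950)]
retrieved = [Span("nda.txt", 150, 400)] + [
    Span("nda.txt", 1000 + 100 * i, 1100 + 100 * i) for i in range(9)
]
m = span_metrics(retrieved, gold, k=10)
print(m)
```

Finding one gold span out of two yields Hit@10 = 1 but Recall@10 = 0.5, which is the multiple-gold-span gap behind the Hit versus Recall caveat. If the overall figures are per-query averages, the overall F1 need not equal the harmonic mean of the overall precision and recall.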

Ongoing work: MAUD is the long tail. M&A definitive agreements are dense with deal-specific terminology (escrow, MAE carve-outs, fiduciary outs) where the right clause and a near-twin from a different deal embed close together. Three workstreams target this directly: a contracts-domain LoRA on top of the Qwen3 encoder, a learned reranker trained on the LegalBench-RAG judgements themselves, and graph-aware traversal that follows cross-reference edges to retrieve the cited definition rather than just its mention. The per-subset numbers are published so the tradeoff stays visible as those land.

In practice

Where teams put ckem in their legal stack.

  • Diligence review.

    An agent walks a target's contracts looking for change-of-control, assignment, exclusivity, and limitation clauses. ckem returns the span with its parent agreement and section heading; the agent reads the surrounding two clauses if it needs to.

  • Playbook compliance.

    A new contract gets checked against the firm's standard positions. ckem indexes the playbook and the in-flight draft; the typed graph surfaces every place the draft deviates and links to the relevant playbook entry.

  • Cross-portfolio question answering.

    “Which of our MSAs cap data-breach liability at less than three times annual fees?” That's a clause-level question across thousands of documents. Flat retrieval averages it into noise; the graph keeps the clauses ranked on their own merits.
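In the diligence and cross-portfolio flows above, the shape of a result matters as much as its rank. Here is a hypothetical sketch that assumes each agreement is stored as an ordered list of clauses. A matched clause is packaged with its parent agreement, its section heading, and a window of neighbouring clauses (window=1 gives the surrounding two). The field names and sample clauses are illustrative and are not ckem's response schema.

```python
from dataclasses import dataclass

@dataclass
class Clause:
    heading: str
    text: str

# Hypothetical portfolio: agreement name -> clauses in document order.
portfolio = {
    "MSA-Acme": [
        Clause("9.3 Exclusions", "Neither party is liable for indirect or consequential damages."),
        Clause("9.4 Cap", "Liability for a data breach is capped at two times the annual fees."),
        Clause("9.5 Carve-outs", "The cap does not apply to gross negligence or wilful misconduct."),
    ],
}

def with_context(agreement: str, index: int, window: int = 1) -> dict:
    """Package the matched clause with its parent agreement, heading, and +/- `window` neighbours."""
    clauses = portfolio[agreement]
    lo = max(0, index - window)
    hi = min(len(clauses), index + window + 1)
    hit = clauses[index]
    return {
        "agreement": agreement,
        "heading": hit.heading,
        "span": hit.text,
        # Neighbouring clause headings in document order, excluding the hit itself.
        "context": [clauses[i].heading for i in range(lo, hi) if i != index],
    }

result = with_context("MSA-Acme", 1)
print(result["agreement"], "|", result["heading"], "|", result["span"])
print("neighbours:", result["context"])
```

A cross-portfolio query scores clauses independently and returns one such record per matching clause. Grouping the records by `agreement` then answers the "which of our MSAs" question directly.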

Run ckem on your contracts.

Bring a folder of agreements and a labeled query set. We'll run ckem against your current retrieval and walk through the graph together over MCP.