ckem
MCP-native graph retrieval

The retrieval graph for long documents.

For teams shipping agents that need to ground in their own documents — contracts, protocols, specs, transcripts, research.

  • Auto-merging graph with full provenance
  • Typed edges, ranked over MCP
  • No SDK glue, no reranker wiring
  • Your documents never leave

98.7% Hit@10 on MultiHop-RAG against a 74.7% baseline — same encoder, same corpus, different result.

MCP-native · CLI-enabled · Self-hosted or managed

What ckem is

Three pieces of infrastructure, not three features.

Ranking

Fine-grained matching, not flat lookups.

Long, multi-topic documents stop collapsing into a single noisy point. ckem scores the passage that actually answers the query — not the average of everything else in the document.
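To make the difference concrete, here is a minimal sketch that compares a document-level score (one averaged vector per document) with a passage-level score (best passage wins). It assumes toy 3-dimensional vectors in place of real sentence-transformer embeddings, and max-over-passages is only one simple form of fine-grained scoring, not necessarily the one ckem uses. All function and document names are illustrative.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0


def mean_vector(vectors):
    """Average a list of vectors component by component."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def doc_level_score(passages, query):
    """Flat lookup: one averaged embedding stands in for the whole document."""
    return cosine(mean_vector(passages), query)


def passage_level_score(passages, query):
    """Fine-grained: the document scores as well as its best passage."""
    return max(cosine(p, query) for p in passages)


def best_doc(docs, query, scorer):
    """Return the id of the top-scoring document under the given scorer."""
    return max(docs, key=lambda doc_id: scorer(docs[doc_id], query))


# Hypothetical corpus: contract_a has three unrelated sections, one of which
# matches the query exactly; memo_b is loosely related throughout.
docs = {
    "contract_a": [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],
    "memo_b": [[0.6, 0.6, 0.0], [0.6, 0.5, 0.1]],
}
query = [1.0, 0.0, 0.0]

print("document-level winner:", best_doc(docs, query, doc_level_score))
print("passage-level winner:", best_doc(docs, query, passage_level_score))
```

In this toy corpus the averaged embedding ranks the loosely related memo first, while passage-level scoring surfaces the contract whose first section matches the query exactly.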

Lifecycle

Auto-merge with an audit trail.

The graph limits its own growth as the corpus grows: near-duplicates merge on every write, and the originals are soft-archived and stay reachable through derived_from edges. Provenance survives every merge. Nothing is destroyed, only superseded.
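The merge-on-write behavior can be sketched as follows. This is an illustration under stated assumptions, not ckem's implementation: the Graph and Node classes, the 0.95 cosine threshold, the vector-averaging merge rule, and the direction of the derived_from edge (merged node pointing at its originals) are all hypothetical choices.

```python
import math
from dataclasses import dataclass


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0


@dataclass
class Node:
    node_id: str
    text: str
    vector: list
    archived: bool = False  # soft-archive flag; archived nodes are never deleted


class Graph:
    def __init__(self, merge_threshold=0.95):  # threshold is an assumed value
        self.nodes = {}
        self.edges = []  # (source_id, edge_type, target_id)
        self.merge_threshold = merge_threshold

    def _add(self, text, vector):
        node = Node(f"n{len(self.nodes)}", text, vector)
        self.nodes[node.node_id] = node
        return node

    def write(self, text, vector):
        """Store a passage; merge it with its nearest active near-duplicate."""
        active = [n for n in self.nodes.values() if not n.archived]
        best = max(active, key=lambda n: cosine(n.vector, vector), default=None)
        incoming = self._add(text, vector)
        if best is None or cosine(best.vector, vector) < self.merge_threshold:
            return incoming.node_id
        # Merge: the new node keeps the newest text and the averaged vector.
        merged_vector = [(x + y) / 2 for x, y in zip(best.vector, vector)]
        merged = self._add(text, merged_vector)
        for original in (best, incoming):
            original.archived = True
            self.edges.append((merged.node_id, "derived_from", original.node_id))
        return merged.node_id

    def provenance(self, node_id):
        """Ids of the archived originals a merged node was derived from."""
        return [dst for src, etype, dst in self.edges
                if src == node_id and etype == "derived_from"]


graph = Graph()
first = graph.write("Payment is due in 30 days.", [1.0, 0.0])
second = graph.write("Payment is due within 30 days.", [0.99, 0.05])
other = graph.write("Governing law is Delaware.", [0.0, 1.0])
print("active:", [n.node_id for n in graph.nodes.values() if not n.archived])
print("provenance of merged node:", graph.provenance(second))
```

Only two nodes remain active after three writes, yet all originals are still stored and reachable from the merged node.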

Privacy

Your documents never leave.

Embeddings come from local sentence-transformer models by default, with no third-party API in the default path. Self-host via Docker, run in your own AWS account with the included Terraform, or use the managed service. All three share the same code path.

Where ckem excels

The territory the rest of the industry skips.

End-to-end graph retrieval over long, multi-topic, cross-referencing documents — protocols, contracts, technical specs, transcripts. Built for the corpora where a single embedding per document averages everything that matters into noise.

  • Long, multi-topic documents

    10k+ tokens with several distinct sections. Fine-grained ranking keeps the section that answers the query, where a single document-level embedding would average it into noise.

  • Cross-document reference resolution

    When the answer lives across documents — clauses with their defined terms, amendments with what they supersede — ckem walks the typed graph and returns them together.

  • Version-aware retrieval

    Supersession, amendments, deprecations modeled as first-class edges. Queries default to current; pin a date and get the corpus as it stood. Originals stay retrievable with provenance.

  • Agent-native access

    Your agents call retrieval directly over MCP. Self-host via Docker, run it in your own AWS account with the included Terraform, or let us manage it. Same code path; you pick who operates it.
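As a rough illustration of version-aware retrieval, the sketch below filters a small set of documents by an optional pinned date using supersedes edges. The document ids, effective dates, and the current_as_of helper are hypothetical, and the rule shown (a document is in force if it is effective by the pinned date and nothing that supersedes it is) is an assumption about how such a filter could work, not a description of ckem's internals.

```python
from datetime import date

# Hypothetical effective dates and supersedes edges (newer, older).
effective = {
    "policy_v1": date(2022, 1, 1),
    "policy_v2": date(2023, 6, 1),
    "policy_v3": date(2024, 3, 15),
}
supersedes = [("policy_v2", "policy_v1"), ("policy_v3", "policy_v2")]


def current_as_of(effective, supersedes, as_of=None):
    """Return the ids in force on `as_of` (default: the latest state)."""
    if as_of is None:
        as_of = date.max
    # A document is visible once its effective date has passed.
    visible = {doc for doc, start in effective.items() if start <= as_of}
    # It is superseded only if the newer document is itself visible.
    superseded = {old for new, old in supersedes if new in visible}
    return sorted(visible - superseded)


print("current:", current_as_of(effective, supersedes))
print("as of 2023-12-31:", current_as_of(effective, supersedes, date(2023, 12, 31)))
```

Superseded versions are filtered out of the result rather than deleted, so an earlier pinned date brings them back.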

How it works

Three steps. No magic.

  1. Ingest

    Any document, any size. Indexed at the passage level. Embeddings produced locally — your text doesn't leave the project.

  2. Graph

    Typed edges — similarity, supersedes, references, derived_from. Near-duplicate passages auto-merge on write, with provenance preserved on every merge.

  3. Query

    Fine-grained scoring across passages, optional graph-neighbor expansion. Returns ranked passages with their typed edges, directly to your agent over MCP.
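The query step's optional graph-neighbor expansion can be sketched as a one-hop walk over typed edges. Everything here is an assumption for illustration: the expand function, the decay factor of 0.5, the choice to follow only references and supersedes edges, and the passage ids. It does not show the MCP transport or ckem's actual scoring.

```python
def expand(ranked, edges, follow=("references", "supersedes"), decay=0.5, limit=5):
    """Add one-hop neighbors of ranked passages, scored at a discount.

    ranked: list of (passage_id, score), highest score first.
    edges:  list of (source_id, edge_type, target_id).
    """
    scores = dict(ranked)
    via = {}  # passage_id -> (edge_type, source_id) that pulled it in
    for passage_id, score in ranked:
        for src, edge_type, dst in edges:
            if src != passage_id or edge_type not in follow:
                continue
            candidate = score * decay
            # Keep the best score seen for each neighbor.
            if candidate > scores.get(dst, 0.0):
                scores[dst] = candidate
                via[dst] = (edge_type, passage_id)
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [{"id": pid, "score": s, "via": via.get(pid)} for pid, s in ordered[:limit]]


# Hypothetical ranked passages and typed edges.
ranked = [("clause_7", 0.9), ("clause_2", 0.4)]
edges = [
    ("clause_7", "references", "definitions_1"),
    ("clause_7", "similarity", "clause_9"),
    ("clause_2", "references", "definitions_1"),
]

for hit in expand(ranked, edges):
    print(hit)
```

Each result carries the edge that pulled it in, so a caller can see why a passage it never matched directly showed up.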

See ckem on your corpus.

Bring a sample of your documents — we'll run ckem on them and walk through the graph and retrieval together over MCP.