MCP-native graph retrieval

The retrieval graph for long documents.

For teams shipping agents that need to ground in their own documents — contracts, protocols, specs, transcripts, research.

Auto-merging graph with full provenance
Typed edges, ranked over MCP
No SDK glue, no reranker wiring
Your documents never leave

98.7% Hit@10 on MultiHop-RAG against a 74.7% baseline — same encoder, same corpus, different result.

See the benchmarks →

MCP-native · CLI enabled · Self-hosted or managed

similarity · references · supersedes · derived_from

What ckem is

Three pieces of infrastructure, not three features.

Ranking

Fine-grained matching, not flat lookups.

Long, multi-topic documents stop collapsing into a single noisy point. ckem scores the passage that actually answers the query — not the average of everything else in the document.
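To make the difference concrete, here is a minimal sketch. It is not ckem's actual scoring: the 2-dimensional vectors are hand-made stand-ins for embeddings, and `cosine`, `doc_level_score`, and `passage_level_score` are hypothetical names chosen for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def doc_level_score(query, passages):
    """Score the query against the mean of all passage vectors (one point per document)."""
    dim = len(query)
    mean = [sum(p[i] for p in passages) / len(passages) for i in range(dim)]
    return cosine(query, mean)

def passage_level_score(query, passages):
    """Score the query against each passage and keep the best one."""
    scores = [cosine(query, p) for p in passages]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]

# Toy vectors standing in for embeddings (assumed values, for illustration only).
query = [1.0, 0.0]
long_doc = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [0.0, 1.0]]  # one relevant passage, three off-topic

doc_score = doc_level_score(query, long_doc)
best_idx, best_score = passage_level_score(query, long_doc)
print(f"document-level: {doc_score:.2f}, best passage #{best_idx}: {best_score:.2f}")
# prints: document-level: 0.32, best passage #0: 1.00
```

In this toy case the one relevant passage scores 1.00 on its own, while the document-level average drops to 0.32 because three off-topic passages pull the mean away from the query.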

Lifecycle

Auto-merge with an audit trail.

The graph bounds itself as the corpus grows: near-duplicates merge on every write, originals soft-archive into derived_from edges. Provenance survives every merge — nothing is destroyed, just superseded.
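A merge-on-write step with preserved provenance could look roughly like the sketch below. Everything in it is an assumption made for illustration: the `Graph` and `Node` classes, the 0.8 threshold, and the token-overlap `jaccard` function, which stands in for embedding similarity. It is not ckem's implementation.

```python
from dataclasses import dataclass, field

def jaccard(a, b):
    """Token-set overlap, a stand-in for embedding similarity."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 1.0

@dataclass
class Node:
    node_id: int
    text: str
    archived: bool = False  # soft-archive flag: archived nodes are kept, not deleted

@dataclass
class Graph:
    nodes: dict = field(default_factory=dict)  # node_id -> Node
    edges: list = field(default_factory=list)  # (src_id, edge_type, dst_id)
    threshold: float = 0.8                     # assumed merge threshold

    def _add(self, text):
        node = Node(len(self.nodes), text)
        self.nodes[node.node_id] = node
        return node

    def write(self, text):
        """Insert a passage; merge it with the first active near-duplicate, if any."""
        for existing in list(self.nodes.values()):
            if not existing.archived and jaccard(existing.text, text) >= self.threshold:
                incoming = self._add(text)
                merged = self._add(existing.text)  # keep the earlier wording as the merged text
                for original in (existing, incoming):
                    original.archived = True
                    self.edges.append((merged.node_id, "derived_from", original.node_id))
                return merged
        return self._add(text)

    def active(self):
        """Nodes that are still live, i.e. not soft-archived."""
        return [n for n in self.nodes.values() if not n.archived]

g = Graph()
g.write("Either party may terminate with 30 days written notice.")
g.write("Payment is due within 45 days of invoice.")
g.write("Either party may terminate with 30 days of written notice.")

print(len(g.active()), len(g.nodes))  # active nodes vs. all nodes ever written
print(g.edges)                        # provenance trail for the merge
```

The active set stays small while the full node table keeps every original, so a merged passage can always be traced back through its `derived_from` edges.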

Privacy

Your documents never leave.

Local sentence-transformer embeddings by default: no third-party API in the default path. Self-host via Docker, run in your own AWS account with the included Terraform, or have it managed by us. Same code path.

Where ckem excels

The territory the rest of the industry skips.

End-to-end graph retrieval over long, multi-topic, cross-referencing documents — protocols, contracts, technical specs, transcripts. Built for the corpora where a single embedding per document averages everything that matters into noise.

Long, multi-topic documents
10k+ tokens with several distinct sections. Fine-grained ranking keeps the section that answers the query, where a single document-level embedding would average it into noise.
Cross-document reference resolution
When the answer lives across documents — clauses with their defined terms, amendments with what they supersede — ckem walks the typed graph and returns them together.
Version-aware retrieval
Supersession, amendments, deprecations modeled as first-class edges. Queries default to current; pin a date and get the corpus as it stood. Originals stay retrievable with provenance.
Agent-native access
Your agents call retrieval directly over MCP. Self-hosted via Docker, run in your own AWS account with the included Terraform, or managed by us. Same code path; you pick who operates it.
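Version-aware retrieval, as described in the list above, can be pictured with a small sketch. The `Version` record, the `resolve` and `history` helpers, and the sample clause are all hypothetical and only illustrate the idea of "default to current, or pin a date"; ckem's own data model may differ.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass(frozen=True)
class Version:
    version_id: str
    text: str
    effective: date                    # date this version took effect
    supersedes: Optional[str] = None   # id of the version it replaces, if any

def resolve(versions, as_of=None):
    """Return the version in force on `as_of` (default: the latest one)."""
    candidates = [v for v in versions if as_of is None or v.effective <= as_of]
    if not candidates:
        return None
    return max(candidates, key=lambda v: v.effective)

def history(versions, version_id):
    """Follow supersedes edges back to the original, newest first."""
    by_id = {v.version_id: v for v in versions}
    chain = []
    current = by_id.get(version_id)
    while current is not None:
        chain.append(current.version_id)
        current = by_id.get(current.supersedes) if current.supersedes else None
    return chain

# Sample data, invented for illustration.
clause = [
    Version("v1", "Notice period: 30 days.", date(2022, 1, 1)),
    Version("v2", "Notice period: 60 days.", date(2023, 6, 1), supersedes="v1"),
    Version("v3", "Notice period: 90 days.", date(2024, 3, 15), supersedes="v2"),
]

print(resolve(clause).version_id)                      # v3 (current)
print(resolve(clause, date(2023, 12, 31)).version_id)  # v2 (as it stood at end of 2023)
print(history(clause, "v3"))                           # ['v3', 'v2', 'v1']
```

Older versions are never dropped from the list, which is what keeps a pinned-date query and the provenance chain answerable.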

How it works

Three steps. No magic.

01
Ingest
Any document, any size. Indexed at the passage level. Embeddings produced locally — your text doesn't leave the project.
02
Graph
Typed edges — similarity, supersedes, references, derived_from. Near-duplicate passages auto-merge on write, with provenance preserved on every merge.
03
Query
Fine-grained scoring across passages, optional graph-neighbor expansion. Returns ranked passages with their typed edges, directly to your agent over MCP.

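# Agents call ckem over MCP.
# Assumes `mcp` is an already-connected MCP client and that this runs inside an async function.
hits = await mcp.call("query", {
    "project": "corpus-id",                                # which corpus to search
    "text": "What does section 4 say about termination?",  # natural-language query
    "include_neighbors": True,                             # also return graph neighbors
})

# hits[0].text       → matching passage
# hits[0].score      → relevance score, e.g. 0.88
# hits[0].neighbors  → typed cross-references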
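
The optional graph-neighbor expansion in the Query step can be sketched as a one-hop lookup over typed edges. This is an illustration under assumptions, not ckem's internals: `expand_neighbors`, the passage ids, the edge list, and the second score are invented for the example.

```python
from collections import defaultdict

def expand_neighbors(hits, edges, edge_types=("references", "supersedes")):
    """Attach one-hop neighbors to each ranked hit, grouped by edge type."""
    outgoing = defaultdict(list)
    for src, edge_type, dst in edges:
        if edge_type in edge_types:
            outgoing[src].append((edge_type, dst))

    expanded = []
    for passage_id, score in hits:
        neighbors = defaultdict(list)
        for edge_type, dst in outgoing.get(passage_id, []):
            neighbors[edge_type].append(dst)
        expanded.append({"id": passage_id, "score": score, "neighbors": dict(neighbors)})
    return expanded

# Toy graph: (source, edge type, destination). All ids are hypothetical.
edges = [
    ("clause-4.2", "references", "def-termination"),
    ("amend-1", "supersedes", "clause-4.2"),
    ("clause-4.2", "similarity", "clause-9.1"),  # filtered out by the default edge_types
]
hits = [("clause-4.2", 0.88), ("amend-1", 0.61)]  # ranked (id, score) pairs

result = expand_neighbors(hits, edges)
for hit in result:
    print(hit["id"], hit["score"], hit["neighbors"])
```

Restricting expansion to selected edge types keeps the response focused: here the agent gets the defined term and the superseded clause, without every merely similar passage.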

See ckem on your corpus.

Bring a sample of your documents — we'll run ckem on them and walk through the graph and retrieval together over MCP.

See benchmarks →
