Company
Built because stapling chunks isn't retrieval.
When a vector store returns the top-K chunks across your corpus, document boundaries vanish. Ask a corpus of research papers a question and you get a methodology paragraph from one paper, a results table from another, and a conclusion from a third — fragments an LLM will fuse into an answer no single paper actually supports.
ckem keeps the structure of the document visible: ranked passages tied back to their source by typed edges. The retrieval engine is our own; the engineering around it (local embeddings, soft-archive provenance, an MCP-native interface) is what ckem ships.
Why “ckem”?
Say it like "seek 'em," the command you give a dog to go find something. That's what we're asking of the product: go seek the documents.
Short, easy to say, easy to remember. That's the whole story.
How it fits
Where ckem fits in your stack.
- Self-contained retrieval. ckem owns the path from passage to ranked result — local sentence-transformer embeddings, the typed graph, fine-grained scoring, and the lifecycle around them. No third-party encoder or external vector store in the default deployment.
- For documents. Long, static, multi-topic, cross-referencing. ckem is the retrieval graph your agent reaches into when it needs to ground itself in the corpus.
- Bring your own LLM. ckem returns ranked, graph-aware context over MCP. What your agent does with it — generate, summarize, reason — is your call.
- The retrieval layer for vertical AI. Built so teams shipping agents on regulated or long-document corpora can ground them without a months-long retrieval-stack build.
Security & deployment
Built so procurement isn't the bottleneck.
Local embeddings by default. Soft-archive provenance. Project-isolated in the schema. HIPAA BAA available.
Your documents never leave
Local sentence-transformer embeddings by default — no third-party embedding API in the default path. Documents are encoded, indexed, and queried inside your project.
Isolation in the schema
Tenancy is structural: User → Team → Project → Branch → Session with cascading deletes between layers. Every passage, edge, and embedding is scoped to a project. The schema does not allow cross-project reads.
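To make the idea of structural tenancy concrete, here is a minimal sketch using an in-memory SQLite database. The table and column names are hypothetical and chosen only for illustration; they are not ckem's actual schema. It shows two things the paragraph above describes: deletes cascading down the hierarchy, and passages that can only be read through a project-scoped query.

```python
import sqlite3

# Hypothetical tables for illustration only; not ckem's real schema.
SCHEMA = """
CREATE TABLE users    (id INTEGER PRIMARY KEY);
CREATE TABLE teams    (id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE);
CREATE TABLE projects (id INTEGER PRIMARY KEY,
    team_id INTEGER NOT NULL REFERENCES teams(id) ON DELETE CASCADE);
CREATE TABLE branches (id INTEGER PRIMARY KEY,
    project_id INTEGER NOT NULL REFERENCES projects(id) ON DELETE CASCADE);
CREATE TABLE sessions (id INTEGER PRIMARY KEY,
    branch_id INTEGER NOT NULL REFERENCES branches(id) ON DELETE CASCADE);
CREATE TABLE passages (id INTEGER PRIMARY KEY,
    project_id INTEGER NOT NULL REFERENCES projects(id) ON DELETE CASCADE,
    body TEXT NOT NULL);
"""


def open_db():
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys (and cascades) when this is enabled.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def passages_for(conn, project_id):
    # Every read names its project; there is no unscoped query helper.
    rows = conn.execute(
        "SELECT body FROM passages WHERE project_id = ? ORDER BY id",
        (project_id,),
    )
    return [body for (body,) in rows]


def count(conn, table):
    # Table name comes from trusted code in this sketch, never from user input.
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


# One user, one team, two projects; project 1 has a branch and a session.
conn = open_db()
conn.execute("INSERT INTO users VALUES (1)")
conn.execute("INSERT INTO teams VALUES (1, 1)")
conn.executemany("INSERT INTO projects VALUES (?, 1)", [(1,), (2,)])
conn.execute("INSERT INTO branches VALUES (1, 1)")
conn.execute("INSERT INTO sessions VALUES (1, 1)")
conn.executemany(
    "INSERT INTO passages (project_id, body) VALUES (?, ?)",
    [(1, "alpha"), (2, "beta")],
)
```

Deleting project 1 with `DELETE FROM projects WHERE id = 1` would remove its branch, session, and passages in one statement while leaving project 2 untouched. A production system would likely add further safeguards (for example, database-level row security), which this sketch does not attempt.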
Authentication
API keys are hashed with SHA-256 at rest. RBAC spans four roles: viewer, contributor, admin, and owner. Every key resolution updates last_used_at, so unused keys are easy to identify and revoke.
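The flow described above can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not ckem's implementation: the `KeyStore` class, its in-memory dictionary, and the `allows` helper are all hypothetical, and treating the four roles as a strict least-to-most-privileged ladder is an assumption the source does not state.

```python
import hashlib
import secrets
from datetime import datetime, timezone

# Assumption: roles are ordered from least to most privileged.
ROLES = ("viewer", "contributor", "admin", "owner")


def hash_key(raw_key: str) -> str:
    """Return the SHA-256 hex digest that is stored instead of the raw key."""
    return hashlib.sha256(raw_key.encode("utf-8")).hexdigest()


class KeyStore:
    """Hypothetical in-memory key store; a real one would sit on a database."""

    def __init__(self):
        self._records = {}  # key hash -> {"role": ..., "last_used_at": ...}

    def issue(self, role: str) -> str:
        if role not in ROLES:
            raise ValueError(f"unknown role: {role}")
        raw_key = secrets.token_urlsafe(32)
        self._records[hash_key(raw_key)] = {"role": role, "last_used_at": None}
        return raw_key  # returned once to the caller; only the hash is kept

    def resolve(self, raw_key: str):
        record = self._records.get(hash_key(raw_key))
        if record is None:
            return None
        # Stamp every successful resolution so stale keys can be found later.
        record["last_used_at"] = datetime.now(timezone.utc)
        return record


def allows(role: str, required: str) -> bool:
    """True if `role` is at least as privileged as `required`."""
    return ROLES.index(role) >= ROLES.index(required)
```

A caller would issue a key with `store.issue("contributor")`, hand the raw value to the user, and later check `allows(record["role"], "contributor")` on the record returned by `store.resolve(raw_key)`.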
Audit & deletion
Soft-archive by default. Every auto-merge writes derived_from edges to its sources, so originals stay retrievable for audit. Type-aware retention preserves decisions and people; transient nodes age out on policy.
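As a rough illustration of soft-archive merging, the sketch below keeps source nodes in the graph, marks them archived, and links the merged node back to each of them with a derived_from edge. The `Node` and `Graph` classes and their method names are hypothetical; only the derived_from edge type comes from the description above, and retention policy is left out.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    id: str
    text: str
    archived: bool = False  # archived nodes are hidden, not deleted


@dataclass
class Graph:
    """Hypothetical in-memory graph used only to illustrate the idea."""

    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)  # (from_id, edge_type, to_id)

    def add(self, node_id: str, text: str) -> None:
        self.nodes[node_id] = Node(node_id, text)

    def merge(self, merged_id: str, source_ids: list, text: str) -> None:
        # Create the merged node, then archive each source and link back to it.
        self.nodes[merged_id] = Node(merged_id, text)
        for source_id in source_ids:
            self.nodes[source_id].archived = True
            self.edges.append((merged_id, "derived_from", source_id))

    def active(self) -> list:
        """IDs of nodes that normal retrieval would see."""
        return [n.id for n in self.nodes.values() if not n.archived]

    def sources_of(self, node_id: str) -> list:
        """Original nodes a merged node was derived from, for audit."""
        return [
            self.nodes[target]
            for source, kind, target in self.edges
            if source == node_id and kind == "derived_from"
        ]
```

After `g.merge("m", ["a", "b"], "merged text")`, only `"m"` appears in `g.active()`, while `g.sources_of("m")` still returns the original nodes with their text intact.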
Deployment
Self-host with Docker Compose, run it in your own AWS account with the included Terraform (ECS Fargate + RDS), or let us manage it. Same code path; you pick who operates it.
Looking for pilot teams.
Bring a sample of your documents — we'll run ckem on them and walk through the graph and retrieval together over MCP.