Your documents are source code. CKF is the compiler.
CKF turns PDFs, DOCX and Markdown into structured, schema-stable knowledge packages that LLMs and agents can actually reason about.
Stop pasting raw files into context windows. The Compiled Knowledge Format extracts 22 typed sections — entities, rules, procedures, atomic units, source traceability — into a portable package that is structurally consistent run after run.
Implicit prose vs labeled claims
Source basis labels exist so every claim carries its own evidence type. Compare a human paragraph (sources implied) with a .ckf fragment (sources tagged).
Good for humans
According to the internal manual, refunds above $500 generally require manager approval. The team also believes that responding within 2 hours improves retention, though this hasn't been formally measured. Some agents add that loyal customers tend to forgive delays — anecdotally.
.ckf · Labeled and traceable
claim: refunds > 500 require manager approval
source_basis: explicit
confidence: 0.96
source: internal_manual#sec-4.2

claim: response under 2h improves retention
source_basis: inferred
confidence: 0.55

claim: loyal customers forgive delays
source_basis: author_opinion
confidence: 0.40

claim: response time correlates with NPS
source_basis: synthesized
confidence: 0.62

claim: agents share this folklore
source_basis: uncertain
confidence: 0.30
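A minimal sketch of how a consumer might read labeled claims like these. The field names and the five source_basis labels come from the fragment; the `Claim` class, `parse_claims` and `trusted` are hypothetical helpers for illustration, not part of any CKF API.

```python
from dataclasses import dataclass
from typing import Optional

# The five evidence labels that appear in the fragment.
SOURCE_BASES = {"explicit", "inferred", "author_opinion", "synthesized", "uncertain"}


@dataclass
class Claim:
    claim: str
    source_basis: str
    confidence: float
    source: Optional[str] = None


def parse_claims(text: str) -> list[Claim]:
    """Parse 'key: value' lines; each 'claim:' line starts a new record."""
    records: list[dict] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first colon only, so values may contain ':' or '#'.
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "claim":
            records.append({"claim": value})
        elif records:
            records[-1][key] = value

    claims = []
    for record in records:
        basis = record.get("source_basis", "uncertain")
        if basis not in SOURCE_BASES:
            raise ValueError(f"unknown source_basis: {basis}")
        claims.append(
            Claim(
                claim=record["claim"],
                source_basis=basis,
                confidence=float(record.get("confidence", 0.0)),
                source=record.get("source"),
            )
        )
    return claims


def trusted(claims: list[Claim], min_confidence: float = 0.9) -> list[Claim]:
    """Keep only explicit claims at or above the confidence threshold."""
    return [
        c for c in claims
        if c.source_basis == "explicit" and c.confidence >= min_confidence
    ]


FRAGMENT = """
claim: refunds > 500 require manager approval
source_basis: explicit
confidence: 0.96
source: internal_manual#sec-4.2

claim: response under 2h improves retention
source_basis: inferred
confidence: 0.55
"""

claims = parse_claims(FRAGMENT)
for c in trusted(claims):
    print(c.claim, c.source)
```

Because every claim carries its own label, an agent can act on explicit claims and treat inferred or opinion-based ones as hints. The 0.9 threshold is an arbitrary choice for the example.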
Raw documents make agents hallucinate.
PDFs and DOCX were designed for ink and eyes, not for inference. When an agent receives a raw file it spends tokens parsing layout, guessing structure and inventing relationships that were never explicit.
~35% noise
Headers, footers, page numbers and boilerplate that waste context without carrying knowledge.
∞ inference
No schema means the model re-invents the document's structure on every call.
0 relations
Knowledge has no edges. Causes, exceptions and dependencies stay implicit in prose.
? provenance
Generated answers cannot be traced back to a specific source location or excerpt.
The document is not the knowledge.
PDFs render pixels. CKF emits typed structure. Same source, two completely different artifacts — one for humans, one for inference.
PDF or DOCX (for humans)
- Primary reader: humans
- Structure: visual layout
- Retrieval: full-text search
- Relations: implicit in prose
- Provenance: page numbers
- Updates: re-render whole file
- Hallucination risk: high
- Multi-agent: not portable
CKF (for inference)
- Primary reader: LLMs and agents
- Structure: 22 typed sections
- Retrieval: chunks + atomic units
- Relations: explicit graph
- Provenance: span-level traceability
- Updates: patch a section
- Hallucination risk: reduced by a stable schema
- Multi-agent: portable resource
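To make "explicit graph" concrete, here is a small sketch in which relations are stored as typed edges that an agent can traverse instead of re-reading prose. The node names and relation types (`requires`, `exception_to`, `depends_on`) are invented for the example and are not taken from the CKF spec.

```python
from collections import defaultdict

# Hypothetical edges: (subject, relation, object).
EDGES = [
    ("refund_over_500", "requires", "manager_approval"),
    ("loyalty_waiver", "exception_to", "refund_over_500"),
    ("manager_approval", "depends_on", "manager_on_shift"),
]


def build_graph(edges):
    """Index edges by subject so outgoing relations are a dictionary lookup."""
    graph = defaultdict(list)
    for subject, relation, obj in edges:
        graph[subject].append((relation, obj))
    return graph


def reachable(graph, start):
    """Return every node reachable from start, in order of discovery."""
    seen, stack = [], [start]
    while stack:
        node = stack.pop()
        # .get avoids creating empty entries for leaf nodes.
        for _, target in graph.get(node, []):
            if target not in seen:
                seen.append(target)
                stack.append(target)
    return seen


graph = build_graph(EDGES)
print(reachable(graph, "refund_over_500"))
```

With edges stored this way, "what does this rule depend on?" becomes a traversal rather than an inference over paragraphs.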
“PDFs were built for humans. CKF is built for Agentic AI.”
MCP and CKF are complementary.
The Model Context Protocol gives agents verbs — tools to call and actions to take. CKF gives them nouns — the knowledge those actions operate on. A .ckf package is a native MCP resource.
MCP: standardizes how agents call tools, trigger side effects and interact with services.
CKF: standardizes how knowledge enters the context window: typed, traceable, structurally consistent.
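As a rough illustration of the pairing, the sketch below builds the two payloads an MCP server returns for a resource: a descriptor for listing and a contents entry for reading. It uses plain dictionaries rather than an SDK. The `uri`, `name`, `mimeType`, `contents` and `text` fields follow the MCP resource shape; the `ckf://` URI scheme, the function names and the sample package are assumptions made for this example.

```python
import json


def resource_descriptor(package_id: str, title: str) -> dict:
    """Describe a compiled package as an entry in a resource listing."""
    return {
        "uri": f"ckf://packages/{package_id}",  # hypothetical URI scheme
        "name": title,
        "mimeType": "application/json",
    }


def read_result(package_id: str, package: dict) -> dict:
    """Wrap the package body in the shape of a resource read result."""
    return {
        "contents": [
            {
                "uri": f"ckf://packages/{package_id}",
                "mimeType": "application/json",
                "text": json.dumps(package, sort_keys=True),
            }
        ]
    }


# Hypothetical package content, reduced to a single section.
package = {"rules": [{"text": "refunds > 500 require manager approval"}]}

print(resource_descriptor("support-manual", "Support manual"))
print(read_result("support-manual", package)["contents"][0]["text"])
```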
Context is the new compute.
Tokens are the new instructions and context is the new memory bus. Whoever controls the format of context controls the cost, quality and reliability of every agent that depends on it.
How it works
Three steps from a folder of files to a deployable knowledge package.
1. Drop in .pdf, .docx, .md or .txt files. Everything runs in your browser, with no server upload.
2. Semantic chunking, LLM extraction and map-reduce produce 22 typed sections with span-level provenance.
3. Export as .ckf.json, .ckf.yaml or .ckf.md. Drop the package into RAG, expose it as an MCP resource, or paste it into any chat.
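The three steps can be sketched in miniature. This is not the reference implementation: a sentence split stands in for semantic chunking, a keyword check stands in for the LLM extraction call, and only two sections are produced. The section names `rules` and `atomic_units` echo the section types named on this page; every function name here is hypothetical.

```python
import json
import re


def chunk(text: str) -> list[dict]:
    """Sentence split standing in for semantic chunking; keeps character offsets."""
    chunks = []
    for match in re.finditer(r"[^.\n]+\.", text):
        raw = match.group()
        lead = len(raw) - len(raw.lstrip())  # skip whitespace between sentences
        chunks.append(
            {"start": match.start() + lead, "end": match.end(), "text": raw.strip()}
        )
    return chunks


def extract(piece: dict) -> dict:
    """Map step. A stub for the LLM call: sentences with 'require' count as rules."""
    unit = {"text": piece["text"], "span": [piece["start"], piece["end"]]}
    sections = {"atomic_units": [unit], "rules": []}
    if "require" in piece["text"]:
        sections["rules"].append(unit)
    return sections


def merge(partials) -> dict:
    """Reduce step: concatenate each section across all chunks."""
    merged: dict = {}
    for partial in partials:
        for name, items in partial.items():
            merged.setdefault(name, []).extend(items)
    return merged


def compile_text(text: str) -> dict:
    return merge(extract(piece) for piece in chunk(text))


SOURCE = "Refunds above $500 require manager approval. Agents reply within 2 hours."
package = compile_text(SOURCE)

# Export step: the same dictionary could be serialized to other formats.
print(json.dumps(package, indent=2))
```

Each item keeps a `span` of character offsets into the source, so any extracted rule can be checked against the exact excerpt it came from.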
MIT. Self-hostable. Zero telemetry.
The reference implementation is open source. Clone it, fork it, run it on your own hardware.
FAQ preview
What does "schema-stable" mean?
It means the same source compiles into the same set of typed sections across runs and providers. Not the same exact prose — the same structure, so downstream agents can rely on the shape.
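One way to read that definition is as a check on shape rather than content. The sketch below compares two runs by section names and item field names while ignoring the prose. The sample packages and the `shape` and `same_shape` helpers are hypothetical, and the real spec may define stability differently.

```python
def shape(package: dict) -> dict:
    """Map each section name to the sorted field names its items use."""
    result = {}
    for section, items in package.items():
        fields = set()
        for item in items:
            fields.update(item.keys())
        result[section] = sorted(fields)
    return result


def same_shape(a: dict, b: dict) -> bool:
    """True when both packages have the same sections and item fields."""
    return shape(a) == shape(b)


# Two hypothetical runs over the same source: wording differs, structure does not.
run_a = {"rules": [{"text": "Refunds above $500 need approval.", "confidence": 0.96}]}
run_b = {"rules": [{"text": "Manager approval is required over $500.", "confidence": 0.94}]}

# A third run that lost a field breaks the contract downstream agents rely on.
run_c = {"rules": [{"text": "Refunds above $500 need approval."}]}

print(same_shape(run_a, run_b))
print(same_shape(run_a, run_c))
```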
Is CKF replacing RAG?
No. CKF is a better input layer for RAG: typed chunks, atomic units and source traceability instead of arbitrary page splits.
Is CKF only for LLMs?
No. CKF is designed for agents, reasoning systems and cognitive architectures, but a human can read a .ckf.md too.
Is CKF open?
Yes. CKF is MIT licensed, and the spec and reference implementation are on GitHub.
Compile your first .ckf in under a minute.
Train your AI, assistants, or agents with structured and compiled knowledge.
Inspired by Andrej Karpathy (2026). LLM Wiki: A pattern for building personal knowledge bases using LLMs. GitHub Gist. https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f
