Experimental · MIT · v0.2 spec

Your documents are source code. CKF is the compiler.

CKF turns PDFs, DOCX and Markdown into structured, schema-stable knowledge packages that LLMs and agents can actually reason about.

Stop pasting raw files into context windows. The Compiled Knowledge Format extracts 22 typed sections — entities, rules, procedures, atomic units, source traceability — into a portable package that is structurally consistent run after run.

.pdf · .docx · .md → .ckf.json · .ckf.yaml · MCP resource

Step 1 · Input: .pdf · .docx · .md · .txt
Step 2 · CKF Compiler: Chunk · LLM · Map-reduce
Step 3 · CKF Package: 22 typed sections
Step 4 · Deploy: RAG · MCP · Agents

Comparison

Implicit prose vs labeled claims

Source basis labels exist so every claim carries its own evidence type. Compare a human paragraph (sources implied) with a .ckf fragment (sources tagged).

Good for humans

According to the internal manual, refunds above $500 generally require manager approval. The team also believes that responding within 2 hours improves retention, though this hasn't been formally measured. Some agents add that loyal customers tend to forgive delays — anecdotally.

.ckf · Labeled and traceable

Labels: explicit · inferred · synthesized · author_opinion · uncertain · confidence

- claim: refunds > 500 require manager approval
  source_basis: explicit
  confidence: 0.96
  source: internal_manual#sec-4.2
- claim: response under 2h improves retention
  source_basis: inferred
  confidence: 0.55
- claim: loyal customers forgive delays
  source_basis: author_opinion
  confidence: 0.40
- claim: response time correlates with NPS
  source_basis: synthesized
  confidence: 0.62
- claim: agents share this folklore
  source_basis: uncertain
  confidence: 0.30

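Because each claim carries its own label, a consumer can decide mechanically which claims to state as fact and which to surface with a caveat. The sketch below assumes the claims have already been loaded into memory; the `Claim` class, the `citable_facts` and `hedged` helpers, and the 0.9 threshold are hypothetical illustrations, not part of the CKF spec.

```python
from dataclasses import dataclass
from typing import List, Optional


# Hypothetical in-memory form of one labeled claim; field names mirror the
# fragment above, but this class is not defined by the CKF spec.
@dataclass(frozen=True)
class Claim:
    claim: str
    source_basis: str  # explicit | inferred | synthesized | author_opinion | uncertain
    confidence: float
    source: Optional[str] = None


CLAIMS = [
    Claim("refunds > 500 require manager approval", "explicit", 0.96,
          "internal_manual#sec-4.2"),
    Claim("response under 2h improves retention", "inferred", 0.55),
    Claim("loyal customers forgive delays", "author_opinion", 0.40),
    Claim("response time correlates with NPS", "synthesized", 0.62),
    Claim("agents share this folklore", "uncertain", 0.30),
]


def citable_facts(claims: List[Claim], min_confidence: float = 0.9) -> List[Claim]:
    """Claims an agent may state as fact: explicit, sourced and confident."""
    return [
        c for c in claims
        if c.source_basis == "explicit"
        and c.source is not None
        and c.confidence >= min_confidence
    ]


def hedged(claims: List[Claim], min_confidence: float = 0.9) -> List[Claim]:
    """Everything else, which should be surfaced together with its label."""
    facts = citable_facts(claims, min_confidence)
    return [c for c in claims if c not in facts]


for c in citable_facts(CLAIMS):
    print(c.claim, "<-", c.source)
for c in hedged(CLAIMS):
    print(f"[{c.source_basis} {c.confidence:.2f}] {c.claim}")
```

Keeping the threshold as a parameter lets a cautious agent raise it without touching the package itself.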
The Problem

Raw documents make agents hallucinate.

PDFs and DOCX were designed for ink and eyes, not for inference. When an agent receives a raw file it spends tokens parsing layout, guessing structure and inventing relationships that were never explicit.

~35% noise

Headers, footers, page numbers and boilerplate that waste context without carrying knowledge.

∞ inference

No schema means the model re-invents the document's structure on every call.

0 relations

Knowledge has no edges. Causes, exceptions and dependencies stay implicit in prose.

? provenance

Generated answers cannot be traced back to a specific source location or excerpt.
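Traceability means an answer can point at the exact text it came from. A minimal sketch, assuming a span is stored as a document id plus character offsets; the `SourceSpan` shape, the `resolve_excerpt` helper and the sample manual text are hypothetical, since the claim fragment on this page only shows a section anchor such as `internal_manual#sec-4.2`.

```python
from dataclasses import dataclass
from typing import Dict


# Hypothetical span record: which document, and which characters in it.
@dataclass(frozen=True)
class SourceSpan:
    doc_id: str
    start: int  # inclusive character offset
    end: int    # exclusive character offset


def resolve_excerpt(span: SourceSpan, documents: Dict[str, str]) -> str:
    """Return the exact source text a claim was extracted from."""
    text = documents[span.doc_id]
    if not 0 <= span.start < span.end <= len(text):
        raise ValueError(f"{span} is out of range for {span.doc_id!r}")
    return text[span.start:span.end]


# Made-up sample document, for illustration only.
docs = {"internal_manual": "4.2 Refunds above $500 require manager approval."}
span = SourceSpan("internal_manual", 4, 47)
print(resolve_excerpt(span, docs))
```

Validating the offsets up front turns a stale span (for example, after the source was edited) into an explicit error instead of a silently wrong excerpt.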

Format

The document is not the knowledge.

PDFs render pixels. CKF emits typed structure. Same source, two completely different artifacts — one for humans, one for inference.

PDF / DOCX
  • Primary reader: humans
  • Structure: visual layout
  • Retrieval: full-text search
  • Relations: implicit in prose
  • Provenance: page numbers
  • Updates: re-render whole file
  • Hallucination risk: high
  • Multi-agent: not portable

.ckf package
  • Primary reader: LLMs and agents
  • Structure: 22 typed sections
  • Retrieval: chunks + atomic units
  • Relations: explicit graph
  • Provenance: span-level traceability
  • Updates: patch a section
  • Hallucination risk: lower (schema-stable input)
  • Multi-agent: portable resource
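The "explicit graph" row means causes, exceptions and dependencies are stored as edges instead of being left in prose. A minimal sketch, assuming relations are kept as (subject, relation, object) triples; the node ids and relation names are invented for illustration and do not come from the CKF spec.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]

# Illustrative edges only; neither the node ids nor the relation names come
# from the CKF spec.
RELATIONS: List[Triple] = [
    ("refund_over_500", "requires", "manager_approval"),
    ("refund_over_500", "has_exception", "fraud_hold"),
    ("manager_approval", "depends_on", "ticket_escalation"),
]


def build_graph(triples: Iterable[Triple]) -> Dict[str, List[Tuple[str, str]]]:
    """Adjacency list: subject -> [(relation, object), ...]."""
    graph: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for subject, relation, obj in triples:
        graph[subject].append((relation, obj))
    return graph


def reachable(graph, start: str, relations: Set[str]) -> List[str]:
    """Breadth-first walk from `start`, following only the given relation types."""
    seen, queue, order = {start}, deque([start]), []
    while queue:
        node = queue.popleft()
        for relation, obj in graph.get(node, []):
            if relation in relations and obj not in seen:
                seen.add(obj)
                order.append(obj)
                queue.append(obj)
    return order


g = build_graph(RELATIONS)
print(reachable(g, "refund_over_500", {"requires", "depends_on"}))
print(reachable(g, "refund_over_500", {"has_exception"}))
```

With edges explicit, "what does this rule depend on?" and "what are its exceptions?" become graph walks rather than questions the model has to re-infer from prose.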

PDFs were built for humans. CKF is built for Agentic AI.

Infrastructure

MCP and CKF are complementary.

The Model Context Protocol gives agents verbs: tools to call and actions to take. CKF gives them nouns: the knowledge those actions operate on. A .ckf package can be exposed directly as an MCP resource.

MCP
Verbs · Tools · Actions

Standardizes how agents call tools, perform side effects and interact with services.

CKF
Nouns · Knowledge · Context

Standardizes how knowledge enters the context window: typed, traceable, structurally consistent.
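As a rough illustration of the resource side, the sketch below builds the two payloads an MCP server returns for a resource: an entry for a `resources/list` response and a `resources/read` result whose `contents` carry the package as JSON text. The `ckf://` URI scheme and the `ckf_version` field are assumptions, and a real server would normally use an MCP SDK rather than hand-built dicts.

```python
import json
from typing import Any, Dict


def ckf_resource_entry(uri: str, name: str, description: str) -> Dict[str, Any]:
    """One entry for a resources/list response, following MCP's resource fields."""
    return {
        "uri": uri,
        "name": name,
        "description": description,
        "mimeType": "application/json",
    }


def ckf_read_result(uri: str, package: Dict[str, Any]) -> Dict[str, Any]:
    """A resources/read result carrying the whole package as JSON text content."""
    return {
        "contents": [
            {
                "uri": uri,
                "mimeType": "application/json",
                "text": json.dumps(package, ensure_ascii=False),
            }
        ]
    }


# Hypothetical URI scheme and package fields, for illustration only.
uri = "ckf://support-handbook"
package = {"ckf_version": "0.2", "sections": {"rules": []}}
print(ckf_resource_entry(uri, "Support handbook", "Compiled support manual"))
print(ckf_read_result(uri, package))
```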

Manifesto

Context is the new compute.

Tokens are the new instructions and context is the new memory bus. Whoever controls the format of context controls the cost, quality and reliability of every agent that depends on it.

How it works

Three steps from a folder of files to a deployable knowledge package.

01
Upload

Drop .pdf, .docx, .md or .txt. Files are processed in your browser, with no upload to a CKF server.

02
Compile

Semantic chunking, LLM extraction and map-reduce produce 22 typed sections with span-level provenance.
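The compile step can be pictured as chunk, map, reduce. The sketch below is an illustration under stated assumptions, not the reference implementation: greedy paragraph packing stands in for semantic chunking, and `extract` is a hypothetical callback where an LLM call would go, returning a mapping of section name to a list of items. The reduce step merges items per section, drops exact duplicates and records which chunk each item came from.

```python
import json
from typing import Callable, Dict, List

Sections = Dict[str, List[dict]]


def chunk_paragraphs(text: str, max_chars: int = 1200) -> List[str]:
    """Greedy paragraph packing, a simple stand-in for semantic chunking.

    A single paragraph longer than max_chars is kept whole as its own chunk.
    """
    chunks: List[str] = []
    current = ""
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        if current and len(current) + 2 + len(para) > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks


def compile_package(text: str, extract: Callable[[str], Sections],
                    max_chars: int = 1200) -> Sections:
    # Map: each chunk is extracted independently (this is where an LLM call goes).
    partials = [extract(chunk) for chunk in chunk_paragraphs(text, max_chars)]
    # Reduce: merge per section, skip exact duplicates, keep the chunk index.
    merged: Sections = {}
    seen = set()
    for chunk_id, sections in enumerate(partials):
        for name, items in sections.items():
            for item in items:
                key = (name, json.dumps(item, sort_keys=True))
                if key not in seen:
                    seen.add(key)
                    merged.setdefault(name, []).append({**item, "chunk": chunk_id})
    return merged


def toy_extract(chunk: str) -> Sections:
    """Hypothetical extractor: treats any sentence containing 'require' as a rule."""
    rules = [{"text": s.strip() + "."} for s in chunk.split(".") if "require" in s]
    return {"rules": rules} if rules else {}


doc = ("Refunds above $500 require manager approval.\n\n"
       "Shipping is free.\n\n"
       "Refunds above $500 require manager approval.")
print(compile_package(doc, toy_extract, max_chars=60))
```

Because the map calls are independent, they can run in parallel; only the reduce step needs to see every partial result.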

03
Deploy

Export as .ckf.json, .ckf.yaml or .ckf.md. Drop into RAG, expose as an MCP resource, or paste into any chat.
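Each export format is a serialization of the same in-memory package. A minimal sketch, assuming a package shaped as `{"sections": {name: [items]}}` with a `text` field per item; the `.ckf.md` layout shown (one `##` heading per section, one bullet per item) is hypothetical. Sorting keys keeps the JSON output byte-identical for identical packages.

```python
import json
from typing import Any, Dict


def to_ckf_json(package: Dict[str, Any]) -> str:
    # sort_keys keeps the byte output identical for identical packages.
    return json.dumps(package, indent=2, ensure_ascii=False, sort_keys=True)


def to_ckf_md(package: Dict[str, Any]) -> str:
    """Hypothetical .ckf.md layout: one heading per section, one bullet per item."""
    lines = []
    for name, items in package.get("sections", {}).items():
        lines.append(f"## {name}")
        lines.extend(f"- {item.get('text', '')}" for item in items)
        lines.append("")
    return "\n".join(lines)


pkg = {"sections": {"rules": [{"text": "refunds > 500 require manager approval"}]}}
print(to_ckf_json(pkg))
print(to_ckf_md(pkg))
```

Writing a file is then one call, for example `pathlib.Path("handbook.ckf.json").write_text(to_ckf_json(pkg), encoding="utf-8")`.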

Open Source

MIT. Self-hostable. Zero telemetry.

The reference implementation is open source. Clone it, fork it, run it on your own hardware.

MIT License
Zero telemetry
Static SPA · runs anywhere
git clone && bun dev
FAQ

FAQ preview

What does "schema-stable" mean?

It means the same source compiles into the same set of typed sections across runs and providers. The prose may differ between runs, but the structure does not, so downstream agents can rely on the shape.
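One way to check this property is to compare the structure of two compiled runs while ignoring their wording. A minimal sketch, assuming a hypothetical `{"sections": {name: [items]}}` layout; `structure` maps each section to the set of field names its items use, and the sample runs are made up.

```python
from typing import Any, Dict, Set


def structure(package: Dict[str, Any]) -> Dict[str, Set[str]]:
    """Map each section to the set of field names its items use; values are ignored."""
    return {
        name: set().union(*(item.keys() for item in items))
        for name, items in package["sections"].items()
    }


def schema_stable(run_a: Dict[str, Any], run_b: Dict[str, Any]) -> bool:
    return structure(run_a) == structure(run_b)


# Hypothetical runs: different wording, same shape (1 and 2); missing parts (3).
run_1 = {"sections": {"rules": [{"claim": "refunds > 500 need approval", "confidence": 0.96}],
                      "entities": []}}
run_2 = {"sections": {"rules": [{"claim": "manager sign-off above 500", "confidence": 0.93}],
                      "entities": []}}
run_3 = {"sections": {"rules": [{"claim": "refunds > 500 need approval"}]}}

print(schema_stable(run_1, run_2))  # True: prose differs, structure matches
print(schema_stable(run_1, run_3))  # False: a section and a field are missing
```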

Is CKF replacing RAG?

No. CKF is a better input layer for RAG: typed chunks, atomic units and source traceability instead of arbitrary page splits.
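To feed a RAG index, each typed item can become one retrieval record whose metadata keeps its section type and source. A sketch under the same assumed `{"sections": {name: [items]}}` layout; `RetrievalRecord`, the `atomic_units` section key and the `source` / `source_basis` metadata keys are hypothetical, and embedding and vector-store steps are left out.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class RetrievalRecord:  # hypothetical record type for a vector store
    id: str
    text: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def to_retrieval_records(package: Dict[str, Any]) -> List[RetrievalRecord]:
    """One record per typed item; the section name becomes the chunk type."""
    records = []
    for section, items in package["sections"].items():
        for i, item in enumerate(items):
            if "text" not in item:
                continue  # nothing to embed
            records.append(RetrievalRecord(
                id=item.get("id", f"{section}-{i}"),
                text=item["text"],
                metadata={
                    "section": section,
                    "source": item.get("source"),
                    "source_basis": item.get("source_basis"),
                },
            ))
    return records


pkg = {"sections": {
    "atomic_units": [{"text": "Refunds above 500 require manager approval.",
                      "source": "internal_manual#sec-4.2", "source_basis": "explicit"}],
    "entities": [{"name": "manager"}],  # no text field, so it is skipped
}}
for r in to_retrieval_records(pkg):
    print(r.id, r.metadata["section"], r.metadata["source"])
```

Retrieval can then filter on `section` or `source_basis` before ranking, which arbitrary page splits cannot offer.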

Is CKF only for LLMs?

No. CKF is designed for agents, reasoning systems and cognitive architectures, but a human can read a .ckf.md too.

Is CKF open?

Yes. The spec and the reference implementation are MIT-licensed and published on GitHub.

Compile your first .ckf in under a minute.

Give your AI assistants and agents structured, compiled knowledge.

Inspired by Andrej Karpathy (2026). LLM Wiki: A pattern for building personal knowledge bases using LLMs. GitHub Gist. https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f