Experimental · MIT · v0.2 spec

Your documents are source code. CKF is the compiler.

CKF turns PDFs, DOCX and Markdown into structured, schema-stable knowledge packages that LLMs and agents can actually reason about.

Stop pasting raw files into context windows. The Compiled Knowledge Format extracts 22 typed sections — entities, rules, procedures, atomic units, source traceability — into a portable package that is structurally consistent run after run.

.pdf · .docx · .md → .ckf.json · .ckf.yaml · MCP resource


Step 1 · Input: .pdf · .docx · .md · .txt

Step 2 · CKF Compiler: Chunk · LLM · Map-reduce

Step 3 · CKF Package: 22 typed sections

Step 4 · Deploy: RAG · MCP · Agents

Comparison

Implicit prose vs labeled claims

Source-basis labels make every claim carry its own evidence type. Compare a human paragraph (sources implied) with a .ckf fragment (sources tagged).

Good for humans

According to the internal manual, refunds above $500 generally require manager approval. The team also believes that responding within 2 hours improves retention, though this hasn't been formally measured. Some agents add that loyal customers tend to forgive delays — anecdotally.

.ckf · Labeled and traceable

Labels: explicit · inferred · synthesized · author_opinion · uncertain, each paired with a confidence score.

- claim: refunds > $500 require manager approval
  source_basis: explicit
  confidence: 0.96
  source: internal_manual#sec-4.2
- claim: response under 2h improves retention
  source_basis: inferred
  confidence: 0.55
- claim: loyal customers forgive delays
  source_basis: author_opinion
  confidence: 0.40
- claim: response time correlates with NPS
  source_basis: synthesized
  confidence: 0.62
- claim: agents share this folklore
  source_basis: uncertain
  confidence: 0.30
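Here is how an agent might consume labels like these. This is a minimal sketch that assumes a hypothetical TypeScript shape whose field names mirror the fragment above. The authoritative schema is whatever the CKF v0.2 spec defines, and both `groundedClaims` and its 0.9 default threshold are illustrative inventions.

```typescript
// Hypothetical shape of one labeled claim; field names copied from the fragment,
// not from the official CKF schema.
type SourceBasis =
  | "explicit"
  | "inferred"
  | "synthesized"
  | "author_opinion"
  | "uncertain";

interface Claim {
  claim: string;
  source_basis: SourceBasis;
  confidence: number; // 0..1
  source?: string; // e.g. "internal_manual#sec-4.2"; optional in this sketch
}

// Keep only claims an agent may state as fact: explicitly sourced and at or
// above a caller-chosen confidence threshold (0.9 is an arbitrary default).
function groundedClaims(claims: Claim[], minConfidence = 0.9): Claim[] {
  return claims.filter(
    (c) => c.source_basis === "explicit" && c.confidence >= minConfidence,
  );
}

const claims: Claim[] = [
  {
    claim: "refunds > $500 require manager approval",
    source_basis: "explicit",
    confidence: 0.96,
    source: "internal_manual#sec-4.2",
  },
  { claim: "response under 2h improves retention", source_basis: "inferred", confidence: 0.55 },
  { claim: "loyal customers forgive delays", source_basis: "author_opinion", confidence: 0.4 },
];

console.log(groundedClaims(claims).map((c) => c.claim));
// → [ 'refunds > $500 require manager approval' ]
```

Lower-confidence or non-explicit claims are not discarded by the format. An agent could still surface them with a hedge ("the team believes…") instead of asserting them.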

The Problem

Raw documents make agents hallucinate.

PDFs and DOCX were designed for ink and eyes, not for inference. When an agent receives a raw file it spends tokens parsing layout, guessing structure and inventing relationships that were never explicit.

~35% noise

Headers, footers, page numbers and boilerplate that waste context without carrying knowledge.

∞ inference

No schema means the model re-invents the document's structure on every call.

0 relations

Knowledge has no edges. Causes, exceptions and dependencies stay implicit in prose.

? provenance

Generated answers cannot be traced back to a specific source location or excerpt.
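Headers, footers and page numbers are the easiest part of that noise to remove. A common heuristic drops any line that repeats across most pages, along with bare page numbers. The sketch below shows that heuristic under assumed inputs (pages already extracted as arrays of text lines). It is not necessarily what the CKF compiler does, and the `stripBoilerplate` name and 0.5 threshold are hypothetical.

```typescript
// Sketch: drop lines that repeat on most pages (running headers/footers) and
// bare page numbers. The 0.5 threshold is an arbitrary assumption, not a CKF setting.
function stripBoilerplate(pages: string[][], threshold = 0.5): string[][] {
  // Count on how many pages each trimmed line occurs (at most once per page).
  const pageCounts: Record<string, number> = Object.create(null);
  for (const page of pages) {
    const unique = page
      .map((l) => l.trim())
      .filter((l, i, all) => all.indexOf(l) === i);
    for (const line of unique) {
      pageCounts[line] = (pageCounts[line] || 0) + 1;
    }
  }
  // Require at least two pages so a one-page document keeps everything.
  const minPages = Math.max(2, Math.ceil(pages.length * threshold));
  const pageNumber = /^(page\s+)?\d+(\s+of\s+\d+)?$/i;
  return pages.map((page) =>
    page.filter((raw) => {
      const line = raw.trim();
      if (line === "" || pageNumber.test(line)) return false; // blanks, "3", "Page 2 of 9"
      return (pageCounts[line] || 0) < minPages; // repeated header/footer
    }),
  );
}

const pages = [
  ["ACME Support Manual", "Refunds above $500 require manager approval.", "Page 1 of 3"],
  ["ACME Support Manual", "Respond within 2 hours.", "Page 2 of 3"],
  ["ACME Support Manual", "Escalate fraud to Tier 2.", "Page 3 of 3"],
];
console.log(stripBoilerplate(pages));
```

The trade-off is that a genuine sentence repeated on most pages would also be dropped. A production extractor would more likely combine this heuristic with layout position (top or bottom margin) from the PDF.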

Format

The document is not the knowledge.

PDFs render pixels. CKF emits typed structure. Same source, two completely different artifacts — one for humans, one for inference.

PDF / DOCX

Primary reader: humans
Structure: visual layout
Retrieval: full-text search
Relations: implicit in prose
Provenance: page numbers
Updates: re-render whole file
Hallucination risk: high
Multi-agent: not portable

.ckf package

Primary reader: LLMs and agents
Structure: 22 typed sections
Retrieval: chunks + atomic units
Relations: explicit graph
Provenance: span-level traceability
Updates: patch a section
Hallucination risk: lower, thanks to schema-stable input
Multi-agent: portable resource
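Span-level traceability means a claim can point at an exact stretch of source text, and that pointer can be re-checked mechanically. The following minimal sketch assumes a span is stored as a source id plus character offsets and the quoted excerpt. The `Span` type and `verifySpan` helper are hypothetical and do not come from the CKF spec.

```typescript
// Hypothetical span reference: character offsets into a named source text.
interface Span {
  source: string; // document id, e.g. "internal_manual"
  start: number; // inclusive character offset
  end: number; // exclusive character offset
  excerpt: string; // text the extractor says it read at those offsets
}

// Returns false when the source is missing, the offsets are out of range,
// or the text at those offsets no longer matches the excerpt.
function verifySpan(sources: Record<string, string | undefined>, span: Span): boolean {
  const text = sources[span.source];
  if (text === undefined) return false;
  if (span.start < 0 || span.end > text.length || span.start >= span.end) return false;
  return text.slice(span.start, span.end) === span.excerpt;
}

const manual = "4.2 Refunds. Refunds above $500 require manager approval.";
const sources = { internal_manual: manual };
const excerpt = "Refunds above $500 require manager approval";
const start = manual.indexOf(excerpt);
const span: Span = { source: "internal_manual", start, end: start + excerpt.length, excerpt };

console.log(verifySpan(sources, span)); // true
console.log(verifySpan(sources, { ...span, start: start + 1 })); // false: offsets drifted
```

Storing the excerpt alongside the offsets is a deliberate redundancy in this sketch. It lets a consumer detect that the source changed after compilation rather than silently citing the wrong text.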

“PDFs were built for humans. CKF is built for Agentic AI.”

Infrastructure

MCP and CKF are complementary.

The Model Context Protocol gives agents verbs — tools to call and actions to take. CKF gives them nouns — the knowledge those actions operate on. A .ckf package is a native MCP resource.

MCP

Verbs · Tools · Actions

Standardizes how agents call tools, take side-effects and interact with services.

CKF

Nouns · Knowledge · Context

Standardizes how knowledge enters the context window: typed, traceable, structurally consistent.
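To make "a .ckf package is an MCP resource" concrete, here is a minimal sketch of the two JSON-RPC methods an MCP server answers for resources: `resources/list` and `resources/read`. It assumes in-memory packages and skips the transport. The `ckf://` URI scheme and the package fields (`ckf_version`, `title`, `rules`) are hypothetical. A real server would normally be built with an MCP SDK rather than by hand.

```typescript
// Minimal sketch of MCP resource handling for .ckf packages, without an SDK or
// transport. The ckf:// scheme and package fields are illustrative assumptions.
type JsonRpcRequest = {
  jsonrpc: "2.0";
  id: number | string;
  method: string;
  params?: { uri?: string };
};

const packages: Record<string, object | undefined> = {
  "ckf://support-manual": { ckf_version: "0.2", title: "Support manual", rules: [] },
};

function handle(req: JsonRpcRequest): object {
  if (req.method === "resources/list") {
    // Advertise each package as a readable resource.
    const resources = Object.keys(packages).map((uri) => ({
      uri,
      name: uri.slice("ckf://".length),
      mimeType: "application/json",
    }));
    return { jsonrpc: "2.0", id: req.id, result: { resources } };
  }
  if (req.method === "resources/read") {
    const uri = req.params?.uri ?? "";
    const pkg = packages[uri];
    if (pkg === undefined) {
      // -32002 is the resource-not-found code used in the MCP spec.
      return {
        jsonrpc: "2.0",
        id: req.id,
        error: { code: -32002, message: "Resource not found", data: { uri } },
      };
    }
    // Serve the whole package as a single text content item.
    return {
      jsonrpc: "2.0",
      id: req.id,
      result: { contents: [{ uri, mimeType: "application/json", text: JSON.stringify(pkg) }] },
    };
  }
  // Standard JSON-RPC code for an unknown method.
  return { jsonrpc: "2.0", id: req.id, error: { code: -32601, message: "Method not found" } };
}

console.log(JSON.stringify(handle({ jsonrpc: "2.0", id: 1, method: "resources/list" })));
```

In this division of labor, the MCP tools (verbs) stay separate from the package (noun). An agent reads the package as context and then calls tools that act on it.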

Manifesto

Context is the new compute.

Tokens are the new instructions and context is the new memory bus. Whoever controls the format of context controls the cost, quality and reliability of every agent that depends on it.

How it works

Three steps from a folder of files to a deployable knowledge package.

Upload

Drop .pdf, .docx, .md or .txt. Everything runs in your browser — no server upload.

Compile

Semantic chunking, LLM extraction and map-reduce produce 22 typed sections with span-level provenance.

Deploy

Export as .ckf.json, .ckf.yaml or .ckf.md. Drop into RAG, expose as an MCP resource, or paste into any chat.
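The Compile step above names chunking, LLM extraction and map-reduce. The sketch below shows only the shape of that pipeline, under stated substitutions. Chunking uses simple paragraph packing in place of semantic chunking. The LLM call is replaced by a hypothetical `extractRules` stub that keeps sentences containing "require" or "must", so the example runs offline. The reduce step merges duplicates and keeps the chunk indices each item came from as crude provenance.

```typescript
// Shape of chunk → map → reduce. extractRules is a hypothetical offline stand-in
// for the LLM extraction call; paragraph packing stands in for semantic chunking.
interface Extracted {
  text: string;
  chunks: number[]; // indices of the chunks the item was found in (provenance)
}

// Split on blank lines, then pack paragraphs into chunks of at most maxChars.
// A single paragraph longer than maxChars becomes its own chunk.
function chunk(doc: string, maxChars = 200): string[] {
  const paras = doc.split(/\n\s*\n/).map((p) => p.trim()).filter((p) => p.length > 0);
  const chunks: string[] = [];
  let current = "";
  for (const p of paras) {
    if (current && current.length + p.length + 2 > maxChars) {
      chunks.push(current);
      current = "";
    }
    current = current ? current + "\n\n" + p : p;
  }
  if (current) chunks.push(current);
  return chunks;
}

// Map step (stubbed): one extraction per chunk.
function extractRules(chunkText: string): string[] {
  return (chunkText.match(/[^.]+\./g) ?? [])
    .map((s) => s.trim())
    .filter((s) => /\b(require|requires|must)\b/i.test(s));
}

// Reduce step: merge per-chunk results, deduplicating on normalized text.
function reduce(perChunk: string[][]): Extracted[] {
  const byKey: Record<string, Extracted> = {};
  const order: string[] = [];
  perChunk.forEach((items, i) => {
    for (const item of items) {
      const key = item.toLowerCase().replace(/\s+/g, " ").trim();
      if (!byKey[key]) {
        byKey[key] = { text: item, chunks: [] };
        order.push(key);
      }
      if (byKey[key].chunks.indexOf(i) === -1) byKey[key].chunks.push(i);
    }
  });
  return order.map((k) => byKey[k]);
}

const doc = [
  "Refunds above $500 require manager approval. Agents log every refund.",
  "Escalations must go to Tier 2 within one day.",
  "Refunds above $500 require manager approval.",
].join("\n\n");

const rules = reduce(chunk(doc, 60).map(extractRules));
console.log(JSON.stringify(rules, null, 2));
```

With `maxChars = 60`, each paragraph lands in its own chunk. The duplicated refund rule comes back once, tagged with chunks 0 and 2. Map-reduce matters here because the map step parallelizes per chunk, while the reduce step is where cross-chunk duplicates and conflicts get resolved.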

Open Source

MIT. Self-hostable. Zero telemetry.

The reference implementation is open source. Clone it, fork it, run it on your own hardware.

MIT License

Zero telemetry

Static SPA · runs anywhere

git clone <repo-url> && cd <repo> && bun install && bun dev

FAQ


What does "schema-stable" mean?

It means the same source compiles into the same set of typed sections across runs and providers. Not the same exact prose — the same structure, so downstream agents can rely on the shape.

Is CKF replacing RAG?

No. CKF is a better input layer for RAG: typed chunks, atomic units and source traceability instead of arbitrary page splits.

Is CKF only for LLMs?

No. CKF is designed for agents, reasoning systems and cognitive architectures, but a human can read a .ckf.md too.

Is CKF open?

Yes. The spec and the reference implementation are MIT-licensed and published on GitHub.
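"Schema-stable" in the FAQ means the shape of the output should match across runs even when the prose differs. One way to test that is to reduce each compiled package to its shape (keys and value types, no text) and compare the shapes. This is a minimal sketch, and the section names in the example (`entities`, `rules`) are hypothetical rather than taken from the 22-section spec.

```typescript
// Reduce a value to its structural "shape": sorted keys and value types, no prose.
function shapeOf(value: unknown): string {
  if (Array.isArray(value)) {
    // Distinct element shapes, sorted, so element order does not matter.
    const items = value.map(shapeOf).filter((s, i, all) => all.indexOf(s) === i).sort();
    return `array<${items.join("|")}>`;
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const fields = Object.keys(obj).sort().map((k) => `${k}:${shapeOf(obj[k])}`);
    return `{${fields.join(",")}}`;
  }
  return value === null ? "null" : typeof value;
}

function sameSchema(a: unknown, b: unknown): boolean {
  return shapeOf(a) === shapeOf(b);
}

// Two hypothetical compilations of the same source, plus one malformed run.
const run1 = {
  entities: [{ name: "Refund", type: "concept" }],
  rules: [{ claim: "Refunds above $500 need approval", confidence: 0.96 }],
};
const run2 = {
  entities: [{ name: "Refund policy", type: "concept" }],
  rules: [{ claim: "Manager approval is required over $500", confidence: 0.93 }],
};
const run3 = { entities: [{ name: "Refund" }], rules: [] };

console.log(sameSchema(run1, run2)); // true: different prose, same structure
console.log(sameSchema(run1, run3)); // false: missing field, empty section
```

This strict version treats an empty section as a different shape. A real conformance check would more likely validate each run against the published schema than compare runs with each other.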

Compile your first .ckf in under a minute.

Give your AI assistants and agents structured, compiled knowledge.


Inspired by: Andrej Karpathy (2026). LLM Wiki: A pattern for building personal knowledge bases using LLMs. GitHub Gist. https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f