Research · May 16, 2026 · 14 min read

From LLM Wiki to CKF: Karpathy's compiler analogy applied to the agentic layer

Karpathy formulated one of the most important architectural intuitions of the current AI phase: raw documents should be compiled into persistent artifacts before being queried repeatedly by LLMs. CKF extends that intuition to the case where the consumer of the compiled artifact is not a human, but an agent.

Paulo Tomazinho, PhD, CKF Research

Building on Karpathy's compiler analogy for AI knowledge bases.

Author: Paulo Tomazinho, PhD
Affiliation: CKF Research

Abstract

In April 2026, Andrej Karpathy formulated one of the most important architectural intuitions of the current applied-AI phase: raw documents should be compiled into persistent artifacts before being queried repeatedly by LLMs, in the same way that source code is compiled before being executed. The formulation was rapidly adopted as the LLM Wiki Pattern — a Markdown wiki maintained by an LLM, optimized for agent-assisted human reading. This article positions itself explicitly as an extension of that intuition to an adjacent but distinct problem: the consumption of compiled knowledge by machines and agents, not by humans. The Compiled Knowledge Format (CKF) proposes a typed, validatable format with granular provenance, integrable with ecosystems such as MCP, structured RAG, and GraphRAG — operationalizing Karpathy's analogy at a layer he himself left open.

Keywords: Compiled Knowledge Format; LLM Wiki; KnowOps; structured RAG; MCP; Composition Hallucination.

1. Introduction

Generative AI is moving from a model-centric phase to a systems-centric phase. Modern applications are not just isolated LLM calls; they combine retrieval, tools, persistent memory, workflow execution, specialized agents, private knowledge bases, and interoperability protocols. In this scenario, the central question is no longer only "which model should we use?" but also "in what format should knowledge exist in order to be used by agents?"

The dominant retrieval-augmented generation (RAG) paradigm, introduced by Lewis et al. (2020), combines a model's parametric memory with non-parametric memory retrieved from an external source, allowing models to access knowledge outside their weights. The original paper already highlighted precise knowledge manipulation, updating, and provenance as central challenges — challenges that common RAG implementations have not yet fully solved.

Many RAG systems in production still operate over raw documents split into chunks. This approach retrieves fragments, not relations, rules, exceptions, procedures, or operational knowledge. With each new question, the model probabilistically rediscovers structure in the context, from prose written for human eyes.

The question that follows is: what if we compile the knowledge once, and then query it repeatedly in structured form?

This question was articulated with remarkable clarity by Karpathy in April 2026.

2. Karpathy's intuition: knowledge that accumulates

Andrej Karpathy published in April 2026 what he described as an "idea file" — a conceptual document on how to use LLMs to maintain personal knowledge bases. The original X post reached more than 16 million views, and the GitHub gist hit around 5,000 stars within a few days. Outlets such as VentureBeat covered the publication in April, and multiple interpretations appeared on Medium, Level Up Coding, and other technical channels during April and May.

The central intuition is simple and powerful: instead of the LLM rediscovering knowledge with every question — fetching chunks, joining fragments, synthesizing relations at runtime — it should compile the knowledge once into a persistent artifact, and then query that artifact repeatedly.

The analogy Karpathy chooses is software compilation. Source code is human-readable but is not directly executed; it is compiled once into an optimized binary, which is then efficiently executed on subsequent calls. Applied to knowledge: raw documents are the source, the LLM is the compiler, and the resulting Markdown wiki is the compiled artifact.

Karpathy's proposed implementation has three layers:

Raw sources — immutable original documents (source of truth)
Wiki — a directory of Markdown files generated and maintained by the LLM
Schema — a conventions file that disciplines how the LLM should maintain the wiki

And three operations:

Ingest — process new sources into the wiki
Query — ask questions to the wiki (with viable answers filed back as new pages)
Lint — periodic health checks (contradictions, orphan pages, missing concepts)
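
One of the lint checks listed above, orphan pages, can be sketched in a few lines. This is a minimal illustration, assuming pages link to each other with Obsidian-style `[[Page Name]]` links and that the wiki is already loaded into a dictionary; the function name, the `root` parameter, and the sample pages are hypothetical and not part of Karpathy's description.

```python
import re

# Matches the target of an Obsidian-style link: [[Target]] or [[Target|alias]].
WIKILINK = re.compile(r"\[\[([^\]|#]+)")


def find_orphans(pages: dict[str, str], root: str = "index") -> set[str]:
    """Return page names that no other page links to (the root is exempt)."""
    linked: set[str] = set()
    for name, text in pages.items():
        for target in WIKILINK.findall(text):
            target = target.strip()
            if target != name:  # a self-link does not rescue a page
                linked.add(target)
    return {name for name in pages if name not in linked and name != root}


# Hypothetical in-memory wiki: page name -> Markdown body.
pages = {
    "index": "See [[RAG]] and [[MCP|the protocol]].",
    "RAG": "Related: [[MCP]].",
    "MCP": "Transport details.",
    "GraphRAG": "Nobody links here yet.",
}
print(sorted(find_orphans(pages)))  # ['GraphRAG']
```

Checks for contradictions or missing concepts need an LLM or human judgment; only the structural checks reduce to code this simple.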

This formulation has several strengths. It is simple, local-first, and compatible with existing tools (Obsidian, Git, Claude Code), and it captures a structural truth: we should not force AI systems to reinterpret raw knowledge on every call if we can precompile a better representation.

But the LLM Wiki, as formulated, is optimized for a specific consumer: the human, with the LLM acting as a maintenance assistant. The pages are human-readable Markdown, navigable in editors such as Obsidian, with cross-references that humans follow visually. The LLM does the heavy bookkeeping, but the final artifact is designed to be read by a person.

The question motivating CKF is: what about when the primary consumer of the compiled knowledge is not human?

3. When the consumer is a machine

Consider a corporate compliance agent. It needs to apply a specific policy to a concrete decision. It does not need to navigate the policy like a reading human; it needs to retrieve the exact rule, check whether applicable exceptions exist, validate the source, and justify the decision auditably.

Consider a clinical prescription agent. It needs to know a drug's standard dose, the exception for renal insufficiency, and the contraindication in pregnancy. It does not need the entire pharmacology textbook; it needs typed, operable, source-traceable units.

Consider an educational tutoring agent. It needs to know which pedagogical strategy to apply at which moment of the lesson, under which condition, with which cognitive goal. It does not need pedagogical narrative; it needs operational relations between principles, strategies, and contexts.

In all these cases, the primary consumer of the compiled knowledge is a machine or agent, not a human. And for that consumer, the ideal format is not necessarily a visually navigable Markdown wiki. It is something closer to a typed, validatable, programmatically consumable intermediate representation.

Karpathy acknowledges the personal, exploratory character of the LLM Wiki in his own description. The natural question that follows is: how do we extend the same architectural intuition — compile once, query many times — to the case where the consumer is the agent itself?

4. CKF: operationalizing the analogy for agents

The Compiled Knowledge Format (CKF) is a proposed format that applies the compiler analogy specifically to the case of knowledge consumed by machines. CKF's current formulation is directly inspired by Karpathy's intuition. Whereas the LLM Wiki is optimized for agent-assisted human consumption, CKF is optimized for agent consumption — with humans in the audit loop.

The difference is not hierarchical (one better than the other). It is one of application scope.

Concretely, a .ckf package is a single file serialized as .ckf.md, .ckf.yaml, or .ckf.json that describes a body of knowledge as a metadata header plus 22 typed sections: entities, concepts, conditional rules, heuristics, procedures, principles, anti-patterns, causal chains, atomic units, retrieval chunks, and others. Each item carries span-level provenance back to the source document, a source-basis label (explicit, inferred, synthesized, author_opinion, uncertain), and a confidence score.
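
To make the shape of a single item concrete, here is a minimal Python sketch of one knowledge unit with its provenance. The five basis labels come from the paragraph above; every field name, the `conditional_rules` section key, the `[0, 1]` confidence range, and the sample document name are assumptions for illustration, not the key names of the CKF specification.

```python
from dataclasses import dataclass, asdict
from enum import Enum


class SourceBasis(str, Enum):
    """The five source-basis labels named in the text."""
    EXPLICIT = "explicit"
    INFERRED = "inferred"
    SYNTHESIZED = "synthesized"
    AUTHOR_OPINION = "author_opinion"
    UNCERTAIN = "uncertain"


@dataclass(frozen=True)
class Span:
    document: str    # hypothetical field names throughout
    start_line: int
    end_line: int


@dataclass(frozen=True)
class KnowledgeItem:
    id: str
    section: str
    text: str
    span: Span
    basis: SourceBasis
    confidence: float  # assumed here to lie in [0, 1]

    def __post_init__(self) -> None:
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")

    def to_dict(self) -> dict:
        data = asdict(self)
        data["basis"] = self.basis.value  # store the plain label, not the enum
        return data


item = KnowledgeItem(
    id="RULE-RENAL-DOSE-001",
    section="conditional_rules",
    text="Reduce the dose for patients with renal insufficiency.",
    span=Span(document="protocol.pdf", start_line=120, end_line=124),
    basis=SourceBasis.EXPLICIT,
    confidence=0.95,
)
print(item.to_dict())
```

The dictionary form can be dumped to JSON or YAML unchanged, which is the sense in which the encodings are meant to be interchangeable.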

CKF's design properties, described in the open specification, include:

Portability: three interchangeable encodings (Markdown, YAML, JSON) for the same semantic structure
Modularity: each typed section is independently consumable
Versioning: stable identifiers enable incremental diff and patch
Graph compatibility: entity and concept sections can be projected into graphs
Retrieval-ready: atomic units and retrieval chunks are indexable in vector stores
Agent-ready: a stable schema enables programmatic consumption without custom parsers
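
What "programmatic consumption without custom parsers" could look like in practice is sketched below: a structural check over a JSON-encoded package. The top-level keys `metadata` and `sections`, the handful of section names, and the required item keys are all assumptions made for this example; the real specification defines 22 sections and its own key names.

```python
import json

# Hypothetical subset of section names, borrowed from the prose above.
KNOWN_SECTIONS = {"entities", "concepts", "rules", "procedures", "atomic_units"}
REQUIRED_ITEM_KEYS = {"id", "text", "basis", "confidence"}


def validate_package(raw: str) -> list[str]:
    """Return a list of problems; an empty list means the package passed."""
    problems: list[str] = []
    package = json.loads(raw)
    if "metadata" not in package:
        problems.append("missing metadata header")
    seen_ids: set[str] = set()
    for section, items in package.get("sections", {}).items():
        if section not in KNOWN_SECTIONS:
            problems.append(f"unknown section: {section}")
            continue
        for index, item in enumerate(items):
            missing = REQUIRED_ITEM_KEYS - set(item)
            if missing:
                problems.append(f"{section}[{index}] missing {sorted(missing)}")
            item_id = item.get("id")
            if item_id in seen_ids:
                problems.append(f"duplicate id: {item_id}")
            elif item_id is not None:
                seen_ids.add(item_id)
    return problems


sample = json.dumps({
    "metadata": {"title": "Travel policy"},
    "sections": {
        "rules": [
            {"id": "RULE-001", "text": "Book economy class.",
             "basis": "explicit", "confidence": 0.9},
            {"id": "RULE-001", "text": "Duplicate identifier.", "basis": "inferred"},
        ]
    },
})
for problem in validate_package(sample):
    print(problem)
```

Because the check is purely structural, it can run in CI on every package build without calling a model.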

In one sentence: the LLM Wiki is the human, exploratory form of compiled knowledge; CKF is the protocol-based, agentic form.

5. Concrete differences

The table below contrasts Karpathy's LLM Wiki Pattern with CKF on operational properties.

| Property | LLM Wiki Pattern (Karpathy) | CKF |
| --- | --- | --- |
| Primary consumer | Human assisted by LLM | Agent/LLM with human audit |
| Artifact | Directory of Markdown files | `.ckf` package (Markdown, YAML, or JSON) |
| Internal structure | Schema defined locally in CLAUDE.md | 22 typed sections, standardized schema |
| Navigation | Visual in Obsidian, manual cross-references | Programmatic by type and identifier |
| Provenance | Textual citations in the wiki | Span-level + source basis + confidence score |
| Validation | Lint via LLM (contradictions, orphans) | Programmatic validation against schema |
| Distribution | Personal Git repository | Package portable across systems |
| Agent integration | Via directory reading | Native MCP resource, indexable in RAG/GraphRAG |
| Versioning | Git over Markdown | Semantic versioning of the package |
| Best use case | Exploratory personal/team knowledge | Operational knowledge for production agents |

The most important difference is the first row. The LLM Wiki exists so the human can read and maintain knowledge with LLM help. CKF exists so the agent can read knowledge maintained with human help. The two flows are complementary, not competitive.

For a researcher organizing notes on papers, the LLM Wiki is clearly better: Markdown, Obsidian, visual cross-references, non-linear exploration. For an agentic system in production interpreting regulatory policies, CKF offers properties the wiki does not: a validatable schema, stable identifiers, granular auditable provenance, native MCP integration.

6. CKF in the existing ecosystem

The "What CKF is not" page details CKF's positioning relative to adjacent technologies. In summary: CKF does not replace vector stores, knowledge graphs, RDF ontologies, fine-tuning systems, RAG, GraphRAG, or MCP. It is the content format that can be indexed, ingested, or served by those architectures.

The relationship with MCP deserves additional emphasis. The Model Context Protocol, introduced by Anthropic in November 2024, standardizes how agents connect to external systems — including how they invoke tools and access resources. MCP is deliberately agnostic about the format of the resources it transports.

The proposed framing is:

MCP provides the verbs: tools, actions, and operational capabilities
CKF provides the nouns: knowledge, context, and operable units

A .ckf.json package can be exposed as a native MCP resource, allowing distributed multi-agent architectures to consume compliance bases, runbooks, and policies without building custom parsers. The CKF MCP server implements this integration via JSON-RPC over Streamable HTTP.
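
From the client side, reading such a package comes down to one `resources/read` call in MCP's JSON-RPC 2.0 envelope. The sketch below only builds the request and unpacks a simulated response; it performs no network I/O, and the `ckf://` URI and the package contents are hypothetical rather than taken from the CKF MCP server.

```python
import json


def build_read_request(uri: str, request_id: int = 1) -> dict:
    """Build an MCP `resources/read` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "resources/read",
        "params": {"uri": uri},
    }


def extract_package(response: dict) -> dict:
    """Parse the first text content of a `resources/read` result as JSON."""
    if "error" in response:
        raise RuntimeError(response["error"].get("message", "unknown MCP error"))
    for entry in response["result"]["contents"]:
        if "text" in entry:
            return json.loads(entry["text"])
    raise ValueError("response carried no text content")


uri = "ckf://policies/travel.ckf.json"  # hypothetical resource URI
request = build_read_request(uri)

# Simulated server reply, shaped like an MCP resources/read result.
response = {
    "jsonrpc": "2.0",
    "id": request["id"],
    "result": {
        "contents": [
            {
                "uri": uri,
                "mimeType": "application/json",
                "text": json.dumps({"metadata": {"title": "Travel policy"}, "sections": {}}),
            }
        ]
    },
}
package = extract_package(response)
print(package["metadata"]["title"])  # Travel policy
```

The agent receives parsed, typed data rather than prose to interpret, which is the practical meaning of "without building custom parsers".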

The relationship with GraphRAG, proposed by Edge et al., is similarly complementary. GraphRAG is strong at global synthesis over private corpora through entity communities. CKF can contain graph layers (entities, concepts, relations) among its 22 typed sections, and its atomic units can be inserted into the GraphRAG index. The two approaches cover partially overlapping but distinct ground: GraphRAG is strong at "what is connected to what"; CKF is strong at "which rule to apply under which condition".

7. KnowOps: knowledge operations

If knowledge becomes a compiled artifact, an adjacent operational discipline emerges: KnowOps, or Knowledge Operations. KnowOps applies to agentic knowledge the principles already familiar from DevOps, DataOps, and MLOps: build, validation, versioning, diff, regression tests, human review, deploy, rollback, observability, and governance.

The concern is consistent with the ML-systems literature. Sculley et al. (2015) showed that ML systems accumulate specific technical debt — entanglement, hidden feedback loops, unstable dependencies, growing maintenance costs. Agentic systems based on compiled knowledge can suffer analogous forms of debt: stale rules, weak claims, contradictory sources, lost human patches, semantic drift, cognitive regressions.

The KnowOps proposal includes:

Stable identity for knowledge units (e.g., RULE-RENAL-DOSE-001)
Incremental compilation when the source document changes
Semantic diff distinguishing paraphrases (preserve human patches) from logically incompatible changes (trigger priority review)
Knowledge regression tests running predefined Q&A against each new package version
Merge blocking when a new version causes regression on previously correct answers
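
The last two items, regression tests and merge blocking, can be sketched as a small gate. This is an illustration under simplifying assumptions: answers are compared by exact string match, and each package version is represented by a plain function from question to answer. The names `find_regressions` and `may_merge` and the sample questions are hypothetical.

```python
from typing import Callable

Answerer = Callable[[str], str]


def find_regressions(golden: dict[str, str], old: Answerer, new: Answerer) -> list[str]:
    """Questions the old version answered correctly and the new one does not."""
    regressions = []
    for question, expected in golden.items():
        if old(question) == expected and new(question) != expected:
            regressions.append(question)
    return regressions


def may_merge(golden: dict[str, str], old: Answerer, new: Answerer) -> bool:
    """Block the merge when any previously correct answer regresses."""
    return not find_regressions(golden, old, new)


# Hypothetical golden set and two package versions, modeled as lookups.
golden = {
    "Who approves travel above the limit?": "finance director",
    "Which class for flights under 6 hours?": "economy",
}
v1 = {
    "Who approves travel above the limit?": "finance director",
    "Which class for flights under 6 hours?": "business",
}
v2 = {
    "Who approves travel above the limit?": "team lead",
    "Which class for flights under 6 hours?": "economy",
}


def answer_v1(question: str) -> str:
    return v1.get(question, "")


def answer_v2(question: str) -> str:
    return v2.get(question, "")


print(find_regressions(golden, answer_v1, answer_v2))
print(may_merge(golden, answer_v1, answer_v2))  # False
```

Note that the gate ignores answers that were already wrong in the old version: the second question improves in `v2`, but the first one regresses, so the merge is blocked.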

This operational framework is partially implemented in the CKF Science Lab; the rest is a design proposal awaiting prototyping.

8. Provenance, security, and governance

CKF explicitly extends what the LLM Wiki covers in three areas.

Granular provenance

Each item in a CKF package carries not only a reference to the source document but the exact span (line, paragraph, excerpt) and a basis classification. The distinction between an explicit claim ("the dose is 40 mg"), an inference ("therefore, this class of patients needs reduction"), a synthesis ("across the three protocols consulted, there is consensus that..."), an author opinion ("method X is preferable to Y"), and an uncertainty ("this point is not clear in the sources") is treated as first-class data.

This aligns CKF with the FAIR (Findable, Accessible, Interoperable, Reusable) principles proposed by Wilkinson et al. for scientific data management. In high-stakes domains, the difference between an explicitly stated claim and a reasonable inference is decisive. An agent should not treat all these categories as equivalent.
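
One way an agent could act on that distinction is a small policy keyed on the basis label. The sketch below is an assumption about how a consumer might use the labels, not behavior defined by CKF: the three outcomes, the 0.8 threshold, and the grouping of labels are all hypothetical.

```python
TRUSTED_BASES = {"explicit"}
REVIEWABLE_BASES = {"inferred", "synthesized"}


def decide(item: dict, high_stakes: bool, min_confidence: float = 0.8) -> str:
    """Map an item's basis and confidence to 'apply', 'apply_with_caveat' or 'escalate'."""
    confident = item["confidence"] >= min_confidence
    if item["basis"] in TRUSTED_BASES and confident:
        return "apply"
    if item["basis"] in REVIEWABLE_BASES and confident and not high_stakes:
        return "apply_with_caveat"
    # Opinions, uncertain items, low confidence, or inferences in
    # high-stakes settings all go to a human reviewer.
    return "escalate"


explicit_claim = {"basis": "explicit", "confidence": 0.95}
inference = {"basis": "inferred", "confidence": 0.90}
opinion = {"basis": "author_opinion", "confidence": 0.99}

print(decide(explicit_claim, high_stakes=True))   # apply
print(decide(inference, high_stakes=True))        # escalate
print(decide(inference, high_stakes=False))       # apply_with_caveat
print(decide(opinion, high_stakes=False))         # escalate
```

The point of the sketch is only that the policy is expressible at all: with untyped prose, the agent has no field to branch on.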

Security as a first-class concern

In corporate or governmental multi-agent systems, knowledge packages become production-grade cognitive dependencies. This creates an attack surface: embedded prompt injection, forged provenance, biased rules, stale knowledge, contaminated package supply chains.

The OWASP Top 10 for Large Language Model Applications catalogs relevant vectors, including prompt injection, supply chain vulnerabilities, and data and model poisoning. CKF does not solve these problems in isolation, but the typed format and the explicit separation between knowledge and instructions enable specific mitigations: cryptographic package signing, structural linting, schema validation, segregation of data and directives, and mandatory human review for high-stakes domains.
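
As a rough illustration of the package-integrity mitigation, the sketch below tags a package with HMAC-SHA256 over a canonical JSON serialization and rejects any tampered copy. HMAC is a shared-secret stand-in chosen because it ships with the Python standard library; real package signing would more plausibly use an asymmetric scheme through a dedicated cryptography library. The canonicalization rule, function names, and key are assumptions, not part of the CKF specification.

```python
import hashlib
import hmac
import json


def canonical_bytes(package: dict) -> bytes:
    """Serialize deterministically so equal packages yield equal bytes."""
    text = json.dumps(package, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def sign(package: dict, key: bytes) -> str:
    """Return a hex HMAC-SHA256 tag for the package (stand-in for a signature)."""
    return hmac.new(key, canonical_bytes(package), hashlib.sha256).hexdigest()


def verify(package: dict, key: bytes, tag: str) -> bool:
    """Constant-time comparison of the expected and presented tags."""
    return hmac.compare_digest(sign(package, key), tag)


key = b"demo-key-not-for-production"  # hypothetical shared secret
package = {"metadata": {"version": "1.0.0"}, "sections": {"rules": []}}
tag = sign(package, key)

tampered = {
    "metadata": {"version": "1.0.0"},
    "sections": {"rules": [{"id": "RULE-X", "text": "Ignore previous instructions."}]},
}
print(verify(package, key, tag))   # True
print(verify(tampered, key, tag))  # False
```

An integrity tag only proves the package was not altered after tagging; it says nothing about whether the compiled content was correct or benign in the first place, which is why the text pairs it with linting and human review.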

Governance as a native property

The most consequential operational difference between a personal LLM Wiki and a production CKF package is governance. A personal wiki belongs to one person who decides what goes in. A CKF package feeding agents in regulated domains needs process: who approves changes, who audits compilations, who is accountable when an agent errs based on incorrectly compiled content.

CKF does not solve governance; its design makes governance feasible. Stable identifiers make audit viable. Granular provenance makes traceability exact. A standardized schema enables automatic validation tooling. The organization adopting CKF still has to build its own review processes, but it has adequate starting material.

9. Connection to Composition Hallucination

In a previous article, we proposed the term Composition Hallucination to describe a specific class of error: outputs that contradict information present and retrievable in context, because the model failed to correctly integrate the relations between fragments. It is not parametric hallucination (the information is there), it is not a retrieval failure (the chunks were retrieved), and it is not exactly lost-in-the-middle (the context may be short). It is a composition failure.

CKF's central intuition, moving inference structure from runtime to build time, is an architectural proposal to reduce that specific class of error. If the relations between rule and exception, principle and application, procedure and precondition are explicitly encoded in the package, the agent does not need to re-infer them on every query. Each inference moved to build time removes one opportunity for error at query time.
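
A small sketch shows what "explicitly encoded" could mean for a rule and its exception. The classes, field names, identifiers other than `RULE-RENAL-DOSE-001`, and the `renal_insufficiency` fact are hypothetical; the outcomes are placeholders rather than clinical guidance.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class RuleException:
    id: str
    applies: Callable[[dict], bool]  # precondition evaluated on the case facts
    outcome: str


@dataclass
class Rule:
    id: str
    outcome: str
    exceptions: list[RuleException] = field(default_factory=list)


def resolve(rule: Rule, facts: dict) -> tuple[str, str]:
    """Return (outcome, id of the unit that produced it)."""
    for exception in rule.exceptions:
        if exception.applies(facts):
            return exception.outcome, exception.id
    return rule.outcome, rule.id


dose_rule = Rule(
    id="RULE-DOSE-001",
    outcome="standard dose",
    exceptions=[
        RuleException(
            id="RULE-RENAL-DOSE-001",
            applies=lambda facts: facts.get("renal_insufficiency", False),
            outcome="reduced dose",
        )
    ],
)
print(resolve(dose_rule, {"renal_insufficiency": True}))
print(resolve(dose_rule, {}))  # ('standard dose', 'RULE-DOSE-001')
```

The link between the rule and its exception is data attached at build time, so the lookup is deterministic and the returned identifier gives the audit trail. With raw chunks, the model would have to notice on every query that the two passages belong together.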

Whether this hypothesis holds empirically is the central question CKF's research program needs to answer.

10. Limitations and open questions

Like the other articles in the program, this one states the limits of the current evidence explicitly.

The cost-benefit ratio of compilation is not universal. The amortization argument (pay once at compilation, save on every query) assumes that compiled packages are queried many times. For rarely consulted or continuously updated documents, the math may favor keeping raw text. Characterizing the document profiles for which CKF is and is not cost-effective is a topic of future work.
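
The amortization argument reduces to a break-even count. If compiling costs C once, a query over raw text costs r, and a query over the compiled package costs c, then compilation pays off once the number of queries N satisfies N > C / (r - c), which requires r > c. The sketch below computes the smallest such N; the cost units and the numbers in the example are illustrative, not measurements.

```python
import math
from typing import Optional


def break_even_queries(compile_cost: float, raw_cost: float, compiled_cost: float) -> Optional[int]:
    """Smallest N with compile_cost + N * compiled_cost < N * raw_cost.

    Returns None when compiled queries are not cheaper, in which case
    compilation never pays for itself on cost alone.
    """
    saving = raw_cost - compiled_cost
    if saving <= 0:
        return None
    return math.floor(compile_cost / saving) + 1


# Illustrative units: compilation costs 500, raw query 3, compiled query 1.
print(break_even_queries(500, 3, 1))  # 251
print(break_even_queries(500, 1, 1))  # None
```

For a continuously updated document, the compile cost recurs on every change, so the relevant N is the number of queries between updates rather than over the document's lifetime.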

Semantic-diff mechanisms have unresolved components. Detecting logical equivalence between rewritten knowledge units, that is, distinguishing "the rule was paraphrased" from "the rule was substantively altered", is an open research problem. Existing techniques (embedding similarity, structural diff over parse trees, equivalence of logical forms) cover parts of the solution, not the whole. Early CKF implementations will route ambiguous cases to human review rather than attempt fully automatic resolution.
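
The routing policy can be sketched even though the equivalence test cannot. In the example below, character-level similarity from `difflib` stands in for embedding similarity, and a crude guard flags any change in numbers or negation words, since lexical similarity alone would rate "may approve" and "may not approve" as near-identical. The thresholds, labels, and word list are assumptions; none of this is the CKF implementation.

```python
import difflib
import re

NEGATIONS = {"not", "no", "never", "except", "unless"}


def signals(text: str) -> tuple[frozenset, frozenset]:
    """Numbers and negation words, whose change often flips a rule's meaning."""
    tokens = re.findall(r"[a-z]+|\d+(?:\.\d+)?", text.lower())
    numbers = frozenset(t for t in tokens if t[0].isdigit())
    negations = frozenset(t for t in tokens if t in NEGATIONS)
    return numbers, negations


def triage(old: str, new: str, paraphrase_threshold: float = 0.6) -> str:
    """Classify a rewritten unit; anything doubtful goes to a human."""
    if " ".join(old.split()).lower() == " ".join(new.split()).lower():
        return "unchanged"
    if signals(old) != signals(new):
        return "priority_review"
    ratio = difflib.SequenceMatcher(None, old.lower(), new.lower()).ratio()
    return "likely_paraphrase" if ratio >= paraphrase_threshold else "human_review"


print(triage("Expenses above 500 require approval.", "Expenses above 500 need approval."))
print(triage("Expenses above 500 require approval.", "Expenses above 800 require approval."))
print(triage("Managers may approve refunds.", "Managers may not approve refunds."))
```

Even the `likely_paraphrase` label is a hint rather than a verdict: a rewrite can change meaning without touching a number or a negation, which is why the early implementations described above keep a human in the loop.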

Adoption depends on interoperability that the protocol does not unilaterally control. For CKF to function as an intermediate representation between human documents and machine readers, ingestion tooling needs to mature across vector stores, graph stores, MCP clients, and retrieval frameworks. The CKF specification is open under MIT, but adoption velocity is outside the project's direct control.

If any of these hypotheses is contradicted by experimental evidence, the corresponding claim in this article should be updated or retracted.

11. Conclusion

Karpathy's formulation is one of the most important conceptual contributions of the current applied-AI phase: raw documents should be compiled into persistent artifacts before being queried repeatedly by LLMs.

CKF starts from that intuition and asks: what happens when the consumer of that compiled artifact is not the human reading in Obsidian, but the agent retrieving a rule to apply to a decision? The proposed answer is a typed, validatable format with granular provenance, integrable with MCP and structured RAG pipelines.

The LLM Wiki is the human, exploratory form of compiled knowledge. CKF is the protocol-based, agentic form.

The two forms are complementary, not competitive. Karpathy showed that raw knowledge should be compiled once to be queried many times; CKF proposes a specific format for the case in which the queries are made by machines, not by people.

If the transition documents → compiled wikis → knowledge packages → knowledge operations is validated empirically, the next layer of agentic AI will not be only larger models or longer context windows. It will be a new infrastructure for representing, versioning, auditing, and executing knowledge.

The CKF specification and the reference implementation are open under MIT. Contributions, critiques, and replications are welcome via the project repository.

The hypothesis remains under test.

References

ANTHROPIC. (2024). Model Context Protocol Specification. https://modelcontextprotocol.io
EDGE, D., Trinh, H., Cheng, N., et al. (2024). From Local to Global: A Graph RAG Approach to Query-Focused Summarization. arXiv preprint. https://arxiv.org/abs/2404.16130
KARPATHY, A. (2026). LLM Wiki: A pattern for building personal knowledge bases using LLMs. GitHub Gist. https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f
LEWIS, P., Perez, E., Piktus, A., et al. (2020). Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. Advances in Neural Information Processing Systems (NeurIPS), 33, 9459-9474. https://arxiv.org/abs/2005.11401
OWASP. (2025). OWASP Top 10 for Large Language Model Applications. https://owasp.org/www-project-top-10-for-large-language-model-applications/
SCULLEY, D., Holt, G., Golovin, D., et al. (2015). Hidden Technical Debt in Machine Learning Systems. Advances in Neural Information Processing Systems (NeurIPS), 28, 2503-2511. https://papers.nips.cc/paper/2015/hash/86df7dcfd896fcaf2674f757a2463eba-Abstract.html
TOMAZINHO, P. (2026). Composition Hallucination in RAG, GraphRAG, and agents: when having the context is not enough. CKF Research Notes. https://compiledknowledgeformat.org/news/composition-hallucination-em-rag-graphrag-e-agentes
WILKINSON, M. D., Dumontier, M., Aalbersberg, I. J., et al. (2016). The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data, 3, 160018. https://www.nature.com/articles/sdata201618
