Compiling Knowledge for AI Agents: The CKF Format and Knowledge Operations (KnowOps)
A vision paper proposing CKF — an open format that compiles documents into typed, schema-stable knowledge packages — and KnowOps, a framework that ports software-engineering lifecycle practices to agent-consumed knowledge bases. Empirical efficacy is the subject of a pre-registered confirmatory study currently in preparation.
Author: Paulo Tomazinho, PhD
Affiliation: CKF Research
Abstract
Large language models and multi-agent systems have made machine consumption of unstructured documents one of the central bottlenecks of contemporary AI engineering. Retrieval-augmented generation (RAG) addresses part of the problem (Lewis et al., 2020) but inherits an unstated assumption: that human-readable documents are the appropriate input to a machine reader. This vision paper proposes the Compiled Knowledge Format (CKF), an open format for transforming static documents (.pdf, .docx, .md) into typed, modular and schema-stable knowledge packages, and Knowledge Operations (KnowOps), an operational framework that ports software-engineering practices (compilation, semantic versioning, regression testing, CI/CD) to the lifecycle of agent-consumed knowledge bases. Both the format and the framework are presented as proposals under empirical investigation; their efficacy is the subject of a pre-registered confirmatory study currently in preparation.
Keywords: Compiled Knowledge Format; KnowOps; Multi-agent Systems; Structured RAG; AI Governance; Composition Hallucination.
1. Introduction
Since the publication of Vaswani et al.'s transformer architecture (2017), language models have grown both in raw capability and in the diversity of tasks they can address. Among the most consequential downstream developments has been the routinization of retrieval-augmented generation as the dominant pattern for grounding model outputs in external documents (Lewis et al., 2020). RAG works well when the retrieved fragment is the answer. It performs less reliably when the answer depends on integrating information across multiple fragments — particularly when the relations between those fragments (exceptions, scope, precedence, conditional applicability) were left implicit in the source prose.
This failure mode is not a hallucination in the parametric sense: the model has not invented external facts. Nor is it strictly a retrieval failure: the relevant chunks are present in the context. We have characterized it elsewhere as composition hallucination — outputs that contradict information present and retrievable in the context because the model failed to integrate the relations between fragments correctly (Tomazinho, 2026).
The architectural intuition behind CKF is that part of this structural inference can be moved earlier. Documents authored for human readers encode their structure in layout, prose conventions, and implicit reader knowledge. Agents reading those same documents must reconstruct that structure at every inference call, probabilistically, from fragments. CKF proposes to perform that reconstruction once, at compile time, and to serialize the result as a typed package that downstream agents read natively. The original document does not disappear; the compiled package is a parallel artifact, designed for machine readers.
This paper develops the CKF proposal along three dimensions. Section 2 describes the knowledge-compilation paradigm. Section 3 introduces a decoupled teacher-student architecture for compilation and runtime. Section 4 proposes Knowledge Operations (KnowOps) as an operational framework that applies software-engineering lifecycle practices to compiled knowledge bases. Section 5 establishes the relationship between CKF and the Model Context Protocol. Section 6 articulates the limitations of the proposal and the empirical work required to validate it.
2. The Knowledge-Compilation Paradigm
In software engineering, source code is the artifact authored by humans, while intermediate representations and machine code are the artifacts executed by processors. Compilers transform one into the other, optimizing for execution characteristics that source code does not need to express. The CKF proposal extends this distinction to knowledge consumed by machine readers.
Concretely, the CKF compilation process produces a metadata header plus 22 typed sections — entities, concepts, conditional rules, heuristics, procedures, principles, anti-patterns, causal chains, atomic units, retrieval chunks, and others — each with span-level provenance back to the source document. The table below contrasts traditional document processing with the proposed approach.
| Property | Traditional RAG / Raw Text | CKF-based Approach |
|---|---|---|
| Primary reader | Humans (layout and prose) | AI agents and LLMs (semantics) |
| Data structure | Linear, unstructured text | Metadata header + 22 typed sections |
| Knowledge relations | Implicit in text flow | Explicit graph with mapped dependencies |
| Provenance | Page number (coarse) | Span-level (line, paragraph, excerpt) |
| Inference cost | Repeated per query | Amortized at compile time |
| Structural consistency | Depends on document layout | Schema-stable across runs |
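To make the right-hand column of the table more concrete, the following is a minimal sketch of what a single typed unit with span-level provenance might look like in memory. The class names, field names, section label, file name, and example content are all assumptions made for illustration; they are not taken from the CKF specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provenance:
    """Span-level pointer back into the source document (hypothetical fields)."""
    source: str      # source file name
    line_start: int  # first source line covered by the span (1-indexed)
    line_end: int    # last source line covered by the span (inclusive)
    excerpt: str     # verbatim text the unit was derived from


@dataclass(frozen=True)
class KnowledgeUnit:
    """One typed entry in a compiled package (hypothetical shape)."""
    unit_id: str                      # stable identifier
    section: str                      # typed section the unit belongs to
    statement: str                    # normalized content for machine readers
    provenance: Provenance            # where in the source the unit came from
    depends_on: tuple[str, ...] = ()  # explicit relations to other units


def units_in_section(units: list[KnowledgeUnit], section: str) -> list[KnowledgeUnit]:
    """Return the units filed under one typed section."""
    return [u for u in units if u.section == section]


# Illustrative content only: the policy text below is invented for the example.
rule = KnowledgeUnit(
    unit_id="REG-102-COPYRIGHT",
    section="conditional_rules",
    statement="Third-party images require a recorded license before publication.",
    provenance=Provenance(
        source="policy.md",
        line_start=41,
        line_end=43,
        excerpt="Images from third parties may only be published once a license is on file.",
    ),
)
exception = KnowledgeUnit(
    unit_id="REG-103-COPYRIGHT-EXCEPTION",
    section="conditional_rules",
    statement="Images produced in-house are exempt from REG-102-COPYRIGHT.",
    provenance=Provenance("policy.md", 44, 44, "In-house images are exempt."),
    depends_on=("REG-102-COPYRIGHT",),  # the exception is linked to its rule explicitly
)

package_units = [rule, exception]
rules = units_in_section(package_units, "conditional_rules")
```

The point of the sketch is the `depends_on` field: the relation between a rule and its exception is stored as data rather than left for the reader to infer from adjacent prose.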
The economic argument is straightforward. Structural inference performed at compile time runs once per document edit; structural inference performed at retrieval time runs once per query. For documents consulted many times by many agents over their lifetime, the amortization is favorable in principle. Whether the savings are large enough to justify the compilation cost, and whether the quality of agent responses on the compiled artifact matches or exceeds quality on raw text, are empirical questions to which we return in Section 6.
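The amortization argument can be stated as a simple break-even calculation. The sketch below assumes a fixed compile cost per document edit and a constant per-query cost for each format; the function name, the parameter names, and the token figures are hypothetical and are not measurements from any CKF experiment.

```python
import math


def break_even_queries(compile_cost: float,
                       raw_cost_per_query: float,
                       ckf_cost_per_query: float) -> float:
    """Number of queries after which compiling once becomes cheaper.

    All costs must use the same unit (tokens, dollars, seconds).
    Returns math.inf when the compiled format saves nothing per query.
    """
    saving_per_query = raw_cost_per_query - ckf_cost_per_query
    if saving_per_query <= 0:
        return math.inf
    return compile_cost / saving_per_query


# Hypothetical figures in tokens, chosen only to make the arithmetic visible.
n_star = break_even_queries(
    compile_cost=400_000,       # one-off compilation per document edit
    raw_cost_per_query=6_000,   # context consumed per query on raw text
    ckf_cost_per_query=2_000,   # context consumed per query on the package
)
print(n_star)  # 100.0
```

Under these assumed numbers a document would need to be queried more than 100 times between edits for compilation to pay off, which is why rarely consulted or continuously updated documents may not benefit.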
3. Decoupled Architecture: Teacher-Student Compilation
One of CKF's proposed operational characteristics is the decoupling of compute used for knowledge extraction from compute used for runtime inference. We describe a three-stage pipeline.
[Raw documents]
│
▼ (high-capacity models)
[Semantic compilation] ──► .ckf.md / .ckf.json (intermediate artifact)
│
▼ (specialist human review)
[Validated, sealed .ckf package]
│
▼ (lightweight runtime models)
[Production inference]
3.1. Compilation with high-capacity models
Compilation is compute-intensive. We propose using frontier-grade models — for example, Claude Opus 4.6, GPT-5, or Gemini 2.5 Pro — via map-reduce and semantic chunking, to read the full source, resolve dependencies, and emit the typed package. Token cost per document is high, but the operation runs only once per document-modification cycle.
This stage is implemented in the current CKF reference compiler, which supports five providers (OpenAI, Anthropic, Google, DeepSeek, OpenRouter) via bring-your-own-key.
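The map-reduce shape of this stage can be illustrated with a minimal sketch. It is not the reference compiler: paragraph-based splitting stands in for semantic chunking, the `extract` callable stands in for a call to a frontier model, and every function name and section label here is an assumption made for the example.

```python
from collections import defaultdict
from typing import Callable

# An extractor maps one chunk to (section, statement) pairs.
Extractor = Callable[[str], list[tuple[str, str]]]


def chunk_paragraphs(text: str, max_chars: int) -> list[str]:
    """Group whole paragraphs into chunks of at most roughly max_chars."""
    chunks: list[str] = []
    current = ""
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks


def compile_document(text: str, extract: Extractor, max_chars: int = 2000) -> dict[str, list[str]]:
    """Map: extract typed statements per chunk. Reduce: merge and deduplicate."""
    package: dict[str, list[str]] = defaultdict(list)
    for chunk in chunk_paragraphs(text, max_chars):
        for section, statement in extract(chunk):
            if statement not in package[section]:
                package[section].append(statement)
    return dict(package)


def stub_extract(chunk: str) -> list[tuple[str, str]]:
    """Toy stand-in for the model call: files 'If ...' paragraphs as rules."""
    pairs = []
    for para in chunk.split("\n\n"):
        section = "conditional_rules" if para.startswith("If ") else "atomic_units"
        pairs.append((section, para))
    return pairs


source_text = (
    "Employees must file expenses monthly.\n\n"
    "If a receipt is missing, a manager must approve the claim.\n\n"
    "Employees must file expenses monthly."
)
compiled = compile_document(source_text, stub_extract, max_chars=60)
```

The reduce step is where cross-chunk work such as deduplication and dependency resolution would happen; here it only removes exact duplicates.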
3.2. Human validation in the loop
In high-stakes domains — medicine, law, regulated finance — fully automated knowledge ingestion creates the risk of compilation errors: cases where the compiler misreads structural relations in the source and encodes them incorrectly in the package. If the compiler misclassifies a drug-dose threshold or inverts a legal precedence, the error becomes a persistent feature of the knowledge base and propagates to every agent that consumes the package.
The proposed mitigation is to export the compiled artifact as .ckf.md — human-readable structured Markdown — for visual audit by domain specialists (knowledge engineers, clinicians, lawyers) before the package is sealed and distributed to production agents. Corrections are made directly in the Markdown layer and persisted through the rest of the pipeline.
The current CKF compiler already produces auditable Markdown output. The structured review-and-approval workflow built on top of that output is still a proposal, with prototyping under way in the CKF Science Lab.
3.3. Runtime with lightweight models
Once a CKF package is approved and sealed (with a cryptographic hash for integrity), it can be injected into production environments where smaller models — for example, Claude Haiku 4.5, Gemini 2.5 Flash, GPT-5 mini, or comparable open-weight models — consume it for inference. The proposed advantage is that, because the package is structurally predictable and free of layout noise, runtime models can answer faithfully without the parametric capacity required to disambiguate raw prose.
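Sealing can be sketched as hashing a canonical serialization of the approved package. The text above does not say which hash function or canonicalization CKF uses, so SHA-256 over key-sorted JSON is an assumption here, as are the function names and the example package.

```python
import hashlib
import hmac
import json


def seal_package(package: dict) -> str:
    """Return a hex digest over a canonical JSON form of the package.

    Keys are sorted and whitespace is fixed so that two logically
    identical packages always produce the same digest.
    """
    canonical = json.dumps(package, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def verify_package(package: dict, expected_digest: str) -> bool:
    """Check a package against the digest recorded when it was sealed."""
    return hmac.compare_digest(seal_package(package), expected_digest)


# Hypothetical package content, for illustration only.
approved = {"id": "REG-102-COPYRIGHT", "statement": "Licenses must be on file."}
digest = seal_package(approved)

tampered = dict(approved, statement="Licenses are optional.")
print(verify_package(approved, digest))  # True
print(verify_package(tampered, digest))  # False
```

A runtime agent would recompute the digest before loading the package and refuse to use it on mismatch, so that any edit made after human approval is detectable.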
This is the most economically attractive consequence of the proposal: pay once at compile time with a frontier model, then answer each runtime query at a fraction of the per-query cost of a frontier model. Whether the substitution holds at acceptable quality is the empirical question that gates the entire architecture. It is the central hypothesis of the confirmatory study described in Section 6.
4. KnowOps: CI/CD for Knowledge Bases
To prevent validated knowledge from becoming a monolithic block that is hard to maintain, the CKF protocol proposes Knowledge Operations (KnowOps) — a framework that ports software development lifecycle practices to AI knowledge bases. The proposal draws on the technical-debt literature for ML systems (Sculley et al., 2015) and extends it to knowledge artifacts specifically.
[Source change] ──► [Incremental compile] ──► [Semantic diff]
                                                     │
                                                     ▼
[Deploy to production] ◄── [Regression tests] ◄── [Patch reapply]
4.1. Stable identity for knowledge units
Every elementary component extracted by CKF receives a stable unique identifier (e.g., REG-102-COPYRIGHT, RULE-RENAL-DOSE-001). When a source document changes — a paragraph edited in a corporate policy, a clinical guideline, or a legal text — the proposed KnowOps pipeline performs incremental recompilation rather than reprocessing the entire document. In outline:
- The system computes hashes for the modified excerpts in the source.
- It identifies which knowledge units were derived from the changed regions.
- It re-invokes LLM compilation only for the affected units, preserving the rest of the package and minimizing token cost.
This incremental capability is currently a design proposal. The existing CKF compiler operates on full documents, recompiling everything when the source changes.
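The three steps listed above can be sketched as follows. The sketch assumes that source spans carry stable identifiers and that the package records which spans each unit was derived from; both are assumptions, as are the function names, span identifiers, and example text. Only the unit identifiers are taken from the examples in this section.

```python
import hashlib


def span_hash(text: str) -> str:
    """Content hash of one source span."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def changed_spans(stored_hashes: dict[str, str], current_spans: dict[str, str]) -> set[str]:
    """Span ids that were edited, added, or removed since the last compile."""
    current_hashes = {sid: span_hash(text) for sid, text in current_spans.items()}
    all_ids = stored_hashes.keys() | current_hashes.keys()
    return {sid for sid in all_ids if stored_hashes.get(sid) != current_hashes.get(sid)}


def affected_units(changed: set[str], derived_from: dict[str, set[str]]) -> set[str]:
    """Units whose provenance overlaps any changed span."""
    return {unit for unit, spans in derived_from.items() if spans & changed}


# State recorded at the previous compile (hypothetical spans and text).
previous_spans = {"p1": "Licenses must be on file.", "p2": "Check renal function.", "p3": "Reduce the dose."}
stored = {sid: span_hash(text) for sid, text in previous_spans.items()}
derived_from = {
    "REG-102-COPYRIGHT": {"p1"},
    "RULE-RENAL-DOSE-001": {"p2", "p3"},
}

# The source is edited: only paragraph p3 changes.
edited_spans = dict(previous_spans, p3="Reduce the dose and recheck weekly.")
to_recompile = affected_units(changed_spans(stored, edited_spans), derived_from)
print(to_recompile)  # {'RULE-RENAL-DOSE-001'}
```

Only the units returned by `affected_units` would be sent back to the compiling model; every other unit keeps its identifier and content.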
4.2. Semantic diff and preservation of human patches
When a unit is recompiled, the result is a candidate version. To avoid blindly overwriting earlier human curation, KnowOps proposes a semantic diff mechanism that distinguishes paraphrastic rewrites from substantive logical changes.
If the recompilation produces changes that preserve logical equivalence — different wording, same meaning — prior human corrections are automatically reinjected as a semantic patch. If the recompilation introduces logical incompatibility — a changed dose threshold, a removed exception, an inverted precedence — the unit is marked stale and routed to a priority human-review queue, preventing conflicting data from reaching production agents.
The mechanism for reliably detecting logical equivalence across paraphrases is an open research problem; we return to it as one of the unsolved components of the proposal in Section 6.
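The routing logic itself is simple once an equivalence check is available; the check is the hard part. The sketch below therefore takes the check as a pluggable function that may answer "equivalent", "not equivalent", or "cannot tell". The `Decision` names, the function signatures, and the deliberately naive checker are assumptions made for illustration, and accepting an equivalent candidate when no human patch exists is likewise an assumed behavior.

```python
import re
from enum import Enum
from typing import Callable, Optional

# Returns True (equivalent), False (substantive change), or None (ambiguous).
EquivalenceCheck = Callable[[str, str], Optional[bool]]


class Decision(Enum):
    ACCEPT = "accept"                # equivalent, nothing to reapply
    REAPPLY_PATCH = "reapply_patch"  # equivalent, reinject human correction
    STALE = "stale"                  # send to the human-review queue


def route_candidate(previous: str, candidate: str,
                    human_patch: Optional[str],
                    equivalent: EquivalenceCheck) -> Decision:
    """Decide what happens to a recompiled unit."""
    verdict = equivalent(previous, candidate)
    if verdict is not True:
        # Both substantive changes and ambiguous cases go to human review.
        return Decision.STALE
    return Decision.REAPPLY_PATCH if human_patch is not None else Decision.ACCEPT


def naive_equivalent(a: str, b: str) -> Optional[bool]:
    """Toy placeholder, NOT a real solution to paraphrase equivalence."""
    def normalize(s: str) -> str:
        return " ".join(s.lower().split())

    if normalize(a) == normalize(b):
        return True
    number = r"\d+(?:\.\d+)?"
    if re.findall(number, a) != re.findall(number, b):
        return False  # a changed threshold is treated as a substantive change
    return None       # different wording, same numbers: cannot tell


old = "Approval is required above 500 EUR."
same = route_candidate(old, "approval is  required above 500 EUR.", "patch", naive_equivalent)
changed = route_candidate(old, "Approval is required above 800 EUR.", "patch", naive_equivalent)
unclear = route_candidate(old, "Sign-off is needed for amounts over 500 EUR.", "patch", naive_equivalent)
```

Note that the third case, a genuine paraphrase, is routed to review by this checker. That conservative default trades reviewer time for safety, which matches the stated goal of keeping conflicting data away from production agents.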
4.3. Regression testing for knowledge
Before merging a new CKF package into the production branch, the proposed KnowOps pipeline runs a battery of regression tests: predefined Q&A pairs are submitted to an evaluator model using the newly compiled package. If the new version causes the system to fail an answer it previously got right, the pipeline blocks the merge.
This regression-testing layer is currently a design proposal. The CKF Science Lab implements a related but more limited capability — running the same question battery across PDF raw, TXT raw, and CKF conditions for benchmarking — which provides the experimental scaffolding from which the production regression suite is being developed.
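A minimal sketch of the merge gate follows. The evaluator model is replaced here by a case-insensitive exact-match comparison, and the two "agents" are plain dictionary lookups standing in for model calls against the old and new packages; all names and the question battery are hypothetical.

```python
from typing import Callable

Answerer = Callable[[str], str]       # question -> answer
Judge = Callable[[str, str], bool]    # (answer, expected) -> correct?


def regressions(cases: list[tuple[str, str]],
                answer_with_old: Answerer,
                answer_with_new: Answerer,
                is_correct: Judge) -> list[str]:
    """Questions answered correctly on the old package but not on the new one."""
    failed = []
    for question, expected in cases:
        was_right = is_correct(answer_with_old(question), expected)
        now_right = is_correct(answer_with_new(question), expected)
        if was_right and not now_right:
            failed.append(question)
    return failed


def may_merge(cases, answer_with_old, answer_with_new, is_correct) -> bool:
    """The pipeline blocks the merge if any regression is found."""
    return not regressions(cases, answer_with_old, answer_with_new, is_correct)


def exact_match(answer: str, expected: str) -> bool:
    """Stand-in for an evaluator model."""
    return answer.strip().lower() == expected.strip().lower()


# Hypothetical battery and canned answers, for illustration only.
battery = [("Who approves missing receipts?", "a manager"),
           ("How often are expenses filed?", "monthly")]
old_answers = {"Who approves missing receipts?": "A manager",
               "How often are expenses filed?": "Monthly"}
new_answers = {"Who approves missing receipts?": "A manager",
               "How often are expenses filed?": "Weekly"}

blocked = regressions(battery, old_answers.__getitem__, new_answers.__getitem__, exact_match)
print(blocked)  # ['How often are expenses filed?']
```

Questions that were already failing on the old package do not block the merge under this rule; only newly introduced failures do.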
5. Native Connection to the Model Context Protocol
The Model Context Protocol (MCP), introduced by Anthropic in 2024, standardizes how AI agents communicate with external systems — including how they call tools, dispatch actions, and access resources. MCP is deliberately agnostic about the content format of those resources.
CKF and MCP compose without conflict. MCP standardizes the operational capabilities of agents — the verbs they can invoke against the world. CKF establishes a schema for the knowledge that informs those verbs — the nouns over which decisions are made. A .ckf.json package can be exposed as a native MCP resource, allowing distributed multi-agent architectures to interact with compliance bases, runbooks, and policies without building per-agent parsers.
This composition is implemented in the current CKF MCP server, which exposes tools for compiling, parsing, validating, and searching CKF packages via JSON-RPC over Streamable HTTP. Claude Desktop, Cursor, Windsurf, and the Vercel AI SDK can connect to it directly.
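For readers unfamiliar with MCP, the sketch below builds the JSON-RPC 2.0 message an MCP client sends to read a resource and extracts the text from a result shaped as the MCP specification describes. The `ckf://` URI is hypothetical, and whether the CKF MCP server exposes packages as resources under any particular URI is an assumption; the server is described above only as exposing tools. The response shown is hand-written for the example, not captured from a server.

```python
import json
from itertools import count

_request_ids = count(1)


def resources_read_request(uri: str) -> dict:
    """Build an MCP `resources/read` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "resources/read",
        "params": {"uri": uri},
    }


def text_contents(response: dict) -> list[str]:
    """Collect the text payloads from a `resources/read` result."""
    contents = response.get("result", {}).get("contents", [])
    return [item["text"] for item in contents if "text" in item]


# Hypothetical URI; the scheme and path are invented for illustration.
request = resources_read_request("ckf://policies/travel.ckf.json")
wire_message = json.dumps(request)

# Hand-written response in the shape the MCP specification defines.
example_response = {
    "jsonrpc": "2.0",
    "id": request["id"],
    "result": {
        "contents": [
            {
                "uri": "ckf://policies/travel.ckf.json",
                "mimeType": "application/json",
                "text": "{\"sections\": {}}",
            }
        ]
    },
}
package = json.loads(text_contents(example_response)[0])
```

Because the payload is ordinary JSON with a stable schema, every agent behind the same MCP client can consume it without a format-specific parser.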
6. Limitations and Open Questions
This is a vision paper. Its proposals describe an architecture and an operational framework. Some components are implemented in the current CKF reference implementation; others are design proposals awaiting prototyping; others are empirical hypotheses awaiting validation. Honesty about the state of evidence matters more than the rhetorical strength of any individual claim.
The empirical efficacy of CKF over raw text is under investigation. An initial pilot, conducted with ten questions across three formats (PDF raw, TXT raw, CKF), produced near-ceiling scores in faithfulness and completeness that did not differentiate between formats. The pilot was instrumented with a single model family acting as both agent and judge, an architectural choice that subsequent analysis identified as introducing a self-evaluation bias. A pre-registered confirmatory study is currently in preparation, incorporating: an independent judge model from a different model family, smaller context budgets to create retrieval pressure, multi-hop questions that require composition across fragments, paired counterfactual questions to distinguish correct composition from over-escalation, and a benchmark called COMPGAP designed specifically to isolate composition hallucinations from adjacent failure modes. Results will be reported regardless of direction.
The teacher-student decoupling is a hypothesis, not a result. Whether lightweight runtime models maintain quality when fed CKF packages compiled by frontier models is one of the questions the confirmatory study is designed to answer. If the hypothesis fails, the economic argument for the proposed architecture weakens substantially, even if other CKF properties remain useful.
The semantic diff mechanism has an unsolved component. Detecting logical equivalence between paraphrased knowledge units, that is, distinguishing "the rule was rewritten with different wording" from "the rule was changed in substance", is an open research problem. Existing techniques (embedding similarity, structural diff over parse trees, logical-form equivalence) each address part of the problem, but none solves it fully. The mechanism described in Section 4.2 will, in its first implementations, route ambiguous cases to human review rather than attempt fully automated resolution.
Compilation cost may not amortize for all document profiles. The amortization argument of Section 2 assumes that compiled artifacts are consulted many times across the document's lifetime. For documents consulted rarely, or updated continuously, the economics may favor leaving raw text in place. Characterizing the document profiles where CKF is and is not cost-effective is a topic for future empirical work.
Adoption requires interoperability that the protocol cannot guarantee unilaterally. For CKF to function as the intermediate representation between human documents and machine readers, ingestion tooling needs to mature across vector databases, graph stores, MCP-compatible clients, and retrieval frameworks. The CKF specification is open under the MIT license and the reference implementation is available, but adoption velocity is outside the project's direct control.
If any of these hypotheses is contradicted by experimental evidence, the corresponding claim in this paper should be updated or retracted. The format will remain useful for properties that survive empirical scrutiny — provenance, schema stability, portability across runtimes — even where the more ambitious operational claims do not hold.
7. Conclusion
The proposal sketched in this paper treats context as a governed, versionable, auditable, compiled engineering artifact — and treats the relationship between human documents and machine readers as one where intermediate representation may earn its keep, as it does in software compilation. Whether the proposal holds empirically is a question for the confirmatory work in preparation, not a settled matter.
The CKF specification and reference implementation are open under the MIT license. The COMPGAP benchmark, the pre-registration protocol for the confirmatory study, and the experimental data will be published in dedicated artifacts as the research program advances. Contributions, critiques, and replications are welcome via the project repository and the Discord server.
The hypothesis remains under test.
References
- ANTHROPIC. (2024). Model Context Protocol Specification. https://modelcontextprotocol.io
- LEWIS, P., Perez, E., Piktus, A., et al. (2020). Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. Advances in Neural Information Processing Systems (NeurIPS), 33, 9459-9474. https://arxiv.org/abs/2005.11401
- SCULLEY, D., Holt, G., Golovin, D., et al. (2015). Hidden Technical Debt in Machine Learning Systems. Advances in Neural Information Processing Systems (NeurIPS), 28, 2503-2511. https://papers.nips.cc/paper/2015/hash/86df7dcfd896fcaf2674f757a2463eba-Abstract.html
- TOMAZINHO, P. (2026). Composition Hallucination in RAG, GraphRAG, and Agents: When Having the Context Is Not Enough. CKF Research Notes. https://compiledknowledgeformat.org/news/composition-hallucination-em-rag-graphrag-e-agentes
- VASWANI, A., Shazeer, N., Parmar, N., et al. (2017). Attention Is All You Need. Advances in Neural Information Processing Systems (NeurIPS), 30, 5998-6008. https://arxiv.org/abs/1706.03762