Agentic AI, RAG, MCP and Compiled Knowledge: The Case for a Knowledge Layer
A walk through fourteen foundational papers — from Attention Is All You Need to DSPy and the LLM Wiki — and the case for CKF as a proposed layer of compiled knowledge.
Talk to this article
This post exists as a CKF package. Load it into your favorite LLM and discuss, summarize or apply its ideas.
Author: Paulo Tomazinho, PhD
Affiliation: CKF Research
Abstract
Between 2017 and 2026, artificial intelligence stopped being a story about models and became a story about systems. Transformers gave machines the ability to learn language at scale; retrieval-augmented generation (RAG) connected them to external knowledge; agent frameworks taught them to act; the Model Context Protocol (MCP) standardized how they talk to tools. Yet across this entire stack a single layer remains conspicuously absent: a portable, machine-native representation of compiled knowledge — the operational, structured, traceable substrate that agents reason over. In this article we trace fifteen foundational papers and specifications that brought us here, and argue that the Compiled Knowledge Format (CKF) is a proposed next layer of this emerging infrastructure — a hypothesis under empirical investigation, not a settled conclusion.
1. Introduction — from the era of models to the era of systems
If the 2017 revolution was "models can learn language," the 2025 revolution is "models can operate systems." A modern AI deployment is no longer a single weight checkpoint serving completions. It is a composite of retrievers, vector stores, tool routers, agent loops, memory subsystems, evaluators, and governance layers. Each of these components emerged from a specific paper that solved a specific bottleneck — and each, in solving its problem, exposed the next.
The arc is cumulative. Attention removed sequential bottlenecks. Pre-training removed task-specific bottlenecks. Scale removed capability bottlenecks. RAG removed the freshness bottleneck. ReAct removed the action bottleneck. MCP removed the integration bottleneck. What remains is the knowledge bottleneck: agents are forced to reason over text fragments that were never designed for them. We will revisit each milestone in turn, then make the case for CKF as a proposed response to this gap.
2. Foundations
2.1 Attention Is All You Need
Vaswani, A. et al. Attention Is All You Need. NeurIPS, 2017. arXiv:1706.03762.
Before 2017, sequence modeling was dominated by recurrent architectures that processed text token by token. This sequentiality imposed three structural costs: training did not parallelize well on GPUs, long-range dependencies decayed through hidden-state compression, and a single recurrent vector had to carry the entire history of a sequence. Vaswani and colleagues proposed an architecture that abandoned recurrence entirely: every token could attend to every other token in a single, fully parallel operation.
The mechanism, self-attention, projects each token into three vectors (Query, Key, and Value) and computes a weighted combination of Values, with weights given by a softmax over scaled dot products between Queries and Keys. The result is a representation in which any token's meaning is contextualized by every other token in the sequence, and the entire computation collapses into matrix multiplications that GPUs execute efficiently.
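To make the Query/Key/Value mechanism concrete, here is a minimal single-head sketch in plain Python. It uses explicit loops rather than the batched matrix multiplications a real implementation would use, omits masking and multiple heads, and the identity projection matrices are placeholder assumptions chosen only so the numbers are easy to follow.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(matrix, vec):
    """Multiply a (rows x cols) matrix by a vector of length cols."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]

def self_attention(tokens, w_q, w_k, w_v):
    """Scaled dot-product self-attention over a list of token vectors."""
    queries = [matvec(w_q, t) for t in tokens]
    keys = [matvec(w_k, t) for t in tokens]
    values = [matvec(w_v, t) for t in tokens]
    scale = math.sqrt(len(keys[0]))  # sqrt(d_k), as in the paper
    dim = len(values[0])
    outputs, weights = [], []
    for q in queries:
        # Similarity of this query to every key, then normalize.
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        attn = softmax(scores)
        weights.append(attn)
        # Output is the attention-weighted sum of value vectors.
        outputs.append([sum(a * v[i] for a, v in zip(attn, values)) for i in range(dim)])
    return outputs, weights

# Placeholder projections (assumption): identity matrices.
identity = [[1.0, 0.0], [0.0, 1.0]]
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(tokens, identity, identity, identity)
```

Each row of `weights` sums to one, and the first token attends more strongly to tokens that share its direction than to the orthogonal one.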
The downstream impact is difficult to overstate: GPT, Claude, Gemini, Llama, Mistral, and every modern frontier model descend directly from this design. For our purposes, what matters is what Transformers do not do. They are exquisite pattern compressors, but they have no native mechanism for persistent memory, no explicit knowledge store, and no built-in provenance. The proposed CKF layer begins exactly at this boundary.
2.2 BERT — Bidirectional Pre-training
Devlin, J. et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. NAACL, 2019. arXiv:1810.04805.
BERT generalized the Transformer encoder by training it on a masked language modeling objective: predict a hidden token given both its left and right context. This bidirectionality yielded representations that transferred remarkably well across downstream tasks through lightweight fine-tuning. BERT established the template that still governs modern NLP: pre-train once on enormous unlabeled corpora, then specialize cheaply.
2.3 Scaling Laws
Kaplan, J. et al. Scaling Laws for Neural Language Models. 2020. arXiv:2001.08361.
Kaplan et al. showed that loss decreases as a smooth power law in three quantities — parameters, data, and compute — and that improvements remain predictable across many orders of magnitude. The practical consequence was a license to scale: investing in larger models with proportionally more data was no longer speculative, it was empirically grounded. The paper turned model development from a research bet into an engineering roadmap.
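A small sketch shows what a power law in parameter count implies in practice. The functional form L(N) = (N_c / N)^alpha follows the paper; the specific constants below are approximate values of the kind the paper reports for non-embedding parameters and should be treated here as illustrative assumptions, not as authoritative figures.

```python
def power_law_loss(n, n_c, alpha):
    """Loss predicted by L(N) = (N_c / N) ** alpha."""
    return (n_c / n) ** alpha

# Illustrative constants (assumption): approximate, not authoritative.
ALPHA_N = 0.076
N_C = 8.8e13

small = power_law_loss(1e9, N_C, ALPHA_N)   # 1B-parameter model
large = power_law_loss(2e9, N_C, ALPHA_N)   # 2B-parameter model
ratio = large / small                        # equals 2 ** -ALPHA_N

print(f"loss ratio after doubling parameters: {ratio:.4f}")
```

Under this form, doubling the parameter count multiplies the predicted loss by 2^(-alpha) regardless of the starting size, which is why the curve looks like a straight line on log-log axes and why the gains were predictable in advance.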
2.4 GPT-3 — Language Models are Few-Shot Learners
Brown, T. B. et al. Language Models are Few-Shot Learners. NeurIPS, 2020. arXiv:2005.14165.
GPT-3 demonstrated that, beyond a certain scale, models acquired the ability to perform new tasks from a handful of in-context examples — without any gradient updates. This emergent in-context learning reframed prompting itself as a programming interface. It also exposed, for the first time at industrial scale, the limits of parametric knowledge: the model knew a great deal, but could not be updated, could not cite sources, and confidently produced fluent falsehoods. The community's response to that limitation is the next chapter.
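The idea of prompting as a programming interface can be sketched as a function that assembles demonstrations into a prompt. The `Input:`/`Output:` layout and the function name are illustrative assumptions; no model is called, and nothing here updates any weights.

```python
def build_few_shot_prompt(instruction, examples, query):
    """Assemble an instruction, k demonstrations, and the new query."""
    lines = [instruction, ""]
    for source, target in examples:
        lines.append(f"Input: {source}")
        lines.append(f"Output: {target}")
        lines.append("")
    # The model is expected to continue the text after the final "Output:".
    lines.append(f"Input: {query}")
    lines.append("Output:")
    return "\n".join(lines)

prompt = build_few_shot_prompt(
    "Translate English to French.",
    [("cheese", "fromage"), ("sea otter", "loutre de mer")],
    "peppermint",
)
print(prompt)
```

The "program" is the choice of instruction and examples; changing the task means changing the examples, not the model.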
3. Retrieval and memory
3.1 RAG — Retrieval-Augmented Generation
Lewis, P. et al. Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. NeurIPS, 2020. arXiv:2005.11401.
RAG closed the gap between parametric and non-parametric knowledge. Instead of relying solely on weights, the system retrieves relevant documents from an external corpus at inference time and conditions generation on them. The canonical pipeline became standard infrastructure: chunk documents into passages, embed them into a vector space, store the embeddings, embed the query, retrieve nearest neighbors, inject retrieved passages into the prompt.
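The pipeline steps above can be sketched end to end in a few lines. This is a toy under stated assumptions: the "embedding" is a bag-of-words count vector standing in for a learned encoder, the "vector store" is a Python list, and the corpus, function names, and prompt wording are all invented for illustration.

```python
import math
import re
from collections import Counter

def chunk(text, size=12):
    """Split a document into passages of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(text):
    """Toy embedding (assumption): bag-of-words counts, not a learned encoder."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, index, k=2):
    """Return the k passages whose embeddings are nearest to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [passage for passage, _ in ranked[:k]]

def build_prompt(query, passages):
    """Inject retrieved passages into the prompt with citation markers."""
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return ("Answer using only the sources below and cite them by number.\n"
            f"{context}\n\nQuestion: {query}\nAnswer:")

corpus = [
    "The Transformer architecture replaced recurrence with self-attention in 2017.",
    "Retrieval-augmented generation conditions the generator on passages fetched at inference time.",
    "The Model Context Protocol standardizes how clients talk to tool servers.",
]
index = [(p, embed(p)) for doc in corpus for p in chunk(doc)]  # the "vector store"
query = "How does retrieval-augmented generation use passages?"
top = retrieve(query, index, k=1)
prompt = build_prompt(query, top)
```

Updating knowledge in this design means rebuilding `index`, not retraining a model, which is the separation the paragraph above describes.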
The breakthrough was conceptual rather than technical: knowledge no longer had to live inside the model. It could live beside it, be updated independently, and be cited explicitly. That separation is what made enterprise deployments practical.
3.2 The RAG Survey
Gao, Y. et al. Retrieval-Augmented Generation for Large Language Models: A Survey. 2023. arXiv:2312.10997.
By late 2023, RAG had fragmented into dozens of variants: naïve, advanced, modular, hybrid, hierarchical, recursive. The survey by Gao et al. made one observation explicit that practitioners had begun to feel intuitively: the bottleneck of modern RAG is no longer retrieval, it is representation. Plain text chunks lose the structural relationships (causality, dependency, exception, hierarchy) that make knowledge actually operational. We call the resulting class of failures "composition hallucination": outputs that contradict information that is present and retrievable in context, because the relations between fragments were never made explicit.
3.3 GraphRAG
Edge, D. et al. From Local to Global: A Graph RAG Approach to Query-Focused Summarization. Microsoft Research, 2024. arXiv:2404.16130.
GraphRAG attacked the representation bottleneck head-on by extracting entities and relationships from source documents and indexing them as a knowledge graph. Retrieval then operates over graph neighborhoods rather than isolated chunks, enabling multi-hop reasoning, global summarization, and structural navigation. The paper's deeper contribution is a thesis: plain text is not the optimal substrate for agent reasoning. Once that thesis is accepted, the question becomes which structured substrate to converge on.
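Retrieval over graph neighborhoods can be illustrated with a breadth-first walk over subject-relation-object triples. This sketch covers only the traversal idea; it does not reproduce the GraphRAG pipeline, which extracts the graph with an LLM and summarizes communities. The triples and function names are illustrative assumptions.

```python
from collections import defaultdict, deque

# Hypothetical triples, as an extraction step might produce them.
triples = [
    ("GraphRAG", "extends", "RAG"),
    ("RAG", "depends_on", "retriever"),
    ("retriever", "queries", "vector store"),
    ("MCP", "standardizes", "tool access"),
]

def build_graph(triples):
    """Index outgoing edges by subject."""
    graph = defaultdict(list)
    for subject, relation, obj in triples:
        graph[subject].append((relation, obj))
    return graph

def neighborhood(graph, start, max_hops=2):
    """Collect every edge reachable from `start` within `max_hops` hops."""
    seen = {start}
    frontier = deque([(start, 0)])
    edges = []
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop budget
        for relation, obj in graph.get(node, []):
            edges.append((node, relation, obj))
            if obj not in seen:
                seen.add(obj)
                frontier.append((obj, depth + 1))
    return edges

graph = build_graph(triples)
two_hop = neighborhood(graph, "GraphRAG", max_hops=2)
```

A two-hop query from "GraphRAG" returns the chain through "RAG" to "retriever" with the relations intact, which is the kind of multi-hop context that isolated chunks cannot supply.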
3.4 LightRAG and HippoRAG
Guo, Z. et al. LightRAG. 2024. arXiv:2410.05779. Gutiérrez, B. J. et al. HippoRAG. NeurIPS, 2024. arXiv:2405.14831.
LightRAG simplifies the GraphRAG pipeline for practical deployment, while HippoRAG draws on the hippocampal indexing theory of human long-term memory to design its retrieval system. Both reinforce the same direction: memory is a structured artifact, not a flat soup of vectors.
4. The agentic turn
4.1 ReAct
Yao, S. et al. ReAct: Synergizing Reasoning and Acting in Language Models. ICLR, 2023. arXiv:2210.03629.
ReAct introduced the simple, powerful idea that reasoning and acting could be interleaved within a single trajectory: Thought → Action → Observation → Thought. By explicitly separating the model's internal deliberation from its external interactions, ReAct produced agents that could plan, query a tool, observe the result, and revise their plan — all within a single inference loop. Modern agentic frameworks are descendants of this pattern.
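The Thought → Action → Observation cycle can be sketched as a short loop. To keep the example runnable without a model, a scripted function stands in for the LLM; the policy, the single `lookup` tool, and the `action[argument]` formatting are assumptions made for illustration.

```python
def scripted_policy(history):
    """Stand-in for the LLM (assumption): returns (thought, action, argument)."""
    if not any(kind == "Observation" for kind, _ in history):
        return ("I should look up the capital.", "lookup", "capital_of_portugal")
    # The most recent entry is the observation returned by the tool.
    return ("The observation answers the question.", "finish", history[-1][1])

# Hypothetical tool registry with one lookup tool.
FACTS = {"capital_of_portugal": "Lisbon"}
TOOLS = {"lookup": lambda key: FACTS.get(key, "unknown")}

def react_loop(question, policy, tools, max_steps=5):
    """Interleave reasoning and acting until the policy emits `finish`."""
    history = [("Question", question)]
    for _ in range(max_steps):
        thought, action, argument = policy(history)
        history.append(("Thought", thought))
        history.append(("Action", f"{action}[{argument}]"))
        if action == "finish":
            return argument, history
        observation = tools[action](argument)
        history.append(("Observation", observation))
    return None, history  # step budget exhausted without an answer

answer, trace = react_loop("What is the capital of Portugal?", scripted_policy, TOOLS)
for kind, content in trace:
    print(f"{kind}: {content}")
```

The trace keeps deliberation and external interaction as separate, inspectable entries, which is what lets the agent revise its plan after each observation.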
ReAct also exposed a quiet truth: agents need actionable knowledge, not text. A reasoning trajectory that asks "which procedure applies here?" needs answers shaped as procedures — not paragraphs.
4.2 Toolformer
Schick, T. et al. Toolformer: Language Models Can Teach Themselves to Use Tools. NeurIPS, 2023. arXiv:2302.04761.
Toolformer demonstrated that a language model could learn, from self-supervised data, when to call external APIs, which to invoke, and how to integrate the response. The idea helped lay the conceptual groundwork for OpenAI's function calling, Anthropic's tool use, and ultimately MCP itself. The lesson generalized: a model does not need to contain all knowledge; it needs to know how to ask.
4.3 Voyager, Generative Agents, and AutoGen
Wang, G. et al. Voyager. 2023. arXiv:2305.16291. Park, J. S. et al. Generative Agents. UIST, 2023. arXiv:2304.03442. Wu, Q. et al. AutoGen. Microsoft, 2023. arXiv:2308.08155.
These three papers triangulated the agentic frontier. Voyager showed that an embodied agent could autonomously expand its own skill library. Generative Agents simulated LLM-driven personas with episodic memory and emergent social behavior. AutoGen generalized the pattern to multi-agent collaboration. The convergent finding is unambiguous: agents need persistent, structured, interoperable memory.
5. Long memory and long context
5.1 MemGPT
Packer, C. et al. MemGPT: Towards LLMs as Operating Systems. 2023. arXiv:2310.08560.
MemGPT recast the LLM as a kernel and the context window as primary memory, with the rest of the agent's history paged in and out of secondary storage. The analogy clarified an entire class of design decisions: which information lives in working memory, which lives in archival memory, when to summarize, when to evict. MemGPT made it acceptable to say out loud that more context is not the answer; better memory architecture is.
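The paging analogy can be sketched as a bounded working memory that evicts its oldest entries to an archive. This is a simplification under stated assumptions: eviction here is first-in-first-out and size is measured in words, whereas MemGPT lets the model itself manage memory through function calls. The class and method names are hypothetical.

```python
class PagedMemory:
    """Toy two-tier memory: bounded working set plus searchable archive."""

    def __init__(self, budget):
        self.budget = budget   # max words in working memory (proxy for tokens)
        self.working = []      # what would be placed in the context window
        self.archive = []      # evicted items, retrievable on demand

    def _size(self):
        return sum(len(message.split()) for message in self.working)

    def add(self, message):
        """Append a message, evicting the oldest ones if over budget."""
        self.working.append(message)
        while self._size() > self.budget and len(self.working) > 1:
            self.archive.append(self.working.pop(0))

    def recall(self, keyword):
        """Page archived messages back in by simple keyword match."""
        return [m for m in self.archive if keyword.lower() in m.lower()]

memory = PagedMemory(budget=8)
memory.add("User prefers metric units")
memory.add("Project deadline is Friday")
memory.add("Budget approved yesterday")   # pushes the oldest message out
```

After the third message, the first has moved to the archive but remains reachable through `recall("metric")`, so nothing is lost even though the working set stays within budget.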
5.2 LongMem and Ring Attention
Wang, W. et al. LongMem. 2023. arXiv:2306.07174. Liu, H. et al. Ring Attention. 2023. arXiv:2310.01889.
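LongMem pairs a frozen LLM with a decoupled memory module that caches and retrieves long-term context; Ring Attention distributes blockwise attention across devices so that context length scales with device count. Read alongside MemGPT, they bracket the design space: context can be stretched by brute force, but the cost grows with every token, which keeps structured, retrievable memory attractive as the long-run solution.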
6. Interoperability and protocols
6.1 Model Context Protocol
Anthropic. Model Context Protocol Specification. modelcontextprotocol.io.
MCP did for tool integration what USB did for peripherals. Before MCP, every agent framework reinvented its own way of describing tools and threading context across calls. MCP defines a single specification for tools, resources, and prompts, allowing any compliant client to talk to any compliant server. Its adoption by Anthropic, OpenAI, and the broader ecosystem made it the de facto integration layer of the agentic web.
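MCP messages are JSON-RPC 2.0, so a tool invocation is a small JSON object. The sketch below builds a `tools/call` request and answers it with a toy in-process handler. It omits the initialization handshake, transports, and input schemas, and the tool name `lookup_policy` and its argument are hypothetical; the specification remains the reference for the exact message shapes.

```python
import json

def tools_call_request(request_id, tool_name, arguments):
    """Build a JSON-RPC 2.0 request that invokes a tool by name."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }

def handle(request, registry):
    """Toy server side (assumption): dispatch to a registered function."""
    tool = registry[request["params"]["name"]]
    text = tool(**request["params"]["arguments"])
    return {
        "jsonrpc": "2.0",
        "id": request["id"],  # responses echo the request id
        "result": {"content": [{"type": "text", "text": text}], "isError": False},
    }

# Hypothetical tool registry with one tool.
registry = {"lookup_policy": lambda topic: f"No policy found for {topic}"}

request = tools_call_request(1, "lookup_policy", {"topic": "refunds"})
wire = json.dumps(request)                    # what would travel over the transport
response = handle(json.loads(wire), registry)
print(json.dumps(response, indent=2))
```

Because the envelope is the same for every tool, a client written against this shape can call any compliant server without tool-specific glue code.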
MCP standardized how agents access tools at runtime. The question it does not answer is how agents access knowledge — and that is the layer CKF proposes to address.
6.2 The Survey of Agent Interoperability Protocols
Ehtesham, A. et al. A Survey of Agent Interoperability Protocols. 2025. arXiv:2505.02279.
This survey maps the emerging agentic internet: MCP (model–tool), ACP (agent communication), A2A (agent–agent), ANP (agent network protocol). We are building infrastructure analogous to the early web. None of these protocols, however, defines a portable, machine-native format for the knowledge that agents reason over. That gap is structural.
7. Knowledge representation — the inheritance from the Semantic Web
W3C. RDF 1.1. w3.org/TR/rdf11-concepts. W3C. JSON-LD 1.1. w3.org/TR/json-ld11. Hogan, A. et al. Knowledge Graphs. arXiv:2003.02320.
Long before LLMs, the Semantic Web articulated a vision in which knowledge would be machine-readable, linkable, and interoperable. RDF expressed knowledge as triples; JSON-LD made those triples web-native; the Knowledge Graphs survey synthesized two decades of structured knowledge at scale. The Semantic Web's original ambition was never fully realized — partly because human authors would not write triples by hand, and partly because reasoning over them required brittle ontologies.
LLMs invert both constraints. They can extract triples from unstructured text at scale, and they can reason over structured knowledge fluently when it is given to them in an appropriate form. The CKF package format inherits the structural rigor of RDF while adding agent-shaped primitives (heuristics, playbooks, contextual triggers) that pure triples never modeled.
8. Safety, governance, and trust
OWASP. Top 10 for LLM Applications. owasp.org/www-project-top-10-for-large-language-model-applications. NIST. AI Risk Management Framework. nist.gov/itl/ai-risk-management-framework. Bai, Y. et al. Constitutional AI. Anthropic, 2022. arXiv:2212.08073.
The agentic stack opens a new attack surface: prompt injection, indirect injection through retrieved content, tool abuse, data exfiltration, supply-chain attacks on knowledge sources. OWASP's LLM Top 10 catalogs the threat model; NIST AI RMF provides the governance framework.
The shared implication: a knowledge artifact consumed by an agent is also a governance artifact. It must carry provenance, confidence, source traceability, and explicit knowledge limits — not as optional metadata, but as first-class fields.
9. The compilation turn
9.1 DSPy
Khattab, O. et al. DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines. ICLR, 2024. arXiv:2310.03714.
DSPy argues — and demonstrates — that the right abstraction for LLM systems is compilation, not prompt engineering. Programs are written declaratively as compositions of modules, and an optimizer compiles those modules into concrete prompts and few-shot demonstrations against measurable metrics. The artisanal era of hand-tuned prompts is rendered obsolete the moment one accepts the analogy.
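The compilation idea can be illustrated without the DSPy library itself: treat the choice of few-shot demonstrations as a search problem scored by a metric on a development set. Everything below is a toy assumption, including the stub model, the exhaustive search, and the function names; DSPy's actual optimizers work differently and call real language models.

```python
from itertools import combinations

def stub_model(demos, question):
    """Stand-in for an LLM (assumption): copies the label of the most similar demo."""
    def overlap(demo):
        return len(set(demo[0].lower().split()) & set(question.lower().split()))
    best = max(demos, key=overlap)
    return best[1] if overlap(best) > 0 else "unknown"

def exact_match(prediction, gold):
    return prediction == gold

def compile_program(candidates, devset, k, metric):
    """Pick the k demonstrations that maximize the metric on the dev set."""
    best_demos, best_score = (), -1.0
    for demos in combinations(candidates, k):
        score = sum(metric(stub_model(demos, q), a) for q, a in devset) / len(devset)
        if score > best_score:
            best_demos, best_score = demos, score
    return list(best_demos), best_score

candidates = [
    ("great movie", "positive"),
    ("terrible movie", "negative"),
    ("great food", "positive"),
]
devset = [("great acting", "positive"), ("terrible plot", "negative")]

demos, score = compile_program(candidates, devset, k=2, metric=exact_match)
```

The selected demonstrations are an output of optimization against a metric rather than something a person tuned by hand, which is the sense in which the prompt is compiled.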
9.2 LMQL, Guidance, and Semantic Kernel
Beurer-Kellner, L. et al. LMQL. PLDI, 2023. arXiv:2212.06094. Microsoft. Guidance. github.com/guidance-ai/guidance. Microsoft. Semantic Kernel. github.com/microsoft/semantic-kernel.
LMQL adds typed constraints and control flow to model interaction; Guidance introduces grammar-constrained generation; Semantic Kernel provides an orchestration layer for skills and planners. All three reinforce DSPy's thesis from different angles: the future of LLM systems is compiled, not prompted. If reasoning pipelines can be compiled, knowledge can be compiled. CKF is the proposed compiled form of knowledge.
9.3 LLM Wiki Pattern — Karpathy
Karpathy, A. LLM Wiki: A pattern for building personal knowledge bases using LLMs. GitHub Gist, April 2026. gist.github.com/karpathy/442a6bf555914893e9891c11519de94f.
Published in April 2026 and reaching over 16 million views in weeks, Karpathy's LLM Wiki pattern articulated the architectural intuition that anchors the CKF proposal: raw sources should function as "source code"; the LLM should act as "compiler"; and the resulting wiki — a persistent, structured, incrementally updated knowledge artifact — should be what the system consults rather than the raw sources themselves. Karpathy's formulation targets human readers assisted by LLMs: the output is Markdown navigable in Obsidian, maintained by a system that ingests, queries, and lints. CKF extends the same compiler analogy toward machine consumers — agents and retrieval systems reading typed, schema-stable packages. The two approaches are complementary, not competing.
10. Synthesis — the proposed missing layer
Let us inventory what the papers above have given us. We have a substrate (Transformers), scaling laws, in-context learning, external retrieval, structured retrieval, agent loops, tool use, multi-agent coordination, persistent memory, long-context kernels, integration protocols, knowledge-graph traditions, governance frameworks, and compilation paradigms.
What we do not yet have is a portable, machine-native, agent-shaped, traceable representation of compiled knowledge — a self-contained artifact that an agent can ingest, reason over, cite, and pass to another agent without losing meaning, provenance, or operational shape.
The gap is structurally similar to the gap that MCP filled. Before MCP, every agent built its own tool format. Today, every agent builds its own knowledge format. A format will converge; whether CKF is that format is the empirical question.
11. CKF as proposed layer
The Compiled Knowledge Format is a proposal for that format. A .ckf package is a structured, validated artifact that represents knowledge in the shape agents use it: entities, concepts, principles, heuristics, decision rules, procedures, patterns, anti-patterns, causal chains, contextual triggers, if-then rules, exceptions, mental models, retrieval chunks, atomic units, and source traceability. These are among the format's 22 typed sections, each addressing a specific failure mode the papers above identified.
Every claim carries a confidence score and a source_basis (explicit, inferred, synthesized, author opinion, uncertain). Every package carries knowledge_limits declaring what it does not know. Every extracted unit traces back to its source location. The package is human-readable as Markdown, machine-readable as JSON, and queryable as a graph.
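A claim that carries confidence, source basis, and a source location can be sketched as a validated record. This is not the CKF schema: only the names `confidence`, `source_basis`, and `knowledge_limits` come from the text above, while the class layout, the 0 to 1 confidence range, the `source_location` field, and the sample values are assumptions made for illustration.

```python
from dataclasses import dataclass, field

# The five source_basis values named in the text, spelled as identifiers.
SOURCE_BASES = {"explicit", "inferred", "synthesized", "author_opinion", "uncertain"}

@dataclass
class Claim:
    text: str
    confidence: float        # assumed to lie in [0, 1]
    source_basis: str
    source_location: str     # hypothetical field: where the claim was extracted from

    def __post_init__(self):
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be in [0, 1]")
        if self.source_basis not in SOURCE_BASES:
            raise ValueError(f"unknown source_basis: {self.source_basis}")
        if not self.source_location:
            raise ValueError("every claim must trace to a source location")

@dataclass
class Package:
    title: str
    claims: list = field(default_factory=list)
    knowledge_limits: list = field(default_factory=list)

    def citable(self, min_confidence):
        """Claims an agent may rely on at a given confidence threshold."""
        return [c for c in self.claims if c.confidence >= min_confidence]

package = Package(
    title="Example package",
    claims=[
        Claim("RAG retrieves documents at inference time.", 0.95, "explicit", "section 3.1"),
        Claim("A knowledge format will converge.", 0.4, "author_opinion", "section 10"),
    ],
    knowledge_limits=["No empirical comparison between formats is included."],
)
reliable = package.citable(0.8)
```

Making provenance a required constructor argument means a claim without a source cannot be created at all, which is one way to treat traceability as a first-class field rather than optional metadata.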
CKF does not compete with Transformers, RAG, MCP, or DSPy. It composes with them. Transformers reason. RAG retrieves. MCP transports. DSPy compiles pipelines. CKF proposes to compile knowledge. Each layer occupies a distinct slot; the absence of the knowledge layer forces the rest of the stack to over-perform — coercing language models into being knowledge bases, retrievers into being reasoners, and prompts into being specifications.
12. Limitations and study in progress
This argument is structural, not empirical. The case made above is that a knowledge layer would address a gap that the rest of the stack does not address. Whether the CKF implementation of that layer performs better than alternatives at specific tasks is the empirical question.
An initial pilot, run with ten questions in three conditions (raw PDF, raw TXT, CKF), produced near-ceiling scores that did not differentiate between formats. The pilot had two structural limitations: a question battery not calibrated for retrieval pressure, and a single model family serving as both agent and judge. A pre-registered confirmatory study is in preparation with corrections for both, including an independent judge model, smaller context budgets, multi-hop questions, and the COMPGAP benchmark for composition hallucination.
The format specification, reference compiler, and MCP server are open under MIT license at the project repository. The pre-registration and benchmark will be published when the confirmatory study is submitted. Contributions, critiques, and independent replications are welcome via the repository and Discord.
The hypothesis remains under test.
References
- Vaswani et al. Attention Is All You Need. arXiv:1706.03762
- Devlin et al. BERT. arXiv:1810.04805
- Kaplan et al. Scaling Laws. arXiv:2001.08361
- Brown et al. GPT-3. arXiv:2005.14165
- Lewis et al. RAG. arXiv:2005.11401
- Gao et al. RAG Survey. arXiv:2312.10997
- Edge et al. GraphRAG. arXiv:2404.16130
- Guo et al. LightRAG. arXiv:2410.05779
- Gutiérrez et al. HippoRAG. arXiv:2405.14831
- Yao et al. ReAct. arXiv:2210.03629
- Schick et al. Toolformer. arXiv:2302.04761
- Wang et al. Voyager. arXiv:2305.16291
- Park et al. Generative Agents. arXiv:2304.03442
- Wu et al. AutoGen. arXiv:2308.08155
- Packer et al. MemGPT. arXiv:2310.08560
- Wang et al. LongMem. arXiv:2306.07174
- Liu et al. Ring Attention. arXiv:2310.01889
- Anthropic. Model Context Protocol. modelcontextprotocol.io
- Ehtesham et al. Survey of Agent Interoperability Protocols. arXiv:2505.02279
- W3C. RDF 1.1. w3.org/TR/rdf11-concepts
- W3C. JSON-LD 1.1. w3.org/TR/json-ld11
- Hogan et al. Knowledge Graphs. arXiv:2003.02320
- OWASP. Top 10 for LLM Applications. owasp.org
- NIST. AI Risk Management Framework. nist.gov
- Bai et al. Constitutional AI. arXiv:2212.08073
- Khattab et al. DSPy. arXiv:2310.03714
- Beurer-Kellner et al. LMQL. arXiv:2212.06094
- Karpathy, A. LLM Wiki. gist.github.com/karpathy
- Tomazinho, P. Composition Hallucination. compiledknowledgeformat.org/news