Research · 22 min read

Agentic AI, RAG, MCP and Compiled Knowledge: Why CKF Is the Inevitable Next Layer

A scientific walk through the fourteen most influential papers behind modern AI — from Attention Is All You Need to DSPy — and the case for CKF as the missing layer of compiled knowledge.

Paulo Tomazinho, KCP Research
May 15, 2026

Abstract

Between 2017 and 2025, artificial intelligence stopped being a story about models and became a story about systems. Transformers gave machines the ability to learn language at scale; retrieval-augmented generation (RAG) connected them to external knowledge; agent frameworks taught them to act; the Model Context Protocol (MCP) standardized how they talk to tools. Yet across this entire stack a single layer remains conspicuously missing: a portable, machine-native representation of compiled knowledge — the operational, structured, traceable substrate that agents must reason over. In this article we trace the fourteen most influential papers and specifications that brought us here, and argue that the Compiled Knowledge Format (CKF) is the natural, and now inevitable, next layer of this emerging cognitive infrastructure.


1. Introduction — from the era of models to the era of systems

If the 2017 revolution was “models can learn language,” the 2025 revolution is “models can operate systems.” A modern AI deployment is no longer a single weight checkpoint serving completions. It is a composite of retrievers, vector stores, tool routers, agent loops, memory subsystems, evaluators, and governance layers. Each of these components emerged from a specific paper that solved a specific bottleneck — and each, in solving its problem, exposed the next.

The arc is cumulative. Attention removed sequential bottlenecks. Pre-training removed task-specific bottlenecks. Scale removed capability bottlenecks. RAG removed the freshness bottleneck. ReAct removed the action bottleneck. MCP removed the integration bottleneck. What remains, in 2025, is the knowledge bottleneck: agents are forced to reason over text fragments that were never designed for them. We will revisit each milestone in turn, then make the case for CKF as the missing layer.


2. Foundations

2.1 Attention Is All You Need

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. Attention Is All You Need. NeurIPS, 2017. arXiv:1706.03762.

Before 2017, sequence modeling was dominated by recurrent architectures — RNNs, LSTMs, GRUs — that processed text token by token. This sequentiality imposed three structural costs: training did not parallelize well on GPUs, long-range dependencies decayed through hidden-state compression, and a single recurrent vector had to carry the entire history of a sequence. Vaswani and colleagues proposed an architecture that abandoned recurrence entirely: every token could attend to every other token in a single, fully parallel operation.

The mechanism, self-attention, decomposes each token into three projections — Query, Key, and Value — and computes a weighted combination of Values based on the dot-product similarity between Queries and Keys. Conceptually, each token asks "what am I looking for?" (Query), advertises "what do I offer?" (Key), and contributes "this is what I carry" (Value). The result is a representation in which any token's meaning is contextualized by every other token in the sequence, and the entire computation collapses into matrix multiplications that GPUs execute efficiently.
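The computation can be sketched in a few lines. This is a minimal single-head version with no masking and no multi-head projections, intended only to make the Query/Key/Value decomposition concrete:

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence of token vectors."""
    Q = X @ Wq                                   # what each token is looking for
    K = X @ Wk                                   # what each token offers
    V = X @ Wv                                   # what each token carries
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)              # pairwise Query-Key similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V                           # each token: weighted mix of all Values

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                      # 4 tokens, model dimension 8
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)                                 # every token now contextualized by every other
```

Note that nothing here is sequential: the whole operation is three matrix multiplications and a softmax, which is exactly why it parallelizes on GPUs where recurrence could not.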

The paper's downstream impact is difficult to overstate: GPT, Claude, Gemini, Llama, Mistral, and every modern frontier model descend directly from this design. For our purposes, however, what matters is what Transformers do not do. They are exquisite pattern compressors, but they have no native mechanism for persistent memory, no explicit knowledge store, and no built-in provenance. The CKF layer begins exactly at this boundary.

2.2 BERT — Bidirectional Pre-training

Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. NAACL, 2019. arXiv:1810.04805.

BERT generalized the Transformer encoder by training it on a masked language modeling objective: predict a hidden token given both its left and right context. This bidirectionality yielded representations that transferred remarkably well across downstream tasks — classification, named-entity recognition, question answering — through lightweight fine-tuning. BERT established the template that still governs modern NLP: pre-train once on enormous unlabeled corpora, then specialize cheaply. It also normalized the idea that language understanding is a representation problem, not a task-engineering problem.

2.3 Scaling Laws

Kaplan, J., McCandlish, S., Henighan, T., Brown, T. B., Chess, B., Child, R., Gray, S., Radford, A., Wu, J., & Amodei, D. Scaling Laws for Neural Language Models. 2020. arXiv:2001.08361.

Kaplan et al. showed that loss decreases as a smooth power law in three quantities — parameters, data, and compute — and that improvements remain predictable across many orders of magnitude. The practical consequence was a license to scale: investing in larger models with proportionally more data was no longer speculative, it was empirically grounded. The paper turned model development from a research bet into an engineering roadmap and made the trillion-parameter regime an explicit design target.
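The claim is easy to state concretely. Below is the parameter-count form of the law, L(N) = (N_c / N)^α, using the approximate constants reported in the paper (α ≈ 0.076, N_c ≈ 8.8 × 10¹³); the exact values matter less than the smooth, predictable decline:

```python
def loss(n_params, n_c=8.8e13, alpha=0.076):
    """Kaplan et al.'s power law in parameter count N (approximate constants)."""
    return (n_c / n_params) ** alpha

# Loss falls smoothly and predictably across orders of magnitude.
for n in (1e8, 1e9, 1e10, 1e11):
    print(f"{n:.0e} params -> predicted loss {loss(n):.3f}")
```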

2.4 GPT-3 — Language Models are Few-Shot Learners

Brown, T. B., Mann, B., Ryder, N., Subbiah, M., et al. Language Models are Few-Shot Learners. NeurIPS, 2020. arXiv:2005.14165.

GPT-3 demonstrated that, beyond a certain scale, models acquired the ability to perform new tasks from a handful of in-context examples — without any gradient updates. This emergent in-context learning reframed prompting itself as a programming interface. It also exposed, for the first time at industrial scale, the limits of parametric knowledge: the model knew a great deal, but could not be updated, could not cite sources, and confidently produced fluent falsehoods. The community's response to that limitation is the next chapter.


3. Retrieval and memory

3.1 RAG — Retrieval-Augmented Generation

Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., Küttler, H., Lewis, M., Yih, W., Rocktäschel, T., Riedel, S., & Kiela, D. Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. NeurIPS, 2020. arXiv:2005.11401.

RAG closed the gap between parametric and non-parametric knowledge. Instead of relying solely on weights, the system retrieves relevant documents from an external corpus at inference time and conditions generation on them. The canonical pipeline became standard infrastructure: chunk documents into passages, embed them into a vector space, store the embeddings in a vector database, embed the user query, retrieve the nearest neighbors, and inject the retrieved passages into the prompt.
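The canonical pipeline can be sketched end to end. The toy bag-of-words "embedding" below stands in for a real dense encoder, and the corpus is three invented one-line documents:

```python
from collections import Counter
import math

def embed(text):
    """Toy 'embedding': a bag-of-words count vector (real systems use dense encoders)."""
    return Counter(text.lower().replace(".", "").split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = lambda v: math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm(a) * norm(b)) if a and b else 0.0

# 1-3. chunk the documents, embed each chunk, store the vectors
corpus = [
    "RAG retrieves documents at inference time.",
    "Transformers replaced recurrence with attention.",
    "MCP standardizes how agents call tools.",
]
index = [(chunk, embed(chunk)) for chunk in corpus]

# 4-5. embed the query, retrieve the nearest neighbor
query = "how do agents call tools"
qvec = embed(query)
top_chunk, _ = max(index, key=lambda item: cosine(qvec, item[1]))

# 6. inject the retrieved passage into the prompt
prompt = f"Context: {top_chunk}\nQuestion: {query}"
print(prompt)
```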

The architectural breakthrough was conceptual rather than technical: knowledge no longer had to live inside the model. It could live beside it, be updated independently, and be cited explicitly. RAG made enterprise AI possible.

3.2 The RAG Survey

Gao, Y., Xiong, Y., Gao, X., Jia, K., Pan, J., Bi, Y., Dai, Y., Sun, J., Wang, M., & Wang, H. Retrieval-Augmented Generation for Large Language Models: A Survey. 2023. arXiv:2312.10997.

By late 2023, RAG had fragmented into dozens of variants: naïve, advanced, modular, hybrid, hierarchical, recursive. The survey by Gao et al. systematized this landscape and made one observation explicit that practitioners had begun to feel intuitively: the bottleneck of modern RAG is no longer retrieval, it is representation. Plain text chunks, retrieved by semantic similarity, lose the structural relationships — causality, dependency, exception, hierarchy — that make knowledge actually operational.

3.3 GraphRAG

Edge, D., Trinh, H., Cheng, N., Bradley, J., Chao, A., Mody, A., Truitt, S., & Larson, J. From Local to Global: A Graph RAG Approach to Query-Focused Summarization. Microsoft Research, 2024. arXiv:2404.16130.

GraphRAG attacked the representation bottleneck head-on by extracting entities and relationships from source documents and indexing them as a knowledge graph. Retrieval then operates over graph neighborhoods rather than isolated chunks, enabling multi-hop reasoning, global summarization, and structural navigation. The paper's deeper contribution is a thesis: plain text is not the optimal substrate for agent reasoning. Once that thesis is accepted, the question becomes which structured substrate to converge on — the precise question CKF exists to answer.

3.4 LightRAG and HippoRAG

Guo, Z. et al. LightRAG: Simple and Fast Retrieval-Augmented Generation. 2024. arXiv:2410.05779. Gutiérrez, B. J. et al. HippoRAG: Neurobiologically Inspired Long-Term Memory for Large Language Models. NeurIPS, 2024. arXiv:2405.14831.

LightRAG simplifies the GraphRAG pipeline for practical deployment, while HippoRAG borrows from neuroscience — specifically, the hippocampal indexing theory of memory consolidation — to design a retrieval system that mimics how human episodic memory is organized. Both reinforce the same direction: memory is a structured artifact, not a flat soup of vectors.


4. The agentic turn

4.1 ReAct

Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., & Cao, Y. ReAct: Synergizing Reasoning and Acting in Language Models. ICLR, 2023. arXiv:2210.03629.

ReAct introduced the simple, devastating idea that reasoning and acting could be interleaved within a single trajectory: Thought → Action → Observation → Thought. By explicitly separating the model's internal deliberation from its external interactions, ReAct produced agents that could plan, query a tool, observe the result, and revise their plan — all within a single inference loop. Modern agentic frameworks (LangGraph, CrewAI, OpenAI's Assistants, Anthropic's tool use) are descendants of this pattern.
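The Thought → Action → Observation cycle reduces to a small control structure. In this sketch the `llm` callable is a scripted stand-in for a real model, returning a (thought, action, argument) triple each turn:

```python
def react_loop(llm, tools, question, max_steps=5):
    """Minimal ReAct-style loop: interleave reasoning with tool calls."""
    transcript = f"Question: {question}"
    for _ in range(max_steps):
        thought, action, arg = llm(transcript)        # deliberate on the trajectory so far
        transcript += f"\nThought: {thought}"
        if action == "finish":                        # the model decides it is done
            return arg
        observation = tools[action](arg)              # act, then observe
        transcript += f"\nAction: {action}[{arg}]\nObservation: {observation}"
    return None

# Scripted stand-in for the model, just to exercise the loop's shape.
script = iter([
    ("I should look this up.", "lookup", "transformer year"),
    ("The observation answers it.", "finish", "2017"),
])
answer = react_loop(lambda transcript: next(script),
                    {"lookup": lambda q: "2017"},
                    "When was the Transformer introduced?")
print(answer)
```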

ReAct also exposed a quiet truth: agents need actionable knowledge, not text. A reasoning trajectory that asks "which procedure applies here?" or "is this a known anti-pattern?" needs answers shaped as procedures and anti-patterns — not as paragraphs.

4.2 Toolformer

Schick, T., Dwivedi-Yu, J., Dessì, R., Raileanu, R., Lomeli, M., Zettlemoyer, L., Cancedda, N., & Scialom, T. Toolformer: Language Models Can Teach Themselves to Use Tools. NeurIPS, 2023. arXiv:2302.04761.

Toolformer demonstrated that a language model could learn, from self-supervised data, when to call external APIs, which API to invoke, and how to integrate the response. The result was a generalizable mechanism that became the conceptual foundation for OpenAI's function calling, Anthropic's tool use, and ultimately MCP itself. The lesson generalized: a model does not need to contain all knowledge — it needs to know how to ask.

4.3 Voyager, Generative Agents, and AutoGen

Wang, G. et al. Voyager: An Open-Ended Embodied Agent with Large Language Models. 2023. arXiv:2305.16291. Park, J. S. et al. Generative Agents: Interactive Simulacra of Human Behavior. UIST, 2023. arXiv:2304.03442. Wu, Q. et al. AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation. Microsoft, 2023. arXiv:2308.08155.

These three papers triangulated the agentic frontier. Voyager showed that an embodied agent in Minecraft could autonomously expand its own skill library through exploration. Generative Agents simulated a small town of LLM-driven personas with episodic memory, daily reflection, and emergent social behavior. AutoGen generalized the pattern to multi-agent collaboration, where specialized agents — researcher, critic, coder, planner — coordinate around a shared task.

The convergent finding is unambiguous: agents need persistent, structured, interoperable memory. A society of agents cannot share a vector database of one another's chat logs and call that knowledge management.


5. Long memory and long context

5.1 MemGPT

Packer, C., Wooders, S., Lin, K., Fang, V., Patil, S. G., Stoica, I., & Gonzalez, J. E. MemGPT: Towards LLMs as Operating Systems. 2023. arXiv:2310.08560.

MemGPT recast the LLM as a kernel and the context window as primary memory, with the rest of the agent's history paged in and out of secondary storage. The analogy — the LLM is an operating system — clarified an entire class of design decisions: which information lives in working memory, which lives in archival memory, when to summarize, when to evict, when to recall. MemGPT made it acceptable to say out loud that more context is not the answer; better memory architecture is.
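The paging analogy can be made concrete with a toy memory hierarchy. Class and method names here are illustrative, not MemGPT's actual API:

```python
from collections import deque

class PagedMemory:
    """Sketch of the MemGPT idea: a bounded context backed by archival storage."""
    def __init__(self, capacity=3):
        self.capacity = capacity
        self.working = deque()   # "primary memory": what fits in the context window
        self.archive = []        # "secondary storage": everything paged out

    def add(self, item):
        self.working.append(item)
        while len(self.working) > self.capacity:
            self.archive.append(self.working.popleft())   # evict the oldest item

    def recall(self, keyword):
        """Page matching archival items back into working memory."""
        hits = [m for m in self.archive if keyword in m]
        for m in hits:
            self.archive.remove(m)
            self.add(m)
        return hits

mem = PagedMemory(capacity=3)
for turn in ["user likes Rust", "meeting at 3pm", "prefers short answers", "project is CKF"]:
    mem.add(turn)
print(list(mem.working))     # the oldest turn has been evicted to the archive...
print(mem.recall("Rust"))    # ...and can be paged back in on demand
```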

5.2 LongMem and Ring Attention

Wang, W. et al. Augmenting Language Models with Long-Term Memory (LongMem). 2023. arXiv:2306.07174. Liu, H., Zaharia, M., & Abbeel, P. Ring Attention with Blockwise Transformers for Near-Infinite Context. 2023. arXiv:2310.01889.

Both papers attack the context-length problem from different angles — augmenting models with retrieval-based long-term stores, or scaling attention itself across distributed devices. They converge on the same architectural verdict as MemGPT: brute-force context expansion is expensive and lossy; structured, retrievable memory is the long-run solution.


6. Interoperability and protocols

6.1 Model Context Protocol

Anthropic. Model Context Protocol Specification, version 2025-03-26. modelcontextprotocol.io/specification.

MCP did for tool integration what USB did for peripherals. Before MCP, every agent framework reinvented its own way of describing tools, exposing resources, and threading context across calls — producing a combinatorial mess of bespoke connectors. MCP defines a single specification for tools, resources, and prompts, allowing any compliant client to talk to any compliant server. The protocol's adoption by Anthropic, OpenAI, and the broader ecosystem in 2024-2025 made it the de facto integration layer of the agentic web.
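A flavor of what the protocol standardizes: a tool is declared with a name, a human-readable description, and a JSON Schema for its inputs. The shape below is paraphrased from the specification; the tool itself is invented, and the authoritative schema lives at modelcontextprotocol.io:

```python
import json

# Paraphrased shape of an MCP tool declaration (illustrative tool name).
tool = {
    "name": "search_docs",
    "description": "Full-text search over the project documentation.",
    "inputSchema": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"],
    },
}
print(json.dumps(tool, indent=2))
```

Because every compliant server describes its tools this way, a client written once can negotiate with any of them, which is the USB analogy in practice.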

MCP standardized how agents access tools at runtime. The unanswered question is how agents access knowledge — and that is precisely the layer CKF defines.

6.2 The Survey of Agent Interoperability Protocols

Yang, Y. et al. A Survey of Agent Interoperability Protocols: MCP, ACP, A2A and ANP. 2025. arXiv:2505.02279.

This survey maps the emerging agentic internet: MCP (model–tool), ACP (agent communication), A2A (agent–agent), ANP (agent network protocol). The picture that emerges is unmistakable — we are building infrastructure analogous to the early web, complete with its own equivalents of HTTP, DNS, and content negotiation. None of these protocols, however, defines a portable, machine-native format for the knowledge that agents reason over. That gap is structural.


7. Knowledge representation — the inheritance from the Semantic Web

W3C. RDF 1.1 Concepts and Abstract Syntax. w3.org/TR/rdf11-concepts. W3C. JSON-LD 1.1. w3.org/TR/json-ld11. Hogan, A. et al. Knowledge Graphs. ACM Computing Surveys, 2021. arXiv:2003.02320.

Long before LLMs, the Semantic Web articulated a vision in which knowledge would be machine-readable, linkable, and interoperable. RDF expressed knowledge as triples (subject–predicate–object); JSON-LD made those triples web-native; the Knowledge Graphs survey synthesized two decades of work on structured knowledge at scale. The Semantic Web's original ambition was never fully realized — partly because human authors would not write triples by hand, and partly because reasoning over them required brittle ontologies.

LLMs invert both constraints. They can extract triples from unstructured text at scale, and they can reason over structured knowledge fluently when it is given to them in a form they can ingest. The CKF package format is, in this sense, the Semantic Web after the LLM — it inherits the structural rigor of RDF while embracing the agent-shaped primitives (heuristics, playbooks, contextual triggers) that pure triples never modeled.
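For concreteness, here is a single triple and a minimal JSON-LD rendering of it (the `ex:` prefix and the fact itself are illustrative):

```python
import json

# One RDF triple: (subject, predicate, object).
triple = ("ex:Transformer", "ex:introducedIn", "2017")

# The same statement as web-native JSON-LD.
jsonld = {
    "@context": {"ex": "http://example.org/"},   # maps the 'ex:' prefix to an IRI
    "@id": "ex:Transformer",                     # subject
    "ex:introducedIn": "2017",                   # predicate -> object
}
print(json.dumps(jsonld, indent=2))
```

The Semantic Web asked humans to author such statements by hand; LLMs can now emit them at corpus scale, which is precisely the inversion described above.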


8. Safety, governance, and trust

OWASP. Top 10 for LLM Applications. genai.owasp.org. NIST. AI Risk Management Framework. nist.gov/itl/ai-risk-management-framework. Bai, Y. et al. Constitutional AI: Harmlessness from AI Feedback. Anthropic, 2022. arXiv:2212.08073.

The agentic stack opens a new attack surface: prompt injection, indirect injection through retrieved content, tool abuse, data exfiltration through agent loops, supply-chain attacks on knowledge sources. OWASP's LLM Top 10 catalogs the threat model; NIST AI RMF provides the governance framework; Constitutional AI proposes a training-time mechanism for embedding behavioral principles into model weights themselves.

The shared implication: a knowledge artifact consumed by an agent is also a governance artifact. It must carry provenance, confidence, source traceability, and explicit knowledge limits — not as optional metadata, but as first-class fields. Any format that cannot encode these fields is unsafe by construction.


9. The compilation turn

9.1 DSPy

Khattab, O., Singhvi, A., Maheshwari, P., Zhang, Z., Santhanam, K., Vardhamanan, S., Haq, S., Sharma, A., Joshi, T. T., Moazam, H., Miller, H., Zaharia, M., & Potts, C. DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines. ICLR, 2024. arXiv:2310.03714.

DSPy is the most consequential paper for the CKF thesis. It argues — and demonstrates — that the right abstraction for LLM systems is compilation, not prompt engineering. Programs are written declaratively as compositions of modules (Predict, ChainOfThought, Retrieve), and an optimizer compiles those modules into concrete prompts and few-shot demonstrations against measurable metrics. The artisanal era of hand-tuned prompts is rendered obsolete the moment one accepts the analogy.
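The compilation idea can be miniaturized: select prompt components against a measurable metric instead of hand-tuning them. The toy optimizer below is illustrative only — it is not DSPy's actual API — but it captures the inversion: the developer declares the task and the metric, and the "compiler" searches for the prompt:

```python
def compile_prompt(signature, candidate_demos, metric, trainset):
    """Toy prompt 'compiler': pick the few-shot demo that maximizes a metric
    over a trainset, rather than hand-tuning the prompt."""
    def score(demo):
        prompt = f"{signature}\nExample: {demo}"
        return sum(metric(prompt, example) for example in trainset)
    best = max(candidate_demos, key=score)
    return f"{signature}\nExample: {best}"

# Stand-in metric: reward prompts that mention each example's key term.
metric = lambda prompt, example: int(example["keyword"] in prompt.lower())
trainset = [{"keyword": "summarize"}, {"keyword": "cite"}]
candidates = ["Summarize and cite sources.", "Be brief."]

prompt = compile_prompt("task -> answer", candidates, metric, trainset)
print(prompt)   # the compiler selects the demo that scores best on the trainset
```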

9.2 LMQL, Guidance, and Semantic Kernel

Beurer-Kellner, L., Fischer, M., & Vechev, M. LMQL: A Programming Language for Large Language Models. PLDI, 2023. arXiv:2212.06094. Microsoft. Guidance. github.com/guidance-ai/guidance. Microsoft. Semantic Kernel. github.com/microsoft/semantic-kernel.

LMQL adds typed constraints and control flow to model interaction; Guidance introduces grammar-constrained generation; Semantic Kernel provides an orchestration layer for skills and planners. All three reinforce DSPy's thesis from different angles: the future of LLM systems is compiled, not prompted.

If reasoning pipelines can be compiled, knowledge can be compiled. CKF is the compiled form of knowledge.


10. Synthesis — the missing layer

Let us inventory what the fourteen papers above have given us. We have a substrate (Transformers), scaling laws, in-context learning, external retrieval, structured retrieval, agent loops, tool use, multi-agent coordination, persistent memory, long-context kernels, integration protocols, knowledge-graph traditions, governance frameworks, and compilation paradigms.

What we do not have is a portable, machine-native, agent-shaped, traceable representation of compiled knowledge. RAG retrieves text. GraphRAG retrieves triples. MCP transports tool calls. None of them defines what a knowledge package looks like — a self-contained artifact that an agent can ingest, reason over, cite, and pass to another agent without losing meaning, provenance, or operational shape.

The gap is identical in structure to the gap that MCP filled. Before MCP, every agent built its own tool format. Today, every agent builds its own knowledge format. The pattern of resolution is therefore predictable: a format will converge, and the format that converges will be the one that takes all of the lessons above seriously at once.


11. CKF as the inevitable layer

The Compiled Knowledge Format is our proposal for that format. A .kcp package is a structured, validated artifact that represents knowledge in the shape agents actually use it: entities, concepts, principles, heuristics, decision rules, procedures, patterns, anti-patterns, causal chains, contextual triggers, if-then rules, exceptions, mental models, playbooks, Q&A pairs, retrieval chunks, atomic units, agent instructions, knowledge limits, and source traceability — twenty-three fields, each addressing a specific failure mode the papers above identified.

The format is opinionated by design. Every claim carries a confidence score and a source_basis (explicit, inferred, synthesized, author opinion, uncertain). Every package carries knowledge_limits declaring what it does not know. Every extracted unit traces back to its source location. The package is human-readable as Markdown, machine-readable as JSON, and queryable as a graph — without the user having to choose.
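As an illustrative sketch only — field names follow the prose above (confidence, source_basis, knowledge_limits, source tracing), while the authoritative schema is the CKF specification itself — a fragment of such a package might look like:

```python
# Hypothetical fragment of a compiled-knowledge package; values are invented.
package = {
    "heuristics": [
        {
            "statement": "Prefer structured retrieval for multi-hop questions.",
            "confidence": 0.8,
            "source_basis": "inferred",          # explicit | inferred | synthesized | ...
            "source_location": "doc-12#section-3",
        }
    ],
    "knowledge_limits": ["No coverage of pre-2017 sequence models."],
}

# The governance stance in Section 8, enforced mechanically: every unit
# must carry provenance fields, or the package is rejected.
for unit in package["heuristics"]:
    assert {"confidence", "source_basis", "source_location"} <= unit.keys()
print("all units carry provenance")
```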

For a granular, operational comparison against the document format the world currently misuses for this purpose, see PDF vs .kcp, which contrasts the two across forty-five dimensions of structure, retrieval, agent use, and reasoning. For the philosophical argument behind the protocol, see the Manifesto. For a working compiler that turns unstructured sources into validated .kcp packages, see the v0.1 demo.

We do not claim CKF competes with Transformers, RAG, MCP, or DSPy. It composes with them. Transformers reason. RAG retrieves. MCP transports. DSPy compiles pipelines. CKF compiles knowledge. Each layer occupies a distinct slot in the cognitive stack, and the absence of the knowledge layer is precisely what forces the rest of the stack to over-perform — coercing language models into being knowledge bases, retrievers into being reasoners, and prompts into being specifications.

The history we have just traced converges on a single conclusion. Compiled knowledge is not a feature; it is infrastructure. The format that fills this slot will become as foundational as HTTP, as MCP, as the Transformer itself. We believe .kcp is that format, and we have built the open specification, the reference compiler, and the ecosystem so that the community can challenge, extend, and ratify it together.

The era of models gave us language. The era of systems is giving us agents. The era of compiled knowledge is what makes those agents trustworthy.


12. References

  1. Vaswani et al. Attention Is All You Need. arXiv:1706.03762.
  2. Devlin et al. BERT. arXiv:1810.04805.
  3. Kaplan et al. Scaling Laws for Neural Language Models. arXiv:2001.08361.
  4. Brown et al. Language Models are Few-Shot Learners. arXiv:2005.14165.
  5. Lewis et al. Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. arXiv:2005.11401.
  6. Gao et al. Retrieval-Augmented Generation for Large Language Models: A Survey. arXiv:2312.10997.
  7. Edge et al. From Local to Global: A Graph RAG Approach. arXiv:2404.16130.
  8. Guo et al. LightRAG. arXiv:2410.05779.
  9. Gutiérrez et al. HippoRAG. arXiv:2405.14831.
  10. Yao et al. ReAct. arXiv:2210.03629.
  11. Schick et al. Toolformer. arXiv:2302.04761.
  12. Wang et al. Voyager. arXiv:2305.16291.
  13. Park et al. Generative Agents. arXiv:2304.03442.
  14. Wu et al. AutoGen. arXiv:2308.08155.
  15. Packer et al. MemGPT. arXiv:2310.08560.
  16. Wang et al. LongMem. arXiv:2306.07174.
  17. Liu, Zaharia, Abbeel. Ring Attention. arXiv:2310.01889.
  18. Anthropic. Model Context Protocol Specification. modelcontextprotocol.io.
  19. Yang et al. Survey of Agent Interoperability Protocols. arXiv:2505.02279.
  20. W3C. RDF 1.1 Concepts and Abstract Syntax.
  21. W3C. JSON-LD 1.1.
  22. Hogan et al. Knowledge Graphs. arXiv:2003.02320.
  23. OWASP. Top 10 for LLM Applications.
  24. NIST. AI Risk Management Framework.
  25. Bai et al. Constitutional AI. arXiv:2212.08073.
  26. Khattab et al. DSPy. arXiv:2310.03714.
  27. Beurer-Kellner et al. LMQL. arXiv:2212.06094.
