Extraction pipeline

The protocol does not mandate a specific extractor, but every conformant pipeline must produce the same shape and obey the same source-traceability rules.

Pipeline stages

  1. Ingestion. Read the source (text, markdown, PDF, transcript) and normalize whitespace, language and encoding.
  2. Chunking. Split the source into semantically coherent passages. Chunk size depends on compression_level.
  3. Lift. For each chunk, extract entities, concepts, principles, heuristics and rules with explicit source_basis.
  4. Cross-link. Resolve related_* references; reject dangling ids.
  5. Score. Assign confidence to every extracted item using the rules in Structuring.
  6. Synthesize retrieval layer. Build retrieval_chunks, atomic_units, agent_instructions and knowledge_limits.
  7. Trace. Emit one source_traceability entry per non-synthesized item.
  8. Validate. Run schema + cross-reference + range checks. Fail closed on errors; warn on soft issues.
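
As a rough illustration of stages 4, 7 and 8, the sketch below checks a list of items for dangling references, out-of-range confidence and missing trace entries, and separates hard errors from soft warnings. The item shape, the single `related_ids` field (standing in for the related_* fields), the 0..1 confidence range and the choice of what counts as a soft issue are all assumptions made for this example; the actual schema is defined elsewhere in the specification.

```typescript
// Hypothetical item shape; field names are assumptions, not the spec's schema.
type Basis = "explicit" | "inferred" | "synthesized" | "author_opinion" | "uncertain";

interface Item {
  id: string;
  source_basis: Basis;
  confidence: number;    // assumed to live in the range 0..1
  related_ids: string[]; // stands in for the related_* fields
}

interface Report {
  errors: string[];   // hard failures: the package is rejected
  warnings: string[]; // soft issues: the package is still emitted
}

function validateItems(items: Item[], tracedIds: Set<string>): Report {
  const known = new Set(items.map((item) => item.id));
  const report: Report = { errors: [], warnings: [] };

  for (const item of items) {
    // Cross-reference check: every related id must resolve to a known item.
    for (const ref of item.related_ids) {
      if (!known.has(ref)) {
        report.errors.push(`${item.id}: dangling reference to ${ref}`);
      }
    }
    // Range check; written so that NaN is rejected as well.
    if (!(item.confidence >= 0 && item.confidence <= 1)) {
      report.errors.push(`${item.id}: confidence ${item.confidence} outside 0..1`);
    }
    // Trace check: every non-synthesized item needs a source_traceability entry.
    if (item.source_basis !== "synthesized" && !tracedIds.has(item.id)) {
      report.errors.push(`${item.id}: missing source_traceability entry`);
    }
    // Hypothetical soft issue: surface uncertain items without failing.
    if (item.source_basis === "uncertain") {
      report.warnings.push(`${item.id}: labeled uncertain`);
    }
  }
  return report;
}

// Fail closed: any hard error aborts; warnings are handed back to the caller.
function failClosed(report: Report): string[] {
  if (report.errors.length > 0) {
    throw new Error(`validation failed:\n${report.errors.join("\n")}`);
  }
  return report.warnings;
}
```

Collecting every error before throwing lets a pipeline report all problems in one pass while still refusing to emit an invalid package.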

Source-basis labels

Every extracted item carries one of five labels. The label is mandatory and drives downstream agent behavior:

  • explicit — the source states it directly.
  • inferred — the extractor combined two or more explicit statements.
  • synthesized — produced by the extractor (e.g. retrieval chunks). Not a claim about the world.
  • author_opinion — the source's stated opinion, not a fact.
  • uncertain — extractor was unsure; agents should treat with caution.
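
The five labels map naturally onto a closed string union. The sketch below shows one way to type them and to validate untrusted input. The handling names returned by `handlingFor` are hypothetical and only illustrate how a label might drive agent behavior; the specification does not define them.

```typescript
// The five labels listed above, as a closed set.
const SOURCE_BASIS_LABELS = [
  "explicit",
  "inferred",
  "synthesized",
  "author_opinion",
  "uncertain",
] as const;

type SourceBasis = (typeof SOURCE_BASIS_LABELS)[number];

// Runtime guard for values read from JSON or YAML.
function isSourceBasis(value: unknown): value is SourceBasis {
  return (
    typeof value === "string" &&
    (SOURCE_BASIS_LABELS as readonly string[]).indexOf(value) !== -1
  );
}

// Hypothetical agent-side handling; these names are not part of the spec.
type Handling =
  | "cite_directly"
  | "cite_as_derived"
  | "do_not_cite_as_claim"
  | "attribute_to_author"
  | "treat_with_caution";

function handlingFor(basis: SourceBasis): Handling {
  // The switch is exhaustive, so adding a sixth label becomes a compile error.
  switch (basis) {
    case "explicit":
      return "cite_directly";
    case "inferred":
      return "cite_as_derived";
    case "synthesized":
      return "do_not_cite_as_claim";
    case "author_opinion":
      return "attribute_to_author";
    case "uncertain":
      return "treat_with_caution";
  }
}
```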

No silent inference

An item produced without an explicit source quote must be labeled inferred or uncertain, never explicit. Violating this rule is a conformance error.
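
A conformance check for this rule could look like the following sketch. It flags only the case the rule names directly: an item labeled explicit that carries no quote. The `source_quote` field name is an assumption made for the example, and an empty or whitespace-only quote is treated as missing.

```typescript
// Hypothetical item shape; `source_quote` is an assumed field name.
interface LabeledItem {
  id: string;
  source_basis: string;
  source_quote?: string;
}

// Returns the ids of items that claim `explicit` without quoting the source.
function findSilentInferences(items: LabeledItem[]): string[] {
  return items
    .filter((item) => {
      const quote = (item.source_quote ?? "").trim();
      return item.source_basis === "explicit" && quote === "";
    })
    .map((item) => item.id);
}
```

A validator would typically report each returned id as a hard error rather than relabeling the item, since a silent relabel would hide the extractor's mistake.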

Compression levels

Four levels control how aggressively the extractor compresses prose into structure:

  • light — preserves most prose; few inferred items; high human_readability.
  • standard — balanced default; recommended for most sources.
  • dense — maximal structure; minimal prose; high ai_utility_score.
  • agentic — optimized for autonomous agents; emphasizes playbooks, decision rules and tool guidance.
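
Because the level arrives as a plain string in most configurations, an extractor will usually want to validate it up front. The sketch below is one way to do that. Falling back to standard when no level is supplied is an assumption based on its description as the balanced default; `parseCompressionLevel` is a hypothetical helper, not part of the specification.

```typescript
const COMPRESSION_LEVELS = ["light", "standard", "dense", "agentic"] as const;

type CompressionLevel = (typeof COMPRESSION_LEVELS)[number];

// Hypothetical helper: validates a user-supplied level.
function parseCompressionLevel(input?: string): CompressionLevel {
  // Assumption: a missing or blank value falls back to the "standard" default.
  if (input === undefined || input.trim() === "") {
    return "standard";
  }
  const normalized = input.trim().toLowerCase();
  const match = COMPRESSION_LEVELS.find((level) => level === normalized);
  if (match === undefined) {
    throw new Error(
      `unknown compression level "${input}"; expected one of ${COMPRESSION_LEVELS.join(", ")}`
    );
  }
  return match;
}
```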

Reference contract

All conformant extractors expose the same TypeScript signature:

```ts
function compileCkf(rawText: string, options: {
  sourceType: string;
  compressionLevel: "light" | "standard" | "dense" | "agentic";
  outputFormat: "markdown" | "json" | "yaml";
  language?: string;
}): { pkg: CkfPackage; warnings: string[] };
```
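
To show the call shape, the sketch below pairs the signature with a stub body. The stub is not a real extractor: the `CkfPackage` placeholder, the `CompileOptions` name, the fields written into `pkg` and the empty-input warning are all invented for illustration.

```typescript
// Placeholder; the real CkfPackage schema is defined elsewhere in the spec.
interface CkfPackage {
  [key: string]: unknown;
}

// Hypothetical name for the options object from the signature above.
interface CompileOptions {
  sourceType: string;
  compressionLevel: "light" | "standard" | "dense" | "agentic";
  outputFormat: "markdown" | "json" | "yaml";
  language?: string;
}

// Stub standing in for a conformant extractor; it performs no extraction.
function compileCkf(
  rawText: string,
  options: CompileOptions
): { pkg: CkfPackage; warnings: string[] } {
  const warnings: string[] = [];
  if (rawText.trim() === "") {
    warnings.push("source text is empty"); // invented soft issue
  }
  const pkg: CkfPackage = {
    source_type: options.sourceType,
    compression_level: options.compressionLevel,
  };
  return { pkg, warnings };
}

// Typical call: hard failures would throw, soft issues come back as warnings.
const result = compileCkf("# Notes\n\nAlways cite the source.", {
  sourceType: "markdown",
  compressionLevel: "standard",
  outputFormat: "json",
});
for (const warning of result.warnings) {
  console.warn(warning);
}
```

Returning warnings next to the package, rather than logging them inside the extractor, leaves it to the caller to decide whether a soft issue should block publication.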