Specification
Extraction pipeline
The protocol does not mandate a specific extractor, but every conformant pipeline must produce the same shape and obey the same source-traceability rules.
Pipeline stages
- Ingestion. Read the source (text, markdown, PDF, transcript) and normalize whitespace, language and encoding.
- Chunking. Split the source into semantically coherent passages. Chunk size depends on
compression_level. - Lift. For each chunk, extract entities, concepts, principles, heuristics and rules with explicit
source_basis. - Cross-link. Resolve
related_*references; reject dangling ids. - Score. Assign
confidenceto every extracted item using the rules in Structuring. - Synthesize retrieval layer. Build
retrieval_chunks,atomic_units,agent_instructionsandknowledge_limits. - Trace. Emit one
source_traceabilityentry per non-synthesized item. - Validate. Run schema + cross-reference + range checks. Fail closed on errors; warn on soft issues.
Source-basis labels
Every extracted item carries one of five labels. The label is mandatory and drives downstream agent behavior:
explicit— the source states it directly.inferred— the extractor combined two or more explicit statements.synthesized— produced by the extractor (e.g. retrieval chunks). Not a claim about the world.author_opinion— the source's stated opinion, not a fact.uncertain— extractor was unsure; agents should treat with caution.
No silent inference
Producing an item without an
explicit source quote requires inferred oruncertain — never explicit. Failing this rule is a conformance error.Compression levels
Four levels control how aggressively the extractor compresses prose into structure:
light— preserves most prose; few inferred items; highhuman_readability.standard— balanced default; recommended for most sources.dense— maximal structure; minimal prose; highai_utility_score.agentic— optimized for autonomous agents; emphasizes playbooks, decision rules and tool guidance.
Reference contract
All conformant extractors expose the same TypeScript signature:
ts
function compileCkf(rawText: string, options: {
sourceType: string;
compressionLevel: "light" | "standard" | "dense" | "agentic";
outputFormat: "markdown" | "json" | "yaml";
language?: string;
}): { pkg: CkfPackage; warnings: string[] };