Structuring rules

Once items are extracted, structuring rules decide how they are named, scored, deduplicated and merged.

ID conventions

  • All ids are kebab-case slugs made of ASCII letters, digits, hyphens and underscores.
  • Each section has a prefix: ent_, con_, pri_, heu_, dec_, proc_, pat_, ant_, cau_, ctx_, ift_, exc_, mm_, pb_, qa_, chk_, atm_.
  • Ids must be unique within their section and stable across re-compilations of the same source.
  • The package_id is global; recommended pattern is <source-slug>-<version>.
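The id rules above can be checked mechanically. The following is a minimal sketch, not a reference validator: the prefix list is copied from the specification, while the exact slug grammar (lowercase letters and digits in groups joined by single hyphens or underscores) and the function names is_valid_id and find_duplicate_ids are assumptions made for illustration.

```python
import re

# Section prefixes listed in the specification.
SECTION_PREFIXES = (
    "ent_", "con_", "pri_", "heu_", "dec_", "proc_", "pat_", "ant_", "cau_",
    "ctx_", "ift_", "exc_", "mm_", "pb_", "qa_", "chk_", "atm_",
)

# Assumption: the part after the prefix is lowercase ASCII letters and digits
# in groups separated by single hyphens or underscores.
SLUG_BODY = re.compile(r"[a-z0-9]+(?:[-_][a-z0-9]+)*")


def is_valid_id(item_id: str) -> bool:
    """Return True if item_id starts with a known prefix and has a slug body."""
    for prefix in SECTION_PREFIXES:
        if item_id.startswith(prefix):
            return SLUG_BODY.fullmatch(item_id[len(prefix):]) is not None
    return False


def find_duplicate_ids(ids: list[str]) -> set[str]:
    """Return the ids that occur more than once within one section."""
    seen: set[str] = set()
    duplicates: set[str] = set()
    for item_id in ids:
        if item_id in seen:
            duplicates.add(item_id)
        seen.add(item_id)
    return duplicates
```

Stability across re-compilations cannot be checked from a single package; one possible approach is to compare the id sets of two compilations of the same source.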

Confidence scoring

Confidence is a float in [0, 1] with two decimal places. The protocol defines five bands; see also the Protocol page.

  • 0.90 – 1.00 — the source states the claim directly and unambiguously.
  • 0.75 – 0.89 — the source supports the claim with mild interpretation.
  • 0.50 – 0.74 — inferred by combining two or more explicit statements.
  • 0.25 – 0.49 — weak inference; agents should treat as a hypothesis.
  • 0.00 – 0.24 — extractor uncertain; emit only if useful, mark source_basis: uncertain.
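As an illustration of the bands above, the sketch below maps a confidence value to a band. The band names ("direct", "supported", "combined", "weak", "uncertain") and the function name confidence_band are hypothetical labels chosen for this example; the specification only defines the numeric ranges.

```python
def confidence_band(confidence: float) -> str:
    """Map a confidence in [0, 1] to one of the five bands.

    The returned band names are illustrative, not part of the protocol.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence must be in [0, 1], got {confidence}")
    value = round(confidence, 2)  # the spec uses two decimal places
    if value >= 0.90:
        return "direct"      # stated directly and unambiguously
    if value >= 0.75:
        return "supported"   # supported with mild interpretation
    if value >= 0.50:
        return "combined"    # inferred from two or more explicit statements
    if value >= 0.25:
        return "weak"        # weak inference, treat as a hypothesis
    return "uncertain"       # emit only if useful
```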

Normalization

  • Strings are NFC-normalized; whitespace collapsed to single spaces.
  • Languages follow BCP-47 (en, pt-BR, …).
  • Domains use the closed enum from the schema; novel topics go in subdomains.
  • Singular labels for entities and concepts; verb-led labels for procedures and playbooks.
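The string rule above (NFC normalization plus whitespace collapsing) can be sketched with the Python standard library. The function name normalize_string is hypothetical, and trimming leading and trailing whitespace is an assumption; the specification only says whitespace is collapsed to single spaces.

```python
import re
import unicodedata


def normalize_string(text: str) -> str:
    """NFC-normalize text and collapse whitespace runs to single spaces."""
    nfc = unicodedata.normalize("NFC", text)
    # Assumption: leading and trailing whitespace is also removed.
    return re.sub(r"\s+", " ", nfc).strip()
```

For example, an "e" followed by a combining acute accent (U+0301) becomes the single precomposed character U+00E9, and tabs or newlines between words become one space.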

Deduplication and merging

When two extractions produce overlapping items:

  • Merge by canonical id. If ids differ but labels are aliases, prefer the higher-confidence record and add the other label to aliases.
  • When confidences disagree, keep the higher value, then subtract the standard deviation of the sources' confidences; the result may not exceed the higher of the two.
  • When source-basis disagrees, downgrade conservatively: explicit + inferred → inferred.
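The merge rules above could look roughly like the following sketch. Several details are assumptions, not statements of the specification: records are modeled as plain dicts with id, label, aliases, confidence and source_basis keys; the standard deviation is taken as the population standard deviation (statistics.pstdev); and "uncertain" is ranked below "inferred" when downgrading. The names merge_confidence, merge_records and BASIS_RANK are hypothetical.

```python
from statistics import pstdev

# Assumption: ordering used for conservative downgrades (lower rank wins).
BASIS_RANK = {"explicit": 2, "inferred": 1, "uncertain": 0}


def merge_confidence(a: float, b: float) -> float:
    """Keep the higher confidence and lower it by the spread between sources."""
    higher = max(a, b)
    # Assumption: population standard deviation of the two values.
    merged = higher - pstdev([a, b])
    # The result is capped at the higher value and kept to two decimals.
    return round(min(merged, higher), 2)


def merge_records(a: dict, b: dict) -> dict:
    """Merge two records judged to be aliases of the same item."""
    winner, loser = (a, b) if a["confidence"] >= b["confidence"] else (b, a)
    merged = dict(winner)  # prefer the higher-confidence record
    aliases = list(winner.get("aliases", []))
    if loser["label"] != winner["label"] and loser["label"] not in aliases:
        aliases.append(loser["label"])  # keep the other label as an alias
    merged["aliases"] = aliases
    merged["confidence"] = merge_confidence(a["confidence"], b["confidence"])
    # Downgrade conservatively: the weaker source basis wins.
    merged["source_basis"] = min(
        a["source_basis"], b["source_basis"], key=BASIS_RANK.__getitem__
    )
    return merged
```

With two sources at 0.90 and 0.70, the population standard deviation is 0.10, so the merged confidence under these assumptions is 0.80. A sample standard deviation would give a lower value, so the choice should be confirmed against the protocol.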

Source traceability

Every non-synthesized item must have at least one matching source_traceability entry whose extraction_type equals the item's source_basis. Synthesized items (e.g. retrieval chunks) may omit it.
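A minimal sketch of this check follows. The field names source_basis and extraction_type come from the rule above; the item_id key linking a traceability entry to an item and the synthesized flag on items are hypothetical names introduced for the example, as is the function name find_untraced_items.

```python
def find_untraced_items(items: list[dict], source_traceability: list[dict]) -> list[str]:
    """Return ids of non-synthesized items lacking a matching traceability entry."""
    # Assumption: each traceability entry names its item through "item_id".
    traced = {
        (entry["item_id"], entry["extraction_type"])
        for entry in source_traceability
    }
    missing = []
    for item in items:
        # Assumption: synthesized items carry a boolean "synthesized" flag.
        if item.get("synthesized", False):
            continue  # synthesized items may omit traceability
        if (item["id"], item["source_basis"]) not in traced:
            missing.append(item["id"])
    return missing
```

An empty result means every non-synthesized item has at least one entry whose extraction_type equals its source_basis.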

Auditability is a feature

The traceability section is what makes a CKF package auditable by humans and verifiable by other agents. Treat it as required, not optional.