Specification

Structuring rules

Once items are extracted, structuring rules decide how they are named, scored, deduplicated and merged.

ID conventions

All ids are kebab-case slugs of ASCII letters, digits and underscores.
Each section has a prefix: ent_, con_, pri_, heu_, dec_, proc_, pat_, ant_, cau_, ctx_, ift_, exc_, mm_, pb_, qa_, chk_, atm_.
Ids must be unique within their section and stable across re-compilations of the same source.
The package_id is global; recommended pattern is <source-slug>-<version>.

Confidence scoring

Confidence is a float in [0, 1] with two decimal places. The protocol defines five bands; see also the Protocol page.

0.90 – 1.00 — the source states the claim directly and unambiguously.
0.75 – 0.89 — the source supports the claim with mild interpretation.
0.50 – 0.74 — inferred by combining two or more explicit statements.
0.25 – 0.49 — weak inference; agents should treat as a hypothesis.
0.00 – 0.24 — extractor uncertain; emit only if useful, mark source_basis: uncertain.

Normalization

Strings are NFC-normalized; whitespace collapsed to single spaces.
Languages follow BCP-47 (en, pt-BR, …).
Domains use the closed enum from the schema; novel topics go in subdomains.
Singular labels for entities and concepts; verb-led labels for procedures and playbooks.

Deduplication and merging

When two extractions produce overlapping items:

Merge by canonical id. If ids differ but labels are aliases, prefer the higher-confidence record and add the other label to aliases.
When confidences disagree, keep the higher value and lower it by the standard deviation between sources, capped at the higher of the two.
When source-basis disagrees, downgrade conservatively: explicit + inferred ⇒ inferred.

Source traceability

Every non-synthesized item must have at least one matching source_traceability entry whose extraction_type equals the item's source_basis. Synthesized items (e.g. retrieval chunks) may omit it.

Auditability is a feature

The traceability section is what makes a CKF package auditable by humans and verifiable by other agents. Treat it as required, not optional.