Specification
Structuring rules
Once items are extracted, structuring rules decide how they are named, scored, deduplicated and merged.
ID conventions
- All ids are kebab-case slugs of ASCII letters, digits and underscores.
- Each section has a prefix: ent_, con_, pri_, heu_, dec_, proc_, pat_, ant_, cau_, ctx_, ift_, exc_, mm_, pb_, qa_, chk_, atm_.
- Ids must be unique within their section and stable across re-compilations of the same source.
- The package_id is global; the recommended pattern is <source-slug>-<version>.
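The ID rules above can be checked mechanically. The sketch below validates one id against its section prefix and finds duplicates within a section. It assumes the slug after the prefix is lowercase and may contain hyphens (read from "kebab-case") as well as underscores; the function names are hypothetical and not part of the protocol.

```python
import re

# Section prefixes listed in the ID conventions.
SECTION_PREFIXES = (
    "ent_", "con_", "pri_", "heu_", "dec_", "proc_", "pat_", "ant_", "cau_",
    "ctx_", "ift_", "exc_", "mm_", "pb_", "qa_", "chk_", "atm_",
)

# Assumption: lowercase ASCII letters and digits, separated by single
# hyphens or underscores (hyphens inferred from "kebab-case").
SLUG_RE = re.compile(r"[a-z0-9]+(?:[-_][a-z0-9]+)*")


def is_valid_id(item_id: str, prefix: str) -> bool:
    """Check one id against its section prefix and the assumed slug charset."""
    if prefix not in SECTION_PREFIXES or not item_id.startswith(prefix):
        return False
    return SLUG_RE.fullmatch(item_id[len(prefix):]) is not None


def duplicate_ids(ids: list[str]) -> set[str]:
    """Return ids that appear more than once within one section."""
    seen: set[str] = set()
    dupes: set[str] = set()
    for item_id in ids:
        if item_id in seen:
            dupes.add(item_id)
        seen.add(item_id)
    return dupes
```

For example, is_valid_id("ent_user-account", "ent_") accepts the id, while "con_Bad Id" is rejected for its uppercase letter and space. Stability across re-compilations cannot be checked from a single run; it needs a comparison with the previous package.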
Confidence scoring
Confidence is a float in [0, 1] with two decimal places. The protocol defines five bands; see also the Protocol page.
- 0.90–1.00: the source states the claim directly and unambiguously.
- 0.75–0.89: the source supports the claim with mild interpretation.
- 0.50–0.74: inferred by combining two or more explicit statements.
- 0.25–0.49: weak inference; agents should treat it as a hypothesis.
- 0.00–0.24: extractor uncertain; emit only if useful, and mark source_basis: uncertain.
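A small lookup makes the bands concrete. This sketch rounds to two decimals first (so 0.899 is treated as 0.90) and returns a short label per band; the labels and the function name are illustrative paraphrases, not protocol values.

```python
# Lower bound of each band, highest first, with an illustrative label.
BANDS = [
    (0.90, "direct"),
    (0.75, "mild interpretation"),
    (0.50, "combined inference"),
    (0.25, "weak inference / hypothesis"),
    (0.00, "uncertain"),
]


def confidence_band(value: float) -> str:
    """Round to two decimals, then return the label of the matching band."""
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"confidence out of range: {value}")
    rounded = round(value, 2)
    for lower, label in BANDS:
        if rounded >= lower:
            return label
    raise AssertionError("unreachable: 0.00 is the lowest band")
```

An item landing in the "uncertain" band would also carry source_basis: uncertain, per the last band's rule.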
Normalization
- Strings are NFC-normalized; whitespace collapsed to single spaces.
- Languages follow BCP-47 (en, pt-BR, …).
- Domains use the closed enum from the schema; novel topics go in subdomains.
- Use singular labels for entities and concepts, and verb-led labels for procedures and playbooks.
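The string rule can be expressed with the standard library alone. This minimal sketch NFC-normalizes a string and collapses whitespace runs; it also trims leading and trailing whitespace, which is an assumption beyond "collapsed to single spaces". The function name is hypothetical.

```python
import unicodedata


def normalize_string(text: str) -> str:
    """NFC-normalize, then collapse every whitespace run to a single space."""
    nfc = unicodedata.normalize("NFC", text)
    # str.split() with no argument splits on any run of Unicode whitespace
    # and drops leading/trailing whitespace (the trimming is an assumption).
    return " ".join(nfc.split())
```

For instance, a decomposed "e" plus combining acute accent becomes the single precomposed "é", and a tab or newline between words becomes one space.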
Deduplication and merging
When two extractions produce overlapping items:
- Merge by canonical id. If ids differ but the labels are aliases, prefer the higher-confidence record and add the other label to aliases.
- When confidences disagree, keep the higher value and lower it by the standard deviation between sources, capped at the higher of the two.
- When source_basis values disagree, downgrade conservatively: explicit + inferred ⇒ inferred.
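The merge rules can be sketched as one function over two records already judged to describe the same item. Several details are assumptions: the Item shape, the use of the population standard deviation (statistics.pstdev), the strength order explicit > inferred > uncertain, and rounding the merged confidence to two decimals. Because the deviation is non-negative, the cap at the higher value never actually lowers the result; it is kept to mirror the rule.

```python
from dataclasses import dataclass, field
from statistics import pstdev

# Assumed strength order for source_basis, strongest first.
BASIS_ORDER = ["explicit", "inferred", "uncertain"]


@dataclass
class Item:  # hypothetical record shape
    id: str
    label: str
    confidence: float
    source_basis: str
    aliases: list[str] = field(default_factory=list)


def merge_confidence(values: list[float]) -> float:
    """Keep the highest value, lower it by the spread, cap at the highest."""
    highest = max(values)
    lowered = highest - pstdev(values)  # population std dev (assumption)
    return round(min(lowered, highest), 2)


def merge_basis(bases: list[str]) -> str:
    """Downgrade conservatively: the weakest basis wins."""
    return max(bases, key=BASIS_ORDER.index)


def merge_items(a: Item, b: Item) -> Item:
    """Prefer the higher-confidence record; fold the other label into aliases."""
    primary, other = (a, b) if a.confidence >= b.confidence else (b, a)
    aliases = list(dict.fromkeys(primary.aliases + other.aliases))
    if other.label != primary.label and other.label not in aliases:
        aliases.append(other.label)
    return Item(
        id=primary.id,
        label=primary.label,
        confidence=merge_confidence([a.confidence, b.confidence]),
        source_basis=merge_basis([a.source_basis, b.source_basis]),
        aliases=aliases,
    )
```

Merging an explicit "Invoice" at 0.90 with an inferred "Bill" at 0.70 keeps the Invoice id and label, records "Bill" as an alias, lowers the confidence to 0.80, and downgrades the basis to inferred. Equal confidences have zero spread, so they pass through unchanged.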
Source traceability
Every non-synthesized item must have at least one matching source_traceability entry whose extraction_type equals the item's source_basis. Synthesized items (e.g. retrieval chunks) may omit it.
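The traceability rule can be checked by pairing each item's id with its source_basis and looking for a matching (item, extraction_type) entry. In this sketch, the item_id key on traceability entries and the boolean synthesized flag on items are hypothetical field names chosen for illustration.

```python
def missing_traceability(items: list[dict], traceability: list[dict]) -> list[str]:
    """Return ids of non-synthesized items lacking a matching traceability entry."""
    # Set of (item id, extraction_type) pairs that the traceability section covers.
    covered = {(t["item_id"], t["extraction_type"]) for t in traceability}
    missing = []
    for item in items:
        if item.get("synthesized", False):
            continue  # synthesized items (e.g. retrieval chunks) may omit it
        if (item["id"], item["source_basis"]) not in covered:
            missing.append(item["id"])
    return missing
```

An item whose only traceability entry has a different extraction_type than its source_basis is reported as missing, since the rule requires the two to match.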
Auditability is a feature
The traceability section is what makes a CKF package auditable by humans and verifiable by other agents. Treat it as required, not optional.