SKI conformance methodology

License: CC BY 4.0. See LICENSE-docs.md. Status: Provenance and Durability are runnable today. Sovereignty: all six checks are runnable. The tamper-resistance rig needs a throwaway Postgres (`SKI_L3_LEDGER_DSN`), and the air-gapped boot rig needs Docker plus the `SKI_L3_AIRGAP=1` opt-in; both skip cleanly when their infrastructure is absent.

This document defines what it means for an implementation to claim SKI Framework v3.0 conformance. It complements the runnable test suite under conformance/.

Why conformance matters

The SKI Framework's business model depends on the conformance levels being operationally meaningful. Without an executable, third-party-verifiable test suite, "Provenance conformant" is marketing copy. With one, it is a contract: a regulator, a procurement officer, or an auditor can run `pytest conformance -q -m provenance` against a deployment and get a pass/fail answer.

The W3C maintains conformance test suites for its specifications. Khronos uses one for OpenGL. The Apache Foundation uses one for Cassandra. SKI uses one for the same reason: it is the only way to keep the spec honest.

Three levels of verifiable provenance

v3 reorganises the conformance levels around what the audit trail actually proves. The three levels are cumulative: Durability requires Provenance, and Sovereignty requires both.
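
Because the levels are cumulative, the highest level an implementation can claim is the last one in an unbroken run of passing levels. A minimal sketch, assuming per-level pass/fail results are already available (the `LEVELS` tuple and the `claimable_level` helper are hypothetical, not part of the suite):

```python
from typing import Mapping, Optional

# Hypothetical: levels in cumulative order; each one requires every level before it.
LEVELS = ("provenance", "durability", "sovereignty")


def claimable_level(results: Mapping[str, bool]) -> Optional[str]:
    """Return the highest level whose own suite and all lower suites passed.

    `results` maps a level name (the pytest marker) to True if every test
    under that marker passed. A level missing from `results` counts as failed.
    """
    claimed = None
    for level in LEVELS:
        if not results.get(level, False):
            break  # a failed level caps the claim, even if a higher one passed
        claimed = level
    return claimed
```

For example, `claimable_level({"provenance": True, "durability": False, "sovereignty": True})` returns `"provenance"`: a passing Sovereignty run cannot compensate for a failing Durability run.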

Level 1 — Provenance

Every verdict carries a complete, verifier-checked provenance record. The envelope shape matches the spec contract, the Symbolic Verifier ran and emitted one of the four canonical statuses, the LLM's formalizable assertions are tied to KG citations, and the agreement monitor is mounted.

| Requirement | Spec | Test |
|---|---|---|
| `V3VerdictEnvelope` declares all seven required fields | §4.2 | `provenance/test_v3_envelope_shape.py` |
| `ModelProvenance` declares all six required hash fields | §4.6 | `provenance/test_v3_envelope_shape.py` |
| `VerifierStatus` enum has all four spec values | §4.5 | `provenance/test_v3_envelope_shape.py` |
| `KGCitationRole` lists the five spec roles | §4.3 | `provenance/test_v3_envelope_shape.py` |
| `SymbolicVerifier` module exists and handles the minimum predicates | §5.3 | `provenance/test_v3_verifier_contract.py` |
| Risk-tier policy module exists with the three tiers | §5.4 | `provenance/test_v3_verifier_contract.py` |
| Five-verdict taxonomy in schema + `V3Verdict` enum | §4.1 | `provenance/test_verdict_taxonomy.py` |
| No `confidence_level` column anywhere | Axiom 2 | `provenance/test_no_confidence.py` |
| Agreement monitor is mounted and exposes the snapshot keys | §7.2 | `provenance/test_agreement_monitor.py` |
| `NULL_STALE` is produced when the freshness gate fails | §4.1 | `provenance/test_null_stale_routing.py` |
| Window predicates (count / sum / avg) produce correct verdicts | §5.3 | `provenance/test_window_predicates.py` |
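
The Axiom 2 check in `test_no_confidence.py` amounts to a schema scan. A minimal sketch of the idea, using the standard-library `sqlite3` module and an in-memory database as a stand-in for the deployment's real database (against Postgres, the same scan would query `information_schema.columns`); the helper name `columns_named` is hypothetical:

```python
import sqlite3


def columns_named(conn: sqlite3.Connection, forbidden: str) -> list[tuple[str, str]]:
    """Return (table, column) pairs whose column name matches `forbidden`."""
    hits = []
    relations = conn.execute(
        "SELECT name FROM sqlite_master WHERE type IN ('table', 'view')"
    ).fetchall()
    for (relation,) in relations:
        # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
        for row in conn.execute(f'PRAGMA table_info("{relation}")'):
            if row[1].lower() == forbidden.lower():
                hits.append((relation, row[1]))
    return hits


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE verdicts (id INTEGER PRIMARY KEY, verdict TEXT)")
assert columns_named(conn, "confidence_level") == []

# A schema that reintroduces the forbidden column is caught.
conn.execute("CREATE TABLE legacy (id INTEGER, confidence_level REAL)")
print(columns_named(conn, "confidence_level"))  # [('legacy', 'confidence_level')]
```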

Level 2 — Durability

Provenance is signed, replayable, and audit-chained. The loader requires an ed25519 signature on the KG; the risk-tier governor reads tiers from the KG (strict, with no caller influence); the audit ledger is append-only at the DB layer; hash-chain verification recomputes each entry's hash from its canonical payload instead of only checking the links between entries; and the replay primitive can reproduce historical verdicts.

| Requirement | Spec | Test |
|---|---|---|
| KG loader refuses unsigned KGs (no escape hatch) | §3 | `durability/test_signed_kg_required.py` |
| Tag Registry package is a pure dict lookup (no fuzzy/LLM) | §5.4 | `durability/test_risk_tier_governor.py` |
| `RiskTierGovernor` is the authoritative tier source | §5.4 | `durability/test_risk_tier_governor.py` |
| `MeasurementRecord` has NO caller-settable `risk_tier` | §5.4 | `durability/test_risk_tier_governor.py` |
| Demo KGs route every subject via `applies_to` edges; telemetry has no `rule_id` | B4.3 (v3 form) | `durability/test_risk_tier_governor.py` |
| Append-only DB triggers (UPDATE / DELETE / TRUNCATE) | §6 | `durability/test_append_only.py` |
| Hash-chain integrity recomputes entry hash from canonical payload | §6 | `durability/test_ledger_integrity.py` |
| Replay primitive ships with strict mode + CLI flags | §6 | `durability/test_replay_determinism.py` |
| Three identical evaluations against a live runtime match | Axiom 2 | `durability/test_replay_three_evaluations.py` |
| `coverage_register` view scopes to NULL verdicts only | §6 | `durability/test_coverage_register.py` |
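
Recomputation matters because a linkage-only check proves only that each row points at its predecessor's *stored* hash: an edited payload whose stored hash was left untouched still links correctly. A minimal sketch of both checks, assuming a hypothetical row layout (`payload`, `prev_hash`, `entry_hash`) hashed with SHA-256 over canonical JSON; the suite's real canonicalisation and column names may differ, and `verify_by_recompute` is an illustration, not the runtime's `verify_integrity`:

```python
import hashlib
import json
from dataclasses import dataclass

GENESIS = "0" * 64  # hypothetical prev_hash for the first ledger row


def canonical(payload: dict) -> bytes:
    # Canonical form: sorted keys, no insignificant whitespace, UTF-8.
    return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")


def entry_hash(prev_hash: str, payload: dict) -> str:
    return hashlib.sha256(prev_hash.encode("ascii") + canonical(payload)).hexdigest()


@dataclass
class Row:
    payload: dict
    prev_hash: str
    entry_hash: str


def append(ledger: list[Row], payload: dict) -> None:
    prev = ledger[-1].entry_hash if ledger else GENESIS
    ledger.append(Row(payload, prev, entry_hash(prev, payload)))


def verify_linkage_only(ledger: list[Row]) -> bool:
    # Weak check: each row points at the previous row's stored hash.
    prev = GENESIS
    for row in ledger:
        if row.prev_hash != prev:
            return False
        prev = row.entry_hash
    return True


def verify_by_recompute(ledger: list[Row]) -> bool:
    # Strong check: also recompute each entry hash from its canonical payload.
    prev = GENESIS
    for row in ledger:
        if row.prev_hash != prev or row.entry_hash != entry_hash(prev, row.payload):
            return False
        prev = row.entry_hash
    return True


ledger: list[Row] = []
for n in range(3):
    append(ledger, {"seq": n, "verdict": "CLEAR"})

ledger[1].payload["verdict"] = "NULL_STALE"  # tamper with a stored payload only
print(verify_linkage_only(ledger))  # True: the links are untouched
print(verify_by_recompute(ledger))  # False: row 1 no longer matches its hash
```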

Level 3 — Sovereignty

The runtime can operate air-gapped, is tamper-evident, and is signed end to end. All six checks below are runnable. Four are black-box structural checks (with matching functional proofs in the runtime test suite). The remaining two carry their own infrastructure rigs: tamper resistance seeds and attacks a throwaway Postgres (`SKI_L3_LEDGER_DSN`), and air-gapped boot runs the full runtime plus ledger in a loopback-only `--network=none` namespace (Docker + `SKI_L3_AIRGAP=1`).

| Requirement | Spec | Test | Status |
|---|---|---|---|
| No outbound HTTP during CLEAR-path evaluation (local LLM) | Pillar S | `sovereignty/test_no_outbound_calls.py` (+ runtime `v3/tests/test_no_egress.py`) | ✅ runnable |
| Runtime boots, serves, and persists with `--network=none` | Pillar S | `sovereignty/test_air_gapped.py` | ✅ runnable (Docker + `SKI_L3_AIRGAP=1`) |
| Modified ledger row fails `verify_integrity` even after chain forward | §6 | `sovereignty/test_tamper_resistance.py` | ✅ runnable (throwaway Postgres via `SKI_L3_LEDGER_DSN`) |
| Runtime refuses to start with `SKI_MODEL_WORKERS != 1` | Concurrency | `sovereignty/test_single_worker.py` | ✅ runnable |
| Recorded transcript carries the snapshot's `scope` block | §3.6 + §6 | `sovereignty/test_jurisdiction_scope_captured.py` | ✅ runnable |
| Recorded `LLMTranscript` ed25519 signature verifies | §4.7 | `sovereignty/test_signed_llm_transcript.py` (+ runtime `test_signing.py`, `test_transcript.py`) | ✅ runnable |
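
The no-outbound-HTTP requirement can be approximated in-process by refusing every non-loopback `connect()` while a CLEAR-path evaluation runs. A minimal sketch using only the standard library; `deny_egress` and `EgressAttempt` are hypothetical names, and `test_no_outbound_calls.py` may use a different mechanism:

```python
import ipaddress
import socket
from contextlib import contextmanager
from unittest import mock


class EgressAttempt(RuntimeError):
    """Raised when code under test connects to a non-loopback address."""


@contextmanager
def deny_egress():
    """Block socket.socket.connect() to anything but loopback while active.

    Only Python-level sockets in this process are covered, and only their
    connect() method (not connect_ex()); this complements, and does not
    replace, a network namespace such as Docker's --network=none.
    """
    original_connect = socket.socket.connect

    def guarded_connect(sock, address):
        if isinstance(address, tuple):  # AF_INET / AF_INET6: (host, port, ...)
            host = address[0]
            try:
                allowed = ipaddress.ip_address(host).is_loopback
            except ValueError:  # a hostname rather than a literal IP
                allowed = host == "localhost"
            if not allowed:
                raise EgressAttempt(f"outbound connection attempted to {address!r}")
        # Non-tuple addresses (e.g. AF_UNIX paths) are local by construction.
        return original_connect(sock, address)

    # mock.patch.object restores the original attribute on exit.
    with mock.patch.object(socket.socket, "connect", guarded_connect):
        yield
```

A test would wrap the evaluation call in `with deny_egress():` and fail on `EgressAttempt`. Connections to `127.0.0.1` or `::1` still go through, so a local LLM served on loopback keeps working.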

Running the tests

```bash
pip install -r requirements-dev.txt

pytest conformance/ -m provenance
pytest conformance/ -m durability
pytest conformance/ -m sovereignty   # all 6 checks; the two rig-backed ones
                                     # skip without their infrastructure:
                                     #   SKI_L3_LEDGER_DSN=...  tamper rig (throwaway Postgres)
                                     #   SKI_L3_AIRGAP=1        air-gap rig (Docker)
```
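
One way for the rig-backed checks to skip cleanly is to test their opt-in environment variables before touching any infrastructure. A minimal sketch of that gating; the helper names are hypothetical, and the suite's real conditions (for example, whether it also probes the Docker daemon) may differ:

```python
import os
import shutil
from typing import Optional


def tamper_rig_skip_reason() -> Optional[str]:
    """Return why the tamper-resistance rig cannot run, or None if it can."""
    if not os.environ.get("SKI_L3_LEDGER_DSN"):
        return "SKI_L3_LEDGER_DSN not set (needs a throwaway Postgres)"
    return None


def airgap_rig_skip_reason() -> Optional[str]:
    """Return why the air-gapped boot rig cannot run, or None if it can."""
    if os.environ.get("SKI_L3_AIRGAP") != "1":
        return "SKI_L3_AIRGAP=1 opt-in not given"
    if shutil.which("docker") is None:
        return "docker executable not found on PATH"
    return None
```

Inside a pytest module, a non-`None` reason would typically be passed to `pytest.skip(reason)` at the start of the test, so the run reports a skip rather than a failure.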

Against a live deployment:

```bash
pytest conformance/ -m provenance \
  --ski-endpoint https://localhost:8000 \
  --api-key "$SKI_API_KEY" \
  --ledger-dsn "$LEDGER_DSN"
```
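
`--ski-endpoint`, `--api-key`, and `--ledger-dsn` are not built-in pytest options, so the suite has to register them through pytest's `pytest_addoption` hook in a `conftest.py` or plugin. A minimal sketch of such a registration; the option names come from the command above, while the group name, defaults, and help text are assumptions:

```python
def pytest_addoption(parser):
    """Register the live-deployment options (sketch of a conftest.py hook)."""
    group = parser.getgroup("ski", "SKI conformance: live deployment")
    group.addoption("--ski-endpoint", default=None,
                    help="Base URL of the SKI runtime under test")
    group.addoption("--api-key", default=None,
                    help="API key sent to the runtime")
    group.addoption("--ledger-dsn", default=None,
                    help="DSN of the audit ledger database")
```

A fixture can then read a value with `request.config.getoption("--ski-endpoint")` and skip when it is `None`, which keeps the same tests usable both in-process and against a live deployment.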

On failure, each test prints the spec section it validates. Run results can be exported as JUnit XML with pytest's `--junitxml=PATH` option and rendered into a conformance badge in your project's README.
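
A minimal sketch of turning a pytest JUnit XML report into badge input with the standard library. The counters (`tests`, `failures`, `errors`, `skipped`) are the attributes pytest writes on each `<testsuite>`; the badge rule, under which skips do not turn the badge red, is an assumption:

```python
import xml.etree.ElementTree as ET


def junit_summary(xml_text: str) -> dict:
    """Total the test counters in a JUnit XML report written by pytest."""
    root = ET.fromstring(xml_text)
    # Recent pytest versions wrap results in <testsuites>; some writers emit a bare <testsuite>.
    suites = [root] if root.tag == "testsuite" else root.findall("testsuite")
    totals = {"tests": 0, "failures": 0, "errors": 0, "skipped": 0}
    for suite in suites:
        for key in totals:
            totals[key] += int(suite.get(key, 0))
    # pytest counts skipped tests in "tests", so subtract every non-pass outcome.
    totals["passed"] = (
        totals["tests"] - totals["failures"] - totals["errors"] - totals["skipped"]
    )
    return totals


def badge_status(totals: dict) -> str:
    # Assumed badge rule: any failure or error turns the badge red; skips do not.
    return "failing" if totals["failures"] or totals["errors"] else "passing"


report = (
    '<testsuites><testsuite name="pytest" errors="0" failures="0" '
    'skipped="2" tests="6" time="1.2"/></testsuites>'
)
totals = junit_summary(report)
print(totals["passed"], badge_status(totals))  # 4 passing
```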

Attestation flow

For self-asserted conformance:

1. Run `pytest conformance -q -m provenance` (and `-m durability`), adding `--junitxml=PATH` so the run writes a JUnit XML report.
2. Publish the JUnit XML alongside your release.
3. Include a `Conformance: SKI Provenance (self-asserted)` or `Conformance: SKI Durability (self-asserted)` line in the release notes.

For attested conformance (Sovereignty level), engage KpiFinity for a third-party audit. The audit process replays the conformance suite against your deployment under controlled conditions and issues a signed certificate referencing the spec version, the suite revision, and the deployment artefacts. The certificate is the asset regulators recognise.

Versioning

The conformance suite is versioned with the spec. The mapping between spec version and test suite revision is documented in `conformance/CHANGELOG.md`. An implementation claiming "SKI v3.0 Provenance conformant" must pass the test suite at the revision mapped to that spec version.

Contributing tests

New tests are among the highest-leverage contributions you can make, especially tests that harden the Sovereignty rigs (tamper resistance, air-gapped boot) against new attack shapes. See `conformance/README.md` for the conventions: one spec citation per test, black-box only, and no dependency on the reference implementation's internals.