Reviewer surface

Threat model

Adversary classes, trust boundaries, assumed-broken components, and explicit non-goals for the Project-AI governance kernel.

Posture

Project-AI assumes the model is untrusted, the operator may be coerced, and the network is hostile. The governance kernel is the only component permitted to emit ALLOW. Every other component is treated as a potential attacker until proven otherwise by a signed verdict.

Adversary model

We model adversaries by capability, not by intent. Each class below describes what the adversary can do, not who they are.

A1 — Compromised model

IN SCOPE

The model can produce arbitrary output, including outputs crafted to evade post-hoc filters. The kernel must adjudicate the action, not the text.

A2 — Compromised client / prompt injector

IN SCOPE

Untrusted input can reach the model via any channel. Capability tokens are bound to the request envelope, not to the prompt.

A3 — Compromised operator workstation

IN SCOPE

An operator endpoint may be malware-controlled. Actions in authority classes AC4 and AC5 require multi-party signed approval; a single operator key cannot escalate.
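The multi-party rule above can be sketched as a quorum check. This is a minimal illustration, not the kernel's actual approval protocol: the approver names, the demo keys, the THRESHOLD value, and the use of HMAC as a stand-in for real asymmetric signatures are all assumptions made for the example.

```python
import hashlib
import hmac

# Hypothetical approver keys. HMAC is only a stand-in here; a real
# deployment would verify asymmetric signatures from separate devices.
APPROVER_KEYS = {
    "alice": b"alice-demo-key",
    "bob": b"bob-demo-key",
    "carol": b"carol-demo-key",
}
THRESHOLD = 2  # assumed quorum size for AC4/AC5 actions


def sign_approval(approver: str, action_hash: str) -> str:
    """Produce one approver's tag over one action hash."""
    key = APPROVER_KEYS[approver]
    return hmac.new(key, action_hash.encode(), hashlib.sha256).hexdigest()


def quorum_met(action_hash: str, approvals: dict) -> bool:
    """True only if THRESHOLD distinct, known approvers signed this exact hash."""
    valid = set()
    for approver, tag in approvals.items():
        key = APPROVER_KEYS.get(approver)
        if key is None or not isinstance(tag, str):
            continue  # unknown approver or malformed tag: ignore it
        expected = hmac.new(key, action_hash.encode(), hashlib.sha256).hexdigest()
        if hmac.compare_digest(expected.encode(), tag.encode()):
            valid.add(approver)
    return len(valid) >= THRESHOLD
```

Because approvals are keyed by approver and bound to the action hash, one compromised workstation can neither count twice nor reuse an approval for a different action.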

A4 — Network-level adversary

IN SCOPE

TLS termination, man-in-the-middle (MITM) interception, and replay are assumed possible. All decisions carry signed envelopes and timestamps from a time-stamping authority (TSA); replay is bounded by a nonce plus a freshness window.
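One way to picture the nonce-plus-freshness bound is the sketch below. The ReplayGuard class, its 30-second default window, and the in-memory nonce store are hypothetical; the source does not specify how the kernel tracks nonces or how wide the window is.

```python
import time
from typing import Dict, Optional


class ReplayGuard:
    """Rejects envelopes that are stale or whose nonce was already seen."""

    def __init__(self, max_age_seconds: float = 30.0):
        self.max_age_seconds = max_age_seconds  # assumed freshness window
        self._seen: Dict[str, float] = {}  # nonce -> issued_at timestamp

    def accept(self, nonce: str, issued_at: float, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        # Forget nonces whose timestamps have aged out; a replay of those
        # would fail the freshness check below anyway.
        self._seen = {
            n: t for n, t in self._seen.items() if now - t <= self.max_age_seconds
        }
        if abs(now - issued_at) > self.max_age_seconds:
            return False  # outside the freshness window
        if nonce in self._seen:
            return False  # replayed nonce
        self._seen[nonce] = issued_at
        return True
```

The freshness window is what keeps the nonce store bounded: only nonces younger than the window need to be remembered.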

A5 — Insider with kernel signing key

IN SCOPE

A single insider holding a complete kernel signing key could forge ALLOW verdicts. This is mitigated by key splitting, key rotation, and the public key transparency log at /keys.
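The source does not say which key-splitting scheme is used. As an illustration only, the sketch below uses simple n-of-n XOR splitting, where every share is needed to rebuild the key; a threshold scheme such as Shamir secret sharing would be a different design. The function names split_key and combine_shares are hypothetical.

```python
import secrets
from typing import List


def _xor(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def split_key(key: bytes, shares: int) -> List[bytes]:
    """Split key into `shares` parts; all parts are required to rebuild it."""
    if shares < 2:
        raise ValueError("need at least two shares")
    # shares-1 random pads, plus one final share that XORs back to the key
    parts = [secrets.token_bytes(len(key)) for _ in range(shares - 1)]
    last = key
    for part in parts:
        last = _xor(last, part)
    return parts + [last]


def combine_shares(shares: List[bytes]) -> bytes:
    """XOR all shares together to recover the original key."""
    out = bytes(len(shares[0]))  # all-zero starting value
    for share in shares:
        out = _xor(out, share)
    return out
```

With this scheme any proper subset of shares is statistically independent of the key, so an insider holding one share learns nothing about the signing key itself.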

A6 — Nation-state with hardware supply-chain access

OUT OF SCOPE

The current kernel does not defend against pre-compromised silicon or firmware. This is documented as a known gap.

Trust boundaries

Each boundary below is enforced by a signed envelope. Crossing a boundary without a valid signature is treated as DENY.
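A boundary check of this kind might look like the following sketch. The envelope layout (a "body" string plus a "sig" string), the BOUNDARY_KEY constant, and the use of HMAC in place of a real signature scheme are assumptions for illustration; they are not taken from the Project-AI wire format.

```python
import hashlib
import hmac
import json

# Hypothetical shared demo key; a stand-in for a real signature scheme.
BOUNDARY_KEY = b"demo-boundary-key"


def sign_envelope(payload: dict, key: bytes = BOUNDARY_KEY) -> dict:
    """Serialize the payload canonically and attach a tag over the bytes."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    sig = hmac.new(key, body.encode(), hashlib.sha256).hexdigest()
    return {"body": body, "sig": sig}


def cross_boundary(envelope: dict, key: bytes = BOUNDARY_KEY) -> str:
    """Return "FORWARD" for a validly signed envelope and "DENY" otherwise."""
    body = envelope.get("body")
    sig = envelope.get("sig")
    if not isinstance(body, str) or not isinstance(sig, str):
        return "DENY"  # missing or malformed fields never pass
    expected = hmac.new(key, body.encode(), hashlib.sha256).hexdigest()
    if hmac.compare_digest(expected.encode(), sig.encode()):
        return "FORWARD"
    return "DENY"
```

The function has a single success path; every other outcome, including a missing signature, falls through to DENY.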

Client → Portal

TLS + request signature. Portal does not hold capability authority; it forwards intent to the kernel.

Portal → Kernel

mTLS + envelope signature. The portal cannot synthesize a verdict; it can only request one.

Kernel → Runtime

Capability token issued per action, bound to action hash, expiry, and authority class. Runtime rejects any action whose hash does not match the token.
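The binding described above can be sketched as a runtime-side check. The CapabilityToken fields mirror the three bindings named in the text, but the field names, the SHA-256 action hash, and the exact-match rule for authority class are assumptions. A real token would also carry a kernel signature, which is omitted here for brevity.

```python
import hashlib
import time
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CapabilityToken:
    action_hash: str      # hash of the one action this token authorizes
    expires_at: float     # Unix time after which the token is void
    authority_class: int  # e.g. 4 for AC4 (numbering is assumed)


def hash_action(action: bytes) -> str:
    """Hash the serialized action; SHA-256 is an assumed choice."""
    return hashlib.sha256(action).hexdigest()


def runtime_accepts(token: CapabilityToken, action: bytes,
                    required_class: int, now: Optional[float] = None) -> bool:
    """Accept only an unexpired token whose hash and class match the action."""
    now = time.time() if now is None else now
    if now >= token.expires_at:
        return False  # expired
    if token.authority_class != required_class:
        return False  # token issued for a different authority class
    return token.action_hash == hash_action(action)
```

A token minted for one action is useless for any other, because changing a single byte of the action changes its hash.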

Runtime → External effects

Each external effect (network, storage, payment, code execution) is its own AC-scoped capability. There is no ambient authority.

Assumed broken

Components we treat as compromised by default. The kernel must remain safe even when all of these are hostile.

Model weights

ASSUMED BROKEN

Treated as an adversarial output generator.

Prompt channel

ASSUMED BROKEN

Treated as attacker-controlled.

Frontend JavaScript

ASSUMED BROKEN

Treated as observable and tamperable.

Operator email / chat

ASSUMED BROKEN

Treated as phishable. Out-of-band signing required for AC4+.

Invariants

Properties that must hold under every adversary in scope. Violation of any invariant triggers SAFE_HALT.

I1 — No silent ALLOW

Every ALLOW produces a signed, externally verifiable receipt.

I2 — No ambient authority

Every capability is explicit, scoped, expiring, and bound to an action hash.

I3 — Default DENY

Absence of an ALLOW is DENY. There is no permissive fallback.
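The fail-closed rule above (absence of an ALLOW is DENY) can be expressed as a small wrapper. The adjudicate function and the policy callable are hypothetical names; the point of the sketch is only that errors, missing results, and unrecognized values all collapse to DENY.

```python
from typing import Callable


def adjudicate(policy: Callable[[dict], object], request: dict) -> str:
    """Return "ALLOW" only when the policy says exactly that; otherwise "DENY"."""
    try:
        verdict = policy(request)
    except Exception:
        return "DENY"  # a crashing policy must not open the gate
    if isinstance(verdict, str) and verdict == "ALLOW":
        return "ALLOW"
    return "DENY"  # None, typos, wrong types: all treated as DENY
```

Checking for the exact ALLOW value, rather than checking for the absence of DENY, is what removes the permissive fallback.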

I4 — Continuity gate

STATE_REGISTER mismatch produces SAFE_HALT, not silent recovery.
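The source does not define the contents of STATE_REGISTER. Assuming, purely for illustration, that it holds a digest of the last committed kernel state, the continuity gate might be sketched as below. The SafeHalt exception, state_digest, and continuity_gate are hypothetical names.

```python
import hashlib


class SafeHalt(Exception):
    """Raised when continuity cannot be proven; the caller must stop, not retry."""


def state_digest(state: bytes) -> str:
    """Digest of a serialized state snapshot (SHA-256 is an assumed choice)."""
    return hashlib.sha256(state).hexdigest()


def continuity_gate(state_register: str, observed_state: bytes) -> None:
    """Pass silently on a match; raise SafeHalt on any mismatch."""
    if state_digest(observed_state) != state_register:
        # No repair or resynchronization is attempted here by design.
        raise SafeHalt("STATE_REGISTER mismatch")
```

Raising instead of repairing keeps the mismatch visible: an attacker who tampers with state cannot rely on the system quietly resynchronizing around the change.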

I5 — Public verifiability

Every signing key used for live verdicts is published at /keys with a rotation log.

Explicit non-goals

Things this threat model deliberately does not address. Listed so reviewers do not assume coverage we have not claimed.

Hardware supply chain

OUT OF SCOPE

Pre-compromised CPUs, TPMs, or HSMs.

Side-channel attacks on the kernel host

OUT OF SCOPE

Spectre-class and power-analysis attacks.

Model alignment

OUT OF SCOPE

Project-AI does not claim the model is aligned. It claims the model cannot execute unsafe actions.

Legal admissibility in every jurisdiction

OUT OF SCOPE

Admissibility is claimed against the published frame, not against arbitrary courts.

Disclosure

Vulnerabilities affecting any IN SCOPE adversary class are eligible for coordinated disclosure. Use /disclosure for the coordinated vulnerability policy and /.well-known/security.txt for machine-readable RFC 9116 contact metadata.