Substrate Error Registry — Incident Substrate Model
Document: substrate_errors.md
Path: /docs/Incident_Substrate_Model/substrate_errors.md
Revision: RTT/1 · Canon Edition
Status: Authoritative
Companion: operator_grammar.md
Issued: 2026-05-20
Preamble
This document is the single canonical source of truth for all fault tokens
emitted by operators in the Incident Substrate Model (ISM). Every FAULTS →
entry in operator_grammar.md resolves to exactly one entry here.
Implementors MUST:
- Treat unrecognized fault tokens as GEN-005 PARTIAL_EXECUTION equivalents
(i.e., assume worst-case substrate contamination and halt).
- Never swallow a fault silently. Every fault must produce an
ExecutionRecord or an IngestionStatus == REJECTED where applicable.
- Expose the fault code verbatim to the calling substrate layer; do not
translate, generalize, or redact fault codes in runtime logs.
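The fallback rule for unrecognized tokens can be sketched as follows. This is a minimal illustration, not part of the grammar: `KNOWN_FAULTS`, `ResolvedFault`, and `resolve_fault` are hypothetical names, and the table holds only a small subset of the registry.

```python
from dataclasses import dataclass

# Hypothetical subset of the registry: token -> (code, severity).
KNOWN_FAULTS = {
    "RECORD_NOT_FOUND": ("GEN-001", "ERROR"),
    "PARTIAL_EXECUTION": ("GEN-005", "FATAL"),
    "PAYLOAD_TOO_LARGE": ("ING-002", "ERROR"),
}


@dataclass(frozen=True)
class ResolvedFault:
    token: str            # token exactly as emitted, never rewritten
    handled_as_code: str  # registry code whose handling rules apply
    severity: str
    recognized: bool


def resolve_fault(token: str) -> ResolvedFault:
    """Look up a fault token; unknown tokens get GEN-005 handling."""
    if token in KNOWN_FAULTS:
        code, severity = KNOWN_FAULTS[token]
        return ResolvedFault(token, code, severity, True)
    # Unrecognized token: assume worst case and handle it like
    # PARTIAL_EXECUTION, but keep the original token for the logs.
    code, severity = KNOWN_FAULTS["PARTIAL_EXECUTION"]
    return ResolvedFault(token, code, severity, False)
```

The original token is carried alongside the fallback code so that logs can still show what was emitted verbatim.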
How to Read This Registry
Each entry follows this structure:
### FAULT_TOKEN_NAME
**Code:** DOMAIN-NNN
**Severity:** FATAL | ERROR | WARNING
**Recoverability:** RECOVERABLE | OPERATOR_ACTION_REQUIRED | UNRECOVERABLE
**Emitted by:** Comma-separated operator list
**Condition:** Precise trigger condition
**State effect:** What happens to IncidentRecord state on fault
**Handler MUST:** Required runtime behavior
**Notes:** Implementation guidance; edge cases
Severity Tiers
| Tier | Meaning |
|---|---|
| FATAL | Record transitions to FAULTED; substrate execution halts for this record. All queued steps are cancelled. |
| ERROR | Operator aborts; substrate state is unchanged (as if operator was never called). Caller may retry after correction. |
| WARNING | Operator completed but with degraded guarantees. An ExecutionRecord is created; the step is marked STEP_EXECUTED with a warning annotation. |
Recoverability Tiers
| Tier | Meaning |
|---|---|
| RECOVERABLE | Caller corrects the input and retries the operator. No human escalation required. |
| OPERATOR_ACTION_REQUIRED | Human or privileged system intervention is needed before retry (e.g., registry update, authorization grant, manual resolution). |
| UNRECOVERABLE | The current IncidentRecord cannot be repaired. A new record must be created via incident.ingest if re-processing is needed. |
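One way a caller might combine the two tiers into a single follow-up decision is sketched below. The enum values come from the tables above; the `next_action` function and its return labels are hypothetical and only illustrate one possible precedence order.

```python
from enum import Enum


class Severity(Enum):
    FATAL = "FATAL"
    ERROR = "ERROR"
    WARNING = "WARNING"


class Recoverability(Enum):
    RECOVERABLE = "RECOVERABLE"
    OPERATOR_ACTION_REQUIRED = "OPERATOR_ACTION_REQUIRED"
    UNRECOVERABLE = "UNRECOVERABLE"


def next_action(severity: Severity, recoverability: Recoverability) -> str:
    """Map a (severity, recoverability) pair to a hypothetical action label."""
    # FATAL or UNRECOVERABLE: the record cannot continue.
    if severity is Severity.FATAL or recoverability is Recoverability.UNRECOVERABLE:
        return "halt_record"
    # Someone with authority has to act before any retry.
    if recoverability is Recoverability.OPERATOR_ACTION_REQUIRED:
        return "escalate"
    # The operator completed; the step carries a warning annotation.
    if severity is Severity.WARNING:
        return "continue_with_annotation"
    return "retry_after_correction"
```

For example, EXE-006 (WARNING, OPERATOR_ACTION_REQUIRED) maps to `escalate` under this ordering, while ING-002 (ERROR, RECOVERABLE) maps to `retry_after_correction`.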
Quick Reference Table
48 unique fault tokens across 8 domains.
| Code | Token | Severity | Recoverability | Domain |
|---|---|---|---|---|
| GEN-001 | RECORD_NOT_FOUND | ERROR | RECOVERABLE | Cross-operator |
| GEN-002 | INVALID_STATE_TRANSITION | ERROR | OPERATOR_ACTION_REQUIRED | Cross-operator |
| GEN-003 | PLAN_NOT_FOUND | ERROR | RECOVERABLE | Cross-operator |
| GEN-004 | PLAN_STEP_MISMATCH | ERROR | OPERATOR_ACTION_REQUIRED | Cross-operator |
| GEN-005 | PARTIAL_EXECUTION | FATAL | UNRECOVERABLE | Cross-operator |
| GEN-006 | CHECKSUM_MISMATCH | ERROR | OPERATOR_ACTION_REQUIRED | Cross-operator |
| GEN-007 | EMPTY_DETAIL | ERROR | RECOVERABLE | Cross-operator |
| GEN-008 | ACCESS_DENIED | ERROR | OPERATOR_ACTION_REQUIRED | Cross-operator |
| ING-001 | UNAUTHORIZED_EMITTER | ERROR | OPERATOR_ACTION_REQUIRED | Ingestion |
| ING-002 | PAYLOAD_TOO_LARGE | ERROR | RECOVERABLE | Ingestion |
| ING-003 | MALFORMED_SIGNAL | ERROR | RECOVERABLE | Ingestion |
| ING-004 | UNSUPPORTED_CONTENT_TYPE | ERROR | RECOVERABLE | Ingestion |
| CLS-001 | INVALID_CATEGORY | ERROR | RECOVERABLE | Classification |
| CLS-002 | CONFIDENCE_BELOW_THRESHOLD | ERROR | RECOVERABLE | Classification |
| SRF-001 | EMPTY_SURFACE_LIST | ERROR | RECOVERABLE | Surface Mapping |
| SRF-002 | SURFACE_REF_INVALID | ERROR | RECOVERABLE | Surface Mapping |
| SRF-003 | SURFACE_LIMIT_EXCEEDED | ERROR | OPERATOR_ACTION_REQUIRED | Surface Mapping |
| SRF-004 | HASH_MISMATCH | ERROR | RECOVERABLE | Surface Mapping |
| PLN-001 | SURFACE_MAP_MISMATCH | ERROR | OPERATOR_ACTION_REQUIRED | Planning |
| PLN-002 | STEP_INDEX_INVALID | ERROR | RECOVERABLE | Planning |
| PLN-003 | UNKNOWN_OPERATOR_REF | ERROR | RECOVERABLE | Planning |
| PLN-004 | TARGET_NOT_IN_SURFACE_MAP | ERROR | RECOVERABLE | Planning |
| PLN-005 | PLAN_STEP_LIMIT_EXCEEDED | ERROR | OPERATOR_ACTION_REQUIRED | Planning |
| PLN-006 | PLAN_ID_MISMATCH | ERROR | RECOVERABLE | Planning |
| PLN-007 | UNSUPPORTED_FORMAT | ERROR | RECOVERABLE | Planning |
| UNC-001 | UNKNOWN_UNCERTAINTY_CODE | ERROR | RECOVERABLE | Uncertainty |
| UNC-002 | INSUFFICIENT_OTHER_DETAIL | ERROR | RECOVERABLE | Uncertainty |
| APV-001 | EMPTY_APPROVER_SET | ERROR | RECOVERABLE | Approval Flow |
| APV-002 | UNKNOWN_APPROVER | ERROR | OPERATOR_ACTION_REQUIRED | Approval Flow |
| APV-003 | BLOCKING_UNCERTAINTY_FLAGS | ERROR | OPERATOR_ACTION_REQUIRED | Approval Flow |
| APV-004 | INVALID_APPROVAL_POLICY | ERROR | RECOVERABLE | Approval Flow |
| APV-005 | UNKNOWN_HOLD_REASON | ERROR | RECOVERABLE | Approval Flow |
| APV-006 | HOLD_UNAUTHORIZED | ERROR | OPERATOR_ACTION_REQUIRED | Approval Flow |
| EXE-001 | FILE_NOT_IN_SURFACE_MAP | ERROR | OPERATOR_ACTION_REQUIRED | Bounded Execution |
| EXE-002 | PATH_TRAVERSAL_DETECTED | FATAL | UNRECOVERABLE | Bounded Execution |
| EXE-003 | SECRET_NOT_IN_SURFACE_MAP | ERROR | OPERATOR_ACTION_REQUIRED | Bounded Execution |
| EXE-004 | ROTATION_UNAUTHORIZED | ERROR | OPERATOR_ACTION_REQUIRED | Bounded Execution |
| EXE-005 | ROTATION_PROVIDER_ERROR | ERROR | RECOVERABLE | Bounded Execution |
| EXE-006 | DEPENDENT_NOTIFICATION_FAILED | WARNING | OPERATOR_ACTION_REQUIRED | Bounded Execution |
| EXE-007 | DEPENDENCY_NOT_IN_SURFACE_MAP | ERROR | OPERATOR_ACTION_REQUIRED | Bounded Execution |
| EXE-008 | VERSION_MISMATCH | ERROR | RECOVERABLE | Bounded Execution |
| EXE-009 | TARGET_VERSION_INVALID | ERROR | OPERATOR_ACTION_REQUIRED | Bounded Execution |
| EXE-010 | PACKAGE_MANAGER_ERROR | ERROR | RECOVERABLE | Bounded Execution |
| EXE-011 | UNKNOWN_FOLLOWUP_CODE | ERROR | RECOVERABLE | Bounded Execution |
| EXE-012 | INVALID_PRIORITY | ERROR | RECOVERABLE | Bounded Execution |
| EXE-013 | EMPTY_ASSIGNEE_LIST | ERROR | RECOVERABLE | Bounded Execution |
| EXE-014 | UNRESOLVABLE_ASSIGNEE | ERROR | OPERATOR_ACTION_REQUIRED | Bounded Execution |
| EXE-015 | INSUFFICIENT_RISK_DETAIL | ERROR | RECOVERABLE | Bounded Execution |
Domain GEN — Cross-Operator Faults
These faults may be emitted by any operator. Implementations MUST handle them at the substrate layer rather than per-operator.
RECORD_NOT_FOUND
Code: GEN-001
Severity: ERROR
Recoverability: RECOVERABLE
Emitted by: incident.classify, incident.map_surface_area,
incident.derive_rectification_steps, incident.generate_readonly_plan,
incident.flag_uncertainty, incident.request_operator_approval,
incident.hold_for_review, incident.execute.remove_file,
incident.execute.rotate_secret, incident.execute.patch_dependency,
incident.execute.flag_for_followup
Condition: The supplied record_id does not resolve to any
IncidentRecord in the substrate store.
State effect: None. No state is modified.
Handler MUST:
- Abort operator immediately.
- Return fault token to caller with the unresolved record_id.
- Do not create a new record as a side effect.
Notes: Callers should verify record_id provenance before retry. If
record_id was obtained from a prior incident.ingest OUT, the ingest may
have returned status == REJECTED, in which case no record was created;
check IngestionStatus in the ingest result.
INVALID_STATE_TRANSITION
Code: GEN-002
Severity: ERROR
Recoverability: OPERATOR_ACTION_REQUIRED
Emitted by: incident.classify, incident.map_surface_area,
incident.derive_rectification_steps, incident.request_operator_approval,
incident.hold_for_review
Condition: The current IncidentRecord.state is not a member of the
operator's declared PRE[...] legal state set.
State effect: None. Operator aborts before any write.
Handler MUST:
- Abort operator immediately.
- Log current state and the state set the operator expected.
- Expose both values in the fault payload to the caller.
- Do not attempt to force-advance or repair the record's state.
Notes: This fault almost always indicates a race condition (concurrent
operator invocations on the same record) or a missed preceding operator
in the pipeline. Callers MUST use the state machine in
operator_grammar.md Section 7 to determine the correct remediation path.
Never retry without first querying the current record state.
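The precondition check described above might look like the sketch below. The exception class and `check_precondition` are hypothetical helpers; the legal state sets themselves are defined in operator_grammar.md and are passed in here rather than assumed.

```python
class InvalidStateTransition(Exception):
    """Carries both values the handler must expose in the fault payload."""

    code = "GEN-002"

    def __init__(self, current, expected):
        self.current = current
        self.expected = frozenset(expected)
        super().__init__(
            f"INVALID_STATE_TRANSITION: state {current!r} "
            f"not in {sorted(self.expected)}"
        )


def check_precondition(current_state, legal_states):
    """Raise before any write if the record is not in a legal PRE state."""
    if current_state not in legal_states:
        # No repair or force-advance is attempted; the operator just aborts.
        raise InvalidStateTransition(current_state, legal_states)
```

A caller catching this exception would re-query the record state before deciding whether a retry makes sense.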
PLAN_NOT_FOUND
Code: GEN-003
Severity: ERROR
Recoverability: RECOVERABLE
Emitted by: incident.generate_readonly_plan,
incident.request_operator_approval
Condition: The supplied plan_id does not resolve to any
RectificationPlan in the substrate store.
State effect: None.
Handler MUST:
- Abort operator immediately.
- Return the unresolved plan_id in the fault payload.
Notes: Verify the plan_id was emitted by a successful
incident.derive_rectification_steps call on the same record. A
PLAN_NOT_FOUND on a record with state == PLAN_DERIVED indicates a
substrate store inconsistency; escalate to substrate operations.
PLAN_STEP_MISMATCH
Code: GEN-004
Severity: ERROR
Recoverability: OPERATOR_ACTION_REQUIRED
Emitted by: incident.execute.remove_file,
incident.execute.rotate_secret, incident.execute.patch_dependency,
incident.execute.flag_for_followup
Condition: The operator_ref declared for step_index in the approved
RectificationPlan does not match the executing operator, OR target_ref
at that step does not match the input target, OR the step at step_index
has already been executed.
State effect: None. Execution is refused before any target mutation.
Handler MUST:
- Abort operator immediately.
- Log the expected operator_ref / target_ref from the plan and the actual
values supplied.
- Do not advance step_index.
Notes: This is the primary enforcement mechanism for plan scope
confinement. A mismatch on operator_ref may indicate plan tampering or
incorrect step routing in the execution layer. Treat with the same urgency
as a security boundary violation. The fix requires re-examining the plan
and correcting the execution invocation; the plan itself is immutable at
this stage.
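The three trigger conditions can be checked together so that the log shows every mismatch at once. This is a sketch under assumptions: plan steps are modeled as dicts with `operator_ref` and `target_ref` keys, and `executed_indices` is a hypothetical set of already-executed step indices.

```python
def check_plan_step(plan_steps, executed_indices, step_index,
                    operator_ref, target_ref):
    """Return (reason, expected, actual) tuples; empty means the step may run."""
    step = plan_steps[step_index]
    mismatches = []
    if step["operator_ref"] != operator_ref:
        mismatches.append(("operator_ref", step["operator_ref"], operator_ref))
    if step["target_ref"] != target_ref:
        mismatches.append(("target_ref", step["target_ref"], target_ref))
    if step_index in executed_indices:
        mismatches.append(("already_executed", step_index, step_index))
    return mismatches
```

A non-empty result corresponds to GEN-004; the caller refuses execution and leaves step_index where it is.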
PARTIAL_EXECUTION
Code: GEN-005
Severity: FATAL
Recoverability: UNRECOVERABLE
Emitted by: incident.execute.remove_file,
incident.execute.rotate_secret, incident.execute.patch_dependency
Condition: The execution operator began mutating the target but did not
complete atomically — the target was left in an intermediate state (e.g.,
file partially deleted, secret rotation started but not committed, package
manifest updated but lock file not regenerated).
State effect: IncidentRecord.state transitions to FAULTED.
All remaining queued steps are cancelled.
Handler MUST:
- Immediately halt all further execution steps on this record.
- Emit an ExecutionRecord with status PARTIAL_EXECUTION including the last
known target state and the point of failure.
- Transition the record to FAULTED.
- Alert operators via the substrate notification channel.
- Do NOT attempt automatic rollback; rollback is a manual operator action.
Notes: This is the most critical fault in the registry. A
PARTIAL_EXECUTION means the substrate surface is in an unknown and
potentially dangerous state. The FAULTED record MUST be reviewed by a human
operator before any new incident.ingest signal for the same surfaces is
processed. Execution operators MUST use atomic transactions or
rollback-capable primitives wherever the target system supports them to
minimize exposure to this fault. dry_run == true invocations are immune to
this fault.
CHECKSUM_MISMATCH
Code: GEN-006
Severity: ERROR
Recoverability: OPERATOR_ACTION_REQUIRED
Emitted by: incident.execute.remove_file,
incident.execute.patch_dependency
Condition: A checksum or verify_checksum was supplied and the computed
checksum of the target (pre-removal or post-install) does not match the
declared value.
State effect: None. The operator aborts before any mutation is committed
when checksum is a pre-condition. For post-install verification failure in
patch_dependency, the patch is rolled back if possible; if rollback
fails, PARTIAL_EXECUTION supersedes this fault.
Handler MUST:
- Abort without mutating the target.
- Log both the expected and computed checksums.
- Never proceed with a mismatched checksum, even under operator override
at the call site.
Notes: A checksum mismatch on a pre-removal file may indicate the file
was modified between surface mapping and execution — this is a security
signal. Callers should consider re-running incident.map_surface_area and
incident.derive_rectification_steps before retrying. A mismatch on
patch_dependency post-install indicates a compromised package registry or
a supply chain integrity failure; do not retry without investigating.
EMPTY_DETAIL
Code: GEN-007
Severity: ERROR
Recoverability: RECOVERABLE
Emitted by: incident.flag_uncertainty, incident.hold_for_review,
incident.execute.flag_for_followup
Condition: The detail field is present but empty, whitespace-only, or
below the minimum required length for the given context.
State effect: None.
Handler MUST:
- Abort operator.
- Return the minimum length requirement in the fault payload.
Notes: "Empty" includes strings containing only whitespace, newlines, or
null bytes. Implementations MUST trim the detail value before length
evaluation. The minimum length for UncertaintyCode.OTHER and
FollowupCode.RISK_ACCEPTED is governed by substrate constants
MIN_OTHER_DETAIL_LENGTH and MIN_RISK_ACCEPTANCE_DETAIL_LENGTH respectively
(see operator_grammar.md Section 9).
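A minimal sketch of the trim-then-measure rule follows. It assumes that trimming means stripping ASCII whitespace and null bytes from both ends; `check_detail` and the payload keys are hypothetical, and the minimum length is supplied by the caller for its context.

```python
import string

# ASCII whitespace (space, tab, newlines, ...) plus the null byte.
_TRIM_CHARS = string.whitespace + "\x00"


def check_detail(detail: str, min_length: int = 1):
    """Return an EMPTY_DETAIL fault payload, or None if the detail is acceptable."""
    trimmed = detail.strip(_TRIM_CHARS)
    if len(trimmed) < min_length:
        return {
            "fault": "EMPTY_DETAIL",
            "code": "GEN-007",
            "min_length": min_length,  # required in the fault payload
        }
    return None
```

Passing the character set to `str.strip` matters here: a bare `strip()` would leave null bytes in place, so a value made only of null bytes would count as non-empty.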
ACCESS_DENIED
Code: GEN-008
Severity: ERROR
Recoverability: OPERATOR_ACTION_REQUIRED
Emitted by: incident.execute.remove_file
Condition: The executing agent does not hold the required permission to
perform the declared operation on the target resource.
State effect: None. No mutation is attempted.
Handler MUST:
- Abort operator immediately.
- Log the identity of the executing agent and the target resource.
- Do not retry with elevated permissions automatically — permission grants
require explicit operator action.
Notes: Implementations MUST NOT cache or re-use permissions across
execution steps. Each incident.execute.* invocation MUST re-validate its
authorization at execution time. If access was valid during planning but
denied at execution, emit incident.flag_uncertainty with code
AUTHORIZATION_AMBIGUOUS on the record before escalating.
Domain ING — Ingestion Faults
Emitted exclusively by incident.ingest. These faults result in
IngestionStatus == REJECTED; no IncidentRecord is created.
UNAUTHORIZED_EMITTER
Code: ING-001
Severity: ERROR
Recoverability: OPERATOR_ACTION_REQUIRED
Emitted by: incident.ingest
Condition: The source value is not present in the substrate's
allowed_emitter_set registry, or the emitter's authorization token
is absent, expired, or revoked.
State effect: No record created. IngestionStatus == REJECTED.
Handler MUST:
- Reject the signal without partial processing.
- Log the unauthorized source identifier and the emission timestamp.
- Do not expose the contents of allowed_emitter_set in the fault payload.
- Rate-limit repeated unauthorized attempts from the same source.
Notes: This fault is also the correct response when an emitter's
credentials are valid but its scope does not include the ISM substrate
endpoint. Substrate operators must register new emitters via the emitter
registry management interface, not by modifying this document.
PAYLOAD_TOO_LARGE
Code: ING-002
Severity: ERROR
Recoverability: RECOVERABLE
Emitted by: incident.ingest
Condition: raw_payload byte length exceeds MAX_PAYLOAD_BYTES
(default: 10 MiB; see operator_grammar.md Section 9).
State effect: No record created. IngestionStatus == REJECTED.
Handler MUST:
- Reject immediately without reading the full payload into memory.
- Return MAX_PAYLOAD_BYTES and the actual received size in the fault
payload.
Notes: Emitters SHOULD compress or chunk payloads exceeding the limit
before re-submission. The substrate does not support streaming ingestion;
the entire payload must fit within the declared limit. If the limit is
consistently exceeded for legitimate signals, MAX_PAYLOAD_BYTES may be
increased via the ISM configuration layer.
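Rejecting without buffering the whole payload could be done with a bounded chunked read, as in this sketch. `read_bounded`, the chunk size, and the payload keys are hypothetical; the sketch also assumes that "actual received size" means the bytes received up to the point of rejection.

```python
MAX_PAYLOAD_BYTES = 10 * 1024 * 1024  # documented default: 10 MiB


def read_bounded(stream, limit=MAX_PAYLOAD_BYTES, chunk_size=64 * 1024):
    """Read from a binary stream; return (payload, None) or (None, fault)."""
    buf = bytearray()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return bytes(buf), None  # end of stream, within the limit
        buf += chunk
        if len(buf) > limit:
            # Stop at once; memory use never exceeds limit + chunk_size.
            fault = {
                "fault": "PAYLOAD_TOO_LARGE",
                "code": "ING-002",
                "max_payload_bytes": limit,
                "received_bytes": len(buf),
            }
            return None, fault
```

If the transport declares a length up front, checking that value before reading anything would reject even earlier; that variant is omitted here.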
MALFORMED_SIGNAL
Code: ING-003
Severity: ERROR
Recoverability: RECOVERABLE
Emitted by: incident.ingest
Condition: The raw_payload cannot be parsed according to the declared
content_type, or required top-level fields are absent or of the wrong type,
or signal_id is not a syntactically valid RFC 4122 v4 UUID, or
emitted_at is not a valid ISO-8601 UTC timestamp.
State effect: No record created. IngestionStatus == REJECTED.
Handler MUST:
- Reject the signal.
- Return the specific field or structural issue that caused the fault.
- Do not attempt partial parsing or best-effort normalization.
Notes: The substrate MUST NOT attempt to infer or correct malformed
field values. Silent correction masks emitter-side bugs and produces
unreliable IncidentRecord data downstream. Emitter implementors should run
signals through schema validation before submission.
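An emitter-side pre-check for the two field-level conditions (signal_id and emitted_at) might look like the sketch below. The function names are hypothetical, and the timestamp check accepts only the ISO-8601 subset that `datetime.fromisoformat` parses, which is narrower than the full standard on older Python versions.

```python
import uuid
from datetime import datetime, timedelta


def is_uuid4(value) -> bool:
    """True only for a canonical, hyphenated RFC 4122 version-4 UUID string."""
    try:
        parsed = uuid.UUID(value)
    except (ValueError, AttributeError, TypeError):
        return False
    # uuid.UUID also accepts braces, urn: prefixes and bare hex; require
    # the canonical form so nothing is silently normalized.
    return (
        parsed.version == 4
        and parsed.variant == uuid.RFC_4122
        and str(parsed) == value.lower()
    )


def is_utc_timestamp(value) -> bool:
    """True for an ISO-8601 timestamp with an explicit zero UTC offset."""
    if not isinstance(value, str):
        return False
    # Older Python versions do not parse a trailing "Z"; rewrite it.
    text = value[:-1] + "+00:00" if value.endswith("Z") else value
    try:
        parsed = datetime.fromisoformat(text)
    except ValueError:
        return False
    return parsed.tzinfo is not None and parsed.utcoffset() == timedelta(0)
```

Both helpers only report validity; neither repairs the value, in keeping with the rule against best-effort normalization.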
UNSUPPORTED_CONTENT_TYPE
Code: ING-004
Severity: ERROR
Recoverability: RECOVERABLE
Emitted by: incident.ingest
Condition: The content_type value is not present in the substrate's
list of supported MIME types.
State effect: No record created. IngestionStatus == REJECTED.
Handler MUST:
- Return the list of supported MIME types in the fault payload.
- Do not attempt content-type sniffing or fallback parsing.
Notes: The supported MIME type list is defined in the ISM configuration
layer and is substrate-specific. Common supported types are
application/json and text/plain. Binary formats require explicit
registration. Do not add MIME types to this document; update the
configuration layer.
Domain CLS — Classification Faults
Emitted exclusively by incident.classify.
INVALID_CATEGORY
Code: CLS-001
Severity: ERROR
Recoverability: RECOVERABLE
Emitted by: incident.classify
Condition: The category value is not a member of the IncidentCategory
type registry (see operator_grammar.md Section 8).
State effect: None. Record remains in its current state.
Handler MUST:
- Abort classification.
- Return the invalid value and the full IncidentCategory enum in the fault
payload.
Notes: Classifiers MUST validate category against the type registry before
invoking this operator. If the appropriate category does not exist, use
UNKNOWN and document the rationale in subcategory. Do not invent category
tokens outside the registry; classification consistency depends on the
closed taxonomy.
CONFIDENCE_BELOW_THRESHOLD
Code: CLS-002
Severity: ERROR
Recoverability: RECOVERABLE
Emitted by: incident.classify
Condition: The confidence value is below MIN_CLASSIFICATION_CONFIDENCE
(default: 0.70; see operator_grammar.md Section 9).
State effect: None. Record remains in its current state.
Handler MUST:
- Abort classification.
- Return the supplied confidence value and the current threshold in the
fault payload.
Notes: Classifiers that cannot reach threshold confidence SHOULD invoke
incident.flag_uncertainty with code CLASSIFICATION_AMBIGUOUS before
surfacing the result for manual review. Do not lower
MIN_CLASSIFICATION_CONFIDENCE to bypass this fault; doing so degrades all
downstream surface mapping and planning accuracy. The threshold may be
legitimately adjusted via the ISM configuration layer for specific
substrate deployments.
Domain SRF — Surface Mapping Faults
Emitted exclusively by incident.map_surface_area.
EMPTY_SURFACE_LIST
Code: SRF-001
Severity: ERROR
Recoverability: RECOVERABLE
Emitted by: incident.map_surface_area
Condition: The surfaces list is empty (zero entries).
State effect: None.
Handler MUST:
- Abort operator.
- Return a fault indicating that at least one surface entry is required.
Notes: A zero-surface submission is almost always a scanner bug or a
misconfigured scanner scope. If the incident genuinely touches no
enumerable surfaces, operators should consider whether the classification
was correct. A SURFACE_INCOMPLETE uncertainty flag is more appropriate
than submitting zero surfaces.
SURFACE_REF_INVALID
Code: SRF-002
Severity: ERROR
Recoverability: RECOVERABLE
Emitted by: incident.map_surface_area
Condition: One or more surface_ref values in the surfaces list are
syntactically invalid for their declared surface_type (e.g., a FILE
entry with a relative path, a SECRET entry with a malformed ARN, a
DEPENDENCY entry missing the package@version format).
State effect: None.
Handler MUST:
- Abort operator.
- Return all invalid surface_ref values and their surface_type in the
fault payload, not just the first one.
Notes: The substrate MUST validate all entries before accepting any.
Partial surface maps with some valid and some invalid entries MUST be
rejected in full; partial acceptance would produce a surface map that
silently omits surfaces, violating scope completeness.
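The validate-everything-first rule can be sketched as a single pass that collects every invalid entry before anything is accepted. The entry shape (dicts with `surface_type` and `surface_ref`) and the per-type rules below are assumptions drawn from the examples in the condition; SECRET and any other types are not modeled and pass whenever the reference is non-empty.

```python
def find_invalid_surfaces(surfaces):
    """Return every entry whose surface_ref is invalid for its surface_type."""
    invalid = []
    for entry in surfaces:
        surface_type = entry["surface_type"]
        ref = entry["surface_ref"]
        if surface_type == "FILE":
            ok = ref.startswith("/")  # relative paths are rejected
        elif surface_type == "DEPENDENCY":
            # Split on the last "@" so scoped names such as "@scope/pkg@1.0"
            # keep their leading "@" in the package part.
            name, sep, version = ref.rpartition("@")
            ok = bool(name and sep and version)
        else:
            ok = bool(ref)  # assumption: other types are not modeled here
        if not ok:
            invalid.append(entry)
    return invalid
```

A non-empty result means the whole submission is rejected with SRF-002 and the full list is returned to the caller.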
SURFACE_LIMIT_EXCEEDED
Code: SRF-003
Severity: ERROR
Recoverability: OPERATOR_ACTION_REQUIRED
Emitted by: incident.map_surface_area
Condition: The surfaces list contains more entries than
MAX_SURFACE_ENTRIES (default: 500; see operator_grammar.md Section 9).
State effect: None.
Handler MUST:
- Abort operator.
- Return MAX_SURFACE_ENTRIES and the actual submitted count in the fault
payload.
Notes: Exceeding the surface limit almost always indicates either an
overly broad scanner scope or an incident with an unusually large blast
radius. Operators SHOULD consider splitting the incident into multiple
child records via separate incident.ingest calls scoped to bounded surface
clusters. Raising MAX_SURFACE_ENTRIES is a configuration change requiring
explicit operator approval; it is not a per-call override.
HASH_MISMATCH
Code: SRF-004
Severity: ERROR
Recoverability: RECOVERABLE
Emitted by: incident.map_surface_area
Condition: The surface_snapshot_hash value does not match the SHA-256
hash computed by the substrate over the submitted surfaces list.
State effect: None.
Handler MUST:
- Abort operator.
- Return the expected hash (computed server-side) and the submitted hash in the fault payload.
- Do NOT store the submitted surfaces, even temporarily.
Notes: This fault is the primary defense against in-transit surface list
corruption or truncation. Callers MUST recompute the hash client-side
immediately before submission using the same serialization order used to
build the surfaces list. Hash computation must be over the canonical
wire-format representation of the list, not an in-memory object graph.
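Hashing the wire form rather than the object graph can be sketched as below. The canonical wire format is not defined in this document, so the sketch assumes compact UTF-8 JSON with sorted object keys and list order preserved; the function names and payload keys are hypothetical.

```python
import hashlib
import json


def surface_snapshot_hash(surfaces) -> str:
    """SHA-256 hex digest over an assumed canonical JSON encoding of the list."""
    wire = json.dumps(
        surfaces,
        sort_keys=True,         # key order inside each entry is normalized
        separators=(",", ":"),  # no insignificant whitespace
        ensure_ascii=False,
    ).encode("utf-8")
    return hashlib.sha256(wire).hexdigest()


def check_snapshot(surfaces, submitted_hash: str):
    """Return a HASH_MISMATCH fault payload, or None when the hashes agree."""
    expected = surface_snapshot_hash(surfaces)
    if expected != submitted_hash.lower():
        return {
            "fault": "HASH_MISMATCH",
            "code": "SRF-004",
            "expected": expected,
            "submitted": submitted_hash,
        }
    return None
```

Client and substrate would need to share exactly the same encoding rules for this to work; any difference in key order, whitespace, or escaping changes the digest.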
Domain PLN — Planning Faults
Emitted by incident.derive_rectification_steps and
incident.generate_readonly_plan.
SURFACE_MAP_MISMATCH
Code: PLN-001
Severity: ERROR
Recoverability: OPERATOR_ACTION_REQUIRED
Emitted by: incident.derive_rectification_steps
Condition: The surface_map_id supplied does not match
IncidentRecord(record_id).surface_map_id, or the referenced surface map
has been superseded by a newer mapping on this record.
State effect: None.
Handler MUST:
- Abort planning.
- Return both the supplied surface_map_id and the current
IncidentRecord.surface_map_id in the fault payload.
Notes: Plans MUST be derived against the current surface map only. Stale
surface_map_id values indicate a race where the surface was re-mapped
between the planner reading the record and submitting the plan. The
planner must re-fetch the record, obtain the current surface_map_id, and
re-derive all steps.
STEP_INDEX_INVALID
Code: PLN-002
Severity: ERROR
Recoverability: RECOVERABLE
Emitted by: incident.derive_rectification_steps
Condition: The steps list contains one or more of: a non-zero-based
index (first index is not 0), duplicate indices, gaps in the index
sequence, or non-integer index values.
State effect: None.
Handler MUST:
- Abort planning.
- Return all invalid indices in the fault payload.
Notes: Steps MUST form a complete, gapless, 0-based integer sequence.
The substrate uses step_index for ordered sequential execution; gaps or
duplicates would produce ambiguous or skipped execution steps. Planners
generating steps programmatically MUST sort and renumber before
submission.
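A complete, gapless, 0-based sequence of n steps is exactly the set {0, ..., n-1}, so one pass can collect every offending value. The function below is a hypothetical sketch of that check; it reports non-integers, duplicates, and out-of-range values, and a gap always shows up as some value at or beyond n.

```python
def find_invalid_indices(indices):
    """Return the index values that break the 0..n-1 gapless sequence."""
    n = len(indices)
    seen = set()
    invalid = []
    for value in indices:
        # bool is a subclass of int in Python; exclude it explicitly.
        is_int = isinstance(value, int) and not isinstance(value, bool)
        if not is_int or value < 0 or value >= n or value in seen:
            invalid.append(value)
        else:
            seen.add(value)
    return invalid
```

An empty result means the indices form a valid sequence, in any submission order; anything else goes into the PLN-002 fault payload.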
UNKNOWN_OPERATOR_REF
Code: PLN-003
Severity: ERROR
Recoverability: RECOVERABLE
Emitted by: incident.derive_rectification_steps
Condition: One or more operator_ref values in the steps list do not
resolve to a known incident.execute.* operator, or reference an operator
outside the incident.execute.* namespace.
State effect: None.
Handler MUST:
- Abort planning.
- Return all unresolvable operator_ref values in the fault payload.
Notes: Planners MUST validate operator_ref values against the canonical
operator registry before submission. Typos, version-suffixed refs, and
references to deprecated operators are all invalid. References to
operators outside incident.execute.* (e.g., incident.classify) are
explicitly forbidden in plan steps; this is a grammar-level constraint,
not a permissions boundary.
TARGET_NOT_IN_SURFACE_MAP
Code: PLN-004
Severity: ERROR
Recoverability: RECOVERABLE
Emitted by: incident.derive_rectification_steps
Condition: One or more target_ref values in the steps list do not
match any surface_ref in SurfaceMap(surface_map_id).
State effect: None.
Handler MUST:
- Abort planning.
- Return all unmatched target_ref values alongside the available
surface_ref values in the fault payload.
Notes: Plans MUST NOT introduce targets that were not declared in the
surface map. This constraint enforces that execution operators cannot
exceed the scanned and approved incident surface. If a required target is
absent from the surface map, operators must re-run
incident.map_surface_area with an updated surface list before re-planning.
PLAN_STEP_LIMIT_EXCEEDED
Code: PLN-005
Severity: ERROR
Recoverability: OPERATOR_ACTION_REQUIRED
Emitted by: incident.derive_rectification_steps
Condition: The steps list contains more entries than MAX_PLAN_STEPS
(default: 50; see operator_grammar.md Section 9).
State effect: None.
Handler MUST:
- Abort planning.
- Return MAX_PLAN_STEPS and the actual submitted step count in the fault
payload.
Notes: Incidents requiring more than 50 rectification steps SHOULD be
decomposed into multiple bounded incidents via incident.ingest, each with
its own surface map and plan. A single plan with more than 50 steps is a
strong signal that the incident scope is too broad to remediate safely in
one approval cycle.
PLAN_ID_MISMATCH
Code: PLN-006
Severity: ERROR
Recoverability: RECOVERABLE
Emitted by: incident.generate_readonly_plan
Condition: The supplied plan_id does not match
IncidentRecord(record_id).plan_id.
State effect: None. This is a READONLY operator; no state is
modified regardless.
Handler MUST:
- Abort operator.
- Return both the supplied plan_id and the current IncidentRecord.plan_id
in the fault payload.
Notes: This fault typically indicates a caller holding a stale plan
reference. Re-fetch the record to obtain the current plan_id before
retrying. Unlike SURFACE_MAP_MISMATCH, this fault carries lower urgency
since the operator is read-only, but the caller must still correct its
reference before calling any downstream operators.
UNSUPPORTED_FORMAT
Code: PLN-007
Severity: ERROR
Recoverability: RECOVERABLE
Emitted by: incident.generate_readonly_plan
Condition: The format value is not a member of {MARKDOWN, JSON, TEXT}.
State effect: None.
Handler MUST:
- Abort operator.
- Return the list of supported PlanFormat values in the fault payload.
Notes: Format negotiation should happen at the call site before invoking
this operator. Do not default to any format silently; if the requested
format is not supported, fault and surface the constraint.
Domain UNC — Uncertainty Faults
Emitted exclusively by incident.flag_uncertainty.
UNKNOWN_UNCERTAINTY_CODE
Code: UNC-001
Severity: ERROR
Recoverability: RECOVERABLE
Emitted by: incident.flag_uncertainty
Condition: The uncertainty_code value is not a member of the
UncertaintyCode registry.
State effect: None. No uncertainty flag is attached to the record.
Handler MUST:
- Abort operator.
- Return the full UncertaintyCode registry in the fault payload.
Notes: If no registered code adequately describes the uncertainty, use
OTHER with a detailed detail field. The UncertaintyCode registry is
closed; codes are not added at call time. Extension requests must go
through the ISM grammar revision process.
INSUFFICIENT_OTHER_DETAIL
Code: UNC-002
Severity: ERROR
Recoverability: RECOVERABLE
Emitted by: incident.flag_uncertainty
Condition: uncertainty_code == OTHER and detail.length is below
MIN_OTHER_DETAIL_LENGTH (default: 80 characters; see
operator_grammar.md Section 9).
State effect: None.
Handler MUST:
- Abort operator.
- Return the minimum required length and the actual submitted length
in the fault payload.
Notes: The elevated minimum for OTHER exists because OTHER is the
catch-all code and provides no structural information by itself. The
detail field must carry sufficient context for a human reviewer to
understand the uncertainty without any other context. Generic strings like
"unknown error" or "see logs" are structurally valid but semantically
insufficient; reviewers SHOULD be instructed to treat minimal OTHER flags
as low-signal.
Domain APV — Approval Flow Faults
Emitted by incident.request_operator_approval and
incident.hold_for_review.
EMPTY_APPROVER_SET
Code: APV-001
Severity: ERROR
Recoverability: RECOVERABLE
Emitted by: incident.request_operator_approval
Condition: The approver_set list is empty (zero entries).
State effect: None. Record remains in PLAN_DERIVED.
Handler MUST:
- Abort operator.
- Require the caller to supply at least one ApproverRef.
Notes: Approval requests with no approvers would create a PENDING_APPROVAL
record that can never be resolved, which is a deadlock state. This fault
prevents that condition at the grammar level.
UNKNOWN_APPROVER
Code: APV-002
Severity: ERROR
Recoverability: OPERATOR_ACTION_REQUIRED
Emitted by: incident.request_operator_approval
Condition: One or more ApproverRef values in approver_set do not
resolve in the approver registry.
State effect: None. Record remains in PLAN_DERIVED.
Handler MUST:
- Abort operator.
- Return all unresolvable ApproverRef values in the fault payload.
- Do not notify any approvers in the partial set.
Notes: Partial approval sets are never accepted: either all approvers
resolve or none are notified. This prevents phantom approval requests
where only some approvers receive notification. Unresolvable approvers
must be registered in the approver registry by a substrate administrator
before retry.
BLOCKING_UNCERTAINTY_FLAGS
Code: APV-003
Severity: ERROR
Recoverability: OPERATOR_ACTION_REQUIRED
Emitted by: incident.request_operator_approval
Condition: The record has one or more attached UncertaintyFlag entries
with severity BLOCKING that have not been resolved or explicitly
acknowledged in context_note.
State effect: None. Record remains in PLAN_DERIVED.
Handler MUST:
- Abort operator.
- Return all unresolved blocking flag IDs and their uncertainty_code
values in the fault payload.
Notes: Blocking uncertainty flags exist to prevent approval requests from
proceeding when the plan cannot be safely evaluated. Operators must either
resolve the underlying uncertainty (by re-running classification or
surface mapping) or explicitly acknowledge each flag in context_note using
the format: "ACKNOWLEDGED: <flag_id> — <rationale>". Acknowledgment
without rationale is not accepted.
INVALID_APPROVAL_POLICY
Code: APV-004
Severity: ERROR
Recoverability: RECOVERABLE
Emitted by: incident.request_operator_approval
Condition: The approval_policy value is not a member of
{ANY_ONE, MAJORITY, ALL}.
State effect: None.
Handler MUST:
- Abort operator.
- Return the valid ApprovalPolicy values in the fault payload.
Notes: MAJORITY is best used with an odd-numbered approver_set, which
avoids ties. Implementations SHOULD warn (but not fault) when
approval_policy == MAJORITY and approver_set.count is even. The substrate
resolves majority ties in favor of rejection.
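The tie rule and the even-count warning can be sketched as two small helpers. Both function names are hypothetical, and the sketch assumes the outcome is evaluated against the full approver_set count.

```python
def majority_approved(approvals: int, approver_count: int) -> bool:
    """Strict majority: an exact tie is not an approval."""
    return 2 * approvals > approver_count


def even_majority_warning(policy: str, approver_count: int):
    """Return a warning string for an even MAJORITY set, else None."""
    if policy == "MAJORITY" and approver_count % 2 == 0:
        return "approver_set has an even count; ties resolve to rejection"
    return None
```

With four approvers, two approvals are a tie and the plan is rejected; three are needed to approve. The warning is advisory only and does not turn into a fault.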
UNKNOWN_HOLD_REASON
Code: APV-005
Severity: ERROR
Recoverability: RECOVERABLE
Emitted by: incident.hold_for_review
Condition: The reason_code value is not a member of the
HoldReason registry.
State effect: None. Record is NOT placed on hold.
Handler MUST:
- Abort operator.
- Return the full HoldReason registry in the fault payload.
Notes: A failed hold attempt is particularly dangerous because the caller
intended to stop execution but the hold was not placed. The caller MUST
treat UNKNOWN_HOLD_REASON as equivalent to a failed safety brake and
escalate immediately. Do not fall through to the next operation on the
assumption that the hold succeeded.
HOLD_UNAUTHORIZED#
Code: APV-006
Severity: ERROR
Recoverability: OPERATOR_ACTION_REQUIRED
Emitted by: incident.hold_for_review
Condition: The held_by identity does not hold the required authorization
to place a hold on this record. Authorization may be scoped by record
classification, surface type, or organizational policy.
State effect: None. Record is NOT placed on hold.
Handler MUST:
- Abort operator.
- Log the unauthorized held_by identity and the record ID.
- Escalate to a substrate administrator immediately if this occurs during
an active execution sequence.
Notes: The same urgency applies as for UNKNOWN_HOLD_REASON (APV-005): a
failed hold during execution means the safety brake did not engage. The
caller must not continue with execution steps and must escalate to obtain
an authorized hold-placing identity before retrying.
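On the caller side, the "failed safety brake" rule for APV-005 and APV-006 might look like the sketch below. `SubstrateFault`, `place_hold`, and `escalate` are hypothetical names standing in for whatever the calling layer provides.

```python
class SubstrateFault(Exception):
    """Hypothetical exception carrying a fault code and token verbatim."""

    def __init__(self, code: str, token: str):
        super().__init__(f"{code} {token}")
        self.code = code
        self.token = token


# Faults emitted by incident.hold_for_review in this registry.
HOLD_FAULT_CODES = {"APV-005", "APV-006"}


def hold_or_escalate(place_hold, escalate) -> bool:
    """Return True only when the hold was actually placed.

    On a hold fault the caller escalates and must stop. It never assumes
    the hold succeeded.
    """
    try:
        place_hold()
    except SubstrateFault as fault:
        if fault.code not in HOLD_FAULT_CODES:
            raise  # Not a hold fault; let the generic handling deal with it.
        escalate(fault)
        return False
    return True
```

A caller would gate every later step on the return value, for example by exiting its execution loop when the function returns False.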
Domain EXE — Bounded Execution Faults#
Emitted by incident.execute.* operators. All execution faults produce an
ExecutionRecord regardless of whether the step succeeded or failed — the
ExecutionRecord is the permanent audit trail.
FILE_NOT_IN_SURFACE_MAP#
Code: EXE-001
Severity: ERROR
Recoverability: OPERATOR_ACTION_REQUIRED
Emitted by: incident.execute.remove_file
Condition: The supplied file_path does not match any FILE-typed
surface_ref in SurfaceMap(IncidentRecord(record_id).surface_map_id).
State effect: None. File is not removed.
Handler MUST:
- Abort operator.
- Return the unmatched file_path and the list of FILE-typed surface
entries in the fault payload.
- Treat this as a scope boundary violation.
Notes: This fault is the execution-layer enforcement of Grammar Invariant 3
(surface scope enforcement). A target absent from the surface map means the
removal was not approved as part of the incident. If the file genuinely
needs to be removed, re-map the surface, re-derive the plan, and re-seek
approval.
PATH_TRAVERSAL_DETECTED#
Code: EXE-002
Severity: FATAL
Recoverability: UNRECOVERABLE
Emitted by: incident.execute.remove_file
Condition: The canonical resolution of file_path (after resolving
symlinks, .. components, and environment variable expansions) exits
the declared substrate boundary, or targets a path that was not
submitted as a surface entry.
State effect: IncidentRecord.state transitions to FAULTED.
All remaining steps are cancelled.
Handler MUST:
- Abort operator immediately. Do not access the file at any point.
- Transition the record to FAULTED.
- Log the submitted path and its resolved canonical form.
- Alert substrate security operations immediately; this may indicate an
adversarial plan or a compromised planning agent.
- Preserve the submitted file_path value as forensic evidence.
Notes: This is a security-critical fault. Path traversal in an automated
remediation system is a high-severity attack vector. The record is
immediately terminal. A new investigation should be opened, potentially
targeting the planning agent that submitted the traversal path, before any
new ingestion is processed for related surfaces.
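The boundary half of the condition can be sketched with the standard library. This sketch assumes the substrate boundary is a single directory root, which this entry does not state. It does not cover the comparison against surface entries. Note that `os.path.realpath` reads symlink metadata along the path but never opens the target file.

```python
import os


def canonicalize(file_path: str) -> str:
    # Expand environment variables first, then resolve symlinks and ".."
    # components into an absolute canonical path.
    return os.path.realpath(os.path.expandvars(file_path))


def is_within(canonical: str, boundary_root: str) -> bool:
    root = os.path.realpath(boundary_root)
    try:
        # commonpath compares whole components, so "/srv/app-evil" is not
        # mistaken for a child of "/srv/app" the way a prefix test would be.
        return os.path.commonpath([canonical, root]) == root
    except ValueError:
        # Raised on Windows when the two paths are on different drives.
        return False


def check_path(file_path: str, boundary_root: str):
    """Return an EXE-002 payload when the path escapes the root, else None."""
    canonical = canonicalize(file_path)
    if is_within(canonical, boundary_root):
        return None
    return {
        "code": "EXE-002",
        "token": "PATH_TRAVERSAL_DETECTED",
        "submitted_path": file_path,  # preserved verbatim as evidence
        "canonical_path": canonical,
    }
```

The payload keeps both the submitted and the canonical form because the handler is required to log the two side by side.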
SECRET_NOT_IN_SURFACE_MAP#
Code: EXE-003
Severity: ERROR
Recoverability: OPERATOR_ACTION_REQUIRED
Emitted by: incident.execute.rotate_secret
Condition: The supplied secret_ref does not match any SECRET-typed
surface_ref in SurfaceMap(IncidentRecord(record_id).surface_map_id).
State effect: None. No rotation is initiated.
Handler MUST:
- Abort operator.
- Return the unmatched secret_ref in the fault payload.
- Do not expose the list of known secret refs in the fault payload
(enumeration risk).
Notes: Unlike FILE_NOT_IN_SURFACE_MAP, the fault payload MUST NOT enumerate
the full set of SECRET-typed surface entries; doing so would leak
information about the substrate's secret topology to the caller log.
Callers should re-fetch the surface map directly to identify valid targets.
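The difference between the EXE-001 and EXE-003 payloads can be shown side by side. The surface entry shape (`type`, `surface_ref`) and the payload keys are assumptions for illustration. Matching is shown as exact string equality, which may be stricter or looser than the substrate's own rule.

```python
def file_scope_fault(file_path: str, surface_entries: list[dict]):
    """EXE-001: the payload lists the FILE-typed entries to aid correction."""
    file_refs = [e["surface_ref"] for e in surface_entries if e["type"] == "FILE"]
    if file_path in file_refs:
        return None
    return {
        "code": "EXE-001",
        "token": "FILE_NOT_IN_SURFACE_MAP",
        "file_path": file_path,
        "file_surface_entries": file_refs,
    }


def secret_scope_fault(secret_ref: str, surface_entries: list[dict]):
    """EXE-003: the payload carries only the unmatched ref, never the list."""
    secret_refs = {e["surface_ref"] for e in surface_entries if e["type"] == "SECRET"}
    if secret_ref in secret_refs:
        return None
    return {
        "code": "EXE-003",
        "token": "SECRET_NOT_IN_SURFACE_MAP",
        "secret_ref": secret_ref,
        # Deliberately no list of known secret refs (enumeration risk).
    }
```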
ROTATION_UNAUTHORIZED#
Code: EXE-004
Severity: ERROR
Recoverability: OPERATOR_ACTION_REQUIRED
Emitted by: incident.execute.rotate_secret
Condition: The executing agent does not hold rotation authorization for
the target secret_ref in the secret management layer (e.g., AWS Secrets
Manager, HashiCorp Vault, GCP Secret Manager).
State effect: None. No rotation is initiated.
Handler MUST:
- Abort operator.
- Log the executing agent identity and the secret_ref (reference only,
never the secret value).
- Do not retry with a different agent identity automatically.
Notes: Rotation authorization is granted at the secret management layer,
not within ISM. If the executing agent lacks authorization, a substrate
administrator must grant the appropriate IAM role, Vault policy, or
equivalent before retry. This fault SHOULD trigger an
incident.flag_uncertainty with code AUTHORIZATION_AMBIGUOUS for human
awareness.
ROTATION_PROVIDER_ERROR#
Code: EXE-005
Severity: ERROR
Recoverability: RECOVERABLE
Emitted by: incident.execute.rotate_secret
Condition: The secret management provider returned an error during the
rotation attempt (e.g., provider unavailable, rotation plugin failure,
transient API error). The rotation was not completed.
State effect: None. The old secret version remains active.
Handler MUST:
- Abort operator. Confirm with the provider that the rotation was not applied before allowing retry.
- Log the provider error code and message in the ExecutionRecord.
- Apply exponential backoff before retry.
Notes: Implementations MUST verify the rotation state with the provider
before retrying; a provider error does not guarantee the rotation was
not partially applied. If the provider cannot confirm the rotation state,
treat as PARTIAL_EXECUTION (GEN-005) and transition the record to FAULTED.
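The verify-then-retry rule can be sketched as follows. The callables `rotate` and `confirm_not_applied`, the exception classes, and the attempt and delay defaults are hypothetical. Only the ordering (verify before every retry, escalate to GEN-005 when the state cannot be confirmed, back off exponentially) comes from this entry.

```python
import time


class ProviderError(Exception):
    """Hypothetical error raised when the secret provider call fails."""


class PartialExecution(Exception):
    """Hypothetical carrier for GEN-005 PARTIAL_EXECUTION."""

    code = "GEN-005"


def rotate_with_verification(rotate, confirm_not_applied,
                             max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Retry rotate() only while the provider confirms nothing was applied."""
    last_error = None
    for attempt in range(max_attempts):
        try:
            return rotate()
        except ProviderError as err:
            last_error = err
            # Verify state before any retry; an unconfirmed state is treated
            # as partial execution and ends the retry loop.
            if not confirm_not_applied():
                raise PartialExecution("GEN-005 PARTIAL_EXECUTION") from err
            if attempt < max_attempts - 1:
                sleep(base_delay * 2 ** attempt)  # 1x, 2x, 4x, ...
    raise last_error
```

Passing `sleep` as a parameter keeps the sketch testable without real delays. A production version would probably add jitter and a delay cap.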
DEPENDENT_NOTIFICATION_FAILED#
Code: EXE-006
Severity: WARNING
Recoverability: OPERATOR_ACTION_REQUIRED
Emitted by: incident.execute.rotate_secret
Condition: notify_dependents == true and the rotation completed
successfully, but one or more registered secret consumers could not be
notified of the new secret version.
State effect: None. The rotation itself succeeded; the record step is
marked STEP_EXECUTED with a warning annotation. The ExecutionRecord
is created.
Handler MUST:
- Complete the step as STEP_EXECUTED (rotation was successful).
- Annotate the ExecutionRecord with the list of consumers that failed to
receive notification.
- Create a follow-up flag automatically via
incident.execute.flag_for_followup with code MANUAL_REMEDIATION_REQUIRED
and assign it to the substrate operations team.
Notes: This is the only WARNING-severity fault in the registry. The
rotation is complete and the old secret is invalidated, so consumers that
were not notified may begin failing. The follow-up flag is non-optional:
unnotified consumers represent a live operational risk.
DEPENDENCY_NOT_IN_SURFACE_MAP#
Code: EXE-007
Severity: ERROR
Recoverability: OPERATOR_ACTION_REQUIRED
Emitted by: incident.execute.patch_dependency
Condition: The supplied package_ref does not match any
DEPENDENCY-typed surface_ref in the approved surface map.
State effect: None. No patch is applied.
Handler MUST:
- Abort operator.
- Return the unmatched package_ref and the list of DEPENDENCY-typed
surface entries in the fault payload.
Notes: See EXE-001 (FILE_NOT_IN_SURFACE_MAP); identical scope enforcement
rationale applies. Package refs MUST use the canonical format
ecosystem:package@version (e.g., npm:lodash@4.17.20) to enable unambiguous
matching against surface entries.
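A small parser for the canonical ecosystem:package@version form is sketched below. Splitting the version on the last "@" so that scoped names such as npm:@types/node@20.1.0 parse correctly is an assumption. This entry only shows the unscoped example.

```python
from typing import NamedTuple


class PackageRef(NamedTuple):
    ecosystem: str
    package: str
    version: str


def parse_package_ref(ref: str) -> PackageRef:
    """Split 'ecosystem:package@version' into its three parts."""
    ecosystem, colon, rest = ref.partition(":")
    # rpartition takes the last "@", leaving any scope prefix in the name.
    package, at, version = rest.rpartition("@")
    if not (colon and at and ecosystem and package and version):
        raise ValueError(f"not in ecosystem:package@version form: {ref!r}")
    return PackageRef(ecosystem, package, version)
```

For example, `parse_package_ref("npm:lodash@4.17.20")` yields `PackageRef(ecosystem="npm", package="lodash", version="4.17.20")`. Comparing parsed tuples rather than raw strings makes a mismatch in any one part easy to report.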
VERSION_MISMATCH#
Code: EXE-008
Severity: ERROR
Recoverability: RECOVERABLE
Emitted by: incident.execute.patch_dependency
Condition: The current_version declared in IN(...) does not match
the version of the package actually installed at package_ref in the
target environment at execution time.
State effect: None. No patch is applied.
Handler MUST:
- Abort operator.
- Return current_version as supplied and the actual installed version
discovered at execution time in the fault payload.
Notes: Version drift between plan derivation and execution time is the
primary cause of this fault. If the installed version is already at or
beyond target_version, the operator returns ALREADY_AT_TARGET rather than
this fault. If the installed version is different from both
current_version and target_version, the caller must re-derive the plan
step with updated version values.
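The three outcomes described in the Notes can be sketched as one decision function. Versions are assumed to be plain dotted integers with the same number of parts (such as 4.17.20). Pre-release tags and ecosystem-specific ordering rules are out of scope for this sketch, and the outcome and payload key names are illustrative.

```python
def version_key(version: str) -> tuple:
    # Assumption: purely numeric dotted versions, e.g. "4.17.20".
    return tuple(int(part) for part in version.split("."))


def check_installed_version(current_version: str, target_version: str,
                            installed_version: str) -> dict:
    """Decide between proceeding, ALREADY_AT_TARGET, and EXE-008."""
    if installed_version == current_version:
        return {"outcome": "PROCEED"}
    if version_key(installed_version) >= version_key(target_version):
        # At or beyond the target: not a fault.
        return {"outcome": "ALREADY_AT_TARGET"}
    return {
        "outcome": "FAULT",
        "code": "EXE-008",
        "token": "VERSION_MISMATCH",
        "current_version": current_version,      # as supplied by the plan
        "installed_version": installed_version,  # as found at execution time
    }
```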
TARGET_VERSION_INVALID#
Code: EXE-009
Severity: ERROR
Recoverability: OPERATOR_ACTION_REQUIRED
Emitted by: incident.execute.patch_dependency
Condition: The target_version is marked deprecated, yanked, or
retracted in the package_manager registry at execution time, OR
target_version does not satisfy the semver constraint declared in
the target's manifest, OR target_version is not a syntactically
valid semver string.
State effect: None. No patch is applied.
Handler MUST:
- Abort operator.
- Return the reason for invalidity (deprecated, yanked, retracted, semver-invalid, manifest-constraint-violation) in the fault payload.
- Never install a deprecated, yanked, or retracted package.
Notes: A yanked target_version at execution time that was valid at plan
derivation time indicates a supply chain event that occurred during the
incident response window. Treat this as a signal that the plan needs to be
re-derived with a new target version. Substrate operators SHOULD verify the
new target version's provenance before re-approval.
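The three branches of the Condition can be sketched as a single reason lookup. The regular expression is a simplified approximation of semver syntax, not the full grammar. The `registry_status` and `satisfies_manifest` inputs are hypothetical and would come from the package registry and the manifest resolver. The order of the checks is also an assumption.

```python
import re

# Simplified semver shape: MAJOR.MINOR.PATCH with optional pre-release and
# build metadata. The full specification is stricter (e.g. leading zeros).
SEMVER_SHAPE = re.compile(
    r"\d+\.\d+\.\d+(?:-[0-9A-Za-z.-]+)?(?:\+[0-9A-Za-z.-]+)?"
)

# Registry states named in the Condition for EXE-009.
BLOCKED_STATUSES = {"deprecated", "yanked", "retracted"}


def target_version_invalid_reason(target_version: str, registry_status,
                                  satisfies_manifest: bool):
    """Return a reason string for EXE-009, or None when the target is usable."""
    if not SEMVER_SHAPE.fullmatch(target_version):
        return "semver-invalid"
    if registry_status in BLOCKED_STATUSES:
        return registry_status
    if not satisfies_manifest:
        return "manifest-constraint-violation"
    return None
```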
PACKAGE_MANAGER_ERROR#
Code: EXE-010
Severity: ERROR
Recoverability: RECOVERABLE
Emitted by: incident.execute.patch_dependency
Condition: The package manager returned a non-zero exit code or error
response during package installation, resolution, or lock file generation.
The patch was not successfully applied.
State effect: None, provided the package manager rolled back its own
changes. If it did not, PARTIAL_EXECUTION (GEN-005) supersedes this fault.
Handler MUST:
- Abort operator.
- Capture and log the full package manager error output in the
ExecutionRecord.
- Verify the package manager performed its own rollback before allowing
retry.
- Apply backoff before retry if the error is transient (e.g., registry
timeout).
Notes: Common transient causes: package registry downtime, DNS failure,
rate limiting. Common non-transient causes: dependency conflict,
incompatible platform, missing system library. Non-transient errors require
plan re-derivation with a compatible target package.
UNKNOWN_FOLLOWUP_CODE#
Code: EXE-011
Severity: ERROR
Recoverability: RECOVERABLE
Emitted by: incident.execute.flag_for_followup
Condition: The followup_code value is not a member of the
FollowupCode registry.
State effect: None. No follow-up ticket is created.
Handler MUST:
- Abort operator.
- Return the full FollowupCode registry in the fault payload.
Notes: The FollowupCode registry is closed. Use
MANUAL_REMEDIATION_REQUIRED as the general-purpose code when no more
specific code applies.
INVALID_PRIORITY#
Code: EXE-012
Severity: ERROR
Recoverability: RECOVERABLE
Emitted by: incident.execute.flag_for_followup
Condition: The priority value is not a member of
{CRITICAL, HIGH, MEDIUM, LOW}.
State effect: None. No follow-up ticket is created.
Handler MUST:
- Abort operator.
- Return the valid FollowupPriority values in the fault payload.
Notes: Implementations MUST NOT default to any priority value silently.
Priority must be explicitly supplied by the caller for every follow-up
flag.
EMPTY_ASSIGNEE_LIST#
Code: EXE-013
Severity: ERROR
Recoverability: RECOVERABLE
Emitted by: incident.execute.flag_for_followup
Condition: The assigned_to list is empty (zero entries).
State effect: None. No follow-up ticket is created.
Handler MUST:
- Abort operator.
- Require the caller to supply at least one assignee identifier.
Notes: Unassigned follow-up tickets are operationally dead: they will never
be actioned. The substrate requires at least one assignee to ensure
accountability. If the correct assignee is unknown, use a team or role
identifier from the operator registry.
UNRESOLVABLE_ASSIGNEE#
Code: EXE-014
Severity: ERROR
Recoverability: OPERATOR_ACTION_REQUIRED
Emitted by: incident.execute.flag_for_followup
Condition: One or more identifiers in assigned_to do not resolve in
the operator registry.
State effect: None. No follow-up ticket is created.
Handler MUST:
- Abort operator.
- Return all unresolvable identifiers in the fault payload.
- Do not create a partial ticket with only resolved assignees.
Notes: All assignees must be resolvable before the ticket is created.
Partial assignment creates accountability gaps. Unresolvable identifiers
must be registered in the operator registry by a substrate administrator.
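The priority and assignee checks for incident.execute.flag_for_followup (EXE-012 through EXE-014) can be sketched as one all-or-nothing validation pass. The order of the checks, the payload keys, and the representation of the operator registry as a set of identifiers are assumptions.

```python
VALID_PRIORITIES = ("CRITICAL", "HIGH", "MEDIUM", "LOW")


def validate_followup(priority, assigned_to: list, operator_registry: set):
    """Return a fault payload, or None when a ticket may be created."""
    # No silent default: a missing or unknown priority is a fault.
    if priority not in VALID_PRIORITIES:
        return {
            "code": "EXE-012",
            "token": "INVALID_PRIORITY",
            "valid_values": list(VALID_PRIORITIES),
        }
    if not assigned_to:
        return {"code": "EXE-013", "token": "EMPTY_ASSIGNEE_LIST"}
    unresolvable = [a for a in assigned_to if a not in operator_registry]
    if unresolvable:
        # All-or-nothing: never create a ticket with a partial assignee set.
        return {
            "code": "EXE-014",
            "token": "UNRESOLVABLE_ASSIGNEE",
            "unresolvable": unresolvable,
        }
    return None
```

Because the function returns before any ticket is created, each fault leaves state unchanged, matching the "State effect: None" line of each entry.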
INSUFFICIENT_RISK_DETAIL#
Code: EXE-015
Severity: ERROR
Recoverability: RECOVERABLE
Emitted by: incident.execute.flag_for_followup
Condition: followup_code == RISK_ACCEPTED and detail.length is below
MIN_RISK_ACCEPTANCE_DETAIL_LENGTH (default: 150 characters; see
operator_grammar.md Section 9).
State effect: None. No follow-up ticket is created.
Handler MUST:
- Abort operator.
- Return the minimum required length and the actual submitted length in
the fault payload.
Notes: Risk acceptance is a formal act with audit implications. The
elevated minimum detail length ensures that risk acceptance decisions are
substantively documented, not rubber