Launch
Mar 31, 2026
umamimind.ai icon
Export pack

Incident communications pack (Pilot)

Printable view intended for “Save as PDF”. For machine-readable export use /api/exports/incident-comms.json.

Tip: Use your browser print dialog → Save as PDF.

Pilot-stage incident communications policy (summary)

  • Pilot participants receive direct notifications for SEV-1/2 incidents impacting availability, security, or data integrity.
  • Initial notice targets: within 60 minutes for SEV-1, within 4 hours for SEV-2 (best-effort during pilot).
  • Post-incident summary provided for material incidents; full RCA available under NDA where applicable.
  • Maintenance windows may be communicated with shorter lead times during pilot.

Templates

Initial incident notice (template)
  • Incident ID
  • Start time (UTC)
  • Current status
  • Affected services
  • Customer impact
  • Mitigation in progress
  • Next update ETA
  • Contact channel
Incident update (template)
  • Timestamp (UTC)
  • Status
  • What changed
  • Current impact
  • Mitigation progress
  • Next update ETA
Resolution notice (template)
  • Resolved time (UTC)
  • Duration
  • Root cause (summary)
  • Customer impact (summary)
  • Remediations
  • Follow-ups
Example payload
{
  "Incident ID": "INC-2026-002",
  "Start time (UTC)": "2026-01-12 09:40",
  "Current status": "Investigating",
  "Affected services": "Runs API, Console",
  "Customer impact": "Some run retrieval requests may time out.",
  "Mitigation in progress": "Scaling API workers; isolating cache hot keys.",
  "Next update ETA": "Within 60 minutes",
  "Contact channel": "pilot-support@umamimind.ai"
}

Incident reports

2026-01-08 · SEV-3 · Resolved

INC-2026-001: Runs API elevated errors (pilot)

Elevated error rates for run retrieval in one pilot tenant due to upstream cache invalidation.

Impact
Some requests returned stale run metadata or transient failures for ~17 minutes in one tenant.
Root cause
Invalidation job ran with an overly broad key prefix in pilot environment, impacting shared cache namespace.
Timeline (high level)
  • 09:12 UTCAlert triggered for elevated 5xx on run retrieval.
  • 09:18 UTCScoped to single tenant; cache invalidation job suspected.
  • 09:25 UTCMitigation: flush affected cache keys; traffic normalizes.
  • 09:40 UTCGuardrails added for invalidation prefix; additional monitoring enabled.
Remediations
  • Key-space isolation tightened per tenant.
  • Added safeguard to prevent broad invalidation without approval.
  • Added alerting on invalidation rate and cache churn.
Next steps
  • Finalize cache isolation as GA requirement (GA-001).
  • Document runbook for cache incidents.
NDA note: This report is a pilot-stage summary; detailed evidence may be shared under NDA via Trust Pack.
2025-12-19 · SEV-4 · Resolved

INC-2025-12-19: Console login redirect loop (pilot)

Intermittent login redirect loop affecting a subset of users with stale cookies.

Impact
Some users needed to clear cookies to sign in; no data loss.
Root cause
Client session cache did not invalidate correctly after config change, preserving stale state.
Timeline (high level)
  • 14:05 UTCPilot user reports redirect loop after config update.
  • 14:22 UTCIdentified stale cookie version mismatch.
  • 14:40 UTCFix deployed: cookie version bump + safe fallback redirect.
  • 15:10 UTCConfirmed resolution with affected users.
Remediations
  • Introduced cookie versioning.
  • Added client-side guard for loop detection.
Next steps
  • Add automated regression test for auth redirect flows.
  • Improve release notes distribution to pilots.
NDA note: This report is a pilot-stage summary; detailed evidence may be shared under NDA via Trust Pack.
PilotsDemoTour