
Production Operations Weekly Review

Jun 21 to Jun 28, 2026 / cadence: weekly
Overall status: GREEN
Degraded days: 0
RED components: 0
AMBER components: 0
Actions needing disposition: 0

Daily Production Health

No daily health artifacts were available for this period.

Latest daily health issues
  • None listed (no daily health artifacts were collected for this period).

Daily Findings

No chartable points.

Reliability Movement

Customer-visible failures: 0 (0 current / 0 previous)
Observed incidents: 0 (0 current / 0 previous)
Application alerts: -105 (91 current / 196 previous)
Infrastructure alerts: +18 (69 current / 51 previous)
Pingdom events: 0 (0 current / 0 previous)

Latency Trend

No chartable points.

Component Heatmap

No component trend data was available.

Production Dependency Map

Component | Status | Current period | Source references | Disposition ID

Customer Edge (4 components)
Pingdom public checks | GREEN | No degraded signal in collected evidence | pingdom | component-pingdom-public-checks
DNS resolution | GREEN | No degraded signal in collected evidence | dns_issue_check | component-dns-resolution
Ingress / Traefik | GREEN | No degraded signal in collected evidence | service_5xx_rate_pct, service_5xx_rps, service_p95_top | component-ingress-traefik
Public 5xx and latency | GREEN | No degraded signal in collected evidence | service_5xx_rate_pct, latency_triage, latency_signal | component-public-5xx-and-latency

Application Runtime (5 components)
PHP-FPM | GREEN | No degraded signal in collected evidence | web_probe_failures_5m, web_restarts_5m, latency_triage | component-php-fpm
API/app pods | GREEN | No degraded signal in collected evidence | web_ready_pods, web_running_pods, web_unavailable_replicas | component-api-app-pods
Worker pods | GREEN | No degraded signal in collected evidence | top_restarts_24h, top_memory_containers_24h | component-worker-pods
Runtime memory and restarts | GREEN | No degraded signal in collected evidence | top_memory_containers_24h, top_restarts_24h | component-runtime-memory-and-restarts
Slow request families | GREEN | No degraded signal in collected evidence | latency_triage, slow_traces | component-slow-request-families

Data Layer (5 components)
MySQL catalog | GREEN | No degraded signal in collected evidence | rds:mysql-catalog, slowquery:mysql-catalog | component-mysql-catalog
MySQL catalog2 | GREEN | No degraded signal in collected evidence | rds:mysql-catalog2, slowquery:mysql-catalog2 | component-mysql-catalog2
MySQL master | GREEN | No degraded signal in collected evidence | rds:mysql-master, slowquery:mysql-master | component-mysql-master
Postgres billing | GREEN | No degraded signal in collected evidence | rds:postgres-billing, slowquery:postgres-billing | component-postgres-billing
Slow queries | GREEN | No degraded signal in collected evidence | slowquery | component-slow-queries

Cache & Messaging (3 components)
Redis / Sentinel | GREEN | No degraded signal in collected evidence | redis_issue_check, redis_evicted_keys_5m, redis_rejected_connections_5m | component-redis-sentinel
RabbitMQ / queues | GREEN | No degraded signal in collected evidence | rabbitmq, queue_depth, consumer_lag | component-rabbitmq-queues
Consumer lag or delayed processing | GREEN | No degraded signal in collected evidence | consumer_lag, cron_active | component-consumer-lag-or-delayed-processing

Batch & Scheduled Work (3 components)
Watched CronJobs | GREEN | No degraded signal in collected evidence | cron_active | component-watched-cronjobs
Kubernetes Jobs | GREEN | No degraded signal in collected evidence | KubeJobFailed, job_failed | component-kubernetes-jobs
Long-running scheduled tasks | GREEN | No degraded signal in collected evidence | cron_active, slow_traces | component-long-running-scheduled-tasks

Kubernetes & Capacity (5 components)
Readiness | GREEN | No degraded signal in collected evidence | web_ready_pods, web_unavailable_replicas | component-readiness
Probe failures | GREEN | No degraded signal in collected evidence | web_probe_failures_5m | component-probe-failures
HPA saturation | GREEN | No degraded signal in collected evidence | hpa_current, hpa_max | component-hpa-saturation
Pod/node churn | GREEN | No degraded signal in collected evidence | top_restarts_24h | component-pod-node-churn
Top memory containers | GREEN | No degraded signal in collected evidence | top_memory_containers_24h | component-top-memory-containers

Observability & Alerting (5 components)
Grafana / Prometheus | GREEN | No degraded signal in collected evidence | prometheus | component-grafana-prometheus
Loki | GREEN | No degraded signal in collected evidence | loki, latency_triage | component-loki
Tempo | GREEN | No degraded signal in collected evidence | slow_traces | component-tempo
Alert rule health/noise | GREEN | No degraded signal in collected evidence | slack_alerts, alert_table | component-alert-rule-health-noise
AWS alarm emails | GREEN | No degraded signal in collected evidence | aws_email_alerts, email_table | component-aws-alarm-emails

External Dependencies (5 components)
Email provider | GREEN | No degraded signal in collected evidence | email_provider | component-email-provider
SMS provider | GREEN | No degraded signal in collected evidence | sms_provider | component-sms-provider
Payment provider | GREEN | No degraded signal in collected evidence | payment_provider | component-payment-provider
Identity/login provider | GREEN | No degraded signal in collected evidence | identity_provider | component-identity-login-provider
Webhooks / third-party APIs | GREEN | No degraded signal in collected evidence | third_party_api | component-webhooks-third-party-apis

Comparison with previous period

Metric | Current | Previous | Delta
active_aws_alarms | 0 | 2 | -2
application_alerts | 91 | 196 | -105
customer_incidents_confirmed | 0 | 0 | 0
customer_incidents_observed | 0 | 0 | 0
customer_visible_failures | 0 | 0 | 0
impacted_services | 20 | 35 | -15
infrastructure_alerts | 69 | 51 | +18
pingdom_downtime_minutes | 0 | 0 | 0
pingdom_events | 0 | 0 | 0
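The Delta column reads as current minus previous, with an explicit "+" on increases and a bare "0" when nothing changed. A minimal Python sketch under that assumption reproduces the reported deltas; the helper names (format_delta, delta_column) are hypothetical and are not the dashboard's own code.

```python
# Hypothetical helpers (not the dashboard's implementation): recompute the
# Delta column as (current - previous) from values reported in the table.
METRICS: dict[str, tuple[int, int]] = {
    "active_aws_alarms": (0, 2),
    "application_alerts": (91, 196),
    "impacted_services": (20, 35),
    "infrastructure_alerts": (69, 51),
    "pingdom_events": (0, 0),
}


def format_delta(current: int, previous: int) -> str:
    """Return current - previous, signed with '+' for increases, '0' if unchanged."""
    delta = current - previous
    return f"{delta:+d}" if delta != 0 else "0"


def delta_column(metrics: dict[str, tuple[int, int]]) -> dict[str, str]:
    """Map each metric name to its formatted delta."""
    return {name: format_delta(cur, prev) for name, (cur, prev) in metrics.items()}


if __name__ == "__main__":
    for name, delta in delta_column(METRICS).items():
        print(f"{name}: {delta}")
```

Printing the column this way makes it easy to spot a row whose reported delta disagrees with its current and previous values.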

ADS Action Queue

Missing dispositions: none

Status | Action | Domain | ID
(no entries)

Source Coverage

Source | Status | Detail
Daily health JSON | missing | 0 daily artifacts found
Production reliability dashboard | ok | Weekly alerts/reliability artifact
Team weekly report | ok | Delivery/deploy evidence
Engineering council test report | ok | Test/smoke evidence
AWS posture evidence | warning | Cost/security/recommendation evidence
Action register | warning | ADS/accepted-risk/false-positive dispositions

Evidence References

Reliability

Application alerts: 91
Infrastructure alerts: 69
Active AWS alarms: 0
Degraded days: 0
RED components: 0
AMBER components: 0

Customer Impact

Customer-visible failures: 0
Customer incidents observed: 0
Customer incidents confirmed: 0
Pingdom events: 0
Pingdom downtime minutes: 0

Delivery Health

Production bugs closed: 38
Delivery items: 42
Deployments: 12
Test runs: 7
Test pass rate: 0
Smoke attempts: 16
Smoke failures: 1

Cost

Total: 0 USD
Forecast: 0 USD

Security

Security Hub score: 0
Critical findings: 0
High findings: 0
GuardDuty findings: 0
Inspector critical findings: 0
IAM external access: 0

AWS Recommendations

Trusted Advisor red: 0
Trusted Advisor yellow: 0
Compute Optimizer savings: 0
Cost optimization savings: 0
Well-Architected high-risk issues: 0
Well-Architected medium-risk issues: 0

Backup

Failed jobs: 0
Protected resources: 0