Pingdom customer impact
External signal: 1 active item(s) in this window.
Generated 2026-05-05 01:00 for 2026-04-25 07:00 to 2026-05-05 12:00 from Pingdom checks, Slack #_alerts_prod, and AWS SNS alerts.
Bottom line: Pingdom observed recent customer-facing glitches (email unconfirmed), and critical application-level alerts are present.
5 critical, 9 non-critical active item(s).
| Pingdom Check | Status | Events | Downtime | Last Seen | Likely Services | Correlated Evidence |
|---|---|---|---|---|---|---|
| https://www.adservio.ro/api/v2/status | Recovered recently | 9 | 9m | 2026-05-05 00:00 | adservio-ro-api-v2-status | adservio-rds-mysql-catalog-disk-queue-high |
| Adservio Ro | No recent customer-visible issue | 0 | 0m | 2026-05-05 00:00 | adservio-ro | Pingdom-only evidence so far |
Pingdom rows show the externally visible signal first. The Correlated Evidence column ties a failing check back to services, Slack alert families, or AWS alarms when those links exist.
This view attributes alerts to the workload or resource named in the alert text. Grafana, Loki, and Tempo are treated as observability components and are excluded when a more specific impacted target is also present.
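The attribution rule described above can be sketched roughly as follows. This is a minimal illustration, not the report generator's actual code: the function name `attribute_targets` and the constant `OBSERVABILITY_COMPONENTS` are hypothetical, and the only component names assumed are the three the text mentions.

```python
# Hypothetical set of observability components, taken from the report text.
OBSERVABILITY_COMPONENTS = {"grafana", "loki", "tempo"}


def attribute_targets(mentioned):
    """Return the targets an alert is attributed to.

    Observability components are dropped only when at least one more
    specific target is also named in the alert text; otherwise the
    original list is kept unchanged.
    """
    specific = [t for t in mentioned if t.lower() not in OBSERVABILITY_COMPONENTS]
    return specific if specific else list(mentioned)
```

Under this sketch, an alert naming both `grafana` and `uni-api` is attributed to `uni-api` alone, while an alert naming only `grafana` stays attributed to `grafana`, which would be consistent with the `grafana` row in the table below.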
| Impacted Service / Resource | Highest Severity | Count | Last Seen | Status | Top Alert Types | Discussion Signal | Latest Thread Note |
|---|---|---|---|---|---|---|---|
| accommodations-api | Critical | 10 | 2026-05-04 15:06 | Seen today | TraefikServiceHighErrorRate (10) | None | No thread note |
| uni-api | Critical | 7 | 2026-05-04 12:07 | Seen today | TraefikServiceHighErrorRate (7) | General investigation, Observability storage | uni-api is throwing 500s because the same academic-structure save is being submitted twice; the second insert hits a unique-key constraint (stud… |
| web-80 | Critical | 10 | 2026-05-01 10:45 | Seen this week | TraefikServiceHighErrorRate (8), TraefikServiceHighLatency (2) | None | No thread note |
| grafana | Critical | 27 | 2026-05-04 13:12 | Recent but likely noise | KubeAPIErrorBudgetBurn (4), KubeCPUOvercommit (20), NodeSystemSaturation (1), KubeMemoryOvercommit (1), KubeClientErrors (1) | General investigation, Alert tuning / noise, Resource limits | Node 10.66.121.122 is at 79% CPU / 50% memory, no pressure conditions; load is 2.27 because 5 of the web pods landed on the same node plus s… \| Will clear when traffic dies down or HPA rebalances \| Stabilized: load on 10.66.121.122 dropped from 2.27 to 1.68 per core (15m avg), CPU from 79% to 55%. HPA scaled down + web… |
| core-grafana-80 | Critical | 1 | 2026-04-25 20:57 | No recent signal | TraefikServiceHighErrorRate (1) | None | No thread note |
| metrics-server | Warning | 43 | 2026-05-04 18:19 | Seen today | KubeAggregatedAPIDown (43) | General investigation | False alarm: metrics-server pod is healthy (1/1 Running, 0 restarts) and kubectl top nodes works. Alert was a brief flap during a node rotat… |
| minicrm-sync | Warning | 23 | 2026-05-04 19:21 | Seen today | KubeJobFailed (23) | None | No thread note |
| colecteaza-sms-note-abs | Warning | 21 | 2026-05-04 19:21 | Seen today | KubeJobFailed (21) | None | No thread note |
| accommodations-sync-users (grouped 2 variants; variant mentions 19; active variants 2) | Warning | 19 | 2026-05-04 19:21 | Seen today | KubeJobFailed (19) | None | No thread note |
| social-api | Warning | 10 | 2026-05-04 12:39 | Seen today | TraefikServiceHighLatency (10) | General investigation | Two outbound calls to web dominated: GET /api/v2/utilizatori/profile?cuPermisiuni=1&withRoles=1 (6.6s) and GET /api/v2/ani/scolari (6.4s) —… |
| docgen2-api | Warning | 5 | 2026-05-04 12:17 | Seen today | KubeHpaMaxedOut (5) | Resource limits | HPA already scaled back down (now 4/6). Pods are barely using CPU, so the scale-up was driven by a KEDA cron trigger, not real load. |
| rooms-api | Warning | 26 | 2026-05-03 07:14 | Recent (72h) | KubePodCrashLooping (13), KubeDeploymentReplicasMismatch (13) | General investigation | danny Valentin Pal something is breaking on the Tuiasi cluster which i am not able to connect \| i would need a kubeconfig for tuiasi and… |
| ai-api | Warning | 7 | 2026-05-04 10:00 | Recent (72h) | TraefikServiceHighLatency (7) | None | No thread note |
| core-scheduled-events-worker | Warning | 5 | 2026-05-01 10:19 | Seen this week | KubePodCrashLooping (3), KubeDeploymentReplicasMismatch (2) | General investigation | danny Valentin Pal something is breaking on the Tuiasi cluster which i am not able to connect \| i would need a kubeconfig for tuiasi and… |
| subscriptions-api | Warning | 2 | 2026-04-28 09:11 | No recent signal | TraefikServiceHighLatency (2) | None | No thread note |
| Alert | Severity | Count | Last Seen | Status | Threads | Top Impacted Services | Discussion Signal | Latest Thread Note |
|---|---|---|---|---|---|---|---|---|
| TraefikServiceHighErrorRate | Critical | 26 | 2026-05-04 15:06 | Seen today | 2 | accommodations-api (10), web-80 (8), uni-api (7), core-grafana-80 (1) | General investigation, Observability storage | uni-api is throwing 500s because the same academic-structure save is being submitted twice; the second insert hits a unique-key constraint (stud… |
| KubeAPIErrorBudgetBurn | Critical | 3 | 2026-04-27 12:48 | Likely noise / resolved | 2 | grafana (4) | General investigation, Alert tuning / noise | Nothing is down: apiserver healthy (/readyz ok), all pods Running, HPAs scaled back down (web 9, docgen2 4). Alert is the rolling 1h SLO win… \| you can silence it in Alertmanager \| Okay i will try a bit later thanks for the info |
| KubeAggregatedAPIDown | Warning | 43 | 2026-05-04 18:19 | Seen today | 1 | metrics-server (43) | General investigation | False alarm: metrics-server pod is healthy (1/1 Running, 0 restarts) and kubectl top nodes works. Alert was a brief flap during a node rotat… |
| KubeJobFailed | Warning | 23 | 2026-05-04 19:21 | Seen today | 0 | minicrm-sync (23), colecteaza-sms-note-abs (21), accommodations-sync-users (19) | None | |
| KubeCPUOvercommit | Warning | 20 | 2026-05-04 13:12 | Seen today | 0 | grafana (20) | None | |
| TraefikServiceHighLatency | Warning | 18 | 2026-05-04 12:39 | Seen today | 1 | social-api (10), ai-api (7), subscriptions-api (2), web-80 (2) | General investigation | Two outbound calls to web dominated: GET /api/v2/utilizatori/profile?cuPermisiuni=1&withRoles=1 (6.6s) and GET /api/v2/ani/scolari (6.4s)… |
| KubeHpaMaxedOut | Warning | 5 | 2026-05-04 12:17 | Seen today | 1 | docgen2-api (5) | Resource limits | HPA already scaled back down (now 4/6). Pods are barely using CPU, so the scale-up was driven by a KEDA cron trigger, not real load. |
| KubePodCrashLooping | Warning | 14 | 2026-05-03 07:14 | Recent (72h) | 1 | rooms-api (13), core-scheduled-events-worker (3) | General investigation | danny Valentin Pal something is breaking on the Tuiasi cluster which i am not able to connect \| i would need a kubeconfig for tuiasi and… |
| KubeDeploymentReplicasMismatch | Warning | 13 | 2026-05-03 07:09 | Recent (72h) | 0 | rooms-api (13), core-scheduled-events-worker (2) | None | |
| NodeSystemSaturation | Warning | 1 | 2026-04-30 12:30 | Seen this week | 1 | grafana (1) | Resource limits | Node 10.66.121.122 is at 79% CPU / 50% memory, no pressure conditions; load is 2.27 because 5 of the web pods landed on the same node plus s… \| Will clear when traffic dies down or HPA rebalances \| Stabilized: load on 10.66.121.122 dropped from 2.27 to 1.68 per core (15m avg), CPU from 79% to 55%. HPA scaled down + web… |
| KubeMemoryOvercommit | Warning | 1 | 2026-04-29 09:26 | Seen this week | 0 | grafana (1) | None | |
| KubeAPIErrorBudgetBurn | Warning | 1 | 2026-04-27 15:37 | Likely noise / resolved | 2 | grafana (4) | General investigation, Alert tuning / noise | Nothing is down: apiserver healthy (/readyz ok), all pods Running, HPAs scaled back down (web 9, docgen2 4). Alert is the rolling 1h SLO win… \| you can silence it in Alertmanager \| Okay i will try a bit later thanks for the info |
| KubeClientErrors | Warning | 1 | 2026-04-27 12:09 | No recent signal | 1 | grafana (1) | General investigation | apiserver is healthy (/readyz ok). The 2% error rate is from admission webhook timeouts during the HPA scaling burst (web 11 replicas, docge… |
Status is heuristic. Slack rarely posts explicit resolutions, so “Seen today” or “Recent” means the alert family still appeared in production recently, not that it is definitely unresolved.
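A recency-based status label of this kind could be computed along the following lines. This is only a sketch: the function name `status_label` and the 24-hour, 72-hour, and 7-day cut-offs are assumptions chosen for illustration, and they do not necessarily reproduce the cut-offs this report actually used. The noise-related labels ("Recent but likely noise", "Likely noise / resolved") depend on thread content and are not modelled here.

```python
from datetime import datetime, timedelta


def status_label(last_seen, now,
                 today=timedelta(hours=24),
                 recent=timedelta(hours=72),
                 week=timedelta(days=7)):
    """Map the age of the latest alert to a coarse status label.

    The thresholds are illustrative assumptions, not the report's real ones.
    The label says how recently the alert family appeared, not whether
    the underlying problem is resolved.
    """
    age = now - last_seen
    if age <= today:
        return "Seen today"
    if age <= recent:
        return "Recent (72h)"
    if age <= week:
        return "Seen this week"
    return "No recent signal"


# Example using the report's generation time as "now".
generated_at = datetime(2026, 5, 5, 1, 0)
label = status_label(datetime(2026, 5, 4, 15, 6), generated_at)
```

Because the label depends only on the last-seen timestamp, an alert that fired once and was fixed an hour later still reads as "Seen today", which is the caveat the note above describes.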
| AWS Alarm | Emails | ALARM | OK | State Flips | First Seen | Last Seen | Latest State | Status |
|---|---|---|---|---|---|---|---|---|
| adservio-rds-mysql-catalog-disk-queue-high | 14 | 7 | 7 | 13 | 2026-04-27 09:13 | 2026-04-30 10:21 | OK | Flapping, latest OK |
| adservio-rds-postgres-billing-cpu-high | 2 | 1 | 1 | 1 | 2026-05-01 10:12 | 2026-05-01 10:13 | OK | Latest OK |
“Flapping, latest OK” means the most recent email was an OK, but the alarm toggled repeatedly and is still a reliability concern.
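The columns of the AWS alarm table can be derived from the ordered sequence of alarm emails roughly as shown below. This is a hedged sketch: `summarize_alarm` is a hypothetical helper, and the flap threshold of 3 state changes is an assumption, since the report does not state where it draws the line between a single recovery and flapping.

```python
def summarize_alarm(states, flap_threshold=3):
    """Summarize an ordered list of alarm states ("ALARM" or "OK").

    A state flip is counted whenever two consecutive emails report
    different states. The flap_threshold value is an assumption.
    """
    if not states:
        raise ValueError("states must contain at least one entry")
    flips = sum(1 for prev, cur in zip(states, states[1:]) if prev != cur)
    latest = states[-1]
    if flips >= flap_threshold:
        status = f"Flapping, latest {latest}"
    else:
        status = f"Latest {latest}"
    return {
        "emails": len(states),
        "alarm": states.count("ALARM"),
        "ok": states.count("OK"),
        "flips": flips,
        "status": status,
    }


# Fourteen strictly alternating emails give 13 flips.
disk_queue = summarize_alarm(["ALARM", "OK"] * 7)
# A single ALARM followed by one OK gives 1 flip.
billing_cpu = summarize_alarm(["ALARM", "OK"])
```

With these inputs the sketch yields 14 emails, 7 ALARM, 7 OK, and 13 flips for the first alarm, and 2 emails with 1 flip for the second, matching the counts shown in the table above.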
| Thread Date | Alert | Severity | Services | Signal | Key Notes |
|---|---|---|---|---|---|
| 2026-05-04 12:39 | TraefikServiceHighLatency | Warning | social-api | General investigation | Two outbound calls to web dominated: GET /api/v2/utilizatori/profile?cuPermisiuni=1&withRoles=1 (6.6s) and GET /api/v2/ani/scolari (6.4s) —… |
| 2026-05-04 11:28 | TraefikServiceHighErrorRate | Critical | uni-api | Observability storage | uni-api is throwing 500s because the same academic-structure save is being submitted twice; the second insert hits a unique-key constraint (stud… |
| 2026-05-01 06:14 | KubePodCrashLooping | Warning | core-scheduled-events-worker, rooms-api | General investigation | danny Valentin Pal something is breaking on the Tuiasi cluster which i am not able to connect \| i would need a kubeconfig for tuiasi and… |
| 2026-04-30 12:30 | NodeSystemSaturation | Warning | grafana | Resource limits | Node 10.66.121.122 is at 79% CPU / 50% memory, no pressure conditions; load is 2.27 because 5 of the web pods landed on the same node plus s… \| Will clear when traffic dies down or HPA rebalances \| Stabilized: load on 10.66.121.122 dropped from 2.27 to 1.68 per core (15m avg), CPU from 79% to 55%. HPA scaled down + web… |
| 2026-04-30 11:11 | TraefikServiceHighErrorRate | Critical | uni-api | General investigation | No infra issue: uni-api pod is healthy (1/1 Running, 0 restarts). The 6.67% errors are MySQL duplicate-key exceptions in GroupServiceImpl.sa… \| Andrei Alexandru - can you please check? \| JDBC exception executing SQL [INSERT INTO studiu(...) VALUES(...,1511,1519)] |
| 2026-04-27 12:48 | KubeAPIErrorBudgetBurn | Critical | grafana | Alert tuning / noise | Nothing is down: apiserver healthy (/readyz ok), all pods Running, HPAs scaled back down (web 9, docgen2 4). Alert is the rolling 1h SLO win… \| you can silence it in Alertmanager \| Okay i will try a bit later thanks for the info |
| 2026-04-27 12:09 | KubeClientErrors | Warning | grafana | General investigation | apiserver is healthy (/readyz ok). The 2% error rate is from admission webhook timeouts during the HPA scaling burst (web 11 replicas, docge… |
| 2026-04-27 11:56 | KubeHpaMaxedOut | Warning | docgen2-api | Resource limits | HPA already scaled back down (now 4/6). Pods are barely using CPU, so the scale-up was driven by a KEDA cron trigger, not real load. |
| 2026-04-27 11:53 | KubeAPIErrorBudgetBurn | Critical | grafana | General investigation | apiserver is healthy now. The alert was triggered by a brief spike in pod status updates ~15 min ago when several web pods went NotReady at… |
| 2026-04-27 09:59 | KubeAggregatedAPIDown | Warning | metrics-server | General investigation | False alarm: metrics-server pod is healthy (1/1 Running, 0 restarts) and kubectl top nodes works. Alert was a brief flap during a node rotat… |