Engineering Council Test Reliability Report

Scope is aligned with Slack channel #dezvoltare and covers 2026-04-25 07:00 to 2026-05-05 12:00. Metrics and timings are sourced from GitLab pipelines, jobs, and test-report artifacts for the daily 6 PM regression suite and the production smoke suite. Trend charts use daily buckets across this window.

Executive Snapshot

  • Daily Runs: 10
  • Daily Green: 6/10
  • Avg Daily Runtime: 50m 32s
  • Smoke Attempts: 17
  • Smoke Green: 10/17
  • Avg Smoke Runtime: 3m 18s
  • Median Smoke Time: 3m 10s
  • Current Green Streak: 1

Executive Analysis

Bottom line: release confidence is unstable in both the broad regression path and the deploy smoke path. The immediate job is to separate real product regressions from execution noise, then burn down the concentrated failure clusters.

What Matters

  • Daily regression passed 6 of 10 runs (60.0%), with a current green streak of 1 and a best streak of 3 in this window.
  • Smoke passed 10 of 17 attempts (58.8%) across 13 production pipelines.
  • Failure concentration is not random: Frontend has both the highest strict failure ratio and the broadest non-pass footprint, at 0.22% each (6 failed executions out of 2,700, spread across 4 runs).
  • University is the weakest smoke surface in this window at 1/4 green (25.0%).
  • Daily-suite runtime averaged 50m 32s.
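The pass-rate and streak figures above follow from simple counting over run statuses. A minimal sketch, assuming each run is reduced to a "PASSED" or "FAILED" string; the function names are illustrative and not part of the reporting tool. The sample uses only the eight runs listed under Recent Daily Suite Runs, so its results differ from the 10-run headline figures.

```python
from typing import Sequence


def pass_rate(statuses: Sequence[str]) -> float:
    """Fraction of runs whose status is PASSED."""
    if not statuses:
        return 0.0
    return sum(1 for s in statuses if s == "PASSED") / len(statuses)


def best_streak(statuses: Sequence[str]) -> int:
    """Longest run of consecutive PASSED statuses, oldest to newest."""
    best = current = 0
    for status in statuses:
        current = current + 1 if status == "PASSED" else 0
        best = max(best, current)
    return best


def current_streak(statuses: Sequence[str]) -> int:
    """Number of consecutive PASSED statuses at the end of the window."""
    streak = 0
    for status in reversed(statuses):
        if status != "PASSED":
            break
        streak += 1
    return streak


# Statuses of the eight listed daily runs, 2026-04-27 through 2026-05-04.
listed_runs = [
    "PASSED", "FAILED", "FAILED", "FAILED",
    "PASSED", "PASSED", "FAILED", "PASSED",
]

print(f"{pass_rate(listed_runs):.1%}")  # 50.0% for the listed subset
print(best_streak(listed_runs))         # 2
print(current_streak(listed_runs))      # 1
```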

Engineering Analysis

  • The failure profile is concentrated enough to act on. Frontend is carrying the strongest signal, with 6 of the 7 failed executions in the window, which means reliability work should be assigned by category ownership instead of treating the suite as one undifferentiated problem.
  • Smoke is lagging the broader regression suite, so deploy readiness is probably being constrained more by environment/setup stability and narrow critical-path checks than by overall test volume.
  • The daily suite is now large enough that runtime itself is becoming a management variable at 50m 32s average duration. At that size, every additional flaky or redundant test has a measurable cost on feedback speed.

Recommended Actions

  • Assign one owner to Frontend for the next cycle and expect a short written burn-down: top failing tests, suspected root causes, flake versus regression breakdown, and what gets fixed or quarantined first.
  • Treat the daily regression suite like an operations queue until it is calm again: triage failures after each red run, close known-noise items fast, and avoid letting multiple unrelated red signals pile up between runs.
  • Put University smoke under closer guardrails for the next release cycle. It is the best place to improve first-pass deploy confidence quickly.

Improvement Ideas

  • Introduce a small reliability budget for tests: every flaky or quarantined case needs an owner and an expiry, and the team should review that budget weekly the same way it reviews bugs or incidents.
  • Track first-fail to root-cause time as a core metric. Fast diagnosis is as important as raw pass rate because the practical value of a test gate depends on how quickly it helps the team recover.
  • Define a runtime budget per suite and require justification when test count or duration grows. Reliable feedback systems stay trusted when they remain both stable and proportionate.
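One way the reliability budget could be tracked is as a small list of quarantine records that a weekly review filters for overdue items. This is a sketch under assumptions: the record shape, the field names, the budget size and the sample entries are all hypothetical and do not come from the report's data.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass(frozen=True)
class QuarantineEntry:
    """One flaky or quarantined test (hypothetical record shape)."""
    test_id: str
    owner: str
    expires: date

    def __post_init__(self) -> None:
        # The budget rule requires every entry to have an owner.
        if not self.owner.strip():
            raise ValueError(f"{self.test_id}: quarantined test needs an owner")


def overdue(entries: List[QuarantineEntry], today: date) -> List[QuarantineEntry]:
    """Entries whose expiry date has passed and that need a decision."""
    return [e for e in entries if e.expires < today]


def within_budget(entries: List[QuarantineEntry], max_entries: int) -> bool:
    """True while the number of quarantined tests stays inside the budget."""
    return len(entries) <= max_entries


# Hypothetical entries, for illustration only.
budget = [
    QuarantineEntry("frontend/login_redirect", "alice", date(2026, 5, 1)),
    QuarantineEntry("frontend/grade_export", "bob", date(2026, 5, 20)),
]

review_day = date(2026, 5, 5)
print([e.test_id for e in overdue(budget, review_day)])  # ['frontend/login_redirect']
print(within_budget(budget, max_entries=5))              # True
```

Keeping the expiry on the record itself means the weekly review only has to look at the overdue list, instead of re-arguing every quarantined test.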

Category Execution Ratios

How computed

Category total executions means the sum of that category's observed test executions across every daily-suite run in the selected window.

Strict Failure Ratio = failed executions for that category divided by total executions for that category across the window.

Non-pass Ratio = (failed + pending + skipped) executions for that category divided by total executions for that category across the window.

Example: if Billing executed 800 times across the week and 2 of those executions failed, Billing strict failure ratio is 0.25%. That does not mean 0.25% of pipelines failed; it means 0.25% of observed Billing executions ended in failed.
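The two ratios can be sketched in a few lines. The class and function names below are illustrative assumptions, not the report generator's actual code; the counts are the Billing example above and the Frontend and Web rows from the category table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CategoryCounts:
    """Execution counts for one category, summed across the window."""
    total: int
    failed: int
    pending: int = 0
    skipped: int = 0


def strict_failure_ratio(c: CategoryCounts) -> float:
    """Failed executions divided by total executions."""
    return c.failed / c.total if c.total else 0.0


def non_pass_ratio(c: CategoryCounts) -> float:
    """(failed + pending + skipped) executions divided by total executions."""
    return (c.failed + c.pending + c.skipped) / c.total if c.total else 0.0


def as_percent(ratio: float) -> str:
    """Format a ratio as a percentage with two decimals."""
    return f"{ratio * 100:.2f}%"


billing_example = CategoryCounts(total=800, failed=2)
frontend = CategoryCounts(total=2700, failed=6)
web = CategoryCounts(total=7800, failed=1)

print(as_percent(strict_failure_ratio(billing_example)))  # 0.25%
print(as_percent(strict_failure_ratio(frontend)))         # 0.22%
print(as_percent(non_pass_ratio(web)))                    # 0.01%
```

Because the denominator is executions and not pipelines, a category with many tests can have a very small ratio even when it fails in several runs.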

[Chart: Daily Suite Status, daily buckets, 04-25 to 05-04]
[Chart: Smoke Attempts, daily buckets, 04-25 to 05-04]
[Chart: Average Daily Suite Runtime, daily buckets, 04-25 to 05-04]
[Chart: Average Smoke Runtime, daily buckets, 04-25 to 05-04]
[Chart: Daily Suite Total Test Growth (Recent 10 Runs), axis range 1244 to 1245 tests, 04-25 to 05-04]
[Chart: Smoke Suite Total Test Growth (Latest Run Per Day), series Frontend and University. Frontend: 110 tests on each of 04-27, 04-28, 04-29 and 04-30. University: 60 tests on 04-28.]

Category Aggregate Table

Category  Total  Failed  Pending  Skipped  Failure Ratio  Non-pass Ratio  Runs With Failures
Billing    1080       0        0        0          0.00%           0.00%                   0
Web        7800       1        0        0          0.01%           0.01%                   1
Frontend   2700       6        0        0          0.22%           0.22%                   4
Library     860       0        0        0          0.00%           0.00%                   0

Recent Runs

Recent Daily Suite Runs

Date              Pipeline  Suites                           Status  Summary
2026-04-27 18:25  153750    Billing, Web, Frontend, Library  PASSED  Total 1244 | Passed 1244 | Failed 0
2026-04-28 18:26  153968    Billing, Web, Frontend, Library  FAILED  Total 1244 | Passed 1240 | Failed 4
2026-04-29 18:25  154112    Billing, Web, Frontend, Library  FAILED  Total 1244 | Passed 1243 | Failed 1
2026-04-30 23:05  154297    Billing, Web, Frontend, Library  FAILED  Total 1244 | Passed 1243 | Failed 1
2026-05-01 18:25  154317    Billing, Web, Frontend, Library  PASSED  Total 1244 | Passed 1244 | Failed 0
2026-05-02 18:25  154322    Billing, Web, Frontend, Library  PASSED  Total 1244 | Passed 1244 | Failed 0
2026-05-03 18:25  154324    Billing, Web, Frontend, Library  FAILED  Total 1244 | Passed 1243 | Failed 1
2026-05-04 18:26  154462    Billing, Web, Frontend, Library  PASSED  Total 1244 | Passed 1244 | Failed 0

Recent Smoke Attempts

Date              Suite       Pipeline  Job               Status  Passed  Failed  Duration
2026-04-27 11:26  Frontend    153582    Frontend smoke    PASSED     110       0  3m 12s
2026-04-27 15:29  Frontend    153624    Frontend smoke    PASSED     110       0  3m 09s
2026-04-28 14:10  University  153904    University smoke  PASSED      60       0  2m 21s
2026-04-28 14:14  Frontend    153904    Frontend smoke    PASSED     110       0  3m 00s
2026-04-28 14:36  University  153915    University smoke  FAILED      59       1  2m 29s
2026-04-28 14:39  University  153915    University smoke  FAILED      59       1  2m 58s
2026-04-28 14:40  Frontend    153915    Frontend smoke    FAILED     108       2  4m 09s
2026-04-28 15:47  Frontend    153940    Frontend smoke    FAILED     108       2  3m 46s
2026-04-28 18:27  Frontend    153967    Frontend smoke    FAILED     108       2  4m 04s
2026-04-28 18:40  University  153970    University smoke  FAILED      59       1  2m 56s
2026-04-28 18:45  Frontend    153970    Frontend smoke    FAILED     108       2  3m 57s
2026-04-29 13:02  Frontend    154041    Frontend smoke    PASSED     110       0  3m 37s
2026-04-30 06:47  Frontend    154117    Frontend smoke    PASSED     110       0  3m 27s
2026-04-30 14:43  Frontend    154215    Frontend smoke    PASSED     110       0  3m 34s
2026-04-30 15:59  Frontend    154270    Frontend smoke    PASSED     110       0  3m 09s
2026-04-30 16:14  Frontend    154276    Frontend smoke    PASSED     110       0  3m 20s
2026-04-30 17:10  Frontend    154296    Frontend smoke    PASSED     110       0  3m 04s
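The per-suite figures in the Smoke Suite Breakdown can be reproduced from the attempt rows above. A minimal sketch, assuming durations are parsed from the "Xm YYs" strings and averages are rounded to the nearest second; the helper names are illustrative, and the rounding rule is an assumption about how the report formats its values.

```python
from statistics import mean, median

# (suite, status, duration) for the 17 smoke attempts, in table order.
ATTEMPTS = [
    ("Frontend", "PASSED", "3m 12s"), ("Frontend", "PASSED", "3m 09s"),
    ("University", "PASSED", "2m 21s"), ("Frontend", "PASSED", "3m 00s"),
    ("University", "FAILED", "2m 29s"), ("University", "FAILED", "2m 58s"),
    ("Frontend", "FAILED", "4m 09s"), ("Frontend", "FAILED", "3m 46s"),
    ("Frontend", "FAILED", "4m 04s"), ("University", "FAILED", "2m 56s"),
    ("Frontend", "FAILED", "3m 57s"), ("Frontend", "PASSED", "3m 37s"),
    ("Frontend", "PASSED", "3m 27s"), ("Frontend", "PASSED", "3m 34s"),
    ("Frontend", "PASSED", "3m 09s"), ("Frontend", "PASSED", "3m 20s"),
    ("Frontend", "PASSED", "3m 04s"),
]


def to_seconds(duration: str) -> int:
    """Convert a string such as '3m 12s' to seconds."""
    minutes, seconds = duration.split()
    return int(minutes.rstrip("m")) * 60 + int(seconds.rstrip("s"))


def fmt(seconds: float) -> str:
    """Format seconds as 'Xm YYs', rounded to the nearest second."""
    total = round(seconds)
    return f"{total // 60}m {total % 60:02d}s"


def summarize(suite: str) -> dict:
    """Attempt count, pass count, average runtime and median passing runtime."""
    rows = [(status, to_seconds(d)) for name, status, d in ATTEMPTS if name == suite]
    passing = [sec for status, sec in rows if status == "PASSED"]
    return {
        "attempts": len(rows),
        "passed": len(passing),
        "green": f"{len(passing) / len(rows):.0%}",
        "avg_runtime": fmt(mean(sec for _, sec in rows)),
        "median_passing_runtime": fmt(median(passing)),
    }


print(summarize("Frontend"))
print(summarize("University"))
```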

Smoke Suite Breakdown

Frontend
13 attempts across 13 pipelines, 69% green

  • Passed: 9
  • Failed: 4
  • Incomplete: 0
  • Avg runtime: 3m 30s
  • Median passing runtime: 3m 12s
  • Pipelines: 13

University
4 attempts across 3 pipelines, 25% green

  • Passed: 1
  • Failed: 3
  • Incomplete: 0
  • Avg runtime: 2m 41s
  • Median passing runtime: 2m 21s
  • Pipelines: 3

Generated from GitLab project adservio/helm2. Times are shown in Europe/Bucharest. Daily-suite runtime is measured from GitLab pipeline and job timestamps. Category counts come from GitLab test-report JSON artifacts, with job-trace fallback when older artifacts have expired.