Engineering Council Test Reliability Report

Scope aligned with Slack channel #dezvoltare, covering 2026-04-18 07:00 to 2026-04-25 07:00. Metrics and timings are sourced from GitLab pipelines, jobs, and test-report artifacts for the daily 6 PM regression suite and the production smoke suite. Trend charts use daily buckets across this window.

Executive Snapshot

  • Daily Runs: 7
  • Daily Green: 2/7
  • Avg Daily Runtime: 22m 06s
  • Smoke Attempts: 19
  • Smoke Green: 17/19
  • Avg Smoke Runtime: 2m 55s
  • Median Smoke Time: 3m 03s
  • Current Green Streak: 2

Executive Analysis

Bottom line: the regression system is informative but not calm. The data suggest repeatable problem areas rather than random breakage, which means focused ownership should move the needle quickly.

What Matters

  • Daily regression passed 2 of 7 runs (28.6%), with a current green streak of 2 and a best streak of 2 in this window.
  • Smoke passed 17 of 19 attempts (89.5%) across 14 production pipelines. One pipeline recovered on rerun, which is useful for continuity but also a sign that first-pass deploy signal is noisier than it should be. One failed attempt never reached test execution at all.
  • Failure concentration is not random: Web has both the highest strict failure ratio and the broadest non-pass footprint, each at 0.27%.
  • Frontend is the weakest smoke surface in this window at 12/14 green (85.7%).
  • Daily-suite runtime averaged 22m 06s.

Engineering Analysis

  • A release gate should fail loudly for product regressions and quietly for infrastructure noise. Rerun recoveries plus incomplete daily or smoke attempts suggest those two failure modes are still partially mixed together.
  • The failure profile is concentrated enough to act on. Web is carrying the strongest signal, which means reliability work should be assigned by category ownership instead of treating the suite as one undifferentiated problem.
  • The broader daily suite is carrying more instability than smoke, which usually means product regressions are escaping into wider coverage areas even when the narrow deploy gate looks acceptable.

Recommended Actions

  • Split incomplete execution failures from real assertion failures in the report narrative. Setup breakage should stay visible, but it should not look identical to a product regression in the executive readout (a classification sketch follows this list).
  • Assign one owner to Web for the next cycle and expect a short written burn-down: top failing tests, suspected root causes, flake versus regression breakdown, and what gets fixed or quarantined first.
  • Treat the daily regression suite like an operations queue until it is calm again: triage failures after each red run, close known-noise items fast, and avoid letting multiple unrelated red signals pile up between runs.
  • Put Frontend smoke under closer guardrails for the next release cycle. It is the best place to improve first-pass deploy confidence quickly.
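
The split called out in the first action above can start as a simple status-plus-counts rule. A minimal sketch, assuming each attempt is summarized by its status and its passed/failed counts as in the smoke table below; the function name and labels are illustrative, not existing report fields:

    from typing import Optional

    def classify_attempt(status: str, passed: Optional[int], failed: Optional[int]) -> str:
        """Illustrative split of red attempts into regression vs. infrastructure noise."""
        if status == "PASSED":
            return "green"
        if passed is None and failed is None:
            # No test counts were ever reported: setup or infrastructure breakage,
            # not a product regression (e.g. the 0m 02s Frontend attempt on 04-21).
            return "incomplete-execution"
        if failed and failed > 0:
            # Tests ran and assertions failed: report as a product regression.
            return "product-regression"
        return "needs-triage"

    print(classify_attempt("FAILED", passed=109, failed=1))      # product-regression
    print(classify_attempt("FAILED", passed=None, failed=None))  # incomplete-execution

Under this rule, the 3m 51s Frontend attempt on 04-20 would be reported as a product regression, while the 0m 02s attempt on 04-21 would land in the incomplete-execution bucket.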

Improvement Ideas

  • Introduce a small reliability budget for tests: every flaky or quarantined case needs an owner and an expiry, and the team should review that budget weekly the same way it reviews bugs or incidents (see the sketch after this list).
  • Track first-fail to root-cause time as a core metric. Fast diagnosis is as important as raw pass rate because the practical value of a test gate depends on how quickly it helps the team recover.
  • Define a runtime budget per suite and require justification when test count or duration grows. Reliable feedback systems stay trusted when they remain both stable and proportionate.
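
One lightweight shape for the reliability budget in the first idea above: each quarantined test carries an owner and an expiry, and a weekly check flags anything expired or over budget. The entry, the ceiling, and the review helper below are hypothetical, not an agreed policy:

    from dataclasses import dataclass
    from datetime import date

    @dataclass
    class QuarantinedTest:
        name: str
        owner: str
        expires: date
        reason: str

    BUDGET_CEILING = 5  # assumed ceiling; to be agreed by the council
    BUDGET = [
        QuarantinedTest("web/checkout_totals_spec", "web-team", date(2026, 5, 8), "flaky selector"),
    ]

    def weekly_review(budget: list[QuarantinedTest], today: date) -> list[str]:
        """Flag expired quarantine entries and a blown budget for the weekly review."""
        findings = [f"EXPIRED: {t.name} (owner: {t.owner})" for t in budget if t.expires < today]
        if len(budget) > BUDGET_CEILING:
            findings.append(f"Over budget: {len(budget)} quarantined tests (ceiling {BUDGET_CEILING})")
        return findings

    print(weekly_review(BUDGET, today=date(2026, 4, 25)))  # [] while nothing has expired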

Category Execution Ratios

How computed

Category total executions means the sum of that category's observed test executions across every daily-suite run in the selected window.

Strict Failure Ratio = failed executions for that category divided by total executions for that category across the window.

Non-pass Ratio = (failed + pending + skipped) executions for that category divided by total executions for that category across the window.

Example: if Billing executed 800 times across the week and 2 of those executions failed, Billing's strict failure ratio is 0.25%. That does not mean 0.25% of pipelines failed; it means 0.25% of observed Billing executions ended in failure.
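
As a worked sketch of the two formulas, the snippet below recomputes the Web row of the aggregate table from per-category counts; the dataclass and function names are illustrative, not part of the reporting pipeline:

    from dataclasses import dataclass

    @dataclass
    class CategoryCounts:
        total: int    # all observed executions of this category across the window
        failed: int
        pending: int
        skipped: int

    def strict_failure_ratio(c: CategoryCounts) -> float:
        # failed executions / total executions
        return c.failed / c.total if c.total else 0.0

    def non_pass_ratio(c: CategoryCounts) -> float:
        # (failed + pending + skipped) executions / total executions
        return (c.failed + c.pending + c.skipped) / c.total if c.total else 0.0

    # Web row from the aggregate table below
    web = CategoryCounts(total=5460, failed=15, pending=0, skipped=0)
    print(f"strict:   {strict_failure_ratio(web):.2%}")  # 0.27%
    print(f"non-pass: {non_pass_ratio(web):.2%}")        # 0.27%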

Trend Charts (daily buckets, 2026-04-18 to 2026-04-24)

  • Daily Suite Status
  • Daily Smoke Attempts
  • Average Daily Suite Runtime
  • Average Smoke Runtime
  • Daily Suite Total Test Growth (Recent 7 Runs)
  • Smoke Suite Total Test Growth (Latest Run Per Day): Frontend held at 110 tests and University at 60 across the window

Category Aggregate Table

Category | Total | Failed | Pending | Skipped | Failure Ratio | Non-pass Ratio | Runs With Failures
Billing  |   756 |      2 |       0 |       0 |         0.26% |          0.26% | 2
Web      |  5460 |     15 |       0 |       0 |         0.27% |          0.27% | 5
Frontend |  1890 |      5 |       0 |       0 |         0.26% |          0.26% | 5
Library  |   602 |      0 |       0 |       0 |         0.00% |          0.00% | 0

Recent Runs

Recent Daily Suite Runs

Date             | Pipeline | Suites                          | Status | Summary
2026-04-18 18:25 | 152618   | Billing, Web, Frontend, Library | FAILED | Total 1244, Passed 1239, Failed 5
2026-04-19 18:25 | 152660   | Billing, Web, Frontend, Library | FAILED | Total 1244, Passed 1239, Failed 5
2026-04-20 18:25 | 152815   | Billing, Web, Frontend, Library | FAILED | Total 1244, Passed 1240, Failed 4
2026-04-21 18:25 | 152943   | Billing, Web, Frontend, Library | FAILED | Total 1244, Passed 1240, Failed 4
2026-04-22 18:25 | 153127   | Billing, Web, Frontend, Library | FAILED | Total 1244, Passed 1240, Failed 4
2026-04-23 18:25 | 153307   | Billing, Web, Frontend, Library | PASSED | Total 1244, Passed 1244, Failed 0
2026-04-24 18:24 | 153522   | Billing, Web, Frontend, Library | PASSED | Total 1244, Passed 1244, Failed 0

Recent Smoke Attempts

Date             | Suite      | Pipeline | Job              | Status | Passed | Failed | Duration
2026-04-20 13:44 | Frontend   | 152756   | Frontend smoke   | FAILED | 109    | 1      | 3m 51s
2026-04-20 14:17 | Frontend   | 152769   | Frontend smoke   | PASSED | 110    | 0      | 3m 10s
2026-04-20 16:27 | Frontend   | 152805   | Frontend smoke   | PASSED | 110    | 0      | 3m 10s
2026-04-20 17:10 | University | 152805   | University smoke | PASSED | 60     | 0      | 2m 34s
2026-04-20 18:27 | Frontend   | 152817   | Frontend smoke   | PASSED | 110    | 0      | 3m 22s
2026-04-21 14:08 | Frontend   | 152911   | Frontend smoke   | FAILED | n/a    | n/a    | 0m 02s
2026-04-21 14:15 | University | 152911   | University smoke | PASSED | 60     | 0      | 2m 50s
2026-04-22 12:12 | Frontend   | 153013   | Frontend smoke   | PASSED | 110    | 0      | 3m 43s
2026-04-22 16:54 | Frontend   | 153119   | Frontend smoke   | PASSED | 110    | 0      | 3m 03s
2026-04-23 15:06 | University | 153256   | University smoke | PASSED | 60     | 0      | 2m 51s
2026-04-23 15:08 | Frontend   | 153256   | Frontend smoke   | PASSED | 110    | 0      | 3m 43s
2026-04-23 17:40 | Frontend   | 153296   | Frontend smoke   | PASSED | 110    | 0      | 3m 06s
2026-04-23 17:46 | University | 153296   | University smoke | PASSED | 60     | 0      | 2m 14s
2026-04-23 18:01 | Frontend   | 153305   | Frontend smoke   | PASSED | 110    | 0      | 3m 02s
2026-04-24 13:04 | Frontend   | 153427   | Frontend smoke   | PASSED | 110    | 0      | 3m 16s
2026-04-24 13:20 | Frontend   | 153452   | Frontend smoke   | PASSED | 110    | 0      | 2m 59s
2026-04-24 14:51 | Frontend   | 153482   | Frontend smoke   | PASSED | 110    | 0      | 3m 21s
2026-04-24 16:17 | University | 153512   | University smoke | PASSED | 60     | 0      | 2m 07s
2026-04-24 16:21 | Frontend   | 153512   | Frontend smoke   | PASSED | 110    | 0      | 2m 59s

Smoke Suite Breakdown

Frontend
  • 14 attempts across 14 pipelines, 86% green
  • Passed 12 | Failed 2 | Incomplete 1
  • Avg runtime 3m 03s | Median passing runtime 3m 10s

University
  • 5 attempts across 5 pipelines, 100% green
  • Passed 5 | Failed 0 | Incomplete 0
  • Avg runtime 2m 31s | Median passing runtime 2m 34s

Generated from GitLab project adservio/helm2. Times are shown in Europe/Bucharest. Daily-suite runtime is measured from GitLab pipeline and job timestamps. Category counts come from GitLab test-report JSON artifacts, with job-trace fallback when older artifacts have expired.
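
For reference, category counts of the kind shown above can be pulled from GitLab's pipeline test-report endpoint. A minimal sketch, assuming a read_api token and the project path from this footer; the instance URL is a placeholder, the mapping of suite names to report categories is not shown, and neither is the job-trace fallback:

    import os
    import requests

    GITLAB_API = "https://gitlab.example.com/api/v4"  # placeholder instance URL
    PROJECT = "adservio%2Fhelm2"                      # URL-encoded project path
    TOKEN = os.environ["GITLAB_TOKEN"]                # assumed read_api token

    def suite_counts(pipeline_id: int) -> dict[str, dict[str, int]]:
        """Fetch the pipeline test report and bucket counts by suite name."""
        resp = requests.get(
            f"{GITLAB_API}/projects/{PROJECT}/pipelines/{pipeline_id}/test_report",
            headers={"PRIVATE-TOKEN": TOKEN},
            timeout=30,
        )
        resp.raise_for_status()
        return {
            suite["name"]: {
                "total": suite["total_count"],
                "failed": suite["failed_count"],
                "skipped": suite["skipped_count"],
            }
            for suite in resp.json().get("test_suites", [])
        }

    # Example: suite_counts(153522) for the last green daily run in this window.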