Engineering Council Test Reliability Report

Scope aligned with Slack channel #dezvoltare, covering 2026-04-04 07:00 to 2026-04-11 07:00. Metrics and timings are sourced from GitLab pipelines, jobs, and test-report artifacts for the daily 6 PM regression suite and the production smoke suite. Trend charts use daily buckets across this window.

Executive Snapshot

  • Daily Runs: 7
  • Daily Green: 2/7
  • Avg Daily Runtime: 27m 24s
  • Smoke Attempts: 18
  • Smoke Green: 17/18
  • Avg Smoke Runtime: 3m 13s
  • Median Smoke Time: 3m 12s
  • Current Green Streak: 0

Executive Analysis

Bottom line: the regression system is informative but not calm. The data suggest repeatable problem areas rather than random breakage, which means focused ownership should move the needle quickly.

What Matters

  • Daily regression passed 2 of 7 runs (28.6%), with a current green streak of 0 and a best streak of 2 in this window. The latest daily run (152052) failed, so the system is ending the week under tension rather than in a clean state.
  • Smoke passed 17 of 18 attempts (94.4%) across 16 production pipelines. The single failed attempt died before any tests executed, so it never reported test counts at all.
  • Failure concentration is not random: Frontend has both the highest strict failure ratio and the broadest non-pass footprint, each at 0.27%.
  • Frontend is the weakest smoke surface in this window at 15/16 green (93.8%).
  • Daily-suite runtime averaged 27m 24s, while observed daily test volume moved from 1,233 to 1,244.

Engineering Analysis

  • A release gate should fail loudly for product regressions and quietly for infrastructure noise. Rerun recoveries and incomplete smoke attempts suggest those two failure modes are still partially mixed together.
  • The failure profile is concentrated enough to act on. Frontend carries the strongest signal (Billing is close behind at a 0.26% strict failure ratio), which means reliability work should be assigned by category ownership instead of treating the suite as one undifferentiated problem.
  • The broader daily suite is carrying more instability than smoke, which usually means product regressions are escaping into wider coverage areas even when the narrow deploy gate looks acceptable.
  • The daily suite is now large enough that runtime itself is becoming a management variable at 27m 24s average duration. At that size, every additional flaky or redundant test has a measurable cost on feedback speed.

Recommended Actions

  • Split smoke failures into two explicit classes: product regressions versus execution/setup failures. Incomplete attempts should never look identical to real assertion failures in the executive readout (a minimal classification sketch follows this list).
  • Assign one owner to Frontend for the next cycle and expect a short written burn-down: top failing tests, suspected root causes, flake versus regression breakdown, and what gets fixed or quarantined first.
  • Treat the daily regression suite like an operations queue until it is calm again: triage failures after each red run, close known-noise items fast, and avoid letting multiple unrelated red signals pile up between runs.
  • Put Frontend smoke under closer guardrails for the next release cycle. It is the best place to improve first-pass deploy confidence quickly.
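
The first action above can be made mechanical. Below is a minimal classification sketch in Python; the SmokeAttempt shape and its field names are illustrative assumptions, not the GitLab API:

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class SmokeAttempt:
        pipeline: int
        status: str               # "PASSED" or "FAILED"
        passed: Optional[int]     # None when the job died before reporting counts
        failed: Optional[int]

    def classify(attempt: SmokeAttempt) -> str:
        # Keep the two failure modes apart in the readout.
        if attempt.status == "PASSED":
            return "green"
        if attempt.passed is None and attempt.failed is None:
            # Never reached test execution: infrastructure/setup noise,
            # not a product signal (cf. the 0m 02s attempt on pipeline 151787).
            return "execution_failure"
        return "product_regression"

    # The one failed attempt in this window carried no counts at all:
    print(classify(SmokeAttempt(151787, "FAILED", None, None)))  # execution_failure

With that split in place, the readout can report product regressions and execution failures as separate counts instead of one undifferentiated failure number.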

Improvement Ideas

  • Introduce a small reliability budget for tests: every flaky or quarantined case needs an owner and an expiry, and the team should review that budget weekly the same way it reviews bugs or incidents (see the sketch after this list).
  • Track first-fail to root-cause time as a core metric. Fast diagnosis is as important as raw pass rate because the practical value of a test gate depends on how quickly it helps the team recover.
  • Define a runtime budget per suite and require justification when test count or duration grows. Reliable feedback systems stay trusted when they remain both stable and proportionate.
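
The reliability-budget idea above can be as small as a list with an enforced expiry check. A minimal sketch; the test name and owning team are hypothetical placeholders:

    from datetime import date

    # Every quarantined test carries an owner and an expiry, so the weekly
    # review has something concrete to enforce instead of an open-ended list.
    quarantine = [
        {"test": "frontend/checkout_flow", "owner": "frontend-team",
         "expires": date(2026, 4, 18)},  # hypothetical entry
    ]

    def overdue(entries, today):
        # Entries past their expiry must be fixed, re-justified, or deleted.
        return [e for e in entries if e["expires"] < today]

    for entry in overdue(quarantine, today=date(2026, 4, 25)):
        print(f"{entry['test']} ({entry['owner']}) is past its quarantine expiry")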

Category Execution Ratios

How computed

Category total executions means the sum of that category's observed test executions across every daily-suite run in the selected window.

Strict Failure Ratio = failed executions for that category divided by total executions for that category across the window.

Non-pass Ratio = (failed + pending + skipped) executions for that category divided by total executions for that category across the window.

Example: if Billing executed 800 times across the week and 2 of those executions failed, Billing strict failure ratio is 0.25%. That does not mean 0.25% of pipelines failed; it means 0.25% of observed Billing executions ended in failed.
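
The same arithmetic as a small Python sketch. The record shape is an illustrative assumption (real counts come from the test-report artifacts), and the sample record reproduces the Billing example above:

    from collections import defaultdict

    # One record per category per daily-suite run; field names are illustrative.
    records = [
        {"category": "Billing", "total": 800, "failed": 2, "pending": 0, "skipped": 0},
    ]

    totals = defaultdict(lambda: {"total": 0, "failed": 0, "pending": 0, "skipped": 0})
    for rec in records:
        for key in ("total", "failed", "pending", "skipped"):
            totals[rec["category"]][key] += rec[key]

    for category, agg in totals.items():
        strict = agg["failed"] / agg["total"]
        non_pass = (agg["failed"] + agg["pending"] + agg["skipped"]) / agg["total"]
        print(f"{category}: strict {strict:.2%}, non-pass {non_pass:.2%}")
    # -> Billing: strict 0.25%, non-pass 0.25%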

[Charts omitted in this text export. Panels: Daily Suite Status; Daily Smoke Attempts; Daily Average Daily Suite Runtime; Daily Average Smoke Runtime; Daily Suite Total Test Growth (Recent 7 Runs); Smoke Suite Total Test Growth (Latest Run Per Day), with series Frontend (steady at 110 tests) and University (10 tests). All panels use daily buckets from 04-04 to 04-10.]

Category Aggregate Table

Category | Total | Failed | Pending | Skipped | Failure Ratio | Non-pass Ratio | Runs With Failures
Billing  |   756 |      2 |       0 |       0 |         0.26% |          0.26% |                  1
Web      |  4684 |      7 |       0 |       2 |         0.15% |          0.19% |                  3
Frontend |  1840 |      5 |       0 |       0 |         0.27% |          0.27% |                  2
Library  |   602 |      0 |       0 |       0 |         0.00% |          0.00% |                  0

Recent Runs

Recent Daily Suite Runs

Date             | Pipeline | Suites                          | Status | Summary
2026-04-04 18:24 | 151342   | Billing, Web, Frontend, Library | FAILED | Total 1233, Passed 1230, Failed 3, Pending 0
2026-04-05 18:24 | 151350   | Billing, Web, Frontend, Library | PASSED | Total 1233, Passed 1233, Failed 0, Pending 0
2026-04-06 18:24 | 151439   | Billing, Web, Frontend, Library | PASSED | Total 1233, Passed 1233, Failed 0, Pending 0
2026-04-07 18:23 | 151598   | Billing, Web, Frontend, Library | FAILED | Total 1233, Passed 1230, Failed 3, Pending 0
2026-04-08 18:33 | 151799   | Billing, Web, Frontend, Library | FAILED | Total 1238, Passed 1234, Failed 4, Pending 0
2026-04-09 19:03 | 152037   | Billing, Web, Frontend, Library | FAILED | Total 468, Passed 465, Failed 1, Pending 0
2026-04-10 18:25 | 152052   | Billing, Web, Frontend, Library | FAILED | Total 1244, Passed 1241, Failed 3, Pending 0

Recent Smoke Attempts

Date             | Suite      | Pipeline | Job              | Status | Passed | Failed | Duration
2026-04-07 14:23 | University | 151482   | University smoke | PASSED | 10     | 0      | 5m 10s
2026-04-07 15:39 | Frontend   | 151482   | Frontend smoke   | PASSED | 110    | 0      | 3m 01s
2026-04-07 17:04 | University | 151576   | University smoke | PASSED | 10     | 0      | 2m 50s
2026-04-07 17:07 | Frontend   | 151576   | Frontend smoke   | PASSED | 110    | 0      | 3m 08s
2026-04-07 18:30 | Frontend   | 151600   | Frontend smoke   | PASSED | 110    | 0      | 3m 11s
2026-04-08 14:28 | Frontend   | 151693   | Frontend smoke   | PASSED | 110    | 0      | 4m 08s
2026-04-08 15:02 | Frontend   | 151712   | Frontend smoke   | PASSED | 110    | 0      | 3m 12s
2026-04-08 17:12 | Frontend   | 151787   | Frontend smoke   | FAILED | n/a    | n/a    | 0m 02s
2026-04-08 18:13 | Frontend   | 151798   | Frontend smoke   | PASSED | 110    | 0      | 3m 31s
2026-04-08 22:10 | Frontend   | 151810   | Frontend smoke   | PASSED | 110    | 0      | 3m 21s
2026-04-08 23:44 | Frontend   | 151820   | Frontend smoke   | PASSED | 110    | 0      | 3m 07s
2026-04-09 00:48 | Frontend   | 151824   | Frontend smoke   | PASSED | 110    | 0      | 3m 11s
2026-04-09 14:30 | Frontend   | 151954   | Frontend smoke   | PASSED | 110    | 0      | 3m 48s
2026-04-09 14:41 | Frontend   | 151974   | Frontend smoke   | PASSED | 110    | 0      | 3m 24s
2026-04-09 16:38 | Frontend   | 152031   | Frontend smoke   | PASSED | 110    | 0      | 3m 13s
2026-04-09 23:04 | Frontend   | 152040   | Frontend smoke   | PASSED | 110    | 0      | 3m 30s
2026-04-10 10:20 | Frontend   | 152043   | Frontend smoke   | PASSED | 110    | 0      | 3m 10s
2026-04-10 12:18 | Frontend   | 152051   | Frontend smoke   | PASSED | 110    | 0      | 3m 03s

Smoke Suite Breakdown

Frontend: 16 attempts across 16 pipelines, 94% green
  Passed: 15 | Failed: 1 | Incomplete: 1
  Avg runtime: 3m 08s | Median passing runtime: 3m 12s

University: 2 attempts across 2 pipelines, 100% green
  Passed: 2 | Failed: 0 | Incomplete: 0
  Avg runtime: 4m 00s | Median passing runtime: 4m 00s
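
The two runtime figures in these cards follow different rules: the average includes every attempt, incomplete ones included, while the median is taken over passing attempts only. A minimal sketch with illustrative durations rather than a re-extraction of the tables above:

    from statistics import mean, median

    def mmss(seconds: float) -> str:
        m, s = divmod(round(seconds), 60)
        return f"{m}m {s:02d}s"

    # (duration_seconds, passed) pairs for one suite's attempts in the window.
    attempts = [(181, True), (192, True), (2, False), (248, True)]

    avg_all = mean(d for d, _ in attempts)            # every attempt counts
    med_pass = median(d for d, ok in attempts if ok)  # passing attempts only
    print(mmss(avg_all), mmss(med_pass))              # -> 2m 36s 3m 12s
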
Generated from GitLab project adservio/helm2. Times are shown in Europe/Bucharest. Daily-suite runtime is measured from GitLab pipeline and job timestamps. Category counts come from GitLab test-report JSON artifacts, with job-trace fallback when older artifacts have expired.