Engineering Council Test Reliability Report

Scope aligned with Slack channel #dezvoltare, covering 2026-04-11 07:00 to 2026-04-18 21:50. Metrics and timings are sourced from GitLab pipelines, jobs, and test-report artifacts for the daily 6 PM regression suite and the production smoke suite. Trend charts use daily buckets across this window.

Executive Snapshot

Daily Runs: 8
Daily Green: 1/8
Avg Daily Runtime: 331m 02s
Smoke Attempts: 4
Smoke Green: 4/4
Avg Smoke Runtime: 3m 05s
Median Smoke Time: 3m 04s
Current Green Streak: 0

Executive Analysis

Bottom line: the regression system is informative but not calm. The data suggest repeatable problem areas rather than random breakage, which means focused ownership should move the needle quickly.

What Matters

  • Daily regression passed 1 of 8 runs (12.5%), with a current green streak of 0 and a best streak of 1 in this window. The latest daily run (152618) failed, so the system is ending the week under tension rather than in a clean state. Three failed runs never reached complete daily-suite counts, which points to infrastructure or setup noise mixed into the product signal.
  • Smoke passed 4 of 4 attempts (100.0%) across 2 production pipelines.
  • Failure concentration is not random: Library has both the highest strict failure ratio and the broadest non-pass footprint at 0.69% (with zero pending and skipped executions, the two ratios coincide).
  • Frontend smoke went 2/2 green (100.0%) in this window, but Frontend is still the most fragile surface overall: it recorded daily-suite failures in 7 of 8 runs, more than any other category.
  • Daily-suite runtime averaged 331m 02s, while observed daily test volume moved from 1,159 to 1,244.
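
The streak figures above follow directly from the ordered run history. A minimal sketch of how the current and best green streaks can be derived (the status list mirrors the eight daily runs in this window, oldest to newest):

```python
def green_streaks(statuses):
    """Return (current_streak, best_streak) for an ordered run history."""
    best = run = 0
    for status in statuses:
        # A pass extends the running streak; any failure resets it to zero.
        run = run + 1 if status == "PASSED" else 0
        best = max(best, run)
    return run, best  # the running streak ends at the most recent run

# The eight daily runs in this window: four fails, one pass, three fails.
runs = ["FAILED"] * 4 + ["PASSED"] + ["FAILED"] * 3
print(green_streaks(runs))  # (0, 1)
```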

Engineering Analysis

  • A release gate should fail loudly for product regressions and quietly for infrastructure noise. Rerun recoveries plus incomplete daily or smoke attempts suggest those two failure modes are still partially mixed together.
  • The failure profile is concentrated enough to act on. Library (highest failure ratio) and Frontend (most failed executions, failures in 7 of 8 runs) are carrying the strongest signal, which means reliability work should be assigned by category ownership instead of treating the suite as one undifferentiated problem.
  • The broader daily suite is carrying more instability than smoke, which usually means product regressions are escaping into wider coverage areas even when the narrow deploy gate looks acceptable.
  • The daily suite is now large enough that runtime itself is becoming a management variable at 331m 02s average duration. At that size, every additional flaky or redundant test has a measurable cost on feedback speed.

Recommended Actions

  • Split incomplete execution failures from real assertion failures in the report narrative. Setup breakage should stay visible, but it should not look identical to a product regression in the executive readout.
  • Assign one owner to Library for the next cycle and expect a short written burn-down: top failing tests, suspected root causes, flake versus regression breakdown, and what gets fixed or quarantined first.
  • Treat the daily regression suite like an operations queue until it is calm again: triage failures after each red run, close known-noise items fast, and avoid letting multiple unrelated red signals pile up between runs.
  • Put Frontend smoke under closer guardrails for the next release cycle. It is the best place to improve first-pass deploy confidence quickly.
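
The first action above, separating incomplete execution from genuine assertion failures, can be sketched as a small run classifier. The field names (`status`, `expected_total`, `observed_total`, `failed`) are illustrative, not the actual report schema:

```python
def classify_run(status, expected_total, observed_total, failed):
    """Label a red run as infrastructure noise or a product signal."""
    if status == "PASSED":
        return "green"
    if observed_total < expected_total:
        # The suite never fully executed: setup/infra breakage,
        # not evidence of a product regression.
        return "red-incomplete (infrastructure)"
    if failed > 0:
        return "red-assertions (product regression)"
    return "red-unknown"

# Run 152058 from this window: 1,159 observed tests against the later
# full count of 1,244 (assumed here as the expected total), 4 failures.
print(classify_run("FAILED", 1244, 1159, 4))  # red-incomplete (infrastructure)
print(classify_run("FAILED", 1244, 1244, 5))  # red-assertions (product regression)
```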

Improvement Ideas

  • Introduce a small reliability budget for tests: every flaky or quarantined case needs an owner and an expiry, and the team should review that budget weekly the same way it reviews bugs or incidents.
  • Track first-fail to root-cause time as a core metric. Fast diagnosis is as important as raw pass rate because the practical value of a test gate depends on how quickly it helps the team recover.
  • Define a runtime budget per suite and require justification when test count or duration grows. Reliable feedback systems stay trusted when they remain both stable and proportionate.
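
The runtime-budget idea can be made mechanical. A minimal sketch that flags a suite once it runs past a stated budget plus a small tolerance (the budget figures below are illustrative assumptions, not a team decision):

```python
def over_budget(runtime_minutes, budget_minutes, tolerance=0.10):
    """True when a suite runs past its budget plus a tolerance margin."""
    return runtime_minutes > budget_minutes * (1 + tolerance)

# Average daily-suite runtime in this window: 331m 02s (~331.03 minutes).
daily_avg = 331 + 2 / 60
print(over_budget(daily_avg, budget_minutes=300))  # True: past the 330m limit
print(over_budget(daily_avg, budget_minutes=360))  # False: within the 396m limit
```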

Category Execution Ratios

How computed

Category total executions means the sum of that category's observed test executions across every daily-suite run in the selected window.

Strict Failure Ratio = failed executions for that category divided by total executions for that category across the window.

Non-pass Ratio = (failed + pending + skipped) executions for that category divided by total executions for that category across the window.

Example: if Billing executed 800 times across the week and 2 of those executions failed, Billing strict failure ratio is 0.25%. That does not mean 0.25% of pipelines failed; it means 0.25% of observed Billing executions ended in failed.
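
Both ratios can be reproduced with a short script. A minimal sketch using the worked Billing example above (the counts are illustrative, taken from that example rather than from the real artifacts):

```python
def failure_ratios(failed, pending, skipped, total):
    """Return (strict failure ratio, non-pass ratio) as percentages."""
    strict = failed / total * 100
    non_pass = (failed + pending + skipped) / total * 100
    return strict, non_pass

# Worked example from the text: Billing executed 800 times, 2 failures.
strict, non_pass = failure_ratios(failed=2, pending=0, skipped=0, total=800)
print(f"strict={strict:.2f}% non-pass={non_pass:.2f}%")  # strict=0.25% non-pass=0.25%
```

With zero pending and skipped executions the two ratios coincide, which is why every category in this window shows identical strict and non-pass figures.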


[Chart] Daily Suite Status, 04-11 to 04-18
[Chart] Daily Smoke Attempts, 04-11 to 04-18
[Chart] Daily Average Daily Suite Runtime, 04-11 to 04-18
[Chart] Daily Average Smoke Runtime, 04-11 to 04-18
[Chart] Daily Suite Total Test Growth (Recent 8 Runs): 1,159 to 1,244
[Chart] Smoke Suite Total Test Growth (Latest Run Per Day): Frontend 110 (04-15, 04-17), University 10 (04-15, 04-17)

Category Aggregate Table


Category | Total | Failed | Pending | Skipped | Failure Ratio | Non-pass Ratio | Runs With Failures
Billing  |   864 |      1 |       0 |       0 |         0.12% |          0.12% |                  1
Web      |  6240 |      9 |       0 |       0 |         0.14% |          0.14% |                  3
Frontend |  2160 |     13 |       0 |       0 |         0.60% |          0.60% |                  7
Library  |   433 |      3 |       0 |       0 |         0.69% |          0.69% |                  3

Recent Runs

Recent Daily Suite Runs

Date             | Pipeline | Suites                          | Status | Summary
2026-04-11 18:25 | 152054   | Billing, Web, Frontend, Library | FAILED | Total 1159 | Passed 1157 | Failed 2 | Incomplete suite counts
2026-04-13 14:34 | 152056   | Billing, Web, Frontend, Library | FAILED | Total 1159 | Passed 1157 | Failed 2 | Incomplete suite counts
2026-04-14 14:42 | 152058   | Billing, Web, Frontend, Library | FAILED | Total 1159 | Passed 1155 | Failed 4 | Incomplete suite counts
2026-04-14 18:25 | 152092   | Billing, Web, Frontend, Library | FAILED | Total 1244 | Passed 1243 | Failed 1
2026-04-15 18:24 | 152229   | Billing, Web, Frontend, Library | PASSED | Total 1244 | Passed 1244 | Failed 0
2026-04-16 19:10 | 152398   | Billing, Web, Frontend, Library | FAILED | Total 1244 | Passed 1239 | Failed 5
2026-04-17 18:25 | 152610   | Billing, Web, Frontend, Library | FAILED | Total 1244 | Passed 1237 | Failed 7
2026-04-18 18:25 | 152618   | Billing, Web, Frontend, Library | FAILED | Total 1244 | Passed 1239 | Failed 5

Recent Smoke Attempts

Date             | Suite      | Pipeline | Job              | Status | Passed | Failed | Duration
2026-04-15 16:56 | University | 152212   | University smoke | PASSED |     10 |      0 | 2m 47s
2026-04-15 17:00 | Frontend   | 152212   | Frontend smoke   | PASSED |    110 |      0 | 3m 16s
2026-04-17 13:12 | University | 152536   | University smoke | PASSED |     10 |      0 | 2m 53s
2026-04-17 13:16 | Frontend   | 152536   | Frontend smoke   | PASSED |    110 |      0 | 3m 22s

Smoke Suite Breakdown

Frontend — 2 attempts across 2 pipelines, 100% green
  Passed: 2 | Failed: 0 | Incomplete: 0
  Avg runtime: 3m 19s | Median passing runtime: 3m 19s | Pipelines: 2

University — 2 attempts across 2 pipelines, 100% green
  Passed: 2 | Failed: 0 | Incomplete: 0
  Avg runtime: 2m 50s | Median passing runtime: 2m 50s | Pipelines: 2
Generated from GitLab project adservio/helm2. Times are shown in Europe/Bucharest. Daily-suite runtime is measured from GitLab pipeline and job timestamps. Category counts come from GitLab test-report JSON artifacts, with job-trace fallback when older artifacts have expired.