Engineering Council Test Reliability Report

Scope aligned with Slack channel #dezvoltare, covering 2026-06-21 07:00 to 2026-06-28 07:00. Metrics and timings are sourced from GitLab pipelines, jobs, and test-report artifacts for the daily 6 PM regression suite and the production smoke suite. Trend charts use daily buckets across this window.

Executive Snapshot

Daily Runs: 7
Daily Green: 0/7
Avg Daily Runtime: 27m 38s
Smoke Attempts: 16
Smoke Green: 15/16
Avg Smoke Runtime: 4m 11s
Median Smoke Time: 4m 34s
Current Green Streak: 0

Executive Analysis

Bottom line: the regression system is informative but not calm. The data suggest repeatable problem areas rather than random breakage, which means focused ownership should move the needle quickly.

What Matters

  • Daily regression passed 0 of 7 runs (0.0%), with a current green streak of 0 and a best streak of 0 in this window. The latest daily run (162155) failed, so the system is ending the week in a failing state rather than a clean one. Two of the failed runs (162153 and 162155) never reached complete daily-suite counts, which points to some infrastructure or setup noise mixed into the product signal.
  • Smoke passed 15 of 16 attempts (93.8%) across 12 production pipelines. The single failed attempt (Frontend smoke, pipeline 162039) never reached test execution counts at all.
  • Failure concentration is not random: Frontend has both the highest strict failure ratio (3.29%) and the broadest non-pass footprint (4.33%), and it accounts for 73 of the 95 failed executions in the window.
  • Frontend is the weakest smoke surface in this window at 11/12 green (91.7%).
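  • Daily-suite runtime averaged 27m 38s. Observed daily test volume held at 520 to 521 through 06-25, then dropped to 456 on 06-26 and 06-27, the same two runs flagged for incomplete suite counts; the drop likely reflects missing results rather than removed tests.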
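The streak figures above follow from the ordered list of daily run statuses. This is a minimal sketch, assuming statuses are available oldest first as the strings PASSED or FAILED; the `green_streaks` helper is hypothetical and not part of the report generator.

```python
from typing import Sequence


def green_streaks(statuses: Sequence[str]) -> tuple[int, int]:
    """Return (current_streak, best_streak) of consecutive PASSED runs.

    Assumes `statuses` is ordered oldest to newest, so the current streak
    is the run of PASSED values ending at the latest run.
    """
    best = 0
    current = 0
    for status in statuses:
        if status == "PASSED":
            current += 1
            best = max(best, current)
        else:
            current = 0  # any non-green run resets the streak
    return current, best


# The seven daily runs in this window all failed.
daily = ["FAILED"] * 7
print(green_streaks(daily))  # (0, 0)
```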

Engineering Analysis

  • A release gate should fail loudly for product regressions and quietly for infrastructure noise. The incomplete daily runs and the incomplete smoke attempt suggest those two failure modes are still partially mixed together.
  • The failure profile is concentrated enough to act on. Frontend carries the strongest signal on both strict and non-pass ratios, with Billing (2.51% strict) a distant second, which means reliability work should be assigned by category ownership instead of treating the suite as one undifferentiated problem.
  • The broader daily suite is carrying more instability than smoke (0 of 7 daily runs green versus 15 of 16 smoke attempts), which usually means product regressions are escaping into wider coverage areas even when the narrow deploy gate looks acceptable.
  • At an average of 27m 38s per run, the daily suite is large enough that runtime itself has become a variable to manage. At that size, every additional flaky or redundant test has a measurable cost on feedback speed.

Recommended Actions

  • Split incomplete execution failures from real assertion failures in the report narrative. Setup breakage should stay visible, but it should not look identical to a product regression in the executive readout.
  • Assign one owner to Frontend for the next cycle and expect a short written burn-down: top failing tests, suspected root causes, flake versus regression breakdown, and what gets fixed or quarantined first.
  • Treat the daily regression suite like an operations queue until it is calm again: triage failures after each red run, close known-noise items fast, and avoid letting multiple unrelated red signals pile up between runs.
  • Put Frontend smoke under closer guardrails for the next release cycle. It is the best place to improve first-pass deploy confidence quickly.
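The split between incomplete executions and real assertion failures recommended above could start as a simple triage rule. This is a minimal sketch under a stated assumption: a failed run counts as an incomplete execution when it has no test counts or its suite counts are flagged incomplete, and as an assertion failure otherwise. The `Run` type and `classify` function are hypothetical; the pipeline IDs and counts come from the Recent Runs tables in this report.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Run:
    pipeline: int
    status: str               # "PASSED" or "FAILED"
    failed: Optional[int]     # None when no test counts were recorded
    incomplete: bool = False  # True when suite counts were flagged incomplete


def classify(run: Run) -> str:
    """Hypothetical triage rule separating setup noise from product signal."""
    if run.status == "PASSED":
        return "green"
    if run.failed is None or run.incomplete:
        return "incomplete execution"  # never produced complete counts
    return "assertion failure"         # tests ran and some of them failed


runs = [
    Run(161024, "FAILED", 11),
    Run(161190, "FAILED", 2),
    Run(161325, "FAILED", 4),
    Run(161648, "FAILED", 14),
    Run(161855, "FAILED", 30),
    Run(162153, "FAILED", 30, incomplete=True),
    Run(162155, "FAILED", 4, incomplete=True),
    Run(162039, "FAILED", None),  # Frontend smoke attempt with n/a counts
]
for run in runs:
    print(run.pipeline, classify(run))
```

With this rule the five complete daily failures read as assertion failures, while 162153, 162155 and smoke pipeline 162039 read as incomplete executions. Pipeline 162153 still recorded 30 failures, so the incomplete label should route a run to setup triage without hiding its failures.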

Improvement Ideas

  • Introduce a small reliability budget for tests: every flaky or quarantined case needs an owner and an expiry, and the team should review that budget weekly the same way it reviews bugs or incidents.
  • Track first-fail to root-cause time as a core metric. Fast diagnosis is as important as raw pass rate because the practical value of a test gate depends on how quickly it helps the team recover.
  • Define a runtime budget per suite and require justification when test count or duration grows. Reliable feedback systems stay trusted when they remain both stable and proportionate.
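The reliability budget in the first idea could be tracked as a small data structure. This is a minimal sketch, assuming each quarantined test is recorded with an owner and an expiry date; the `QuarantineEntry` type, the `review_budget` helper, the budget limit and the test IDs and owner in the usage example are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class QuarantineEntry:
    test_id: str   # hypothetical identifier, e.g. "frontend/checkout_flow"
    owner: str     # person accountable for fixing or deleting the test
    expires: date  # after this date the quarantine must be re-justified


def review_budget(
    entries: list[QuarantineEntry], today: date, limit: int
) -> dict[str, object]:
    """Weekly review: list expired or ownerless entries and check the size cap."""
    return {
        "expired": [e.test_id for e in entries if e.expires < today],
        "ownerless": [e.test_id for e in entries if not e.owner.strip()],
        "over_budget": len(entries) > limit,
    }


# Usage with made-up entries, reviewed at the end of this report window.
entries = [
    QuarantineEntry("frontend/checkout_flow", "frontend-owner", date(2026, 7, 3)),
    QuarantineEntry("billing/invoice_pdf", "", date(2026, 6, 20)),
]
print(review_budget(entries, today=date(2026, 6, 28), limit=5))
```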

Category Execution Ratios

How computed

Category total executions means the sum of that category's observed test executions across every daily-suite run in the selected window.

Strict Failure Ratio = failed executions for that category divided by total executions for that category across the window.

Non-pass Ratio = (failed + pending + skipped) executions for that category divided by total executions for that category across the window.

Example: if Billing executed 800 times across the week and 2 of those executions failed, Billing's strict failure ratio is 0.25%. That does not mean 0.25% of pipelines failed; it means 0.25% of observed Billing executions ended in failure.
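The two ratios can be written directly as functions of a category's counts. This is a minimal sketch; the function names are hypothetical, and the inputs are the worked Billing example above and the Frontend counts from the Category Aggregate Table.

```python
def strict_failure_ratio(failed: int, total: int) -> float:
    """Failed executions divided by total executions, as a percentage."""
    return 100.0 * failed / total


def non_pass_ratio(failed: int, pending: int, skipped: int, total: int) -> float:
    """(failed + pending + skipped) divided by total executions, as a percentage."""
    return 100.0 * (failed + pending + skipped) / total


# Worked example: 2 failures in 800 Billing executions.
print(f"{strict_failure_ratio(2, 800):.2f}%")          # 0.25%

# Frontend row: 2218 total, 73 failed, 0 pending, 23 skipped.
print(f"{strict_failure_ratio(73, 2218):.2f}%")        # 3.29%
print(f"{non_pass_ratio(73, 0, 23, 2218):.2f}%")       # 4.33%
```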

Daily trend charts (06-21 to 06-27): Daily Suite Status, Smoke Attempts, Average Daily Suite Runtime, Average Smoke Runtime, and Daily Suite Total Test Growth (Recent 7 Runs).
Smoke Suite Total Test Growth (Latest Run Per Day), series Frontend and University: Frontend 110 tests on 06-22, 06-24 and 06-25, and 0 on 06-26; University 60 tests on 06-22 and 06-25.

Category Aggregate Table

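Category | Total | Failed | Pending | Skipped | Strict Failure Ratio | Non-pass Ratio | Runs With Failures
Billing | 796 | 20 | 0 | 0 | 2.51% | 2.51% | 2
Web | 70 | 0 | 0 | 0 | 0.00% | 0.00% | 0
Frontend | 2218 | 73 | 0 | 23 | 3.29% | 4.33% | 7
Library | 432 | 2 | 0 | 0 | 0.46% | 0.46% | 2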

Recent Runs

Recent Daily Suite Runs

Date | Pipeline | Suites | Status | Summary
2026-06-21 18:14 | 161024 | Billing, Web, Frontend, Library | FAILED | Total 520, Passed 509, Failed 11
2026-06-22 18:13 | 161190 | Billing, Web, Frontend, Library | FAILED | Total 521, Passed 517, Failed 2
2026-06-23 18:14 | 161325 | Billing, Web, Frontend, Library | FAILED | Total 521, Passed 516, Failed 4
2026-06-24 20:08 | 161648 | Billing, Web, Frontend, Library | FAILED | Total 521, Passed 492, Failed 14
2026-06-25 18:14 | 161855 | Billing, Web, Frontend, Library | FAILED | Total 521, Passed 488, Failed 30
2026-06-26 18:16 | 162153 | Billing, Web, Frontend, Library | FAILED | Total 456, Passed 425, Failed 30, Incomplete suite counts
2026-06-27 18:14 | 162155 | Billing, Web, Frontend, Library | FAILED | Total 456, Passed 451, Failed 4, Incomplete suite counts
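The run summaries should reconcile with the category table, since category totals are defined as sums over every daily-suite run. A minimal cross-check, with the counts copied by hand from the daily runs table: totals should sum to 3516 (796 + 70 + 2218 + 432), failures to 95 (20 + 0 + 73 + 2), and the executions that neither passed nor failed should equal the 23 skipped Frontend executions, because pending is 0 in every category.

```python
# (total, passed, failed) for each daily run, keyed by pipeline ID.
runs = {
    161024: (520, 509, 11),
    161190: (521, 517, 2),
    161325: (521, 516, 4),
    161648: (521, 492, 14),
    161855: (521, 488, 30),
    162153: (456, 425, 30),
    162155: (456, 451, 4),
}

total_executions = sum(t for t, _, _ in runs.values())
total_failed = sum(f for _, _, f in runs.values())
# Executions that neither passed nor failed (skipped or pending).
not_run = sum(t - p - f for t, p, f in runs.values())

print(total_executions, total_failed, not_run)  # 3516 95 23
```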

Recent Smoke Attempts

Date | Suite | Pipeline | Job | Status | Passed | Failed | Duration
2026-06-22 11:28 | University | 161065 | University smoke | PASSED | 60 | 0 | 3m 58s
2026-06-22 11:32 | Frontend | 161065 | Frontend smoke | PASSED | 110 | 0 | 4m 35s
2026-06-22 13:50 | Frontend | 161134 | Frontend smoke | PASSED | 110 | 0 | 4m 42s
2026-06-24 11:16 | Frontend | 161380 | Frontend smoke | PASSED | 110 | 0 | 4m 42s
2026-06-24 15:00 | Frontend | 161543 | Frontend smoke | PASSED | 110 | 0 | 4m 49s
2026-06-25 08:06 | Frontend | 161611 | Frontend smoke | PASSED | 110 | 0 | 4m 53s
2026-06-25 10:50 | Frontend | 161673 | Frontend smoke | PASSED | 110 | 0 | 4m 34s
2026-06-25 14:43 | Frontend | 161779 | Frontend smoke | PASSED | 110 | 0 | 4m 41s
2026-06-25 16:36 | University | 161825 | University smoke | PASSED | 60 | 0 | 3m 46s
2026-06-25 16:40 | Frontend | 161825 | Frontend smoke | PASSED | 110 | 0 | 4m 32s
2026-06-25 17:08 | University | 161842 | University smoke | PASSED | 60 | 0 | 3m 33s
2026-06-25 17:12 | Frontend | 161842 | Frontend smoke | PASSED | 110 | 0 | 4m 45s
2026-06-25 22:41 | University | 161863 | University smoke | PASSED | 60 | 0 | 3m 24s
2026-06-25 22:44 | Frontend | 161863 | Frontend smoke | PASSED | 110 | 0 | 4m 34s
2026-06-26 10:29 | Frontend | 161923 | Frontend smoke | PASSED | 110 | 0 | 4m 30s
2026-06-26 14:10 | Frontend | 162039 | Frontend smoke | FAILED | n/a | n/a | 1m 01s
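The smoke runtime figures can be reproduced from this table. This is a minimal sketch, assuming the average covers every attempt (including the 1m 01s failed one) and the median covers passing attempts only, which matches the Frontend figures in the Smoke Suite Breakdown; `parse_duration` and `fmt` are hypothetical helpers. The report does not state its rounding rule, and Python's `round` rounds halves to even, so a value that lands exactly on a half second (such as the University median of 219.5 s) may display one second off from the report.

```python
from statistics import mean, median


def parse_duration(text: str) -> int:
    """Convert a duration such as '4m 35s' into whole seconds."""
    minutes, seconds = text.split()
    return int(minutes.rstrip("m")) * 60 + int(seconds.rstrip("s"))


def fmt(seconds: float) -> str:
    """Format seconds as 'Xm YYs', rounding to the nearest second (halves to even)."""
    whole = round(seconds)
    return f"{whole // 60}m {whole % 60:02d}s"


# Frontend smoke attempts copied from the table above, as (status, duration).
frontend = [
    ("PASSED", "4m 35s"), ("PASSED", "4m 42s"), ("PASSED", "4m 42s"),
    ("PASSED", "4m 49s"), ("PASSED", "4m 53s"), ("PASSED", "4m 34s"),
    ("PASSED", "4m 41s"), ("PASSED", "4m 32s"), ("PASSED", "4m 45s"),
    ("PASSED", "4m 34s"), ("PASSED", "4m 30s"), ("FAILED", "1m 01s"),
]

all_secs = [parse_duration(d) for _, d in frontend]
passing_secs = [parse_duration(d) for status, d in frontend if status == "PASSED"]

print("Avg runtime", fmt(mean(all_secs)))                   # Avg runtime 4m 22s
print("Median passing runtime", fmt(median(passing_secs)))  # Median passing runtime 4m 41s
```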

Smoke Suite Breakdown

Suite | Attempts | Pipelines | Green | Passed | Failed | Incomplete | Avg runtime | Median passing runtime
Frontend | 12 | 12 | 92% | 11 | 1 | 1 | 4m 22s | 4m 41s
University | 4 | 4 | 100% | 4 | 0 | 0 | 3m 40s | 3m 39s

Generated from GitLab project adservio/helm2. Times are shown in Europe/Bucharest. Daily-suite runtime is measured from GitLab pipeline and job timestamps. Category counts come from GitLab test-report JSON artifacts, with job-trace fallback when older artifacts have expired.