Engineering Council Test Reliability Report

Scope aligned with Slack channel #dezvoltare, covering 2026-06-06 07:00 to 2026-06-13 07:00. Metrics and timings are sourced from GitLab pipelines, jobs, and test-report artifacts for the daily 6 PM regression suite and the production smoke suite. Trend charts use daily buckets across this window.

Executive Snapshot

Daily Runs: 7
Daily Green: 0/7
Avg Daily Runtime: 9m 35s
Smoke Attempts: 20
Smoke Green: 16/20
Avg Smoke Runtime: 6m 02s
Median Smoke Time: 4m 32s
Current Green Streak: 0

Executive Analysis

Bottom line: the weakest link is smoke reliability, not test speed. The suite can still provide signal, but deploy confidence is being taxed by failed or noisy smoke attempts.

What Matters

  • Daily regression passed 0 of 7 runs (0.0%), so both the current and the best green streak in this window are 0. The latest daily run (160131) failed, so the system is ending the week under tension rather than in a clean state. None of the 7 failed runs reached complete daily-suite counts, which points to infrastructure or setup noise mixed into the product signal.
  • Smoke passed 16 of 20 attempts (80.0%) across 15 production pipelines. One pipeline recovered on rerun, which is useful for continuity but also a sign that first-pass deploy signal is noisier than it should be.
  • Failure concentration is not random: Billing has both the highest strict failure ratio and the highest non-pass ratio, at 0.13% each.
  • Frontend accounts for the most smoke failures in this window at 12/15 green (80.0%). University shows the same rate on fewer attempts (4/5 green).
  • Daily-suite runtime averaged 9m 35s.

Engineering Analysis

  • A release gate should fail loudly for product regressions and quietly for infrastructure noise. Rerun recoveries plus incomplete daily or smoke attempts suggest those two failure modes are still partially mixed together.
  • The failure profile is concentrated enough to act on. Billing is carrying the strongest signal on both ratios, which means reliability work should be assigned by category ownership instead of treating the suite as one undifferentiated problem.
  • The broader daily suite is carrying more instability than smoke, which usually means product regressions are escaping into wider coverage areas even when the narrow deploy gate looks acceptable.

Recommended Actions

  • Split incomplete execution failures from real assertion failures in the report narrative. Setup breakage should stay visible, but it should not look identical to a product regression in the executive readout.
  • Assign one owner to Billing for the next cycle and expect a short written burn-down: top failing tests, suspected root causes, flake versus regression breakdown, and what gets fixed or quarantined first.
  • Treat the daily regression suite like an operations queue until it is calm again: triage failures after each red run, close known-noise items fast, and avoid letting multiple unrelated red signals pile up between runs.
  • Put Frontend smoke under closer guardrails for the next release cycle. It is the best place to improve first-pass deploy confidence quickly.
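
The first action above (separating incomplete execution from assertion failures) can be sketched as a small classifier. This is an illustrative sketch only: the function name, its parameters, and the label strings are hypothetical and are not taken from the report generator. The two sample calls use the counts shown for pipelines 159149 and 160131 in Recent Daily Suite Runs.

```python
def classify_run(pipeline_status: str, failed_tests: int, counts_incomplete: bool) -> str:
    """Label a run so setup breakage and product regressions read differently.

    All names and labels here are assumptions made for illustration.
    """
    if pipeline_status == "PASSED":
        return "green"
    if failed_tests > 0 and counts_incomplete:
        return "assertion failure + incomplete execution"
    if failed_tests > 0:
        return "assertion failure"
    if counts_incomplete:
        return "incomplete execution"
    return "unclassified failure"


# Pipeline 159149: FAILED, 0 failed tests, incomplete suite counts.
print(classify_run("FAILED", 0, True))   # incomplete execution
# Pipeline 160131: FAILED, 1 failed test, incomplete suite counts.
print(classify_run("FAILED", 1, True))   # assertion failure + incomplete execution
```

Keeping the combined label separate means a run with both problems is not silently counted as only one of them.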

Improvement Ideas

  • Introduce a small reliability budget for tests: every flaky or quarantined case needs an owner and an expiry, and the team should review that budget weekly the same way it reviews bugs or incidents.
  • Track first-fail to root-cause time as a core metric. Fast diagnosis is as important as raw pass rate because the practical value of a test gate depends on how quickly it helps the team recover.
  • Define a runtime budget per suite and require justification when test count or duration grows. Reliable feedback systems stay trusted when they remain both stable and proportionate.
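
One way to make the reliability budget concrete is a small registry in which every quarantined test carries an owner and an expiry date. The sketch below is a hypothetical illustration: `QuarantineEntry`, `expired_entries`, `within_budget`, and the sample entries are invented placeholders, not real tests or tooling from this project.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class QuarantineEntry:
    """Hypothetical record for one flaky or quarantined test."""
    test_name: str
    owner: str
    expires: date


def expired_entries(entries: list[QuarantineEntry], today: date) -> list[QuarantineEntry]:
    """Return entries whose expiry date is strictly before the review day."""
    return [entry for entry in entries if entry.expires < today]


def within_budget(entries: list[QuarantineEntry], max_entries: int) -> bool:
    """True when the number of quarantined tests does not exceed the budget."""
    return len(entries) <= max_entries


# Placeholder data for illustration only.
entries = [
    QuarantineEntry("example_flaky_test_a", "owner-a", date(2026, 6, 10)),
    QuarantineEntry("example_flaky_test_b", "owner-b", date(2026, 6, 30)),
]
review_day = date(2026, 6, 13)

for entry in expired_entries(entries, review_day):
    print(f"{entry.test_name} (owner: {entry.owner}) expired on {entry.expires}")
```

A weekly review would then only need the expired list and the budget check, rather than a full pass over every quarantined case.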

Category Execution Ratios

How computed

Category total executions means the sum of that category's observed test executions across every daily-suite run in the selected window.

Strict Failure Ratio = failed executions for that category divided by total executions for that category across the window.

Non-pass Ratio = (failed + pending + skipped) executions for that category divided by total executions for that category across the window.

Example: if Billing executed 800 times across the week and 2 of those executions failed, Billing strict failure ratio is 0.25%. That does not mean 0.25% of pipelines failed; it means 0.25% of observed Billing executions ended in failed.
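
The two ratio definitions above can be sketched in Python. This is a minimal illustration rather than the report's actual implementation: `CategoryCounts` and both function names are hypothetical, and returning 0.0 for a category with zero executions is an assumption chosen to match the 0.00% shown for such categories in the aggregate table.

```python
from dataclasses import dataclass


@dataclass
class CategoryCounts:
    """Hypothetical per-category totals summed across the window."""
    total: int
    failed: int = 0
    pending: int = 0
    skipped: int = 0


def strict_failure_ratio(counts: CategoryCounts) -> float:
    """Failed executions as a percentage of total executions."""
    if counts.total == 0:
        return 0.0  # assumption: no executions is reported as 0.00%
    return 100.0 * counts.failed / counts.total


def non_pass_ratio(counts: CategoryCounts) -> float:
    """Failed, pending, and skipped executions as a percentage of total."""
    if counts.total == 0:
        return 0.0  # assumption: no executions is reported as 0.00%
    non_pass = counts.failed + counts.pending + counts.skipped
    return 100.0 * non_pass / counts.total


# The worked example from the text: 2 failures in 800 executions.
example = CategoryCounts(total=800, failed=2)
print(f"{strict_failure_ratio(example):.2f}%")  # 0.25%

# Billing in this window: 1 failure in 756 executions.
billing = CategoryCounts(total=756, failed=1)
print(f"{strict_failure_ratio(billing):.2f}%")  # 0.13%
```

Because pending and skipped are both 0 for Billing, its non-pass ratio equals its strict failure ratio.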

Charts (daily buckets, 06-06 to 06-12; plotted values not recoverable from this text except where noted):

  • Daily Suite Status
  • Daily Smoke Attempts
  • Average Daily Suite Runtime
  • Average Smoke Runtime
  • Daily Suite Total Test Growth (Recent 7 Runs)
  • Smoke Suite Total Test Growth (Latest Run Per Day): Frontend at 110 tests on each day from 06-08 to 06-12; University at 60 tests on 06-08, 06-10, and 06-11.

Category Aggregate Table

Category  Total  Failed  Pending  Skipped  Failure Ratio  Non-pass Ratio  Runs With Failures
Billing   756    1       0        0        0.13%          0.13%           1
Web       0      0       0        0        0.00%          0.00%           7
Frontend  0      0       0        0        0.00%          0.00%           7
Library   602    0       0        0        0.00%          0.00%           0

Recent Runs

Recent Daily Suite Runs

Date              Pipeline  Suites                           Status  Summary
2026-06-06 18:12  159149    Billing, Web, Frontend, Library  FAILED  Total 194 | Passed 194 | Failed 0 | Incomplete suite counts
2026-06-07 18:12  159156    Billing, Web, Frontend, Library  FAILED  Total 194 | Passed 194 | Failed 0 | Incomplete suite counts
2026-06-08 18:13  159349    Billing, Web, Frontend, Library  FAILED  Total 194 | Passed 194 | Failed 0 | Incomplete suite counts
2026-06-09 18:13  159589    Billing, Web, Frontend, Library  FAILED  Total 194 | Passed 194 | Failed 0 | Incomplete suite counts
2026-06-10 18:13  159862    Billing, Web, Frontend, Library  FAILED  Total 194 | Passed 194 | Failed 0 | Incomplete suite counts
2026-06-11 18:13  160006    Billing, Web, Frontend, Library  FAILED  Total 194 | Passed 194 | Failed 0 | Incomplete suite counts
2026-06-12 18:12  160131    Billing, Web, Frontend, Library  FAILED  Total 194 | Passed 193 | Failed 1 | Incomplete suite counts

Recent Smoke Attempts

Date              Suite       Pipeline  Job               Status  Passed  Failed  Duration
2026-06-08 16:27  University  159345    University smoke  FAILED  59      1       3m 47s
2026-06-08 16:31  Frontend    159345    Frontend smoke    PASSED  110     0       4m 30s
2026-06-09 11:52  Frontend    159403    Frontend smoke    PASSED  110     0       4m 45s
2026-06-09 13:54  Frontend    159469    Frontend smoke    PASSED  110     0       4m 24s
2026-06-10 14:40  University  159759    University smoke  PASSED  60      0       3m 35s
2026-06-10 14:45  Frontend    159759    Frontend smoke    PASSED  110     0       4m 55s
2026-06-10 15:19  University  159785    University smoke  PASSED  60      0       3m 50s
2026-06-10 15:22  Frontend    159785    Frontend smoke    PASSED  110     0       4m 39s
2026-06-10 16:05  Frontend    159807    Frontend smoke    PASSED  110     0       4m 35s
2026-06-11 12:11  University  159923    University smoke  PASSED  60      0       3m 29s
2026-06-11 12:14  Frontend    159923    Frontend smoke    PASSED  110     0       4m 14s
2026-06-11 12:34  Frontend    159928    Frontend smoke    PASSED  110     0       4m 43s
2026-06-11 13:42  Frontend    159939    Frontend smoke    PASSED  110     0       4m 37s
2026-06-11 14:54  University  159980    University smoke  PASSED  60      0       3m 34s
2026-06-11 14:59  Frontend    159980    Frontend smoke    PASSED  110     0       4m 30s
2026-06-12 10:11  Frontend    160024    Frontend smoke    FAILED  80      3       12m 46s
2026-06-12 11:50  Frontend    160043    Frontend smoke    FAILED  60      50      21m 51s
2026-06-12 13:59  Frontend    160069    Frontend smoke    FAILED  86      24      12m 22s
2026-06-12 14:37  Frontend    160099    Frontend smoke    PASSED  110     0       4m 52s
2026-06-12 15:22  Frontend    160111    Frontend smoke    PASSED  110     0       4m 35s
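
The per-suite average runtimes reported for smoke can be cross-checked from the durations in the attempts table above. The sketch below does this for the five University attempts; `parse_duration` and `format_duration` are hypothetical helpers written for this illustration, and the assumption is that durations always appear in the "Xm YYs" form used in this report.

```python
import re


def parse_duration(text: str) -> int:
    """Convert a duration such as '3m 47s' to whole seconds."""
    match = re.fullmatch(r"(\d+)m (\d+)s", text.strip())
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    minutes, seconds = (int(part) for part in match.groups())
    return minutes * 60 + seconds


def format_duration(seconds: float) -> str:
    """Format seconds back into the report's 'Xm YYs' style."""
    total = round(seconds)
    return f"{total // 60}m {total % 60:02d}s"


# University smoke durations, in table order.
university = ["3m 47s", "3m 35s", "3m 50s", "3m 29s", "3m 34s"]
average = sum(parse_duration(d) for d in university) / len(university)
print(format_duration(average))  # 3m 39s
```

The result matches the University average runtime of 3m 39s given in the Smoke Suite Breakdown.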

Smoke Suite Breakdown

Frontend: 15 attempts across 15 pipelines, 80% green
  Passed: 12
  Failed: 3
  Incomplete: 0
  Avg runtime: 6m 49s
  Median passing runtime: 4m 36s

University: 5 attempts across 5 pipelines, 80% green
  Passed: 4
  Failed: 1
  Incomplete: 0
  Avg runtime: 3m 39s
  Median passing runtime: 3m 35s

Generated from GitLab project adservio/helm2. Times are shown in Europe/Bucharest. Daily-suite runtime is measured from GitLab pipeline and job timestamps. Category counts come from GitLab test-report JSON artifacts, with job-trace fallback when older artifacts have expired.