Engineering Council Test Reliability Report

Scope aligned with Slack channel #dezvoltare, covering 2026-06-28 07:00 to 2026-07-05 07:00. Metrics and timings are sourced from GitLab pipelines, jobs, and test-report artifacts for the daily 6 PM regression suite and the production smoke suite. Trend charts use daily buckets across this window.

Executive Snapshot

Daily Runs: 7
Daily Green: 0/7
Avg Daily Runtime: 65m 48s
Smoke Attempts: 14
Smoke Green: 11/14
Avg Smoke Runtime (all attempts): 3m 18s
Median Smoke Time (passing attempts): 4m 13s
Current Green Streak: 0

Executive Analysis

Bottom line: release confidence is unstable in both the broad regression path and the deploy smoke path. The immediate job is to separate real product regressions from execution noise, then burn down the concentrated failure clusters.

What Matters

  • Daily regression passed 0 of 7 runs (0.0%); the current and best green streaks in this window are both 0. The latest daily run (162871) failed, so the week ends red rather than in a clean state. All 7 failed runs reported incomplete daily-suite counts, which points to infrastructure or setup noise mixed into the product signal.
  • Smoke passed 11 of 14 attempts (78.6%) across 9 production pipelines. All 3 failed attempts ended before any test execution counts were recorded.
  • Failure concentration is not random: Library has the highest strict failure ratio at 0.97%, while Billing has the broadest non-pass footprint at 3.51%.
  • Frontend is the weakest smoke surface in this window at 7/9 green (77.8%).
  • Daily-suite runtime averaged 65m 48s.
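
The pass rates and green streaks quoted above can be reproduced with a few lines of Python. This is a minimal sketch, not the report generator's actual code: the function names and the "passed"/"failed" status strings are assumptions, and runs are assumed to be listed oldest first.

```python
from typing import List, Tuple


def pass_rate(statuses: List[str]) -> float:
    """Percentage of runs whose status is 'passed' (0.0 for an empty list)."""
    if not statuses:
        return 0.0
    return 100.0 * sum(s == "passed" for s in statuses) / len(statuses)


def green_streaks(statuses: List[str]) -> Tuple[int, int]:
    """Return (current, best) streak of consecutive passed runs, oldest first."""
    current = best = 0
    for s in statuses:
        # Any non-passed run resets the running streak.
        current = current + 1 if s == "passed" else 0
        best = max(best, current)
    return current, best


# Status sequences taken from this report's run tables (assumed encoding).
daily = ["failed"] * 7
smoke = ["failed"] * 3 + ["passed"] * 11

print(f"daily: {pass_rate(daily):.1f}% green, streaks {green_streaks(daily)}")
print(f"smoke: {pass_rate(smoke):.1f}% green")
```

With the window's data this gives 0.0% and streaks of (0, 0) for the daily suite, and 78.6% for smoke.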

Engineering Analysis

  • A release gate should fail loudly for product regressions and quietly for infrastructure noise. Rerun recoveries plus incomplete daily or smoke attempts suggest those two failure modes are still partially mixed together.
  • The failure profile is concentrated enough to act on. Library and Billing are carrying the strongest signal, which means reliability work should be assigned by category ownership instead of treating the suite as one undifferentiated problem.
  • The broader daily suite is carrying more instability than smoke, which usually means product regressions are escaping into wider coverage areas even when the narrow deploy gate looks acceptable.
  • The daily suite is now large enough that runtime itself is becoming a management variable at 65m 48s average duration. At that size, every additional flaky or redundant test has a measurable cost on feedback speed.

Recommended Actions

  • Split incomplete execution failures from real assertion failures in the report narrative. Setup breakage should stay visible, but it should not look identical to a product regression in the executive readout.
  • Assign one owner to Library for the next cycle and expect a short written burn-down: top failing tests, suspected root causes, flake versus regression breakdown, and what gets fixed or quarantined first.
  • Treat the daily regression suite like an operations queue until it is calm again: triage failures after each red run, close known-noise items fast, and avoid letting multiple unrelated red signals pile up between runs.
  • Put Frontend smoke under closer guardrails for the next release cycle. It is the best place to improve first-pass deploy confidence quickly.
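
One possible way to make the first recommendation concrete is a small classifier that labels each red run before it reaches the narrative. The sketch below is illustrative only: the `RunSummary` record, the label names, and the rule "red with zero recorded failures counts as incomplete execution" are all assumptions, not the report's current logic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunSummary:
    """Hypothetical record for one daily run or smoke attempt."""
    status: str            # "passed" or "failed"
    total: Optional[int]   # None when no test counts were reported
    failed: Optional[int]  # None when no test counts were reported


def classify(run: RunSummary) -> str:
    """Assumed rule set separating execution noise from assertion failures."""
    if run.status == "passed":
        return "green"
    if run.total is None or run.failed is None:
        return "no_execution"          # job died before any test reported
    if run.failed > 0:
        return "assertion_failure"     # at least one test actually failed
    return "incomplete_execution"      # red run without a failing test


# Rows taken from this report: pipelines 162305, 162157, and smoke 162170.
print(classify(RunSummary("failed", 214, 4)))       # assertion_failure
print(classify(RunSummary("failed", 214, 0)))       # incomplete_execution
print(classify(RunSummary("failed", None, None)))   # no_execution
```

Under this assumed rule, four of the seven daily runs in this window (those with Failed 0) would be labelled incomplete execution rather than product regressions.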

Improvement Ideas

  • Introduce a small reliability budget for tests: every flaky or quarantined case needs an owner and an expiry, and the team should review that budget weekly the same way it reviews bugs or incidents.
  • Track first-fail to root-cause time as a core metric. Fast diagnosis is as important as raw pass rate because the practical value of a test gate depends on how quickly it helps the team recover.
  • Define a runtime budget per suite and require justification when test count or duration grows. Reliable feedback systems stay trusted when they remain both stable and proportionate.
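
The reliability budget idea could be checked mechanically during the weekly review. The following is a rough sketch under stated assumptions: the `QuarantineEntry` record, the test identifiers, the owner name, and the `max_entries` limit are hypothetical and do not come from the project.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class QuarantineEntry:
    """Hypothetical record for one flaky or quarantined test."""
    test_id: str
    owner: Optional[str]
    expires: Optional[date]


def budget_violations(entries: List[QuarantineEntry], today: date,
                      max_entries: int) -> List[str]:
    """List every entry that lacks an owner or expiry, or has expired."""
    problems = []
    for e in entries:
        if not e.owner:
            problems.append(f"{e.test_id}: no owner")
        if e.expires is None:
            problems.append(f"{e.test_id}: no expiry")
        elif e.expires < today:
            problems.append(f"{e.test_id}: expired {e.expires.isoformat()}")
    if len(entries) > max_entries:
        problems.append(f"budget exceeded: {len(entries)} > {max_entries}")
    return problems


# Illustrative data only; these test ids and the owner are made up.
entries = [
    QuarantineEntry("library/example_a", "owner-a", date(2026, 7, 1)),
    QuarantineEntry("billing/example_b", None, None),
]
for line in budget_violations(entries, today=date(2026, 7, 5), max_entries=5):
    print(line)
```

Returning plain strings keeps the output easy to paste into a review thread; a real implementation would likely read the entries from wherever quarantine decisions are recorded.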

Category Execution Ratios

How computed

Category total executions means the sum of that category's observed test executions across every daily-suite run in the selected window.

Strict Failure Ratio = failed executions for that category divided by total executions for that category across the window.

Non-pass Ratio = (failed + pending + skipped) executions for that category divided by total executions for that category across the window.

Example: if Billing executed 800 times across the week and 2 of those executions failed, Billing strict failure ratio is 0.25%. That does not mean 0.25% of pipelines failed; it means 0.25% of observed Billing executions ended in failed.
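
The two ratio definitions above translate directly into code. This is a minimal sketch with assumed function names; returning 0.0 for a category with zero executions is also an assumption, chosen because it matches how the report displays empty categories.

```python
def strict_failure_ratio(total: int, failed: int) -> float:
    """Failed executions as a percentage of total executions."""
    return 100.0 * failed / total if total else 0.0


def non_pass_ratio(total: int, failed: int, pending: int, skipped: int) -> float:
    """Failed, pending, and skipped executions as a percentage of total."""
    return 100.0 * (failed + pending + skipped) / total if total else 0.0


# Worked example from the text: 2 failures in 800 Billing executions.
print(f"{strict_failure_ratio(800, 2):.2f}%")       # 0.25%

# Billing figures from this report: total 769, failed 2, pending 0, skipped 25.
print(f"{strict_failure_ratio(769, 2):.2f}%")       # 0.26%
print(f"{non_pass_ratio(769, 2, 0, 25):.2f}%")      # 3.51%
```

Both ratios use executions, not pipelines, as the denominator, which is why a category can show a sub-1% failure ratio while every pipeline in the window is red.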

[Chart: Daily Suite Status, daily buckets, 06-28 to 07-04]
[Chart: Smoke Attempts, daily buckets, 06-28 to 07-04]
[Chart: Average Daily Suite Runtime, daily buckets, 06-28 to 07-04]
[Chart: Average Smoke Runtime, daily buckets, 06-28 to 07-04]
[Chart: Daily Suite Total Test Growth (Recent 7 Runs), 06-28 to 07-04]
[Chart: Smoke Suite Total Test Growth (Latest Run Per Day), 06-29 to 07-03]
  • Frontend: 06-29: 0, 07-01: 110, 07-02: 110, 07-03: 110
  • University: 06-29: 0, 07-01: 60, 07-02: 60

Category Aggregate Table

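Category | Total | Failed | Pending | Skipped | Failure Ratio | Non-pass Ratio | Runs With Failures
Billing  | 769   | 2      | 0       | 25      | 0.26%         | 3.51%          | 2
Web      | 0     | 0      | 0       | 0       | 0.00%         | 0.00%          | 7
Frontend | 0     | 0      | 0       | 0       | 0.00%         | 0.00%          | 7
Library  | 517   | 5      | 0       | 0       | 0.97%         | 0.97%          | 2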

Recent Runs

Recent Daily Suite Runs

Date             | Pipeline | Suites                          | Status | Total | Passed | Failed | Pending | Suite Counts
2026-06-28 18:13 | 162157   | Billing, Web, Frontend, Library | FAILED | 214   | 214    | 0      | 0       | Incomplete
2026-06-29 18:18 | 162305   | Billing, Web, Frontend, Library | FAILED | 214   | 210    | 4      | 0       | Incomplete
2026-06-30 18:11 | 162403   | Billing, Web, Frontend, Library | FAILED | 2     | 0      | 2      | 0       | Incomplete
2026-07-01 18:14 | 162600   | Billing, Web, Frontend, Library | FAILED | 214   | 214    | 0      | 0       | Incomplete
2026-07-02 18:16 | 162742   | Billing, Web, Frontend, Library | FAILED | 214   | 188    | 1      | 0       | Incomplete
2026-07-04 00:33 | 162866   | Billing, Web, Frontend, Library | FAILED | 214   | 214    | 0      | 0       | Incomplete
2026-07-04 18:16 | 162871   | Billing, Web, Frontend, Library | FAILED | 214   | 214    | 0      | 0       | Incomplete

Recent Smoke Attempts

Date             | Suite      | Pipeline | Job              | Status | Passed | Failed | Duration
2026-06-29 09:38 | Frontend   | 162170   | Frontend smoke   | FAILED | n/a    | n/a    | 0m 07s
2026-06-29 16:24 | University | 162299   | University smoke | FAILED | n/a    | n/a    | 0m 04s
2026-06-29 16:30 | Frontend   | 162299   | Frontend smoke   | FAILED | n/a    | n/a    | 1m 07s
2026-07-01 13:06 | University | 162501   | University smoke | PASSED | 60     | 0      | 3m 19s
2026-07-01 13:10 | Frontend   | 162501   | Frontend smoke   | PASSED | 110    | 0      | 4m 23s
2026-07-02 09:46 | University | 162618   | University smoke | PASSED | 60     | 0      | 3m 40s
2026-07-02 09:50 | Frontend   | 162618   | Frontend smoke   | PASSED | 110    | 0      | 4m 49s
2026-07-02 18:57 | University | 162744   | University smoke | PASSED | 60     | 0      | 3m 26s
2026-07-02 18:59 | Frontend   | 162744   | Frontend smoke   | PASSED | 110    | 0      | 4m 13s
2026-07-02 22:18 | University | 162748   | University smoke | PASSED | 60     | 0      | 3m 12s
2026-07-02 22:22 | Frontend   | 162748   | Frontend smoke   | PASSED | 110    | 0      | 4m 41s
2026-07-03 09:31 | Frontend   | 162757   | Frontend smoke   | PASSED | 110    | 0      | 4m 02s
2026-07-03 09:42 | Frontend   | 162780   | Frontend smoke   | PASSED | 110    | 0      | 4m 24s
2026-07-03 12:40 | Frontend   | 162826   | Frontend smoke   | PASSED | 110    | 0      | 4m 50s
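
The smoke runtime figures in the Executive Snapshot can be rechecked from the durations listed above. The sketch below assumes that the average covers all 14 attempts and that the median covers passing attempts only; the report does not state this, but those assumptions reproduce its 3m 18s and 4m 13s figures. The `fmt` helper and the tuple layout are illustrative.

```python
from statistics import mean, median

# (duration in seconds, passed?) for each smoke attempt, in table order.
attempts = [
    (7, False), (4, False), (67, False),
    (199, True), (263, True), (220, True), (289, True),
    (206, True), (253, True), (192, True), (281, True),
    (242, True), (264, True), (290, True),
]


def fmt(seconds: float) -> str:
    """Format seconds in the report's 'Xm YYs' style, rounded to whole seconds."""
    whole = round(seconds)
    return f"{whole // 60}m {whole % 60:02d}s"


all_durations = [d for d, _ in attempts]
passing_durations = [d for d, ok in attempts if ok]

print("avg, all attempts:", fmt(mean(all_durations)))        # 3m 18s
print("median, passing:  ", fmt(median(passing_durations)))  # 4m 13s
```

Short failed attempts (4 to 67 seconds) pull the average well below the median of passing runs, so the two numbers should not be compared as if they described the same population.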

Smoke Suite Breakdown

Frontend
9 attempts across 9 pipelines
78% green
Passed: 7
Failed: 2
Incomplete: 2
Avg runtime: 3m 37s
Median passing runtime: 4m 24s
Pipelines: 9

University
5 attempts across 5 pipelines
80% green
Passed: 4
Failed: 1
Incomplete: 1
Avg runtime: 2m 44s
Median passing runtime: 3m 22s
Pipelines: 5

Generated from GitLab project adservio/helm2. Times are shown in Europe/Bucharest. Daily-suite runtime is measured from GitLab pipeline and job timestamps. Category counts come from GitLab test-report JSON artifacts, with job-trace fallback when older artifacts have expired.