Engineering Council Test Reliability Report

Executive Snapshot

Daily Runs: 3
Daily Green: 2/3
Avg Daily Runtime: 22m 50s
Smoke Attempts: 8
Smoke Green: 3/8
Avg Smoke Runtime: 3m 38s
Median Smoke Time: 3m 39s
Current Green Streak: 1

Executive Analysis

Bottom line: release confidence is unstable in both the broad regression path and the deploy smoke path. The immediate job is to separate real product regressions from execution noise, then burn down the concentrated failure clusters.

What Matters

Daily regression passed 2 of 3 runs (66.7%), with a current green streak of 1 and a best streak of 1 in this window.
Smoke passed 3 of 8 attempts (37.5%) across 5 production pipelines.
Failure concentration is not random: Frontend is the only category with failing executions, so it has both the highest strict failure ratio and the broadest non-pass footprint, each at 0.12% (1 failed execution out of 810).
University is the weakest smoke surface in this window at 1/3 green (33.3%).
Daily-suite runtime averaged 22m 50s.
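
The pass rates and streaks quoted above can be reproduced from the run statuses alone. The following is a minimal sketch, not the dashboard's actual implementation; the function names `pass_rate` and `green_streaks` are hypothetical, and it assumes statuses are listed oldest first and that only `PASSED` counts as green.

```python
from typing import List, Tuple


def pass_rate(statuses: List[str]) -> float:
    """Percentage of runs with status PASSED, rounded to one decimal place."""
    if not statuses:
        return 0.0
    passed = sum(1 for s in statuses if s == "PASSED")
    return round(100.0 * passed / len(statuses), 1)


def green_streaks(statuses: List[str]) -> Tuple[int, int]:
    """Return (current, best) runs of consecutive PASSED results.

    Assumes `statuses` is ordered oldest first, so the current streak
    is whatever is still running at the end of the list.
    """
    best = current = 0
    for status in statuses:
        current = current + 1 if status == "PASSED" else 0
        best = max(best, current)
    return current, best


# Statuses taken from the Recent Runs tables in this report.
daily = ["PASSED", "FAILED", "PASSED"]
smoke = ["FAILED"] * 5 + ["PASSED"] * 3

print(pass_rate(daily), green_streaks(daily))
print(pass_rate(smoke))
```

With this window's data the daily suite comes out at 66.7% with a current and best streak of 1, and smoke at 37.5%, matching the figures above.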

Engineering Analysis

The failure profile is concentrated enough to act on. Frontend carries the strongest signal on both the strict failure and non-pass measures, which means reliability work should be assigned by category ownership instead of treating the suite as one undifferentiated problem.
Smoke (3 of 8 green, 37.5%) is lagging the daily regression suite (2 of 3 green, 66.7%), so deploy readiness is probably being constrained more by environment/setup stability and narrow critical-path checks than by overall test volume.

Recommended Actions

Assign one owner to Frontend for the next cycle and expect a short written burn-down: top failing tests, suspected root causes, flake versus regression breakdown, and what gets fixed or quarantined first.
Treat the daily regression suite like an operations queue until it is calm again: triage failures after each red run, close known-noise items fast, and avoid letting multiple unrelated red signals pile up between runs.
Put University smoke under closer guardrails for the next release cycle. It is the best place to improve first-pass deploy confidence quickly.

Improvement Ideas

Introduce a small reliability budget for tests: every flaky or quarantined case needs an owner and an expiry, and the team should review that budget weekly the same way it reviews bugs or incidents.
Track first-fail to root-cause time as a core metric. Fast diagnosis is as important as raw pass rate because the practical value of a test gate depends on how quickly it helps the team recover.
Define a runtime budget per suite and require justification when test count or duration grows. Reliable feedback systems stay trusted when they remain both stable and proportionate.
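
One way to make the reliability budget reviewable is to keep quarantined tests in a small registry and check it mechanically each week. This is a sketch under stated assumptions: the `QuarantinedTest` record, the `budget_violations` function, the budget limit, and the test identifiers are all hypothetical and do not come from this report's tooling.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class QuarantinedTest:
    test_id: str              # hypothetical identifier for a flaky or quarantined test
    owner: Optional[str]      # person or team responsible for the fix
    expires: Optional[date]   # date by which the test must be fixed or removed


def budget_violations(
    entries: List[QuarantinedTest], today: date, max_entries: int
) -> List[str]:
    """List every way the registry currently breaks the budget rules."""
    problems = []
    if len(entries) > max_entries:
        problems.append(
            f"budget exceeded: {len(entries)} quarantined, limit {max_entries}"
        )
    for entry in entries:
        if not entry.owner:
            problems.append(f"{entry.test_id}: no owner")
        if entry.expires is None:
            problems.append(f"{entry.test_id}: no expiry")
        elif entry.expires < today:
            problems.append(
                f"{entry.test_id}: expired on {entry.expires.isoformat()}"
            )
    return problems


# Illustrative entries only; these tests are not named anywhere in the report.
registry = [
    QuarantinedTest("frontend/example_flaky_test", "frontend-team", date(2026, 5, 1)),
    QuarantinedTest("university/example_smoke_test", None, None),
]
issues = budget_violations(registry, today=date(2026, 5, 8), max_entries=5)
for issue in issues:
    print(issue)
```

An empty result means the budget is healthy; anything else is an agenda item for the weekly review.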

Category Execution Ratios

How computed

Category total executions means the sum of that category's observed test executions across every daily-suite run in the selected window.

Strict Failure Ratio = failed executions for that category divided by total executions for that category across the window.

Non-pass Ratio = (failed + pending + skipped) executions for that category divided by total executions for that category across the window.

Example: if Billing executed 800 times across the week and 2 of those executions failed, Billing strict failure ratio is 0.25%. That does not mean 0.25% of pipelines failed; it means 0.25% of observed Billing executions ended in failed.
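
The two ratio definitions above can be written out as a short sketch. The `CategoryCounts` type and the function names are illustrative assumptions, not the dashboard's code; the Billing figures are the hypothetical ones from the example above, and the Frontend figures come from the category table in this report.

```python
from dataclasses import dataclass


@dataclass
class CategoryCounts:
    total: int        # executions summed across every daily run in the window
    failed: int
    pending: int = 0
    skipped: int = 0


def strict_failure_ratio(c: CategoryCounts) -> float:
    """Failed executions as a percentage of total executions."""
    return 100.0 * c.failed / c.total if c.total else 0.0


def non_pass_ratio(c: CategoryCounts) -> float:
    """Failed, pending, and skipped executions as a percentage of total."""
    return 100.0 * (c.failed + c.pending + c.skipped) / c.total if c.total else 0.0


billing_example = CategoryCounts(total=800, failed=2)   # hypothetical, from the text
frontend_window = CategoryCounts(total=810, failed=1)   # from this window's table

print(f"{strict_failure_ratio(billing_example):.2f}%")
print(f"{strict_failure_ratio(frontend_window):.2f}%")
print(f"{non_pass_ratio(frontend_window):.2f}%")
```

Because the denominator is executions rather than runs, a single failing test in a large category produces a very small ratio even when it turns a whole run red.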

Strict Failure Ratio

Share of category executions that ended in failed across all daily runs in this window.

Billing: 0.00%
Web: 0.00%
Frontend: 0.12%
Library: 0.00%

Non-pass Ratio

Share of category executions that ended in failed, pending, or skipped across all daily runs in this window.

Billing: 0.00%
Web: 0.00%
Frontend: 0.12%
Library: 0.00%

Category Aggregate Table

Category	Total	Failed	Pending	Skipped	Failure Ratio	Non-pass Ratio	Runs With Failures
Billing	324	0	0	0	0.00%	0.00%	0
Web	2340	0	0	0	0.00%	0.00%	0
Frontend	810	1	0	0	0.12%	0.12%	1
Library	258	0	0	0	0.00%	0.00%	0

Recent Runs

Recent Daily Suite Runs

Date	Pipeline	Suites	Status	Total	Passed	Failed	Pending
2026-05-02 18:25	154322	Billing, Web, Frontend, Library	PASSED	1244	1244	0	0
2026-05-03 18:25	154324	Billing, Web, Frontend, Library	FAILED	1244	1243	1	0
2026-05-04 18:26	154462	Billing, Web, Frontend, Library	PASSED	1244	1244	0	0

Recent Smoke Attempts

Date	Suite	Pipeline	Job	Status	Passed	Failed	Duration
2026-05-05 16:13	University	154657	University smoke	FAILED	57	3	3m 54s
2026-05-05 16:16	Frontend	154657	Frontend smoke	FAILED	109	1	3m 32s
2026-05-05 17:09	University	154676	University smoke	FAILED	57	3	3m 43s
2026-05-05 17:12	Frontend	154676	Frontend smoke	FAILED	109	1	3m 51s
2026-05-05 18:29	Frontend	154686	Frontend smoke	FAILED	109	1	3m 29s
2026-05-05 19:34	Frontend	154691	Frontend smoke	PASSED	110	0	3m 06s
2026-05-07 15:31	University	155030	University smoke	PASSED	60	0	3m 39s
2026-05-07 15:34	Frontend	155030	Frontend smoke	PASSED	110	0	3m 47s
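
The per-suite smoke figures can be derived from the attempts table above. The sketch below is an assumption about how such a summary might be computed, not the report generator itself; `to_seconds` and `summarize` are hypothetical names. The durations in the table are whole seconds, so results may differ by a second from figures computed on unrounded timings.

```python
from statistics import mean, median

# (suite, status, duration) copied from the Recent Smoke Attempts table.
ATTEMPTS = [
    ("University", "FAILED", "3m 54s"),
    ("Frontend", "FAILED", "3m 32s"),
    ("University", "FAILED", "3m 43s"),
    ("Frontend", "FAILED", "3m 51s"),
    ("Frontend", "FAILED", "3m 29s"),
    ("Frontend", "PASSED", "3m 06s"),
    ("University", "PASSED", "3m 39s"),
    ("Frontend", "PASSED", "3m 47s"),
]


def to_seconds(text: str) -> int:
    """Convert a duration such as '3m 06s' to whole seconds."""
    minutes, seconds = text.split()
    return int(minutes.rstrip("m")) * 60 + int(seconds.rstrip("s"))


def summarize(attempts, suite: str) -> dict:
    """Aggregate attempts, green rate, and runtimes for one smoke suite."""
    rows = [(status, to_seconds(d)) for s, status, d in attempts if s == suite]
    passing = [sec for status, sec in rows if status == "PASSED"]
    return {
        "attempts": len(rows),
        "passed": len(passing),
        "failed": len(rows) - len(passing),
        "green_pct": round(100 * len(passing) / len(rows)) if rows else 0,
        "avg_seconds": mean([sec for _, sec in rows]) if rows else None,
        "median_passing_seconds": median(passing) if passing else None,
    }


frontend = summarize(ATTEMPTS, "Frontend")
university = summarize(ATTEMPTS, "University")
print(frontend)
print(university)
```

For this window, Frontend works out to 5 attempts, 40% green, and an average of 213 seconds (3m 33s); University works out to 3 attempts and 33% green.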

Smoke Suite Breakdown

Suite	Attempts	Pipelines	Green	Passed	Failed	Incomplete	Avg Runtime	Median Passing Runtime
Frontend	5	5	40%	2	3	0	3m 33s	3m 26s
University	3	3	33%	1	2	0	3m 45s	3m 39s