Engineering Council Test Reliability Report

Executive Snapshot

Daily Runs: 7
Daily Green: 2/7
Avg Daily Runtime: 27m 24s
Smoke Attempts: 18
Smoke Green: 17/18
Avg Smoke Runtime: 3m 13s
Median Smoke Time: 3m 12s
Current Green Streak: 0

Executive Analysis

Bottom line: the regression system is informative but not calm. The data suggest repeatable problem areas rather than random breakage, which means focused ownership should move the needle quickly.

What Matters

Daily regression passed 2 of 7 runs (28.6%), with a current green streak of 0 and a best streak of 2 in this window. The latest daily run (152052) failed, so the window closes on four consecutive red runs rather than in a clean state.
Smoke passed 17 of 18 attempts (94.4%) across 16 production pipelines. The single failed attempt (pipeline 151787) ended after 2 seconds without reporting any test counts.
Failure concentration is not random: Frontend has both the highest strict failure ratio and the highest non-pass ratio, at 0.27% each, with Billing close behind at 0.26%.
Frontend is the weakest smoke surface in this window at 15/16 green (93.8%).
Daily-suite runtime averaged 27m 24s, while observed daily test volume grew from 1,233 to 1,244. Run 152037 was an outlier, reporting only 468 tests.
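
The pass rate and streak figures above can be reproduced from the run statuses alone. The following is a minimal sketch, assuming the seven statuses are taken in chronological order from the Recent Daily Suite Runs table; the function names are illustrative and are not part of the reporting tool.

```python
from itertools import groupby

# Daily-suite statuses in chronological order (2026-04-04 through 2026-04-10),
# copied from the Recent Daily Suite Runs table.
daily_statuses = ["FAILED", "PASSED", "PASSED", "FAILED", "FAILED", "FAILED", "FAILED"]


def pass_rate(statuses):
    """Percentage of runs that passed."""
    return 100.0 * sum(s == "PASSED" for s in statuses) / len(statuses)


def current_green_streak(statuses):
    """Number of consecutive PASSED runs at the end of the window."""
    streak = 0
    for status in reversed(statuses):
        if status != "PASSED":
            break
        streak += 1
    return streak


def best_green_streak(statuses):
    """Longest run of consecutive PASSED results anywhere in the window."""
    return max(
        (len(list(group)) for key, group in groupby(statuses) if key == "PASSED"),
        default=0,
    )


print(round(pass_rate(daily_statuses), 1))   # 28.6
print(current_green_streak(daily_statuses))  # 0
print(best_green_streak(daily_statuses))     # 2
```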

Engineering Analysis

A release gate should fail loudly for product regressions and quietly for infrastructure noise. Rerun recoveries and incomplete smoke attempts suggest those two failure modes are still partially mixed together.
The failure profile is concentrated enough to act on. Frontend carries the strongest signal by ratio (0.27%), with Billing close behind (0.26%), so reliability work should be assigned by category ownership instead of treating the suite as one undifferentiated problem.
The broader daily suite is carrying more instability than smoke, which usually means product regressions are escaping into wider coverage areas even when the narrow deploy gate looks acceptable.
The daily suite is now large enough that runtime itself is becoming a management variable at 27m 24s average duration. At that size, every additional flaky or redundant test has a measurable cost on feedback speed.

Recommended Actions

Split smoke failures into two explicit classes: product regressions versus execution/setup failures. Incomplete attempts should never look identical to real assertion failures in the executive readout.
Assign one owner to Frontend for the next cycle and expect a short written burn-down: top failing tests, suspected root causes, flake versus regression breakdown, and what gets fixed or quarantined first.
Treat the daily regression suite like an operations queue until it is calm again: triage failures after each red run, close known-noise items fast, and avoid letting multiple unrelated red signals pile up between runs.
Put Frontend smoke under closer guardrails for the next release cycle. It is the best place to improve first-pass deploy confidence quickly.
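
The first recommendation, separating product regressions from execution/setup failures, could be approximated with a simple rule: a failed attempt that reports no test counts never reached test execution. The sketch below assumes that rule; the `SmokeAttempt` and `Outcome` names are hypothetical, and a failed attempt that does report counts may still be a flake, so this is a first-pass heuristic rather than a root-cause classifier.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Outcome(Enum):
    GREEN = "green"
    PRODUCT_REGRESSION = "product regression"
    EXECUTION_FAILURE = "execution/setup failure"


@dataclass
class SmokeAttempt:
    # Hypothetical record shape mirroring the Recent Smoke Attempts table.
    pipeline: int
    status: str            # "PASSED" or "FAILED"
    passed: Optional[int]  # None when the table shows n/a
    failed: Optional[int]  # None when the table shows n/a


def classify(attempt: SmokeAttempt) -> Outcome:
    """Assumed rule: a failed attempt with no test counts is an execution/setup failure."""
    if attempt.status == "PASSED":
        return Outcome.GREEN
    if attempt.passed is None or attempt.failed is None:
        return Outcome.EXECUTION_FAILURE
    return Outcome.PRODUCT_REGRESSION


# The one failed attempt in this window reported n/a for both counts.
incomplete = SmokeAttempt(pipeline=151787, status="FAILED", passed=None, failed=None)
print(classify(incomplete).value)  # execution/setup failure
```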

Improvement Ideas

Introduce a small reliability budget for tests: every flaky or quarantined case needs an owner and an expiry, and the team should review that budget weekly the same way it reviews bugs or incidents.
Track first-fail to root-cause time as a core metric. Fast diagnosis is as important as raw pass rate because the practical value of a test gate depends on how quickly it helps the team recover.
Define a runtime budget per suite and require justification when test count or duration grows. Reliable feedback systems stay trusted when they remain both stable and proportionate.
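
The reliability budget idea could be enforced with a small weekly check. The sketch below is illustrative only: the `QuarantinedTest` record, the `budget_violations` function, and the `max_entries` limit are assumptions, not an existing tool or policy.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class QuarantinedTest:
    # Hypothetical registry entry for a flaky or quarantined test.
    test_id: str
    owner: Optional[str]
    expires: Optional[date]


def budget_violations(entries: List[QuarantinedTest], today: date, max_entries: int) -> List[str]:
    """List every entry that lacks an owner or expiry, has expired, or exceeds the budget."""
    problems = []
    for entry in entries:
        if not entry.owner:
            problems.append(f"{entry.test_id}: no owner")
        if entry.expires is None:
            problems.append(f"{entry.test_id}: no expiry")
        elif entry.expires < today:
            problems.append(f"{entry.test_id}: expired on {entry.expires.isoformat()}")
    if len(entries) > max_entries:
        problems.append(f"budget exceeded: {len(entries)} entries, limit {max_entries}")
    return problems


# Made-up entries for illustration; these are not tests from this report.
example_entries = [
    QuarantinedTest("checkout_flow_test", owner="alice", expires=date(2026, 4, 1)),
    QuarantinedTest("login_redirect_test", owner=None, expires=date(2026, 5, 1)),
]
for problem in budget_violations(example_entries, today=date(2026, 4, 10), max_entries=5):
    print(problem)
```

Reviewing the returned list weekly, alongside bugs and incidents, keeps quarantine from becoming a permanent parking lot.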

Category Execution Ratios

How computed

Category total executions means the sum of that category's observed test executions across every daily-suite run in the selected window.

Strict Failure Ratio = failed executions for that category divided by total executions for that category across the window.

Non-pass Ratio = (failed + pending + skipped) executions for that category divided by total executions for that category across the window.

Example: if Billing executed 800 times across the week and 2 of those executions failed, Billing strict failure ratio is 0.25%. That does not mean 0.25% of pipelines failed; it means 0.25% of observed Billing executions ended in failed.
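
The two ratios can be written out as a short calculation. This is a minimal sketch of the definitions above, using the Billing example figures; the `CategoryCounts` class and function names are illustrative, not the report generator's actual code.

```python
from dataclasses import dataclass


@dataclass
class CategoryCounts:
    # Execution counts for one category, summed across all daily runs in the window.
    total: int
    failed: int
    pending: int = 0
    skipped: int = 0


def strict_failure_ratio(counts: CategoryCounts) -> float:
    """Failed executions as a percentage of total executions."""
    if counts.total == 0:
        return 0.0
    return 100.0 * counts.failed / counts.total


def non_pass_ratio(counts: CategoryCounts) -> float:
    """Failed, pending, and skipped executions as a percentage of total executions."""
    if counts.total == 0:
        return 0.0
    return 100.0 * (counts.failed + counts.pending + counts.skipped) / counts.total


# The worked example from the text: 800 Billing executions, 2 of them failed.
billing_example = CategoryCounts(total=800, failed=2)
print(f"{strict_failure_ratio(billing_example):.2f}%")  # 0.25%
print(f"{non_pass_ratio(billing_example):.2f}%")        # 0.25%
```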

Strict Failure Ratio

Share of category executions that ended in failed across all daily runs in this window.

Billing: 0.26%
Web: 0.15%
Frontend: 0.27%
Library: 0.00%

Non-pass Ratio

Share of category executions that ended in failed, pending, or skipped across all daily runs in this window.

Billing: 0.26%
Web: 0.19%
Frontend: 0.27%
Library: 0.00%

Category Aggregate Table

Category	Total	Failed	Pending	Skipped	Strict Failure Ratio	Non-pass Ratio	Runs With Failures
Billing	756	2	0	0	0.26%	0.26%	1
Web	4684	7	0	2	0.15%	0.19%	3
Frontend	1840	5	0	0	0.27%	0.27%	2
Library	602	0	0	0	0.00%	0.00%	0

Recent Runs

Recent Daily Suite Runs

Date	Pipeline	Suites	Status	Total	Passed	Failed	Pending
2026-04-04 18:24	151342	Billing, Web, Frontend, Library	FAILED	1233	1230	3	0
2026-04-05 18:24	151350	Billing, Web, Frontend, Library	PASSED	1233	1233	0	0
2026-04-06 18:24	151439	Billing, Web, Frontend, Library	PASSED	1233	1233	0	0
2026-04-07 18:23	151598	Billing, Web, Frontend, Library	FAILED	1233	1230	3	0
2026-04-08 18:33	151799	Billing, Web, Frontend, Library	FAILED	1238	1234	4	0
2026-04-09 19:03	152037	Billing, Web, Frontend, Library	FAILED	468	465	1	0
2026-04-10 18:25	152052	Billing, Web, Frontend, Library	FAILED	1244	1241	3	0

Recent Smoke Attempts

Date	Suite	Pipeline	Job	Status	Passed	Failed	Duration
2026-04-07 14:23	University	151482	University smoke	PASSED	10	0	5m 10s
2026-04-07 15:39	Frontend	151482	Frontend smoke	PASSED	110	0	3m 01s
2026-04-07 17:04	University	151576	University smoke	PASSED	10	0	2m 50s
2026-04-07 17:07	Frontend	151576	Frontend smoke	PASSED	110	0	3m 08s
2026-04-07 18:30	Frontend	151600	Frontend smoke	PASSED	110	0	3m 11s
2026-04-08 14:28	Frontend	151693	Frontend smoke	PASSED	110	0	4m 08s
2026-04-08 15:02	Frontend	151712	Frontend smoke	PASSED	110	0	3m 12s
2026-04-08 17:12	Frontend	151787	Frontend smoke	FAILED	n/a	n/a	0m 02s
2026-04-08 18:13	Frontend	151798	Frontend smoke	PASSED	110	0	3m 31s
2026-04-08 22:10	Frontend	151810	Frontend smoke	PASSED	110	0	3m 21s
2026-04-08 23:44	Frontend	151820	Frontend smoke	PASSED	110	0	3m 07s
2026-04-09 00:48	Frontend	151824	Frontend smoke	PASSED	110	0	3m 11s
2026-04-09 14:30	Frontend	151954	Frontend smoke	PASSED	110	0	3m 48s
2026-04-09 14:41	Frontend	151974	Frontend smoke	PASSED	110	0	3m 24s
2026-04-09 16:38	Frontend	152031	Frontend smoke	PASSED	110	0	3m 13s
2026-04-09 23:04	Frontend	152040	Frontend smoke	PASSED	110	0	3m 30s
2026-04-10 10:20	Frontend	152043	Frontend smoke	PASSED	110	0	3m 10s
2026-04-10 12:18	Frontend	152051	Frontend smoke	PASSED	110	0	3m 03s

Smoke Suite Breakdown

Frontend
16 attempts across 16 pipelines, 94% green
Passed: 15
Failed: 1
Incomplete: 1
Avg runtime: 3m 08s
Median passing runtime: 3m 12s
Pipelines: 16

University
2 attempts across 2 pipelines, 100% green
Passed: 2
Failed: 0
Incomplete: 0
Avg runtime: 4m 00s
Median passing runtime: 4m 00s
Pipelines: 2