Engineering Council Test Reliability Report

Scope aligned with Slack channel #dezvoltare, covering 2026-04-11 07:00 to 2026-04-18 21:50. Metrics and timings are sourced from GitLab pipelines, jobs, and test-report artifacts for the daily 6 PM regression suite and the production smoke suite. Trend charts use daily buckets across this window.

Executive Snapshot

Daily Runs: 8
Daily Green: 1/8
Avg Daily Runtime: 331m 02s
Smoke Attempts: 4
Smoke Green: 4/4
Avg Smoke Runtime: 3m 05s
Median Smoke Time: 3m 04s
Current Green Streak: 0

Executive Analysis

Bottom line: the regression system is informative but not calm. The data suggest repeatable problem areas rather than random breakage, which means focused ownership should move the needle quickly.

What Matters

  • Daily regression passed 1 of 8 runs (12.5%), with a current green streak of 0 and a best streak of 1 in this window. The latest daily run (152618) failed, so the system is ending the week under tension rather than in a clean state. Three failed runs never reached complete daily-suite counts, which points to infrastructure or setup noise mixed into the product signal.
  • Smoke passed 4 of 4 attempts (100.0%) across 2 production pipelines.
  • Failure concentration is not random: Library has both the highest strict failure ratio and the broadest non-pass footprint at 0.69% (with zero pending and skipped executions, the two ratios coincide).
  • Frontend smoke went 2/2 green (100.0%) in this window, but Frontend is still the most fragile surface overall: it recorded daily-suite failures in 7 of 8 runs, more than any other category.
  • Daily-suite runtime averaged 331m 02s, while observed daily test volume moved from 1,159 to 1,244.
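
The streak figures above follow directly from the ordered run history. A minimal sketch of how the current and best green streaks can be derived (the status list mirrors the eight daily runs in this window, oldest to newest):

```python
def green_streaks(statuses):
    """Return (current_streak, best_streak) for an ordered run history."""
    best = run = 0
    for status in statuses:
        # A pass extends the running streak; any failure resets it to zero.
        run = run + 1 if status == "PASSED" else 0
        best = max(best, run)
    return run, best  # the running streak ends at the most recent run

# The eight daily runs in this window: four fails, one pass, three fails.
runs = ["FAILED"] * 4 + ["PASSED"] + ["FAILED"] * 3
print(green_streaks(runs))  # (0, 1)
```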

Engineering Analysis

  • A release gate should fail loudly for product regressions and quietly for infrastructure noise. Rerun recoveries plus incomplete daily or smoke attempts suggest those two failure modes are still partially mixed together.
  • The failure profile is concentrated enough to act on. Library (highest failure ratio) and Frontend (most failed executions, failures in 7 of 8 runs) are carrying the strongest signal, which means reliability work should be assigned by category ownership instead of treating the suite as one undifferentiated problem.
  • The broader daily suite is carrying more instability than smoke, which usually means product regressions are escaping into wider coverage areas even when the narrow deploy gate looks acceptable.
  • The daily suite is now large enough that runtime itself is becoming a management variable at 331m 02s average duration. At that size, every additional flaky or redundant test has a measurable cost on feedback speed.

Recommended Actions

  • Split incomplete execution failures from real assertion failures in the report narrative. Setup breakage should stay visible, but it should not look identical to a product regression in the executive readout.
  • Assign one owner to Library for the next cycle and expect a short written burn-down: top failing tests, suspected root causes, flake versus regression breakdown, and what gets fixed or quarantined first.
  • Treat the daily regression suite like an operations queue until it is calm again: triage failures after each red run, close known-noise items fast, and avoid letting multiple unrelated red signals pile up between runs.
  • Put Frontend smoke under closer guardrails for the next release cycle. It is the best place to improve first-pass deploy confidence quickly.
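
The first action above, separating incomplete execution from genuine assertion failures, can be sketched as a small run classifier. The field names (`status`, `expected_total`, `observed_total`, `failed`) are illustrative, not the actual report schema:

```python
def classify_run(status, expected_total, observed_total, failed):
    """Label a red run as infrastructure noise or a product signal."""
    if status == "PASSED":
        return "green"
    if observed_total < expected_total:
        # The suite never fully executed: setup/infra breakage,
        # not evidence of a product regression.
        return "red-incomplete (infrastructure)"
    if failed > 0:
        return "red-assertions (product regression)"
    return "red-unknown"

# Run 152058 from this window: 1,159 observed tests against the later
# full count of 1,244 (assumed here as the expected total), 4 failures.
print(classify_run("FAILED", 1244, 1159, 4))  # red-incomplete (infrastructure)
print(classify_run("FAILED", 1244, 1244, 5))  # red-assertions (product regression)
```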

Improvement Ideas

  • Introduce a small reliability budget for tests: every flaky or quarantined case needs an owner and an expiry, and the team should review that budget weekly the same way it reviews bugs or incidents.
  • Track first-fail to root-cause time as a core metric. Fast diagnosis is as important as raw pass rate because the practical value of a test gate depends on how quickly it helps the team recover.
  • Define a runtime budget per suite and require justification when test count or duration grows. Reliable feedback systems stay trusted when they remain both stable and proportionate.
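
The runtime-budget idea can be made mechanical. A minimal sketch that flags a suite once it runs past a stated budget plus a small tolerance (the budget figures below are illustrative assumptions, not a team decision):

```python
def over_budget(runtime_minutes, budget_minutes, tolerance=0.10):
    """True when a suite runs past its budget plus a tolerance margin."""
    return runtime_minutes > budget_minutes * (1 + tolerance)

# Average daily-suite runtime in this window: 331m 02s (~331.03 minutes).
daily_avg = 331 + 2 / 60
print(over_budget(daily_avg, budget_minutes=300))  # True: past the 330m limit
print(over_budget(daily_avg, budget_minutes=360))  # False: within the 396m limit
```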

Category Execution Ratios

How computed

Category total executions means the sum of that category's observed test executions across every daily-suite run in the selected window.

Strict Failure Ratio = failed executions for that category divided by total executions for that category across the window.

Non-pass Ratio = (failed + pending + skipped) executions for that category divided by total executions for that category across the window.

Example: if Billing executed 800 times across the week and 2 of those executions failed, Billing strict failure ratio is 0.25%. That does not mean 0.25% of pipelines failed; it means 0.25% of observed Billing executions ended in failed.
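
Both ratios can be reproduced with a short script. A minimal sketch using the worked Billing example above (the counts are illustrative, taken from that example rather than from the real artifacts):

```python
def failure_ratios(failed, pending, skipped, total):
    """Return (strict failure ratio, non-pass ratio) as percentages."""
    strict = failed / total * 100
    non_pass = (failed + pending + skipped) / total * 100
    return strict, non_pass

# Worked example from the text: Billing executed 800 times, 2 failures.
strict, non_pass = failure_ratios(failed=2, pending=0, skipped=0, total=800)
print(f"strict={strict:.2f}% non-pass={non_pass:.2f}%")  # strict=0.25% non-pass=0.25%
```

With zero pending and skipped executions the two ratios coincide, which is why every category in this window shows identical strict and non-pass figures.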


[Chart] Daily Suite Status, 04-11 to 04-18
[Chart] Daily Smoke Attempts, 04-11 to 04-18
[Chart] Daily Average Daily Suite Runtime, 04-11 to 04-18
[Chart] Daily Average Smoke Runtime, 04-11 to 04-18
[Chart] Daily Suite Total Test Growth (Recent 8 Runs): 1,159 to 1,244
[Chart] Smoke Suite Total Test Growth (Latest Run Per Day): Frontend 110 (04-15, 04-17), University 10 (04-15, 04-17)

Category Aggregate Table


Category | Total | Failed | Pending | Skipped | Failure Ratio | Non-pass Ratio | Runs With Failures
Billing  |   864 |      1 |       0 |       0 |         0.12% |          0.12% |                  1
Web      |  6240 |      9 |       0 |       0 |         0.14% |          0.14% |                  3
Frontend |  2160 |     13 |       0 |       0 |         0.60% |          0.60% |                  7
Library  |   433 |      3 |       0 |       0 |         0.69% |          0.69% |                  3

Recent Runs

Recent Daily Suite Runs

Date             | Pipeline | Suites                          | Status | Summary
2026-04-11 18:25 | 152054   | Billing, Web, Frontend, Library | FAILED | Total 1159 | Passed 1157 | Failed 2 | Incomplete suite counts
2026-04-13 14:34 | 152056   | Billing, Web, Frontend, Library | FAILED | Total 1159 | Passed 1157 | Failed 2 | Incomplete suite counts
2026-04-14 14:42 | 152058   | Billing, Web, Frontend, Library | FAILED | Total 1159 | Passed 1155 | Failed 4 | Incomplete suite counts
2026-04-14 18:25 | 152092   | Billing, Web, Frontend, Library | FAILED | Total 1244 | Passed 1243 | Failed 1
2026-04-15 18:24 | 152229   | Billing, Web, Frontend, Library | PASSED | Total 1244 | Passed 1244 | Failed 0
2026-04-16 19:10 | 152398   | Billing, Web, Frontend, Library | FAILED | Total 1244 | Passed 1239 | Failed 5
2026-04-17 18:25 | 152610   | Billing, Web, Frontend, Library | FAILED | Total 1244 | Passed 1237 | Failed 7
2026-04-18 18:25 | 152618   | Billing, Web, Frontend, Library | FAILED | Total 1244 | Passed 1239 | Failed 5

Recent Smoke Attempts

Date             | Suite      | Pipeline | Job              | Status | Passed | Failed | Duration
2026-04-15 16:56 | University | 152212   | University smoke | PASSED |     10 |      0 | 2m 47s
2026-04-15 17:00 | Frontend   | 152212   | Frontend smoke   | PASSED |    110 |      0 | 3m 16s
2026-04-17 13:12 | University | 152536   | University smoke | PASSED |     10 |      0 | 2m 53s
2026-04-17 13:16 | Frontend   | 152536   | Frontend smoke   | PASSED |    110 |      0 | 3m 22s

Smoke Suite Breakdown

Frontend — 2 attempts across 2 pipelines, 100% green
  Passed: 2 | Failed: 0 | Incomplete: 0
  Avg runtime: 3m 19s | Median passing runtime: 3m 19s | Pipelines: 2

University — 2 attempts across 2 pipelines, 100% green
  Passed: 2 | Failed: 0 | Incomplete: 0
  Avg runtime: 2m 50s | Median passing runtime: 2m 50s | Pipelines: 2
Generated from GitLab project adservio/helm2. Times are shown in Europe/Bucharest. Daily-suite runtime is measured from GitLab pipeline and job timestamps. Category counts come from GitLab test-report JSON artifacts, with job-trace fallback when older artifacts have expired.