Engineering Council Test Reliability Report

Executive Snapshot

Daily Runs: 7
Daily Green: 0/7
Avg Daily Runtime: 65m 48s
Smoke Attempts: 14
Smoke Green: 11/14
Avg Smoke Runtime: 3m 18s
Median Smoke Time (passing attempts): 4m 13s
Current Green Streak: 0

Executive Analysis

Bottom line: release confidence is unstable in both the broad regression path and the deploy smoke path. The immediate job is to separate real product regressions from execution noise, then burn down the concentrated failure clusters.

What Matters

Daily regression passed 0 of 7 runs (0.0%); both the current and the best green streak in this window are 0. The latest daily run (162871) failed, so the week ends in a red state rather than a clean one. All 7 failed runs were flagged with incomplete daily-suite counts, and 4 of them (162157, 162600, 162866, 162871) recorded 214 of 214 tests passed, which points to infrastructure or setup noise mixed into the product signal.
Smoke passed 11 of 14 attempts (78.6%) across 9 production pipelines. All 3 failed attempts ended before reporting any test counts.
Failure concentration is not random: Library has the highest strict failure ratio at 0.97%, while Billing has the broadest non-pass footprint at 3.51%.
Frontend is the weakest smoke surface in this window at 7/9 green (77.8%), narrowly behind University at 4/5 (80.0%).
Daily-suite runtime averaged 65m 48s.

Engineering Analysis

A release gate should fail loudly for product regressions and quietly for infrastructure noise. Rerun recoveries plus incomplete daily or smoke attempts suggest those two failure modes are still partially mixed together.
The failure profile is concentrated enough to act on. Library and Billing are carrying the strongest signal, which means reliability work should be assigned by category ownership instead of treating the suite as one undifferentiated problem.
The broader daily suite (0/7 green) is carrying more instability than smoke (11/14 green), which usually means product regressions are escaping into wider coverage areas even when the narrow deploy gate looks acceptable.
The daily suite is now large enough that runtime itself is becoming a management variable at 65m 48s average duration. At that size, every additional flaky or redundant test has a measurable cost on feedback speed.

Recommended Actions

Split incomplete execution failures from real assertion failures in the report narrative. Setup breakage should stay visible, but it should not look identical to a product regression in the executive readout.
Assign one owner to Library for the next cycle and expect a short written burn-down: top failing tests, suspected root causes, flake versus regression breakdown, and what gets fixed or quarantined first.
Treat the daily regression suite like an operations queue until it is calm again: triage failures after each red run, close known-noise items fast, and avoid letting multiple unrelated red signals pile up between runs.
Put Frontend smoke under closer guardrails for the next release cycle. It is the best place to improve first-pass deploy confidence quickly.
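The first recommendation, separating incomplete execution from real assertion failures, can be sketched as a small classifier. This is one possible policy, not the report generator's actual logic: the `DailyRun` fields, the `RunOutcome` names, and the rule "any failed test counts as an assertion failure, a red run with zero failed tests counts as execution noise" are all assumptions made for illustration. The run data is copied from the Recent Daily Suite Runs table.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class RunOutcome(Enum):
    GREEN = "green"
    ASSERTION_FAILURE = "assertion_failure"
    INCOMPLETE_EXECUTION = "incomplete_execution"


@dataclass(frozen=True)
class DailyRun:
    # Hypothetical record shape; field names are not from the report tooling.
    pipeline: int
    status: str
    total: int
    passed: int
    failed: int
    incomplete: bool


def classify(run: DailyRun) -> RunOutcome:
    """Assumed policy: failed tests win, then clean passes, else noise."""
    if run.failed > 0:
        return RunOutcome.ASSERTION_FAILURE
    if run.status == "PASSED" and not run.incomplete:
        return RunOutcome.GREEN
    # Red or incomplete, yet no test actually failed.
    return RunOutcome.INCOMPLETE_EXECUTION


# Values taken from the daily-suite table in this report.
RUNS = [
    DailyRun(162157, "FAILED", 214, 214, 0, True),
    DailyRun(162305, "FAILED", 214, 210, 4, True),
    DailyRun(162403, "FAILED", 2, 0, 2, True),
    DailyRun(162600, "FAILED", 214, 214, 0, True),
    DailyRun(162742, "FAILED", 214, 188, 1, True),
    DailyRun(162866, "FAILED", 214, 214, 0, True),
    DailyRun(162871, "FAILED", 214, 214, 0, True),
]

summary = Counter(classify(run) for run in RUNS)
for outcome, count in summary.items():
    print(f"{outcome.value}: {count}")
```

Under this assumed rule, the seven red runs split into two separate lines in the executive readout instead of one undifferentiated "FAILED" count. A run such as 162403 (only 2 tests executed, both failed) is both incomplete and failing, so a real policy may want a third bucket for it.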

Improvement Ideas

Introduce a small reliability budget for tests: every flaky or quarantined case needs an owner and an expiry, and the team should review that budget weekly the same way it reviews bugs or incidents.
Track first-fail to root-cause time as a core metric. Fast diagnosis is as important as raw pass rate because the practical value of a test gate depends on how quickly it helps the team recover.
Define a runtime budget per suite and require justification when test count or duration grows. Reliable feedback systems stay trusted when they remain both stable and proportionate.
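The reliability budget idea above could be tracked with something as small as the following sketch. Everything here is hypothetical: the `QuarantineEntry` shape, the test identifiers, the owner names, the dates, and the budget limit are placeholders invented for illustration, not data from this report.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class QuarantineEntry:
    # Hypothetical record: every quarantined or flaky test gets an owner and an expiry.
    test_id: str
    owner: str
    expires: date


def expired_entries(entries: list[QuarantineEntry], today: date) -> list[QuarantineEntry]:
    """Entries whose expiry date has passed and that need a fix-or-delete decision."""
    return [entry for entry in entries if entry.expires < today]


def over_budget(entries: list[QuarantineEntry], limit: int) -> bool:
    """True when the number of quarantined tests exceeds the agreed budget."""
    return len(entries) > limit


# Placeholder data for the weekly review.
budget = [
    QuarantineEntry("library/test_example_a", "owner-a", date(2026, 7, 1)),
    QuarantineEntry("billing/test_example_b", "owner-b", date(2026, 7, 20)),
]

review_day = date(2026, 7, 6)
for entry in expired_entries(budget, review_day):
    print(f"expired: {entry.test_id} (owner {entry.owner})")
print("over budget:", over_budget(budget, limit=5))
```

The point of the expiry field is that a quarantine cannot silently become permanent: an expired entry shows up in the weekly review until someone fixes or removes the test.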

Category Execution Ratios

How computed

Category total executions means the sum of that category's observed test executions across every daily-suite run in the selected window.

Strict Failure Ratio = failed executions for that category divided by total executions for that category across the window.

Non-pass Ratio = (failed + pending + skipped) executions for that category divided by total executions for that category across the window.

Example: if Billing executed 800 times across the week and 2 of those executions failed, Billing strict failure ratio is 0.25%. That does not mean 0.25% of pipelines failed; it means 0.25% of observed Billing executions ended in failed.
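The two ratio definitions above can be written out as a minimal sketch. The `CategoryCounts` type and function names are illustrative, not part of the report tooling, and returning 0.0 for a category with zero executions is an assumption chosen to match how categories with no executions are displayed as 0.00% in this report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CategoryCounts:
    # Execution counts summed across every daily-suite run in the window.
    total: int
    failed: int
    pending: int
    skipped: int


def strict_failure_ratio(counts: CategoryCounts) -> float:
    """Failed executions divided by total executions."""
    if counts.total == 0:
        return 0.0  # assumption: no executions is reported as 0.00%
    return counts.failed / counts.total


def non_pass_ratio(counts: CategoryCounts) -> float:
    """(Failed + pending + skipped) executions divided by total executions."""
    if counts.total == 0:
        return 0.0  # assumption: no executions is reported as 0.00%
    return (counts.failed + counts.pending + counts.skipped) / counts.total


def as_percent(ratio: float) -> str:
    return f"{ratio * 100:.2f}%"


# The worked example from the text: 2 failures in 800 Billing executions.
example = CategoryCounts(total=800, failed=2, pending=0, skipped=0)
print(as_percent(strict_failure_ratio(example)))  # 0.25%

# Counts for this window as listed in the Category Aggregate Table.
billing = CategoryCounts(total=769, failed=2, pending=0, skipped=25)
library = CategoryCounts(total=517, failed=5, pending=0, skipped=0)
print(as_percent(strict_failure_ratio(billing)), as_percent(non_pass_ratio(billing)))
print(as_percent(strict_failure_ratio(library)), as_percent(non_pass_ratio(library)))
```

Both ratios are per execution, not per pipeline, so a category with many skipped tests (Billing here) can have a low strict failure ratio and a much higher non-pass ratio.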

Strict Failure Ratio

Share of category executions that ended in failed across all daily runs in this window.

Billing: 0.26%
Web: 0.00%
Frontend: 0.00%
Library: 0.97%

Non-pass Ratio

Share of category executions that ended in failed, pending, or skipped across all daily runs in this window.

Billing: 3.51%
Web: 0.00%
Frontend: 0.00%
Library: 0.97%

Category Aggregate Table

Category	Total	Failed	Pending	Skipped	Failure Ratio	Non-pass Ratio	Runs With Failures
Billing	769	2	0	25	0.26%	3.51%	2
Web	0	0	0	0	0.00%	0.00%	7
Frontend	0	0	0	0	0.00%	0.00%	7
Library	517	5	0	0	0.97%	0.97%	2

Recent Runs

Recent Daily Suite Runs

Date	Pipeline	Suites	Status	Summary
2026-06-28 18:13	162157	Billing, Web, Frontend, Library	FAILED	Total 214 | Passed 214 | Failed 0 | Pending 0 | Incomplete suite counts
2026-06-29 18:18	162305	Billing, Web, Frontend, Library	FAILED	Total 214 | Passed 210 | Failed 4 | Pending 0 | Incomplete suite counts
2026-06-30 18:11	162403	Billing, Web, Frontend, Library	FAILED	Total 2 | Passed 0 | Failed 2 | Pending 0 | Incomplete suite counts
2026-07-01 18:14	162600	Billing, Web, Frontend, Library	FAILED	Total 214 | Passed 214 | Failed 0 | Pending 0 | Incomplete suite counts
2026-07-02 18:16	162742	Billing, Web, Frontend, Library	FAILED	Total 214 | Passed 188 | Failed 1 | Pending 0 | Incomplete suite counts
2026-07-04 00:33	162866	Billing, Web, Frontend, Library	FAILED	Total 214 | Passed 214 | Failed 0 | Pending 0 | Incomplete suite counts
2026-07-04 18:16	162871	Billing, Web, Frontend, Library	FAILED	Total 214 | Passed 214 | Failed 0 | Pending 0 | Incomplete suite counts

Recent Smoke Attempts

Date	Suite	Pipeline	Job	Status	Passed	Failed	Duration
2026-06-29 09:38	Frontend	162170	Frontend smoke	FAILED	n/a	n/a	0m 07s
2026-06-29 16:24	University	162299	University smoke	FAILED	n/a	n/a	0m 04s
2026-06-29 16:30	Frontend	162299	Frontend smoke	FAILED	n/a	n/a	1m 07s
2026-07-01 13:06	University	162501	University smoke	PASSED	60	0	3m 19s
2026-07-01 13:10	Frontend	162501	Frontend smoke	PASSED	110	0	4m 23s
2026-07-02 09:46	University	162618	University smoke	PASSED	60	0	3m 40s
2026-07-02 09:50	Frontend	162618	Frontend smoke	PASSED	110	0	4m 49s
2026-07-02 18:57	University	162744	University smoke	PASSED	60	0	3m 26s
2026-07-02 18:59	Frontend	162744	Frontend smoke	PASSED	110	0	4m 13s
2026-07-02 22:18	University	162748	University smoke	PASSED	60	0	3m 12s
2026-07-02 22:22	Frontend	162748	Frontend smoke	PASSED	110	0	4m 41s
2026-07-03 09:31	Frontend	162757	Frontend smoke	PASSED	110	0	4m 02s
2026-07-03 09:42	Frontend	162780	Frontend smoke	PASSED	110	0	4m 24s
2026-07-03 12:40	Frontend	162826	Frontend smoke	PASSED	110	0	4m 50s
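The headline smoke figures can be reproduced from the table above with a short sketch. The helper names are illustrative. The statuses and durations are copied from the table; the observation that the reported 4m 13s median matches the passing attempts only (not all 14 attempts) comes from running this calculation, not from a statement by the report tooling.

```python
from statistics import mean, median

# (status, duration) pairs copied from the Recent Smoke Attempts table, in order.
ATTEMPTS = [
    ("FAILED", "0m 07s"), ("FAILED", "0m 04s"), ("FAILED", "1m 07s"),
    ("PASSED", "3m 19s"), ("PASSED", "4m 23s"), ("PASSED", "3m 40s"),
    ("PASSED", "4m 49s"), ("PASSED", "3m 26s"), ("PASSED", "4m 13s"),
    ("PASSED", "3m 12s"), ("PASSED", "4m 41s"), ("PASSED", "4m 02s"),
    ("PASSED", "4m 24s"), ("PASSED", "4m 50s"),
]


def to_seconds(text: str) -> int:
    """Parse a duration such as '4m 13s' into seconds."""
    minutes, seconds = text.split()
    return int(minutes.rstrip("m")) * 60 + int(seconds.rstrip("s"))


def fmt(seconds: float) -> str:
    """Format seconds as 'Xm YYs', rounded to the nearest second."""
    whole = round(seconds)
    return f"{whole // 60}m {whole % 60:02d}s"


all_durations = [to_seconds(d) for _, d in ATTEMPTS]
passing = [to_seconds(d) for status, d in ATTEMPTS if status == "PASSED"]

pass_rate = len(passing) / len(ATTEMPTS)
print(f"Smoke green: {len(passing)}/{len(ATTEMPTS)} ({pass_rate:.1%})")
print("Avg runtime, all attempts:", fmt(mean(all_durations)))
print("Median runtime, passing attempts:", fmt(median(passing)))
print("Median runtime, all attempts:", fmt(median(all_durations)))
```

The three attempts that failed within about a minute pull the all-attempts average down, which is why the average (3m 18s) sits below the passing median (4m 13s).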

Smoke Suite Breakdown

Frontend

9 attempts across 9 pipelines
78% green
Passed: 7
Failed: 2
Incomplete: 2
Avg runtime: 3m 37s
Median passing runtime: 4m 24s
Pipelines: 9

University

5 attempts across 5 pipelines
80% green
Passed: 4
Failed: 1
Incomplete: 1
Avg runtime: 2m 44s
Median passing runtime: 3m 22s
Pipelines: 5