CI/CD Billing Block Incident Report
Incident Date: 2025-11-27
Status: Resolved
Severity: P1 - All CI Blocked
Root Cause: GitHub billing/payment processing issue
Executive Summary
On November 27, 2025, all GitHub-hosted runners became unavailable due to GitHub billing issues, completely blocking CI/CD pipelines. The solution was to change all workflows to run the decide_runner job on self-hosted runners first, with comprehensive cross-platform support for Windows, macOS, and Linux.
Impact
What Failed
- All GitHub-hosted runners (ubuntu-latest, etc.) were unavailable
- Error: “The job was not started because recent account payments have failed”
- All CI pipelines completely blocked
- No PR validation or deployments possible
Scope
- Affected Repositories: archery-apprentice, archery-apprentice-docs
- Affected Workflows: 4 total
  - android-ci.yml
  - non-android-check.yml
  - deploy-to-play-store.yml
  - deploy-quartz.yml (docs repo)
- Duration: Until the fix was deployed, roughly two hours after the block began (see Timeline)
Root Cause Analysis
The Problem
GitHub’s billing system blocked all hosted runner usage:
Error: The job was not started because recent account payments have failed
or your spending limit has been reached. Please update your payment information.
This affected ALL jobs, including the decide_runner job that determines which runner to use for subsequent jobs. Since decide_runner couldn’t run, no other jobs could start.
Why Previous Architecture Failed
The original hybrid runner system had decide_runner running on GitHub-hosted runners:
# BEFORE - decide_runner used GitHub-hosted
decide_runner:
  runs-on: ubuntu-latest # <-- BLOCKED by billing issue
This created a chicken-and-egg problem: the job that decides between GitHub-hosted and self-hosted couldn’t run because it needed a GitHub-hosted runner.
Resolution
Strategy
Change decide_runner to run on self-hosted runners first. Self-hosted runners:
- Have no quota limits
- Aren’t affected by GitHub billing issues
- Are available whenever the host machine is online
Implementation
# AFTER - decide_runner uses self-hosted
decide_runner:
  runs-on: self-hosted # <-- Always available
  outputs:
    # Use outputs from whichever platform-specific step ran
    runner_label: ${{ steps.decision.outputs.runner_label || steps.decision_unix.outputs.runner_label }}
    should_skip: ${{ steps.decision.outputs.should_skip || steps.decision_unix.outputs.should_skip }}
Challenge: Cross-Platform Self-Hosted
Self-hosted runners can be on any platform:
- Windows desktop (primary)
- macOS laptop (secondary)
- Linux server (future)
The decide_runner job must work on ALL platforms.
Solution: Platform-Specific Steps with Output Merging
decide_runner:
  runs-on: self-hosted
  outputs:
    # Merge outputs from platform-specific steps
    runner_label: ${{ steps.decision.outputs.runner_label || steps.decision_unix.outputs.runner_label }}
    should_skip: ${{ steps.decision.outputs.should_skip || steps.decision_unix.outputs.should_skip }}
  steps:
    - name: Determine runner (Windows)
      id: decision
      if: runner.os == 'Windows'
      shell: powershell
      run: |
        # PowerShell-specific logic
        # shell: powershell (Windows PowerShell 5.1) does not write UTF-8 by default, so set the encoding explicitly
        "runner_label=$RUNNER" | Out-File -FilePath $env:GITHUB_OUTPUT -Encoding utf8 -Append
    - name: Determine runner (macOS/Linux)
      id: decision_unix
      if: runner.os != 'Windows'
      shell: bash
      run: |
        # Bash-specific logic
        echo "runner_label=$RUNNER" >> "$GITHUB_OUTPUT"
Files Modified
archery-apprentice Repository
| File | Lines Changed | Changes |
|---|---|---|
| .github/workflows/android-ci.yml | 46-159 | Cross-platform decide_runner |
| .github/workflows/non-android-check.yml | 39-113 | Cross-platform decide_runner |
| .github/workflows/deploy-to-play-store.yml | 24-127 | Cross-platform decide_runner |
archery-apprentice-docs Repository
| File | Changes |
|---|---|
| .github/workflows/deploy-quartz.yml | Added BILLING_FIX marker |
Technical Patterns Introduced
1. Output Merging Pattern
outputs:
  runner_label: ${{ steps.decision.outputs.runner_label || steps.decision_unix.outputs.runner_label }}
The || operator returns the first truthy value. A skipped step produces an empty output, which is falsy, so whichever of the Windows or Unix steps actually ran populates the output.
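When debugging the Unix decision step locally, it can help to reproduce the same "first non-empty value wins" selection in bash. The following is a minimal sketch of that idea, not GitHub's expression evaluator; `first_non_empty` is a hypothetical helper name and it only models string outputs.

```shell
#!/usr/bin/env bash
# Hypothetical helper: bash analogue of `${{ a || b }}` for string outputs.
# A skipped step yields an empty string, so the first non-empty argument wins.
first_non_empty() {
  local value
  for value in "$@"; do
    if [ -n "$value" ]; then
      printf '%s\n' "$value"
      return 0
    fi
  done
  # Every argument was empty: emit an empty line, mirroring an unset output.
  printf '\n'
}
```

For two values, bash's built-in `${a:-$b}` expansion gives the same result; the function form simply extends to more than two candidates.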
2. PowerShell Error Handling for Git
$ErrorActionPreference = "SilentlyContinue"
try {
    $COMMIT_MSG = git log -1 --pretty=%B $PR_HEAD_SHA 2>&1
    if ($LASTEXITCODE -ne 0) { $COMMIT_MSG = "" }
} catch { $COMMIT_MSG = "" }
$ErrorActionPreference = "Stop"
3. PowerShell GITHUB_OUTPUT Syntax
# shell: powershell (Windows PowerShell 5.1) does not write UTF-8 by default, so set the encoding explicitly
"variable=value" | Out-File -FilePath $env:GITHUB_OUTPUT -Encoding utf8 -Append
4. BSD sed Compatibility
sed -i.bak "s/old/new/" file && rm file.bak
Prevention Measures
Immediate
- Default to self-hosted: decide_runner always runs on self-hosted first
- Cross-platform support: All decision logic works on Windows, macOS, and Linux
- Documentation: New Cross-Platform Runner Patterns guide
Long-term
- Monitor billing: Set up alerts for GitHub billing issues
- Redundant runners: Ensure multiple self-hosted runners on different platforms
- Graceful degradation: System should always fall back to self-hosted if GitHub-hosted runners are unavailable
Lessons Learned
1. Self-Hosted First
For critical infrastructure jobs (runner selection, skip detection), always run on self-hosted first. GitHub-hosted should be an optimization, not a requirement.
2. Platform Agnostic Design
Any job that might run on self-hosted must handle all possible platforms. Don’t assume Windows just because that’s the primary self-hosted runner.
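One way to keep the Unix side of a step platform agnostic is to wrap the tool differences in small helpers. The sketch below is illustrative only: `sed_inplace` and `platform_family` are hypothetical names, and it assumes a bash shell with `sed` and `uname` on the PATH.

```shell
#!/usr/bin/env bash
# Hypothetical helper: in-place substitution that works with both GNU sed
# (Linux) and BSD sed (macOS). The attached-suffix form `-i.bak` is accepted
# by both; the backup file is removed after a successful edit.
sed_inplace() {
  local expression="$1" file="$2"
  sed -i.bak "$expression" "$file" && rm -f "$file.bak"
}

# Hypothetical helper: classify the host so later logic can branch on it.
platform_family() {
  case "$(uname -s)" in
    Darwin)               echo "macos" ;;
    Linux)                echo "linux" ;;
    MINGW*|MSYS*|CYGWIN*) echo "windows" ;;  # bash running on a Windows host
    *)                    echo "other" ;;
  esac
}
```

For example, `sed_inplace 's/old/new/' file` behaves the same on a macOS laptop and a Linux server, which avoids branching on the platform for this particular tool.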
3. Output Merging
The ${{ a || b }} pattern is essential for platform-specific steps that both need to produce the same output.
4. PowerShell Nuances
PowerShell error handling is fundamentally different from bash:
- $ErrorActionPreference controls behavior
- $LASTEXITCODE checks command success
- try/catch for exception handling
- Out-File -Append for output files
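For comparison, the bash side can get the same "empty string on failure" behaviour from the exit status alone. This is a sketch of one possible Unix counterpart, not the code from the workflows; `commit_message_or_empty` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the commit message for a ref, or nothing at all
# if git fails (unknown SHA, shallow clone, not a repository, git missing).
commit_message_or_empty() {
  local ref="$1" message
  # Assign separately from `local` so the `if` sees git's exit status.
  if message="$(git log -1 --pretty=%B "$ref" 2>/dev/null)"; then
    printf '%s' "$message"
  fi
  # Always succeed so a script running under `set -e` does not abort here.
  return 0
}
```

A caller would capture the result with `COMMIT_MSG="$(commit_message_or_empty "$PR_HEAD_SHA")"` and then treat an empty value the same way the PowerShell version does.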
Timeline
| Time | Event |
|---|---|
| T+0 | GitHub billing block begins |
| T+10m | CI failures noticed across all PRs |
| T+20m | Root cause identified (billing) |
| T+30m | Decision to move decide_runner to self-hosted |
| T+60m | Cross-platform patterns implemented |
| T+90m | All workflows updated and tested |
| T+120m | Fix deployed to all repositories |
Related Documentation
- Cross-Platform Runner Patterns - Technical implementation guide
- Multi-Platform Workflows - November 2025 cascade analysis
- Platform Compatibility Matrix - Tool availability reference
- Hybrid Runner System - Original architecture
Marker
All affected workflow files contain the marker:
# BILLING_FIX_2025_11_27: Run on self-hosted first to avoid GitHub billing blocks
Search for this marker to find all related changes.
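A search along these lines would list the files that carry the marker. It is a sketch that assumes the workflows live under .github/workflows in a local checkout; `find_billing_fix_files` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list files under a directory that contain the
# billing-fix marker. Prints nothing if no file matches.
find_billing_fix_files() {
  local dir="${1:-.github/workflows}"
  # -r: recurse, -l: print only the names of matching files.
  grep -rl "BILLING_FIX_2025_11_27" "$dir" 2>/dev/null || true
}
```

Run from the repository root, `find_billing_fix_files` with no argument searches the default workflow directory; pass a path to search elsewhere.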
Tags: incident-report ci-cd billing cross-platform infrastructure
Status: Resolved
Post-Incident Review: Complete