CI/CD Billing Block Incident Report

Incident Date: 2025-11-27
Status: Resolved
Severity: P1 - All CI Blocked
Root Cause: GitHub billing/payment processing issue


Executive Summary

On November 27, 2025, a GitHub billing block made all GitHub-hosted runners unavailable, completely blocking CI/CD pipelines. The fix was to change all workflows so that the decide_runner job runs on self-hosted runners first, and to make its logic work on Windows, macOS, and Linux.


Impact

What Failed

  • All GitHub-hosted runners (ubuntu-latest, etc.) were unavailable
  • Error: “The job was not started because recent account payments have failed”
  • All CI pipelines completely blocked
  • No PR validation or deployments possible

Scope

  • Affected Repositories: archery-apprentice, archery-apprentice-docs
  • Affected Workflows: 4 total
    • android-ci.yml
    • non-android-check.yml
    • deploy-to-play-store.yml
    • deploy-quartz.yml (docs repo)
  • Duration: About 2 hours, from the start of the billing block until the fix was deployed (T+120m in the Timeline)

Root Cause Analysis

The Problem

GitHub’s billing system blocked all hosted runner usage:

Error: The job was not started because recent account payments have failed
or your spending limit has been reached. Please update your payment information.

This affected ALL jobs, including the decide_runner job that determines which runner to use for subsequent jobs. Since decide_runner couldn’t run, no other jobs could start.

Why Previous Architecture Failed

The original hybrid runner system had decide_runner running on GitHub-hosted runners:

# BEFORE - decide_runner used GitHub-hosted
decide_runner:
  runs-on: ubuntu-latest  # <-- BLOCKED by billing issue

This created a chicken-and-egg problem: the job that decides between GitHub-hosted and self-hosted couldn’t run because it needed a GitHub-hosted runner.


Resolution

Strategy

Change decide_runner to run on self-hosted runners first. Self-hosted runners:

  • Have no quota limits
  • Aren’t affected by GitHub billing issues
  • Are available whenever the host machine is online

Implementation

# AFTER - decide_runner uses self-hosted
decide_runner:
  runs-on: self-hosted  # <-- Always available
  outputs:
    # Use outputs from whichever platform-specific step ran
    runner_label: ${{ steps.decision.outputs.runner_label || steps.decision_unix.outputs.runner_label }}
    should_skip: ${{ steps.decision.outputs.should_skip || steps.decision_unix.outputs.should_skip }}

Challenge: Cross-Platform Self-Hosted

Self-hosted runners can be on any platform:

  • Windows desktop (primary)
  • macOS laptop (secondary)
  • Linux server (future)

The decide_runner job must work on ALL platforms.
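
Scripts shared between the macOS and Linux runners may also need to know the platform at run time. The following is a minimal sketch, not the project's actual code: detect_os is a hypothetical helper that relies on the RUNNER_OS variable GitHub Actions sets, with a uname fallback assumed for local runs.

```shell
# Hypothetical helper: report the runner platform as Windows, macOS, or Linux.
# RUNNER_OS is set by GitHub Actions; the uname fallback is for local runs.
detect_os() {
  if [ -n "${RUNNER_OS:-}" ]; then
    printf '%s\n' "$RUNNER_OS"
    return 0
  fi
  case "$(uname -s)" in
    Darwin) echo "macOS" ;;
    Linux) echo "Linux" ;;
    MINGW*|MSYS*|CYGWIN*) echo "Windows" ;;
    *) echo "unknown" ;;
  esac
}

detect_os
```

The uname branch only matters outside a workflow run, since RUNNER_OS is always present on a runner.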

Solution: Platform-Specific Steps with Output Merging

decide_runner:
  runs-on: self-hosted
  outputs:
    # Merge outputs from platform-specific steps
    runner_label: ${{ steps.decision.outputs.runner_label || steps.decision_unix.outputs.runner_label }}
    should_skip: ${{ steps.decision.outputs.should_skip || steps.decision_unix.outputs.should_skip }}
  steps:
    - name: Determine runner (Windows)
      id: decision
      if: runner.os == 'Windows'
      shell: powershell
      run: |
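        # PowerShell-specific logic
        # Windows PowerShell 5.1 does not write UTF-8 by default, so set the encoding explicitly
        "runner_label=$RUNNER" | Out-File -FilePath $env:GITHUB_OUTPUT -Encoding utf8 -Append
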
    - name: Determine runner (macOS/Linux)
      id: decision_unix
      if: runner.os != 'Windows'
      shell: bash
      run: |
        # Bash-specific logic
        echo "runner_label=$RUNNER" >> "$GITHUB_OUTPUT"
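
The report does not show the decision logic behind the "Bash-specific logic" placeholder. As an illustration only, the sketch below shows one shape it could take on macOS or Linux. The "[skip ci]" token, the fixed "self-hosted" label, and the decide_runner function name are all assumptions, not the project's actual rules.

```shell
# Hypothetical decision step: the "[skip ci]" token and the fixed
# "self-hosted" label are assumptions, not the project's actual rules.
decide_runner() {
  commit_msg=$1
  should_skip=false
  case "$commit_msg" in
    *"[skip ci]"*) should_skip=true ;;
  esac
  # Write both outputs in one append so they land together.
  {
    echo "runner_label=self-hosted"
    echo "should_skip=$should_skip"
  } >> "$GITHUB_OUTPUT"
}

# Outside GitHub Actions, fall back to a scratch file so the sketch still runs.
GITHUB_OUTPUT=${GITHUB_OUTPUT:-$(mktemp)}
decide_runner "docs: fix typo [skip ci]"
cat "$GITHUB_OUTPUT"
```

Keeping the logic in a function makes it easy to exercise locally before wiring it into a workflow step.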

Files Modified

archery-apprentice Repository

| File | Lines Changed | Changes |
|------|---------------|---------|
| .github/workflows/android-ci.yml | 46-159 | Cross-platform decide_runner |
| .github/workflows/non-android-check.yml | 39-113 | Cross-platform decide_runner |
| .github/workflows/deploy-to-play-store.yml | 24-127 | Cross-platform decide_runner |

archery-apprentice-docs Repository

| File | Changes |
|------|---------|
| .github/workflows/deploy-quartz.yml | Added BILLING_FIX marker |

Technical Patterns Introduced

1. Output Merging Pattern

outputs:
  runner_label: ${{ steps.decision.outputs.runner_label || steps.decision_unix.outputs.runner_label }}

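The || operator returns the first truthy value. A skipped step produces no outputs, so its reference evaluates to an empty (falsy) value and the expression falls through to the step that actually ran. This lets either the Windows or the Unix step populate the output.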
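
The same first-non-empty selection can be reproduced locally to reason about the merge. This is a rough analogy rather than GitHub's expression evaluator, and first_nonempty is a hypothetical helper. The string "false" is non-empty, so a should_skip value of false from the step that ran is preserved rather than skipped over.

```shell
# Mimic ${{ a || b }} for string outputs: print the first non-empty argument.
first_nonempty() {
  for value in "$@"; do
    if [ -n "$value" ]; then
      printf '%s\n' "$value"
      return 0
    fi
  done
  return 0
}

# On a macOS or Linux runner the Windows step is skipped, so its output is empty.
windows_label=""
unix_label="self-hosted"
first_nonempty "$windows_label" "$unix_label"
```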

2. PowerShell Error Handling for Git

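# Temporarily relax error handling so a failed git call does not abort the step
$ErrorActionPreference = "SilentlyContinue"
try {
  # 2>&1 folds stderr into the captured value; it is discarded on failure below
  $COMMIT_MSG = git log -1 --pretty=%B $PR_HEAD_SHA 2>&1
  # Native commands report failure through $LASTEXITCODE, not exceptions
  if ($LASTEXITCODE -ne 0) { $COMMIT_MSG = "" }
} catch { $COMMIT_MSG = "" }
finally {
  # Restore strict error handling even if something above throws
  $ErrorActionPreference = "Stop"
}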

3. PowerShell GITHUB_OUTPUT Syntax

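# -Encoding utf8 is needed under Windows PowerShell 5.1 (shell: powershell), which does not default to UTF-8
"variable=value" | Out-File -FilePath $env:GITHUB_OUTPUT -Encoding utf8 -Append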
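
For comparison, the Unix step can write outputs through a small helper. The sketch below assumes a hypothetical set_output function. It also handles multi-line values such as commit messages with the name<<DELIMITER form that GitHub Actions accepts, and the delimiter format is an arbitrary choice.

```shell
# Hypothetical helper: append name=value to the file named by GITHUB_OUTPUT.
# Multi-line values use the name<<DELIMITER form that GitHub Actions supports.
set_output() {
  name=$1
  value=$2
  case "$value" in
    *"
"*)
      # The delimiter only has to be a string that does not occur in the value.
      delimiter="EOF_$$"
      printf '%s<<%s\n%s\n%s\n' "$name" "$delimiter" "$value" "$delimiter" >> "$GITHUB_OUTPUT"
      ;;
    *)
      printf '%s=%s\n' "$name" "$value" >> "$GITHUB_OUTPUT"
      ;;
  esac
}

# Outside GitHub Actions, fall back to a scratch file so the sketch still runs.
GITHUB_OUTPUT=${GITHUB_OUTPUT:-$(mktemp)}
set_output runner_label self-hosted
cat "$GITHUB_OUTPUT"
```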

4. BSD sed Compatibility

sed -i.bak "s/old/new/" file && rm file.bak
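
BSD sed (macOS) requires a backup suffix argument for -i, while GNU sed (Linux) treats it as optional. Attaching the suffix directly, as in -i.bak, is accepted by both. A minimal sketch that wraps the pattern in a hypothetical sed_inplace helper and tries it on a scratch file:

```shell
# Hypothetical wrapper around the -i.bak form, which both GNU and BSD sed accept.
sed_inplace() {
  expression=$1
  file=$2
  # Edit in place, then delete the backup only if sed succeeded.
  sed -i.bak "$expression" "$file" && rm -f "$file.bak"
}

# Demo on a scratch file.
demo_file=$(mktemp)
echo "runs-on: ubuntu-latest" > "$demo_file"
sed_inplace "s/ubuntu-latest/self-hosted/" "$demo_file"
cat "$demo_file"
```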

Prevention Measures

Immediate

  1. Default to self-hosted: decide_runner always runs on self-hosted first
  2. Cross-platform support: All decision logic works on Windows, macOS, and Linux
  3. Documentation: New Cross-Platform Runner Patterns guide

Long-term

  1. Monitor billing: Set up alerts for GitHub billing issues
  2. Redundant runners: Ensure multiple self-hosted runners on different platforms
  3. Graceful degradation: The system should always fall back to self-hosted runners when GitHub-hosted runners are unavailable
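
One simple building block for billing alerts is a text check for the error quoted in this report. The sketch below assumes the failure text has already been saved to a file. How that text is collected is outside this report, and is_billing_block is a hypothetical name.

```shell
# Hypothetical check: does saved failure text contain the billing error?
# Both phrases come from the error message quoted in this report.
is_billing_block() {
  grep -qiE "recent account payments have failed|spending limit has been reached" "$1"
}

# Demo with a scratch file standing in for captured failure text.
demo_log=$(mktemp)
echo "The job was not started because recent account payments have failed" > "$demo_log"
if is_billing_block "$demo_log"; then
  echo "billing block detected"
fi
```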

Lessons Learned

1. Self-Hosted First

For critical infrastructure jobs (runner selection, skip detection), always run on self-hosted first. GitHub-hosted should be an optimization, not a requirement.

2. Platform Agnostic Design

Any job that might run on self-hosted must handle all possible platforms. Don’t assume Windows just because that’s the primary self-hosted runner.

3. Output Merging

The ${{ a || b }} pattern is essential for platform-specific steps that both need to produce the same output.

4. PowerShell Nuances

PowerShell error handling is fundamentally different from bash:

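  • $ErrorActionPreference controls how errors are handled (for example, Stop or SilentlyContinue)
  • $LASTEXITCODE holds the exit code of the last native command, such as git
  • try/catch handles exceptions (terminating errors)
  • Out-File -Append appends to output files such as GITHUB_OUTPUT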

Timeline

| Time | Event |
|------|-------|
| T+0 | GitHub billing block begins |
| T+10m | CI failures noticed across all PRs |
| T+20m | Root cause identified (billing) |
| T+30m | Decision to move decide_runner to self-hosted |
| T+60m | Cross-platform patterns implemented |
| T+90m | All workflows updated and tested |
| T+120m | Fix deployed to all repositories |


Marker

All affected workflow files contain the marker:

# BILLING_FIX_2025_11_27: Run on self-hosted first to avoid GitHub billing blocks

Search for this marker to find all related changes.
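
A recursive grep is enough for that search. The sketch below assumes the workflows live under .github/workflows and demonstrates the hypothetical find_marker_files helper on a scratch directory.

```shell
# List files under a directory that contain the billing-fix marker.
find_marker_files() {
  grep -rl "BILLING_FIX_2025_11_27" "${1:-.github/workflows}" 2>/dev/null || true
}

# Demo against a scratch directory standing in for a checkout.
demo_dir=$(mktemp -d)
echo "# BILLING_FIX_2025_11_27: Run on self-hosted first to avoid GitHub billing blocks" > "$demo_dir/android-ci.yml"
echo "name: unrelated" > "$demo_dir/other.yml"
find_marker_files "$demo_dir"
```

Run without arguments from a repository root, the helper searches .github/workflows.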


Tags: incident-report ci-cd billing cross-platform infrastructure
Status: Resolved
Post-Incident Review: Complete