Architecture¶
This document describes the internal architecture of github-runner, including its component structure, data flow, concurrency model, and shutdown behavior.
System overview¶
github-runner is a single-binary process that manages one or more runner pools. Each pool maps to a [[runners]] entry in the config file and operates independently with its own executor type, concurrency level, and GitHub API credentials.
flowchart TB
CLI["CLI (cobra)"] --> Manager["Runner Manager (1 per process)"]
subgraph Pools["Runner pools (one per [[runners]] entry)"]
W1["Worker 1 (job)"]
W2["Worker 2 (idle)"]
WN["Worker N (job)"]
end
Manager --> Pools
subgraph Layers["Execution + services"]
Executor["Executor layer<br/>shell / docker / kubernetes / firecracker"]
Cache["Cache layer<br/>local / s3 / gcs"]
Artifact["Artifact manager"]
GitHubAPI["GitHub API client"]
Metrics["Metrics"]
Health["Health"]
Logger["Logger"]
end
Pools --> Executor
Pools --> Cache
Pools --> Artifact
Manager --> GitHubAPI
Manager --> Metrics
Manager --> Health
Manager --> Logger
Component diagram¶
graph TD
CLI["CLI (cobra)"] --> Manager["Runner Manager"]
Manager --> Pool1["Runner Pool: docker-fast"]
Manager --> Pool2["Runner Pool: shell-local"]
Manager --> MetricsSrv["Metrics Server :9252"]
Manager --> HealthSrv["Health Server :8484"]
Manager --> SignalHandler["Signal Handler"]
Pool1 --> Poller1["Poller"]
Pool1 --> Worker1A["Worker 1"]
Pool1 --> Worker1B["Worker 2"]
Pool1 --> Worker1N["Worker N"]
Poller1 --> GitHubAPI["GitHub API Client"]
Worker1A --> Executor["Executor (docker)"]
Worker1A --> Heartbeat["Heartbeat Reporter"]
Worker1A --> SecretMasker["Secret Masker"]
Worker1A --> Hooks["Hook Chain"]
Executor --> Docker["Docker Engine API"]
Pool2 --> Poller2["Poller"]
Pool2 --> Worker2A["Worker 1"]
Worker2A --> ShellExec["Executor (shell)"]
GitHubAPI --> RateLimit["Rate Limiter"]
GitHubAPI --> Retry["Retry + Backoff"]
Manager --> ConfigWatcher["Config Watcher (fsnotify)"]
Components¶
Runner Manager (internal/runner/manager.go)¶
The top-level orchestrator. One instance per process. Responsibilities:
- Creates and supervises runner pools based on config
- Starts metrics and health HTTP servers
- Handles OS signals (SIGTERM/SIGINT for shutdown, SIGHUP for reload)
- Coordinates graceful shutdown with timeout enforcement
Runner Pool (internal/runner/pool.go)¶
One pool per [[runners]] config entry. Each pool:
- Runs a Poller goroutine that queries the GitHub API for available jobs
- Maintains a buffered channel of jobs (buffer size = concurrency)
- Spawns Worker goroutines that pull from the channel
- Tracks active job count via atomic.Int64
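The pool structure described above can be sketched roughly as follows. This is a minimal illustration, not the code in internal/runner/pool.go: the `Job`, `Pool`, `NewPool`, and `runDemo` names, the `processed` counter, and the handler signature are all assumptions made for the example.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"sync/atomic"
)

// Job is a hypothetical stand-in for the real job payload.
type Job struct{ ID int }

// Pool is an illustrative pool: a buffered job channel sized to the
// concurrency level, plus atomic counters that can be read without locks.
type Pool struct {
	jobs        chan Job
	concurrency int
	active      atomic.Int64 // jobs currently being handled
	processed   atomic.Int64 // jobs finished (added for the demo only)
}

// NewPool creates a pool whose channel buffer equals its concurrency.
func NewPool(concurrency int) *Pool {
	return &Pool{jobs: make(chan Job, concurrency), concurrency: concurrency}
}

// Run starts one goroutine per worker and blocks until the job channel is
// closed and drained, or until ctx is cancelled.
func (p *Pool) Run(ctx context.Context, handle func(Job)) {
	var wg sync.WaitGroup
	for i := 0; i < p.concurrency; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for {
				select {
				case <-ctx.Done():
					return
				case job, ok := <-p.jobs:
					if !ok {
						return // channel closed: no more jobs
					}
					p.active.Add(1)
					handle(job)
					p.active.Add(-1)
					p.processed.Add(1)
				}
			}
		}()
	}
	wg.Wait()
}

// runDemo pushes n jobs through a pool and returns how many were handled.
// The producer goroutine plays the role of the Poller.
func runDemo(concurrency, n int) int64 {
	p := NewPool(concurrency)
	go func() {
		for i := 0; i < n; i++ {
			p.jobs <- Job{ID: i} // blocks while the buffer is full
		}
		close(p.jobs)
	}()
	p.Run(context.Background(), func(Job) {})
	return p.processed.Load()
}

func main() {
	fmt.Println("processed:", runDemo(3, 10))
}
```

Because the producer's send blocks when the buffer is full, the channel itself limits how far polling can run ahead of the workers.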
Worker (internal/runner/worker.go)¶
A worker executes a single job through its lifecycle:
- Registers secrets for log masking
- Starts heartbeat reporter in a background goroutine
- Transitions through the lifecycle state machine
- Runs pre-job hooks
- Calls Executor.Prepare() to set up the environment
- Executes steps sequentially via Executor.Run()
- Runs post-job hooks
- Reports final status to GitHub
- Calls Executor.Cleanup() (always, even on failure)
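The "always, even on failure" guarantee for Cleanup is the kind of thing `defer` expresses well. The sketch below assumes simplified signatures for `Prepare`, `Run`, and `Cleanup` (the real interface is not shown in this document), and `recordingExecutor` and `runJob` are hypothetical names used only for illustration. Hooks, masking, heartbeats, and status reporting are left out.

```go
package main

import (
	"errors"
	"fmt"
)

// Executor mirrors the three calls named above; the signatures are assumed.
type Executor interface {
	Prepare() error
	Run(step string) error
	Cleanup() error
}

// recordingExecutor is a hypothetical test double that records each call
// and fails on one chosen step.
type recordingExecutor struct {
	calls    []string
	failStep string
}

func (e *recordingExecutor) Prepare() error {
	e.calls = append(e.calls, "prepare")
	return nil
}

func (e *recordingExecutor) Run(step string) error {
	e.calls = append(e.calls, "run:"+step)
	if step == e.failStep {
		return errors.New("step failed: " + step)
	}
	return nil
}

func (e *recordingExecutor) Cleanup() error {
	e.calls = append(e.calls, "cleanup")
	return nil
}

// runJob prepares the environment, runs steps in order, and stops at the
// first failing step. Cleanup is deferred so it runs on every return path.
func runJob(ex Executor, steps []string) (err error) {
	defer func() {
		// Keep the first error; surface a cleanup error only if nothing
		// else went wrong.
		if cerr := ex.Cleanup(); err == nil {
			err = cerr
		}
	}()
	if err = ex.Prepare(); err != nil {
		return err
	}
	for _, s := range steps {
		if err = ex.Run(s); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	ex := &recordingExecutor{failStep: "test"}
	err := runJob(ex, []string{"build", "test", "deploy"})
	fmt.Println("error:", err)
	fmt.Println("calls:", ex.calls)
}
```

In this run the "deploy" step is never attempted, yet "cleanup" is still the last recorded call.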
Poller (internal/runner/poller.go)¶
Polls the GitHub API at a configured interval. Features:
- Exponential backoff on consecutive errors (up to 5 minutes)
- Automatic interval reset after successful poll
- Blocks on the job channel once its buffer is full, so polling pauses while all workers are busy
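One way to compute the poll delay is shown below. The text above only states that backoff is exponential and capped at 5 minutes; the doubling factor, the `nextDelay` name, and the 10-second base interval in `main` are assumptions for this sketch.

```go
package main

import (
	"fmt"
	"time"
)

// maxBackoff is the 5-minute ceiling mentioned above.
const maxBackoff = 5 * time.Minute

// nextDelay returns the wait before the next poll: the base interval,
// doubled once per consecutive error (an assumed factor), capped at
// maxBackoff. With zero errors it returns the base interval, which models
// the reset after a successful poll.
func nextDelay(base time.Duration, consecutiveErrors int) time.Duration {
	d := base
	for i := 0; i < consecutiveErrors; i++ {
		d *= 2
		if d >= maxBackoff {
			return maxBackoff
		}
	}
	return d
}

func main() {
	base := 10 * time.Second
	for errs := 0; errs <= 6; errs++ {
		fmt.Printf("errors=%d delay=%s\n", errs, nextDelay(base, errs))
	}
}
```

Deriving the delay from the error count, rather than mutating a stored interval, makes the reset implicit: the caller only has to zero the counter after a successful poll.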
Lifecycle (internal/runner/lifecycle.go)¶
A state machine that enforces valid job state transitions:
stateDiagram-v2
[*] --> Queued
Queued --> Claimed
Claimed --> Preparing
Preparing --> Running
Running --> PostExec
PostExec --> Completed
PostExec --> Failed
Completed --> Cleanup
Failed --> Cleanup
Cleanup --> [*]
Queued --> Cancelled
Claimed --> Cancelled
Claimed --> Failed
Preparing --> Failed
Preparing --> Cancelled
Running --> Failed
Running --> Cancelled
PostExec --> Cancelled
Cancelled --> Cleanup
Every transition is validated, logged, and reported to GitHub.
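The diagram can be encoded as a transition table that rejects anything not listed. The following is a sketch only: the `State` type, the `allowed` map, and the `Lifecycle` method names are assumptions, and logging and GitHub reporting are omitted. The RWMutex follows the synchronization table later in this document.

```go
package main

import (
	"fmt"
	"sync"
)

// State names follow the diagram above.
type State string

const (
	Queued    State = "Queued"
	Claimed   State = "Claimed"
	Preparing State = "Preparing"
	Running   State = "Running"
	PostExec  State = "PostExec"
	Completed State = "Completed"
	Failed    State = "Failed"
	Cancelled State = "Cancelled"
	Cleanup   State = "Cleanup"
)

// allowed lists every edge in the diagram; Cleanup is terminal.
var allowed = map[State][]State{
	Queued:    {Claimed, Cancelled},
	Claimed:   {Preparing, Failed, Cancelled},
	Preparing: {Running, Failed, Cancelled},
	Running:   {PostExec, Failed, Cancelled},
	PostExec:  {Completed, Failed, Cancelled},
	Completed: {Cleanup},
	Failed:    {Cleanup},
	Cancelled: {Cleanup},
}

// Lifecycle tracks the state of a single job.
type Lifecycle struct {
	mu    sync.RWMutex
	state State
}

// NewLifecycle starts a job in the Queued state.
func NewLifecycle() *Lifecycle { return &Lifecycle{state: Queued} }

// State returns the current state under a read lock.
func (l *Lifecycle) State() State {
	l.mu.RLock()
	defer l.mu.RUnlock()
	return l.state
}

// Transition moves to the target state if the diagram allows it and
// otherwise leaves the state unchanged and returns an error.
func (l *Lifecycle) Transition(to State) error {
	l.mu.Lock()
	defer l.mu.Unlock()
	for _, next := range allowed[l.state] {
		if next == to {
			l.state = to
			return nil
		}
	}
	return fmt.Errorf("invalid transition %s -> %s", l.state, to)
}

func main() {
	lc := NewLifecycle()
	fmt.Println(lc.Transition(Claimed)) // valid edge
	fmt.Println(lc.Transition(Running)) // rejected: Claimed cannot skip Preparing
	fmt.Println("state:", lc.State())
}
```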
Concurrency model¶
Goroutine topology¶
main goroutine
└── Manager.Start()
├── signal handler (1 goroutine)
├── metrics server (1 goroutine)
├── health server (1 goroutine)
│
├── Pool "docker-fast" (1 goroutine)
│ ├── Poller (1 goroutine)
│ ├── Worker 0 (1 goroutine per active job)
│ │ ├── heartbeat (1 goroutine)
│ │ └── executor I/O (managed by executor)
│ ├── Worker 1
│ └── ...Worker N
│
└── Pool "shell-local" (1 goroutine)
├── Poller (1 goroutine)
└── Worker 0..M
Synchronization strategy¶
| Resource | Mechanism | Notes |
|---|---|---|
| Job dispatch channel | Buffered channel | Size = pool concurrency |
| Active job count | atomic.Int64 | Lock-free reads for metrics |
| Config state | sync.RWMutex | Writer: config watcher. Readers: pools. |
| Lifecycle state | sync.RWMutex | Per-job, no sharing between workers |
| Shutdown coordination | context.Context + sync.WaitGroup | Cancel propagates to all goroutines |
| Secret masker patterns | sync.RWMutex | Writers: AddSecret. Readers: Write/MaskString. |
| Cache index (local) | sync.RWMutex + file lock | Mutex for in-process safety, flock for multi-process |
| Metrics counters | Prometheus client internals | Atomic internally, no additional sync |
| Rate limit state | sync.RWMutex | Updated from response headers |
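As an illustration of the reader/writer split in the table, here is a minimal secret masker guarded by a `sync.RWMutex`. `AddSecret` and `MaskString` are the method names from the table; the `Masker` type, the plain substring matching, and the `***` replacement are assumptions for this sketch.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// Masker holds the registered secrets for one worker.
type Masker struct {
	mu      sync.RWMutex
	secrets []string
}

// AddSecret registers a secret under the write lock. Empty strings are
// ignored so that masking never matches everywhere.
func (m *Masker) AddSecret(s string) {
	if s == "" {
		return
	}
	m.mu.Lock()
	defer m.mu.Unlock()
	m.secrets = append(m.secrets, s)
}

// MaskString replaces every registered secret under the read lock, so
// concurrent log writers do not block each other.
func (m *Masker) MaskString(in string) string {
	m.mu.RLock()
	defer m.mu.RUnlock()
	for _, s := range m.secrets {
		in = strings.ReplaceAll(in, s, "***")
	}
	return in
}

func main() {
	m := &Masker{}
	m.AddSecret("hunter2")
	fmt.Println(m.MaskString("login with password=hunter2"))
}
```

A read-write lock fits here because secrets are typically registered once at job start, while masking happens on every log line.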
Design rules¶
- Workers never share mutable state. Each worker gets its own masker, executor instance, and lifecycle tracker.
- All inter-goroutine communication uses channels or context cancellation.
- Every goroutine respects ctx.Done() for clean shutdown.
- defer is used for all cleanup to ensure resources are released on every code path.
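A background goroutine that follows the last two rules might look like the heartbeat loop below. The `heartbeat` and `runFor` names, the callback, and the intervals are hypothetical; the real heartbeat reporter is not shown in this document.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// heartbeat calls beat on every tick until ctx is cancelled. The deferred
// Stop releases the ticker on every return path.
func heartbeat(ctx context.Context, interval time.Duration, beat func()) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			beat()
		}
	}
}

// runFor runs heartbeat for roughly totalMs milliseconds and returns the
// number of beats observed. heartbeat runs on the calling goroutine here,
// so the counter needs no synchronization.
func runFor(intervalMs, totalMs int) int {
	ctx, cancel := context.WithTimeout(
		context.Background(), time.Duration(totalMs)*time.Millisecond)
	defer cancel()
	n := 0
	heartbeat(ctx, time.Duration(intervalMs)*time.Millisecond, func() { n++ })
	return n
}

func main() {
	fmt.Println("beats:", runFor(10, 200))
}
```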
Shutdown sequence¶
1. SIGTERM or SIGINT received
2. Root context cancelled → propagates to all pools and workers
3. Pollers stop accepting new jobs immediately
4. Health server reports not-ready (/readyz returns 503)
5. In-flight workers:
a. Current step completes (bounded by shutdown_timeout)
b. Executor.Cleanup() called
c. Job status reported to GitHub as cancelled
6. WaitGroup.Wait() blocks until all workers finish
7. Metrics and health servers shut down
8. Process exits with code 0
If shutdown_timeout expires:
- Warning logged with list of still-running jobs
- Process exits with code 1
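The wait-with-timeout part of this sequence can be sketched with a context, a WaitGroup, and a timer. The `shutdown` and `simulate` helpers are hypothetical, and signal handling, health checks, and status reporting are omitted; the exit codes follow the list above.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// shutdown cancels the root context, then waits for all workers to finish
// or for the timeout to expire. It returns the process exit code:
// 0 for a clean shutdown, 1 if the timeout expired first.
func shutdown(cancel context.CancelFunc, wg *sync.WaitGroup, timeout time.Duration) int {
	cancel()
	done := make(chan struct{})
	go func() {
		wg.Wait()
		close(done)
	}()
	select {
	case <-done:
		return 0
	case <-time.After(timeout):
		return 1
	}
}

// simulate starts n workers that each need workMs milliseconds to wind
// down after cancellation, then shuts down with the given timeout.
func simulate(n, workMs, timeoutMs int) int {
	ctx, cancel := context.WithCancel(context.Background())
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			<-ctx.Done() // wait for the shutdown signal
			time.Sleep(time.Duration(workMs) * time.Millisecond)
		}()
	}
	return shutdown(cancel, &wg, time.Duration(timeoutMs)*time.Millisecond)
}

func main() {
	fmt.Println("fast workers exit code:", simulate(3, 5, 500))
	fmt.Println("slow workers exit code:", simulate(3, 500, 20))
}
```

`sync.WaitGroup` has no timed wait, which is why the sketch bridges `Wait` to a channel and selects on it alongside the timer.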
Data flow: job lifecycle¶
sequenceDiagram
participant GH as GitHub API
participant R as Runner
participant E as Executor
R->>GH: Poll jobs (GET /jobs)
GH-->>R: Job payload
R->>E: Prepare()
R->>GH: Status in_progress
loop For each workflow step
R->>E: Run(step)
E-->>R: StepResult
R->>GH: Step status + heartbeat
end
R->>GH: Status completed
R->>E: Cleanup()
Package dependency graph¶
cmd/github-runner
└── internal/cli
├── internal/config
├── internal/runner
│ ├── internal/executor
│ ├── internal/github
│ ├── internal/hook
│ └── internal/secret
├── internal/metrics
├── internal/health
├── internal/log
└── internal/version
internal/executor
├── internal/executor/shell
├── internal/executor/docker
├── internal/executor/kubernetes
└── internal/executor/firecracker
internal/cache (standalone)
internal/artifact (standalone)
internal/job (depends on executor, api)
pkg/api (no internal dependencies)
No circular dependencies exist. pkg/api is a leaf package with no internal dependencies, so any other package can import it without creating a cycle. Internal packages import downward only.