How GitHub uses eBPF to improve deployment safety (7 minute read)
GitHub uses eBPF to prevent deployment scripts from accidentally depending on GitHub's own services during outages, avoiding scenarios where recovery is blocked by the outage itself.
What: GitHub implemented eBPF-based monitoring to track and restrict network access in deployment scripts, preventing circular dependencies where deployment tooling relies on the same services it's trying to deploy or recover.
Why it matters: This addresses a critical operational risk where incident recovery tooling depends on the very services that are down, creating a catch-22 that could extend outages and make systems unrecoverable.
Takeaway: Audit your deployment and recovery scripts for dependencies on your own infrastructure that could block recovery during outages.
Deep dive
- GitHub identified circular deployment dependencies as a major risk where outages could prevent their own recovery if deployment scripts relied on unavailable services
- eBPF enables kernel-level monitoring without modifying applications, allowing GitHub to intercept network calls from deployment processes in real time
- The solution provides per-process control, letting GitHub apply different network restrictions to specific deployment scripts based on their role
- DNS interception capability catches dependencies even when scripts use service discovery or internal DNS names rather than direct IP addresses
- Real-time auditing detects risky patterns like deployment scripts calling GitHub's own API during incident recovery, which would fail if GitHub is down
- The system can detect three types of problematic dependencies: hidden (undocumented calls), direct (known but risky), and transient (indirect through libraries or tools)
- This approach allows GitHub to enforce deployment hygiene automatically rather than relying solely on code reviews and documentation
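
The audit from the takeaway, as an added example that is not GitHub's code: a Python pass over constructed per-process DNS events, of the kind an eBPF hook could report, that flags lookups of your own domains made anywhere in a deploy script's process tree. The event shape, the domain lists, and the rule that maps direct, hidden and transient to documented, undocumented and child-process lookups are assumptions.

```python
# Constructed sample input: DNS lookups attributed to processes. The event
# shape (pid, ppid, comm, qname) is an assumption, not GitHub's format.
OWN_DOMAINS = ("github.com", "githubusercontent.com")  # must not be needed to recover
DOCUMENTED = {"api.github.com"}  # dependencies the team already knows about
EVENTS = [
    {"pid": 100, "ppid": 1, "comm": "deploy.sh", "qname": "api.github.com."},
    {"pid": 100, "ppid": 1, "comm": "deploy.sh", "qname": "raw.githubusercontent.com"},
    {"pid": 101, "ppid": 100, "comm": "curl", "qname": "Objects.GitHubusercontent.com"},
    {"pid": 102, "ppid": 100, "comm": "apt-get", "qname": "deb.debian.org"},
    {"pid": 103, "ppid": 101, "comm": "helper", "qname": "notgithub.com"},
    {"pid": 200, "ppid": 1, "comm": "sshd", "qname": "github.com"},
]

def normalize(qname):
    """DNS names are case-insensitive and may carry a trailing root dot."""
    return qname.rstrip(".").lower()

def is_own(qname):
    """Match on a label boundary so 'notgithub.com' does not count as 'github.com'."""
    q = normalize(qname)
    return any(q == d or q.endswith("." + d) for d in OWN_DOMAINS)

def descends_from(pid, root, parents):
    """Walk parent links upward; the seen set guards against a malformed cycle."""
    seen = set()
    while pid != root and pid in parents and pid not in seen:
        seen.add(pid)
        pid = parents[pid]
    return pid == root

def audit(events, root_pid):
    """Return (kind, comm, name) for each own-domain lookup under root_pid."""
    parents = {e["pid"]: e["ppid"] for e in events}
    findings = []
    for e in events:
        if not is_own(e["qname"]) or not descends_from(e["pid"], root_pid, parents):
            continue  # unrelated name, or a process outside the deploy tree
        name = normalize(e["qname"])
        if e["pid"] != root_pid:
            kind = "transient"  # reached indirectly through a tool the script ran
        else:
            kind = "direct" if name in DOCUMENTED else "hidden"
        findings.append((kind, e["comm"], name))
    return findings

if __name__ == "__main__":
    for finding in audit(EVENTS, root_pid=100):
        print(*finding)
```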
Decoder
- eBPF (extended Berkeley Packet Filter): A Linux kernel technology that allows running sandboxed programs to monitor or modify system behavior without changing kernel code or loading modules
- Circular dependency: A scenario where system A needs system B to recover, but system B depends on system A being healthy, creating an unresolvable deadlock during outages
Original article
GitHub mitigates circular deployment dependencies, where outages could block their own recovery, by using eBPF to monitor and restrict deployment scripts' network access and detect hidden, direct, and transient dependencies. This enables per-process control, DNS interception, and real-time auditing of risky calls like GitHub API usage during incident recovery.