Situational Awareness Tracker¶
A living tracker for Situational Awareness: The Decade Ahead (Leopold Aschenbrenner, June 2024) — the essay series arguing that AGI by 2027 is plausible, that an intelligence explosion follows shortly after, and that managing the transition is primarily a governance and security problem, not just a model-alignment problem.
This page does two things:
- Tracks the series' falsifiable claims against what has actually happened (update protocol below).
- Maps its AGI-management problems onto concrete SWARM mechanisms — the essay describes the world SWARM's simulations are built to stress-test: millions of automated AI agents whose individual alignment cannot be verified, interacting faster than human oversight can react.
Nodes for this tracker live in the knowledge graph (docs/knowledge-graph.jsonld) under the IDs sa:essay and sa:tracker.
Thesis chain¶
The series is one compounding argument; each chapter is a link:
| Ch. | Title | Core claim |
|---|---|---|
| I | From GPT-4 to AGI: Counting the OOMs | ~5 OOMs of effective compute 2023→2027 (compute + algorithmic efficiency + unhobbling) make AGI by 2027 "strikingly plausible" |
| II | From AGI to Superintelligence | Automated AI research compresses a decade of algorithmic progress into ≤1 year |
| IIIa | Racing to the Trillion-Dollar Cluster | Capital and power buildout reaches trillion-dollar scale; US electricity production must grow materially |
| IIIb | Lock Down the Labs | Lab security is inadequate against state actors; AGI secrets leak by default |
| IIIc | Superalignment | Controlling much-smarter-than-human systems is unsolved; plan is to align "somewhat-superhuman" systems and delegate alignment research to millions of automated researchers |
| IIId | The Free World Must Prevail | Superintelligence confers decisive strategic advantage; the race is existential |
| IV | The Project | USG nationalizes/co-runs AGI development by ~2027–28 |
| V | Parting Thoughts | If the above holds, the next few years are the highest-stakes in history |
Claim status¶
Status legend: ✅ borne out · 🟡 partially / directionally · ❌ not borne out · ⏳ open (deadline not reached) · ❓ unassessed.
| # | Claim (chapter) | Trackable signal | Status (2026-06) | Notes |
|---|---|---|---|---|
| 1 | ~0.5 OOM/yr physical compute scaling continues (I) | Frontier training-run FLOP estimates (Epoch AI) | 🟡 | Buildout continued, but frontier gains shifted partly to test-time compute rather than pretraining FLOP alone |
| 2 | ~0.5 OOM/yr algorithmic efficiency (I) | Cost to reach fixed benchmark level | ✅ | Token-cost collapse at fixed capability continued through 2025 (cf. AI Index 2025: 280× cost drop at GPT-3.5 level) |
| 3 | Test-time compute is a large unhobbling OOM source (I) | Reasoning-model scaling (o-series et al.) | ✅ | Reasoning/test-time-compute models became the dominant frontier axis in 2025 — arguably the series' best near-term call |
| 4 | Expert-PhD-level science (GPQA) cracked "soon" (I) | GPQA scores | ✅ | Frontier models exceeded in-domain PhD baselines by 2025 |
| 5 | AGI (drop-in remote worker / automated AI researcher) by 2027 (I) | Drop-in automation of AI R&D roles | ⏳ | Agents handle multi-hour tasks; full researcher automation not demonstrated. Deadline end-2027 |
| 6 | Intelligence explosion: decade of progress in ≤1 yr post-AGI (II) | Post-AGI algorithmic progress rate | ⏳ | Contingent on #5 |
| 7 | $100B+ single clusters; trillion-dollar aggregate buildout (IIIa) | Announced datacenter capex, power contracts | 🟡→✅ | Hundred-billion-scale programs (e.g. Stargate-class announcements, 2025) and power constraints became mainstream; trillion-dollar aggregate trajectory plausible but not yet realized |
| 8 | Lab security inadequate vs. state actors (IIIb) | Lab security incidents, RAND/gov assessments, export-control posture | 🟡 | Security tightened (clearances, compartmentalization at frontier labs) but no public evidence of weights-grade security; periodic exfiltration reports persist |
| 9 | Superalignment unsolved at superhuman scale (IIIc) | Existence of validated scalable-oversight method | ✅ (still unsolved) | Scalable oversight, interp, and adversarial testing remain open research; no lab claims a solution |
| 10 | USG "Project" — government-run AGI effort by 2027–28 (IV) | Formal USG AGI program with lab integration | ⏳ | Government involvement deepened (compute/security/export controls) but no Manhattan-Project-style consolidation as of early 2026 |
Statuses are coarse editorial judgments as of the last update; they cite no run data. When a status changes, append a dated note rather than rewriting (core-principles append-only discipline applies).
Mapping: managing AGI with SWARM¶
The essay's endgame — "millions of automated AI researchers" supervised by AI systems whose own alignment is uncertain — is precisely a distributional safety problem. Individual-model alignment (chapter IIIc's framing) is necessary but not sufficient: even if each agent is "probably fine" (p < 1 of being beneficial), risk compounds across the population. That is the regime SWARM measures and governs.
Problem → mechanism table¶
| Situational Awareness problem | SWARM mechanism | Where in repo |
|---|---|---|
| Millions of automated researchers; no human can review each output | Soft labels: every interaction carries p = P(beneficial) instead of a binary verdict; population-level risk is aggregated, not enumerated |
swarm/core/proxy.py (ProxyComputer), swarm/core/payoff.py |
| Scalable oversight — AI critiquing AI, humans out of the loop | Proxy/pseudo-verifier signals (verifier_rejections, rework_count) feeding v_hat → p; calibration experiments test when the proxy itself degrades |
swarm/core/proxy.py, calibration pilots (calibration-pilot-findings*.md) |
| "Extremely high confidence before each capability jump" | Gate analog: circuit breakers freeze agents whose sliding-window toxicity exceeds threshold — a mechanical version of "don't proceed until trust is established" | swarm/governance/ circuit breaker; governance concepts |
| Deliberately-planted misalignment / adversarial testing | Adversarial scenario fixtures, trust-then-exploit strategies, /red_team workflow; selection-geometry experiments show evasive adversaries invert naive metric rankings |
tests/fixtures/interactions.py (generate_adversarial_scenario), scenarios/, selection-geometry experiment series |
| Intelligence explosion outpaces human governance response | Governance latency is a first-class metric: delay between problem emergence and mechanism effect, measured under accelerating interaction rates | swarm:governance-latency concept; distributional safety |
| Espionage / insider exfiltration (IIIb) | Information asymmetry + collusion detection: correlated exploitation timing across agent pairs as a proxy for coordinated insiders | swarm:collusion-detection, swarm:information-asymmetry |
| Race dynamics erode safety margins (IIId) | Payoff engine with externality internalization ρ: when agents don't bear ecosystem harm costs, adverse selection accelerates — the race-to-the-bottom, quantified | swarm/core/payoff.py, swarm/metrics/soft_metrics.py (quality gap) |
| The Project: what should the regulator actually do? (IV) | The six governance levers (tax, staking, reputation decay, audits, circuit breakers, collusion detection) are testable policy analogs; sweeps map lever settings → toxicity/welfare tradeoffs | /sweep, /red_team, governance |
What SWARM adds that the essay lacks¶
Situational Awareness is forecast and advocacy; it names the management problem but offers no apparatus for it. SWARM's contribution to "the Project" question is empirical: which governance levers retain effectiveness when (a) agents adapt adversarially, (b) interaction rates exceed oversight bandwidth, and (c) the quality signal itself is a learned proxy that can be gamed. The v3 selection-geometry result — an evasive adversary inverting the v2 metric ranking — is a miniature of exactly the failure mode chapter IIIc worries about: oversight metrics that work on static threats break on adaptive ones.
Open research questions this tracker feeds¶
- Oversight-bandwidth scaling: at what interactions-per-epoch does each governance lever's latency exceed damage-propagation time? (Maps to chapter II's speed claims.)
- Proxy-gaming during capability jumps: if agent capability rises mid-run, does a calibrated
pgo stale faster than reputation decay can compensate? (Maps to "confidence before each jump.") - Insider-coalition detection floors: minimum coalition size/correlation detectable by collusion detection under evasive timing. (Maps to IIIb.)
Update protocol¶
- Cadence: revisit when a tracked deadline passes (end-2027 for claims 5–6), when a major capability or policy event lands, or at least quarterly.
- How: re-fetch situational-awareness.ai and current capability data (Epoch AI, AI Index); update the Status column; append dated notes — don't rewrite old judgments.
- Knowledge graph: if a new SWARM concept becomes relevant to a mapping row, add the edge in
docs/knowledge-graph.jsonldon thesa:trackernode'saboutlist. - Runs: experiments motivated by the open questions above should tag
situational-awarenessin their run metadata sorun-querycan find them.
Sources¶
- Aschenbrenner, L. (2024). Situational Awareness: The Decade Ahead.
- Stanford AI Index 2025 analysis (capability/cost context for claims 1–4).
- Theoretical Foundations — the distributional-safety frame the mapping relies on.