# Emergence
Understanding how system-level risks emerge from agent interactions.
## The Emergence Problem

Traditional AI safety asks:

> "How do we align a single powerful agent?"

SWARM asks:

> "What happens when many agents—each potentially aligned—interact in ways that produce misaligned outcomes?"
This is the emergence problem: system-level failures that aren't predictable from individual agent properties.
## Why Emergence Matters

### Individual vs. System Properties
| Individual Agent | System Behavior |
|---|---|
| Locally optimal | Globally suboptimal |
| Individually safe | Collectively dangerous |
| Honest intentions | Adverse outcomes |
### Real-World Analogies

- **Flash crashes**: Individual trading algorithms are rational; together they crash markets
- **Bank runs**: Individual withdrawals are reasonable; together they cause collapse
- **Tragedy of the commons**: Individual resource use is optimal; together it's destructive
## Emergence Mechanisms in SWARM

### 1. Information Asymmetry

Some agents know things others don't.

```text
Agent A knows: interaction quality
Agent B knows: only observable signals
System effect: A exploits B's ignorance
```
SWARM detects this via the quality gap metric.
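As a minimal illustration of that asymmetry (a sketch only, not SWARM's implementation), suppose agent A controls both the quality it delivers and the signal it advertises, while B can only filter on the signal:

```python
import random

random.seed(0)

# A (the informed side) chooses what to deliver and what to advertise.
# A deceptive A advertises high quality regardless of what it actually delivers.
offers = [{"true_quality": random.random(), "signal": 0.9} for _ in range(1000)]

# B (the uninformed side) can only filter on the advertised signal,
# which here carries no information about true quality.
accepted = [o for o in offers if o["signal"] > 0.5]

avg_signal = sum(o["signal"] for o in accepted) / len(accepted)
avg_true = sum(o["true_quality"] for o in accepted) / len(accepted)
print(f"B expected quality ~{avg_signal:.2f}, actually received ~{avg_true:.2f}")
# The shortfall between what B expected and what B received is the kind of
# discrepancy a quality-gap style metric is designed to surface.
```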
### 2. Adverse Selection

The system preferentially accepts lower-quality interactions.

```text
High-quality agents: selective, reject bad matches
Low-quality agents: accept anything
System effect: bad interactions dominate
```
SWARM detects this via negative quality gap.
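The dynamic is easy to reproduce in a toy matching model. The sketch below is not SWARM code; it simply treats the quality gap as the mean true quality of completed interactions minus the mean quality of everything proposed, and shows that selective high-quality agents plus indiscriminate low-quality agents are enough to push it negative:

```python
import random

random.seed(0)

# Each agent has a hidden true quality in [0, 1].
agents = [random.random() for _ in range(200)]

def accepts(own_quality, partner_signal):
    # High-quality agents are selective and demand a strong-looking partner;
    # low-quality agents accept any match.
    return partner_signal > 0.8 if own_quality > 0.5 else True

proposed, completed = [], []
for _ in range(5000):
    a, b = random.sample(agents, 2)
    pair_quality = (a + b) / 2
    proposed.append(pair_quality)
    # Each side only observes a noisy signal of its partner's quality.
    if accepts(a, b + random.gauss(0, 0.1)) and accepts(b, a + random.gauss(0, 0.1)):
        completed.append(pair_quality)

gap = sum(completed) / len(completed) - sum(proposed) / len(proposed)
print(f"quality gap: {gap:+.3f}")  # negative: completed interactions skew low-quality
```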
### 3. Variance Amplification
Small errors compound across decisions.
SWARM detects this via the incoherence index.
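The underlying phenomenon can be reproduced outside SWARM. The sketch below illustrates the mechanism (not the incoherence index's actual definition): it chains a small relative error through successive dependent decisions and shows the spread of outcomes growing with chain length:

```python
import random
import statistics

random.seed(0)

def chain_outcome(horizon, step_error=0.05):
    """Final value of a plan after `horizon` dependent decisions,
    each contributing a small independent relative error."""
    value = 1.0
    for _ in range(horizon):
        value *= 1.0 + random.gauss(0, step_error)
    return value

for horizon in (1, 5, 20, 80):
    outcomes = [chain_outcome(horizon) for _ in range(2000)]
    print(f"horizon={horizon:3d}  stdev={statistics.stdev(outcomes):.3f}")
# The spread of final outcomes grows with horizon: small per-decision errors
# compound rather than averaging out.
```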
### 4. Governance Lag

Safety mechanisms react too slowly.

```text
t=0: Problem emerges
t=1: Metrics detect problem
t=2: Governance responds
t=3: Response takes effect
...
t=N: Damage already done
```
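A back-of-the-envelope model (not SWARM code) makes the cost of lag concrete: if harm compounds while a problem goes ungoverned, total damage grows faster than linearly in the delay:

```python
def damage_before_containment(lag_steps, harm_per_step=1.0, growth=1.3):
    """Total harm accumulated while a problem grows unchecked for
    `lag_steps` steps (detection + decision + rollout delay)."""
    total, harm = 0.0, harm_per_step
    for _ in range(lag_steps):
        total += harm
        harm *= growth  # the problem compounds while ungoverned
    return total

for lag in (1, 2, 4, 8):
    print(f"lag={lag}  accumulated damage={damage_before_containment(lag):.1f}")
# Doubling the governance lag more than doubles the damage when harm compounds.
```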
## Modeling Emergence

### Scenario Design
Create scenarios that stress-test emergence:
```yaml
name: emergence_test

agents:
  - type: honest
    count: 5
  - type: opportunistic
    count: 3
  - type: deceptive
    count: 2

# Start with no governance
governance:
  transaction_tax: 0.0
  circuit_breaker_threshold: 1.0  # Effectively disabled

simulation:
  n_epochs: 50
  steps_per_epoch: 20
```
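How a scenario file gets loaded depends on the SWARM entry points in your version; the snippet below assumes a hypothetical `Orchestrator.from_scenario` constructor purely to show where the YAML fits into a run:

```python
# Hypothetical loading code: adjust to the actual SWARM API.
from swarm import Orchestrator  # assumed import path

orchestrator = Orchestrator.from_scenario("emergence_test.yaml")  # assumed constructor
metrics = orchestrator.run()
print(metrics[-1].quality_gap)  # quality gap at the final epoch
```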
### Tracking Emergence
```python
import matplotlib.pyplot as plt

# Run the simulation and collect per-epoch metrics
metrics = orchestrator.run()

# Plot quality gap over time
epochs = [m.epoch for m in metrics]
quality_gaps = [m.quality_gap for m in metrics]

plt.plot(epochs, quality_gaps)
plt.axhline(y=0, color='r', linestyle='--', label='Adverse selection threshold')
plt.xlabel('Epoch')
plt.ylabel('Quality Gap')
plt.title('Emergence of Adverse Selection')
plt.legend()
plt.show()
```
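The same metrics list can also be scanned programmatically, for example to find the onset of adverse selection (assuming the `epoch` and `quality_gap` fields used above):

```python
# First epoch at which the quality gap turns negative, if any.
onset = next((m.epoch for m in metrics if m.quality_gap < 0), None)
if onset is None:
    print("No adverse selection observed in this run.")
else:
    print(f"Adverse selection emerged at epoch {onset}")
```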
## The Hot Mess Hypothesis

SWARM supports research into the "hot mess" theory of AI risk:

> AGI-level catastrophes may not require AGI-level agents. Instead, they emerge from the chaotic interaction of many sub-AGI systems, each pursuing local objectives that combine into globally harmful outcomes.
Key predictions:
- **Incoherence scales with horizon**: Longer decision chains → more variance
- **Multi-agent amplifies single-agent problems**: Interaction compounds errors (see the sketch after this list)
- **Governance has limits**: Some emergence patterns are hard to govern
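The second prediction can be illustrated with a toy comparison (a sketch of the intuition, not a SWARM experiment): with the same per-agent noise, outputs that feed into one another produce far more variance in the end result than outputs that are combined independently:

```python
import random
import statistics

random.seed(1)

N_AGENTS, TRIALS, NOISE = 10, 2000, 0.1

def independent_pipeline():
    # Each agent estimates the same target; averaging washes errors out.
    estimates = [1.0 + random.gauss(0, NOISE) for _ in range(N_AGENTS)]
    return sum(estimates) / N_AGENTS

def coupled_pipeline():
    # Each agent builds on the previous agent's output; errors compound.
    value = 1.0
    for _ in range(N_AGENTS):
        value *= 1.0 + random.gauss(0, NOISE)
    return value

for name, pipeline in [("independent", independent_pipeline), ("coupled", coupled_pipeline)]:
    outcomes = [pipeline() for _ in range(TRIALS)]
    print(f"{name:12s} stdev={statistics.stdev(outcomes):.3f}")
```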
## Implications for Safety

### What SWARM Reveals
- Single-agent alignment is necessary but not sufficient
- Interaction-level risks need interaction-level solutions
- Metrics must track system properties, not just agent properties
- Governance must be proactive, not just reactive
### Design Principles
| Principle | Implementation |
|---|---|
| Observable | Soft labels expose hidden quality |
| Measurable | Metrics quantify system health |
| Governable | Levers allow intervention |
| Testable | Scenarios enable experimentation |
## Research Questions
SWARM enables investigation of:
- When does adverse selection emerge in multi-agent systems?
- How does governance delay affect emergent risk?
- What's the relationship between agent diversity and system stability?
- Can emergence be predicted from agent-level properties?
## Next Steps
- **Metrics**: Measure emergent properties
- **Governance**: Intervene on emergence
- **Theoretical Foundations**: Formal treatment