Emergence

Understanding how system-level risks emerge from agent interactions.

The Emergence Problem

Traditional AI safety asks:

"How do we align a single powerful agent?"

SWARM asks:

"What happens when many agents—each potentially aligned—interact in ways that produce misaligned outcomes?"

This is the emergence problem: system-level failures that aren't predictable from individual agent properties.

Why Emergence Matters

Individual vs. System Properties

Individual Agent     System Behavior
-----------------    ----------------------
Locally optimal      Globally suboptimal
Individually safe    Collectively dangerous
Honest intentions    Adverse outcomes

Real-World Analogies

  • Flash crashes - Individual trading algorithms are rational; together they crash markets
  • Bank runs - Individual withdrawals are reasonable; together they cause collapse
  • Tragedy of the commons - Individual resource use is optimal; together it's destructive (see the toy model below)
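
A toy model makes the last pattern concrete. The sketch below is illustration only, not SWARM code: every agent takes the harvest that is privately rational, and the shared stock degrades anyway.

# Toy tragedy-of-the-commons simulation (illustration only, not SWARM code).
N_AGENTS = 10
REGROWTH = 20.0                     # total regrowth per round
SUSTAINABLE = REGROWTH / N_AGENTS   # per-agent harvest the commons can absorb

def run(per_agent_harvest, stock=100.0, rounds=15):
    for _ in range(rounds):
        stock = max(stock - per_agent_harvest * N_AGENTS, 0.0) + REGROWTH
        stock = min(stock, 100.0)   # carrying capacity
    return stock

print("cooperative:", run(SUSTAINABLE))        # stock holds at capacity
print("greedy:     ", run(SUSTAINABLE * 1.5))  # stock erodes to a degraded level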

Emergence Mechanisms in SWARM

1. Information Asymmetry

Some agents know things others don't.

Agent A knows: interaction quality
Agent B knows: only observable signals
System effect: A exploits B's ignorance

SWARM detects this via the quality gap metric.
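
As a rough sketch of what this metric can capture (SWARM's exact definition may differ), compare the average true quality of accepted interactions against the average quality of everything that was offered:

# Hedged sketch of a quality-gap computation; SWARM's own definition may differ.
from statistics import mean

def quality_gap(offers):
    """offers: list of (true_quality, accepted) pairs."""
    accepted = [q for q, ok in offers if ok]
    if not accepted:
        return 0.0
    # Positive: the system selects for quality.
    # Negative: accepted interactions are worse than the offer pool.
    return mean(accepted) - mean(q for q, _ in offers)

print(quality_gap([(0.9, False), (0.8, False), (0.3, True), (0.2, True)]))  # ≈ -0.30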

2. Adverse Selection

The system preferentially accepts lower-quality interactions.

High-quality agents: selective, reject bad matches
Low-quality agents: accept anything
System effect: bad interactions dominate

SWARM detects this via a negative quality gap.
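
The classic "market for lemons" dynamic reproduces this pattern in a few lines. The numbers below are invented for illustration, not SWARM's agent logic: buyers pay one uniform price because quality is hidden, so high-quality sellers withdraw.

# Lemons-style illustration with invented numbers (not SWARM's agent logic).
import random

random.seed(0)
qualities = [random.random() for _ in range(1000)]  # hidden quality, 0..1
price = 0.5                                         # uniform offer; quality unobservable

traded = [q for q in qualities if q <= price]       # high-quality sellers exit

print(f"offered mean quality: {sum(qualities)/len(qualities):.2f}")
print(f"traded mean quality:  {sum(traded)/len(traded):.2f}")
# traded ≈ 0.25 < offered ≈ 0.50: a negative quality gap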

3. Variance Amplification

Small errors compound across decisions.

Decision 1: small error
Decision 2: builds on decision 1
...
Decision N: compounded errors

SWARM detects this via the incoherence index.
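
A toy model of the effect (not the incoherence index's actual formula): if each decision inherits the previous state and adds independent noise, the spread of final outcomes grows with chain length, roughly like the square root of the number of decisions.

# Toy error-compounding model (not SWARM's incoherence index).
import random
import statistics

random.seed(0)

def final_spread(n_decisions, noise=0.1, trials=2000):
    finals = []
    for _ in range(trials):
        state = 0.0
        for _ in range(n_decisions):
            state += random.gauss(0.0, noise)  # each decision builds on the last
        finals.append(state)
    return statistics.stdev(finals)

for n in (1, 4, 16, 64):
    print(f"decisions={n:3d}  stdev of final state={final_spread(n):.3f}")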

4. Governance Lag

Safety mechanisms react too slowly.

t=0: Problem emerges
t=1: Metrics detect problem
t=2: Governance responds
t=3: Response takes effect
...
t=N: Damage already done
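
A back-of-the-envelope sketch with invented numbers: if harm compounds each step, and detection, response, and taking effect each cost a step, cumulative damage grows much faster than linearly in the lag.

# Invented-numbers sketch of governance lag (not a SWARM simulation).
GROWTH = 1.2   # harm grows 20% per step until the intervention bites

def total_damage(lag_steps, horizon=10):
    damage, harm = 0.0, 1.0
    for t in range(horizon):
        if t >= lag_steps:
            harm = 0.0             # intervention finally takes effect
        damage += harm
        harm *= GROWTH
    return damage

for lag in (1, 3, 6):
    print(f"lag={lag} steps -> cumulative damage {total_damage(lag):.1f}")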

Modeling Emergence

Scenario Design

Create scenarios that stress-test emergence:

name: emergence_test
agents:
  - type: honest
    count: 5
  - type: opportunistic
    count: 3
  - type: deceptive
    count: 2

# Start with no governance
governance:
  transaction_tax: 0.0
  circuit_breaker_threshold: 1.0  # Effectively disabled

simulation:
  n_epochs: 50
  steps_per_epoch: 20
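
To run it, wire the scenario into an orchestrator. The loader below (`Orchestrator.from_yaml`) and import path are hypothetical names; substitute SWARM's actual entry point.

# Hypothetical wiring; SWARM's actual constructor and import path may differ.
from swarm import Orchestrator   # assumed import path

orchestrator = Orchestrator.from_yaml("emergence_test.yaml")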

Tracking Emergence

# Run the simulation and collect per-epoch metrics
import matplotlib.pyplot as plt

metrics = orchestrator.run()

# Plot quality gap over time
epochs = [m.epoch for m in metrics]
quality_gaps = [m.quality_gap for m in metrics]

plt.plot(epochs, quality_gaps)
plt.axhline(y=0, color='r', linestyle='--', label='Adverse selection threshold')
plt.xlabel('Epoch')
plt.ylabel('Quality Gap')
plt.title('Emergence of Adverse Selection')
plt.legend()
plt.show()

The Hot Mess Hypothesis

SWARM supports research into the "hot mess" theory of AI risk:

AGI-level catastrophes may not require AGI-level agents. Instead, they emerge from the chaotic interaction of many sub-AGI systems, each pursuing local objectives that combine into globally harmful outcomes.

Key predictions:

  1. Incoherence scales with horizon - Longer decision chains → more variance
  2. Multi-agent amplifies single-agent problems - Interaction compounds errors (see the variance sketch after this list)
  3. Governance has limits - Some emergence patterns are hard to govern
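
The second prediction follows from a standard variance identity, computed here with invented numbers rather than SWARM output: the combined error of k agents has variance k·σ² + k(k−1)·ρ·σ², so independent errors (ρ = 0) grow like √k, while any positive correlation introduced by interaction makes them grow linearly in k.

# Toy variance calculation for k interacting agents (not SWARM output).
import math

sigma = 0.1
for rho in (0.0, 0.2):
    for k in (1, 10, 100):
        var = k * sigma**2 + k * (k - 1) * rho * sigma**2
        print(f"rho={rho}  k={k:3d}  system stdev={math.sqrt(var):.2f}")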

Implications for Safety

What SWARM Reveals

  1. Single-agent alignment is necessary but not sufficient
  2. Interaction-level risks need interaction-level solutions
  3. Metrics must track system properties, not just agent properties
  4. Governance must be proactive, not just reactive

Design Principles

Principle     Implementation
----------    ---------------------------------
Observable    Soft labels expose hidden quality
Measurable    Metrics quantify system health
Governable    Levers allow intervention
Testable      Scenarios enable experimentation

Research Questions

SWARM enables investigation of:

  • When does adverse selection emerge in multi-agent systems?
  • How does governance delay affect emergent risk?
  • What's the relationship between agent diversity and system stability?
  • Can emergence be predicted from agent-level properties?

Next Steps