Emergence

Understanding how system-level risks emerge from agent interactions.

The Emergence Problem

Traditional AI safety asks:

"How do we align a single powerful agent?"

SWARM asks:

"What happens when many agents—each potentially aligned—interact in ways that produce misaligned outcomes?"

This is the emergence problem: system-level failures that aren't predictable from individual agent properties.

Why Emergence Matters

Individual vs. System Properties

Individual Agent     System Behavior
-----------------    ----------------------
Locally optimal      Globally suboptimal
Individually safe    Collectively dangerous
Honest intentions    Adverse outcomes

Real-World Analogies

  • Flash crashes - Individual trading algorithms are rational; together they crash markets
  • Bank runs - Individual withdrawals are reasonable; together they cause collapse
  • Tragedy of the commons - Individual resource use is optimal; together it's destructive (see the toy model below)
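
A toy model makes the last pattern concrete. The sketch below is illustration only, not SWARM code: every agent takes the harvest that is privately rational, and the shared stock degrades anyway.

# Toy tragedy-of-the-commons simulation (illustration only, not SWARM code).
N_AGENTS = 10
REGROWTH = 20.0                     # total regrowth per round
SUSTAINABLE = REGROWTH / N_AGENTS   # per-agent harvest the commons can absorb

def run(per_agent_harvest, stock=100.0, rounds=15):
    for _ in range(rounds):
        stock = max(stock - per_agent_harvest * N_AGENTS, 0.0) + REGROWTH
        stock = min(stock, 100.0)   # carrying capacity
    return stock

print("cooperative:", run(SUSTAINABLE))        # stock holds at capacity
print("greedy:     ", run(SUSTAINABLE * 1.5))  # stock erodes to a degraded level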

Emergence Mechanisms in SWARM

1. Information Asymmetry

Some agents know things others don't.

Agent A knows: interaction quality
Agent B knows: only observable signals
System effect: A exploits B's ignorance

SWARM detects this via the quality gap metric.
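
As a rough sketch of what this metric can capture (SWARM's exact definition may differ), compare the average true quality of accepted interactions against the average quality of everything that was offered:

# Hedged sketch of a quality-gap computation; SWARM's own definition may differ.
from statistics import mean

def quality_gap(offers):
    """offers: list of (true_quality, accepted) pairs."""
    accepted = [q for q, ok in offers if ok]
    if not accepted:
        return 0.0
    # Positive: the system selects for quality.
    # Negative: accepted interactions are worse than the offer pool.
    return mean(accepted) - mean(q for q, _ in offers)

print(quality_gap([(0.9, False), (0.8, False), (0.3, True), (0.2, True)]))  # ≈ -0.30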

2. Adverse Selection

The system preferentially accepts lower-quality interactions.

High-quality agents: selective, reject bad matches
Low-quality agents: accept anything
System effect: bad interactions dominate

SWARM detects this via a negative quality gap.
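
The classic "market for lemons" dynamic reproduces this pattern in a few lines. The numbers below are invented for illustration, not SWARM's agent logic: buyers pay one uniform price because quality is hidden, so high-quality sellers withdraw.

# Lemons-style illustration with invented numbers (not SWARM's agent logic).
import random

random.seed(0)
qualities = [random.random() for _ in range(1000)]  # hidden quality, 0..1
price = 0.5                                         # uniform offer; quality unobservable

traded = [q for q in qualities if q <= price]       # high-quality sellers exit

print(f"offered mean quality: {sum(qualities)/len(qualities):.2f}")
print(f"traded mean quality:  {sum(traded)/len(traded):.2f}")
# traded ≈ 0.25 < offered ≈ 0.50: a negative quality gap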

3. Variance Amplification

Small errors compound across decisions.

Decision 1: small error
Decision 2: builds on decision 1
...
Decision N: compounded errors

SWARM detects this via the incoherence index.
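
A toy model of the effect (not the incoherence index's actual formula): if each decision inherits the previous state and adds independent noise, the spread of final outcomes grows with chain length, roughly like the square root of the number of decisions.

# Toy error-compounding model (not SWARM's incoherence index).
import random
import statistics

random.seed(0)

def final_spread(n_decisions, noise=0.1, trials=2000):
    finals = []
    for _ in range(trials):
        state = 0.0
        for _ in range(n_decisions):
            state += random.gauss(0.0, noise)  # each decision builds on the last
        finals.append(state)
    return statistics.stdev(finals)

for n in (1, 4, 16, 64):
    print(f"decisions={n:3d}  stdev of final state={final_spread(n):.3f}")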

4. Governance Lag

Safety mechanisms react too slowly.

t=0: Problem emerges
t=1: Metrics detect problem
t=2: Governance responds
t=3: Response takes effect
...
t=N: Damage already done
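
A back-of-the-envelope sketch with invented numbers: if harm compounds each step, and detection, response, and taking effect each cost a step, cumulative damage grows much faster than linearly in the lag.

# Invented-numbers sketch of governance lag (not a SWARM simulation).
GROWTH = 1.2   # harm grows 20% per step until the intervention bites

def total_damage(lag_steps, horizon=10):
    damage, harm = 0.0, 1.0
    for t in range(horizon):
        if t >= lag_steps:
            harm = 0.0             # intervention finally takes effect
        damage += harm
        harm *= GROWTH
    return damage

for lag in (1, 3, 6):
    print(f"lag={lag} steps -> cumulative damage {total_damage(lag):.1f}")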

Modeling Emergence

Scenario Design

Create scenarios that stress-test emergence:

name: emergence_test
agents:
  - type: honest
    count: 5
  - type: opportunistic
    count: 3
  - type: deceptive
    count: 2

# Start with no governance
governance:
  transaction_tax: 0.0
  circuit_breaker_threshold: 1.0  # Effectively disabled

simulation:
  n_epochs: 50
  steps_per_epoch: 20
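
To run it, wire the scenario into an orchestrator. The loader below (`Orchestrator.from_yaml`) and import path are hypothetical names; substitute SWARM's actual entry point.

# Hypothetical wiring; SWARM's actual constructor and import path may differ.
from swarm import Orchestrator   # assumed import path

orchestrator = Orchestrator.from_yaml("emergence_test.yaml")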

Tracking Emergence

# Run the simulation and collect per-epoch metrics
import matplotlib.pyplot as plt

metrics = orchestrator.run()

# Plot quality gap over time
epochs = [m.epoch for m in metrics]
quality_gaps = [m.quality_gap for m in metrics]

plt.plot(epochs, quality_gaps)
plt.axhline(y=0, color='r', linestyle='--', label='Adverse selection threshold')
plt.xlabel('Epoch')
plt.ylabel('Quality Gap')
plt.title('Emergence of Adverse Selection')
plt.legend()
plt.show()

The Hot Mess Hypothesis

SWARM supports research into the "hot mess" theory of AI risk:

AGI-level catastrophes may not require AGI-level agents. Instead, they emerge from the chaotic interaction of many sub-AGI systems, each pursuing local objectives that combine into globally harmful outcomes.

Key predictions:

  1. Incoherence scales with horizon - Longer decision chains → more variance
  2. Multi-agent amplifies single-agent problems - Interaction compounds errors (see the variance sketch after this list)
  3. Governance has limits - Some emergence patterns are hard to govern
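
The second prediction follows from a standard variance identity, computed here with invented numbers rather than SWARM output: the combined error of k agents has variance k·σ² + k(k−1)·ρ·σ², so independent errors (ρ = 0) grow like √k, while any positive correlation introduced by interaction makes them grow linearly in k.

# Toy variance calculation for k interacting agents (not SWARM output).
import math

sigma = 0.1
for rho in (0.0, 0.2):
    for k in (1, 10, 100):
        var = k * sigma**2 + k * (k - 1) * rho * sigma**2
        print(f"rho={rho}  k={k:3d}  system stdev={math.sqrt(var):.2f}")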

Implications for Safety

What SWARM Reveals

  1. Single-agent alignment is necessary but not sufficient
  2. Interaction-level risks need interaction-level solutions
  3. Metrics must track system properties, not just agent properties
  4. Governance must be proactive, not just reactive

Design Principles

Principle     Implementation
----------    ---------------------------------
Observable    Soft labels expose hidden quality
Measurable    Metrics quantify system health
Governable    Levers allow intervention
Testable      Scenarios enable experimentation

Research Questions

SWARM enables investigation of:

  • When does adverse selection emerge in multi-agent systems?
  • How does governance delay affect emergent risk?
  • What's the relationship between agent diversity and system stability?
  • Can emergence be predicted from agent-level properties?

Next Steps