Multi-Agent AI Safety Framework

SWARM (System-Wide Assessment of Risk in Multi-agent systems) is the reference implementation of the distributional AGI safety research framework. It provides Python tools for studying emergent risks in multi-agent AI systems.

What Makes SWARM Different

Most AI safety tools focus on individual models. SWARM focuses on populations:

| Traditional safety tools | SWARM |
|---|---|
| Evaluate single-model outputs | Evaluate population-level dynamics |
| Binary safe/unsafe labels | Soft probabilistic labels |
| Static benchmarks | Dynamic multi-epoch simulations |
| Manual red-teaming | Automated adversarial testing |
| One-shot evaluation | Longitudinal tracking across epochs |

Architecture

┌─────────────┐     ┌──────────────┐     ┌─────────────┐
│   Agents    │────►│ Orchestrator │────►│  Metrics    │
│ (honest,    │     │ (epochs,     │     │ (toxicity,  │
│  deceptive, │     │  matching,   │     │  quality    │
│  adversary) │     │  governance) │     │  gap, etc.) │
└─────────────┘     └──────────────┘     └─────────────┘
                    ┌──────┴──────┐
                    │ Governance  │
                    │ (taxes,     │
                    │  breakers,  │
                    │  audits)    │
                    └─────────────┘

Data Flow

Observables → ProxyComputer → v_hat → sigmoid → p ─┬─→ SoftPayoffEngine → payoffs
                                                   └─→ SoftMetrics → toxicity, quality gap, etc.
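The soft-label step of this pipeline can be sketched in a few lines. The function names below mirror the stages in the diagram, but the exact signatures are assumptions for illustration, not SWARM's actual API: a proxy score `v_hat` is squashed through a sigmoid into a soft label `p`, and the payoff is the `p`-weighted blend of the good and bad outcomes.

```python
import math

def sigmoid(v_hat: float) -> float:
    """Map a proxy score v_hat to a soft label p in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v_hat))

def soft_payoff(p: float, reward_good: float = 1.0, cost_bad: float = -1.0) -> float:
    """Expected payoff under the soft label: blend outcomes by p
    instead of committing to a binary safe/unsafe call."""
    return p * reward_good + (1.0 - p) * cost_bad

p = sigmoid(0.8)       # soft label for a mildly positive proxy score
payoff = soft_payoff(p)
print(f"p={p:.3f} payoff={payoff:+.3f}")
```

Because `p` stays probabilistic end to end, downstream metrics can aggregate uncertainty instead of thresholding it away.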

Installation

```bash
pip install swarm-safety
```

Or install from source for development:

```bash
git clone https://github.com/swarm-ai-safety/swarm.git
cd swarm
pip install -e ".[dev,runtime]"
```

Quick Start

```python
from swarm.agents.honest import HonestAgent
from swarm.agents.deceptive import DeceptiveAgent
from swarm.core.orchestrator import Orchestrator, OrchestratorConfig

# Configure simulation
config = OrchestratorConfig(n_epochs=10, steps_per_epoch=10, seed=42)
orch = Orchestrator(config=config)

# Register agents: 7 honest, 3 deceptive
for i in range(7):
    orch.register_agent(HonestAgent(agent_id=f"h{i}"))
for i in range(3):
    orch.register_agent(DeceptiveAgent(agent_id=f"d{i}"))

# Run and analyze
metrics = orch.run()
for m in metrics:
    print(f"Epoch {m.epoch}: toxicity={m.toxicity_rate:.3f} qgap={m.quality_gap:+.3f}")
```

Core Components

Agents

SWARM ships with three agent types and supports custom agents:

| Agent | Behavior | Use case |
|---|---|---|
| HonestAgent | Consistent cooperation | Baseline population |
| DeceptiveAgent | Trust-then-exploit | Test governance detection |
| AdversarialAgent | Active exploitation | Stress-test mechanisms |
| Custom | User-defined | Research-specific strategies |
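As a sketch of what a custom strategy might look like, here is a tit-for-tat agent: cooperate by default, retaliate after a defection. The `act`/`observe` hook names are assumptions for illustration; a real custom agent would subclass SWARM's agent base class and implement its actual interface.

```python
class TitForTatAgent:
    """Illustrative custom strategy: cooperate until the partner defects,
    then defect back until they cooperate again. Hook names are hypothetical."""

    def __init__(self, agent_id: str):
        self.agent_id = agent_id
        self.last_partner_defected = False

    def act(self) -> str:
        # Mirror the partner's most recent move
        return "defect" if self.last_partner_defected else "cooperate"

    def observe(self, partner_action: str) -> None:
        # Remember whether the partner defected this round
        self.last_partner_defected = (partner_action == "defect")

agent = TitForTatAgent("t0")
print(agent.act())        # cooperates on the first move
agent.observe("defect")
print(agent.act())        # retaliates after a defection
```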

Metrics

Four key metrics capture distributional health:
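Two of these metrics, toxicity rate and quality gap, can be sketched directly from the soft labels. The definitions below are assumptions that illustrate the idea (mean harm probability, and the payoff difference between low-risk and high-risk interactions); SWARM's exact formulas may differ.

```python
def toxicity_rate(p: list[float]) -> float:
    """Mean soft label across interactions: the expected
    fraction of harmful interactions in the population."""
    return sum(p) / len(p)

def quality_gap(payoffs: list[float], p: list[float],
                threshold: float = 0.5) -> float:
    """Mean payoff of low-risk interactions (p < threshold) minus
    mean payoff of high-risk ones. Positive means honest behavior pays."""
    good = [x for x, pi in zip(payoffs, p) if pi < threshold]
    bad = [x for x, pi in zip(payoffs, p) if pi >= threshold]
    if not good or not bad:
        return 0.0
    return sum(good) / len(good) - sum(bad) / len(bad)

p = [0.1, 0.2, 0.8]
print(round(toxicity_rate(p), 3))  # → 0.367
```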

Governance

Six configurable mechanisms that operate at the population level:
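As one population-level illustration, a circuit breaker (named in the architecture diagram above) can halt activity once an aggregate risk signal crosses a threshold. The interface below is hypothetical; SWARM's built-in mechanisms are configured through the orchestrator.

```python
class CircuitBreaker:
    """Illustrative circuit breaker: trips when the running toxicity
    estimate crosses a threshold, and stays tripped until reset."""

    def __init__(self, threshold: float = 0.3):
        self.threshold = threshold
        self.tripped = False

    def update(self, toxicity_rate: float) -> bool:
        # Latch once the population-level signal exceeds the threshold
        if toxicity_rate >= self.threshold:
            self.tripped = True
        return self.tripped

breaker = CircuitBreaker(threshold=0.3)
print(breaker.update(0.1))   # False: below threshold
print(breaker.update(0.4))   # True: trips
print(breaker.update(0.1))   # True: latched until reset
```

The latching behavior is the point: a breaker acts on the population's trajectory, not on any single interaction.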

Bridges

Connect SWARM to external systems:

| Bridge | Integration |
|---|---|
| Concordia | LLM agent environments |
| Prime Intellect | Safety-reward RL training |
| GasTown | Production data pipelines |
| AgentXiv | Research publication platform |

Research Context

SWARM implements the framework formalized in Soft-Label Governance for Distributional Safety in Multi-Agent Systems (arXiv, 2026); see also Distributional AGI Safety. For theoretical foundations, see the research theory page.

Next Steps