Multi-Agent AI Safety Framework

SWARM (System-Wide Assessment of Risk in Multi-agent systems) is the reference implementation of the distributional AGI safety research framework. It provides Python tools for studying emergent risks in multi-agent AI systems.

What Makes SWARM Different

Most AI safety tools focus on individual models. SWARM focuses on populations:

| Traditional safety tools | SWARM |
|---|---|
| Evaluate single-model outputs | Evaluate population-level dynamics |
| Binary safe/unsafe labels | Soft probabilistic labels |
| Static benchmarks | Dynamic multi-epoch simulations |
| Manual red-teaming | Automated adversarial testing |
| One-shot evaluation | Longitudinal tracking across epochs |

Architecture

┌─────────────┐     ┌──────────────┐     ┌─────────────┐
│   Agents    │────►│ Orchestrator │────►│  Metrics    │
│ (honest,    │     │ (epochs,     │     │ (toxicity,  │
│  deceptive, │     │  matching,   │     │  quality    │
│  adversary) │     │  governance) │     │  gap, etc.) │
└─────────────┘     └──────────────┘     └─────────────┘
                    ┌──────┴──────┐
                    │ Governance  │
                    │ (taxes,     │
                    │  breakers,  │
                    │  audits)    │
                    └─────────────┘

Data Flow

Observables → ProxyComputer → v_hat → sigmoid → p ─┬─→ SoftPayoffEngine → payoffs
                                                   └─→ SoftMetrics → toxicity, quality gap, etc.
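The soft-label step of this pipeline can be sketched in a few lines. The function names below mirror the stages in the diagram, but the exact signatures are assumptions for illustration, not SWARM's actual API: a proxy score `v_hat` is squashed through a sigmoid into a soft label `p`, and the payoff is the `p`-weighted blend of the good and bad outcomes.

```python
import math

def sigmoid(v_hat: float) -> float:
    """Map a proxy score v_hat to a soft label p in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v_hat))

def soft_payoff(p: float, reward_good: float = 1.0, cost_bad: float = -1.0) -> float:
    """Expected payoff under the soft label: blend outcomes by p
    instead of committing to a binary safe/unsafe call."""
    return p * reward_good + (1.0 - p) * cost_bad

p = sigmoid(0.8)       # soft label for a mildly positive proxy score
payoff = soft_payoff(p)
print(f"p={p:.3f} payoff={payoff:+.3f}")
```

Because `p` stays probabilistic end to end, downstream metrics can aggregate uncertainty instead of thresholding it away.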

Installation

```bash
pip install swarm-safety
```

Or install from source for development:

```bash
git clone https://github.com/swarm-ai-safety/swarm.git
cd swarm
pip install -e ".[dev,runtime]"
```

Quick Start

```python
from swarm.agents.honest import HonestAgent
from swarm.agents.deceptive import DeceptiveAgent
from swarm.core.orchestrator import Orchestrator, OrchestratorConfig

# Configure simulation
config = OrchestratorConfig(n_epochs=10, steps_per_epoch=10, seed=42)
orch = Orchestrator(config=config)

# Register agents: 7 honest, 3 deceptive
for i in range(7):
    orch.register_agent(HonestAgent(agent_id=f"h{i}"))
for i in range(3):
    orch.register_agent(DeceptiveAgent(agent_id=f"d{i}"))

# Run and analyze
metrics = orch.run()
for m in metrics:
    print(f"Epoch {m.epoch}: toxicity={m.toxicity_rate:.3f} qgap={m.quality_gap:+.3f}")
```

Core Components

Agents

SWARM ships with three agent types and supports custom agents:

| Agent | Behavior | Use case |
|---|---|---|
| HonestAgent | Consistent cooperation | Baseline population |
| DeceptiveAgent | Trust-then-exploit | Test governance detection |
| AdversarialAgent | Active exploitation | Stress-test mechanisms |
| Custom | User-defined | Research-specific strategies |
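As a sketch of what a custom strategy might look like, here is a tit-for-tat agent: cooperate by default, retaliate after a defection. The `act`/`observe` hook names are assumptions for illustration; a real custom agent would subclass SWARM's agent base class and implement its actual interface.

```python
class TitForTatAgent:
    """Illustrative custom strategy: cooperate until the partner defects,
    then defect back until they cooperate again. Hook names are hypothetical."""

    def __init__(self, agent_id: str):
        self.agent_id = agent_id
        self.last_partner_defected = False

    def act(self) -> str:
        # Mirror the partner's most recent move
        return "defect" if self.last_partner_defected else "cooperate"

    def observe(self, partner_action: str) -> None:
        # Remember whether the partner defected this round
        self.last_partner_defected = (partner_action == "defect")

agent = TitForTatAgent("t0")
print(agent.act())        # cooperates on the first move
agent.observe("defect")
print(agent.act())        # retaliates after a defection
```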

Metrics

Four key metrics capture distributional health:
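Two of these metrics, toxicity rate and quality gap, can be sketched directly from the soft labels. The definitions below are assumptions that illustrate the idea (mean harm probability, and the payoff difference between low-risk and high-risk interactions); SWARM's exact formulas may differ.

```python
def toxicity_rate(p: list[float]) -> float:
    """Mean soft label across interactions: the expected
    fraction of harmful interactions in the population."""
    return sum(p) / len(p)

def quality_gap(payoffs: list[float], p: list[float],
                threshold: float = 0.5) -> float:
    """Mean payoff of low-risk interactions (p < threshold) minus
    mean payoff of high-risk ones. Positive means honest behavior pays."""
    good = [x for x, pi in zip(payoffs, p) if pi < threshold]
    bad = [x for x, pi in zip(payoffs, p) if pi >= threshold]
    if not good or not bad:
        return 0.0
    return sum(good) / len(good) - sum(bad) / len(bad)

p = [0.1, 0.2, 0.8]
print(round(toxicity_rate(p), 3))  # → 0.367
```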

Governance

Six configurable mechanisms that operate at the population level:
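As one population-level illustration, a circuit breaker (named in the architecture diagram above) can halt activity once an aggregate risk signal crosses a threshold. The interface below is hypothetical; SWARM's built-in mechanisms are configured through the orchestrator.

```python
class CircuitBreaker:
    """Illustrative circuit breaker: trips when the running toxicity
    estimate crosses a threshold, and stays tripped until reset."""

    def __init__(self, threshold: float = 0.3):
        self.threshold = threshold
        self.tripped = False

    def update(self, toxicity_rate: float) -> bool:
        # Latch once the population-level signal exceeds the threshold
        if toxicity_rate >= self.threshold:
            self.tripped = True
        return self.tripped

breaker = CircuitBreaker(threshold=0.3)
print(breaker.update(0.1))   # False: below threshold
print(breaker.update(0.4))   # True: trips
print(breaker.update(0.1))   # True: latched until reset
```

The latching behavior is the point: a breaker acts on the population's trajectory, not on any single interaction.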

Bridges

Connect SWARM to external systems:

| Bridge | Integration |
|---|---|
| Concordia | LLM agent environments |
| Prime Intellect | Safety-reward RL training |
| GasTown | Production data pipelines |
| AgentXiv | Research publication platform |

Research Context

SWARM implements the framework formalized in Soft-Label Governance for Distributional Safety in Multi-Agent Systems (arXiv, 2026); see also Distributional AGI Safety. For theoretical foundations, see the research theory page.

Next Steps