Multi-Agent AI Safety Framework

SWARM (System-Wide Assessment of Risk in Multi-agent systems) is the reference implementation of the distributional AGI safety research framework. It provides Python tools for studying emergent risks in multi-agent AI systems.

What Makes SWARM Different

Most AI safety tools focus on individual models. SWARM focuses on populations:

Traditional safety tools          SWARM
------------------------          -----
Evaluate single model outputs     Evaluate population-level dynamics
Binary safe/unsafe labels         Soft probabilistic labels
Static benchmarks                 Dynamic multi-epoch simulations
Manual red-teaming                Automated adversarial testing
One-shot evaluation               Longitudinal tracking across epochs

Architecture

┌─────────────┐     ┌──────────────┐     ┌─────────────┐
│   Agents    │────►│ Orchestrator │────►│  Metrics    │
│ (honest,    │     │ (epochs,     │     │ (toxicity,  │
│  deceptive, │     │  matching,   │     │  quality    │
│  adversary) │     │  governance) │     │  gap, etc.) │
└─────────────┘     └──────────────┘     └─────────────┘
                    ┌──────┴──────┐
                    │ Governance  │
                    │ (taxes,     │
                    │  breakers,  │
                    │  audits)    │
                    └─────────────┘

Data Flow

Observables → ProxyComputer → v_hat → sigmoid → p ─┬─► SoftPayoffEngine → payoffs
                                                   └─► SoftMetrics → toxicity, quality gap, etc.
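The pipeline above can be sketched in a few lines. The linear proxy, the observable names, and the weights below are illustrative assumptions, not SWARM's actual ProxyComputer API:

```python
import math

def proxy_score(observables: dict[str, float], weights: dict[str, float]) -> float:
    """v_hat: a weighted sum over raw per-interaction observables."""
    return sum(weights.get(k, 0.0) * v for k, v in observables.items())

def sigmoid(x: float) -> float:
    """Squash v_hat into a soft label p in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical observables for one interaction.
obs = {"task_progress": 0.8, "rule_violations": 1.0}
w = {"task_progress": 2.0, "rule_violations": -3.0}

p = sigmoid(proxy_score(obs, w))  # soft probability the interaction is beneficial
```

Downstream, `p` feeds both payoff computation and the population-level metrics, so no interaction is ever given a hard safe/unsafe label.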

Installation

pip install swarm-safety

Or install from source for development:

git clone https://github.com/swarm-ai-safety/swarm.git
cd swarm
pip install -e ".[dev,runtime]"

Quick Start

from swarm.agents.honest import HonestAgent
from swarm.agents.deceptive import DeceptiveAgent
from swarm.core.orchestrator import Orchestrator, OrchestratorConfig

# Configure simulation
config = OrchestratorConfig(n_epochs=10, steps_per_epoch=10, seed=42)
orch = Orchestrator(config=config)

# Register agents
for i in range(7):
    orch.register_agent(HonestAgent(agent_id=f"h{i}"))
for i in range(3):
    orch.register_agent(DeceptiveAgent(agent_id=f"d{i}"))

# Run and analyze
metrics = orch.run()
for m in metrics:
    print(f"Epoch {m.epoch}: toxicity={m.toxicity_rate:.3f} qgap={m.quality_gap:+.3f}")

Core Components

Agents

SWARM ships with three agent types and supports custom agents:

Agent              Behavior                 Use case
-----              --------                 --------
HonestAgent        Consistent cooperation   Baseline population
DeceptiveAgent     Trust-then-exploit       Test governance detection
AdversarialAgent   Active exploitation      Stress-test mechanisms
Custom             User-defined             Research-specific strategies
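A custom agent can be as small as a strategy plus an identifier. The interface below (act/observe method names, string actions) is an assumption for illustration and may not match SWARM's base-agent API:

```python
from dataclasses import dataclass

@dataclass
class TitForTatAgent:
    """Hypothetical custom agent: mirror the partner's previous move."""
    agent_id: str
    last_partner_cooperated: bool = True  # start by cooperating

    def act(self) -> str:
        # Cooperate iff the previous partner cooperated.
        return "cooperate" if self.last_partner_cooperated else "defect"

    def observe(self, partner_cooperated: bool) -> None:
        # Remember the partner's last move for the next step.
        self.last_partner_cooperated = partner_cooperated
```

Such an agent would be registered with the orchestrator exactly like the built-in types in the Quick Start.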

Metrics

Four key metrics capture distributional health:

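Because every interaction carries a soft label p rather than a binary verdict, metrics are expectations over p. The two definitions below are one plausible reading of toxicity and quality gap; SWARM's SoftMetrics may compute them differently:

```python
def toxicity_rate(ps: list[float]) -> float:
    """Expected fraction of harmful interactions: mean of (1 - p)."""
    return sum(1.0 - p for p in ps) / len(ps)

def quality_gap(accepted: list[float], rejected: list[float]) -> float:
    """Mean soft quality of accepted interactions minus rejected ones.

    A positive gap suggests the population is filtering in the right direction.
    """
    return sum(accepted) / len(accepted) - sum(rejected) / len(rejected)
```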
Governance

Six configurable mechanisms operate at the population level:

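As a concrete illustration, the architecture diagram names circuit breakers as one mechanism. A minimal sketch, assuming a breaker that pauses interactions once rolling toxicity exceeds a threshold (the class name, threshold, and window size are all hypothetical):

```python
from collections import deque

class CircuitBreaker:
    """Trip when rolling mean toxicity (1 - p) exceeds a threshold."""

    def __init__(self, threshold: float = 0.3, window: int = 50):
        self.threshold = threshold
        self.recent: deque[float] = deque(maxlen=window)
        self.tripped = False

    def record(self, p: float) -> None:
        # p is the soft label for one interaction; 1 - p is its toxicity.
        self.recent.append(1.0 - p)
        self.tripped = sum(self.recent) / len(self.recent) > self.threshold

    def allow(self) -> bool:
        # Orchestrator would consult this before scheduling new interactions.
        return not self.tripped
```

Because the breaker acts on the population's rolling statistics rather than any single output, it fits the population-level framing above.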
Bridges

Connect SWARM to external systems:

Bridge            Integration
------            -----------
Concordia         LLM agent environments
Prime Intellect   Safety-reward RL training
GasTown           Production data pipelines
AgentXiv          Research publication platform

Research Context

SWARM implements the framework introduced in Distributional Safety in Agentic Systems (arXiv, 2025). For theoretical foundations, see the research theory page.

Next Steps