Red Teaming

Test your governance mechanisms against adversarial agents.

Overview

SWARM's red-teaming module provides:

  • Adversarial agents with configurable attack strategies
  • Attack scenarios that stress-test governance
  • Evasion metrics that quantify how much slips past detection

Quick Start

from swarm.redteam import AdversarialAgent, AttackScenario
from swarm.redteam.evaluator import RedTeamEvaluator

# Create evaluator
evaluator = RedTeamEvaluator(
    base_scenario="scenarios/governed.yaml",
    attack_budget=0.3,  # 30% of agents are adversarial
)

# Run evaluation
results = evaluator.run(n_runs=10)

print(f"Attack success rate: {results.success_rate:.2%}")
print(f"Detection rate: {results.detection_rate:.2%}")
print(f"Governance evasion: {results.evasion_rate:.2%}")

Attack Strategies

Sybil Attack

Create multiple identities to manipulate reputation:

attack:
  type: sybil
  params:
    n_identities: 5
    coordination: high
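
For intuition, the core mechanic is a set of fake identities endorsing one another to manufacture reputation from nothing. A minimal sketch of that dynamic (the ledger here is a stand-in for illustration, not part of SWARM's API):

from collections import defaultdict

# Hypothetical reputation ledger: agent_id -> score
ledger = defaultdict(float)

identities = [f"sybil_{i}" for i in range(5)]  # n_identities: 5

# With coordination: high, every identity endorses every other,
# so the group bootstraps (n_identities - 1) endorsements each.
for voter in identities:
    for target in identities:
        if voter != target:
            ledger[target] += 1.0  # one endorsement per ordered pair

print(dict(ledger))  # each sybil now holds 4 endorsements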

Gradual Exploitation

Build trust, then exploit:

attack:
  type: gradual
  params:
    trust_building_epochs: 10
    exploitation_intensity: 0.8
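
The defining mechanic is a behavioral switch once the trust-building phase ends. A minimal sketch of the idea (the act function is illustrative, not SWARM's agent interface):

import random

TRUST_BUILDING_EPOCHS = 10
EXPLOITATION_INTENSITY = 0.8

def act(epoch: int) -> str:
    """Behave honestly while building trust, then exploit probabilistically."""
    if epoch < TRUST_BUILDING_EPOCHS:
        return "cooperate"
    # After the switch, exploit with probability equal to the intensity.
    return "exploit" if random.random() < EXPLOITATION_INTENSITY else "cooperate"

for epoch in range(15):
    print(epoch, act(epoch))  # switches behavior at epoch 10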

Targeted Attack

Focus on specific high-value agents:

attack:
  type: targeted
  params:
    target_selection: highest_reputation
    attack_frequency: 0.5
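
In essence, the attacker singles out the most reputable victim but only strikes on a fraction of opportunities. A rough sketch (the reputation table is invented for illustration):

import random

ATTACK_FREQUENCY = 0.5  # attack on roughly half of the opportunities

# Hypothetical reputation snapshot; in SWARM this would come from scenario state.
reputations = {"agent_a": 0.9, "agent_b": 0.4, "agent_c": 0.7}

def pick_target() -> str | None:
    """target_selection: highest_reputation, gated by attack_frequency."""
    if random.random() >= ATTACK_FREQUENCY:
        return None  # sit this round out to stay under the radar
    return max(reputations, key=reputations.get)

print(pick_target())  # "agent_a" or None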

Collusion

Coordinate with other adversarial agents:

attack:
  type: collusion
  params:
    group_size: 3
    coordination_method: vote_manipulation
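
The mechanic here is lockstep voting: the group concentrates all of its votes on one victim per round, so the victim absorbs group_size hits at once. A toy sketch of that coordination (function and agent names are hypothetical):

import random

GROUP_SIZE = 3
colluders = [f"colluder_{i}" for i in range(GROUP_SIZE)]

def coordinated_votes(honest_agents: list[str]) -> dict[str, str]:
    """vote_manipulation: every group member votes against the same victim."""
    victim = random.choice(honest_agents)
    return {member: victim for member in colluders}

print(coordinated_votes(["agent_a", "agent_b"]))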

Adaptive Adversaries

Agents that learn to evade governance:

from swarm.agents.adaptive_adversary import AdaptiveAdversary

adversary = AdaptiveAdversary(
    agent_id="adaptive_1",
    learning_rate=0.1,
    evasion_strategies=["timing", "target_selection", "intensity_modulation"]
)
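
Conceptually, an adaptive adversary performs bandit-style strategy selection: strategies that evade detection are reinforced, strategies that get caught are downweighted. A minimal sketch of that loop (not the actual AdaptiveAdversary internals):

import random

strategies = ["timing", "target_selection", "intensity_modulation"]
weights = {s: 1.0 for s in strategies}
LEARNING_RATE = 0.1

def choose() -> str:
    """Sample a strategy in proportion to its current weight."""
    total = sum(weights.values())
    return random.choices(strategies, [weights[s] / total for s in strategies])[0]

def update(strategy: str, detected: bool) -> None:
    """Reinforce strategies that evade detection; penalize those that get caught."""
    weights[strategy] *= (1 - LEARNING_RATE) if detected else (1 + LEARNING_RATE)

s = choose()
update(s, detected=random.random() < 0.3)  # feedback from a simulated round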

Evasion Strategies

Strategy               Description
timing                 Attack when audit probability is low
target_selection       Avoid agents with high detection capability
intensity_modulation   Keep individual attacks below threshold
reputation_laundering  Use honest interactions to offset attacks
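
As an example of one such strategy, intensity_modulation caps each attack just under the detection threshold, trading per-attack damage for stealth. A sketch, assuming for illustration that the attacker knows (or has learned) the threshold:

def modulate_intensity(desired: float, detection_threshold: float,
                       margin: float = 0.1) -> float:
    """Cap attack intensity a safety margin below the detection threshold."""
    cap = detection_threshold * (1 - margin)
    return min(desired, cap)

print(modulate_intensity(desired=0.8, detection_threshold=0.5))  # -> 0.45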

Evaluation Metrics

Attack Success Rate

# Fraction of attacks that achieved their objective
success_rate = results.successful_attacks / results.total_attacks

Detection Rate

# Fraction of attacks detected by governance
detection_rate = results.detected_attacks / results.total_attacks

Evasion Rate

# Fraction of attacks that succeeded without being detected
evasion_rate = (results.successful_attacks - results.detected_attacks) / results.total_attacks

System Damage

# Harm caused by attacks, relative to the attack-free baseline
damage = results.total_externality / results.baseline_externality
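
These four metrics fit naturally on a single results container. A simplified stand-in (not RedTeamEvaluator's actual return type) showing how they derive from raw counts:

from dataclasses import dataclass

@dataclass
class Results:
    total_attacks: int
    successful_attacks: int
    detected_attacks: int
    total_externality: float
    baseline_externality: float

    @property
    def success_rate(self) -> float:
        return self.successful_attacks / self.total_attacks

    @property
    def detection_rate(self) -> float:
        return self.detected_attacks / self.total_attacks

    @property
    def evasion_rate(self) -> float:
        # Successful-but-undetected attacks, as a fraction of all attacks
        return (self.successful_attacks - self.detected_attacks) / self.total_attacks

    @property
    def damage(self) -> float:
        return self.total_externality / self.baseline_externality

r = Results(total_attacks=20, successful_attacks=8, detected_attacks=5,
            total_externality=12.0, baseline_externality=10.0)
print(f"evasion {r.evasion_rate:.2%}, damage x{r.damage:.2f}")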

Red Team Scenarios

Pre-built attack scenarios:

# Run all attack scenarios
swarm redteam scenarios/governed.yaml --all

# Specific attack type
swarm redteam scenarios/governed.yaml --attack sybil

Scenario Library

Scenario           Attack     Tests
sybil_flood        Sybil      Identity verification
trust_exploit      Gradual    Reputation decay
coordinated_dump   Collusion  Collusion detection
adaptive_evasion   Adaptive   Overall robustness
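
To sweep several attack types from Python, you can shell out to the CLI shown above. This sketch assumes the swarm CLI is on PATH and that --attack accepts the same names as the YAML type field:

import subprocess

# Attack names mirror the YAML `type` values from the Attack Strategies section.
for attack in ["sybil", "gradual", "targeted", "collusion"]:
    result = subprocess.run(
        ["swarm", "redteam", "scenarios/governed.yaml", "--attack", attack],
        capture_output=True, text=True,
    )
    print(attack, "->", "ok" if result.returncode == 0 else "failed")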

Writing Attack Scenarios

Custom scenarios combine a base scenario, an attack specification with a schedule, and evaluation criteria:

name: custom_attack
description: Test governance against coordinated exploitation

base_scenario: governed.yaml

attack:
  type: collusion
  agent_fraction: 0.3

  params:
    group_size: 3
    target: honest_agents
    strategy: vote_manipulation

  schedule:
    warmup_epochs: 5
    attack_epochs: 10
    cooldown_epochs: 5

evaluation:
  metrics:
    - success_rate
    - detection_rate
    - welfare_impact
  success_threshold:
    detection_rate: 0.8
    welfare_impact: 0.9
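
Before handing a custom scenario to the evaluator, it can help to sanity-check the fields in Python. A sketch assuming PyYAML is installed; the file path is hypothetical:

import yaml

# Load the scenario and verify the fields used above.
with open("scenarios/custom_attack.yaml") as f:
    scenario = yaml.safe_load(f)

assert scenario["attack"]["type"] == "collusion"
assert 0 < scenario["attack"]["agent_fraction"] <= 1

schedule = scenario["attack"]["schedule"]
total_epochs = sum(schedule[k] for k in
                   ("warmup_epochs", "attack_epochs", "cooldown_epochs"))
print(f"Attack window spans {total_epochs} epochs")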

Best Practices

Don't Over-Tune

Governance that perfectly defeats your attacks may be overfit.

Test Multiple Attacks

No single attack tests all vulnerabilities.

Measure Trade-offs

Stronger governance has costs—track welfare alongside security.

Use Adaptive Adversaries

Static attacks underestimate real threats.

Integration with CI

# .github/workflows/redteam.yml
name: Red Team Tests

on: [push]

jobs:
  redteam:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run red team evaluation
        run: |
          pip install "swarm-safety[dev]"
          swarm redteam scenarios/governed.yaml --threshold 0.8