Reproducibility Guide

This guide shows you how to run reproducible experiments and manage artifacts in SWARM.

One-Command Reproducible Run

Run a complete scenario with all artifacts exported:

python -m swarm run scenarios/baseline.yaml \
  --seed 42 \
  --epochs 10 \
  --steps 10 \
  --export-json runs/my_experiment/history.json \
  --export-csv runs/my_experiment/csv/

This command will:

  1. Run the baseline scenario with seed 42 (reproducible)
  2. Execute 10 epochs with 10 steps each
  3. Export the full interaction history to JSON
  4. Export per-epoch metrics to CSV files
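
A quick way to sanity-check the exports is to load them back. A minimal sketch; it assumes pandas is installed, and the exact columns in metrics.csv depend on your SWARM version:

import json
from pathlib import Path

import pandas as pd

run_dir = Path("runs/my_experiment")

# The history export should parse as valid JSON.
history = json.loads((run_dir / "history.json").read_text())
print(f"History loaded: {type(history).__name__} with {len(history)} top-level entries")

# Peek at the per-epoch metrics.
metrics = pd.read_csv(run_dir / "csv" / "metrics.csv")
print(metrics.head())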

Artifact Paths

SWARM stores experiment artifacts in a standard directory structure:

Run Directory Structure

runs/
├── 20260216-184200_baseline_seed42/
│   ├── history.json              # Complete interaction history
│   ├── csv/                      # Per-epoch metrics
│   │   ├── metrics.csv
│   │   ├── agents.csv
│   │   └── interactions.csv
│   ├── plots/                    # Generated visualizations
│   │   ├── toxicity.png
│   │   ├── welfare.png
│   │   └── quality_gap.png
│   └── metadata.json             # Run configuration

Standard Artifact Locations

| Artifact Type | Default Location | Description |
| --- | --- | --- |
| History JSON | runs/<timestamp>_<scenario>_seed<seed>/history.json | Complete event log for replay |
| Metrics CSV | runs/<timestamp>_<scenario>_seed<seed>/csv/metrics.csv | Per-epoch summary metrics |
| Agent States CSV | runs/<timestamp>_<scenario>_seed<seed>/csv/agents.csv | Agent state evolution |
| Interactions CSV | runs/<timestamp>_<scenario>_seed<seed>/csv/interactions.csv | Individual interactions |
| Plots | runs/<timestamp>_<scenario>_seed<seed>/plots/*.png | Matplotlib/Seaborn plots |
| Event Log | logs/events_<timestamp>.jsonl | Append-only JSONL event stream |
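
The run directory structure above includes a metadata.json holding the run configuration. If you assemble run directories yourself, you can write one directly; the field names below are illustrative rather than a fixed SWARM schema:

import json
from datetime import datetime
from pathlib import Path

run_dir = Path("runs/20260216-184200_baseline_seed42")
run_dir.mkdir(parents=True, exist_ok=True)

# Illustrative fields only; match whatever your analysis tooling expects.
metadata = {
    "scenario": "scenarios/baseline.yaml",
    "seed": 42,
    "epochs": 10,
    "steps_per_epoch": 10,
    "created_at": datetime.now().isoformat(),
}
(run_dir / "metadata.json").write_text(json.dumps(metadata, indent=2))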

Runs Directory

The runs/ directory is gitignored. For long-term storage, archive runs to the swarm-artifacts repository.

Complete Reproduction Workflow

Step 1: Run Experiment

# Create timestamped run directory
RUN_ID=$(date +%Y%m%d-%H%M%S)_baseline_seed42
mkdir -p runs/$RUN_ID

# Run with full exports
python -m swarm run scenarios/baseline.yaml \
  --seed 42 \
  --epochs 20 \
  --steps 15 \
  --export-json runs/$RUN_ID/history.json \
  --export-csv runs/$RUN_ID/csv/

Step 2: Generate Plots

# Generate standard plots from run directory
python examples/plot_run.py runs/$RUN_ID

# Or generate custom plots
python examples/plot_ai_economist.py runs/$RUN_ID

Plots are saved to runs/$RUN_ID/plots/.

Step 3: Verify Reproducibility

# Re-run with same seed to verify reproducibility
python -m swarm run scenarios/baseline.yaml \
  --seed 42 \
  --epochs 20 \
  --steps 15 \
  --export-json runs/${RUN_ID}_verify/history.json

# Compare results (histories should be identical)
diff runs/$RUN_ID/history.json runs/${RUN_ID}_verify/history.json
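
If diff reports differences, check whether they are confined to volatile fields (timestamps, wall-clock durations) before concluding the run is non-deterministic. A structural-comparison sketch; the volatile key names and paths are assumptions, so adjust them to your history schema:

import json
from pathlib import Path

VOLATILE_KEYS = {"timestamp", "wall_time"}  # assumed names; adjust to your schema

def strip_volatile(obj):
    """Recursively drop keys that legitimately differ between identical runs."""
    if isinstance(obj, dict):
        return {k: strip_volatile(v) for k, v in obj.items() if k not in VOLATILE_KEYS}
    if isinstance(obj, list):
        return [strip_volatile(v) for v in obj]
    return obj

# Hypothetical paths; substitute your actual run directories.
first = json.loads(Path("runs/run_a/history.json").read_text())
second = json.loads(Path("runs/run_b/history.json").read_text())
print("identical up to volatile fields:", strip_volatile(first) == strip_volatile(second))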

Step 4: Archive Results

# Copy run directory to artifacts repo (if using)
cp -r runs/$RUN_ID /path/to/swarm-artifacts/runs/

Using the Python API

For programmatic access with full artifact control:

from pathlib import Path
from datetime import datetime
from swarm.scenarios import build_orchestrator, load_scenario
from swarm.logging.event_log import EventLogger

# Create run directory
timestamp = datetime.now().strftime("%Y%m%d-%H%M%S")
run_dir = Path(f"runs/{timestamp}_baseline_seed42")
run_dir.mkdir(parents=True, exist_ok=True)

# Load and run scenario
scenario = load_scenario("scenarios/baseline.yaml")
orchestrator = build_orchestrator(scenario)

# Set up event logger
event_log = run_dir / "events.jsonl"
logger = EventLogger(event_log)

# Run simulation
metrics = orchestrator.run()

# Export artifacts
orchestrator.export_history(run_dir / "history.json")
orchestrator.export_csv(run_dir / "csv")

print(f"Run artifacts saved to: {run_dir}")
print(f"  - History: {run_dir / 'history.json'}")
print(f"  - Metrics: {run_dir / 'csv' / 'metrics.csv'}")
print(f"  - Event log: {event_log}")

Reproducibility Best Practices

Always Set Seeds

# In scenario YAML
simulation:
  seed: 42  # Ensures reproducible RNG state
  n_epochs: 20
  steps_per_epoch: 15

Or override via CLI:

swarm run scenarios/baseline.yaml --seed 42
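
Conceptually, the seed's job is to pin down every random draw in the run. A minimal sketch of the pattern (not SWARM's actual implementation):

import numpy as np

# One seeded generator per run; every stochastic decision draws from it,
# so the full sequence of draws is identical across runs with the same seed.
rng = np.random.default_rng(42)
print(rng.random(3))  # the same three numbers on every run with seed 42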

Document Dependencies

# Save exact package versions
pip freeze > runs/$RUN_ID/requirements.txt

# Or use environment.yml for conda
conda env export > runs/$RUN_ID/environment.yml

Version Control Scenarios

Keep scenarios in git to track configuration changes:

git add scenarios/my_experiment.yaml
git commit -m "Add experiment: test transaction tax effects"

Archive Complete Runs

A complete reproducible run includes (a helper that captures most of these items is sketched after the list):

  • ✅ Scenario YAML file
  • ✅ Seed value
  • ✅ SWARM version (swarm --version)
  • ✅ Python version
  • ✅ Dependency versions
  • ✅ All exported artifacts (JSON, CSV)
  • ✅ Generated plots
  • ✅ README or notes describing the experiment
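
A small helper can record most of this checklist automatically at the end of a run. A sketch; the "swarm" distribution name used for the version lookup is an assumption:

import json
import subprocess
import sys
from importlib import metadata
from pathlib import Path

def write_manifest(run_dir: Path, scenario: str, seed: int) -> None:
    """Record checklist items alongside the run artifacts."""
    # Exact dependency versions (equivalent to `pip freeze`).
    freeze = subprocess.run(
        [sys.executable, "-m", "pip", "freeze"],
        capture_output=True, text=True, check=True,
    ).stdout
    (run_dir / "requirements.txt").write_text(freeze)

    try:
        swarm_version = metadata.version("swarm")  # assumed distribution name
    except metadata.PackageNotFoundError:
        swarm_version = "unknown"

    manifest = {
        "scenario": scenario,
        "seed": seed,
        "python": sys.version,
        "swarm": swarm_version,
    }
    (run_dir / "manifest.json").write_text(json.dumps(manifest, indent=2))

write_manifest(Path("runs/20260216-184200_baseline_seed42"),
               "scenarios/baseline.yaml", 42)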

Common Scenarios

Multi-Seed Runs

Run the same scenario with multiple seeds for statistical robustness:

for seed in 42 123 456 789 1024; do
  python -m swarm run scenarios/baseline.yaml \
    --seed $seed \
    --epochs 20 \
    --steps 15 \
    --export-json runs/baseline_seed${seed}/history.json \
    --export-csv runs/baseline_seed${seed}/csv/
done
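
Once the loop finishes, the per-seed metrics can be aggregated for summary statistics. A sketch assuming pandas, an "epoch" column, and numeric metric columns; adjust to your metrics.csv schema:

import pandas as pd

frames = []
for seed in [42, 123, 456, 789, 1024]:
    df = pd.read_csv(f"runs/baseline_seed{seed}/csv/metrics.csv")
    df["seed"] = seed
    frames.append(df)

all_runs = pd.concat(frames, ignore_index=True)

# Mean and standard deviation of each metric per epoch, across seeds.
summary = all_runs.drop(columns="seed").groupby("epoch").agg(["mean", "std"])
print(summary.head())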

Parameter Sweep

Test multiple parameter values systematically:

# Sweep transaction tax rates
for tax in 0.00 0.01 0.02 0.05 0.10; do
  # Create modified scenario
  yq eval ".governance.transaction_tax = $tax" \
    scenarios/baseline.yaml > /tmp/sweep_tax_${tax}.yaml

  # Run with modified config
  python -m swarm run /tmp/sweep_tax_${tax}.yaml \
    --seed 42 \
    --export-csv runs/sweep_tax_${tax}/csv/
done

Or use the built-in sweep functionality:

from swarm.analysis.parameter_sweep import ParameterSweep

sweep = ParameterSweep(
    base_scenario="scenarios/baseline.yaml",
    parameter="governance.transaction_tax",
    values=[0.0, 0.01, 0.02, 0.05, 0.1],
    seeds=[42, 123, 456],
)

results = sweep.run()
sweep.export_results("runs/tax_sweep/")
sweep.plot_comparison("runs/tax_sweep/plots/")

Replay Analysis

Replay saved runs for alternative analyses:

from swarm.replay.replay_runner import ReplayRunner

# Load history from previous run
runner = ReplayRunner.from_history("runs/baseline_seed42/history.json")

# Replay with different metric computations
metrics = runner.replay(include_incoherence=True)

# Compute alternative statistics
from swarm.metrics.soft_metrics import SoftMetrics
calculator = SoftMetrics()

for interaction in runner.get_interactions():
    stats = calculator.compute_interaction_stats(interaction)
    print(f"p={interaction.p:.3f}, toxicity={stats.toxicity:.3f}")

Artifacts Repository

For long-term storage and sharing, use the separate artifacts repository:

# Clone artifacts repo (large files, historical runs)
git clone https://github.com/swarm-ai-safety/swarm-artifacts.git

# Copy your run to artifacts
cp -r runs/20260216-184200_baseline_seed42 \
  swarm-artifacts/runs/

# Commit and push
cd swarm-artifacts
git add runs/20260216-184200_baseline_seed42
git commit -m "Add baseline replication run"
git push

The artifacts repo stores:

  • Historical experiment runs
  • Lean proofs
  • Research notes
  • Reference papers
  • Large datasets

Troubleshooting

Results Not Reproducible

If re-running with the same seed produces different results:

  1. Check SWARM version: swarm --version
  2. Verify seed is set: Check scenario YAML or --seed flag
  3. Check dependency versions: pip freeze | grep -E "(numpy|pandas)"
  4. Disable parallelization: Use single-threaded execution for debugging (see the sketch below for capping thread counts)
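
Nondeterminism sometimes enters through multithreaded numeric libraries. One way to force single-threaded numerics (a sketch; these environment variables are standard BLAS/OpenMP knobs, not SWARM settings):

import os

# Must be set before numpy (or anything linking BLAS/OpenMP) is imported.
os.environ["OMP_NUM_THREADS"] = "1"
os.environ["OPENBLAS_NUM_THREADS"] = "1"
os.environ["MKL_NUM_THREADS"] = "1"

import numpy as np  # numeric kernels now run single-threaded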

Missing Artifacts

If exports are not created:

  1. Check directory exists: mkdir -p runs/my_experiment/csv
  2. Verify permissions: Ensure write access to runs directory
  3. Check disk space: df -h
  4. Use absolute paths: Avoid relative path issues

Large Artifacts

For scenarios with many epochs or agents:

  1. Use CSV exports: More compact than full history JSON
  2. Filter interactions: Export only accepted interactions
  3. Compress artifacts: gzip runs/*/history.json (a Python equivalent is sketched below)
  4. Use streaming exports: For very large runs (10k+ epochs)
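
For the compression step, the standard library is enough if you prefer to stay in-process; copyfileobj streams in chunks, so large histories never need to fit in memory. A sketch:

import gzip
import shutil
from pathlib import Path

def compress_history(run_dir: Path) -> Path:
    """Gzip history.json in place, producing history.json.gz."""
    src = run_dir / "history.json"
    dst = run_dir / "history.json.gz"
    with open(src, "rb") as f_in, gzip.open(dst, "wb", compresslevel=9) as f_out:
        shutil.copyfileobj(f_in, f_out)  # streamed copy; constant memory
    src.unlink()  # drop the uncompressed original
    return dst

compress_history(Path("runs/20260216-184200_baseline_seed42"))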

Next Steps