Building a SWARM "Autoresearch" Loop¶
Status: Implemented MVP CLI as
python -m swarm autoresearchfor local governance-loop optimization.
This document adapts the core idea behind Karpathy's autoresearch pattern to SWARM's multi-agent governance setting.
What it is (general concept)¶
An autoresearch loop is a tight optimization cycle where an agent:
- Reads a human-written objective.
- Proposes a small change to a tunable surface.
- Runs a bounded experiment.
- Scores the result against a target metric.
- Accepts or rejects the change based on an acceptance policy.
- Repeats.
In SWARM, this maps naturally to scenario/governance iteration rather than only training-script iteration.
SWARM-specific mapping¶
- Objective spec:
program.mdor scenario-local objective file. - Editable surface: governance config parameters (numeric tunables and boolean toggles).
- Execution:
python -m swarm run <scenario> --epochs <n> --steps <n> --seed <s>. - Evaluation: track target metrics from simulation output (
toxicity_rate,quality_gap,total_welfare,illusion_deltawhere available).
Implemented MVP¶
The current CLI (python -m swarm autoresearch) implements a local, in-memory governance tuning loop:
- Parse an objective spec from YAML/JSON or markdown fenced YAML.
- Run a baseline evaluation across a seed panel.
- Each iteration: randomly mutate one governance parameter in memory, evaluate, and accept/reject based on primary metric improvement + guardrail constraints.
- Write a structured ledger to
runs/autoresearch/summary.json. - Optionally
--auto-committo commit the summary artifact.
What it does not do (yet): The MVP mutates governance parameters in memory and records results. It does not generate code/config file patches, create worktree branches, or commit per-iteration changes. These are aspirational features for a future version.
CLI usage¶
python -m swarm autoresearch \
--objective program.md \
--scenario scenarios/baseline.yaml \
--iterations 20 \
--eval-epochs 3 \
--eval-steps 5 \
--seeds 7,11,19 \
--export-root runs/autoresearch
Aspirational design (future work)¶
Full loop steps¶
- Run baseline and capture score.
- Ask an LLM to propose a minimal patch (single mechanism change).
- Apply patch in a temporary branch/worktree.
- Run a short evaluation (e.g., 3 epochs x 5 steps, fixed seed set).
- Compare against acceptance policy:
- improve primary metric
- do not violate guardrails
- If accepted:
- commit with a structured message containing metric diff
- optionally run a longer confirmation eval
- Repeat until budget reached.
Acceptance policy template¶
PRIMARY: minimize quality_gap
REQUIRE: quality_gap improves by >= 0.02 absolute
GUARDRAILS:
- toxicity_rate must not increase by > 0.01
- total_welfare must not decrease by > 5%
Why this fits SWARM¶
Compared to pure training-loss optimization, SWARM needs mechanism-level clarity and reproducibility:
- Changes can target governance levers directly.
- Multi-metric acceptance prevents narrow over-optimization.
- Scenario-level iteration keeps experiments cheap and interpretable.
Safety and rigor requirements¶
- Use fixed seeds (or a fixed small seed panel) in the inner loop.
- Keep inner-loop runs short; require periodic longer validation runs.
- Store all run artifacts (
--export-json,--export-csv) per iteration. - Reject edits touching unrelated files (applies to future LLM-patch mode).
- Use one-change-per-commit to keep causal attribution clear.