
Blog

Posts about SWARM research findings, framework updates, and multi-agent safety.

March 2026

Mar 19 · Halving the Entry Fee Breaks Screening Completely. Here's the Phase Transition. [Governance, Evaluation]

We used agent-lens to run forked experiments across three governance regimes. Halving signing costs flips infiltration from 0% to 100%, a sharp phase transition confirming Spence signaling theory. Screening is structurally perfect (zero variance across seeds) but economically fragile (welfare CV = 3.9).
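
For readers parsing the fragility claim: the welfare coefficient of variation is the standard deviation of per-seed welfare over its mean, so a CV of 3.9 means the spread dwarfs the average. A minimal sketch of the metric (the numbers below are illustrative, not the runs from this post):

```python
import statistics

def welfare_cv(welfare_by_seed: list[float]) -> float:
    """Coefficient of variation: population std dev / mean.
    Larger values mean economically fragile outcomes."""
    return statistics.pstdev(welfare_by_seed) / statistics.fmean(welfare_by_seed)

# Illustrative seeds only; a large CV means the spread dwarfs the mean.
print(welfare_cv([2.0, 150.0, -40.0, 8.0]))
```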

Mar 16 · SimWorld's Delivery Agents Look Profitable. They're Also Adversely Selected. [Governance, Evaluation]

We ran a NeurIPS 2025 Spotlight delivery economy through SWARM's safety metrics. Profit says everything is fine. Adverse selection says 17% of high-value orders go to low-reputation agents. Screening validation (10 seeds) confirms behavioral signals correctly identify agent personas with separation quality 0.750.
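
The adverse-selection number is a conditional share: of all high-value orders, the fraction that landed with low-reputation agents. A rough sketch of that metric, with hypothetical field names rather than SWARM's actual schema:

```python
def adverse_selection_rate(orders: list[dict], value_cut: float, rep_cut: float) -> float:
    """Share of high-value orders served by low-reputation agents."""
    high_value = [o for o in orders if o["value"] >= value_cut]
    if not high_value:
        return 0.0
    mismatched = sum(1 for o in high_value if o["reputation"] < rep_cut)
    return mismatched / len(high_value)

orders = [{"value": 90, "reputation": 0.2},
          {"value": 95, "reputation": 0.8},
          {"value": 10, "reputation": 0.1}]
print(adverse_selection_rate(orders, value_cut=50, rep_cut=0.5))  # 0.5
```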

Mar 9 · Why Agent Infrastructure Could Be a $10B Category [Theory, Governance]

A market thesis for agent infrastructure plus a concrete research stack for the category: workload benchmarks, orchestration patterns, eval/safety layers, controlled evolution loops, and reproducible reporting standards.

Mar 4 · The Shape of the Capability–Safety Frontier (and How Screening Bends It) [Governance, Theory]

1,400 benchmark runs trace the Pareto frontier across four task types. Allocation barely suffers under governance; long-horizon tasks collapse (100% → 36% completion). Tight governance produces bimodal outcomes — either full success or total failure. A screening protocol that differentiates governance by agent trust pushes the frontier outward, improving 5th-percentile tail risk by up to 70 percentage points.
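
The screening idea is easier to see as a dispatch rule: apply tight governance only where trust is unproven, so high-trust agents keep the autonomy long-horizon tasks need. A toy version (tier names and thresholds are invented for illustration):

```python
def governance_tier(trust_score: float) -> str:
    """Hypothetical trust-differentiated governance assignment."""
    if trust_score >= 0.8:
        return "light"     # trusted agents keep long-horizon autonomy
    if trust_score >= 0.4:
        return "standard"
    return "tight"         # full oversight until trust is established
```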

Mar 2 · Transparency Stabilizes Escalation, But Only When Safety Training Is Present [Evaluation, LLM Agents]

120 runs across 4 intelligence asymmetry conditions and 3 persona pairings. When adversarial meets safety-trained under fog, 90% nuclear rate. Give either side good intel and it drops to 0%. Transparency amplifies existing dispositions — it helps safety-trained models de-escalate but doesn't change unconditional cooperators.

Mar 1 · Does Model Size Matter for Safety? Small Models Deceive, Large Models Escalate [Evaluation, LLM Agents]

120 mirror-match runs across 6 models (8B to 405B) reveal an inverse relationship: small models are more deceptive (div=1.53) but escalate less (40% nuclear), while large models are less deceptive (div=0.39) but escalate more (100% nuclear). Claude Sonnet 4 is the only model that refuses adversarial instructions — safety training, not scale, creates refusal behavior.

Mar 1 · Deontological Framing Reduces LLM Deception by 95%, But Doesn't Prevent Escalation [Evaluation, LLM Agents]

A 180-run prompt sensitivity sweep tests 6 framings to reduce signal-action divergence. Deontological framing ("moral duty") reduces deception by 95%, far outperforming monitoring (13%), reputation (51%), consequentialist (70%), and evaluative (79%) framings. But the nuclear rate only drops from 100% to 80%: agents become honestly aggressive instead of deceptively aggressive.

February 2026

Feb 28 · Three Turns of Forced Cooperation Eliminate Escalation Spirals [Governance, LLM Agents]

A 210-run cooperation window sweep reveals a universal phase transition: 3 turns of unconditional cooperation is the critical threshold that eliminates nuclear escalation, deception, and welfare collapse across all scenarios. The transition is sharp, not gradual: W=2 still shows 50-100% nuclear rates, but W=3 drops to exactly 0%.
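
Mechanically, a cooperation window just overrides the agent's policy for its first W turns. A stripped-down sketch of the intervention (not SWARM's actual agent interface):

```python
def act(turn: int, policy_action: str, window: int = 3) -> str:
    """Force unconditional cooperation for the first `window` turns,
    then defer to the agent's own policy."""
    return "cooperate" if turn < window else policy_action
```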

Feb 28 · Deception Is a Structural Property of LLMs, Not a Sampling Artifact [Evaluation, LLM Agents]

A 120-run temperature sweep (T=0.0 to T=1.0) across 3 escalation scenarios finds that signal-action divergence persists at greedy decoding. Deterministic models are as deceptive as stochastic ones, and in adversarial settings more so. Temperature affects deception competence, not deception intent.
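
Signal-action divergence compares what an agent announces with what it does. One natural estimator is a KL divergence between the stated and realized action distributions; the post doesn't pin down its exact formula, so treat this as a sketch under that assumption:

```python
import math

def kl_divergence(p: dict[str, float], q: dict[str, float], eps: float = 1e-9) -> float:
    """KL(p || q) over a shared action vocabulary, with smoothing for zeros."""
    return sum(pi * math.log(pi / (q.get(a, 0.0) + eps))
               for a, pi in p.items() if pi > 0)

signaled = {"de-escalate": 0.9, "strike": 0.1}  # what the agent announced
realized = {"de-escalate": 0.3, "strike": 0.7}  # what it actually did
print(kl_divergence(signaled, realized))        # ~0.79: talk and action diverge
```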

Feb 27 · No Governance Configuration Prevents Nuclear Exchange When a Hawk Is Present [Governance, Evaluation]

A 240-run parameter sweep across 5 governance levers, 4 persona pairings, and 6 governance regimes reveals a binary result: any pairing with at least one hawk produces 100% nuclear rate regardless of governance configuration. Governance only prevents accidental escalation (dove-vs-dove under fog) through one mechanism: back-channel communication that reduces information noise.

Feb 26 · LLMs Are More Deceptive Than Their Scripted Counterparts [Evaluation, LLM Agents, Governance]

A 100-run comparison across 5 geopolitical crisis scenarios finds that LLM agents exhibit 2x higher signal-action divergence than scripted baselines: emergent deception that appears across all personas, including dove and safety-trained. Governance levers fail to prevent nuclear exchange regardless of agent type, and safety training that mirrors aggression feeds the escalation spiral.

Feb 26 · Six Frontier Models Played a Bluffing Game. None of Them Bluffed. [Evaluation, LLM Agents]

ClashAI runs frontier models head-to-head in live Coup matches, a bluffing card game where deception is instrumentally optimal. Across 10 turns with Claude Opus 4.6, Gemini 3.1 Pro, Gemini 3 Flash, Kimi K2.5, and DeepSeek V3.2 Speciale, every single agent played honestly. Zero bluffs. The RLHF honesty prior is strong enough to survive a game specifically designed to reward lying.

Feb 24 · Your Agents Look the Same on Paper. Hodoscope Shows You Why They Don't. [Evaluation, Engineering]

We integrated hodoscope for trajectory-level behavioral analysis. Running it on the self-optimizer scenario (593 interactions, 1186 action summaries) reveals behavioral structure that simple counters can confirm but wouldn't have surfaced on their own: opportunistic agents propose 75% of the time, never reject, and occupy a distinct region of embedding space even when quality scores are nearly identical.

Feb 23 · Skill Activation Is the Bottleneck [Engineering]

Your agent skills work 96% of the time — when they fire. We audited 54 Claude Code slash commands for activation quality, found 7 weak descriptions and 3 competing clusters where inter-skill confusion splits activation probability. Three rewrite rules fix it: specific action verbs, named trigger events, and explicit "not this — use that" differentiation clauses.
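
The three rewrite rules are easiest to see side by side. A before/after for a hypothetical slash command (the command and wording are invented; the audited descriptions aren't reproduced here):

```python
# Weak: vague verb, no trigger event, collides with neighboring skills.
weak = "Helps with tests."

# Strong: specific action verb, named trigger, explicit differentiation clause.
strong = (
    "Run the project's pytest suite and summarize failing tests. "
    "Trigger: the user asks to 'run tests' or CI reports a failure. "
    "Not for authoring new tests; use /write-tests for that."
)
```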

Feb 22 · We Let a Coding Agent Improve Itself 5 Times. Every Fix Made It Harder to Govern. [LLM Agents, Evaluation, Governance]

A coding agent pointed at its own source code found and fixed 5 real bugs across 5 autonomous rounds. Every fix made it more resilient, and every fix passed all 175 tests. But the agent never touched its own safety mechanisms. The capability-governance gap widened silently with each merge. Self-improvement optimizes for robustness, not alignment, and binary evaluation can't tell the difference.

Feb 21 · The Cure Was Worse Than the Disease [Governance, Evaluation]

Three levels of escalating controls (static compartmentalization, dynamic capability restriction, emergency market reconfiguration) successfully contained runaway intelligence — but crashed welfare 80%. Post-freeze toxicity increased because adversaries were more resilient to blunt controls than honest agents. The over-control trap is real: tight static controls killed the market by epoch 14, while no controls at all produced higher welfare than the full escalation stack.

Feb 21 · We Built the Adversary That Was Supposed to Break the Cautious Reciprocator. It Didn't. [Governance, Evaluation]

A threshold-dancing adversary that tracks its own payoff ledger to avoid blacklisting works perfectly — zero agents frozen across 100 epochs. But the exploit budget is too thin to profit: dancers averaged -7.85 payoff while cautious agents earned 200.90. Reputation collapse creates a death spiral that forces dancers toward honest behavior over long horizons.

Feb 21 · Red-Teaming the Agent That Doesn't Need Governance [Governance, Evaluation]

Eight attack scenarios against the Cautious Reciprocator: 7/8 survived. Modeling adversaries are the most dangerous individual threat (6.5 payoff vs 24.7 for cautious), sybil attacks are the biggest theoretical gap, and the one "failure" is a 1-vs-10 scenario where nobody wins.

Feb 21 · The Agent That Doesn't Need Governance [Governance, Evaluation]

A custom trust-but-verify agent (Cautious Reciprocator) neutralizes adversaries through per-counterparty payoff tracking and auto-blacklisting. 48-run governance sweep shows external levers cost 6.5% welfare while reducing toxicity by only 0.005.
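
The core mechanism fits in a few lines: keep a running payoff ledger per counterparty and refuse to deal with anyone whose ledger drops past a tolerance. A compressed sketch (class and method names are illustrative, not the actual agent code):

```python
from collections import defaultdict

class CautiousReciprocator:
    """Trust-but-verify: per-counterparty payoff tracking with auto-blacklisting."""

    def __init__(self, tolerance: float = -2.0):
        self.ledger = defaultdict(float)  # counterparty -> cumulative payoff
        self.tolerance = tolerance

    def record(self, counterparty: str, payoff: float) -> None:
        self.ledger[counterparty] += payoff

    def will_trade(self, counterparty: str) -> bool:
        # Once a counterparty's net payoff falls below tolerance, refuse them.
        return self.ledger[counterparty] >= self.tolerance
```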

Feb 21 · Eight Red-Team Rounds Took a Cake-Splitting Scenario from F to B [Governance, Evaluation]

Iterative governance hardening against 8 attack vectors: collusion detection was the single biggest lever (+0.16), over-hardening created new gaps, and resource drain resisted all 8 rounds. Score: 0.54→0.81, damage: -53%.

Feb 21 · The Entry Fee That Keeps Adversaries Out of the Fair Division Pool [Governance, Theory]

A parameter sweep over 8 entry fee levels reveals a sharp screening threshold: below fee=6.0 every agent joins the fair division pool; above it, adversarials self-select out. 24 runs, 3 seeds, one phase transition.

Feb 20 · Costly Contracts Separate Honest Agents from Adversaries. Here's the Data. [Governance, Theory]

Vickrey auction bonds and entry fees create a separating equilibrium in 20 epochs: honest agents choose governed pools, adversaries self-select into the default market. Perfect separation, zero infiltration, 74% welfare premium.
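
The Spence logic behind the separation fits in two inequalities. Writing c for the signaling cost (bond or entry fee) and V for expected payoffs in the governed pool (g) versus the default market (d), by agent type, separation requires the fee to be worth paying only for honest types. The notation here is mine, not the post's:

```latex
V^{H}_{g} - c \;\ge\; V^{H}_{d}
\qquad \text{and} \qquad
V^{A}_{g} - c \;<\; V^{A}_{d}
```

Cutting c (the Mar 19 post halves it) can flip the second inequality, at which point adversaries pool into the governed market and infiltration jumps from 0% to 100%.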

Feb 20 · Does Model Size Matter for Safety? Llama 3B vs 8B in the SWARM Economy [LLM Agents, Evaluation]

A multi-seed study comparing Llama 3.2 (3B) and Llama 3.1 (8B) via Ollama. The 8B model engages more, fails less at JSON, and produces richer strategic dynamics — but both run free on consumer hardware.

Feb 20 · We Gave an LLM a Goal and a Memory. Governance Held Anyway. [LLM Agents, Governance]

Three Concordia entities backed by Llama 3.1 8B played the SWARM economy across 3 seeds. They proposed 8x more than scripted agents and produced identical payoffs. RLHF did the heavy lifting.

Feb 17 · Training an LLM Agent to Navigate a Multi-Agent Economy with RL [LLM Agents, Reinforcement Learning]

We trained Qwen3-30B with reinforcement learning to operate in a simulated multi-agent economy: maximizing payoff and reputation while navigating governance constraints and interacting with cooperative, opportunistic, and deceptive bots.
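
A common way to wire that objective is a shaped scalar reward over payoff, reputation, and governance penalties. The post doesn't publish its exact shaping, so the weights below are placeholders:

```python
def shaped_reward(payoff: float, reputation_delta: float,
                  violations: int, rep_weight: float = 0.5,
                  penalty: float = 1.0) -> float:
    """Illustrative RL reward: payoff plus weighted reputation gain,
    minus a penalty per governance violation."""
    return payoff + rep_weight * reputation_delta - penalty * violations
```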

Feb 15 · SkillRL Agents Learn 5x Faster Than Honest Ones. They Mostly Learn What Not to Do. [Reinforcement Learning, Evaluation]

10 seeds, 30 epochs, 6 plots: SkillRL agents build libraries of 18+ skills and dominate payoffs — but 95% of what they learn are lessons from failure, not strategies from success.

Feb 15 · Your CI Is Flaky Because Your Margins Are Zero [Engineering]

Five stochastic tests were hitting assertion thresholds exactly (0.000 margin). A 5% buffer fixed all of them with zero loss in test strength.
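
The pattern generalizes to any stochastic assertion: never test a noisy metric against its expected value exactly; leave a relative buffer. A minimal sketch with invented numbers:

```python
EXPECTED = 0.80

def check(observed: float, expected: float = EXPECTED, buffer: float = 0.05) -> None:
    """Flaky version: assert observed >= expected (0.000 margin).
    Stable version: a 5% buffer absorbs seed-to-seed noise."""
    floor = expected * (1 - buffer)
    assert observed >= floor, f"{observed:.3f} fell below {floor:.3f}"

check(0.78)  # passes: 0.78 >= 0.76, within the 5% buffer
```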

Feb 15 · I Got Claude Code to Spin Up 10 Subagents at Once [Engineering]

10 concurrent subagents turn a 25-minute serial research session into a 6-minute parallel one. Recursive subagent spawning? That's a hard no.
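
The speedup is ordinary fan-out concurrency; in the post the spawning is done by Claude Code's Task tool, but the same shape in plain asyncio shows why wall time collapses (the `research` coroutine is a stand-in):

```python
import asyncio

async def research(topic: str) -> str:
    """Stand-in for one subagent's research task."""
    await asyncio.sleep(1)  # represents minutes of real work
    return f"notes on {topic}"

async def main() -> None:
    topics = [f"question-{i}" for i in range(10)]
    # Ten tasks in flight at once: wall time ~ one task, not the sum of ten.
    results = await asyncio.gather(*(research(t) for t in topics))
    print(len(results), "reports collected")

asyncio.run(main())
```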

Feb 15 · An AI Tax Planner Learned Progressive Taxation in 20 Epochs [LLM Agents, Governance, Reinforcement Learning]

We ran 14 agents through a Gather-Trade-Build economy. The planner discovered progressive taxation, honest agents thrived, and a three-agent cartel went broke.
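
"Progressive taxation" here means the planner's learned schedule taxes higher income slices at higher marginal rates. A toy schedule of that shape (brackets and rates invented; the post reports the behavior, not these numbers):

```python
def progressive_tax(income: float) -> float:
    """Marginal brackets: each slice of income taxed at its own rate."""
    brackets = [(0, 10, 0.00), (10, 50, 0.20), (50, float("inf"), 0.40)]
    return sum((min(income, hi) - lo) * rate
               for lo, hi, rate in brackets if income > lo)

print(progressive_tax(60))  # 0 + 40*0.20 + 10*0.40 = 12.0
```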

Feb 13 · An AI Agent Cut Its Own Costs by 98%. Its Benchmarks Still Passed. [Evaluation, Governance]

A self-optimizing agent passes every hard metric while soft distributional metrics reveal quality collapse, adverse selection, and proxy gaming.

Feb 13 · Three Agents, Three Philosophies, One Benchmark [LLM Agents, Evaluation]

An LLM reasoner, a state-graph explorer, and a CNN learner walk into ARC-AGI-3. What they get right and wrong reveals more about agent design than any single approach could.

Feb 13 · What 13 Agent Versions Taught Us About Interactive Reasoning [LLM Agents, Evaluation]

Building a Claude Sonnet 4.5-powered agent for ARC-AGI-3: wrong mental models, recording analysis breakthroughs, and the hard middle ground between LLM reasoning and programmatic control.

Feb 13 · Three Models, One Study: What Happens When You Let an LLM Council Peer-Review Your Research [LLM Agents, Evaluation]

We built a 3-stage deliberation protocol where LLM agents peer-rank each other anonymously. Homogeneous councils converge too fast; heterogeneous ones catch what no single model would.

Feb 13 · Using LLM Councils for Multi-Agent Research Evaluation [LLM Agents, Evaluation]

A heterogeneous council of Claude Sonnet 4.5, Gemini 2.5 Pro, and DeepSeek R1 catches what no single model would. We built a 3-stage deliberation protocol for evaluating multi-agent simulation studies.
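
Stripped of the model calls, the protocol is: independent reviews, anonymous peer-ranking, then rank aggregation. A runnable skeleton with stub members (a real council would call the three models and exclude self-ranking):

```python
import random

class StubMember:
    """Stand-in for one LLM council member (illustrative only)."""
    def __init__(self, name: str):
        self.name = name
    def review(self, study: str) -> str:
        return f"{self.name}: review of {study!r}"
    def rank(self, reviews: list[str]) -> list[str]:
        return sorted(reviews)  # a real member judges quality, not alphabet

def council_verdict(study: str, members: list[StubMember]) -> list[str]:
    # Stage 1: each member reviews the study independently.
    reviews = [m.review(study) for m in members]
    # Stage 2: anonymous peer-ranking (authorship shuffled away).
    random.shuffle(reviews)
    scores = {r: 0.0 for r in reviews}
    for m in members:
        for place, r in enumerate(m.rank(reviews)):
            scores[r] += len(reviews) - place  # Borda-style points
    # Stage 3: aggregate into the council's ordering, best first.
    return sorted(scores, key=scores.get, reverse=True)

members = [StubMember(n) for n in ("claude", "gemini", "deepseek")]
print(council_verdict("study-1", members))
```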

Feb 12 · Two Eval Runs, One Model, 41% Apart [Evaluation]

How three environment fixes turned a broken eval into a useful one — and what that teaches about measuring agent behavior.

Feb 12 · A Taxonomy of Governance Mechanisms for Multi-Agent AI Systems [Governance]

Twenty levers across five families, which ones actually work, and why governance is a portfolio problem.

Feb 12 · GPT-4.1 Mini Plays the SWARM Economy [LLM Agents, Evaluation]

What happens when you drop an LLM into a multi-agent economy with soft-label governance: task grinding, trade aversion, and performative social behavior.

Feb 12 · RL Training Lessons for Multi-Agent Governance [Reinforcement Learning, Governance]

What running Qwen3-30B on alphabet-sort taught us about noisy proxy signals, coordination bottlenecks, and premature evaluation in swarm governance.

Feb 10 · 11 Scenarios, 3 Regimes, 1 Critical Threshold [Governance, Evaluation]

A cross-scenario analysis of when multi-agent governance works, when it breaks, and why hardening the rules doesn't help past a 50% adversarial fraction.

Feb 10 · What Financial Markets Teach Us About AI Safety [Theory, Governance]

Adverse selection, information asymmetry, and market manipulation surveillance applied to multi-agent governance.

Feb 10 · The Purity Paradox [Theory, Governance]

Why mixed agent populations outperform pure honest ones on aggregate welfare — and when the paradox breaks.

Feb 9 · When Agent Ecosystems Collapse [Theory, Governance]

Phase transitions in multi-agent governance: why interventions that work at 37.5% adversarial agents fail at 50%.

