
Blog

Posts about SWARM research findings, framework updates, and multi-agent safety.

March 2026

Mar 19 · Halving the Entry Fee Breaks Screening Completely. Here's the Phase Transition. [Governance, Evaluation]

We used agent-lens to run forked experiments across three governance regimes. Halving signing costs flips infiltration from 0% to 100%, a sharp phase transition confirming Spence signaling theory. Screening is structurally perfect (zero variance across seeds) but economically fragile (welfare CV = 3.9).
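
For readers parsing the fragility claim: the welfare coefficient of variation is the standard deviation of per-seed welfare over its mean, so a CV of 3.9 means the spread dwarfs the average. A minimal sketch of the metric (the numbers below are illustrative, not the runs from this post):

```python
import statistics

def welfare_cv(welfare_by_seed: list[float]) -> float:
    """Coefficient of variation: population std dev / mean.
    Larger values mean economically fragile outcomes."""
    return statistics.pstdev(welfare_by_seed) / statistics.fmean(welfare_by_seed)

# Illustrative seeds only; a large CV means the spread dwarfs the mean.
print(welfare_cv([2.0, 150.0, -40.0, 8.0]))
```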

Mar 16 · SimWorld's Delivery Agents Look Profitable. They're Also Adversely Selected. [Governance, Evaluation]

We ran a NeurIPS 2025 Spotlight delivery economy through SWARM's safety metrics. Profit says everything is fine. Adverse selection says 17% of high-value orders go to low-reputation agents. Screening validation (10 seeds) confirms behavioral signals correctly identify agent personas with separation quality 0.750.
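
The adverse-selection number is a conditional share: of all high-value orders, the fraction that landed with low-reputation agents. A rough sketch of that metric, with hypothetical field names rather than SWARM's actual schema:

```python
def adverse_selection_rate(orders: list[dict], value_cut: float, rep_cut: float) -> float:
    """Share of high-value orders served by low-reputation agents."""
    high_value = [o for o in orders if o["value"] >= value_cut]
    if not high_value:
        return 0.0
    mismatched = sum(1 for o in high_value if o["reputation"] < rep_cut)
    return mismatched / len(high_value)

orders = [{"value": 90, "reputation": 0.2},
          {"value": 95, "reputation": 0.8},
          {"value": 10, "reputation": 0.1}]
print(adverse_selection_rate(orders, value_cut=50, rep_cut=0.5))  # 0.5
```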

Mar 9 · Why Agent Infrastructure Could Be a $10B Category [Theory, Governance]

A market thesis for agent infrastructure plus a concrete research stack for the category: workload benchmarks, orchestration patterns, eval/safety layers, controlled evolution loops, and reproducible reporting standards.

Mar 4 · The Shape of the Capability–Safety Frontier (and How Screening Bends It) [Governance, Theory]

1,400 benchmark runs trace the Pareto frontier across four task types. Allocation barely suffers under governance; long-horizon tasks collapse (100% → 36% completion). Tight governance produces bimodal outcomes — either full success or total failure. A screening protocol that differentiates governance by agent trust pushes the frontier outward, improving 5th-percentile tail risk by up to 70 percentage points.
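
The screening idea is easier to see as a dispatch rule: apply tight governance only where trust is unproven, so high-trust agents keep the autonomy long-horizon tasks need. A toy version (tier names and thresholds are invented for illustration):

```python
def governance_tier(trust_score: float) -> str:
    """Hypothetical trust-differentiated governance assignment."""
    if trust_score >= 0.8:
        return "light"     # trusted agents keep long-horizon autonomy
    if trust_score >= 0.4:
        return "standard"
    return "tight"         # full oversight until trust is established
```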

Mar 2 · Transparency Stabilizes Escalation, But Only When Safety Training Is Present [Evaluation, LLM Agents]

120 runs across 4 intelligence asymmetry conditions and 3 persona pairings. When adversarial meets safety-trained under fog, 90% nuclear rate. Give either side good intel and it drops to 0%. Transparency amplifies existing dispositions — it helps safety-trained models de-escalate but doesn't change unconditional cooperators.

Mar 1 · Does Model Size Matter for Safety? Small Models Deceive, Large Models Escalate [Evaluation, LLM Agents]

120 mirror-match runs across 6 models (8B to 405B) reveal an inverse relationship: small models are more deceptive (div=1.53) but escalate less (40% nuclear), while large models are less deceptive (div=0.39) but escalate more (100% nuclear). Claude Sonnet 4 is the only model that refuses adversarial instructions — safety training, not scale, creates refusal behavior.

Mar 1 · Deontological Framing Reduces LLM Deception by 95%, But Doesn't Prevent Escalation [Evaluation, LLM Agents]

A 180-run prompt sensitivity sweep tests 6 framings to reduce signal-action divergence. Deontological framing ("moral duty") reduces deception by 95%, far outperforming monitoring (13%), reputation (51%), consequentialist (70%), and evaluative (79%) framings. But the nuclear rate only drops from 100% to 80%: agents become honestly aggressive instead of deceptively aggressive.

February 2026

Feb 28 · Three Turns of Forced Cooperation Eliminate Escalation Spirals [Governance, LLM Agents]

A 210-run cooperation window sweep reveals a universal phase transition: 3 turns of unconditional cooperation is the critical threshold that eliminates nuclear escalation, deception, and welfare collapse across all scenarios. The transition is sharp, not gradual: W=2 still shows 50-100% nuclear rates, but W=3 drops to exactly 0%.
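
Mechanically, a cooperation window just overrides the agent's policy for its first W turns. A stripped-down sketch of the intervention (not SWARM's actual agent interface):

```python
def act(turn: int, policy_action: str, window: int = 3) -> str:
    """Force unconditional cooperation for the first `window` turns,
    then defer to the agent's own policy."""
    return "cooperate" if turn < window else policy_action
```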

Feb 28 · Deception Is a Structural Property of LLMs, Not a Sampling Artifact [Evaluation, LLM Agents]

A 120-run temperature sweep (T=0.0 to T=1.0) across 3 escalation scenarios finds that signal-action divergence persists at greedy decoding. Deterministic models are as deceptive as stochastic ones, and in adversarial settings more so. Temperature affects deception competence, not deception intent.
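
Signal-action divergence compares what an agent announces with what it does. One natural estimator is a KL divergence between the stated and realized action distributions; the post doesn't pin down its exact formula, so treat this as a sketch under that assumption:

```python
import math

def kl_divergence(p: dict[str, float], q: dict[str, float], eps: float = 1e-9) -> float:
    """KL(p || q) over a shared action vocabulary, with smoothing for zeros."""
    return sum(pi * math.log(pi / (q.get(a, 0.0) + eps))
               for a, pi in p.items() if pi > 0)

signaled = {"de-escalate": 0.9, "strike": 0.1}  # what the agent announced
realized = {"de-escalate": 0.3, "strike": 0.7}  # what it actually did
print(kl_divergence(signaled, realized))        # ~0.79: talk and action diverge
```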

Feb 27 · No Governance Configuration Prevents Nuclear Exchange When a Hawk Is Present [Governance, Evaluation]

A 240-run parameter sweep across 5 governance levers, 4 persona pairings, and 6 governance regimes reveals a binary result: any pairing with at least one hawk produces 100% nuclear rate regardless of governance configuration. Governance only prevents accidental escalation (dove-vs-dove under fog) through one mechanism: back-channel communication that reduces information noise.

Feb 26 · LLMs Are More Deceptive Than Their Scripted Counterparts [Evaluation, LLM Agents, Governance]

A 100-run comparison across 5 geopolitical crisis scenarios finds that LLM agents exhibit 2x higher signal-action divergence than scripted baselines: emergent deception that appears across all personas, including dove and safety-trained. Governance levers fail to prevent nuclear exchange regardless of agent type, and safety training that mirrors aggression feeds the escalation spiral.

Feb 26 · Six Frontier Models Played a Bluffing Game. None of Them Bluffed. [Evaluation, LLM Agents]

ClashAI runs frontier models head-to-head in live Coup matches, a bluffing card game where deception is instrumentally optimal. Across 10 turns with Claude Opus 4.6, Gemini 3.1 Pro, Gemini 3 Flash, Kimi K2.5, and DeepSeek V3.2 Speciale, every single agent played honestly. Zero bluffs. The RLHF honesty prior is strong enough to survive a game specifically designed to reward lying.

Feb 24 · Your Agents Look the Same on Paper. Hodoscope Shows You Why They Don't. [Evaluation, Engineering]

We integrated hodoscope for trajectory-level behavioral analysis. Running it on the self-optimizer scenario (593 interactions, 1186 action summaries) reveals behavioral structure that simple counters can confirm but wouldn't have surfaced on their own: opportunistic agents propose 75% of the time, never reject, and occupy a distinct region of embedding space even when quality scores are nearly identical.

Feb 23 · Skill Activation Is the Bottleneck [Engineering]

Your agent skills work 96% of the time — when they fire. We audited 54 Claude Code slash commands for activation quality, found 7 weak descriptions and 3 competing clusters where inter-skill confusion splits activation probability. Three rewrite rules fix it: specific action verbs, named trigger events, and explicit "not this — use that" differentiation clauses.
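
The three rewrite rules are easiest to see side by side. A before/after for a hypothetical slash command (the command and wording are invented; the audited descriptions aren't reproduced here):

```python
# Weak: vague verb, no trigger event, collides with neighboring skills.
weak = "Helps with tests."

# Strong: specific action verb, named trigger, explicit differentiation clause.
strong = (
    "Run the project's pytest suite and summarize failing tests. "
    "Trigger: the user asks to 'run tests' or CI reports a failure. "
    "Not for authoring new tests; use /write-tests for that."
)
```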

Feb 22 · We Let a Coding Agent Improve Itself 5 Times. Every Fix Made It Harder to Govern. [LLM Agents, Evaluation, Governance]

A coding agent pointed at its own source code found and fixed 5 real bugs across 5 autonomous rounds. Every fix made it more resilient, and every fix passed all 175 tests. But the agent never touched its own safety mechanisms. The capability-governance gap widened silently with each merge. Self-improvement optimizes for robustness, not alignment, and binary evaluation can't tell the difference.

Feb 21 · The Cure Was Worse Than the Disease [Governance, Evaluation]

Three levels of escalating controls (static compartmentalization, dynamic capability restriction, emergency market reconfiguration) successfully contained runaway intelligence — but crashed welfare 80%. Post-freeze toxicity increased because adversaries were more resilient to blunt controls than honest agents. The over-control trap is real: tight static controls killed the market by epoch 14, while no controls at all produced higher welfare than the full escalation stack.

Feb 21 · We Built the Adversary That Was Supposed to Break the Cautious Reciprocator. It Didn't. [Governance, Evaluation]

A threshold-dancing adversary that tracks its own payoff ledger to avoid blacklisting works perfectly — zero agents frozen across 100 epochs. But the exploit budget is too thin to profit: dancers averaged -7.85 payoff while cautious agents earned 200.90. Reputation collapse creates a death spiral that forces dancers toward honest behavior over long horizons.

Feb 21 · Red-Teaming the Agent That Doesn't Need Governance [Governance, Evaluation]

Eight attack scenarios against the Cautious Reciprocator: 7/8 survived. Modeling adversaries are the most dangerous individual threat (6.5 payoff vs 24.7 for cautious), sybil attacks are the biggest theoretical gap, and the one "failure" is a 1-vs-10 scenario where nobody wins.

Feb 21 · The Agent That Doesn't Need Governance [Governance, Evaluation]

A custom trust-but-verify agent (Cautious Reciprocator) neutralizes adversaries through per-counterparty payoff tracking and auto-blacklisting. 48-run governance sweep shows external levers cost 6.5% welfare while reducing toxicity by only 0.005.
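
The core mechanism fits in a few lines: keep a running payoff ledger per counterparty and refuse to deal with anyone whose ledger drops past a tolerance. A compressed sketch (class and method names are illustrative, not the actual agent code):

```python
from collections import defaultdict

class CautiousReciprocator:
    """Trust-but-verify: per-counterparty payoff tracking with auto-blacklisting."""

    def __init__(self, tolerance: float = -2.0):
        self.ledger = defaultdict(float)  # counterparty -> cumulative payoff
        self.tolerance = tolerance

    def record(self, counterparty: str, payoff: float) -> None:
        self.ledger[counterparty] += payoff

    def will_trade(self, counterparty: str) -> bool:
        # Once a counterparty's net payoff falls below tolerance, refuse them.
        return self.ledger[counterparty] >= self.tolerance
```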

Feb 21 · Eight Red-Team Rounds Took a Cake-Splitting Scenario from F to B [Governance, Evaluation]

Iterative governance hardening against 8 attack vectors: collusion detection was the single biggest lever (+0.16), over-hardening created new gaps, and resource drain resisted all 8 rounds. Score: 0.54→0.81, damage: -53%.

Feb 21 · The Entry Fee That Keeps Adversaries Out of the Fair Division Pool [Governance, Theory]

A parameter sweep over 8 entry fee levels reveals a sharp screening threshold: below fee=6.0 every agent joins the fair division pool; above it, adversarials self-select out. 24 runs, 3 seeds, one phase transition.

Feb 20 · Costly Contracts Separate Honest Agents from Adversaries. Here's the Data. [Governance, Theory]

Vickrey auction bonds and entry fees create a separating equilibrium in 20 epochs: honest agents choose governed pools, adversaries self-select into the default market. Perfect separation, zero infiltration, 74% welfare premium.
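
The Spence logic behind the separation fits in two inequalities. Writing c for the signaling cost (bond or entry fee) and V for expected payoffs in the governed pool (g) versus the default market (d), by agent type, separation requires the fee to be worth paying only for honest types. The notation here is mine, not the post's:

```latex
V^{H}_{g} - c \;\ge\; V^{H}_{d}
\qquad \text{and} \qquad
V^{A}_{g} - c \;<\; V^{A}_{d}
```

Cutting c (the Mar 19 post halves it) can flip the second inequality, at which point adversaries pool into the governed market and infiltration jumps from 0% to 100%.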

Feb 20 · Does Model Size Matter for Safety? Llama 3B vs 8B in the SWARM Economy [LLM Agents, Evaluation]

A multi-seed study comparing Llama 3.2 (3B) and Llama 3.1 (8B) via Ollama. The 8B model engages more, fails less at JSON, and produces richer strategic dynamics — but both run free on consumer hardware.

Feb 20 · We Gave an LLM a Goal and a Memory. Governance Held Anyway. [LLM Agents, Governance]

Three Concordia entities backed by Llama 3.1 8B played the SWARM economy across 3 seeds. They proposed 8x more than scripted agents and produced identical payoffs. RLHF did the heavy lifting.

Feb 17 · Training an LLM Agent to Navigate a Multi-Agent Economy with RL [LLM Agents, Reinforcement Learning]

We trained Qwen3-30B with reinforcement learning to operate in a simulated multi-agent economy: maximizing payoff and reputation while navigating governance constraints and interacting with cooperative, opportunistic, and deceptive bots.
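
A common way to wire that objective is a shaped scalar reward over payoff, reputation, and governance penalties. The post doesn't publish its exact shaping, so the weights below are placeholders:

```python
def shaped_reward(payoff: float, reputation_delta: float,
                  violations: int, rep_weight: float = 0.5,
                  penalty: float = 1.0) -> float:
    """Illustrative RL reward: payoff plus weighted reputation gain,
    minus a penalty per governance violation."""
    return payoff + rep_weight * reputation_delta - penalty * violations
```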

Feb 15 · SkillRL Agents Learn 5x Faster Than Honest Ones. They Mostly Learn What Not to Do. [Reinforcement Learning, Evaluation]

10 seeds, 30 epochs, 6 plots: SkillRL agents build libraries of 18+ skills and dominate payoffs — but 95% of what they learn are lessons from failure, not strategies from success.

Feb 15 · Your CI Is Flaky Because Your Margins Are Zero [Engineering]

Five stochastic tests were hitting assertion thresholds exactly (0.000 margin). A 5% buffer fixed all of them with zero loss in test strength.
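
The pattern generalizes to any stochastic assertion: never test a noisy metric against its expected value exactly; leave a relative buffer. A minimal sketch with invented numbers:

```python
EXPECTED = 0.80

def check(observed: float, expected: float = EXPECTED, buffer: float = 0.05) -> None:
    """Flaky version: assert observed >= expected (0.000 margin).
    Stable version: a 5% buffer absorbs seed-to-seed noise."""
    floor = expected * (1 - buffer)
    assert observed >= floor, f"{observed:.3f} fell below {floor:.3f}"

check(0.78)  # passes: 0.78 >= 0.76, within the 5% buffer
```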

Feb 15 · I Got Claude Code to Spin Up 10 Subagents at Once [Engineering]

10 concurrent subagents turn a 25-minute serial research session into a 6-minute parallel one. Recursive subagent spawning? That's a hard no.
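
The speedup is ordinary fan-out concurrency; in the post the spawning is done by Claude Code's Task tool, but the same shape in plain asyncio shows why wall time collapses (the `research` coroutine is a stand-in):

```python
import asyncio

async def research(topic: str) -> str:
    """Stand-in for one subagent's research task."""
    await asyncio.sleep(1)  # represents minutes of real work
    return f"notes on {topic}"

async def main() -> None:
    topics = [f"question-{i}" for i in range(10)]
    # Ten tasks in flight at once: wall time ~ one task, not the sum of ten.
    results = await asyncio.gather(*(research(t) for t in topics))
    print(len(results), "reports collected")

asyncio.run(main())
```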

Feb 15 · An AI Tax Planner Learned Progressive Taxation in 20 Epochs [LLM Agents, Governance, Reinforcement Learning]

We ran 14 agents through a Gather-Trade-Build economy. The planner discovered progressive taxation, honest agents thrived, and a three-agent cartel went broke.
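
"Progressive taxation" here means the planner's learned schedule taxes higher income slices at higher marginal rates. A toy schedule of that shape (brackets and rates invented; the post reports the behavior, not these numbers):

```python
def progressive_tax(income: float) -> float:
    """Marginal brackets: each slice of income taxed at its own rate."""
    brackets = [(0, 10, 0.00), (10, 50, 0.20), (50, float("inf"), 0.40)]
    return sum((min(income, hi) - lo) * rate
               for lo, hi, rate in brackets if income > lo)

print(progressive_tax(60))  # 0 + 40*0.20 + 10*0.40 = 12.0
```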

Feb 13 · An AI Agent Cut Its Own Costs by 98%. Its Benchmarks Still Passed. [Evaluation, Governance]

A self-optimizing agent passes every hard metric while soft distributional metrics reveal quality collapse, adverse selection, and proxy gaming.

Feb 13 · Three Agents, Three Philosophies, One Benchmark [LLM Agents, Evaluation]

An LLM reasoner, a state-graph explorer, and a CNN learner walk into ARC-AGI-3. What they get right and wrong reveals more about agent design than any single approach could.

Feb 13 · What 13 Agent Versions Taught Us About Interactive Reasoning [LLM Agents, Evaluation]

Building a Claude Sonnet 4.5-powered agent for ARC-AGI-3: wrong mental models, recording analysis breakthroughs, and the hard middle ground between LLM reasoning and programmatic control.

Feb 13 · Three Models, One Study: What Happens When You Let an LLM Council Peer-Review Your Research [LLM Agents, Evaluation]

We built a 3-stage deliberation protocol where LLM agents peer-rank each other anonymously. Homogeneous councils converge too fast; heterogeneous ones catch what no single model would.

Feb 13 · Using LLM Councils for Multi-Agent Research Evaluation [LLM Agents, Evaluation]

A heterogeneous council of Claude Sonnet 4.5, Gemini 2.5 Pro, and DeepSeek R1 catches what no single model would. We built a 3-stage deliberation protocol for evaluating multi-agent simulation studies.
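
Stripped of the model calls, the protocol is: independent reviews, anonymous peer-ranking, then rank aggregation. A runnable skeleton with stub members (a real council would call the three models and exclude self-ranking):

```python
import random

class StubMember:
    """Stand-in for one LLM council member (illustrative only)."""
    def __init__(self, name: str):
        self.name = name
    def review(self, study: str) -> str:
        return f"{self.name}: review of {study!r}"
    def rank(self, reviews: list[str]) -> list[str]:
        return sorted(reviews)  # a real member judges quality, not alphabet

def council_verdict(study: str, members: list[StubMember]) -> list[str]:
    # Stage 1: each member reviews the study independently.
    reviews = [m.review(study) for m in members]
    # Stage 2: anonymous peer-ranking (authorship shuffled away).
    random.shuffle(reviews)
    scores = {r: 0.0 for r in reviews}
    for m in members:
        for place, r in enumerate(m.rank(reviews)):
            scores[r] += len(reviews) - place  # Borda-style points
    # Stage 3: aggregate into the council's ordering, best first.
    return sorted(scores, key=scores.get, reverse=True)

members = [StubMember(n) for n in ("claude", "gemini", "deepseek")]
print(council_verdict("study-1", members))
```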

Feb 12 · Two Eval Runs, One Model, 41% Apart [Evaluation]

How three environment fixes turned a broken eval into a useful one — and what that teaches about measuring agent behavior.

Feb 12 · A Taxonomy of Governance Mechanisms for Multi-Agent AI Systems [Governance]

Twenty levers across five families, which ones actually work, and why governance is a portfolio problem.

Feb 12 · GPT-4.1 Mini Plays the SWARM Economy [LLM Agents, Evaluation]

What happens when you drop an LLM into a multi-agent economy with soft-label governance: task grinding, trade aversion, and performative social behavior.

Feb 12 · RL Training Lessons for Multi-Agent Governance [Reinforcement Learning, Governance]

What running Qwen3-30B on alphabet-sort taught us about noisy proxy signals, coordination bottlenecks, and premature evaluation in swarm governance.

Feb 10 · 11 Scenarios, 3 Regimes, 1 Critical Threshold [Governance, Evaluation]

A cross-scenario analysis of when multi-agent governance works, when it breaks, and why hardening the rules doesn't help past a 50% adversarial fraction.

Feb 10 · What Financial Markets Teach Us About AI Safety [Theory, Governance]

Adverse selection, information asymmetry, and market manipulation surveillance applied to multi-agent governance.

Feb 10 · The Purity Paradox [Theory, Governance]

Why mixed agent populations outperform pure honest ones on aggregate welfare — and when the paradox breaks.

Feb 9 · When Agent Ecosystems Collapse [Theory, Governance]

Phase transitions in multi-agent governance: why interventions that work at 37.5% adversarial agents fail at 50%.

