Understanding Soft Labels
SWARM uses soft (probabilistic) labels instead of binary good/bad classifications. This tutorial builds intuition for why, then walks through the math and code.
Time: ~15 minutes | Level: Intermediate
The Motivation
Imagine you're watching an AI agent complete a task. How do you know if the interaction was good or bad?
In reality:
- You might be uncertain about the outcome
- Quality exists on a spectrum — not everything is perfectly good or perfectly bad
- You need calibration — knowing when you're confident vs. guessing
Binary labels (good = 1, bad = 0) throw away this uncertainty.
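The Soft Label: p
SWARM represents interaction quality as the probability that the interaction was beneficial, given what was observed:

\[
p = P(v = +1 \mid \text{observed signals})
\]

Where: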
- \(v = +1\) means the interaction was beneficial
- \(v = -1\) means it was harmful
- \(p \in [0, 1]\) is a probability
Examples:
| Situation | p value |
|---|---|
| Clearly beneficial | 0.92 |
| Probably fine, some doubt | 0.71 |
| Uncertain | 0.50 |
| Likely harmful | 0.23 |
| Clearly harmful | 0.04 |
The Safety Invariant
p must remain in [0, 1] everywhere it is used or logged. This is a hard invariant
in SWARM — never compute a payoff or metric with p outside this range.
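One way to enforce the invariant is to validate p at the boundary where it enters a payoff or metric. The sketch below assumes a helper named `check_probability`. That name is hypothetical and not part of the SWARM API. The helper rejects out-of-range values rather than silently clamping them.

```python
def check_probability(p: float) -> float:
    """Return p as a float if it lies in [0, 1]; otherwise raise ValueError.

    Hypothetical helper for illustration; not part of the SWARM API.
    """
    # Any comparison with NaN is False, so NaN is rejected here as well.
    if not (0.0 <= p <= 1.0):
        raise ValueError(f"p must be in [0, 1], got {p!r}")
    return float(p)


check_probability(0.71)  # returns 0.71
try:
    check_probability(1.2)
except ValueError as err:
    print(err)  # p must be in [0, 1], got 1.2
```

Raising instead of clamping is a design choice. A value outside [0, 1] usually signals a bug upstream, and clamping would hide it.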
How p Is Computed
The ProxyComputer converts observable signals to p in three steps.
Step 1: Observable Signals
Four signals are measured for each interaction:

```python
from swarm.core.proxy import ProxyObservables

obs = ProxyObservables(
    task_progress_delta=0.7,            # Good progress made
    rework_count=1,                     # One rework cycle needed
    verifier_rejections=0,              # Passed safety check
    counterparty_engagement_delta=0.4,  # Positive engagement
)
```
| Signal | Range | What it captures |
|---|---|---|
| task_progress_delta | [-1, 1] | Did we move forward? |
| rework_count | [0, ∞) | How many corrections? (penalty) |
| verifier_rejections | [0, ∞) | How many safety flags? (penalty) |
| counterparty_engagement_delta | [-1, 1] | Did the other party respond well? |
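The ranges in the table can be checked at construction time. The sketch below uses a standalone dataclass named `ObservableSignals`. That class is hypothetical and is not the SWARM `ProxyObservables` class. This guide does not state whether `ProxyObservables` validates its inputs, so the checks here are only an illustration of the table's constraints.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObservableSignals:
    """Hypothetical stand-in for the four proxy signals (not the SWARM class)."""

    task_progress_delta: float            # expected in [-1, 1]
    rework_count: int                     # expected in [0, inf), a penalty
    verifier_rejections: int              # expected in [0, inf), a penalty
    counterparty_engagement_delta: float  # expected in [-1, 1]

    def __post_init__(self) -> None:
        # Bounded deltas must lie in [-1, 1].
        for name in ("task_progress_delta", "counterparty_engagement_delta"):
            value = getattr(self, name)
            if not -1.0 <= value <= 1.0:
                raise ValueError(f"{name} must be in [-1, 1], got {value}")
        # Counts are unbounded above but can never be negative.
        for name in ("rework_count", "verifier_rejections"):
            value = getattr(self, name)
            if value < 0:
                raise ValueError(f"{name} must be >= 0, got {value}")


# Same values as the ProxyObservables example above.
ok = ObservableSignals(
    task_progress_delta=0.7,
    rework_count=1,
    verifier_rejections=0,
    counterparty_engagement_delta=0.4,
)
```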
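Step 2: Weighted Combination → v_hat
The signals are combined with weights into a raw score \(\hat{v} \in [-1, +1]\). Progress and engagement raise the score, while rework and verifier rejections act as penalties:

\[
\hat{v} = \sum_{i=1}^{4} w_i \, x_i
\]

Here \(x_i\) is the \(i\)-th signal, with the two count signals entering as penalties, and \(w_i\) is its weight.

Default weights, in the order of the signal table above: [0.4, 0.2, 0.2, 0.2]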
```python
from swarm.core.proxy import ProxyComputer

proxy = ProxyComputer()
v_hat, p = proxy.compute_labels(obs)
print(f"v_hat = {v_hat:.3f}")  # Raw score in [-1, 1]
print(f"p = {p:.3f}")          # Probability in [0, 1]
```
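To make the weighted combination concrete, here is a minimal pure-Python sketch of Step 2. This guide does not say how SWARM maps the unbounded counts into a bounded penalty, so the sketch assumes a hypothetical saturating map \(c / (c + 1)\). The function names `count_penalty` and `raw_score` are also hypothetical. Because of these assumptions, the sketch will not reproduce the numbers `ProxyComputer` prints.

```python
from typing import Tuple


def count_penalty(count: int) -> float:
    """Map a non-negative count into [0, 1): 0 -> 0.0, 1 -> 0.5, 3 -> 0.75.

    Hypothetical normalization for illustration; SWARM's may differ.
    """
    if count < 0:
        raise ValueError("count must be non-negative")
    return count / (count + 1)


def raw_score(
    progress: float,
    rework: int,
    rejections: int,
    engagement: float,
    weights: Tuple[float, float, float, float] = (0.4, 0.2, 0.2, 0.2),
) -> float:
    """Weighted sum of the four signals, with the two counts as penalties."""
    w_progress, w_rework, w_verifier, w_engagement = weights
    v_hat = (
        w_progress * progress
        - w_rework * count_penalty(rework)
        - w_verifier * count_penalty(rejections)
        + w_engagement * engagement
    )
    # Clip so v_hat stays in [-1, 1] even with unusual weights.
    return max(-1.0, min(1.0, v_hat))


print(f"{raw_score(0.7, 1, 0, 0.4):+.3f}")  # +0.260
```

A saturating penalty means each count signal can subtract at most its own weight. As a result, a long run of rework cycles cannot swamp the other three signals.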
Step 3: Calibrated Sigmoid → p
The raw score is converted to a probability with a sigmoid:

\[
p = \sigma(k \hat{v}) = \frac{1}{1 + e^{-k \hat{v}}}
\]

The calibration parameter \(k\) (default: 3.0) controls sharpness:
- k = 1.0: gradual curve, more uncertainty
- k = 3.0: default, balanced
- k = 5.0: sharp curve, more confident
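The effect of \(k\) is easy to see numerically. The sketch below assumes the plain logistic form \(p = 1 / (1 + e^{-k \hat{v}})\) with no offset term. The function name `calibrated_p` is hypothetical. The sketch evaluates the same raw score under the three values of \(k\) listed above.

```python
import math


def calibrated_p(v_hat: float, k: float = 3.0) -> float:
    """Logistic sigmoid of k * v_hat; the result always lies in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-k * v_hat))


for k in (1.0, 3.0, 5.0):
    print(f"k={k}: p(v_hat=0.5) = {calibrated_p(0.5, k):.3f}")
# k=1.0: p(v_hat=0.5) = 0.622
# k=3.0: p(v_hat=0.5) = 0.818
# k=5.0: p(v_hat=0.5) = 0.924
```

Whatever the value of \(k\), a raw score of 0 maps to p = 0.5. Larger \(k\) only changes how quickly p moves toward 0 or 1 as \(\hat{v}\) leaves zero.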
Interactive Example
Let's compare three interactions:
```python
from swarm.core.proxy import ProxyComputer, ProxyObservables

proxy = ProxyComputer()

scenarios = {
    "Excellent": ProxyObservables(
        task_progress_delta=0.9,
        rework_count=0,
        verifier_rejections=0,
        counterparty_engagement_delta=0.8,
    ),
    "Borderline": ProxyObservables(
        task_progress_delta=0.3,
        rework_count=2,
        verifier_rejections=1,
        counterparty_engagement_delta=0.1,
    ),
    "Problematic": ProxyObservables(
        task_progress_delta=-0.2,
        rework_count=5,
        verifier_rejections=3,
        counterparty_engagement_delta=-0.4,
    ),
}

for name, obs in scenarios.items():
    v_hat, p = proxy.compute_labels(obs)
    print(f"{name:12s}: v_hat={v_hat:+.3f}, p={p:.3f}")
```

Output:

```
Excellent   : v_hat=+0.780, p=0.901
Borderline  : v_hat=-0.080, p=0.440
Problematic : v_hat=-0.680, p=0.128
```
How Soft Labels Affect Payoffs
Once we have p, every downstream calculation uses the expected value rather than a binary outcome.
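Expected Surplus
The expected surplus weights the surplus \(s_+\) of a beneficial interaction by \(p\), and the loss \(s_-\) of a harmful one by \(1 - p\):

\[
\mathbb{E}[S] = p \, s_+ - (1 - p) \, s_-
\]

If \(s_+ = 2.0\) and \(s_- = 1.0\):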
| p | Expected Surplus |
|---|---|
| 0.9 | 1.7 |
| 0.5 | 0.5 |
| 0.1 | -0.7 |
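The expected surplus is a one-line computation. The sketch below assumes the form \(p \, s_+ - (1 - p) \, s_-\) with the example values \(s_+ = 2.0\) and \(s_- = 1.0\). The function name `expected_surplus` is hypothetical. The sketch evaluates the p values used in the table.

```python
def expected_surplus(p: float, s_plus: float = 2.0, s_minus: float = 1.0) -> float:
    """Expected surplus p * s_plus - (1 - p) * s_minus for p in [0, 1]."""
    if not (0.0 <= p <= 1.0):
        raise ValueError(f"p must be in [0, 1], got {p!r}")
    return p * s_plus - (1.0 - p) * s_minus


for p in (0.9, 0.5, 0.1):
    print(f"p={p}: E[S] = {expected_surplus(p):+.2f}")
# p=0.9: E[S] = +1.70
# p=0.5: E[S] = +0.50
# p=0.1: E[S] = -0.70
```

Setting the expression to zero gives the break-even point \(p = s_- / (s_+ + s_-)\). For these values that is \(1/3\): interactions above it have positive expected surplus, and those below it have negative expected surplus.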
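Expected Harm Externality
The harm externality is also taken in expectation. Writing \(h\) for the harm an interaction causes if it turns out to be harmful:

\[
\mathbb{E}[\text{harm}] = (1 - p) \, h
\]

High-p interactions produce little expected externality; low-p interactions carry most of \(h\) and are taxed heavily.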
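The sketch below shows how the expected externality shrinks as p rises. It assumes the externality is \((1 - p) \, h\). The harm magnitude `h = 10.0` is a hypothetical value, and `expected_harm` is a hypothetical name.

```python
def expected_harm(p: float, h: float) -> float:
    """Expected harm externality (1 - p) * h, where h is the harm if harmful."""
    if not (0.0 <= p <= 1.0):
        raise ValueError(f"p must be in [0, 1], got {p!r}")
    return (1.0 - p) * h


for p in (0.95, 0.5, 0.05):
    print(f"p={p}: expected harm = {expected_harm(p, h=10.0):.2f}")
# p=0.95: expected harm = 0.50
# p=0.5: expected harm = 5.00
# p=0.05: expected harm = 9.50
```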
Why Not Just Use a Threshold?
You could threshold p > 0.5 to get binary labels. SWARM deliberately avoids this because:
- Information loss: Two interactions with p=0.51 and p=0.95 look identical after thresholding
- Calibration breaks: A 0/1 label no longer records how confident the proxy was, so metrics built on thresholded labels lose the calibration that expected-value metrics keep
- Proportional governance: Governance should scale with expected harm rather than flip at a single cutoff
The Metrics page shows how soft labels enable more informative metrics.
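The information-loss point can be checked directly. The sketch below assumes the expected-surplus form \(p \, s_+ - (1 - p) \, s_-\) with \(s_+ = 2.0\) and \(s_- = 1.0\). The helper names `hard_label` and `expected_surplus` are hypothetical. The sketch compares the two interactions from the list above.

```python
def hard_label(p: float, cutoff: float = 0.5) -> int:
    """Binary label obtained by thresholding p."""
    return 1 if p > cutoff else 0


def expected_surplus(p: float, s_plus: float = 2.0, s_minus: float = 1.0) -> float:
    """Expected surplus p * s_plus - (1 - p) * s_minus."""
    return p * s_plus - (1.0 - p) * s_minus


for p in (0.51, 0.95):
    print(f"p={p}: hard label={hard_label(p)}, E[S]={expected_surplus(p):+.2f}")
# p=0.51: hard label=1, E[S]=+0.53
# p=0.95: hard label=1, E[S]=+1.85
```

Both interactions receive the same hard label, yet their expected surplus differs by more than a factor of three. Anything computed from the hard label cannot see that difference.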
Customizing the Proxy
You can adjust weights or calibration:

```python
from swarm.core.proxy import ProxyComputer, ProxyWeights

# Down-weight engagement, up-weight safety signals
custom_weights = ProxyWeights(
    task_progress=0.4,
    rework_penalty=0.25,
    verifier_penalty=0.30,
    engagement_signal=0.05,
)
proxy = ProxyComputer(weights=custom_weights, sigmoid_k=4.0)
```
See also
- Your First Governance Experiment — Hands-on governance experiment tutorial
- Analyzing Results — Interpret metrics produced by soft label scoring
- Soft Labels Concept — Theoretical foundation for probabilistic labels
- Core API — ProxyComputer and SoftPayoffEngine reference