# H_u / H_s analysis — qwen3-8b · medial · geo

Idiom vs. parallel literal-VP (non-idiom) datasets. All entropies are uniform Monte Carlo (MC) averages over each phrase's observed contexts; scores are unnormalized geometric-mean (or joint) per-token LM probabilities. See `README.md` / `code/SCORING_MATH.md` for the derivation.
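A minimal sketch of the per-context scoring, assuming each context supplies the phrase's per-token natural-log probabilities. The function names and toy numbers are hypothetical, not taken from `code/`. Under the geometric-mean reduction, -log p is simply the mean per-token negative log-likelihood, and H(p) averages it uniformly over contexts.

```python
import math

def geo_mean_prob(token_logprobs):
    """Unnormalized geometric-mean per-token probability of a phrase in one context."""
    return math.exp(sum(token_logprobs) / len(token_logprobs))

def joint_prob(token_logprobs):
    """Joint (product) probability of the phrase tokens in one context."""
    return math.exp(sum(token_logprobs))

def mc_entropy(probs):
    """Uniform Monte Carlo average of -log p over a phrase's observed contexts (nats)."""
    return sum(-math.log(p) for p in probs) / len(probs)

# Hypothetical per-token log-probs for one two-token phrase in three contexts.
contexts = [
    [math.log(0.5), math.log(0.2)],
    [math.log(0.1), math.log(0.4)],
    [math.log(0.3), math.log(0.3)],
]
p = [geo_mean_prob(c) for c in contexts]
print(f"H(p) = {mc_entropy(p):.4f} nats")
```

Because the geometric mean never exceeds 1 for token probabilities, every -log p term (and hence H(p)) is non-negative.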

> **How to read these magnitudes** (directions, why H_s is often +inf, the new finite synergy metrics): see [`INTERPRETATION.md`](../INTERPRETATION.md). Quick key: ↑`H_u/H(p)`, ↑`syn_frac`, ↑`H_s^log` mean **more** synergy; ↑`H_s^reg` and the original ↑`H_s` mean **less** synergy.

## Configuration

| field | idioms run | non-idioms run |
|---|---|---|
| model | Qwen/Qwen3-8B | Qwen/Qwen3-8B |
| reduction | geometric_mean | geometric_mean |
| medial_only | True | True |
| dtype | bfloat16 | bfloat16 |
| num_idioms | 18 | 18 |
| dataset | /home/prada/PID_evaluation/data/dataset.tsv | /home/prada/PID_evaluation/data/nonidioms_dataset.tsv |

Bound `H_u + H_s ≥ 2H(p) + 2log2 ≥ H(p)` holds for 18/18 idioms and 18/18 non-idioms.
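The bound follows per context from the AM-GM inequality. A short derivation, assuming (as the metric descriptions below state) that each entropy is the uniform context average of its per-context term: -log p for H(p), -log min{p, m} for H_u and -log max{0, p - m} for H_s, with m = max(q,r):

```latex
\begin{aligned}
p \le m &: \quad -\log\max\{0,\,p-m\} = +\infty, \text{ so the bound holds trivially;}\\
p > m &: \quad -\log\min\{p,m\} - \log\max\{0,\,p-m\} = -\log\big(m\,(p-m)\big)
  \;\ge\; -\log\tfrac{p^{2}}{4} = -2\log p + 2\log 2,
\end{aligned}
```

since m(p - m) ≤ (p/2)² by AM-GM. Averaging over contexts gives H_u + H_s ≥ 2H(p) + 2log2, which is ≥ H(p) because p ≤ 1 makes H(p) ≥ 0. For example, cut corners: 5.3604 + 4.3914 = 9.7518 ≥ 2(4.0242) + 2log2 ≈ 9.4347.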

`H_s = +inf` (at least one non-synergistic context) for 3/18 idioms and 12/18 non-idioms. A *finite* H_s means **every** context of that phrase is synergistic (p > max(q,r) everywhere).

## Per-metric summary (idioms vs non-idioms)

Means and 95% bootstrap CIs (20k resamples, percentile method, phrase-level). Non-finite values are dropped per metric before bootstrapping; the drop count is shown. A cross-dataset gap is marked **significant** when its 95% CI excludes 0.
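A minimal sketch of the phrase-level percentile bootstrap. It assumes the gap CI resamples idiom and non-idiom phrases independently and that **significant** means the CI excludes 0; both are assumptions consistent with the tables, not read from the code. Helper names, the seed and the use of the stdlib `random` module are illustrative choices. The usage lines feed in the per-phrase `syn_frac` columns from the detail tables.

```python
import math
import random
import statistics

def _percentile_interval(samples, alpha):
    """Return the (alpha/2, 1 - alpha/2) percentile interval of the samples."""
    s = sorted(samples)
    n = len(s)
    return s[int((alpha / 2) * n)], s[int((1 - alpha / 2) * n) - 1]

def bootstrap_mean_ci(values, n_resamples=20_000, alpha=0.05, seed=0):
    """Mean and percentile-bootstrap CI; non-finite values are dropped first."""
    xs = [v for v in values if math.isfinite(v)]
    rng = random.Random(seed)
    means = [statistics.fmean(rng.choices(xs, k=len(xs))) for _ in range(n_resamples)]
    return statistics.fmean(xs), _percentile_interval(means, alpha)

def bootstrap_gap_ci(a, b, n_resamples=20_000, alpha=0.05, seed=0):
    """Gap mean(a) - mean(b), its percentile CI, and whether the CI excludes 0."""
    a = [v for v in a if math.isfinite(v)]
    b = [v for v in b if math.isfinite(v)]
    rng = random.Random(seed)
    diffs = [
        statistics.fmean(rng.choices(a, k=len(a)))
        - statistics.fmean(rng.choices(b, k=len(b)))
        for _ in range(n_resamples)
    ]
    lo, hi = _percentile_interval(diffs, alpha)
    gap = statistics.fmean(a) - statistics.fmean(b)
    return gap, (lo, hi), not (lo <= 0.0 <= hi)

# Per-phrase syn_frac values copied from the detail tables.
idioms = [1.0] * 15 + [0.8, 0.6, 0.8]
nonidioms = [1.0] * 6 + [0.8] * 4 + [0.6] * 4 + [0.4] * 4

gap, (lo, hi), significant = bootstrap_gap_ci(idioms, nonidioms)
print(f"gap={gap:.4f} CI=[{lo:.4f}, {hi:.4f}] significant={significant}")
```

The point estimate reproduces the reported Δ = 0.2222; the interval should land close to the reported [0.1000, 0.3333], though exact endpoints depend on the seed.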

### H(p)

*base entropy = uniform MC average over contexts of -log p (nats). ↓ smaller = LM probability more concentrated on the phrase*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 4.4622 | 4.4674 | 3.4261 | 5.7807 | [4.1848, 4.7503] |
| non-idioms | 18 | 0 | 5.0307 | 4.9217 | 3.9509 | 6.1989 | [4.7558, 5.3083] |

**Cross-dataset gap** (idioms − non-idioms): Δ = -0.5685  (95% CI [-0.9669, -0.1649]) → idioms < non-idioms, **significant**.

### H_u

*unique / redundant entropy = mean over contexts of -log min{p, m}, with m = max(q,r) (nats); >= H(p) because min{p, m} ≤ p in every context*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 5.3292 | 5.3166 | 4.3291 | 6.5154 | [5.0790, 5.5842] |
| non-idioms | 18 | 0 | 5.3791 | 5.4802 | 4.4119 | 6.3737 | [5.1204, 5.6326] |

**Cross-dataset gap** (idioms − non-idioms): Δ = -0.0499  (95% CI [-0.4117, 0.3121]) → idioms < non-idioms, not significant.
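A minimal sketch of H_u and the headline ratio, assuming q and r are the two per-context comparison scores defined in `code/SCORING_MATH.md` (only their maximum m enters here). The toy values are made up.

```python
import math

def h_p(p):
    """H(p): uniform average of -log p over contexts (nats)."""
    return sum(-math.log(pi) for pi in p) / len(p)

def h_u(p, q, r):
    """H_u: uniform average of -log min{p, m} over contexts, m = max(q, r)."""
    return sum(-math.log(min(pi, max(qi, ri))) for pi, qi, ri in zip(p, q, r)) / len(p)

# Toy scores for one phrase in two contexts: synergistic in the first (p > m),
# not in the second (p <= m).
p = [0.30, 0.20]
q = [0.10, 0.25]
r = [0.15, 0.05]
ratio = h_u(p, q, r) / h_p(p)
print(f"H(p)={h_p(p):.4f}  H_u={h_u(p, q, r):.4f}  H_u/H(p)={ratio:.4f}")
```

Because min{p, m} ≤ p, each H_u term is at least the matching H(p) term, so the ratio never drops below 1; a non-synergistic context contributes -log p to both and pulls the ratio toward 1.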

### H_u / H(p)

*unique-information ratio (>= 1); equals 1 + H_s^log/H(p), so the excess over 1 is the log-space synergy share. ↑ bigger = MORE synergy. THE headline metric*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 1.2015 | 1.1701 | 1.0725 | 1.3320 | [1.1675, 1.2372] |
| non-idioms | 18 | 0 | 1.0719 | 1.0652 | 1.0168 | 1.1335 | [1.0536, 1.0903] |

**Cross-dataset gap** (idioms − non-idioms): Δ = 0.1296  (95% CI [0.0907, 0.1700]) → idioms > non-idioms, **significant**.

### syn_frac

*synergy coverage in [0,1] = fraction of contexts with p > m, where m = max(q,r). ↑ bigger = MORE synergy (most intuitive)*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 0.9556 | 1.0000 | 0.6000 | 1.0000 | [0.9000, 1.0000] |
| non-idioms | 18 | 0 | 0.7333 | 0.8000 | 0.4000 | 1.0000 | [0.6222, 0.8333] |

**Cross-dataset gap** (idioms − non-idioms): Δ = 0.2222  (95% CI [0.1000, 0.3333]) → idioms > non-idioms, **significant**.
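`syn_frac` is a plain count. A sketch assuming q and r are the per-context comparison scores whose maximum m appears in the metric descriptions; the toy values are made up. With five contexts per phrase (which the n_* columns appear to record), the fraction moves in steps of 0.2, matching the values in the detail tables.

```python
def syn_frac(p, q, r):
    """Fraction of contexts where p > m, with m = max(q, r)."""
    hits = sum(1 for pi, qi, ri in zip(p, q, r) if pi > max(qi, ri))
    return hits / len(p)

# Five toy contexts, three of them synergistic (contexts 1, 3 and 5).
p = [0.30, 0.20, 0.12, 0.08, 0.05]
q = [0.10, 0.25, 0.10, 0.09, 0.01]
r = [0.15, 0.05, 0.02, 0.03, 0.04]
print(syn_frac(p, q, r))
```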

### H_s^log

*log-space synergy = mean max{0, log p - log m} (nats), m = max(q,r); equals H_u - H(p) exactly. ↑ bigger = MORE synergy; always finite*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 0.8670 | 0.8092 | 0.3547 | 1.3543 | [0.7519, 0.9851] |
| non-idioms | 18 | 0 | 0.3484 | 0.3133 | 0.0832 | 0.6438 | [0.2669, 0.4298] |

**Cross-dataset gap** (idioms − non-idioms): Δ = 0.5186  (95% CI [0.3757, 0.6619]) → idioms > non-idioms, **significant**.
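The log-space term is a hinge. Per context, max{0, log p - log m} = -log min{p, m} - (-log p), so averaging gives H_s^log = H_u - H(p) exactly; the per-phrase table agrees (cut corners: 4.0242 + 1.3362 = 5.3604). A sketch that checks the identity on made-up scores, assuming positive p and m with m = max(q, r):

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def h_s_log(p, q, r):
    """Mean over contexts of max{0, log p - log m}, m = max(q, r); finite for p, m > 0."""
    return mean([max(0.0, math.log(pi) - math.log(max(qi, ri))) for pi, qi, ri in zip(p, q, r)])

def h_u(p, q, r):
    """Mean over contexts of -log min{p, m}."""
    return mean([-math.log(min(pi, max(qi, ri))) for pi, qi, ri in zip(p, q, r)])

def h_p(p):
    """Mean over contexts of -log p."""
    return mean([-math.log(pi) for pi in p])

p = [0.30, 0.20]
q = [0.10, 0.25]
r = [0.15, 0.05]
# The hinge is ln 2 in the first context and clipped to 0 in the second.
print(h_s_log(p, q, r), h_u(p, q, r) - h_p(p))
```

Dividing through by H(p) gives H_s^log/H(p) = H_u/H(p) - 1.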

### H_s^log / H(p)

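*log-space synergy ratio; identically H_u/H(p) - 1, so it carries the same information (and the same Δ and CI) as the H_u/H(p) section. ↑ bigger = MORE synergy*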

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 0.2015 | 0.1701 | 0.0725 | 0.3320 | [0.1675, 0.2372] |
| non-idioms | 18 | 0 | 0.0719 | 0.0652 | 0.0168 | 0.1335 | [0.0536, 0.0903] |

**Cross-dataset gap** (idioms − non-idioms): Δ = 0.1296  (95% CI [0.0907, 0.1700]) → idioms > non-idioms, **significant**.

### H_s^log signed

*signed log-space synergy = mean(log p - log m); ≤ H_s^log and can be negative (net anti-synergistic). ↑ bigger = MORE synergy*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 0.8512 | 0.8092 | 0.3503 | 1.3543 | [0.7280, 0.9772] |
| non-idioms | 18 | 0 | 0.3017 | 0.2626 | -0.0880 | 0.6438 | [0.2036, 0.4003] |

**Cross-dataset gap** (idioms − non-idioms): Δ = 0.5495  (95% CI [0.3888, 0.7100]) → idioms > non-idioms, **significant**.
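Dropping the hinge gives the signed variant. Since x ≤ max{0, x} termwise, the signed mean never exceeds H_s^log, and it turns negative only when anti-synergistic contexts (p < m) outweigh the synergistic ones. A sketch on made-up scores, assuming positive p and m with m = max(q, r):

```python
import math

def signed_h_s_log(p, q, r):
    """Mean over contexts of log p - log m, m = max(q, r); may be negative."""
    terms = [math.log(pi) - math.log(max(qi, ri)) for pi, qi, ri in zip(p, q, r)]
    return sum(terms) / len(terms)

def h_s_log(p, q, r):
    """Hinged (unsigned) version, for comparison."""
    terms = [max(0.0, math.log(pi) - math.log(max(qi, ri))) for pi, qi, ri in zip(p, q, r)]
    return sum(terms) / len(terms)

# One mildly synergistic and one strongly anti-synergistic context.
p = [0.22, 0.05]
q = [0.20, 0.25]
r = [0.10, 0.10]
print(signed_h_s_log(p, q, r), h_s_log(p, q, r))
```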

### H_s^reg

*regularized H_s (eps-floored, finite, continuous); coincides with H_s for every phrase whose H_s is finite. ↑ bigger = LESS synergy*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 5.3335 | 5.0841 | 3.8394 | 6.8904 | [4.9311, 5.7406] |
| non-idioms | 18 | 0 | 7.2634 | 7.5473 | 5.2551 | 9.6390 | [6.6668, 7.8431] |

**Cross-dataset gap** (idioms − non-idioms): Δ = -1.9299  (95% CI [-2.6452, -1.2074]) → idioms < non-idioms, **significant**.
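The report does not spell out the regularizer here. One reading consistent with the tables (H_s^reg equals H_s for every phrase whose H_s is finite) is to floor the per-context gap p - m at a small eps before taking the log. The sketch below assumes exactly that, with a hypothetical `EPS = 1e-6`; the actual floor and functional form may differ, see `code/SCORING_MATH.md`. Under this reading, H_s^reg matches H_s only when every gap is at least eps.

```python
import math

EPS = 1e-6  # hypothetical floor; not the value used in the report

def h_s_reg(p, q, r, eps=EPS):
    """Assumed regularized synergy entropy: mean of -log max{eps, p - m}, m = max(q, r)."""
    terms = [-math.log(max(eps, pi - max(qi, ri))) for pi, qi, ri in zip(p, q, r)]
    return sum(terms) / len(terms)

p = [0.30, 0.20]
q = [0.10, 0.25]
r = [0.15, 0.05]
# The second context is non-synergistic, so it contributes -log(EPS) instead of +inf.
print(h_s_reg(p, q, r))
```

Because max{eps, x} is continuous in x, the metric changes smoothly as a context crosses from synergistic to non-synergistic, rather than jumping to +inf.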

### H_s^reg / H(p)

*regularized synergy ratio (>= 1). ↑ bigger = LESS synergy*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 1.1945 | 1.1490 | 1.0905 | 1.4719 | [1.1491, 1.2470] |
| non-idioms | 18 | 0 | 1.4421 | 1.4252 | 1.1654 | 1.7466 | [1.3604, 1.5252] |

**Cross-dataset gap** (idioms − non-idioms): Δ = -0.2477  (95% CI [-0.3437, -0.1514]) → idioms < non-idioms, **significant**.

### H_s (original)

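*synergy entropy = mean over contexts of -log max{0, p - max(q,r)} (nats); +inf if ANY context is non-synergistic (3/18 idioms, 12/18 non-idioms). Non-finite values are dropped, so the summary below compares only the 15 idioms and 6 non-idioms whose contexts are all synergistic.*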

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 15 | 3 | 5.1623 | 5.0551 | 3.8394 | 6.5009 | [4.7686, 5.5667] |
| non-idioms | 6 | 12 | 5.7693 | 5.5466 | 5.2551 | 6.4949 | [5.4224, 6.1465] |

**Cross-dataset gap** (idioms − non-idioms): Δ = -0.6070  (95% CI [-1.1587, -0.0766]) → idioms < non-idioms, **significant**.
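A sketch of the original H_s, assuming it is the uniform average of the per-context terms -log max{0, p - m} with m = max(q, r). A single non-synergistic context contributes +inf, which propagates through the mean; such phrases are then dropped before bootstrapping, which is why this comparison covers only the 15 idioms and 6 non-idioms with every context synergistic. Toy values are made up.

```python
import math

def h_s(p, q, r):
    """Original synergy entropy: mean over contexts of -log max{0, p - m}, m = max(q, r)."""
    terms = []
    for pi, qi, ri in zip(p, q, r):
        gap = pi - max(qi, ri)
        # math.log(0) raises ValueError, so map a non-positive gap to +inf explicitly.
        terms.append(-math.log(gap) if gap > 0 else math.inf)
    return sum(terms) / len(terms)

all_synergistic = h_s([0.30, 0.12], [0.10, 0.10], [0.15, 0.02])
one_miss = h_s([0.30, 0.20], [0.10, 0.25], [0.15, 0.05])
print(all_synergistic, one_miss)  # second value is inf
```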

### H_s / H(p)

*original synergy ratio (+inf whenever H_s is, so it uses the same finite subsets; prefer H_s^log or syn_frac)*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 15 | 3 | 1.1528 | 1.1306 | 1.0905 | 1.2654 | [1.1282, 1.1802] |
| non-idioms | 6 | 12 | 1.2560 | 1.2585 | 1.1654 | 1.3301 | [1.2121, 1.2940] |

**Cross-dataset gap** (idioms − non-idioms): Δ = -0.1032  (95% CI [-0.1510, -0.0536]) → idioms < non-idioms, **significant**.

## Per-phrase detail

#### Idioms (sorted by H_u/H, descending)

| phrase | H(p) | H_u | H_u/H | syn_frac | H_s^log | H_s^log/H | H_s^reg | H_s^reg/H | H_s (orig) | n_idiom | n_head | n_non |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| cut corners | 4.0242 | 5.3604 | 1.3320 | 1.00 | 1.3362 | 0.3320 | 4.3914 | 1.0912 | 4.3914 | 5 | 5 | 5 |
| break the mold | 3.4261 | 4.5419 | 1.3257 | 1.00 | 1.1157 | 0.3257 | 3.8394 | 1.1206 | 3.8394 | 5 | 5 | 5 |
| bite the dust | 4.2435 | 5.5978 | 1.3191 | 1.00 | 1.3543 | 0.3191 | 4.6275 | 1.0905 | 4.6275 | 5 | 5 | 5 |
| strike a chord | 3.9945 | 5.1564 | 1.2909 | 1.00 | 1.1618 | 0.2909 | 4.4016 | 1.1019 | 4.4016 | 5 | 5 | 5 |
| call the shots | 3.9079 | 5.0157 | 1.2835 | 1.00 | 1.1078 | 0.2835 | 4.3971 | 1.1252 | 4.3971 | 5 | 5 | 5 |
| spill the beans | 3.7601 | 4.6896 | 1.2472 | 0.80 | 0.9295 | 0.2472 | 4.9947 | 1.3283 | +inf | 5 | 5 | 5 |
| clear the air | 4.4107 | 5.2729 | 1.1955 | 1.00 | 0.8622 | 0.1955 | 5.0532 | 1.1457 | 5.0532 | 5 | 5 | 5 |
| pull strings | 4.5240 | 5.3699 | 1.1870 | 1.00 | 0.8459 | 0.1870 | 5.1131 | 1.1302 | 5.1131 | 5 | 5 | 5 |
| get the sack | 4.7428 | 5.5599 | 1.1723 | 1.00 | 0.8171 | 0.1723 | 6.0013 | 1.2654 | 6.0013 | 5 | 5 | 5 |
| rock the boat | 3.7066 | 4.3291 | 1.1680 | 1.00 | 0.6225 | 0.1680 | 4.5701 | 1.2330 | 4.5701 | 5 | 5 | 5 |
| lead the field | 4.7446 | 5.5350 | 1.1666 | 1.00 | 0.7905 | 0.1666 | 5.4675 | 1.1524 | 5.4675 | 5 | 5 | 5 |
| lose ground | 4.9108 | 5.7120 | 1.1632 | 1.00 | 0.8012 | 0.1632 | 5.5522 | 1.1306 | 5.5522 | 5 | 5 | 5 |
| run the show | 5.0538 | 5.7948 | 1.1466 | 1.00 | 0.7411 | 0.1466 | 6.0858 | 1.2042 | 6.0858 | 5 | 5 | 5 |
| have a ball | 4.1533 | 4.7545 | 1.1447 | 1.00 | 0.6011 | 0.1447 | 5.0551 | 1.2171 | 5.0551 | 5 | 5 | 5 |
| make waves | 4.5406 | 5.1974 | 1.1447 | 0.60 | 0.6568 | 0.1447 | 6.6833 | 1.4719 | +inf | 5 | 5 | 5 |
| turn tail | 5.5030 | 6.2761 | 1.1405 | 1.00 | 0.7731 | 0.1405 | 6.3775 | 1.1589 | 6.3775 | 5 | 5 | 5 |
| mean business | 5.7807 | 6.5154 | 1.1271 | 1.00 | 0.7347 | 0.1271 | 6.5009 | 1.1246 | 6.5009 | 5 | 5 | 5 |
| raise hell | 4.8918 | 5.2465 | 1.0725 | 0.80 | 0.3547 | 0.0725 | 6.8904 | 1.4086 | +inf | 5 | 5 | 5 |

#### Non-idioms (sorted by H_u/H, descending)

| phrase | H(p) | H_u | H_u/H | syn_frac | H_s^log | H_s^log/H | H_s^reg | H_s^reg/H | H_s (orig) | n_idiom | n_head | n_non |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| clear the table | 4.8241 | 5.4679 | 1.1335 | 1.00 | 0.6438 | 0.1335 | 5.6221 | 1.1654 | 5.6221 | 5 | 5 | 5 |
| call the police | 4.4330 | 4.9873 | 1.1250 | 1.00 | 0.5543 | 0.1250 | 5.4711 | 1.2342 | 5.4711 | 5 | 5 | 5 |
| build the boat | 3.9509 | 4.4438 | 1.1248 | 1.00 | 0.4929 | 0.1248 | 5.2551 | 1.3301 | 5.2551 | 5 | 5 | 5 |
| cut hair | 4.8990 | 5.4926 | 1.1211 | 0.80 | 0.5935 | 0.1211 | 6.3720 | 1.3007 | +inf | 5 | 5 | 5 |
| break the window | 4.2394 | 4.7322 | 1.1163 | 1.00 | 0.4928 | 0.1163 | 5.4656 | 1.2892 | 5.4656 | 5 | 5 | 5 |
| tie knots | 5.0713 | 5.5772 | 1.0997 | 1.00 | 0.5058 | 0.0997 | 6.4949 | 1.2807 | 6.4949 | 5 | 5 | 5 |
| raise children | 5.1013 | 5.5734 | 1.0925 | 1.00 | 0.4721 | 0.0925 | 6.3068 | 1.2363 | 6.3068 | 5 | 5 | 5 |
| turn dials | 5.5217 | 6.0142 | 1.0892 | 0.80 | 0.4925 | 0.0892 | 7.1039 | 1.2865 | +inf | 5 | 5 | 5 |
| make lunch | 4.6647 | 4.9714 | 1.0657 | 0.40 | 0.3067 | 0.0657 | 7.6930 | 1.6492 | +inf | 5 | 5 | 5 |
| eat the apple | 4.9444 | 5.2644 | 1.0647 | 0.60 | 0.3200 | 0.0647 | 7.9469 | 1.6072 | +inf | 5 | 5 | 5 |
| lead the meeting | 5.6028 | 5.8500 | 1.0441 | 0.60 | 0.2472 | 0.0441 | 8.1931 | 1.4623 | +inf | 5 | 5 | 5 |
| see the show | 5.6224 | 5.8617 | 1.0426 | 0.40 | 0.2393 | 0.0426 | 8.8536 | 1.5747 | +inf | 5 | 5 | 5 |
| get a present | 4.8765 | 5.0685 | 1.0394 | 0.40 | 0.1920 | 0.0394 | 8.2254 | 1.6867 | +inf | 5 | 5 | 5 |
| throw a ball | 4.6551 | 4.8376 | 1.0392 | 0.60 | 0.1825 | 0.0392 | 7.5339 | 1.6184 | +inf | 5 | 5 | 5 |
| lose keys | 5.6681 | 5.8452 | 1.0312 | 0.60 | 0.1770 | 0.0312 | 8.3990 | 1.4818 | +inf | 5 | 5 | 5 |
| remember details | 6.1989 | 6.3737 | 1.0282 | 0.80 | 0.1748 | 0.0282 | 8.6042 | 1.3880 | +inf | 5 | 5 | 5 |
| spill the water | 4.3287 | 4.4119 | 1.0192 | 0.80 | 0.0832 | 0.0192 | 7.5607 | 1.7466 | +inf | 5 | 5 | 5 |
| strike a drum | 5.9500 | 6.0502 | 1.0168 | 0.40 | 0.1003 | 0.0168 | 9.6390 | 1.6200 | +inf | 5 | 5 | 5 |

