# H_u / H_s analysis — qwen3-8b-base · medial · geo

This report compares an idiom dataset against a parallel dataset of literal verb phrases (non-idioms). All entropies are uniform Monte Carlo (MC) averages over each phrase's observed contexts; scores are unnormalized per-token LM probabilities, reduced by geometric mean in this run (the joint probability is the alternative reduction). See `README.md` / `code/SCORING_MATH.md` for the derivation.

> **How to read these magnitudes** (directions, why H_s is +inf, the new finite synergy metrics): see [`INTERPRETATION.md`](../INTERPRETATION.md). Quick key — ↑`H_u/H(p)`, ↑`syn_frac`, ↑`H_s^log` mean **more** synergy; ↑`H_s^reg` and the original ↑`H_s` mean **less** synergy.

## Configuration

| field | idioms run | non-idioms run |
|---|---|---|
| model | Qwen/Qwen3-8B-Base | Qwen/Qwen3-8B-Base |
| reduction | geometric_mean | geometric_mean |
| medial_only | True | True |
| dtype | bfloat16 | bfloat16 |
| num_idioms | 18 | 18 |
| dataset | /home/prada/PID_evaluation/data/dataset.tsv | /home/prada/PID_evaluation/data/nonidioms_dataset.tsv |

Bound `H_u + H_s ≥ 2H(p) + 2log2 ≥ H(p)` holds for 18/18 idioms and 18/18 non-idioms.
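
One way to see why this bound holds, using only the per-context definitions quoted in the metric captions of this report (`H_u = -log min{p, m}`, `H_s = -log max{0, p - m}`, with `m = max(q,r)`), is the short derivation below. It is a sketch from those definitions and may not match the derivation in `code/SCORING_MATH.md` step for step.

```latex
% Per context, with m = max(q, r) and logs in nats.
% Case p > m > 0 (synergistic context):
\begin{aligned}
-\log\min\{p,m\} - \log(p-m)
  &= -\log\bigl[m\,(p-m)\bigr] \\
  &\ge -\log\frac{p^{2}}{4}
     \qquad\text{since } m\,(p-m)\le \tfrac{p^{2}}{4}\ \text{(AM--GM, equality at } m=p/2\text{)} \\
  &= 2\,(-\log p) + 2\log 2 .
\end{aligned}
% Case p <= m: the H_s term is +\infty, so the inequality holds trivially.
% Averaging uniformly over contexts preserves the inequality, giving
% H_u + H_s \ge 2H(p) + 2\log 2, and the last step \ge H(p) follows from H(p) \ge 0.
```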

`H_s = +inf` (≥1 non-synergistic slot) for 3/18 idioms and 14/18 non-idioms. A *finite* H_s means **every** context of that phrase is synergistic (p > max(q,r) everywhere).
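
The per-phrase quantities can be sketched from the formulas quoted in the metric captions further down. The function below is a minimal illustration, not the project's scoring code: the name `phrase_metrics`, the input layout (one `(p, q, r)` tuple per context), and in particular the hard floor and the `eps` value used for `H_s^reg` are assumptions, since this report only describes `H_s^reg` as "eps-floored".

```python
import math

def phrase_metrics(contexts, eps=1e-6):
    """Average the per-context quantities over one phrase's contexts.

    contexts: list of (p, q, r) probabilities, one tuple per observed context.
    eps: hypothetical floor for H_s^reg (the report does not state its value).
    """
    n = len(contexts)
    h_p = h_u = h_s = h_s_reg = h_s_log = h_s_log_signed = 0.0
    n_syn = 0
    for p, q, r in contexts:
        m = max(q, r)
        gap = p - m                       # > 0 means the context is synergistic
        log_ratio = math.log(p) - math.log(m)
        h_p += -math.log(p)
        h_u += -math.log(min(p, m))
        h_s += -math.log(gap) if gap > 0 else math.inf   # one bad slot makes the mean +inf
        h_s_reg += -math.log(max(gap, eps))              # assumed form of the eps floor
        h_s_log += max(0.0, log_ratio)
        h_s_log_signed += log_ratio
        n_syn += gap > 0
    return {
        "H_p": h_p / n,
        "H_u": h_u / n,
        "H_s": h_s / n,
        "H_s_reg": h_s_reg / n,
        "H_s_log": h_s_log / n,
        "H_s_log_signed": h_s_log_signed / n,
        "syn_frac": n_syn / n,
    }

# Two toy contexts: the first is synergistic (p > m), the second is not.
print(phrase_metrics([(0.5, 0.1, 0.2), (0.1, 0.3, 0.05)]))
```

With one non-synergistic context out of two, the sketch returns `syn_frac = 0.5` and `H_s = +inf`, while `H_s_reg` and `H_s_log` stay finite, which mirrors the pattern in the per-phrase tables.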

## Per-metric summary (idioms vs non-idioms)

Means and 95% bootstrap CIs (20k resamples, percentile method, phrase-level). Non-finite values are dropped per metric before bootstrapping; the drop count is shown.
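
The procedure described above can be sketched as follows. This is an illustrative reimplementation using only the standard library, not the code that produced the tables: the function names, the seed, the index convention for the percentiles, and the choice to resample the two groups independently for the gap CI are all assumptions.

```python
import math
import random

def finite(values):
    """Drop +inf / -inf / NaN before bootstrapping, as the report does per metric."""
    return [v for v in values if math.isfinite(v)]

def _resample_mean(xs, rng):
    # One phrase-level bootstrap draw: sample len(xs) phrases with replacement.
    return sum(rng.choices(xs, k=len(xs))) / len(xs)

def _percentile_interval(sorted_stats, alpha):
    k = len(sorted_stats)
    lo = max(0, round(alpha / 2 * k))
    hi = min(k - 1, round((1 - alpha / 2) * k) - 1)
    return sorted_stats[lo], sorted_stats[hi]

def bootstrap_mean_ci(values, n_resamples=20_000, alpha=0.05, seed=0):
    """Return (mean, (ci_low, ci_high), n_dropped) for one dataset and metric."""
    rng = random.Random(seed)
    xs = finite(values)
    stats = sorted(_resample_mean(xs, rng) for _ in range(n_resamples))
    return sum(xs) / len(xs), _percentile_interval(stats, alpha), len(values) - len(xs)

def bootstrap_gap_ci(a, b, n_resamples=20_000, alpha=0.05, seed=0):
    """Return (delta, (ci_low, ci_high)) for mean(a) - mean(b)."""
    rng = random.Random(seed)
    xa, xb = finite(a), finite(b)
    gaps = sorted(
        _resample_mean(xa, rng) - _resample_mean(xb, rng)
        for _ in range(n_resamples)
    )
    delta = sum(xa) / len(xa) - sum(xb) / len(xb)
    return delta, _percentile_interval(gaps, alpha)
```

Under this reading, a gap is reported as significant when its 95% interval excludes zero.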

### H(p)

*base entropy, uniform MC average over contexts (nats). ↓ smaller = phrase more concentrated*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 4.0829 | 4.0334 | 3.0850 | 5.5690 | [3.7992, 4.3803] |
| non-idioms | 18 | 0 | 4.6077 | 4.3651 | 3.5701 | 5.7603 | [4.3289, 4.9014] |

**Cross-dataset gap** (idioms − non-idioms): Δ = -0.5249  (95% CI [-0.9321, -0.1133]) → idioms < non-idioms, **significant**.

### H_u

*unique / redundant entropy = -log min{p, max(q,r)} (nats); >= H(p)*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 4.9355 | 4.8954 | 3.7983 | 6.1498 | [4.6743, 5.2017] |
| non-idioms | 18 | 0 | 4.9483 | 4.8940 | 3.8767 | 5.9257 | [4.6765, 5.2195] |

**Cross-dataset gap** (idioms − non-idioms): Δ = -0.0129  (95% CI [-0.3916, 0.3674]) → idioms < non-idioms, not significant.

### H_u / H(p)

*unique-information ratio (>= 1). ↑ bigger = MORE synergy. THE headline metric*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 1.2173 | 1.2044 | 1.0920 | 1.3582 | [1.1808, 1.2555] |
| non-idioms | 18 | 0 | 1.0764 | 1.0799 | 1.0086 | 1.1540 | [1.0598, 1.0935] |

**Cross-dataset gap** (idioms − non-idioms): Δ = 0.1409  (95% CI [0.1000, 0.1824]) → idioms > non-idioms, **significant**.

### syn_frac

*synergy coverage in [0,1] = fraction of contexts with p > m, where m = max(q,r). ↑ bigger = MORE synergy (most intuitive)*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 0.9556 | 1.0000 | 0.6000 | 1.0000 | [0.9000, 1.0000] |
| non-idioms | 18 | 0 | 0.7556 | 0.8000 | 0.2000 | 1.0000 | [0.6556, 0.8444] |

**Cross-dataset gap** (idioms − non-idioms): Δ = 0.2000  (95% CI [0.1000, 0.3111]) → idioms > non-idioms, **significant**.

### H_s^log

*log-space synergy = mean max{0, log p - log m} (nats). ↑ bigger = MORE synergy; finite always*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 0.8526 | 0.7787 | 0.4235 | 1.4028 | [0.7387, 0.9726] |
| non-idioms | 18 | 0 | 0.3406 | 0.3378 | 0.0483 | 0.6520 | [0.2703, 0.4119] |

**Cross-dataset gap** (idioms − non-idioms): Δ = 0.5120  (95% CI [0.3761, 0.6505]) → idioms > non-idioms, **significant**.

### H_s^log / H(p)

*log-space synergy ratio. ↑ bigger = MORE synergy*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 0.2173 | 0.2044 | 0.0920 | 0.3582 | [0.1808, 0.2555] |
| non-idioms | 18 | 0 | 0.0764 | 0.0799 | 0.0086 | 0.1540 | [0.0598, 0.0935] |

**Cross-dataset gap** (idioms − non-idioms): Δ = 0.1409  (95% CI [0.1000, 0.1824]) → idioms > non-idioms, **significant**.
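
This gap and its CI coincide with those of `H_u / H(p)` because, per context, `max{0, log p - log m} = -log min{p, m} - (-log p)`, so `H_s^log = H_u - H(p)` and `H_s^log / H(p) = H_u / H(p) - 1`. The check below confirms the identity on three rows copied from the per-phrase tables in this report; the variable names and the tolerance (chosen to absorb 4-decimal rounding) are illustrative.

```python
# (H(p), H_u, H_s^log) copied from the per-phrase tables.
rows = {
    "bite the dust": (3.9160, 5.3188, 1.4028),
    "spill the beans": (3.3386, 4.2754, 0.9368),
    "strike a drum": (5.6355, 5.6838, 0.0483),
}

def max_identity_error(table_rows):
    """Largest |(H_u - H(p)) - H_s^log| across the given rows."""
    return max(abs((h_u - h_p) - h_s_log) for h_p, h_u, h_s_log in table_rows.values())

# Tolerance covers rounding of the published values to four decimals.
assert max_identity_error(rows) < 5e-4

for name, (h_p, h_u, h_s_log) in rows.items():
    print(f"{name}: H_u/H(p) - 1 = {h_u / h_p - 1:.4f}, H_s^log/H(p) = {h_s_log / h_p:.4f}")
```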

### H_s^log signed

*signed log-space synergy = mean(log p - log m); can be negative (net anti-synergistic)*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 0.8383 | 0.7731 | 0.3854 | 1.4028 | [0.7158, 0.9650] |
| non-idioms | 18 | 0 | 0.2836 | 0.2601 | -0.0739 | 0.5575 | [0.2023, 0.3646] |

**Cross-dataset gap** (idioms − non-idioms): Δ = 0.5547  (95% CI [0.4056, 0.7053]) → idioms > non-idioms, **significant**.

### H_s^reg

*regularized H_s (eps-floored, finite, continuous). ↑ bigger = LESS synergy*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 4.9710 | 4.6744 | 3.7805 | 6.9834 | [4.5543, 5.4048] |
| non-idioms | 18 | 0 | 6.7672 | 6.3941 | 5.3324 | 9.6273 | [6.2694, 7.3153] |

**Cross-dataset gap** (idioms − non-idioms): Δ = -1.7962  (95% CI [-2.4792, -1.1231]) → idioms < non-idioms, **significant**.

### H_s^reg / H(p)

*regularized synergy ratio (>= 1). ↑ bigger = LESS synergy*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 1.2158 | 1.1862 | 1.0891 | 1.5122 | [1.1680, 1.2703] |
| non-idioms | 18 | 0 | 1.4688 | 1.4734 | 1.2206 | 1.7083 | [1.4047, 1.5322] |

**Cross-dataset gap** (idioms − non-idioms): Δ = -0.2530  (95% CI [-0.3330, -0.1701]) → idioms < non-idioms, **significant**.

### H_s (original)

*synergy entropy = -log max{0, p - max(q,r)} (nats); +inf if ANY slot is non-synergistic, which happens for 3/18 idioms and 14/18 non-idioms*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 15 | 3 | 4.8559 | 4.6030 | 3.7805 | 6.9834 | [4.4156, 5.3365] |
| non-idioms | 4 | 14 | 5.7462 | 5.6952 | 5.3324 | 6.2620 | [5.3333, 6.1591] |

**Cross-dataset gap** (idioms − non-idioms, finite values only: 15 idioms vs. 4 non-idioms): Δ = -0.8903  (95% CI [-1.4970, -0.2579]) → idioms < non-idioms, **significant**.
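
To make the effect of dropping non-finite values concrete, the snippet below reproduces the non-idioms row of the table above from the `H_s (orig)` column of the per-phrase non-idioms table. The values are copied from this report in table order; the variable names are illustrative.

```python
import math
from statistics import mean, median

INF = math.inf

# H_s (orig) for the 18 non-idioms, in the order of the per-phrase table.
h_s_non_idioms = [
    INF, 5.3324, INF, 5.3343, INF, INF, 6.2620, INF, INF,
    INF, INF, INF, INF, INF, INF, 6.0561, INF, INF,
]

finite_vals = [v for v in h_s_non_idioms if math.isfinite(v)]
n_dropped = len(h_s_non_idioms) - len(finite_vals)

print("N finite:", len(finite_vals), "dropped:", n_dropped)
print("mean:", round(mean(finite_vals), 4), "median:", round(median(finite_vals), 4))
```

Only four phrases (clear the table, break the window, tie knots, spill the water) contribute to the non-idioms mean and CI for this metric.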

### H_s / H(p)

*original synergy ratio (+inf for 3/18 idioms and 14/18 non-idioms; use H_s^log or syn_frac instead)*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 15 | 3 | 1.1773 | 1.1641 | 1.0891 | 1.3363 | [1.1437, 1.2131] |
| non-idioms | 4 | 14 | 1.3542 | 1.3076 | 1.2206 | 1.5808 | [1.2456, 1.5093] |

**Cross-dataset gap** (idioms − non-idioms, finite values only: 15 idioms vs. 4 non-idioms): Δ = -0.1769  (95% CI [-0.3293, -0.0618]) → idioms < non-idioms, **significant**.

## Per-phrase detail

### Idioms (sorted by H_u/H, descending)

| phrase | H(p) | H_u | H_u/H | syn_frac | H_s^log | H_s^log/H | H_s^reg | H_s^reg/H | H_s (orig) | n_idiom | n_head | n_non |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| bite the dust | 3.9160 | 5.3188 | 1.3582 | 1.00 | 1.4028 | 0.3582 | 4.2648 | 1.0891 | 4.2648 | 5 | 5 | 5 |
| cut corners | 3.6547 | 4.9105 | 1.3436 | 1.00 | 1.2558 | 0.3436 | 4.0388 | 1.1051 | 4.0388 | 5 | 5 | 5 |
| strike a chord | 3.6373 | 4.7754 | 1.3129 | 1.00 | 1.1382 | 0.3129 | 4.0634 | 1.1172 | 4.0634 | 5 | 5 | 5 |
| call the shots | 3.5504 | 4.6554 | 1.3112 | 1.00 | 1.1050 | 0.3112 | 3.9710 | 1.1185 | 3.9710 | 5 | 5 | 5 |
| break the mold | 3.3102 | 4.3241 | 1.3063 | 1.00 | 1.0139 | 0.3063 | 3.7805 | 1.1421 | 3.7805 | 5 | 5 | 5 |
| spill the beans | 3.3386 | 4.2754 | 1.2806 | 0.80 | 0.9368 | 0.2806 | 4.5667 | 1.3678 | +inf | 5 | 5 | 5 |
| rock the boat | 3.0850 | 3.7983 | 1.2312 | 1.00 | 0.7134 | 0.2312 | 3.8140 | 1.2363 | 3.8140 | 5 | 5 | 5 |
| turn tail | 4.8765 | 5.9166 | 1.2133 | 1.00 | 1.0401 | 0.2133 | 5.3547 | 1.0981 | 5.3547 | 5 | 5 | 5 |
| clear the air | 3.9902 | 4.8346 | 1.2116 | 1.00 | 0.8445 | 0.2116 | 4.6016 | 1.1532 | 4.6016 | 5 | 5 | 5 |
| lead the field | 4.0766 | 4.8802 | 1.1971 | 1.00 | 0.8036 | 0.1971 | 4.7458 | 1.1641 | 4.7458 | 5 | 5 | 5 |
| get the sack | 4.2843 | 5.0381 | 1.1759 | 0.80 | 0.7538 | 0.1759 | 5.7661 | 1.3459 | +inf | 5 | 5 | 5 |
| pull strings | 4.3273 | 5.0550 | 1.1682 | 1.00 | 0.7277 | 0.1682 | 5.1294 | 1.1854 | 5.1294 | 5 | 5 | 5 |
| have a ball | 3.6612 | 4.2464 | 1.1598 | 1.00 | 0.5851 | 0.1598 | 4.6030 | 1.2572 | 4.6030 | 5 | 5 | 5 |
| run the show | 4.6009 | 5.3065 | 1.1534 | 1.00 | 0.7056 | 0.1534 | 5.5921 | 1.2154 | 5.5921 | 5 | 5 | 5 |
| make waves | 4.1709 | 4.7874 | 1.1478 | 0.60 | 0.6166 | 0.1478 | 6.3071 | 1.5122 | +inf | 5 | 5 | 5 |
| lose ground | 4.8386 | 5.5384 | 1.1446 | 1.00 | 0.6998 | 0.1446 | 5.7436 | 1.1870 | 5.7436 | 5 | 5 | 5 |
| mean business | 5.5690 | 6.1498 | 1.1043 | 1.00 | 0.5808 | 0.1043 | 6.9834 | 1.2540 | 6.9834 | 5 | 5 | 5 |
| raise hell | 4.6037 | 5.0272 | 1.0920 | 1.00 | 0.4235 | 0.0920 | 6.1520 | 1.3363 | 6.1520 | 5 | 5 | 5 |

### Non-idioms (sorted by H_u/H, descending)

| phrase | H(p) | H_u | H_u/H | syn_frac | H_s^log | H_s^log/H | H_s^reg | H_s^reg/H | H_s (orig) | n_idiom | n_head | n_non |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| cut hair | 4.2337 | 4.8858 | 1.1540 | 0.80 | 0.6520 | 0.1540 | 5.7055 | 1.3476 | +inf | 5 | 5 | 5 |
| clear the table | 4.3686 | 4.9022 | 1.1222 | 1.00 | 0.5336 | 0.1222 | 5.3324 | 1.2206 | 5.3324 | 5 | 5 | 5 |
| eat the apple | 4.3616 | 4.8444 | 1.1107 | 0.80 | 0.4828 | 0.1107 | 6.2131 | 1.4245 | +inf | 5 | 5 | 5 |
| break the window | 4.0394 | 4.4628 | 1.1048 | 1.00 | 0.4235 | 0.1048 | 5.3343 | 1.3206 | 5.3343 | 5 | 5 | 5 |
| raise children | 4.6962 | 5.1845 | 1.1040 | 0.80 | 0.4883 | 0.1040 | 6.5262 | 1.3897 | +inf | 5 | 5 | 5 |
| call the police | 4.2933 | 4.7204 | 1.0995 | 0.80 | 0.4271 | 0.0995 | 5.9559 | 1.3872 | +inf | 5 | 5 | 5 |
| tie knots | 4.8365 | 5.3140 | 1.0987 | 1.00 | 0.4774 | 0.0987 | 6.2620 | 1.2947 | 6.2620 | 5 | 5 | 5 |
| make lunch | 4.2644 | 4.6332 | 1.0865 | 0.60 | 0.3689 | 0.0865 | 6.8753 | 1.6123 | +inf | 5 | 5 | 5 |
| build the boat | 3.5701 | 3.8767 | 1.0859 | 0.80 | 0.3066 | 0.0859 | 5.8415 | 1.6362 | +inf | 5 | 5 | 5 |
| turn dials | 5.5181 | 5.9257 | 1.0739 | 0.80 | 0.4076 | 0.0739 | 7.2145 | 1.3074 | +inf | 5 | 5 | 5 |
| throw a ball | 3.9689 | 4.2325 | 1.0664 | 0.80 | 0.2636 | 0.0664 | 6.1250 | 1.5432 | +inf | 5 | 5 | 5 |
| get a present | 4.3021 | 4.5358 | 1.0543 | 0.60 | 0.2337 | 0.0543 | 6.8892 | 1.6014 | +inf | 5 | 5 | 5 |
| lose keys | 5.0285 | 5.2808 | 1.0502 | 0.80 | 0.2522 | 0.0502 | 7.2706 | 1.4459 | +inf | 5 | 5 | 5 |
| see the show | 5.0175 | 5.2615 | 1.0486 | 0.40 | 0.2440 | 0.0486 | 8.1012 | 1.6146 | +inf | 5 | 5 | 5 |
| lead the meeting | 5.2138 | 5.4298 | 1.0414 | 0.80 | 0.2160 | 0.0414 | 7.8259 | 1.5010 | +inf | 5 | 5 | 5 |
| spill the water | 3.8310 | 3.9809 | 1.0391 | 1.00 | 0.1499 | 0.0391 | 6.0561 | 1.5808 | 6.0561 | 5 | 5 | 5 |
| remember details | 5.7603 | 5.9151 | 1.0269 | 0.60 | 0.1548 | 0.0269 | 8.6543 | 1.5024 | +inf | 5 | 5 | 5 |
| strike a drum | 5.6355 | 5.6838 | 1.0086 | 0.20 | 0.0483 | 0.0086 | 9.6273 | 1.7083 | +inf | 5 | 5 | 5 |

