# H_u / H_s analysis — qwen3-8b-base · full · joint

Idiom vs. parallel literal-VP (non-idiom) datasets. All entropies are uniform Monte Carlo (MC) averages over each phrase's observed contexts; scores are unnormalized LM probabilities, reduced over tokens either as a geometric mean or jointly (this run: `joint`). See `README.md` / `code/SCORING_MATH.md` for the derivation.
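
A minimal sketch of the two token reductions and of H(p), assuming each context yields natural-log probabilities for the phrase's tokens. The function names and the `"geomean"` label are hypothetical (only `joint` appears in the configuration below). Here `joint` is read as the log of the product of token probabilities and the geometric mean as the average token log-probability, with no renormalization across phrases:

```python
import math
from statistics import fmean


def phrase_log_prob(token_logprobs, reduction="joint"):
    """Reduce per-token log-probabilities (nats) to one phrase-level log score.

    joint   -> log of the product of token probabilities (sum of log-probs)
    geomean -> log of the geometric mean of token probabilities (mean of log-probs)
    Neither variant is renormalized, so exp(score) is not a distribution over phrases.
    """
    if reduction == "joint":
        return math.fsum(token_logprobs)
    if reduction == "geomean":
        return fmean(token_logprobs)
    raise ValueError(f"unknown reduction: {reduction!r}")


def base_entropy(per_context_token_logprobs, reduction="joint"):
    """H(p): uniform Monte Carlo average of -log p over the observed contexts."""
    return fmean(-phrase_log_prob(lp, reduction) for lp in per_context_token_logprobs)


# Made-up log-probs for a 3-token phrase observed in two contexts.
contexts = [[-2.0, -0.5, -0.1], [-3.0, -1.0, -0.2]]
print(base_entropy(contexts, "joint"))    # mean of 2.6 and 4.2, i.e. about 3.4
print(base_entropy(contexts, "geomean"))  # mean of 2.6/3 and 4.2/3, i.e. about 1.133
```

The joint reduction grows with phrase length, which is consistent with the H(p) values of roughly 40 to 67 nats reported below.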

> **How to read these magnitudes** (directions, why H_s is +inf, the new finite synergy metrics): see [`INTERPRETATION.md`](../INTERPRETATION.md). Quick key — ↑`H_u/H(p)`, ↑`syn_frac`, ↑`H_s^log` mean **more** synergy; ↑`H_s^reg` and the original ↑`H_s` mean **less** synergy.

## Configuration

| field | idioms run | non-idioms run |
|---|---|---|
| model | Qwen/Qwen3-8B-Base | Qwen/Qwen3-8B-Base |
| reduction | joint | joint |
| medial_only | False | False |
| dtype | bfloat16 | bfloat16 |
| num_idioms | 18 | 18 |
| dataset | /home/prada/PID_evaluation/data/dataset.tsv | /home/prada/PID_evaluation/data/nonidioms_dataset.tsv |

The bound `H_u + H_s ≥ 2H(p) + 2 log 2 ≥ H(p)` holds for 18/18 idioms and 18/18 non-idioms.
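
The bound can be checked context by context. A short derivation, assuming (as the metric definitions on this page indicate) that H(p), H_u and H_s are uniform averages over contexts i of -log p_i, -log min{p_i, m_i} and -log max{0, p_i - m_i}, with m_i = max(q_i, r_i):

$$
\begin{aligned}
p_i \le m_i:&\quad -\log\max\{0,\,p_i - m_i\} = +\infty \text{, so the bound holds trivially;}\\
p_i > m_i:&\quad m_i\,(p_i - m_i) \le \left(\tfrac{p_i}{2}\right)^2 \text{ (AM-GM)} \;\Rightarrow\; -\log m_i - \log(p_i - m_i) \ge -2\log p_i + 2\log 2;\\
\text{average over } i:&\quad H_u + H_s \ge 2H(p) + 2\log 2 \ge H(p) \quad \text{(since } H(p) \ge 0\text{)}.
\end{aligned}
$$

Phrases with H_s = +inf satisfy the bound automatically, so it is a non-trivial check only for phrases with finite H_s.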

`H_s = +inf` (at least one non-synergistic context) for 13/18 idioms and 18/18 non-idioms. A *finite* H_s therefore means that **every** context of that phrase is synergistic (p > max(q,r) in all contexts).

## Per-metric summary (idioms vs non-idioms)

Means and 95% bootstrap CIs (20k resamples, percentile method, resampling at the phrase level). Non-finite values are dropped per metric before bootstrapping; the drop count is shown. A cross-dataset gap is labeled **significant** when its 95% CI excludes 0.
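
A minimal sketch of the phrase-level percentile bootstrap described above, using only the Python standard library. The function names are hypothetical, and resampling the two datasets independently for the gap CI is an assumption about how the gap intervals were produced. Run on the rounded per-phrase H_u/H(p) values with a different RNG, it should land close to, but not exactly on, the reported intervals:

```python
import math
import random
from statistics import fmean, quantiles


def bootstrap_mean_ci(values, n_resamples=20_000, seed=0):
    """Mean and percentile-method 95% CI, resampling phrases with replacement.

    Non-finite values (+inf, NaN) are dropped first; the drop count is returned.
    """
    finite = [v for v in values if math.isfinite(v)]
    if not finite:
        return None, None, len(values)  # mirrors the "(no finite values)" rows
    rng = random.Random(seed)
    means = [fmean(rng.choices(finite, k=len(finite))) for _ in range(n_resamples)]
    cuts = quantiles(means, n=40)  # 39 cut points: first = 2.5th, last = 97.5th percentile
    return fmean(finite), (cuts[0], cuts[-1]), len(values) - len(finite)


def bootstrap_gap_ci(a, b, n_resamples=20_000, seed=0):
    """mean(a) - mean(b) with a percentile 95% CI; each dataset resampled independently."""
    rng = random.Random(seed)
    diffs = [
        fmean(rng.choices(a, k=len(a))) - fmean(rng.choices(b, k=len(b)))
        for _ in range(n_resamples)
    ]
    cuts = quantiles(diffs, n=40)
    return fmean(a) - fmean(b), (cuts[0], cuts[-1])


# Per-phrase H_u/H(p) values copied from the per-phrase tables (rounded to 4 dp).
idioms = [1.2275, 1.1930, 1.1900, 1.1897, 1.1893, 1.1871, 1.1848, 1.1724, 1.1299,
          1.1191, 1.1088, 1.0955, 1.0885, 1.0835, 1.0728, 1.0672, 1.0669, 1.0389]
non_idioms = [1.0721, 1.0602, 1.0514, 1.0384, 1.0371, 1.0284, 1.0245, 1.0231, 1.0215,
              1.0185, 1.0182, 1.0140, 1.0113, 1.0102, 1.0101, 1.0081, 1.0034, 1.0033]

mean, ci, dropped = bootstrap_mean_ci(idioms)
gap, gap_ci = bootstrap_gap_ci(idioms, non_idioms)
print(f"idioms mean {mean:.4f} CI [{ci[0]:.4f}, {ci[1]:.4f}], dropped {dropped}")
print(f"gap {gap:.4f} CI [{gap_ci[0]:.4f}, {gap_ci[1]:.4f}], "
      f"significant: {not gap_ci[0] <= 0 <= gap_ci[1]}")
```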

### H(p)

*base entropy, uniform MC average over contexts (nats). ↓ smaller = idiom more concentrated*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 50.1811 | 51.2663 | 39.7945 | 55.0037 | [48.1464, 52.0212] |
| non-idioms | 18 | 0 | 57.5209 | 58.1081 | 51.0780 | 67.0202 | [55.5500, 59.5759] |

**Cross-dataset gap** (idioms − non-idioms): Δ = -7.3399  (95% CI [-10.1737, -4.5783]) → idioms < non-idioms, **significant**.

### H_u

*unique / redundant entropy = -log min{p, max(q,r)} (nats); >= H(p)*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 56.7438 | 56.9685 | 47.2404 | 63.1979 | [54.8972, 58.4439] |
| non-idioms | 18 | 0 | 58.9463 | 58.7440 | 51.2523 | 67.9595 | [56.9842, 60.9344] |

**Cross-dataset gap** (idioms − non-idioms): Δ = -2.2025  (95% CI [-4.8420, 0.4310]) → idioms < non-idioms, not significant.
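
A minimal sketch of H_u, assuming each context supplies natural-log scores `log_p`, `log_q`, `log_r` for p, q and r (q and r as defined in `code/SCORING_MATH.md`; the function and argument names are hypothetical). Because log is monotone, the min/max can be taken directly on log-scores, which avoids underflow for scores around e^-50:

```python
from statistics import fmean


def unique_entropy(log_p, log_q, log_r):
    """H_u: uniform average over contexts of -log min{p, max(q, r)}, in nats.

    Arguments are per-context natural-log scores. log is monotone, so
    log min{p, max(q, r)} == min(log p, max(log q, log r)).
    """
    return fmean(-min(lp, max(lq, lr)) for lp, lq, lr in zip(log_p, log_q, log_r))


# Made-up log-scores for a phrase observed in two contexts.
log_p = [-50.0, -52.0]
log_q = [-55.0, -51.0]
log_r = [-60.0, -53.0]
h_p = fmean(-lp for lp in log_p)           # H(p) = (50 + 52) / 2 = 51.0
h_u = unique_entropy(log_p, log_q, log_r)  # (55 + 52) / 2 = 53.5
print(h_p, h_u)
```

Each term is at least -log p for that context, which is why H_u ≥ H(p) always holds.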

### H_u / H(p)

*unique-information ratio (>= 1). ↑ bigger = MORE synergy. THE headline metric*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 1.1336 | 1.1245 | 1.0389 | 1.2275 | [1.1078, 1.1603] |
| non-idioms | 18 | 0 | 1.0252 | 1.0200 | 1.0033 | 1.0721 | [1.0170, 1.0345] |

**Cross-dataset gap** (idioms − non-idioms): Δ = 0.1084  (95% CI [0.0805, 0.1362]) → idioms > non-idioms, **significant**.

### syn_frac

*synergy coverage in [0,1] = fraction of contexts with p > m, where m = max(q,r). ↑ bigger = MORE synergy (most intuitive)*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 0.8111 | 0.9000 | 0.3000 | 1.0000 | [0.7056, 0.9056] |
| non-idioms | 18 | 0 | 0.4389 | 0.3500 | 0.1000 | 0.9000 | [0.3278, 0.5611] |

**Cross-dataset gap** (idioms − non-idioms): Δ = 0.3722  (95% CI [0.2111, 0.5222]) → idioms > non-idioms, **significant**.
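
syn_frac needs only a per-context comparison. A minimal sketch with a hypothetical function name, assuming natural-log scores per context; with 10 contexts per phrase (every count column in the per-phrase tables is 10) it moves in steps of 0.1, matching the tables:

```python
def synergy_fraction(log_p, log_q, log_r):
    """syn_frac: share of contexts where p > m, m = max(q, r), compared in log space."""
    flags = [lp > max(lq, lr) for lp, lq, lr in zip(log_p, log_q, log_r)]
    return sum(flags) / len(flags)


# Made-up log-scores: synergistic in the first context only.
print(synergy_fraction([-50.0, -52.0], [-55.0, -51.0], [-60.0, -53.0]))  # 0.5
```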

### H_s^log

*log-space synergy = mean max{0, log p - log m} (nats), m = max(q,r). ↑ bigger = MORE synergy; always finite*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 6.5628 | 6.1666 | 2.0923 | 10.0798 | [5.3913, 7.7561] |
| non-idioms | 18 | 0 | 1.4254 | 1.1145 | 0.1743 | 3.8885 | [0.9786, 1.9244] |

**Cross-dataset gap** (idioms − non-idioms): Δ = 5.1374  (95% CI [3.8540, 6.4160]) → idioms > non-idioms, **significant**.
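
A minimal sketch of H_s^log under the same assumptions (per-context natural-log scores; hypothetical function name):

```python
from statistics import fmean


def log_synergy(log_p, log_q, log_r):
    """H_s^log: mean over contexts of max{0, log p - log m}, m = max(q, r), in nats.

    Non-synergistic contexts (p <= m) contribute 0 instead of +inf,
    so the result is always finite.
    """
    return fmean(max(0.0, lp - max(lq, lr)) for lp, lq, lr in zip(log_p, log_q, log_r))


# Made-up log-scores: per-context gains of 5 and -1 nats; the -1 is clipped to 0.
print(log_synergy([-50.0, -52.0], [-55.0, -51.0], [-60.0, -53.0]))  # 2.5
```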

### H_s^log / H(p)

*log-space synergy ratio; equals H_u/H(p) - 1 for every phrase, since H_u = H(p) + H_s^log. ↑ bigger = MORE synergy*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 0.1336 | 0.1245 | 0.0389 | 0.2275 | [0.1078, 0.1603] |
| non-idioms | 18 | 0 | 0.0252 | 0.0200 | 0.0033 | 0.0721 | [0.0170, 0.0345] |

**Cross-dataset gap** (idioms − non-idioms): Δ = 0.1084  (95% CI [0.0805, 0.1362]) → idioms > non-idioms, **significant**.
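
The two ratio tables differ by exactly 1 (idiom means 1.1336 vs. 0.1336, with identical gap and CI). Assuming the per-context definitions on this page, with m_i = max(q_i, r_i) and uniform averaging over contexts, this is an identity rather than a coincidence:

$$
\begin{aligned}
-\log\min\{p_i, m_i\} &= -\log p_i + \max\{0,\ \log p_i - \log m_i\} \\
\Rightarrow\quad H_u &= H(p) + H_s^{\log} \\
\Rightarrow\quad \frac{H_u}{H(p)} &= 1 + \frac{H_s^{\log}}{H(p)}
\end{aligned}
$$

The first line holds case by case: if p_i ≤ m_i both sides equal -log p_i; if p_i > m_i both equal -log m_i. For example, *strike a chord* has H_u - H(p) = 54.3900 - 44.3102 = 10.0798, which is its H_s^log.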

### H_s^log signed

*signed log-space synergy = mean(log p - log m); can be negative (net anti-synergistic)*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 5.7477 | 6.0895 | -4.1264 | 10.0798 | [3.9838, 7.3757] |
| non-idioms | 18 | 0 | -1.2319 | -1.9224 | -5.1333 | 3.0709 | [-2.3251, -0.0761] |

**Cross-dataset gap** (idioms − non-idioms): Δ = 6.9796  (95% CI [4.8520, 8.9839]) → idioms > non-idioms, **significant**.
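
A minimal sketch of the signed variant, again assuming per-context natural-log scores and using a hypothetical function name:

```python
from statistics import fmean


def signed_log_synergy(log_p, log_q, log_r):
    """Signed H_s^log: mean over contexts of (log p - log m), m = max(q, r).

    Nothing is clipped, so anti-synergistic contexts (p < m) pull the mean
    down and the result can be negative.
    """
    return fmean(lp - max(lq, lr) for lp, lq, lr in zip(log_p, log_q, log_r))


# Made-up log-scores: per-context gains of 5 and -1 nats.
print(signed_log_synergy([-50.0, -52.0], [-55.0, -51.0], [-60.0, -53.0]))  # 2.0
```

Each unclipped term is at most its clipped counterpart, so the signed value never exceeds H_s^log, and the two coincide whenever syn_frac = 1 (consistent with the idiom maximum of 10.0798 appearing in both tables).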

### H_s^reg

*regularized H_s (eps-floored, finite, continuous). ↑ bigger = LESS synergy*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 51.0962 | 51.8122 | 40.2838 | 57.3082 | [48.8370, 53.1782] |
| non-idioms | 18 | 0 | 60.2358 | 61.3895 | 54.2270 | 70.4329 | [58.1110, 62.4369] |

**Cross-dataset gap** (idioms − non-idioms): Δ = -9.1396  (95% CI [-12.2212, -6.1277]) → idioms < non-idioms, **significant**.

### H_s^reg / H(p)

*regularized synergy ratio (>= 1). ↑ bigger = LESS synergy*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 1.0176 | 1.0105 | 1.0000 | 1.0600 | [1.0096, 1.0266] |
| non-idioms | 18 | 0 | 1.0473 | 1.0522 | 1.0104 | 1.0815 | [1.0377, 1.0564] |

**Cross-dataset gap** (idioms − non-idioms): Δ = -0.0297  (95% CI [-0.0420, -0.0165]) → idioms < non-idioms, **significant**.

### H_s (original)

*synergy entropy = -log max{0, p - max(q,r)} (nats), averaged over contexts; +inf if ANY context is non-synergistic (hence mostly +inf)*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 5 | 13 | 47.8296 | 47.8235 | 44.3115 | 52.4357 | [45.3873, 50.3769] |
| non-idioms | 0 | 18 | — | — | — | — | (no finite values) |

**Cross-dataset gap**: insufficient finite values in one dataset.
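
A minimal log-space sketch of the original H_s (hypothetical function names; per-context natural-log scores assumed). Working with log-scores avoids underflow of p and m themselves, which in double precision would set in below roughly e^-745. It uses -log(p - m) = -log p - log(1 - m/p) with a numerically careful log(1 - e^x), and it returns +inf as soon as one context has p ≤ m, which is why a finite H_s requires every context to be synergistic:

```python
import math
from statistics import fmean


def _log1mexp(x):
    """Accurate log(1 - e^x) for x < 0 (expm1 branch near 0, log1p branch otherwise)."""
    return math.log(-math.expm1(x)) if x > -math.log(2.0) else math.log1p(-math.exp(x))


def original_synergy_entropy(log_p, log_q, log_r):
    """Original H_s: mean over contexts of -log max{0, p - m}, m = max(q, r).

    Returns +inf if any context has p <= m, because -log 0 = +inf and a single
    infinite term makes the average infinite.
    """
    terms = []
    for lp, lq, lr in zip(log_p, log_q, log_r):
        lm = max(lq, lr)
        if lp <= lm:
            return math.inf
        # -log(p - m) = -log p - log(1 - m/p), where log(m/p) = lm - lp < 0
        terms.append(-lp - _log1mexp(lm - lp))
    return fmean(terms)


print(original_synergy_entropy([-50.0, -52.0], [-55.0, -60.0], [-56.0, -58.0]))  # finite, slightly above H(p) = 51
print(original_synergy_entropy([-50.0, -52.0], [-55.0, -51.0], [-60.0, -53.0]))  # inf
```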

### H_s / H(p)

*original synergy ratio (mostly +inf; use H_s^log or syn_frac instead)*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 5 | 13 | 1.0011 | 1.0009 | 1.0000 | 1.0028 | [1.0004, 1.0021] |
| non-idioms | 0 | 18 | — | — | — | — | (no finite values) |

**Cross-dataset gap**: insufficient finite values in one dataset.

## Per-phrase detail

### Idioms (sorted by H_u/H(p), descending)

| phrase | H(p) | H_u | H_u/H(p) | syn_frac | H_s^log | H_s^log/H(p) | H_s^reg | H_s^reg/H(p) | H_s (orig) | n_idiom | n_head | n_non |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| strike a chord | 44.3102 | 54.3900 | 1.2275 | 1.00 | 10.0798 | 0.2275 | 44.3115 | 1.0000 | 44.3115 | 10 | 10 | 10 |
| spill the beans | 50.4209 | 60.1509 | 1.1930 | 0.90 | 9.7301 | 0.1930 | 50.8826 | 1.0092 | +inf | 10 | 10 | 10 |
| bite the dust | 51.7872 | 61.6263 | 1.1900 | 0.90 | 9.8391 | 0.1900 | 52.2479 | 1.0089 | +inf | 10 | 10 | 10 |
| cut corners | 45.4120 | 54.0262 | 1.1897 | 0.90 | 8.6142 | 0.1897 | 45.8807 | 1.0103 | +inf | 10 | 10 | 10 |
| break the mold | 49.2864 | 58.6182 | 1.1893 | 1.00 | 9.3318 | 0.1893 | 49.3320 | 1.0009 | 49.3320 | 10 | 10 | 10 |
| rock the boat | 39.7945 | 47.2404 | 1.1871 | 0.90 | 7.4459 | 0.1871 | 40.2838 | 1.0123 | +inf | 10 | 10 | 10 |
| call the shots | 47.7909 | 56.6218 | 1.1848 | 1.00 | 8.8309 | 0.1848 | 47.8235 | 1.0007 | 47.8235 | 10 | 10 | 10 |
| turn tail | 53.9066 | 63.1979 | 1.1724 | 0.90 | 9.2913 | 0.1724 | 54.3675 | 1.0085 | +inf | 10 | 10 | 10 |
| clear the air | 52.3775 | 59.1838 | 1.1299 | 1.00 | 6.8063 | 0.1299 | 52.4357 | 1.0011 | 52.4357 | 10 | 10 | 10 |
| lead the field | 45.1171 | 50.4898 | 1.1191 | 1.00 | 5.3727 | 0.1191 | 45.2451 | 1.0028 | 45.2451 | 10 | 10 | 10 |
| have a ball | 50.8130 | 56.3399 | 1.1088 | 0.90 | 5.5269 | 0.1088 | 51.3765 | 1.0111 | +inf | 10 | 10 | 10 |
| pull strings | 54.9810 | 60.2343 | 1.0955 | 0.90 | 5.2533 | 0.0955 | 55.6935 | 1.0130 | +inf | 10 | 10 | 10 |
| mean business | 48.4524 | 52.7413 | 1.0885 | 0.70 | 4.2889 | 0.0885 | 49.8431 | 1.0287 | +inf | 10 | 10 | 10 |
| raise hell | 54.5448 | 59.0976 | 1.0835 | 0.90 | 4.5529 | 0.0835 | 55.1295 | 1.0107 | +inf | 10 | 10 | 10 |
| lose ground | 55.0037 | 59.0078 | 1.0728 | 0.50 | 4.0041 | 0.0728 | 57.3082 | 1.0419 | +inf | 10 | 10 | 10 |
| run the show | 51.7196 | 55.1951 | 1.0672 | 0.40 | 3.4755 | 0.0672 | 54.4852 | 1.0535 | +inf | 10 | 10 | 10 |
| get the sack | 53.7213 | 57.3152 | 1.0669 | 0.50 | 3.5939 | 0.0669 | 56.0365 | 1.0431 | +inf | 10 | 10 | 10 |
| make waves | 53.8200 | 55.9123 | 1.0389 | 0.30 | 2.0923 | 0.0389 | 57.0488 | 1.0600 | +inf | 10 | 10 | 10 |

### Non-idioms (sorted by H_u/H(p), descending)

| phrase | H(p) | H_u | H_u/H(p) | syn_frac | H_s^log | H_s^log/H(p) | H_s^reg | H_s^reg/H(p) | H_s (orig) | n_idiom | n_head | n_non |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| cut hair | 53.9251 | 57.8136 | 1.0721 | 0.90 | 3.8885 | 0.0721 | 54.4838 | 1.0104 | +inf | 10 | 10 | 10 |
| call the police | 53.8268 | 57.0672 | 1.0602 | 0.90 | 3.2404 | 0.0602 | 54.4725 | 1.0120 | +inf | 10 | 10 | 10 |
| clear the table | 54.0834 | 56.8642 | 1.0514 | 0.90 | 2.7808 | 0.0514 | 54.7817 | 1.0129 | +inf | 10 | 10 | 10 |
| break the window | 59.7650 | 62.0605 | 1.0384 | 0.60 | 2.2955 | 0.0384 | 61.7662 | 1.0335 | +inf | 10 | 10 | 10 |
| raise children | 58.6680 | 60.8437 | 1.0371 | 0.50 | 2.1757 | 0.0371 | 61.0474 | 1.0406 | +inf | 10 | 10 | 10 |
| build the boat | 51.2233 | 52.6786 | 1.0284 | 0.40 | 1.4553 | 0.0284 | 54.2270 | 1.0586 | +inf | 10 | 10 | 10 |
| tie knots | 62.7847 | 64.3255 | 1.0245 | 0.70 | 1.5407 | 0.0245 | 64.4872 | 1.0271 | +inf | 10 | 10 | 10 |
| eat the apple | 59.6176 | 60.9960 | 1.0231 | 0.40 | 1.3784 | 0.0231 | 62.4093 | 1.0468 | +inf | 10 | 10 | 10 |
| see the show | 53.4251 | 54.5735 | 1.0215 | 0.40 | 1.1484 | 0.0215 | 56.2585 | 1.0530 | +inf | 10 | 10 | 10 |
| get a present | 58.2723 | 59.3528 | 1.0185 | 0.30 | 1.0805 | 0.0185 | 61.7315 | 1.0594 | +inf | 10 | 10 | 10 |
| make lunch | 53.6916 | 54.6673 | 1.0182 | 0.20 | 0.9757 | 0.0182 | 57.3785 | 1.0687 | +inf | 10 | 10 | 10 |
| turn dials | 67.0202 | 67.9595 | 1.0140 | 0.30 | 0.9393 | 0.0140 | 70.4329 | 1.0509 | +inf | 10 | 10 | 10 |
| throw a ball | 59.1196 | 59.7859 | 1.0113 | 0.30 | 0.6663 | 0.0113 | 62.6048 | 1.0590 | +inf | 10 | 10 | 10 |
| lose keys | 63.0184 | 63.6629 | 1.0102 | 0.30 | 0.6445 | 0.0102 | 66.2833 | 1.0518 | +inf | 10 | 10 | 10 |
| spill the water | 62.8314 | 63.4682 | 1.0101 | 0.30 | 0.6368 | 0.0101 | 66.1313 | 1.0525 | +inf | 10 | 10 | 10 |
| lead the meeting | 55.0819 | 55.5266 | 1.0081 | 0.30 | 0.4447 | 0.0081 | 58.4022 | 1.0603 | +inf | 10 | 10 | 10 |
| remember details | 51.0780 | 51.2523 | 1.0034 | 0.10 | 0.1743 | 0.0034 | 55.2419 | 1.0815 | +inf | 10 | 10 | 10 |
| strike a drum | 57.9438 | 58.1352 | 1.0033 | 0.10 | 0.1913 | 0.0033 | 62.1045 | 1.0718 | +inf | 10 | 10 | 10 |

