# H_u / H_s analysis — gpt2 · medial · geo

This report compares an idiom dataset against a parallel dataset of literal verb phrases (non-idioms). All entropies are uniform MC averages over each phrase's observed contexts, in nats; scores are unnormalized geometric-mean (or joint) per-token LM probabilities, and this run uses the geometric mean (see Configuration). See `README.md` / `code/SCORING_MATH.md` for the derivation.

> **How to read these magnitudes** (directions, why H_s is +inf, the new finite synergy metrics): see [`INTERPRETATION.md`](../INTERPRETATION.md). Quick key — ↑`H_u/H(p)`, ↑`syn_frac`, ↑`H_s^log` mean **more** synergy; ↑`H_s^reg` and the original ↑`H_s` mean **less** synergy.

## Configuration

| field | idioms run | non-idioms run |
|---|---|---|
| model | gpt2 | gpt2 |
| reduction | geometric_mean | geometric_mean |
| medial_only | True | True |
| dtype | float32 | float32 |
| num_idioms | 18 | 18 |
| dataset | /home/prada/PID_evaluation/data/dataset.tsv | /home/prada/PID_evaluation/data/nonidioms_dataset.tsv |

The bound `H_u + H_s ≥ 2H(p) + 2 log 2 ≥ H(p)` holds for 18/18 idioms and 18/18 non-idioms.
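
One way to see why the first inequality should hold: in a synergistic context (p > m, with m = max(q,r)), AM-GM gives m(p − m) ≤ p²/4, so −log m − log(p − m) ≥ −2 log p + 2 log 2, and averaging over contexts preserves the inequality. The sketch below checks this numerically on synthetic probabilities. The function name `bound_gap` and the sampled values are illustrative assumptions, not the repository's code.

```python
import math
import random

def bound_gap(p, m):
    """(H_u + H_s) - (2*H(p) + 2*log 2) for one synergistic context, 0 < m < p."""
    h_p = -math.log(p)
    h_u = -math.log(min(p, m))   # equals -log m because m < p
    h_s = -math.log(p - m)
    return (h_u + h_s) - (2 * h_p + 2 * math.log(2))

rng = random.Random(0)           # fixed seed, synthetic data only
gaps = []
for _ in range(10_000):
    p = rng.uniform(1e-6, 1.0)
    m = p * rng.uniform(0.001, 0.999)   # forces 0 < m < p
    gaps.append(bound_gap(p, m))

min_gap = min(gaps)              # should never be negative beyond rounding
tight = bound_gap(0.2, 0.1)      # m = p/2 is the case where the bound is tight
```

For phrases with `H_s = +inf` the bound holds trivially, which is consistent with the 18/18 counts above.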

`H_s = +inf` (at least one non-synergistic slot, i.e. a context with p ≤ max(q,r)) for 3/18 idioms and 10/18 non-idioms. A *finite* H_s means **every** context of that phrase is synergistic (p > max(q,r) everywhere).
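
The per-phrase quantities can be sketched from the one-line definitions given in the metric sections of this report. This is a minimal illustration under assumptions: each context supplies a triple `(p, q, r)` of scores, every metric is a plain mean over contexts, and the names `phrase_metrics` and `contexts` are hypothetical. The regularized `H_s^reg` is omitted because its exact epsilon floor is not stated here.

```python
import math

def phrase_metrics(contexts):
    """contexts: list of (p, q, r) score triples, one per observed context."""
    n = len(contexts)
    h_p = h_u = h_s = hs_log = hs_signed = 0.0
    n_syn = 0
    for p, q, r in contexts:
        m = max(q, r)
        h_p += -math.log(p) / n
        h_u += -math.log(min(p, m)) / n
        diff = math.log(p) - math.log(m)
        hs_signed += diff / n            # may go negative
        hs_log += max(0.0, diff) / n     # clipped at zero, so always finite
        if p > m:
            n_syn += 1
            h_s += -math.log(p - m) / n
        else:
            h_s = math.inf               # one non-synergistic context is enough
    return {
        "H_p": h_p, "H_u": h_u, "H_s": h_s,
        "syn_frac": n_syn / n,
        "H_s_log": hs_log, "H_s_log_signed": hs_signed,
    }

# Synthetic example: the second context has p < max(q, r).
example = phrase_metrics([(0.02, 0.01, 0.005), (0.03, 0.04, 0.01)])
```

In this toy input `H_s` is `+inf` while `syn_frac` and `H_s^log` stay finite, which is the behavior the finite metrics are meant to provide.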

## Per-metric summary (idioms vs non-idioms)

Means and 95% bootstrap CIs (20k resamples, percentile method, phrase-level). Non-finite values are dropped per metric before bootstrapping; the drop count is shown.
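
The procedure described above can be sketched as follows: drop non-finite values, resample phrases with replacement, and take percentiles of the resampled means. The function name, the seed, and the percentile indexing are assumptions, so the endpoints should only approximately match the tables. The input is the `H_s (orig)` column for idioms from the per-phrase table.

```python
import math
import random
import statistics

def bootstrap_mean_ci(values, n_resamples=20_000, alpha=0.05, seed=0):
    """Percentile bootstrap CI of the mean; returns (mean, lo, hi, n_dropped)."""
    finite = [v for v in values if math.isfinite(v)]
    dropped = len(values) - len(finite)
    rng = random.Random(seed)
    # Phrase-level resampling: draw len(finite) phrases with replacement.
    means = sorted(
        statistics.fmean(rng.choices(finite, k=len(finite)))
        for _ in range(n_resamples)
    )
    lo = means[int(alpha / 2 * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return statistics.fmean(finite), lo, hi, dropped

INF = math.inf
h_s_idioms = [3.9678, 4.6708, 4.0572, 4.5255, 4.5148, 5.4627, 5.5486, INF,
              4.8957, 5.5422, 5.3052, 5.7849, 6.0204, 6.5163, 5.6908, 7.3828,
              INF, INF]
mean, lo, hi, dropped = bootstrap_mean_ci(h_s_idioms)
```

Because the resampling is random, a different seed or percentile convention will shift the endpoints slightly.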

### H(p)

*base entropy, uniform MC average over contexts (nats). ↓ smaller = phrase more concentrated*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 4.4053 | 4.3659 | 3.3171 | 5.6030 | [4.1189, 4.6964] |
| non-idioms | 18 | 0 | 4.7328 | 4.6320 | 3.8735 | 5.9359 | [4.4719, 5.0095] |

**Cross-dataset gap** (idioms − non-idioms): Δ = -0.3275  (95% CI [-0.7284, 0.0684]) → idioms < non-idioms, not significant.

### H_u

*unique / redundant entropy = -log min{p, max(q,r)} (nats); >= H(p)*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 5.0027 | 4.9214 | 4.0037 | 6.0681 | [4.7603, 5.2591] |
| non-idioms | 18 | 0 | 5.0005 | 4.9516 | 4.1539 | 5.9359 | [4.7606, 5.2490] |

**Cross-dataset gap** (idioms − non-idioms): Δ = 0.0022  (95% CI [-0.3518, 0.3554]) → idioms > non-idioms, not significant.

### H_u / H(p)

*unique-information ratio (>= 1). ↑ bigger = MORE synergy. THE headline metric*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 1.1425 | 1.1397 | 1.0374 | 1.3514 | [1.1101, 1.1789] |
| non-idioms | 18 | 0 | 1.0590 | 1.0579 | 1.0000 | 1.1286 | [1.0454, 1.0733] |

**Cross-dataset gap** (idioms − non-idioms): Δ = 0.0835  (95% CI [0.0481, 0.1223]) → idioms > non-idioms, **significant**.
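
A gap CI of this kind can be approximated by bootstrapping the difference of means. The sketch assumes the two datasets are resampled independently and that "significant" means the 95% CI excludes zero; neither detail is confirmed by this report, and the function name is hypothetical. The inputs are the `H_u/H` columns from the per-phrase tables.

```python
import random
import statistics

IDIOMS = [1.3514, 1.2153, 1.2070, 1.1993, 1.1881, 1.1726, 1.1686, 1.1515,
          1.1406, 1.1389, 1.1328, 1.1002, 1.0958, 1.0924, 1.0827, 1.0457,
          1.0451, 1.0374]
NON_IDIOMS = [1.1286, 1.1021, 1.0929, 1.0799, 1.0740, 1.0724, 1.0663, 1.0651,
              1.0582, 1.0576, 1.0536, 1.0502, 1.0458, 1.0434, 1.0335, 1.0255,
              1.0127, 1.0000]

def bootstrap_gap_ci(a, b, n_resamples=20_000, alpha=0.05, seed=0):
    """Percentile CI for mean(a) - mean(b), resampling a and b independently."""
    rng = random.Random(seed)
    gaps = sorted(
        statistics.fmean(rng.choices(a, k=len(a)))
        - statistics.fmean(rng.choices(b, k=len(b)))
        for _ in range(n_resamples)
    )
    lo = gaps[int(alpha / 2 * n_resamples)]
    hi = gaps[int((1 - alpha / 2) * n_resamples) - 1]
    return statistics.fmean(a) - statistics.fmean(b), lo, hi

delta, lo, hi = bootstrap_gap_ci(IDIOMS, NON_IDIOMS)
significant = lo > 0 or hi < 0   # assumed rule: CI excludes zero
```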

### syn_frac

*synergy coverage in [0,1] = fraction of contexts with p > m, where m = max(q,r). ↑ bigger = MORE synergy (most intuitive)*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 0.9444 | 1.0000 | 0.6000 | 1.0000 | [0.8778, 1.0000] |
| non-idioms | 18 | 0 | 0.7667 | 0.8000 | 0.0000 | 1.0000 | [0.6333, 0.8889] |

**Cross-dataset gap** (idioms − non-idioms): Δ = 0.1778  (95% CI [0.0444, 0.3333]) → idioms > non-idioms, **significant**.

### H_s^log

*log-space synergy = mean max{0, log p - log m} (nats), with m = max(q,r). ↑ bigger = MORE synergy; always finite*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 0.5974 | 0.5976 | 0.1892 | 1.2748 | [0.4807, 0.7215] |
| non-idioms | 18 | 0 | 0.2678 | 0.2533 | 0.0000 | 0.5245 | [0.2079, 0.3285] |

**Cross-dataset gap** (idioms − non-idioms): Δ = 0.3297  (95% CI [0.1995, 0.4679]) → idioms > non-idioms, **significant**.

### H_s^log / H(p)

*log-space synergy ratio. ↑ bigger = MORE synergy*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 0.1425 | 0.1397 | 0.0374 | 0.3514 | [0.1101, 0.1789] |
| non-idioms | 18 | 0 | 0.0590 | 0.0579 | 0.0000 | 0.1286 | [0.0454, 0.0733] |

**Cross-dataset gap** (idioms − non-idioms): Δ = 0.0835  (95% CI [0.0481, 0.1223]) → idioms > non-idioms, **significant**.
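
This gap and CI are identical to those of `H_u / H(p)`, which is expected from the definitions: −log min{p, m} + log p = max{0, log p − log m}, so `H_s^log = H_u − H(p)` and `H_s^log / H(p) = H_u / H(p) − 1` for every phrase. The short check below confirms this on a few rows copied from the per-phrase tables, up to the four-decimal rounding of the report.

```python
# (phrase, H(p), H_u, H_u/H, H_s^log, H_s^log/H) copied from the per-phrase tables
ROWS = [
    ("strike a chord",  3.6283, 4.9031, 1.3514, 1.2748, 0.3514),
    ("get the sack",    5.1577, 5.3506, 1.0374, 0.1930, 0.0374),
    ("call the police", 4.0781, 4.6026, 1.1286, 0.5245, 0.1286),
    ("tie knots",       4.8798, 5.2406, 1.0740, 0.3609, 0.0740),
    ("strike a drum",   5.9359, 5.9359, 1.0000, 0.0000, 0.0000),
]

TOL = 2e-4  # allows for rounding to four decimals in the tables

# Largest deviation from H_s^log = H_u - H(p) across the sampled rows.
max_abs_err = max(abs((h_u - h_p) - hs_log)
                  for _, h_p, h_u, _, hs_log, _ in ROWS)
# Largest deviation from H_s^log/H = H_u/H - 1 across the sampled rows.
max_ratio_err = max(abs((ratio - 1.0) - log_ratio)
                    for _, _, _, ratio, _, log_ratio in ROWS)
identity_holds = max_abs_err < TOL and max_ratio_err < TOL
```

The two ratio metrics therefore carry the same information, and only one of them needs to be reported as a headline figure.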

### H_s^log signed

*signed log-space synergy = mean(log p - log m); can be negative (net anti-synergistic)*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 0.5781 | 0.5771 | 0.0522 | 1.2748 | [0.4540, 0.7105] |
| non-idioms | 18 | 0 | 0.2016 | 0.2276 | -0.3362 | 0.5245 | [0.1060, 0.2896] |

**Cross-dataset gap** (idioms − non-idioms): Δ = 0.3765  (95% CI [0.2226, 0.5384]) → idioms > non-idioms, **significant**.

### H_s^reg

*regularized H_s (eps-floored, finite, continuous). ↑ bigger = LESS synergy*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 5.5775 | 5.5454 | 3.9678 | 7.7969 | [5.1021, 6.0557] |
| non-idioms | 18 | 0 | 6.9789 | 6.6274 | 5.1008 | 10.5411 | [6.3877, 7.6383] |

**Cross-dataset gap** (idioms − non-idioms): Δ = -1.4014  (95% CI [-2.2167, -0.6279]) → idioms < non-idioms, **significant**.

### H_s^reg / H(p)

*regularized synergy ratio (>= 1). ↑ bigger = LESS synergy*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 1.2635 | 1.2246 | 1.0936 | 1.5340 | [1.2061, 1.3279] |
| non-idioms | 18 | 0 | 1.4675 | 1.4730 | 1.2407 | 1.7758 | [1.3969, 1.5390] |

**Cross-dataset gap** (idioms − non-idioms): Δ = -0.2040  (95% CI [-0.2981, -0.1073]) → idioms < non-idioms, **significant**.

### H_s (original)

*synergy entropy = -log max{0, p - max(q,r)} (nats); +inf if ANY slot is non-synergistic (3/18 idioms and 10/18 non-idioms in this run)*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 15 | 3 | 5.3257 | 5.4627 | 3.9678 | 7.3828 | [4.8867, 5.7945] |
| non-idioms | 8 | 10 | 5.8921 | 5.9721 | 5.1008 | 6.5228 | [5.5291, 6.2408] |

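**Cross-dataset gap** (idioms − non-idioms): Δ = -0.5664  (95% CI [-1.1328, 0.0193]) → idioms < non-idioms, not significant. Only finite values enter this comparison (15 idioms, 8 non-idioms), so every phrase with a non-synergistic context is excluded.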

### H_s / H(p)

*original synergy ratio (+inf wherever H_s is +inf; use H_s^log or syn_frac instead)*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 15 | 3 | 1.2110 | 1.1894 | 1.0936 | 1.3177 | [1.1769, 1.2446] |
| non-idioms | 8 | 10 | 1.3518 | 1.3581 | 1.2407 | 1.4803 | [1.2884, 1.4155] |

**Cross-dataset gap** (idioms − non-idioms): Δ = -0.1408  (95% CI [-0.2135, -0.0686]) → idioms < non-idioms, **significant**.

## Per-phrase detail

### Idioms (sorted by H_u/H, descending)

| phrase | H(p) | H_u | H_u/H | syn_frac | H_s^log | H_s^log/H | H_s^reg | H_s^reg/H | H_s (orig) | n_idiom | n_head | n_non |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| strike a chord | 3.6283 | 4.9031 | 1.3514 | 1.00 | 1.2748 | 0.3514 | 3.9678 | 1.0936 | 3.9678 | 5 | 5 | 5 |
| cut corners | 3.9856 | 4.8435 | 1.2153 | 1.00 | 0.8579 | 0.2153 | 4.6708 | 1.1719 | 4.6708 | 5 | 5 | 5 |
| rock the boat | 3.3171 | 4.0037 | 1.2070 | 1.00 | 0.6866 | 0.2070 | 4.0572 | 1.2231 | 4.0572 | 5 | 5 | 5 |
| spill the beans | 3.8214 | 4.5830 | 1.1993 | 1.00 | 0.7616 | 0.1993 | 4.5255 | 1.1843 | 4.5255 | 5 | 5 | 5 |
| break the mold | 3.7958 | 4.5096 | 1.1881 | 1.00 | 0.7138 | 0.1881 | 4.5148 | 1.1894 | 4.5148 | 5 | 5 | 5 |
| lose ground | 4.8783 | 5.7205 | 1.1726 | 1.00 | 0.8422 | 0.1726 | 5.4627 | 1.1198 | 5.4627 | 5 | 5 | 5 |
| bite the dust | 4.8817 | 5.7048 | 1.1686 | 1.00 | 0.8231 | 0.1686 | 5.5486 | 1.1366 | 5.5486 | 5 | 5 | 5 |
| make waves | 4.0999 | 4.7212 | 1.1515 | 0.60 | 0.6213 | 0.1515 | 6.2827 | 1.5324 | +inf | 5 | 5 | 5 |
| call the shots | 3.9929 | 4.5543 | 1.1406 | 1.00 | 0.5614 | 0.1406 | 4.8957 | 1.2261 | 4.8957 | 5 | 5 | 5 |
| pull strings | 4.3375 | 4.9398 | 1.1389 | 1.00 | 0.6023 | 0.1389 | 5.5422 | 1.2778 | 5.5422 | 5 | 5 | 5 |
| clear the air | 4.4632 | 5.0560 | 1.1328 | 1.00 | 0.5928 | 0.1328 | 5.3052 | 1.1886 | 5.3052 | 5 | 5 | 5 |
| run the show | 4.6156 | 5.0782 | 1.1002 | 1.00 | 0.4626 | 0.1002 | 5.7849 | 1.2533 | 5.7849 | 5 | 5 | 5 |
| lead the field | 4.5768 | 5.0151 | 1.0958 | 1.00 | 0.4383 | 0.0958 | 6.0204 | 1.3154 | 6.0204 | 5 | 5 | 5 |
| turn tail | 5.5550 | 6.0681 | 1.0924 | 1.00 | 0.5131 | 0.0924 | 6.5163 | 1.1730 | 6.5163 | 5 | 5 | 5 |
| raise hell | 4.3943 | 4.7578 | 1.0827 | 1.00 | 0.3635 | 0.0827 | 5.6908 | 1.2951 | 5.6908 | 5 | 5 | 5 |
| mean business | 5.6030 | 5.8590 | 1.0457 | 1.00 | 0.2560 | 0.0457 | 7.3828 | 1.3177 | 7.3828 | 5 | 5 | 5 |
| have a ball | 4.1915 | 4.3807 | 1.0451 | 0.80 | 0.1892 | 0.0451 | 6.4295 | 1.5340 | +inf | 5 | 5 | 5 |
| get the sack | 5.1577 | 5.3506 | 1.0374 | 0.60 | 0.1930 | 0.0374 | 7.7969 | 1.5117 | +inf | 5 | 5 | 5 |

### Non-idioms (sorted by H_u/H, descending)

| phrase | H(p) | H_u | H_u/H | syn_frac | H_s^log | H_s^log/H | H_s^reg | H_s^reg/H | H_s (orig) | n_idiom | n_head | n_non |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| call the police | 4.0781 | 4.6026 | 1.1286 | 1.00 | 0.5245 | 0.1286 | 5.1008 | 1.2508 | 5.1008 | 5 | 5 | 5 |
| cut hair | 4.3999 | 4.8493 | 1.1021 | 1.00 | 0.4494 | 0.1021 | 5.4592 | 1.2407 | 5.4592 | 5 | 5 | 5 |
| raise children | 4.5984 | 5.0256 | 1.0929 | 1.00 | 0.4272 | 0.0929 | 5.7405 | 1.2484 | 5.7405 | 5 | 5 | 5 |
| turn dials | 5.4390 | 5.8738 | 1.0799 | 0.80 | 0.4348 | 0.0799 | 7.0690 | 1.2997 | +inf | 5 | 5 | 5 |
| tie knots | 4.8798 | 5.2406 | 1.0740 | 1.00 | 0.3609 | 0.0740 | 6.5126 | 1.3346 | 6.5126 | 5 | 5 | 5 |
| throw a ball | 3.8735 | 4.1539 | 1.0724 | 1.00 | 0.2804 | 0.0724 | 5.3512 | 1.3815 | 5.3512 | 5 | 5 | 5 |
| eat the apple | 4.6419 | 4.9498 | 1.0663 | 0.80 | 0.3080 | 0.0663 | 6.8644 | 1.4788 | +inf | 5 | 5 | 5 |
| build the boat | 3.9114 | 4.1661 | 1.0651 | 0.80 | 0.2547 | 0.0651 | 6.4600 | 1.6516 | +inf | 5 | 5 | 5 |
| make lunch | 4.7423 | 5.0182 | 1.0582 | 0.60 | 0.2759 | 0.0582 | 7.3028 | 1.5399 | +inf | 5 | 5 | 5 |
| break the window | 4.2282 | 4.4718 | 1.0576 | 1.00 | 0.2437 | 0.0576 | 6.2036 | 1.4672 | 6.2036 | 5 | 5 | 5 |
| see the show | 4.7014 | 4.9533 | 1.0536 | 0.40 | 0.2519 | 0.0536 | 7.7711 | 1.6529 | +inf | 5 | 5 | 5 |
| clear the table | 4.6220 | 4.8540 | 1.0502 | 1.00 | 0.2320 | 0.0502 | 6.5228 | 1.4112 | 6.5228 | 5 | 5 | 5 |
| get a present | 4.5308 | 4.7382 | 1.0458 | 0.80 | 0.2074 | 0.0458 | 6.7320 | 1.4858 | +inf | 5 | 5 | 5 |
| spill the water | 4.2194 | 4.4027 | 1.0434 | 1.00 | 0.1832 | 0.0434 | 6.2459 | 1.4803 | 6.2459 | 5 | 5 | 5 |
| lead the meeting | 5.1811 | 5.3545 | 1.0335 | 0.40 | 0.1734 | 0.0335 | 8.6444 | 1.6684 | +inf | 5 | 5 | 5 |
| lose keys | 5.4762 | 5.6159 | 1.0255 | 0.80 | 0.1397 | 0.0255 | 7.8671 | 1.4366 | +inf | 5 | 5 | 5 |
| remember details | 5.7302 | 5.8030 | 1.0127 | 0.40 | 0.0727 | 0.0127 | 9.2324 | 1.6112 | +inf | 5 | 5 | 5 |
| strike a drum | 5.9359 | 5.9359 | 1.0000 | 0.00 | 0.0000 | 0.0000 | 10.5411 | 1.7758 | +inf | 5 | 5 | 5 |

