# H_u / H_s analysis — gpt2 · medial · joint

This report compares the idiom dataset with a parallel literal-VP (non-idiom) dataset. All entropies are uniform Monte Carlo (MC) averages over each phrase's observed contexts; scores are unnormalized per-token LM probabilities, reduced either by geometric mean or, as in this run, jointly. See `README.md` / `code/SCORING_MATH.md` for the derivation.
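
As a rough illustration of the two reductions named above, the sketch below turns a list of per-token log-probabilities into a joint score (sum of log-probs) and a geometric-mean score (mean of log-probs). The function names and sample values are hypothetical and are not taken from the run; `code/SCORING_MATH.md` remains the reference for what the pipeline actually does.

```python
import math

def joint_logprob(token_logprobs):
    """Log of the joint probability: sum of per-token log-probs (nats)."""
    return sum(token_logprobs)

def geomean_logprob(token_logprobs):
    """Log of the geometric-mean per-token probability: mean of the log-probs."""
    return sum(token_logprobs) / len(token_logprobs)

# Hypothetical per-token log-probs for a 4-token span (illustrative only).
example = [-2.0, -0.5, -3.5, -1.0]
joint = joint_logprob(example)
geomean = geomean_logprob(example)

# The two scores are linked by the token count: exp(geomean) ** n == exp(joint).
consistent = math.isclose(math.exp(geomean) ** len(example), math.exp(joint))
```

Under the joint reduction a surprisal such as `-joint` grows with the number of scored tokens, whereas the geometric-mean version is length-normalized.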

> **How to read these magnitudes** (directions, why H_s is +inf, the new finite synergy metrics): see [`INTERPRETATION.md`](../INTERPRETATION.md). Quick key — ↑`H_u/H(p)`, ↑`syn_frac`, ↑`H_s^log` mean **more** synergy; ↑`H_s^reg` and the original ↑`H_s` mean **less** synergy.

## Configuration

| field | idioms run | non-idioms run |
|---|---|---|
| model | gpt2 | gpt2 |
| reduction | joint | joint |
| medial_only | True | True |
| dtype | float32 | float32 |
| num_idioms | 18 | 18 |
| dataset | /home/prada/PID_evaluation/data/dataset.tsv | /home/prada/PID_evaluation/data/nonidioms_dataset.tsv |

The bound `H_u + H_s ≥ 2H(p) + 2log2 ≥ H(p)` holds for 18/18 idioms and 18/18 non-idioms; it is trivially satisfied wherever `H_s = +inf`.
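
One way to see where the `2log2` term comes from, assuming the per-context definitions given in the metric captions of this report (`H_u = -log min{p, m}`, `H_s = -log(p - m)`, with `m = max(q,r)`): when `p > m`, the product `m(p - m)` is at most `p²/4`, so `-log m - log(p - m) ≥ -2 log p + 2 log 2`; when `p ≤ m`, `H_s` is `+inf`. The sketch below checks the inequality on hypothetical probabilities and on two finite rows of the idiom table. The helper name `bound_holds` is made up for illustration.

```python
import math

def bound_holds(h_p, h_u, h_s):
    """Check H_u + H_s >= 2*H(p) + 2*log(2), all in nats."""
    return h_u + h_s >= 2.0 * h_p + 2.0 * math.log(2.0)

# Per-context check on hypothetical probabilities with p > m.
p, m = 1e-3, 4e-4
h_p = -math.log(p)
h_u = -math.log(min(p, m))
h_s = -math.log(p - m)
single_context_ok = bound_holds(h_p, h_u, h_s)

# Equality case: m = p / 2 maximizes m * (p - m), so the slack is zero.
m_half = p / 2
slack = (-math.log(m_half) - math.log(p - m_half)) - (2 * h_p + 2 * math.log(2))

# Two finite rows from the idiom table: (H(p), H_u, H_s original).
rows = {
    "strike a chord": (39.9997, 52.4971, 39.9997),
    "rock the boat": (38.1421, 43.4746, 38.1627),
}
table_ok = {name: bound_holds(*vals) for name, vals in rows.items()}
```

Because the inequality holds context by context, it survives the uniform average over contexts.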

`H_s = +inf` (at least one non-synergistic slot, i.e. a context with p ≤ max(q,r)) for 8/18 idioms and 16/18 non-idioms. A *finite* H_s means **every** context of that phrase is synergistic (p > max(q,r) everywhere).
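
A minimal sketch of why a single non-synergistic context makes the original `H_s` infinite, assuming it is the uniform average of `-log max{0, p - max(q,r)}` over contexts as the captions in this report state. The function name and the log-probabilities are hypothetical; the difference is taken in log space because the probabilities involved are far too small to subtract safely otherwise.

```python
import math

def h_s_original(log_p, log_q, log_r):
    """Mean over contexts of -log(p - max(q, r)); +inf if any context has p <= max(q, r).

    Inputs are equal-length lists of per-context log-probabilities (nats).
    """
    total = 0.0
    for lp, lq, lr in zip(log_p, log_q, log_r):
        lm = max(lq, lr)
        if lp <= lm:
            return math.inf  # one non-synergistic context is enough
        # log(p - m) = log p + log(1 - exp(log m - log p)), via expm1 for accuracy
        total += -(lp + math.log(-math.expm1(lm - lp)))
    return total / len(log_p)

# Hypothetical phrase with two contexts, both synergistic.
all_syn = h_s_original([-50.0, -60.0], [-55.0, -70.0], [-58.0, -66.0])

# Same phrase, but q beats p in the second context.
one_bad = h_s_original([-50.0, -60.0], [-55.0, -59.0], [-58.0, -66.0])
```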

## Per-metric summary (idioms vs non-idioms)

Means and 95% bootstrap CIs (20k resamples, percentile method, phrase-level). Non-finite values are dropped per metric before bootstrapping; the drop count is shown.
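
The procedure described above can be sketched with the standard library as follows. This is an illustrative reimplementation, not the report's own code: the seed, the exact percentile indexing, and the name `bootstrap_mean_ci` are assumptions, so the interval should land close to the tabulated one without matching it digit for digit.

```python
import math
import random
import statistics

def bootstrap_mean_ci(values, n_resamples=20_000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean, resampling phrases with replacement."""
    finite = [v for v in values if math.isfinite(v)]  # drop +inf / nan first
    dropped = len(values) - len(finite)
    rng = random.Random(seed)
    means = sorted(
        statistics.fmean(rng.choices(finite, k=len(finite)))
        for _ in range(n_resamples)
    )
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return statistics.fmean(finite), (lo, hi), dropped

# H(p) for the 18 idioms, copied from the per-phrase table in this report.
idiom_h_p = [
    39.9997, 52.2943, 46.7609, 61.7466, 38.1421, 55.2312,
    67.4779, 54.2970, 59.7931, 45.0877, 67.9995, 56.2171,
    52.7186, 54.8350, 51.7177, 56.4572, 56.0343, 51.2986,
]
mean, (lo, hi), dropped = bootstrap_mean_ci(idiom_h_p)
```

Resampling at the phrase level treats each phrase as one observation, so the interval reflects variation across phrases rather than across individual contexts.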

### H(p)

*base entropy, uniform MC average over contexts (nats). ↓ smaller = phrase more concentrated*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 53.7838 | 54.5660 | 38.1421 | 67.9995 | [50.1247, 57.3834] |
| non-idioms | 18 | 0 | 56.8981 | 56.7745 | 47.1794 | 72.3043 | [53.9765, 59.9425] |

**Cross-dataset gap** (idioms − non-idioms): Δ = -3.1143  (95% CI [-7.7933, 1.5333]) → idioms < non-idioms, not significant.

### H_u

*unique / redundant entropy = -log min{p, max(q,r)} (nats); >= H(p)*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 59.2072 | 57.7657 | 43.4746 | 74.8318 | [55.5362, 62.9943] |
| non-idioms | 18 | 0 | 58.4748 | 59.9332 | 47.1794 | 74.7678 | [55.4896, 61.5377] |

**Cross-dataset gap** (idioms − non-idioms): Δ = 0.7324  (95% CI [-4.0480, 5.5562]) → idioms > non-idioms, not significant.

### H_u / H(p)

*unique-information ratio (>= 1). ↑ bigger = MORE synergy. THE headline metric*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 1.1050 | 1.0997 | 1.0186 | 1.3124 | [1.0753, 1.1394] |
| non-idioms | 18 | 0 | 1.0280 | 1.0222 | 1.0000 | 1.0937 | [1.0184, 1.0391] |

**Cross-dataset gap** (idioms − non-idioms): Δ = 0.0770  (95% CI [0.0456, 0.1127]) → idioms > non-idioms, **significant**.

### syn_frac

*synergy coverage in [0,1] = fraction of contexts with p > m, where m = max(q,r). ↑ bigger = MORE synergy (most intuitive)*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 0.8222 | 1.0000 | 0.4000 | 1.0000 | [0.7111, 0.9222] |
| non-idioms | 18 | 0 | 0.5667 | 0.6000 | 0.0000 | 1.0000 | [0.4333, 0.6889] |

**Cross-dataset gap** (idioms − non-idioms): Δ = 0.2556  (95% CI [0.0889, 0.4222]) → idioms > non-idioms, **significant**.
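
A sketch of how a gap line like the one above could be produced. It assumes the two datasets are resampled independently and that "significant" means the 95% CI excludes zero; both are inferences from the reported numbers rather than documented behaviour, and `bootstrap_gap_ci` is a hypothetical name. The `syn_frac` values are copied from the per-phrase tables (order does not matter for the bootstrap).

```python
import random
import statistics

def bootstrap_gap_ci(a, b, n_resamples=20_000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for mean(a) - mean(b), resampling each group independently."""
    rng = random.Random(seed)
    gaps = sorted(
        statistics.fmean(rng.choices(a, k=len(a)))
        - statistics.fmean(rng.choices(b, k=len(b)))
        for _ in range(n_resamples)
    )
    lo = gaps[int((alpha / 2) * n_resamples)]
    hi = gaps[int((1 - alpha / 2) * n_resamples) - 1]
    delta = statistics.fmean(a) - statistics.fmean(b)
    significant = lo > 0 or hi < 0  # assumed rule: CI excludes zero
    return delta, (lo, hi), significant

# syn_frac per phrase, grouped by value.
idiom_syn_frac = [1.0] * 10 + [0.8] * 2 + [0.6] * 4 + [0.4] * 2
non_idiom_syn_frac = [1.0] * 2 + [0.8] * 4 + [0.6] * 6 + [0.4] * 3 + [0.2] + [0.0] * 2

delta, (lo, hi), significant = bootstrap_gap_ci(idiom_syn_frac, non_idiom_syn_frac)
```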

### H_s^log

*log-space synergy = mean max{0, log p - log m} (nats), with m = max(q,r). ↑ bigger = MORE synergy; always finite*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 5.4234 | 5.4605 | 0.9547 | 12.4974 | [4.0320, 6.8552] |
| non-idioms | 18 | 0 | 1.5767 | 1.2526 | 0.0000 | 5.1576 | [1.0456, 2.1888] |

**Cross-dataset gap** (idioms − non-idioms): Δ = 3.8467  (95% CI [2.3584, 5.3947]) → idioms > non-idioms, **significant**.

### H_s^log / H(p)

*log-space synergy ratio. ↑ bigger = MORE synergy*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 0.1050 | 0.0997 | 0.0186 | 0.3124 | [0.0753, 0.1394] |
| non-idioms | 18 | 0 | 0.0280 | 0.0222 | 0.0000 | 0.0937 | [0.0184, 0.0391] |

**Cross-dataset gap** (idioms − non-idioms): Δ = 0.0770  (95% CI [0.0456, 0.1127]) → idioms > non-idioms, **significant**.
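
The `H_s^log / H(p)` table is the `H_u / H(p)` table shifted down by exactly 1, and the two gap lines coincide. Given the caption definitions this is expected: `-log min{p, m} = -log p + max{0, log p - log m}`, so `H_u = H(p) + H_s^log` for every phrase and the two ratios carry the same information. The sketch below checks the identity on hypothetical contexts and on three rows of the per-phrase tables; the helper names are illustrative.

```python
import math

def h_u_term(log_p, log_m):
    """Per-context -log min{p, m}."""
    return -min(log_p, log_m)

def h_s_log_term(log_p, log_m):
    """Per-context max{0, log p - log m}."""
    return max(0.0, log_p - log_m)

# Identity on hypothetical contexts: one synergistic (p > m), one not (p <= m).
contexts = [(-50.0, -57.5), (-61.0, -60.0)]
identity_ok = all(
    math.isclose(h_u_term(lp, lm), -lp + h_s_log_term(lp, lm))
    for lp, lm in contexts
)

# Same check on table rows: (H(p), H_u, H_s^log); tolerance covers 4-decimal rounding.
rows = [
    (39.9997, 52.4971, 12.4974),  # strike a chord
    (52.2943, 61.6047, 9.3103),   # spill the beans
    (47.1794, 47.1794, 0.0),      # remember details
]
rows_ok = all(abs(h_u - (h_p + h_s_log)) < 2e-4 for h_p, h_u, h_s_log in rows)
```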

### H_s^log signed

*signed log-space synergy = mean(log p - log m); can be negative (net anti-synergistic)*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 4.8815 | 5.4605 | -1.0043 | 12.4974 | [3.2252, 6.5642] |
| non-idioms | 18 | 0 | -0.0883 | 0.3476 | -4.4377 | 5.1576 | [-1.2456, 1.0750] |

**Cross-dataset gap** (idioms − non-idioms): Δ = 4.9698  (95% CI [2.9395, 7.0310]) → idioms > non-idioms, **significant**.
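
The three log-space quantities reported so far (`syn_frac`, `H_s^log`, and its signed variant) all derive from the same per-context difference `log p - log m`. A minimal sketch, assuming `m = max(q,r)` and uniform averaging over contexts; the function name and input values are hypothetical.

```python
def log_space_synergy(log_p, log_q, log_r):
    """Return (syn_frac, H_s^log, signed H_s^log), each averaged over contexts."""
    diffs = [lp - max(lq, lr) for lp, lq, lr in zip(log_p, log_q, log_r)]
    n = len(diffs)
    syn_frac = sum(d > 0 for d in diffs) / n          # share of contexts with p > m
    clipped = sum(max(0.0, d) for d in diffs) / n     # negative differences count as 0
    signed = sum(diffs) / n                           # negative differences kept
    return syn_frac, clipped, signed

# Five hypothetical contexts: three synergistic, two not.
log_p = [-50.0, -52.0, -48.0, -55.0, -51.0]
log_q = [-53.0, -51.0, -50.0, -60.0, -49.0]
log_r = [-56.0, -54.0, -49.0, -58.0, -52.0]
syn_frac, clipped, signed = log_space_synergy(log_p, log_q, log_r)
```

The clipped value can never fall below the signed one, and the two coincide whenever `syn_frac` is 1.00, which is why the idiom maximum (12.4974) is the same in both tables.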

### H_s^reg

*regularized H_s (eps-floored, finite, continuous). ↑ bigger = LESS synergy*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 54.6881 | 54.9165 | 38.1627 | 68.3421 | [50.9729, 58.2399] |
| non-idioms | 18 | 0 | 59.0665 | 57.7988 | 50.3998 | 73.5221 | [56.1706, 62.0968] |

**Cross-dataset gap** (idioms − non-idioms): Δ = -4.3784  (95% CI [-9.1017, 0.1681]) → idioms < non-idioms, not significant.
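
The exact floor behind `H_s^reg` is not stated in this report. The per-phrase tables are, however, consistent with a relative floor of the form `-log max{p - m, eps * p}` with `eps = 0.01`: for the two phrases with `syn_frac = 0.00`, `H_s^reg - H(p)` is 4.6051, which is `log 100` to four decimals. The sketch below encodes that guess. Both the functional form and the value of `eps` are inferred, not confirmed, so `INTERPRETATION.md` and `code/SCORING_MATH.md` should be treated as authoritative.

```python
import math

EPS = 0.01  # assumed relative floor, inferred from the tables; not confirmed by the source

def h_s_reg(log_p, log_q, log_r, eps=EPS):
    """Mean over contexts of -log max{p - m, eps * p}, m = max(q, r) (assumed form)."""
    total = 0.0
    for lp, lq, lr in zip(log_p, log_q, log_r):
        lm = max(lq, lr)
        log_floor = lp + math.log(eps)
        if lp > lm:
            # log(p - m) computed in log space, then floored
            log_gap = lp + math.log(-math.expm1(lm - lp))
            total += -max(log_gap, log_floor)
        else:
            total += -log_floor  # non-synergistic context: floor applies
    return total / len(log_p)

# Rows with syn_frac = 0.00 from the non-idiom table: (H(p), H_s^reg).
rows = {"remember details": (47.1794, 51.7845), "strike a drum": (61.5643, 66.1694)}
offsets = {name: h_reg - h_p for name, (h_p, h_reg) in rows.items()}

# Hypothetical single-context phrases: one non-synergistic, one synergistic.
floored = h_s_reg([-50.0], [-49.0], [-60.0])
unfloored = h_s_reg([-50.0], [-55.0], [-58.0])
```

Under this form the metric is always finite and never smaller than `H(p)`, which matches the `>= 1` note on the ratio.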

### H_s^reg / H(p)

*regularized synergy ratio (>= 1). ↑ bigger = LESS synergy*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 1.0171 | 1.0053 | 1.0000 | 1.0600 | [1.0082, 1.0269] |
| non-idioms | 18 | 0 | 1.0390 | 1.0385 | 1.0010 | 1.0976 | [1.0285, 1.0503] |

**Cross-dataset gap** (idioms − non-idioms): Δ = -0.0218  (95% CI [-0.0363, -0.0077]) → idioms < non-idioms, **significant**.

### H_s (original)

*synergy entropy = -log max{0, p - max(q,r)} (nats); +inf if ANY slot non-synergistic (mostly +inf)*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 10 | 8 | 54.4430 | 54.9165 | 38.1627 | 68.3421 | [48.2674, 60.4389] |
| non-idioms | 2 | 16 | 54.9439 | 54.9439 | 54.4030 | 55.4849 | [54.4030, 55.4849] |

**Cross-dataset gap** (idioms − non-idioms): Δ = -0.5009  (95% CI [-6.7456, 5.5191]) → idioms < non-idioms, not significant.
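
With only two finite non-idiom values, a bootstrap mean can take just three values (either observation or their midpoint), so the percentile CI collapses to `[min, max]`, exactly as in the non-idioms row above. The sketch below reproduces that behaviour with the standard library; the seed and the percentile indexing are assumptions, and `finite_only` is a hypothetical helper name.

```python
import math
import random
import statistics

def finite_only(values):
    """Split a metric column into finite values and a count of dropped non-finite ones."""
    finite = [v for v in values if math.isfinite(v)]
    return finite, len(values) - len(finite)

# Original H_s for the 18 non-idioms: two finite values, sixteen +inf.
non_idiom_h_s = [55.4849, math.inf, 54.4030] + [math.inf] * 15
finite, dropped = finite_only(non_idiom_h_s)

rng = random.Random(0)
n_resamples = 20_000
means = sorted(
    statistics.fmean(rng.choices(finite, k=len(finite))) for _ in range(n_resamples)
)
distinct_means = sorted(set(means))  # min, midpoint, max
lo = means[int(0.025 * n_resamples)]
hi = means[int(0.975 * n_resamples) - 1]
```

About a quarter of the resamples draw the smaller value twice and another quarter the larger value twice, so both 2.5% tails sit on an observed value.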

### H_s / H(p)

*original synergy ratio (mostly +inf; use H_s^log or syn_frac instead)*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 10 | 8 | 1.0012 | 1.0001 | 1.0000 | 1.0055 | [1.0001, 1.0027] |
| non-idioms | 2 | 16 | 1.0045 | 1.0045 | 1.0010 | 1.0079 | [1.0010, 1.0079] |

**Cross-dataset gap** (idioms − non-idioms): Δ = -0.0033  (95% CI [-0.0077, 0.0011]) → idioms < non-idioms, not significant.

## Per-phrase detail

#### Idioms (sorted by H_u/H, descending)

| phrase | H(p) | H_u | H_u/H | syn_frac | H_s^log | H_s^log/H | H_s^reg | H_s^reg/H | H_s (orig) | n_idiom | n_head | n_non |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| strike a chord | 39.9997 | 52.4971 | 1.3124 | 1.00 | 12.4974 | 0.3124 | 39.9997 | 1.0000 | 39.9997 | 5 | 5 | 5 |
| spill the beans | 52.2943 | 61.6047 | 1.1780 | 1.00 | 9.3103 | 0.1780 | 52.2962 | 1.0000 | 52.2962 | 5 | 5 | 5 |
| cut corners | 46.7609 | 54.9800 | 1.1758 | 1.00 | 8.2191 | 0.1758 | 46.7636 | 1.0001 | 46.7636 | 5 | 5 | 5 |
| break the mold | 61.7466 | 70.8018 | 1.1467 | 1.00 | 9.0552 | 0.1467 | 61.7476 | 1.0000 | 61.7476 | 5 | 5 | 5 |
| rock the boat | 38.1421 | 43.4746 | 1.1398 | 1.00 | 5.3325 | 0.1398 | 38.1627 | 1.0005 | 38.1627 | 5 | 5 | 5 |
| lose ground | 55.2312 | 62.6339 | 1.1340 | 1.00 | 7.4027 | 0.1340 | 55.2370 | 1.0001 | 55.2370 | 5 | 5 | 5 |
| turn tail | 67.4779 | 74.8318 | 1.1090 | 1.00 | 7.3539 | 0.1090 | 67.4800 | 1.0000 | 67.4800 | 5 | 5 | 5 |
| call the shots | 54.2970 | 59.9429 | 1.1040 | 1.00 | 5.6460 | 0.1040 | 54.5961 | 1.0055 | 54.5961 | 5 | 5 | 5 |
| bite the dust | 59.7931 | 65.8201 | 1.1008 | 1.00 | 6.0270 | 0.1008 | 59.8048 | 1.0002 | 59.8048 | 5 | 5 | 5 |
| make waves | 45.0877 | 49.5339 | 1.0986 | 0.60 | 4.4462 | 0.0986 | 46.9369 | 1.0410 | +inf | 5 | 5 | 5 |
| clear the air | 67.9995 | 73.5880 | 1.0822 | 1.00 | 5.5885 | 0.0822 | 68.3421 | 1.0050 | 68.3421 | 5 | 5 | 5 |
| pull strings | 56.2171 | 60.7277 | 1.0802 | 0.80 | 4.5106 | 0.0802 | 57.1618 | 1.0168 | +inf | 5 | 5 | 5 |
| raise hell | 52.7186 | 56.1635 | 1.0653 | 0.80 | 3.4449 | 0.0653 | 53.7472 | 1.0195 | +inf | 5 | 5 | 5 |
| run the show | 54.8350 | 57.6054 | 1.0505 | 0.60 | 2.7704 | 0.0505 | 56.7926 | 1.0357 | +inf | 5 | 5 | 5 |
| lead the field | 51.7177 | 54.0441 | 1.0450 | 0.40 | 2.3264 | 0.0450 | 54.4896 | 1.0536 | +inf | 5 | 5 | 5 |
| get the sack | 56.4572 | 57.9259 | 1.0260 | 0.60 | 1.4687 | 0.0260 | 58.3749 | 1.0340 | +inf | 5 | 5 | 5 |
| have a ball | 56.0343 | 57.3002 | 1.0226 | 0.60 | 1.2660 | 0.0226 | 58.0745 | 1.0364 | +inf | 5 | 5 | 5 |
| mean business | 51.2986 | 52.2534 | 1.0186 | 0.40 | 0.9547 | 0.0186 | 54.3780 | 1.0600 | +inf | 5 | 5 | 5 |

#### Non-idioms (sorted by H_u/H, descending)

| phrase | H(p) | H_u | H_u/H | syn_frac | H_s^log | H_s^log/H | H_s^reg | H_s^reg/H | H_s (orig) | n_idiom | n_head | n_non |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| call the police | 55.0520 | 60.2095 | 1.0937 | 1.00 | 5.1576 | 0.0937 | 55.4849 | 1.0079 | 55.4849 | 5 | 5 | 5 |
| raise children | 55.8886 | 59.0173 | 1.0560 | 0.80 | 3.1287 | 0.0560 | 56.8329 | 1.0169 | +inf | 5 | 5 | 5 |
| cut hair | 54.3465 | 57.3731 | 1.0557 | 1.00 | 3.0265 | 0.0557 | 54.4030 | 1.0010 | 54.4030 | 5 | 5 | 5 |
| build the boat | 47.6256 | 49.4870 | 1.0391 | 0.40 | 1.8614 | 0.0391 | 50.3998 | 1.0583 | +inf | 5 | 5 | 5 |
| tie knots | 57.6605 | 59.7556 | 1.0363 | 0.80 | 2.0951 | 0.0363 | 58.7647 | 1.0192 | +inf | 5 | 5 | 5 |
| see the show | 52.2766 | 54.1114 | 1.0351 | 0.40 | 1.8348 | 0.0351 | 55.0438 | 1.0529 | +inf | 5 | 5 | 5 |
| turn dials | 72.3043 | 74.7678 | 1.0341 | 0.80 | 2.4635 | 0.0341 | 73.5221 | 1.0168 | +inf | 5 | 5 | 5 |
| throw a ball | 53.1128 | 54.4268 | 1.0247 | 0.60 | 1.3140 | 0.0247 | 55.1086 | 1.0376 | +inf | 5 | 5 | 5 |
| spill the water | 59.4695 | 60.9106 | 1.0242 | 0.80 | 1.4411 | 0.0242 | 60.7433 | 1.0214 | +inf | 5 | 5 | 5 |
| make lunch | 49.3417 | 50.3327 | 1.0201 | 0.60 | 0.9911 | 0.0201 | 51.3995 | 1.0417 | +inf | 5 | 5 | 5 |
| get a present | 49.7389 | 50.7161 | 1.0196 | 0.60 | 0.9772 | 0.0196 | 51.7620 | 1.0407 | +inf | 5 | 5 | 5 |
| break the window | 62.9167 | 64.1078 | 1.0189 | 0.60 | 1.1911 | 0.0189 | 64.9311 | 1.0320 | +inf | 5 | 5 | 5 |
| lose keys | 65.5953 | 66.6842 | 1.0166 | 0.60 | 1.0889 | 0.0166 | 67.8288 | 1.0340 | +inf | 5 | 5 | 5 |
| eat the apple | 59.6298 | 60.3640 | 1.0123 | 0.40 | 0.7342 | 0.0123 | 62.4626 | 1.0475 | +inf | 5 | 5 | 5 |
| lead the meeting | 59.5318 | 60.1108 | 1.0097 | 0.20 | 0.5790 | 0.0097 | 63.2273 | 1.0621 | +inf | 5 | 5 | 5 |
| clear the table | 60.9320 | 61.4282 | 1.0081 | 0.60 | 0.4962 | 0.0081 | 63.3287 | 1.0393 | +inf | 5 | 5 | 5 |
| remember details | 47.1794 | 47.1794 | 1.0000 | 0.00 | 0.0000 | 0.0000 | 51.7845 | 1.0976 | +inf | 5 | 5 | 5 |
| strike a drum | 61.5643 | 61.5643 | 1.0000 | 0.00 | 0.0000 | 0.0000 | 66.1694 | 1.0748 | +inf | 5 | 5 | 5 |
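
To re-derive any of the summary statistics from the per-phrase tables, the rows can be read straight from the Markdown. A minimal sketch, assuming the column order shown in the table headers; `parse_row` and `H_S_ORIG` are hypothetical names, and only two idiom rows are included here for brevity.

```python
import math

def parse_row(line):
    """Parse one '| phrase | v1 | v2 | ... |' table row into (phrase, [floats])."""
    cells = [c.strip() for c in line.strip().strip("|").split("|")]
    return cells[0], [float(c) for c in cells[1:]]  # float("+inf") parses to inf

rows = [
    "| strike a chord | 39.9997 | 52.4971 | 1.3124 | 1.00 | 12.4974 | 0.3124 | 39.9997 | 1.0000 | 39.9997 | 5 | 5 | 5 |",
    "| make waves | 45.0877 | 49.5339 | 1.0986 | 0.60 | 4.4462 | 0.0986 | 46.9369 | 1.0410 | +inf | 5 | 5 | 5 |",
]
parsed = dict(parse_row(r) for r in rows)

H_S_ORIG = 8  # zero-based index of the "H_s (orig)" column, counted after the phrase
finite_h_s = {
    phrase: vals[H_S_ORIG]
    for phrase, vals in parsed.items()
    if math.isfinite(vals[H_S_ORIG])
}
```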