# H_u / H_s analysis — gpt2 · full · joint

Idiom vs. parallel literal verb-phrase (non-idiom) datasets. All entropies are uniform Monte Carlo (MC) averages over each phrase's observed contexts; scores are unnormalized geometric-mean (or joint) per-token LM probabilities (this run uses the `joint` reduction, see Configuration). See `README.md` / `code/SCORING_MATH.md` for the derivation.

> **How to read these magnitudes** (directions, why H_s is +inf, the new finite synergy metrics): see [`INTERPRETATION.md`](../INTERPRETATION.md). Quick key — ↑`H_u/H(p)`, ↑`syn_frac`, ↑`H_s^log` mean **more** synergy; ↑`H_s^reg` and the original ↑`H_s` mean **less** synergy.

## Configuration

| field | idioms run | non-idioms run |
|---|---|---|
| model | gpt2 | gpt2 |
| reduction | joint | joint |
| medial_only | False | False |
| dtype | float32 | float32 |
| num_idioms | 18 | 18 |
| dataset | /home/prada/PID_evaluation/data/dataset.tsv | /home/prada/PID_evaluation/data/nonidioms_dataset.tsv |

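Bound `H_u + H_s ≥ 2H(p) + 2 log 2 ≥ H(p)` holds for 18/18 idioms and 18/18 non-idioms. For phrases with `H_s = +inf` the left-hand side is infinite, so the bound holds trivially there.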

`H_s = +inf` (≥1 non-synergistic slot) for 14/18 idioms and 16/18 non-idioms. A *finite* H_s means **every** context of that phrase is synergistic (p > max(q,r) everywhere).
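The all-or-nothing behaviour of the original `H_s` can be illustrated with a minimal sketch. It assumes that `H_s` is the plain average over contexts of `-log max{0, p - m}` with `m = max(q,r)`, and that scores are available as log-probabilities; the function name `original_h_s` and its arguments are hypothetical and not taken from the repository code.

```python
import math

def original_h_s(log_p, log_m):
    """Average over contexts of -log max{0, p - m}, from log-scores.

    log_p[i]: log-score of the full phrase in context i (assumed input)
    log_m[i]: log of max(q, r) in context i (assumed input)
    A single context with p <= m makes the whole average +inf.
    """
    total = 0.0
    for lp, lm in zip(log_p, log_m):
        if lp <= lm:
            # max{0, p - m} = 0, so -log(0) = +inf for this context
            return math.inf
        # log(p - m) = log p + log(1 - exp(log m - log p)); avoids underflow
        total += -(lp + math.log1p(-math.exp(lm - lp)))
    return total / len(log_p)

# Every context synergistic: finite value, slightly above mean(-log p).
all_synergistic = original_h_s([-50.0, -60.0], [-55.0, -70.0])
# One non-synergistic context out of two: +inf.
one_failure = original_h_s([-50.0, -60.0], [-55.0, -59.0])
```

Working in log-space keeps the subtraction `p - m` from underflowing when `p` is on the order of `exp(-56)`, as in the tables of this report.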

## Per-metric summary (idioms vs non-idioms)

Means and 95% bootstrap CIs (20k resamples, percentile method, phrase-level). Non-finite values are dropped per metric before bootstrapping; the drop count is shown.
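The procedure described above can be sketched as follows. This is an illustrative stdlib-only version, not the script that produced the tables. Its assumptions: the gap CI resamples each dataset independently, and a gap is flagged significant when its 95% CI excludes 0 (this matches every row of the report but is not stated in it). The helper names `percentile`, `bootstrap_mean_ci` and `bootstrap_gap_ci` are hypothetical.

```python
import math
import random

def percentile(sorted_vals, q):
    """Linear-interpolation percentile of an already sorted list, q in [0, 1]."""
    pos = q * (len(sorted_vals) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac

def _finite(values):
    """Drop +inf / nan, as the report does per metric before bootstrapping."""
    return [v for v in values if math.isfinite(v)]

def _boot_mean(rng, xs):
    """Mean of one phrase-level resample drawn with replacement."""
    return sum(rng.choices(xs, k=len(xs))) / len(xs)

def bootstrap_mean_ci(values, n_boot=20_000, alpha=0.05, seed=0):
    """Return (mean, ci_low, ci_high) using the percentile method."""
    rng = random.Random(seed)
    xs = _finite(values)
    means = sorted(_boot_mean(rng, xs) for _ in range(n_boot))
    return (sum(xs) / len(xs),
            percentile(means, alpha / 2),
            percentile(means, 1 - alpha / 2))

def bootstrap_gap_ci(a, b, n_boot=20_000, alpha=0.05, seed=0):
    """Return (delta, ci_low, ci_high, significant) for mean(a) - mean(b).

    Assumption: groups are resampled independently; 'significant' means
    the CI excludes 0.
    """
    rng = random.Random(seed)
    a, b = _finite(a), _finite(b)
    gaps = sorted(_boot_mean(rng, a) - _boot_mean(rng, b) for _ in range(n_boot))
    lo = percentile(gaps, alpha / 2)
    hi = percentile(gaps, 1 - alpha / 2)
    delta = sum(a) / len(a) - sum(b) / len(b)
    return delta, lo, hi, not (lo <= 0.0 <= hi)

# Finite original-H_s values copied from the per-phrase tables of this report;
# math.inf stands in for the phrases reported as +inf.
idiom_h_s = [45.9400, 58.3500, 45.0286, 60.4984, math.inf]
nonidiom_h_s = [53.7368, 58.2726, math.inf]
mean_ci = bootstrap_mean_ci(idiom_h_s)
gap_ci = bootstrap_gap_ci(idiom_h_s, nonidiom_h_s)
```

Because resampling is random, the interval endpoints from this sketch will differ slightly from the reported ones depending on the seed and the quantile interpolation rule.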

### H(p)

*base entropy, uniform MC average over contexts (nats). ↓ smaller = idiom more concentrated*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 56.3263 | 56.1005 | 45.0206 | 67.3161 | [53.6266, 58.9655] |
| non-idioms | 18 | 0 | 60.4903 | 60.3255 | 52.7852 | 69.0433 | [58.2734, 62.6910] |

**Cross-dataset gap** (idioms − non-idioms): Δ = -4.1640  (95% CI [-7.6259, -0.6482]) → idioms < non-idioms, **significant**.

### H_u

*unique / redundant entropy = -log min{p, max(q,r)} (nats); >= H(p)*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 60.4238 | 59.3060 | 50.8576 | 71.4938 | [57.8893, 63.0156] |
| non-idioms | 18 | 0 | 61.5647 | 60.7785 | 52.8563 | 69.7779 | [59.4255, 63.6536] |

**Cross-dataset gap** (idioms − non-idioms): Δ = -1.1409  (95% CI [-4.3791, 2.2842]) → idioms < non-idioms, not significant.

### H_u / H(p)

*unique-information ratio (>= 1). ↑ bigger = MORE synergy. THE headline metric*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 1.0752 | 1.0617 | 1.0174 | 1.2093 | [1.0544, 1.0994] |
| non-idioms | 18 | 0 | 1.0183 | 1.0140 | 1.0000 | 1.0702 | [1.0115, 1.0266] |

**Cross-dataset gap** (idioms − non-idioms): Δ = 0.0569  (95% CI [0.0344, 0.0817]) → idioms > non-idioms, **significant**.

### syn_frac

*synergy coverage in [0,1] = frac. of contexts with p > m, where m = max(q,r). ↑ bigger = MORE synergy (most intuitive)*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 0.7500 | 0.7500 | 0.3000 | 1.0000 | [0.6556, 0.8444] |
| non-idioms | 18 | 0 | 0.4111 | 0.3500 | 0.0000 | 1.0000 | [0.2889, 0.5444] |

**Cross-dataset gap** (idioms − non-idioms): Δ = 0.3389  (95% CI [0.1778, 0.4944]) → idioms > non-idioms, **significant**.

### H_s^log

*log-space synergy = mean max{0, log p - log m}, with m = max(q,r) (nats). ↑ bigger = MORE synergy; always finite*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 4.0975 | 3.7436 | 1.1467 | 9.6125 | [3.0468, 5.2358] |
| non-idioms | 18 | 0 | 1.0744 | 0.8667 | 0.0000 | 3.7509 | [0.7040, 1.5195] |

**Cross-dataset gap** (idioms − non-idioms): Δ = 3.0231  (95% CI [1.8949, 4.2233]) → idioms > non-idioms, **significant**.

### H_s^log / H(p)

*log-space synergy ratio. ↑ bigger = MORE synergy*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 0.0752 | 0.0617 | 0.0174 | 0.2093 | [0.0544, 0.0994] |
| non-idioms | 18 | 0 | 0.0183 | 0.0140 | 0.0000 | 0.0702 | [0.0115, 0.0266] |

**Cross-dataset gap** (idioms − non-idioms): Δ = 0.0569  (95% CI [0.0344, 0.0817]) → idioms > non-idioms, **significant**.
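This gap is numerically identical to the `H_u / H(p)` gap (same Δ, same CI). That follows from the metric captions, assuming `H(p)` is the context average of `-log p` and `m = max(q,r)`; both are readings of this report rather than statements taken from `SCORING_MATH.md`. Per context:

```latex
\begin{aligned}
\max\{0,\ \log p - \log m\} &= \log p - \min\{\log p,\ \log m\} \\
                            &= -\log \min\{p,\ m\} \;-\; (-\log p), \\[4pt]
\text{so, averaging over contexts:}\quad
H_s^{\log} &= H_u - H(p), \qquad
\frac{H_s^{\log}}{H(p)} = \frac{H_u}{H(p)} - 1 .
\end{aligned}
```

For example, *strike a chord* has `H_u - H(p) = 55.5476 - 45.9352 = 9.6124`, which matches its `H_s^log` of 9.6125 up to rounding. The two ratio metrics therefore carry the same information and should not be counted as independent evidence.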

### H_s^log signed

*signed log-space synergy = mean(log p - log m), with m = max(q,r) (nats); can be negative (net anti-synergistic)*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 3.0399 | 3.0561 | -3.7746 | 9.6125 | [1.4945, 4.5802] |
| non-idioms | 18 | 0 | -1.4432 | -1.0779 | -6.4891 | 3.7509 | [-2.6016, -0.2619] |

**Cross-dataset gap** (idioms − non-idioms): Δ = 4.4831  (95% CI [2.5135, 6.4315]) → idioms > non-idioms, **significant**.
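The finite synergy metrics (`syn_frac`, `H_s^log`, `H_s^log signed`) and `H_u` all derive from the same per-context quantity `log p - log m`. The sketch below shows one way to compute them from per-context log-scores. It assumes uniform averaging over contexts and `m = max(q,r)`; the function `log_space_metrics` and its dictionary keys are hypothetical names, not the repository's API.

```python
def log_space_metrics(log_p, log_m):
    """Compute the finite synergy metrics for one phrase.

    log_p[i]: log-score of the full phrase in context i (assumed input)
    log_m[i]: log of max(q, r) in context i (assumed input)
    """
    n = len(log_p)
    diffs = [lp - lm for lp, lm in zip(log_p, log_m)]
    return {
        # base entropy: average of -log p
        "H_p": -sum(log_p) / n,
        # average of -log min{p, m}
        "H_u": -sum(min(lp, lm) for lp, lm in zip(log_p, log_m)) / n,
        # fraction of contexts where p > m
        "syn_frac": sum(1 for d in diffs if d > 0) / n,
        # clipped at zero: only synergistic contexts contribute
        "H_s_log": sum(max(0.0, d) for d in diffs) / n,
        # unclipped: negative contexts pull the average down
        "H_s_log_signed": sum(diffs) / n,
    }

# Toy log-scores for four contexts (made-up numbers, for illustration only).
metrics = log_space_metrics(
    log_p=[-50.0, -60.0, -55.0, -40.0],
    log_m=[-55.0, -58.0, -55.0, -45.0],
)
```

The clipped and signed versions agree only when every context is synergistic, which is why the signed minimum for idioms can be negative while the clipped minimum stays positive.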

### H_s^reg

*regularized H_s (eps-floored, finite, continuous) (nats). ↑ bigger = LESS synergy*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 57.5829 | 57.4220 | 45.0286 | 68.2764 | [54.7304, 60.3305] |
| non-idioms | 18 | 0 | 63.3590 | 63.4881 | 53.7368 | 72.4500 | [61.0561, 65.6509] |

**Cross-dataset gap** (idioms − non-idioms): Δ = -5.7760  (95% CI [-9.4496, -2.1157]) → idioms < non-idioms, **significant**.

### H_s^reg / H(p)

*regularized synergy ratio (>= 1). ↑ bigger = LESS synergy*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 1.0221 | 1.0224 | 1.0001 | 1.0569 | [1.0142, 1.0302] |
| non-idioms | 18 | 0 | 1.0477 | 1.0470 | 1.0026 | 1.0871 | [1.0378, 1.0570] |

**Cross-dataset gap** (idioms − non-idioms): Δ = -0.0256  (95% CI [-0.0379, -0.0128]) → idioms < non-idioms, **significant**.

### H_s (original)

*synergy entropy = -log max{0, p - max(q,r)} (nats); +inf if ANY slot non-synergistic (mostly +inf)*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 4 | 14 | 52.4542 | 52.1450 | 45.0286 | 60.4984 | [45.4843, 59.4242] |
| non-idioms | 2 | 16 | 56.0047 | 56.0047 | 53.7368 | 58.2726 | [53.7368, 58.2726] |

**Cross-dataset gap** (idioms − non-idioms): Δ = -3.5505  (95% CI [-10.7483, 3.4195]) → idioms < non-idioms, not significant. Only 4 idioms and 2 non-idioms have finite `H_s`, so this comparison has very little power.

### H_s / H(p)

*original synergy ratio (mostly +inf; use H_s^log or syn_frac instead)*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 4 | 14 | 1.0004 | 1.0001 | 1.0001 | 1.0011 | [1.0001, 1.0008] |
| non-idioms | 2 | 16 | 1.0040 | 1.0040 | 1.0026 | 1.0055 | [1.0026, 1.0055] |

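**Cross-dataset gap** (idioms − non-idioms): Δ = -0.0037  (95% CI [-0.0054, -0.0020]) → idioms < non-idioms, **significant**. Caution: this rests on only 4 idioms and 2 non-idioms; with N = 2 the bootstrap CI of the mean spans exactly [min, max], so the flag should not be taken at face value.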

## Per-phrase detail

### Idioms (sorted by H_u/H, descending)

| phrase | H(p) | H_u | H_u/H | syn_frac | H_s^log | H_s^log/H | H_s^reg | H_s^reg/H | H_s (orig) | n_idiom | n_head | n_non |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| strike a chord | 45.9352 | 55.5476 | 1.2093 | 1.00 | 9.6125 | 0.2093 | 45.9400 | 1.0001 | 45.9400 | 10 | 10 | 10 |
| spill the beans | 58.3437 | 65.9718 | 1.1307 | 1.00 | 7.6281 | 0.1307 | 58.3500 | 1.0001 | 58.3500 | 10 | 10 | 10 |
| rock the boat | 45.0206 | 50.8576 | 1.1297 | 1.00 | 5.8370 | 0.1297 | 45.0286 | 1.0002 | 45.0286 | 10 | 10 | 10 |
| break the mold | 59.4886 | 66.5847 | 1.1193 | 0.90 | 7.0961 | 0.1193 | 59.9647 | 1.0080 | +inf | 10 | 10 | 10 |
| cut corners | 50.5275 | 56.4576 | 1.1174 | 0.90 | 5.9302 | 0.1174 | 50.9924 | 1.0092 | +inf | 10 | 10 | 10 |
| clear the air | 60.4347 | 65.8202 | 1.0891 | 1.00 | 5.3855 | 0.0891 | 60.4984 | 1.0011 | 60.4984 | 10 | 10 | 10 |
| lose ground | 54.8805 | 59.5751 | 1.0855 | 0.70 | 4.6947 | 0.0855 | 56.4721 | 1.0290 | +inf | 10 | 10 | 10 |
| turn tail | 62.0319 | 66.4070 | 1.0705 | 0.80 | 4.3751 | 0.0705 | 62.9765 | 1.0152 | +inf | 10 | 10 | 10 |
| bite the dust | 67.3161 | 71.4938 | 1.0621 | 0.90 | 4.1777 | 0.0621 | 67.8642 | 1.0081 | +inf | 10 | 10 | 10 |
| call the shots | 54.0049 | 57.3144 | 1.0613 | 0.70 | 3.3095 | 0.0613 | 55.6537 | 1.0305 | +inf | 10 | 10 | 10 |
| raise hell | 55.5071 | 58.4753 | 1.0535 | 0.80 | 2.9682 | 0.0535 | 56.4939 | 1.0178 | +inf | 10 | 10 | 10 |
| run the show | 53.7321 | 56.0029 | 1.0423 | 0.60 | 2.2708 | 0.0423 | 55.6953 | 1.0365 | +inf | 10 | 10 | 10 |
| make waves | 56.6938 | 59.0369 | 1.0413 | 0.30 | 2.3430 | 0.0413 | 59.9217 | 1.0569 | +inf | 10 | 10 | 10 |
| pull strings | 60.0803 | 62.4802 | 1.0399 | 0.50 | 2.3999 | 0.0399 | 62.4227 | 1.0390 | +inf | 10 | 10 | 10 |
| lead the field | 53.1601 | 54.8042 | 1.0309 | 0.50 | 1.6441 | 0.0309 | 55.6922 | 1.0476 | +inf | 10 | 10 | 10 |
| mean business | 51.4557 | 52.8527 | 1.0271 | 0.70 | 1.3970 | 0.0271 | 53.2357 | 1.0346 | +inf | 10 | 10 | 10 |
| have a ball | 59.4070 | 60.9457 | 1.0259 | 0.70 | 1.5387 | 0.0259 | 61.0138 | 1.0270 | +inf | 10 | 10 | 10 |
| get the sack | 65.8546 | 67.0013 | 1.0174 | 0.50 | 1.1467 | 0.0174 | 68.2764 | 1.0368 | +inf | 10 | 10 | 10 |

### Non-idioms (sorted by H_u/H, descending)

| phrase | H(p) | H_u | H_u/H | syn_frac | H_s^log | H_s^log/H | H_s^reg | H_s^reg/H | H_s (orig) | n_idiom | n_head | n_non |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| call the police | 53.4431 | 57.1940 | 1.0702 | 1.00 | 3.7509 | 0.0702 | 53.7368 | 1.0055 | 53.7368 | 10 | 10 | 10 |
| cut hair | 58.1215 | 60.7259 | 1.0448 | 1.00 | 2.6044 | 0.0448 | 58.2726 | 1.0026 | 58.2726 | 10 | 10 | 10 |
| raise children | 58.2477 | 60.1160 | 1.0321 | 0.50 | 1.8683 | 0.0321 | 60.6978 | 1.0421 | +inf | 10 | 10 | 10 |
| see the show | 52.7852 | 54.0456 | 1.0239 | 0.30 | 1.2604 | 0.0239 | 56.0237 | 1.0614 | +inf | 10 | 10 | 10 |
| tie knots | 64.9926 | 66.2917 | 1.0200 | 0.60 | 1.2991 | 0.0200 | 67.1243 | 1.0328 | +inf | 10 | 10 | 10 |
| build the boat | 56.2274 | 57.3454 | 1.0199 | 0.30 | 1.1180 | 0.0199 | 59.5492 | 1.0591 | +inf | 10 | 10 | 10 |
| clear the table | 61.1383 | 62.1429 | 1.0164 | 0.50 | 1.0046 | 0.0164 | 63.7515 | 1.0427 | +inf | 10 | 10 | 10 |
| spill the water | 66.1158 | 67.1812 | 1.0161 | 0.50 | 1.0654 | 0.0161 | 68.5536 | 1.0369 | +inf | 10 | 10 | 10 |
| throw a ball | 59.9662 | 60.8311 | 1.0144 | 0.30 | 0.8649 | 0.0144 | 63.2246 | 1.0543 | +inf | 10 | 10 | 10 |
| break the window | 64.4338 | 65.3024 | 1.0135 | 0.40 | 0.8685 | 0.0135 | 67.3094 | 1.0446 | +inf | 10 | 10 | 10 |
| get a present | 61.5827 | 62.3773 | 1.0129 | 0.50 | 0.7946 | 0.0129 | 64.3152 | 1.0444 | +inf | 10 | 10 | 10 |
| make lunch | 59.7465 | 60.3976 | 1.0109 | 0.30 | 0.6512 | 0.0109 | 63.0233 | 1.0548 | +inf | 10 | 10 | 10 |
| eat the apple | 65.6941 | 66.4020 | 1.0108 | 0.60 | 0.7079 | 0.0108 | 68.0317 | 1.0356 | +inf | 10 | 10 | 10 |
| turn dials | 69.0433 | 69.7779 | 1.0106 | 0.30 | 0.7346 | 0.0106 | 72.4500 | 1.0493 | +inf | 10 | 10 | 10 |
| lead the meeting | 57.1221 | 57.4809 | 1.0063 | 0.10 | 0.3588 | 0.0063 | 61.2695 | 1.0726 | +inf | 10 | 10 | 10 |
| lose keys | 66.6246 | 67.0112 | 1.0058 | 0.20 | 0.3866 | 0.0058 | 70.3764 | 1.0563 | +inf | 10 | 10 | 10 |
| remember details | 52.8563 | 52.8563 | 1.0000 | 0.00 | 0.0000 | 0.0000 | 57.4614 | 1.0871 | +inf | 10 | 10 | 10 |
| strike a drum | 60.6848 | 60.6848 | 1.0000 | 0.00 | 0.0000 | 0.0000 | 65.2900 | 1.0759 | +inf | 10 | 10 | 10 |

