# H_u / H_s analysis — gpt2 · full · geo

Idiom vs. parallel literal-VP (non-idiom) datasets. All entropies are uniform Monte Carlo (MC) averages over each phrase's observed contexts. Phrase scores are unnormalized per-token LM probabilities, aggregated by geometric mean (or as a joint product). See `README.md` / `code/SCORING_MATH.md` for the derivation.

> **How to read these magnitudes** (directions, why H_s is often +inf, the new finite synergy metrics): see [`INTERPRETATION.md`](../INTERPRETATION.md). Quick key: ↑`H_u/H(p)`, ↑`syn_frac`, ↑`H_s^log` mean **more** synergy; ↑`H_s^reg` and the original ↑`H_s` mean **less** synergy.

## Configuration

| field | idioms run | non-idioms run |
|---|---|---|
| model | gpt2 | gpt2 |
| reduction | geometric_mean | geometric_mean |
| medial_only | False | False |
| dtype | float32 | float32 |
| num_idioms | 18 | 18 |
| dataset | /home/prada/PID_evaluation/data/dataset.tsv | /home/prada/PID_evaluation/data/nonidioms_dataset.tsv |

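Bound `H_u + H_s ≥ 2H(p) + 2 log 2 ≥ H(p)` holds for 18/18 idioms and 18/18 non-idioms. It is tested non-trivially only for the phrases with finite H_s (5 idioms, 1 non-idiom); when H_s = +inf it holds automatically.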
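The bound follows from a per-context argument. Writing m = max(q,r) and using the definitions H_u = -log min{p, m} and H_s = -log max{0, p - m} from the metric sections, each context satisfies the inequality below; averaging it over contexts gives `H_u + H_s ≥ 2H(p) + 2 log 2`.

```math
\begin{aligned}
&\text{If } p > m:\quad -\log m - \log(p - m) = -\log\big[m\,(p - m)\big] \;\ge\; -\log\frac{p^2}{4} = -2\log p + 2\log 2,\\
&\qquad\text{because } m\,(p - m) \le p^2/4 \text{ (its maximum, reached at } m = p/2\text{)};\\
&\text{If } p \le m:\quad -\log\max\{0,\ p - m\} = +\infty,\ \text{so the inequality is trivial.}
\end{aligned}
```

The second inequality, 2H(p) + 2 log 2 ≥ H(p), only needs H(p) ≥ 0, which holds whenever every score p ≤ 1 (as a geometric mean of token probabilities is).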

`H_s = +inf` (at least one non-synergistic context) for 13/18 idioms and 17/18 non-idioms. A *finite* H_s means **every** context of that phrase is synergistic (p > max(q,r) everywhere), i.e. syn_frac = 1.
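Why one context is enough to make H_s infinite can be seen in a minimal sketch. It assumes per-context scores are available as three equal-length lists `p`, `q`, `r`; the function name `h_s_original` and the toy numbers are hypothetical, not the project's code or data.

```python
import math

def h_s_original(p, q, r):
    """Hypothetical sketch: H_s = mean over contexts of -log max{0, p - max(q, r)} (nats).

    A context with p <= max(q, r) gives max{0, ...} = 0 and -log 0 = +inf,
    so the uniform average over contexts is +inf as well.
    """
    total = 0.0
    for pi, qi, ri in zip(p, q, r):
        gap = pi - max(qi, ri)
        if gap <= 0.0:
            return math.inf  # a single non-synergistic context makes the mean +inf
        total += -math.log(gap)
    return total / len(p)

# Toy scores (illustrative only, not taken from the runs above).
all_syn = h_s_original([0.04, 0.05], [0.01, 0.02], [0.02, 0.01])  # both contexts synergistic
one_not = h_s_original([0.04, 0.05], [0.01, 0.06], [0.02, 0.01])  # context 2: p < q
print(all_syn, one_not)
```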

## Per-metric summary (idioms vs non-idioms)

Means and 95% bootstrap CIs (20k resamples, percentile method, phrase-level). Non-finite values are dropped per metric before bootstrapping; the drop count is shown. A cross-dataset gap is marked **significant** when its 95% CI excludes 0.
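A stdlib-only sketch of the summary procedure: drop non-finite values, resample phrases with replacement 20k times, and take the 2.5th/97.5th percentiles of the bootstrap means. How the gap CI was actually formed is not stated here; this sketch *assumes* the two datasets are resampled independently, and all function names are hypothetical. The usage line applies it to the original H_s column from the per-phrase tables.

```python
import math
import random
from statistics import fmean

def percentile(sorted_vals, q):
    """Linearly interpolated percentile of an already-sorted list, q in [0, 100]."""
    k = (len(sorted_vals) - 1) * q / 100.0
    lo, hi = math.floor(k), math.ceil(k)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (k - lo)

def bootstrap_mean_ci(values, n_boot=20_000, alpha=0.05, seed=0):
    """Percentile bootstrap CI of the mean, resampling phrases (not contexts)."""
    vals = [v for v in values if math.isfinite(v)]  # drop +inf / nan first
    rng = random.Random(seed)
    means = sorted(fmean(rng.choices(vals, k=len(vals))) for _ in range(n_boot))
    ci = (percentile(means, 100 * alpha / 2), percentile(means, 100 * (1 - alpha / 2)))
    return fmean(vals), ci

def bootstrap_gap_ci(a, b, n_boot=20_000, alpha=0.05, seed=0):
    """CI of mean(a) - mean(b); each group is resampled independently (an assumption)."""
    a = [v for v in a if math.isfinite(v)]
    b = [v for v in b if math.isfinite(v)]
    rng = random.Random(seed)
    gaps = sorted(
        fmean(rng.choices(a, k=len(a))) - fmean(rng.choices(b, k=len(b)))
        for _ in range(n_boot)
    )
    lo = percentile(gaps, 100 * alpha / 2)
    hi = percentile(gaps, 100 * (1 - alpha / 2))
    significant = not (lo <= 0.0 <= hi)  # "significant" = CI excludes 0
    return fmean(a) - fmean(b), (lo, hi), significant

# Usage: original H_s values from the per-phrase tables; +inf entries get dropped.
idioms_hs = [5.0371, 4.5224, 5.1602, 5.0788, 5.2950] + [math.inf] * 13
nonidioms_hs = [5.8947] + [math.inf] * 17
delta, (lo, hi), significant = bootstrap_gap_ci(idioms_hs, nonidioms_hs)
print(f"delta={delta:.4f} CI=[{lo:.4f}, {hi:.4f}] significant={significant}")
```

With a single finite non-idiom value, every bootstrap replicate subtracts the same 5.8947, so the gap CI in this usage reflects resampling of the five idiom values only.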

### H(p)

*base entropy, uniform MC average over contexts (nats). ↓ smaller = phrase more predictable (probability more concentrated)*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 4.7485 | 4.7760 | 3.7820 | 5.4572 | [4.5291, 4.9631] |
| non-idioms | 18 | 0 | 5.1244 | 4.9666 | 4.2651 | 6.4036 | [4.8742, 5.3975] |

**Cross-dataset gap** (idioms − non-idioms): Δ = -0.3759  (95% CI [-0.7234, -0.0432]) → idioms < non-idioms, **significant**.

### H_u

*unique / redundant entropy = -log min{p, max(q,r)} (nats); >= H(p)*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 5.2552 | 5.2806 | 4.5056 | 5.9578 | [5.0527, 5.4626] |
| non-idioms | 18 | 0 | 5.3554 | 5.2029 | 4.6912 | 6.4231 | [5.1357, 5.5962] |

**Cross-dataset gap** (idioms − non-idioms): Δ = -0.1002  (95% CI [-0.4117, 0.2038]) → idioms < non-idioms, not significant.
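The two entropies above can be sketched directly from their definitions, assuming per-context scores `p`, `q`, `r` as equal-length lists (a hypothetical layout; the function name is illustrative). Because min{p, max(q,r)} ≤ p in every context, each H_u term is at least the matching H(p) term, which is why H_u ≥ H(p).

```python
import math

def h_p_and_h_u(p, q, r):
    """Hypothetical sketch of the uniform MC averages over contexts (nats):
    H(p) = mean -log p and H_u = mean -log min{p, max(q, r)}."""
    n = len(p)
    h_p = sum(-math.log(pi) for pi in p) / n
    h_u = sum(-math.log(min(pi, max(qi, ri))) for pi, qi, ri in zip(p, q, r)) / n
    return h_p, h_u

# Toy scores (illustrative only): context 1 is synergistic, context 2 is not.
h_p, h_u = h_p_and_h_u([0.04, 0.05], [0.01, 0.06], [0.02, 0.01])
print(h_p, h_u)
```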

### H_u / H(p)

*unique-information ratio (>= 1). ↑ bigger = MORE synergy. THE headline metric*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 1.1097 | 1.0917 | 1.0346 | 1.2623 | [1.0864, 1.1366] |
| non-idioms | 18 | 0 | 1.0471 | 1.0485 | 1.0031 | 1.1025 | [1.0372, 1.0574] |

**Cross-dataset gap** (idioms − non-idioms): Δ = 0.0626  (95% CI [0.0369, 0.0913]) → idioms > non-idioms, **significant**.

### syn_frac

*synergy coverage in [0,1] = fraction of contexts with p > m, where m = max(q,r). ↑ bigger = MORE synergy (most intuitive)*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 0.8056 | 0.9000 | 0.3000 | 1.0000 | [0.7111, 0.8889] |
| non-idioms | 18 | 0 | 0.6167 | 0.6500 | 0.3000 | 1.0000 | [0.5333, 0.7000] |

**Cross-dataset gap** (idioms − non-idioms): Δ = 0.1889  (95% CI [0.0611, 0.3111]) → idioms > non-idioms, **significant**.
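syn_frac is a plain count of synergistic contexts divided by the number of contexts; with 10 contexts per phrase it moves in steps of 0.1. A minimal sketch, with hypothetical per-context lists `p`, `q`, `r` and an illustrative function name:

```python
def syn_frac(p, q, r):
    """Hypothetical sketch: fraction of contexts where p > m, with m = max(q, r)."""
    hits = sum(1 for pi, qi, ri in zip(p, q, r) if pi > max(qi, ri))
    return hits / len(p)

# Toy scores (illustrative only): synergistic in context 1, not in context 2.
print(syn_frac([0.04, 0.05], [0.01, 0.06], [0.02, 0.01]))  # 0.5
```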

### H_s^log

*log-space synergy = mean max{0, log p - log m} with m = max(q,r) (nats); equals H_u - H(p) exactly. ↑ bigger = MORE synergy; always finite*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 0.5067 | 0.4614 | 0.1766 | 1.2079 | [0.4095, 0.6208] |
| non-idioms | 18 | 0 | 0.2310 | 0.2428 | 0.0195 | 0.4374 | [0.1879, 0.2744] |

**Cross-dataset gap** (idioms − non-idioms): Δ = 0.2757  (95% CI [0.1684, 0.3977]) → idioms > non-idioms, **significant**.
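H_s^log is not an independent quantity: per context, with m = max(q,r), the H_u term minus the H(p) term is exactly the clipped log-ratio. If p ≤ m, min{p, m} = p and both sides are 0; if p > m, min{p, m} = m and both sides equal log p - log m. Averaging over contexts:

```math
\begin{aligned}
-\log\min\{p, m\} - (-\log p) &= \log p - \log\min\{p, m\} = \max\{0,\ \log p - \log m\},\\
H_u - H(p) &= H_s^{\log}, \qquad \frac{H_s^{\log}}{H(p)} = \frac{H_u}{H(p)} - 1 .
\end{aligned}
```

The summary rows agree: 5.2552 - 4.7485 = 0.5067 for idioms and 5.3554 - 5.1244 = 0.2310 for non-idioms.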

### H_s^log / H(p)

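*log-space synergy ratio; identically H_u/H(p) - 1 (because H_s^log = H_u - H(p)), so its gap and CI match the H_u / H(p) section. ↑ bigger = MORE synergy*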

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 0.1097 | 0.0917 | 0.0346 | 0.2623 | [0.0864, 0.1366] |
| non-idioms | 18 | 0 | 0.0471 | 0.0485 | 0.0031 | 0.1025 | [0.0372, 0.0574] |

**Cross-dataset gap** (idioms − non-idioms): Δ = 0.0626  (95% CI [0.0369, 0.0913]) → idioms > non-idioms, **significant**.
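The identity H_s^log/H(p) = H_u/H(p) - 1 can be spot-checked against the per-phrase tables. This sketch copies three rows verbatim; the tolerance allows for the 4-decimal rounding of the reported values, and the names are illustrative.

```python
# (phrase, H(p), H_u, H_s^log, H_s^log/H) copied from the per-phrase tables in this report.
ROWS = [
    ("strike a chord", 4.6043, 5.8121, 1.2079, 0.2623),
    ("make lunch", 5.3989, 5.6190, 0.2200, 0.0408),
    ("strike a drum", 6.4036, 6.4231, 0.0195, 0.0031),
]
TOL = 2e-4  # reported values are rounded to 4 decimals

def max_identity_error(rows):
    """Largest deviation from H_s^log = H_u - H(p) and H_s^log/H(p) = H_u/H(p) - 1."""
    worst = 0.0
    for _, h_p, h_u, hs_log, hs_log_ratio in rows:
        worst = max(worst,
                    abs((h_u - h_p) - hs_log),
                    abs((h_u / h_p - 1.0) - hs_log_ratio))
    return worst

print(max_identity_error(ROWS) < TOL)  # True for these rows
```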

### H_s^log signed

*signed log-space synergy = mean(log p - log m); can be negative (net anti-synergistic); ≤ H_s^log, with equality when syn_frac = 1*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 0.4327 | 0.3830 | -0.0442 | 1.2079 | [0.3031, 0.5733] |
| non-idioms | 18 | 0 | 0.0683 | 0.1663 | -0.3340 | 0.4374 | [-0.0315, 0.1639] |

**Cross-dataset gap** (idioms − non-idioms): Δ = 0.3643  (95% CI [0.2018, 0.5367]) → idioms > non-idioms, **significant**.
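The clipped and signed log-space metrics differ only in whether negative per-context log-ratios are kept. A minimal sketch, assuming hypothetical per-context lists `p`, `q`, `r` and illustrative function names:

```python
import math

def h_s_log(p, q, r):
    """Clipped: mean max{0, log p - log m}, m = max(q, r). Always finite and >= 0."""
    terms = [max(0.0, math.log(pi) - math.log(max(qi, ri))) for pi, qi, ri in zip(p, q, r)]
    return sum(terms) / len(terms)

def h_s_log_signed(p, q, r):
    """Signed: mean (log p - log m); non-synergistic contexts pull the mean down, possibly below 0."""
    terms = [math.log(pi) - math.log(max(qi, ri)) for pi, qi, ri in zip(p, q, r)]
    return sum(terms) / len(terms)

# Toy scores (illustrative only): log-ratio is log 2 in context 1 and log(5/6) in context 2.
clipped = h_s_log([0.04, 0.05], [0.01, 0.06], [0.02, 0.01])
signed = h_s_log_signed([0.04, 0.05], [0.01, 0.06], [0.02, 0.01])
print(clipped, signed)  # signed < clipped because context 2 is non-synergistic
```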

### H_s^reg

*regularized H_s (eps-floored, finite, continuous). ↑ bigger = LESS synergy*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 6.5112 | 6.6684 | 4.5224 | 8.3913 | [5.9870, 7.0217] |
| non-idioms | 18 | 0 | 7.8842 | 7.5951 | 5.8673 | 10.5709 | [7.4022, 8.4033] |

**Cross-dataset gap** (idioms − non-idioms): Δ = -1.3729  (95% CI [-2.1087, -0.6665]) → idioms < non-idioms, **significant**.
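One way to read "eps-floored" is to replace the max{0, ·} in the original H_s by max{eps, ·}, which keeps every term finite and continuous in p. This is an assumed form, and the floor value `EPS` below is hypothetical (the value used in these runs is not stated in this report). Under that reading, H_s^reg equals the original H_s whenever every per-context gap exceeds eps.

```python
import math

EPS = 1e-4  # hypothetical floor; not the value used for the runs in this report

def h_s_reg(p, q, r, eps=EPS):
    """Assumed form of the regularized H_s: mean -log max{eps, p - max(q, r)} (nats)."""
    terms = [-math.log(max(eps, pi - max(qi, ri))) for pi, qi, ri in zip(p, q, r)]
    return sum(terms) / len(terms)

# Toy scores (illustrative only). Context 2 is non-synergistic, so its term is capped
# at -log(EPS) instead of becoming +inf.
mixed = h_s_reg([0.04, 0.05], [0.01, 0.06], [0.02, 0.01])
all_syn = h_s_reg([0.04, 0.05], [0.01, 0.02], [0.02, 0.01])
print(mixed, all_syn)
```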

### H_s^reg / H(p)

*regularized synergy ratio (>= 1). ↑ bigger = LESS synergy*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 1.3653 | 1.3786 | 1.0940 | 1.6635 | [1.2964, 1.4357] |
| non-idioms | 18 | 0 | 1.5358 | 1.5286 | 1.3756 | 1.6673 | [1.4960, 1.5748] |

**Cross-dataset gap** (idioms − non-idioms): Δ = -0.1705  (95% CI [-0.2498, -0.0893]) → idioms < non-idioms, **significant**.

### H_s (original)

*synergy entropy = -log max{0, p - max(q,r)} (nats); +inf if ANY context is non-synergistic (so mostly +inf)*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 5 | 13 | 5.0187 | 5.0788 | 4.5224 | 5.2950 | [4.7612, 5.2141] |
| non-idioms | 1 | 17 | 5.8947 | 5.8947 | 5.8947 | 5.8947 | [5.8947, 5.8947] |

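**Cross-dataset gap** (idioms − non-idioms): Δ = -0.8761  (95% CI [-1.1335, -0.6862]) → idioms < non-idioms, **significant** by the CI rule, but based on only 5 vs 1 finite values (phrases with syn_frac = 1). The non-idiom side is a single point, so this CI reflects idiom resampling only.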

### H_s / H(p)

*original synergy ratio (mostly +inf; use H_s^log or syn_frac instead)*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 5 | 13 | 1.1940 | 1.2121 | 1.0940 | 1.2379 | [1.1431, 1.2297] |
| non-idioms | 1 | 17 | 1.3821 | 1.3821 | 1.3821 | 1.3821 | [1.3821, 1.3821] |

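**Cross-dataset gap** (idioms − non-idioms): Δ = -0.1881  (95% CI [-0.2389, -0.1524]) → idioms < non-idioms, **significant** by the CI rule, but based on only 5 vs 1 finite values; the non-idiom side is a single point, so this CI reflects idiom resampling only.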

## Per-phrase detail

### Idioms (sorted by H_u/H(p), descending)

| phrase | H(p) | H_u | H_u/H | syn_frac | H_s^log | H_s^log/H | H_s^reg | H_s^reg/H | H_s (orig) | n_idiom | n_head | n_non |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| strike a chord | 4.6043 | 5.8121 | 1.2623 | 1.00 | 1.2079 | 0.2623 | 5.0371 | 1.0940 | 5.0371 | 10 | 10 | 10 |
| rock the boat | 3.7820 | 4.5056 | 1.1913 | 1.00 | 0.7236 | 0.1913 | 4.5224 | 1.1958 | 4.5224 | 10 | 10 | 10 |
| cut corners | 4.2571 | 4.9383 | 1.1600 | 1.00 | 0.6812 | 0.1600 | 5.1602 | 1.2121 | 5.1602 | 10 | 10 | 10 |
| break the mold | 4.4252 | 5.0867 | 1.1495 | 0.90 | 0.6615 | 0.1495 | 5.6251 | 1.2711 | +inf | 10 | 10 | 10 |
| spill the beans | 4.1027 | 4.7103 | 1.1481 | 1.00 | 0.6076 | 0.1481 | 5.0788 | 1.2379 | 5.0788 | 10 | 10 | 10 |
| clear the air | 4.3037 | 4.8475 | 1.1264 | 1.00 | 0.5438 | 0.1264 | 5.2950 | 1.2303 | 5.2950 | 10 | 10 | 10 |
| bite the dust | 5.2661 | 5.8816 | 1.1169 | 0.90 | 0.6156 | 0.1169 | 6.6705 | 1.2667 | +inf | 10 | 10 | 10 |
| lose ground | 5.3155 | 5.9260 | 1.1149 | 0.60 | 0.6105 | 0.1149 | 7.4393 | 1.3996 | +inf | 10 | 10 | 10 |
| turn tail | 5.4572 | 5.9578 | 1.0917 | 0.90 | 0.5006 | 0.0917 | 6.7870 | 1.2437 | +inf | 10 | 10 | 10 |
| pull strings | 4.6036 | 5.0259 | 1.0917 | 0.90 | 0.4223 | 0.0917 | 6.4115 | 1.3927 | +inf | 10 | 10 | 10 |
| call the shots | 4.3198 | 4.7078 | 1.0898 | 0.70 | 0.3880 | 0.0898 | 6.6018 | 1.5283 | +inf | 10 | 10 | 10 |
| make waves | 5.0444 | 5.4274 | 1.0759 | 0.30 | 0.3830 | 0.0759 | 8.3913 | 1.6635 | +inf | 10 | 10 | 10 |
| run the show | 4.9477 | 5.2878 | 1.0687 | 0.70 | 0.3401 | 0.0687 | 7.2032 | 1.4559 | +inf | 10 | 10 | 10 |
| lead the field | 5.3148 | 5.6558 | 1.0642 | 0.70 | 0.3410 | 0.0642 | 7.7787 | 1.4636 | +inf | 10 | 10 | 10 |
| mean business | 5.1214 | 5.4457 | 1.0633 | 0.90 | 0.3243 | 0.0633 | 6.9886 | 1.3646 | +inf | 10 | 10 | 10 |
| raise hell | 4.9787 | 5.2918 | 1.0629 | 0.70 | 0.3131 | 0.0629 | 7.4357 | 1.4935 | +inf | 10 | 10 | 10 |
| have a ball | 4.5320 | 4.8113 | 1.0616 | 0.80 | 0.2793 | 0.0616 | 6.6663 | 1.4709 | +inf | 10 | 10 | 10 |
| get the sack | 5.0967 | 5.2733 | 1.0346 | 0.50 | 0.1766 | 0.0346 | 8.1100 | 1.5912 | +inf | 10 | 10 | 10 |
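The per-metric summary rows are plain phrase-level statistics of these columns. As a quick consistency check, this sketch recomputes the idiom mean and median for H(p) and syn_frac from the table above (values copied in row order; variable names are illustrative).

```python
from statistics import fmean, median

# Idiom columns copied from the table above, in the same row order.
H_P = [4.6043, 3.7820, 4.2571, 4.4252, 4.1027, 4.3037, 5.2661, 5.3155, 5.4572,
       4.6036, 4.3198, 5.0444, 4.9477, 5.3148, 5.1214, 4.9787, 4.5320, 5.0967]
SYN_FRAC = [1.0, 1.0, 1.0, 0.9, 1.0, 1.0, 0.9, 0.6, 0.9,
            0.9, 0.7, 0.3, 0.7, 0.7, 0.9, 0.7, 0.8, 0.5]

for name, col in [("H(p)", H_P), ("syn_frac", SYN_FRAC)]:
    print(f"{name}: mean={fmean(col):.4f} median={median(col):.4f}")
# H(p): mean=4.7485 median=4.7760
# syn_frac: mean=0.8056 median=0.9000
```

Both lines match the idiom rows of the H(p) and syn_frac summary tables.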

### Non-idioms (sorted by H_u/H(p), descending)

| phrase | H(p) | H_u | H_u/H | syn_frac | H_s^log | H_s^log/H | H_s^reg | H_s^reg/H | H_s (orig) | n_idiom | n_head | n_non |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| call the police | 4.2651 | 4.7025 | 1.1025 | 1.00 | 0.4374 | 0.1025 | 5.8673 | 1.3756 | 5.8947 | 10 | 10 | 10 |
| throw a ball | 4.5392 | 4.8466 | 1.0677 | 0.70 | 0.3074 | 0.0677 | 6.8715 | 1.5138 | +inf | 10 | 10 | 10 |
| cut hair | 4.7005 | 5.0047 | 1.0647 | 0.60 | 0.3043 | 0.0647 | 7.2421 | 1.5407 | +inf | 10 | 10 | 10 |
| build the boat | 4.7184 | 5.0046 | 1.0607 | 0.70 | 0.2862 | 0.0607 | 7.2766 | 1.5422 | +inf | 10 | 10 | 10 |
| raise children | 5.0207 | 5.3135 | 1.0583 | 0.60 | 0.2929 | 0.0583 | 7.6139 | 1.5165 | +inf | 10 | 10 | 10 |
| clear the table | 4.7630 | 5.0312 | 1.0563 | 0.80 | 0.2682 | 0.0563 | 6.8929 | 1.4472 | +inf | 10 | 10 | 10 |
| turn dials | 5.6042 | 5.9196 | 1.0563 | 0.70 | 0.3154 | 0.0563 | 7.8184 | 1.3951 | +inf | 10 | 10 | 10 |
| eat the apple | 5.0016 | 5.2745 | 1.0546 | 0.80 | 0.2730 | 0.0546 | 7.3605 | 1.4716 | +inf | 10 | 10 | 10 |
| break the window | 4.8572 | 5.1055 | 1.0511 | 0.70 | 0.2483 | 0.0511 | 7.2106 | 1.4845 | +inf | 10 | 10 | 10 |
| tie knots | 5.1738 | 5.4110 | 1.0459 | 0.80 | 0.2373 | 0.0459 | 7.6003 | 1.4690 | +inf | 10 | 10 | 10 |
| spill the water | 4.4907 | 4.6912 | 1.0446 | 0.50 | 0.2005 | 0.0446 | 7.3661 | 1.6403 | +inf | 10 | 10 | 10 |
| make lunch | 5.3989 | 5.6190 | 1.0408 | 0.30 | 0.2200 | 0.0408 | 8.8529 | 1.6397 | +inf | 10 | 10 | 10 |
| get a present | 4.8749 | 5.0731 | 1.0407 | 0.70 | 0.1982 | 0.0407 | 7.5899 | 1.5570 | +inf | 10 | 10 | 10 |
| see the show | 4.9316 | 5.1312 | 1.0405 | 0.40 | 0.1996 | 0.0405 | 8.2224 | 1.6673 | +inf | 10 | 10 | 10 |
| lead the meeting | 5.7041 | 5.8375 | 1.0234 | 0.50 | 0.1334 | 0.0234 | 9.2002 | 1.6129 | +inf | 10 | 10 | 10 |
| remember details | 5.5012 | 5.6140 | 1.0205 | 0.50 | 0.1128 | 0.0205 | 8.8527 | 1.6092 | +inf | 10 | 10 | 10 |
| lose keys | 6.2901 | 6.3936 | 1.0165 | 0.50 | 0.1035 | 0.0165 | 9.5061 | 1.5113 | +inf | 10 | 10 | 10 |
| strike a drum | 6.4036 | 6.4231 | 1.0031 | 0.30 | 0.0195 | 0.0031 | 10.5709 | 1.6508 | +inf | 10 | 10 | 10 |

