# H_u / H_s analysis — gemma2-9b · full · joint

This report compares an idiom dataset with a parallel literal-VP (non-idiom) dataset. All entropies are uniform MC averages over each phrase's observed contexts; scores are unnormalized per-token LM probabilities, reduced either by geometric mean or jointly (this run uses `joint`, see Configuration). See `README.md` / `code/SCORING_MATH.md` for the derivation.

> **How to read these magnitudes** (directions, why H_s is +inf, the new finite synergy metrics): see [`INTERPRETATION.md`](../INTERPRETATION.md). Quick key — ↑`H_u/H(p)`, ↑`syn_frac`, ↑`H_s^log` mean **more** synergy; ↑`H_s^reg` and the original ↑`H_s` mean **less** synergy.

## Configuration

| field | idioms run | non-idioms run |
|---|---|---|
| model | google/gemma-2-9b | google/gemma-2-9b |
| reduction | joint | joint |
| medial_only | False | False |
| dtype | bfloat16 | bfloat16 |
| num_idioms | 18 | 18 |
| dataset | /home/prada/PID_evaluation/data/dataset.tsv | /home/prada/PID_evaluation/data/nonidioms_dataset.tsv |

Bound `H_u + H_s ≥ 2H(p) + 2log2 ≥ H(p)` holds for 18/18 idioms and 18/18 non-idioms.

`H_s = +inf` (≥1 non-synergistic slot) for 15/18 idioms and 17/18 non-idioms. A *finite* H_s means **every** context of that phrase is synergistic (p > max(q,r) everywhere).
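The all-or-nothing behaviour of the original H_s can be sketched as follows. This is a minimal illustration, not the repository's code: the function names, the per-context log-probability lists, and the toy numbers are all assumptions made for the example. It works in log space so that probabilities around e^-60 do not need to be materialized.

```python
import math

def h_s_original(log_p, log_q, log_r):
    """Mean over contexts of -log max{0, p - max(q, r)}, in nats.

    Inputs are per-context natural-log probabilities (hypothetical layout).
    Returns +inf as soon as one context has p <= max(q, r).
    """
    total = 0.0
    for lp, lq, lr in zip(log_p, log_q, log_r):
        lm = max(lq, lr)                      # log m, with m = max(q, r)
        if lp <= lm:                          # non-synergistic slot: -log 0 = +inf
            return math.inf
        # log(p - m) = log p + log(1 - m/p); expm1 keeps precision when m is close to p
        total += -(lp + math.log(-math.expm1(lm - lp)))
    return total / len(log_p)

def syn_frac(log_p, log_q, log_r):
    """Fraction of contexts with p > max(q, r)."""
    hits = sum(lp > max(lq, lr) for lp, lq, lr in zip(log_p, log_q, log_r))
    return hits / len(log_p)

# Toy log-probabilities for three contexts (made up for illustration).
all_syn = ([-50.0, -52.0, -55.0], [-60.0, -61.0, -70.0], [-58.0, -65.0, -66.0])
one_bad = ([-50.0, -52.0, -55.0], [-60.0, -51.0, -70.0], [-58.0, -65.0, -66.0])

finite_hs = h_s_original(*all_syn)     # every context synergistic -> finite
infinite_hs = h_s_original(*one_bad)   # second context has q > p -> +inf
```

In the toy data a single context with q > p is enough to turn H_s into +inf even though two of three contexts are still synergistic, which is why `syn_frac` degrades gracefully where H_s does not.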

## Per-metric summary (idioms vs non-idioms)

Means and 95% bootstrap CIs (20k resamples, percentile method, phrase-level). Non-finite values are dropped per metric before bootstrapping; the drop count is shown.
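A percentile bootstrap of this kind can be sketched with the standard library alone. The helper name, the seed, and the index-based percentile rule are assumptions for illustration; the actual analysis script may differ in details such as percentile interpolation.

```python
import math
import random
import statistics

def bootstrap_mean_ci(values, n_resamples=20_000, alpha=0.05, seed=0):
    """Percentile-bootstrap CI for the mean of the finite entries of `values`.

    Returns (mean, (lo, hi), n_dropped). Resampling is at the phrase level:
    each element of `values` is one phrase's metric.
    """
    finite = [v for v in values if math.isfinite(v)]
    n_dropped = len(values) - len(finite)
    rng = random.Random(seed)
    n = len(finite)
    # Resample n phrases with replacement and record the mean each time.
    means = sorted(
        statistics.fmean(rng.choices(finite, k=n)) for _ in range(n_resamples)
    )
    lo = means[int((alpha / 2) * (n_resamples - 1))]
    hi = means[int((1 - alpha / 2) * (n_resamples - 1))]
    return statistics.fmean(finite), (lo, hi), n_dropped

# The three finite idiom values of H_s (original) plus fifteen +inf entries.
h_s_idioms = [56.6748, 57.5708, 64.9090] + [math.inf] * 15
mean, (lo, hi), dropped = bootstrap_mean_ci(h_s_idioms)
```

With only three finite values, more than 2.5% of resamples consist of the minimum (or the maximum) three times over, so the interval collapses to [min, max]. That matches the shape of the idiom row in the `H_s (original)` table.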

### H(p)

*base entropy, uniform MC average over contexts (nats). ↓ smaller = idiom more concentrated*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 59.1187 | 59.2694 | 50.7051 | 68.9618 | [56.5688, 61.7278] |
| non-idioms | 18 | 0 | 69.5920 | 69.2854 | 58.6982 | 82.4483 | [66.4577, 72.7840] |

**Cross-dataset gap** (idioms − non-idioms): Δ = -10.4733  (95% CI [-14.5435, -6.3820]) → idioms < non-idioms, **significant**.
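The cross-dataset gap lines can be read as a bootstrap of the difference of means. The sketch below assumes that the two datasets are resampled independently and that "significant" means the 95% CI excludes zero; neither assumption is stated in this report, and the function name is hypothetical. The example input is the `syn_frac` column from the per-phrase tables.

```python
import random
import statistics

def bootstrap_gap(a, b, n_resamples=20_000, alpha=0.05, seed=0):
    """Percentile-bootstrap CI for mean(a) - mean(b).

    Returns (delta, (lo, hi), significant), where `significant` is True
    when the interval does not contain zero (an assumed criterion).
    """
    rng = random.Random(seed)
    diffs = sorted(
        statistics.fmean(rng.choices(a, k=len(a)))
        - statistics.fmean(rng.choices(b, k=len(b)))
        for _ in range(n_resamples)
    )
    lo = diffs[int((alpha / 2) * (n_resamples - 1))]
    hi = diffs[int((1 - alpha / 2) * (n_resamples - 1))]
    delta = statistics.fmean(a) - statistics.fmean(b)
    return delta, (lo, hi), not (lo <= 0.0 <= hi)

# syn_frac per phrase, copied from the per-phrase tables.
idioms = [0.9, 0.9, 1.0, 0.9, 0.9, 1.0, 0.8, 0.9, 1.0,
          0.7, 0.9, 0.9, 0.8, 0.8, 0.4, 0.3, 0.8, 0.4]
non_idioms = [1.0, 0.8, 0.9, 0.5, 0.6, 0.4, 0.3, 0.6, 0.4,
              0.3, 0.2, 0.3, 0.4, 0.3, 0.4, 0.1, 0.1, 0.0]

delta, (lo, hi), significant = bootstrap_gap(idioms, non_idioms)
```

The point estimate `delta` reproduces the reported `syn_frac` gap of 0.3722; the interval endpoints depend on the seed and percentile rule, so they should land near, not exactly on, the reported CI.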

### H_u

*unique / redundant entropy = -log min{p, max(q,r)} (nats); >= H(p)*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 66.6445 | 66.7160 | 58.5971 | 77.3982 | [64.2482, 69.1242] |
| non-idioms | 18 | 0 | 71.3859 | 69.8508 | 62.8302 | 85.0896 | [68.4636, 74.4549] |

**Cross-dataset gap** (idioms − non-idioms): Δ = -4.7414  (95% CI [-8.5615, -0.8655]) → idioms < non-idioms, **significant**.

### H_u / H(p)

*unique-information ratio (>= 1). ↑ bigger = MORE synergy. THE headline metric*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 1.1299 | 1.1224 | 1.0379 | 1.2027 | [1.1080, 1.1514] |
| non-idioms | 18 | 0 | 1.0268 | 1.0200 | 1.0000 | 1.0827 | [1.0171, 1.0378] |

**Cross-dataset gap** (idioms − non-idioms): Δ = 0.1031  (95% CI [0.0787, 0.1273]) → idioms > non-idioms, **significant**.

### syn_frac

*synergy coverage in [0,1] = frac. of contexts with p > m, where m = max(q,r). ↑ bigger = MORE synergy (most intuitive)*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 0.7944 | 0.9000 | 0.3000 | 1.0000 | [0.6944, 0.8833] |
| non-idioms | 18 | 0 | 0.4222 | 0.4000 | 0.0000 | 1.0000 | [0.3056, 0.5500] |

**Cross-dataset gap** (idioms − non-idioms): Δ = 0.3722  (95% CI [0.2111, 0.5222]) → idioms > non-idioms, **significant**.

### H_s^log

*log-space synergy = mean max{0, log p - log m} (nats). ↑ bigger = MORE synergy; finite always*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 7.5257 | 7.9194 | 2.4199 | 10.8237 | [6.3664, 8.6414] |
| non-idioms | 18 | 0 | 1.7939 | 1.3918 | 0.0000 | 4.8854 | [1.1927, 2.4500] |

**Cross-dataset gap** (idioms − non-idioms): Δ = 5.7318  (95% CI [4.4137, 7.0313]) → idioms > non-idioms, **significant**.

### H_s^log / H(p)

*log-space synergy ratio. ↑ bigger = MORE synergy*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 0.1299 | 0.1224 | 0.0379 | 0.2027 | [0.1080, 0.1514] |
| non-idioms | 18 | 0 | 0.0268 | 0.0200 | 0.0000 | 0.0827 | [0.0171, 0.0378] |

**Cross-dataset gap** (idioms − non-idioms): Δ = 0.1031  (95% CI [0.0787, 0.1273]) → idioms > non-idioms, **significant**.
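The `H_s^log / H(p)` gap equals the `H_u / H(p)` gap (Δ = 0.1031 in both) because, per context, `-log min{p, m} = -log p + max{0, log p - log m}`. Averaging over contexts gives `H_u = H(p) + H_s^log`, so the two ratios differ by exactly 1 for every phrase. The sketch below checks this on toy log-probabilities (made up for illustration, with hypothetical helper names) and on three rows copied from the per-phrase tables.

```python
import math

def h_p(log_p):
    """Base entropy: mean of -log p over contexts (nats)."""
    return -sum(log_p) / len(log_p)

def h_u(log_p, log_m):
    """Mean of -log min{p, m}, with m = max(q, r) given in log space."""
    return -sum(min(lp, lm) for lp, lm in zip(log_p, log_m)) / len(log_p)

def h_s_log(log_p, log_m):
    """Mean of max{0, log p - log m}."""
    return sum(max(0.0, lp - lm) for lp, lm in zip(log_p, log_m)) / len(log_p)

# Toy contexts: two synergistic (p > m), two not.
log_p = [-50.0, -52.0, -55.0, -61.0]
log_m = [-58.0, -51.0, -66.0, -61.5]
identity_holds = math.isclose(h_u(log_p, log_m), h_p(log_p) + h_s_log(log_p, log_m))

# (phrase, H(p), H_u, H_s^log) copied from the per-phrase tables.
rows = [
    ("cut corners", 51.1818, 61.5549, 10.3731),
    ("get the sack", 63.8142, 66.2341, 2.4199),
    ("cut hair", 59.0863, 63.9717, 4.8854),
]
# Allow for 4-decimal rounding in the tables.
table_agrees = all(abs(hp + hslog - hu) < 2e-4 for _, hp, hu, hslog in rows)
```

One practical consequence is that `H_u / H(p)` and `H_s^log / H(p)` carry the same information, so they should not be counted as two independent pieces of evidence.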

### H_s^log signed

*signed log-space synergy = mean(log p - log m); can be negative (net anti-synergistic)*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 5.7538 | 7.0805 | -7.1500 | 10.7187 | [3.4152, 7.7723] |
| non-idioms | 18 | 0 | -2.6573 | -1.8479 | -12.8318 | 4.8854 | [-4.6894, -0.7346] |

**Cross-dataset gap** (idioms − non-idioms): Δ = 8.4111  (95% CI [5.3779, 11.2946]) → idioms > non-idioms, **significant**.

### H_s^reg

*regularized H_s (eps-floored, finite, continuous). ↑ bigger = LESS synergy*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 60.1193 | 59.8048 | 51.6295 | 69.4245 | [57.3974, 62.8652] |
| non-idioms | 18 | 0 | 72.3499 | 73.4249 | 59.1382 | 85.3007 | [68.9377, 75.7213] |

**Cross-dataset gap** (idioms − non-idioms): Δ = -12.2307  (95% CI [-16.5625, -7.8474]) → idioms < non-idioms, **significant**.

### H_s^reg / H(p)

*regularized synergy ratio (>= 1). ↑ bigger = LESS synergy*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 1.0166 | 1.0147 | 1.0001 | 1.0516 | [1.0103, 1.0240] |
| non-idioms | 18 | 0 | 1.0393 | 1.0412 | 1.0009 | 1.0662 | [1.0312, 1.0470] |

**Cross-dataset gap** (idioms − non-idioms): Δ = -0.0227  (95% CI [-0.0329, -0.0119]) → idioms < non-idioms, **significant**.

### H_s (original)

*synergy entropy = -log max{0, p - max(q,r)} (nats); +inf if ANY slot non-synergistic (mostly +inf)*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 3 | 15 | 59.7182 | 57.5708 | 56.6748 | 64.9090 | [56.6748, 64.9090] |
| non-idioms | 1 | 17 | 59.1382 | 59.1382 | 59.1382 | 59.1382 | [59.1382, 59.1382] |

**Cross-dataset gap** (idioms − non-idioms): Δ = 0.5800  (95% CI [-2.4634, 5.7708]) → idioms > non-idioms, not significant.

### H_s / H(p)

*original synergy ratio (mostly +inf; use H_s^log or syn_frac instead)*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 3 | 15 | 1.0002 | 1.0002 | 1.0001 | 1.0003 | [1.0001, 1.0003] |
| non-idioms | 1 | 17 | 1.0009 | 1.0009 | 1.0009 | 1.0009 | [1.0009, 1.0009] |

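**Cross-dataset gap** (idioms − non-idioms): Δ = -0.0007  (95% CI [-0.0008, -0.0006]) → idioms < non-idioms, **significant**. This rests on only 3 finite idioms and 1 finite non-idiom (whose CI is degenerate), so the significance flag should not be relied on.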

## Per-phrase detail

#### Idioms (sorted by H_u/H, descending)

| phrase | H(p) | H_u | H_u/H | syn_frac | H_s^log | H_s^log/H | H_s^reg | H_s^reg/H | H_s (orig) | n_idiom | n_head | n_non |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| cut corners | 51.1818 | 61.5549 | 1.2027 | 0.90 | 10.3731 | 0.2027 | 51.6430 | 1.0090 | +inf | 10 | 10 | 10 |
| break the mold | 55.7376 | 66.4007 | 1.1913 | 0.90 | 10.6630 | 0.1913 | 56.5712 | 1.0150 | +inf | 10 | 10 | 10 |
| strike a chord | 56.6711 | 67.3898 | 1.1891 | 1.00 | 10.7187 | 0.1891 | 56.6748 | 1.0001 | 56.6748 | 10 | 10 | 10 |
| call the shots | 51.6305 | 60.8343 | 1.1783 | 0.90 | 9.2038 | 0.1783 | 52.1065 | 1.0092 | +inf | 10 | 10 | 10 |
| turn tail | 63.0008 | 73.8246 | 1.1718 | 0.90 | 10.8237 | 0.1718 | 63.4621 | 1.0073 | +inf | 10 | 10 | 10 |
| clear the air | 57.5529 | 67.3676 | 1.1705 | 1.00 | 9.8147 | 0.1705 | 57.5708 | 1.0003 | 57.5708 | 10 | 10 | 10 |
| rock the boat | 50.7051 | 58.5971 | 1.1556 | 0.80 | 7.8921 | 0.1556 | 51.6295 | 1.0182 | +inf | 10 | 10 | 10 |
| have a ball | 53.2637 | 61.0281 | 1.1458 | 0.90 | 7.7644 | 0.1458 | 53.7308 | 1.0088 | +inf | 10 | 10 | 10 |
| spill the beans | 64.8975 | 72.8443 | 1.1225 | 1.00 | 7.9467 | 0.1225 | 64.9090 | 1.0002 | 64.9090 | 10 | 10 | 10 |
| lose ground | 65.5918 | 73.6221 | 1.1224 | 0.70 | 8.0303 | 0.1224 | 66.9806 | 1.0212 | +inf | 10 | 10 | 10 |
| bite the dust | 68.9618 | 77.3982 | 1.1223 | 0.90 | 8.4365 | 0.1223 | 69.4245 | 1.0067 | +inf | 10 | 10 | 10 |
| lead the field | 54.7682 | 61.4411 | 1.1218 | 0.90 | 6.6729 | 0.1218 | 55.5636 | 1.0145 | +inf | 10 | 10 | 10 |
| pull strings | 61.0846 | 67.3938 | 1.1033 | 0.80 | 6.3092 | 0.1033 | 62.0387 | 1.0156 | +inf | 10 | 10 | 10 |
| mean business | 55.0232 | 60.1902 | 1.0939 | 0.80 | 5.1670 | 0.0939 | 55.9558 | 1.0170 | +inf | 10 | 10 | 10 |
| run the show | 60.9860 | 65.4971 | 1.0740 | 0.40 | 4.5111 | 0.0740 | 63.7509 | 1.0453 | +inf | 10 | 10 | 10 |
| make waves | 62.4691 | 67.0313 | 1.0730 | 0.30 | 4.5622 | 0.0730 | 65.6943 | 1.0516 | +inf | 10 | 10 | 10 |
| raise hell | 66.7971 | 70.9512 | 1.0622 | 0.80 | 4.1541 | 0.0622 | 67.8567 | 1.0159 | +inf | 10 | 10 | 10 |
| get the sack | 63.8142 | 66.2341 | 1.0379 | 0.40 | 2.4199 | 0.0379 | 66.5845 | 1.0434 | +inf | 10 | 10 | 10 |

#### Non-idioms (sorted by H_u/H, descending)

| phrase | H(p) | H_u | H_u/H | syn_frac | H_s^log | H_s^log/H | H_s^reg | H_s^reg/H | H_s (orig) | n_idiom | n_head | n_non |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| cut hair | 59.0863 | 63.9717 | 1.0827 | 1.00 | 4.8854 | 0.0827 | 59.1382 | 1.0009 | 59.1382 | 10 | 10 | 10 |
| call the police | 58.6982 | 62.8302 | 1.0704 | 0.80 | 4.1320 | 0.0704 | 59.6838 | 1.0168 | +inf | 10 | 10 | 10 |
| clear the table | 64.6017 | 67.7981 | 1.0495 | 0.90 | 3.1964 | 0.0495 | 65.4016 | 1.0124 | +inf | 10 | 10 | 10 |
| build the boat | 62.5635 | 65.3335 | 1.0443 | 0.50 | 2.7700 | 0.0443 | 65.0507 | 1.0398 | +inf | 10 | 10 | 10 |
| tie knots | 75.1205 | 77.9483 | 1.0376 | 0.60 | 2.8278 | 0.0376 | 77.0866 | 1.0262 | +inf | 10 | 10 | 10 |
| turn dials | 82.4483 | 85.0896 | 1.0320 | 0.40 | 2.6413 | 0.0320 | 85.3007 | 1.0346 | +inf | 10 | 10 | 10 |
| raise children | 73.2252 | 75.5245 | 1.0314 | 0.30 | 2.2993 | 0.0314 | 76.5084 | 1.0448 | +inf | 10 | 10 | 10 |
| eat the apple | 77.9324 | 79.9512 | 1.0259 | 0.60 | 2.0188 | 0.0259 | 79.8841 | 1.0250 | +inf | 10 | 10 | 10 |
| break the window | 71.5117 | 73.1206 | 1.0225 | 0.40 | 1.6088 | 0.0225 | 74.4079 | 1.0405 | +inf | 10 | 10 | 10 |
| get a present | 67.2910 | 68.4657 | 1.0175 | 0.30 | 1.1747 | 0.0175 | 70.5399 | 1.0483 | +inf | 10 | 10 | 10 |
| make lunch | 68.7562 | 69.8871 | 1.0164 | 0.20 | 1.1309 | 0.0164 | 72.4419 | 1.0536 | +inf | 10 | 10 | 10 |
| see the show | 63.6295 | 64.6318 | 1.0158 | 0.30 | 1.0024 | 0.0158 | 66.8811 | 1.0511 | +inf | 10 | 10 | 10 |
| throw a ball | 64.9762 | 65.8297 | 1.0131 | 0.40 | 0.8535 | 0.0131 | 67.9788 | 1.0462 | +inf | 10 | 10 | 10 |
| lose keys | 77.9015 | 78.7161 | 1.0105 | 0.30 | 0.8146 | 0.0105 | 81.1731 | 1.0420 | +inf | 10 | 10 | 10 |
| spill the water | 75.1203 | 75.8618 | 1.0099 | 0.40 | 0.7415 | 0.0099 | 77.9905 | 1.0382 | +inf | 10 | 10 | 10 |
| remember details | 62.9839 | 63.1414 | 1.0025 | 0.10 | 0.1575 | 0.0025 | 67.1518 | 1.0662 | +inf | 10 | 10 | 10 |
| strike a drum | 76.9948 | 77.0303 | 1.0005 | 0.10 | 0.0355 | 0.0005 | 81.2603 | 1.0554 | +inf | 10 | 10 | 10 |
| lead the meeting | 69.8145 | 69.8145 | 1.0000 | 0.00 | 0.0000 | 0.0000 | 74.4197 | 1.0660 | +inf | 10 | 10 | 10 |