# H_u / H_s analysis — gemma2-9b · full · geo

Compares the idiom dataset with a parallel dataset of literal verb phrases (non-idioms). All entropies are uniform Monte Carlo (MC) averages over each phrase's observed contexts, reported in nats; scores are unnormalized per-token LM probabilities, reduced by geometric mean (this run) or taken jointly. See `README.md` / `code/SCORING_MATH.md` for the derivation.

> **How to read these magnitudes** (directions, why H_s is +inf, the new finite synergy metrics): see [`INTERPRETATION.md`](../INTERPRETATION.md). Quick key — ↑`H_u/H(p)`, ↑`syn_frac`, ↑`H_s^log` mean **more** synergy; ↑`H_s^reg` and the original ↑`H_s` mean **less** synergy.

## Configuration

| field | idioms run | non-idioms run |
|---|---|---|
| model | google/gemma-2-9b | google/gemma-2-9b |
| reduction | geometric_mean | geometric_mean |
| medial_only | False | False |
| dtype | bfloat16 | bfloat16 |
| num_idioms | 18 | 18 |
| dataset | /home/prada/PID_evaluation/data/dataset.tsv | /home/prada/PID_evaluation/data/nonidioms_dataset.tsv |
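
The `reduction` row controls how per-token probabilities collapse into one phrase score. A minimal sketch of the two options mentioned in the intro, assuming the input is a list of natural-log token probabilities; the function name `phrase_score` and the `"joint"` label are illustrative and not taken from the project code:

```python
import math

def phrase_score(token_logprobs, reduction="geometric_mean"):
    """Collapse per-token natural-log probabilities into one phrase score."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    total = sum(token_logprobs)
    if reduction == "geometric_mean":
        # exp(mean log p) is the geometric mean of the token probabilities
        return math.exp(total / len(token_logprobs))
    if reduction == "joint":
        # exp(sum log p) is the product of the token probabilities
        return math.exp(total)
    raise ValueError(f"unknown reduction: {reduction!r}")

logps = [math.log(0.5), math.log(0.125)]
geo = phrase_score(logps)             # sqrt(0.5 * 0.125) = 0.25
joint = phrase_score(logps, "joint")  # 0.5 * 0.125 = 0.0625
```

The geometric mean removes the length penalty of the joint product, which matters when idioms and their literal counterparts tokenize to different lengths.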

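The bound `H_u + H_s ≥ 2H(p) + 2log2 ≥ H(p)` (natural log) holds for 18/18 idioms and 18/18 non-idioms. For any phrase with `H_s = +inf` it holds trivially, so it is informative only for the phrases with finite `H_s`.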
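
One way to see where the `2log2` term can come from, assuming `H_u` and `H_s` are per-context averages of the expressions given in the metric captions below and writing m = max(q,r). In a synergistic context (p > m > 0) the product m(p − m) is at most p²/4, with equality at m = p/2, so:

```latex
\begin{aligned}
-\log \min\{p, m\} \;-\; \log (p - m)
  &= -\log\bigl(m\,(p - m)\bigr) \\
  &\ge -\log\frac{p^{2}}{4}
   \;=\; -2\log p + 2\log 2 .
\end{aligned}
```

Averaging over contexts then gives the stated inequality. This is a reconstruction under those assumptions and may differ from the derivation in `code/SCORING_MATH.md`.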

`H_s = +inf` (≥1 non-synergistic slot) for 9/18 idioms and 17/18 non-idioms. A *finite* H_s means **every** context of that phrase is synergistic (p > max(q,r) everywhere).
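
To make the `+inf` behaviour concrete, here is a small sketch of the per-phrase quantities. It assumes each one is a uniform average of a per-context term, with `p`, `q`, `r` the per-context scores that appear in the metric captions below. The helper `phrase_metrics` and the toy numbers are hypothetical:

```python
import math

def phrase_metrics(p, q, r):
    """p, q, r: equal-length lists of positive scores, one entry per context."""
    n = len(p)
    h_p = h_u = h_s = 0.0
    n_syn = 0
    for pi, qi, ri in zip(p, q, r):
        m = max(qi, ri)
        h_p += -math.log(pi)
        h_u += -math.log(min(pi, m))
        gap = pi - m
        if gap > 0:
            n_syn += 1
            h_s += -math.log(gap)
        else:
            # -log max{0, gap} = +inf; one such context makes the average +inf
            h_s = math.inf
    return {"H_p": h_p / n, "H_u": h_u / n, "H_s": h_s / n, "syn_frac": n_syn / n}

# Toy input: context 1 is synergistic (p > max(q, r)), context 2 is not.
toy = phrase_metrics(p=[0.02, 0.01], q=[0.005, 0.02], r=[0.01, 0.004])
```

In the toy input `syn_frac` is 0.5 and `H_s` is `inf`, even though the first context is synergistic, while `H_u` stays finite and is at least `H_p`.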

## Per-metric summary (idioms vs non-idioms)

Means and 95% bootstrap CIs (20k resamples, percentile method, resampled at the phrase level). Non-finite values are dropped per metric before bootstrapping; the drop count is shown. A cross-dataset gap is marked **significant** when the 95% CI of Δ excludes 0.
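
A sketch of a percentile bootstrap of the mean with non-finite values dropped first, as described above. It uses only the standard library; the function name, the seed, and the index-based percentile rule are assumptions, so it will not reproduce the report's CIs digit for digit:

```python
import math
import random

def bootstrap_mean_ci(values, n_resamples=20_000, alpha=0.05, seed=0):
    """Return (mean, (lo, hi), n_dropped) for the finite entries of `values`."""
    finite = [v for v in values if math.isfinite(v)]
    n_dropped = len(values) - len(finite)
    if not finite:
        raise ValueError("no finite values to bootstrap")
    rng = random.Random(seed)
    n = len(finite)
    # Resample phrases with replacement and record the mean of each resample.
    means = sorted(sum(rng.choices(finite, k=n)) / n for _ in range(n_resamples))
    lo = means[int((alpha / 2) * (n_resamples - 1))]
    hi = means[int((1 - alpha / 2) * (n_resamples - 1))]
    return sum(finite) / n, (lo, hi), n_dropped

# Original H_s for non-idioms: one finite phrase, seventeen +inf.
h_s_non_idioms = [5.4335] + [math.inf] * 17
mean, ci, dropped = bootstrap_mean_ci(h_s_non_idioms)
# mean == 5.4335, ci == (5.4335, 5.4335), dropped == 17
```

With a single finite value every resample is identical, which is why a metric with N finite = 1 shows a zero-width CI.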

### H(p)

*base entropy, uniform MC average over contexts (nats). ↓ smaller = idiom more concentrated*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 4.6476 | 4.7657 | 3.6368 | 5.7821 | [4.3481, 4.9471] |
| non-idioms | 18 | 0 | 5.5071 | 5.4914 | 4.2887 | 7.2015 | [5.1316, 5.9022] |

**Cross-dataset gap** (idioms − non-idioms): Δ = -0.8596  (95% CI [-1.3533, -0.3726]) → idioms < non-idioms, **significant**.
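
The gap line can be approximated by resampling phrases independently within each dataset and taking the difference of means. This is an assumed procedure, since the report does not state whether resampling is independent or paired. The H(p) values are copied from the per-phrase tables at the end of this report, and `bootstrap_gap_ci` is a hypothetical helper:

```python
import random

def bootstrap_gap_ci(a, b, n_resamples=20_000, alpha=0.05, seed=0):
    """Percentile CI for mean(a) - mean(b), resampling a and b independently."""
    rng = random.Random(seed)

    def resampled_mean(xs):
        return sum(rng.choices(xs, k=len(xs))) / len(xs)

    gaps = sorted(resampled_mean(a) - resampled_mean(b) for _ in range(n_resamples))
    lo = gaps[int((alpha / 2) * (n_resamples - 1))]
    hi = gaps[int((1 - alpha / 2) * (n_resamples - 1))]
    delta = sum(a) / len(a) - sum(b) / len(b)
    significant = not (lo <= 0.0 <= hi)  # CI excludes zero
    return delta, (lo, hi), significant

H_P_IDIOMS = [3.9649, 5.2372, 5.0451, 3.9644, 3.8186, 4.0676, 3.6368, 3.8770, 4.3037,
              5.2715, 5.1040, 4.9924, 4.2340, 5.7821, 5.0621, 5.2404, 5.5155, 4.5390]
H_P_NON_IDIOMS = [4.4770, 4.8250, 4.2887, 4.7950, 5.6162, 5.2365, 6.8953, 4.5289, 5.4761,
                  5.9868, 5.5909, 4.8531, 5.5067, 4.8260, 6.0408, 6.6354, 6.3488, 7.2015]

delta, (lo, hi), significant = bootstrap_gap_ci(H_P_IDIOMS, H_P_NON_IDIOMS)
# delta is about -0.8596, matching the reported gap for H(p)
```

The interval endpoints depend on the seed and percentile rule, so expect them to be close to, but not identical with, the reported CI.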

### H_u

*unique / redundant entropy = -log min{p, max(q,r)} (nats); >= H(p)*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 5.5396 | 5.4147 | 4.5045 | 6.6563 | [5.2361, 5.8535] |
| non-idioms | 18 | 0 | 5.8965 | 5.8234 | 4.8444 | 7.5323 | [5.5560, 6.2605] |

**Cross-dataset gap** (idioms − non-idioms): Δ = -0.3569  (95% CI [-0.8317, 0.1096]) → idioms < non-idioms, not significant.

### H_u / H(p)

*unique-information ratio (>= 1). ↑ bigger = MORE synergy. THE headline metric*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 1.1968 | 1.1956 | 1.1021 | 1.3031 | [1.1676, 1.2257] |
| non-idioms | 18 | 0 | 1.0748 | 1.0674 | 1.0153 | 1.1623 | [1.0564, 1.0944] |

**Cross-dataset gap** (idioms − non-idioms): Δ = 0.1220  (95% CI [0.0871, 0.1568]) → idioms > non-idioms, **significant**.

### syn_frac

*synergy coverage in [0,1] = fraction of contexts with p > m, where m = max(q,r). ↑ bigger = MORE synergy (most intuitive)*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 0.8833 | 0.9500 | 0.3000 | 1.0000 | [0.7889, 0.9556] |
| non-idioms | 18 | 0 | 0.6167 | 0.6500 | 0.2000 | 1.0000 | [0.5056, 0.7278] |

**Cross-dataset gap** (idioms − non-idioms): Δ = 0.2667  (95% CI [0.1222, 0.4056]) → idioms > non-idioms, **significant**.

### H_s^log

*log-space synergy = mean max{0, log p - log m} (nats). ↑ bigger = MORE synergy; finite always*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 0.8920 | 0.8826 | 0.4635 | 1.4191 | [0.7767, 1.0115] |
| non-idioms | 18 | 0 | 0.3893 | 0.3847 | 0.1104 | 0.7266 | [0.3040, 0.4775] |

**Cross-dataset gap** (idioms − non-idioms): Δ = 0.5027  (95% CI [0.3595, 0.6500]) → idioms > non-idioms, **significant**.

### H_s^log / H(p)

*log-space synergy ratio. ↑ bigger = MORE synergy*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 0.1968 | 0.1956 | 0.1021 | 0.3031 | [0.1676, 0.2257] |
| non-idioms | 18 | 0 | 0.0748 | 0.0674 | 0.0153 | 0.1623 | [0.0564, 0.0944] |

**Cross-dataset gap** (idioms − non-idioms): Δ = 0.1220  (95% CI [0.0871, 0.1568]) → idioms > non-idioms, **significant**.
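
The statistics in this table equal those of the `H_u / H(p)` table minus 1, and the two gap lines coincide. That follows from the captions: per context, `-log min{p, m} = -log p + max{0, log p - log m}`, so after averaging `H_u = H(p) + H_s^log`. A quick check against rows copied from the per-phrase tables (values there are rounded to 4 decimals, hence the tolerance; `identity_holds` is an illustrative helper):

```python
# (H(p), H_u, H_s^log) copied from the per-phrase tables.
ROWS = {
    "cut corners": (3.9649, 5.1665, 1.2016),
    "make waves": (5.0621, 5.6528, 0.5906),
    "cut hair": (4.4770, 5.2036, 0.7266),
}

def identity_holds(h_p, h_u, h_s_log, tol=5e-4):
    """True if H_u = H(p) + H_s^log and H_u/H(p) - 1 = H_s^log/H(p), up to rounding."""
    additive = abs(h_u - (h_p + h_s_log)) <= tol
    ratio = abs((h_u / h_p - 1.0) - h_s_log / h_p) <= tol
    return additive and ratio

results = {phrase: identity_holds(*vals) for phrase, vals in ROWS.items()}
```

In practice this means `H_u / H(p)` and `H_s^log / H(p)` carry the same information and should not be counted as two independent pieces of evidence.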

### H_s^log signed

*signed log-space synergy = mean(log p - log m); can be negative (net anti-synergistic)*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 0.8008 | 0.8719 | 0.0020 | 1.3542 | [0.6282, 0.9591] |
| non-idioms | 18 | 0 | 0.1797 | 0.2683 | -0.4272 | 0.7266 | [0.0261, 0.3309] |

**Cross-dataset gap** (idioms − non-idioms): Δ = 0.6210  (95% CI [0.3945, 0.8441]) → idioms > non-idioms, **significant**.
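
The clipped and signed variants differ only in how non-synergistic contexts (p ≤ m) are treated: clipping counts them as 0, while the signed mean lets them subtract. A small sketch, assuming `p` and `m = max(q,r)` are given per context; the function name and the toy values are hypothetical:

```python
import math

def log_space_synergy(p, m):
    """Return (clipped, signed) means of log p - log m over contexts."""
    diffs = [math.log(pi) - math.log(mi) for pi, mi in zip(p, m)]
    clipped = sum(max(0.0, d) for d in diffs) / len(diffs)
    signed = sum(diffs) / len(diffs)
    return clipped, signed

# One synergistic context (p = 2m) and one anti-synergistic context (p = m/2).
clipped, signed = log_space_synergy(p=[0.02, 0.01], m=[0.01, 0.02])
```

Here the two contexts cancel in the signed mean, while the clipped mean keeps half of log 2, so signed ≤ clipped always holds.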

### H_s^reg

*regularized H_s (eps-floored, finite, continuous). ↑ bigger = LESS synergy*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 5.7541 | 5.5225 | 4.4307 | 8.3596 | [5.2332, 6.3165] |
| non-idioms | 18 | 0 | 7.9027 | 7.8350 | 5.4335 | 10.5804 | [7.2184, 8.5862] |

**Cross-dataset gap** (idioms − non-idioms): Δ = -2.1486  (95% CI [-3.0120, -1.2638]) → idioms < non-idioms, **significant**.
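
For intuition only, one way an eps floor can keep the synergy entropy finite is to clamp the gap before taking the log. The exact form and the value of `eps` used for this report are not stated here, so both are assumptions in this sketch, as is the function name:

```python
import math

def h_s_reg(p, q, r, eps=1e-6):
    """Mean of -log max{eps, p - max(q, r)} over contexts (assumed form)."""
    total = 0.0
    for pi, qi, ri in zip(p, q, r):
        gap = pi - max(qi, ri)
        # Non-synergistic contexts (gap <= 0) contribute -log(eps), not +inf.
        total += -math.log(max(eps, gap))
    return total / len(p)

# Context 1 has gap 0.01; context 2 has a negative gap and is floored at eps.
value = h_s_reg(p=[0.02, 0.01], q=[0.005, 0.02], r=[0.01, 0.004])
```

Under this form each non-synergistic context adds a fixed large penalty, which is consistent with phrases of low `syn_frac` having the largest `H_s^reg` in the per-phrase tables.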

### H_s^reg / H(p)

*regularized synergy ratio (>= 1). ↑ bigger = LESS synergy*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 1.2336 | 1.1837 | 1.0702 | 1.6514 | [1.1715, 1.3077] |
| non-idioms | 18 | 0 | 1.4325 | 1.4499 | 1.2117 | 1.6681 | [1.3666, 1.4983] |

**Cross-dataset gap** (idioms − non-idioms): Δ = -0.1990  (95% CI [-0.2902, -0.1010]) → idioms < non-idioms, **significant**.

### H_s (original)

*synergy entropy = -log max{0, p - max(q,r)} (nats); +inf if ANY slot non-synergistic (mostly +inf)*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 9 | 9 | 4.8710 | 4.7454 | 4.4307 | 5.6457 | [4.6175, 5.1515] |
| non-idioms | 1 | 17 | 5.4335 | 5.4335 | 5.4335 | 5.4335 | [5.4335, 5.4335] |

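**Cross-dataset gap** (idioms − non-idioms): Δ = -0.5625  (95% CI [-0.8174, -0.2834]) → idioms < non-idioms, nominally **significant**. Only 9 idioms and 1 non-idiom have finite values, so the non-idiom mean has no resampling variance and this comparison should not be relied on.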

### H_s / H(p)

*original synergy ratio (mostly +inf; use H_s^log or syn_frac instead)*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 9 | 9 | 1.1480 | 1.1537 | 1.0702 | 1.2038 | [1.1218, 1.1733] |
| non-idioms | 1 | 17 | 1.2136 | 1.2136 | 1.2136 | 1.2136 | [1.2136, 1.2136] |

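**Cross-dataset gap** (idioms − non-idioms): Δ = -0.0657  (95% CI [-0.0919, -0.0405]) → idioms < non-idioms, nominally **significant**. As with `H_s`, the comparison rests on 9 finite idioms against a single finite non-idiom and should not be relied on.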

## Per-phrase detail

### Idioms (sorted by H_u/H, descending)

| phrase | H(p) | H_u | H_u/H | syn_frac | H_s^log | H_s^log/H | H_s^reg | H_s^reg/H | H_s (orig) | n_idiom | n_head | n_non |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| cut corners | 3.9649 | 5.1665 | 1.3031 | 1.00 | 1.2016 | 0.3031 | 4.4307 | 1.1175 | 4.4307 | 10 | 10 | 10 |
| turn tail | 5.2372 | 6.6563 | 1.2710 | 0.90 | 1.4191 | 0.2710 | 5.9274 | 1.1318 | +inf | 10 | 10 | 10 |
| strike a chord | 5.0451 | 6.3993 | 1.2684 | 1.00 | 1.3542 | 0.2684 | 5.3993 | 1.0702 | 5.3993 | 10 | 10 | 10 |
| break the mold | 3.9644 | 5.0128 | 1.2645 | 1.00 | 1.0484 | 0.2645 | 4.7454 | 1.1970 | 4.7454 | 10 | 10 | 10 |
| call the shots | 3.8186 | 4.7666 | 1.2483 | 1.00 | 0.9480 | 0.2483 | 4.4984 | 1.1780 | 4.4984 | 10 | 10 | 10 |
| clear the air | 4.0676 | 5.0542 | 1.2426 | 1.00 | 0.9866 | 0.2426 | 4.5837 | 1.1269 | 4.5837 | 10 | 10 | 10 |
| have a ball | 3.6368 | 4.5045 | 1.2386 | 0.90 | 0.8677 | 0.2386 | 4.6188 | 1.2700 | +inf | 10 | 10 | 10 |
| rock the boat | 3.8770 | 4.7743 | 1.2315 | 1.00 | 0.8974 | 0.2315 | 4.4735 | 1.1539 | 4.4735 | 10 | 10 | 10 |
| pull strings | 4.3037 | 5.1767 | 1.2028 | 1.00 | 0.8730 | 0.2028 | 4.9653 | 1.1537 | 4.9653 | 10 | 10 | 10 |
| bite the dust | 5.2715 | 6.2647 | 1.1884 | 0.90 | 0.9932 | 0.1884 | 6.2358 | 1.1829 | +inf | 10 | 10 | 10 |
| mean business | 5.1040 | 5.9962 | 1.1748 | 0.90 | 0.8921 | 0.1748 | 6.0457 | 1.1845 | +inf | 10 | 10 | 10 |
| lead the field | 4.9924 | 5.8537 | 1.1725 | 1.00 | 0.8613 | 0.1725 | 5.6457 | 1.1309 | 5.6457 | 10 | 10 | 10 |
| spill the beans | 4.2340 | 4.9068 | 1.1589 | 1.00 | 0.6728 | 0.1589 | 5.0967 | 1.2038 | 5.0967 | 10 | 10 | 10 |
| lose ground | 5.7821 | 6.6347 | 1.1474 | 0.80 | 0.8525 | 0.1474 | 7.2103 | 1.2470 | +inf | 10 | 10 | 10 |
| make waves | 5.0621 | 5.6528 | 1.1167 | 0.30 | 0.5906 | 0.1167 | 8.3596 | 1.6514 | +inf | 10 | 10 | 10 |
| run the show | 5.2404 | 5.7936 | 1.1056 | 0.50 | 0.5532 | 0.1056 | 7.8056 | 1.4895 | +inf | 10 | 10 | 10 |
| raise hell | 5.5155 | 6.0961 | 1.1053 | 0.90 | 0.5805 | 0.1053 | 6.8142 | 1.2355 | +inf | 10 | 10 | 10 |
| get the sack | 4.5390 | 5.0025 | 1.1021 | 0.80 | 0.4635 | 0.1021 | 6.7172 | 1.4799 | +inf | 10 | 10 | 10 |

### Non-idioms (sorted by H_u/H, descending)

| phrase | H(p) | H_u | H_u/H | syn_frac | H_s^log | H_s^log/H | H_s^reg | H_s^reg/H | H_s (orig) | n_idiom | n_head | n_non |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| cut hair | 4.4770 | 5.2036 | 1.1623 | 1.00 | 0.7266 | 0.1623 | 5.4335 | 1.2136 | 5.4335 | 10 | 10 | 10 |
| clear the table | 4.8250 | 5.4680 | 1.1333 | 0.90 | 0.6430 | 0.1333 | 6.0721 | 1.2585 | +inf | 10 | 10 | 10 |
| call the police | 4.2887 | 4.8444 | 1.1296 | 0.90 | 0.5557 | 0.1296 | 5.8921 | 1.3739 | +inf | 10 | 10 | 10 |
| build the boat | 4.7950 | 5.3324 | 1.1121 | 0.80 | 0.5374 | 0.1121 | 6.4897 | 1.3534 | +inf | 10 | 10 | 10 |
| eat the apple | 5.6162 | 6.1850 | 1.1013 | 0.70 | 0.5688 | 0.1013 | 7.4819 | 1.3322 | +inf | 10 | 10 | 10 |
| break the window | 5.2365 | 5.7474 | 1.0976 | 0.90 | 0.5109 | 0.0976 | 6.5469 | 1.2503 | +inf | 10 | 10 | 10 |
| turn dials | 6.8953 | 7.5323 | 1.0924 | 0.80 | 0.6370 | 0.0924 | 8.3551 | 1.2117 | +inf | 10 | 10 | 10 |
| throw a ball | 4.5289 | 4.9377 | 1.0902 | 0.70 | 0.4087 | 0.0902 | 6.5368 | 1.4433 | +inf | 10 | 10 | 10 |
| tie knots | 5.4761 | 5.8831 | 1.0743 | 0.80 | 0.4070 | 0.0743 | 7.5337 | 1.3757 | +inf | 10 | 10 | 10 |
| raise children | 5.9868 | 6.3492 | 1.0605 | 0.50 | 0.3624 | 0.0605 | 8.8072 | 1.4711 | +inf | 10 | 10 | 10 |
| make lunch | 5.5909 | 5.8991 | 1.0551 | 0.20 | 0.3082 | 0.0551 | 9.3263 | 1.6681 | +inf | 10 | 10 | 10 |
| get a present | 4.8531 | 5.0970 | 1.0503 | 0.40 | 0.2439 | 0.0503 | 7.9645 | 1.6411 | +inf | 10 | 10 | 10 |
| see the show | 5.5067 | 5.7636 | 1.0467 | 0.30 | 0.2569 | 0.0467 | 8.9072 | 1.6175 | +inf | 10 | 10 | 10 |
| spill the water | 4.8260 | 5.0276 | 1.0418 | 0.50 | 0.2016 | 0.0418 | 7.7055 | 1.5967 | +inf | 10 | 10 | 10 |
| remember details | 6.0408 | 6.2705 | 1.0380 | 0.60 | 0.2296 | 0.0380 | 8.7978 | 1.4564 | +inf | 10 | 10 | 10 |
| lose keys | 6.6354 | 6.8004 | 1.0249 | 0.30 | 0.1650 | 0.0249 | 10.1210 | 1.5253 | +inf | 10 | 10 | 10 |
| lead the meeting | 6.3488 | 6.4838 | 1.0213 | 0.40 | 0.1350 | 0.0213 | 9.6971 | 1.5274 | +inf | 10 | 10 | 10 |
| strike a drum | 7.2015 | 7.3119 | 1.0153 | 0.40 | 0.1104 | 0.0153 | 10.5804 | 1.4692 | +inf | 10 | 10 | 10 |
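
To reuse the per-phrase tables above programmatically, a small parser is enough. This sketch assumes the exact pipe-table layout used in this report, with the phrase in the first column and numeric cells elsewhere; `parse_table` is a hypothetical helper and the sample holds two rows copied from the idioms table:

```python
import math

def parse_table(markdown):
    """Parse a pipe table into a list of dicts; '+inf' cells become math.inf."""
    lines = [ln.strip() for ln in markdown.strip().splitlines()
             if ln.strip().startswith("|")]
    header = [c.strip() for c in lines[0].strip("|").split("|")]
    rows = []
    for ln in lines[2:]:  # skip the header row and the |---| separator row
        cells = [c.strip() for c in ln.strip("|").split("|")]
        row = {"phrase": cells[0]}
        for name, cell in zip(header[1:], cells[1:]):
            row[name] = float(cell)  # float("+inf") parses to infinity
        rows.append(row)
    return rows

SAMPLE = """
| phrase | H(p) | H_u | H_u/H | syn_frac | H_s^log | H_s^log/H | H_s^reg | H_s^reg/H | H_s (orig) | n_idiom | n_head | n_non |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| cut corners | 3.9649 | 5.1665 | 1.3031 | 1.00 | 1.2016 | 0.3031 | 4.4307 | 1.1175 | 4.4307 | 10 | 10 | 10 |
| turn tail | 5.2372 | 6.6563 | 1.2710 | 0.90 | 1.4191 | 0.2710 | 5.9274 | 1.1318 | +inf | 10 | 10 | 10 |
"""

rows = parse_table(SAMPLE)
finite_h_s = [r["phrase"] for r in rows if math.isfinite(r["H_s (orig)"])]
```

Filtering on `math.isfinite` reproduces the "N finite" and "non-finite dropped" split used in the summary tables.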

