# H_u / H_s analysis — llama3.1-8b · medial · geo

This report compares the idiom dataset with a parallel dataset of literal verb phrases (non-idioms). All entropies are uniform Monte Carlo (MC) averages over each phrase's observed contexts; scores are unnormalized per-token LM probabilities, reduced by geometric mean in this run (joint reduction is the alternative; see Configuration). See `README.md` / `code/SCORING_MATH.md` for the derivation.

> **How to read these magnitudes** (directions, why H_s is +inf, the new finite synergy metrics): see [`INTERPRETATION.md`](../INTERPRETATION.md). Quick key — ↑`H_u/H(p)`, ↑`syn_frac`, ↑`H_s^log` mean **more** synergy; ↑`H_s^reg` and the original ↑`H_s` mean **less** synergy.

## Configuration

| field | idioms run | non-idioms run |
|---|---|---|
| model | meta-llama/Llama-3.1-8B | meta-llama/Llama-3.1-8B |
| reduction | geometric_mean | geometric_mean |
| medial_only | True | True |
| dtype | bfloat16 | bfloat16 |
| num_idioms | 18 | 18 |
| dataset | /home/prada/PID_evaluation/data/dataset.tsv | /home/prada/PID_evaluation/data/nonidioms_dataset.tsv |

The bound `H_u + H_s ≥ 2H(p) + 2 log 2 ≥ H(p)` holds for 18/18 idioms and 18/18 non-idioms (trivially wherever `H_s = +inf`).

`H_s = +inf` (at least one context with p ≤ max(q,r)) for 1/18 idioms and 13/18 non-idioms. A *finite* H_s means **every** context of that phrase is synergistic (p > max(q,r) everywhere).
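The definitions used in this report can be sketched in a few lines. This is an illustrative sketch, not the project's scoring code: the function name `phrase_entropies` and the `(p, q, r)` tuple layout are assumptions, and each metric is taken to be the uniform average of the per-context term, as the intro describes.

```python
import math

def phrase_entropies(contexts):
    """Return (H(p), H_u, H_s) in nats for one phrase.

    contexts: list of (p, q, r) score triples, one per observed context
    (hypothetical layout; the real pipeline may store these differently).
    """
    n = len(contexts)
    h_p = h_u = h_s = 0.0
    for p, q, r in contexts:
        m = max(q, r)
        h_p += -math.log(p) / n
        h_u += -math.log(min(p, m)) / n
        gap = p - m
        # A single non-synergistic context (p <= m) makes the average +inf.
        h_s += (-math.log(gap) if gap > 0 else math.inf) / n
    return h_p, h_u, h_s

def bound_holds(h_p, h_u, h_s):
    """Check H_u + H_s >= 2 H(p) + 2 log 2 (trivially true when H_s is +inf)."""
    return h_u + h_s >= 2 * h_p + 2 * math.log(2)

# Made-up scores for illustration only.
all_synergistic = [(0.02, 0.005, 0.012), (0.05, 0.01, 0.02)]
one_bad_context = [(0.02, 0.005, 0.012), (0.01, 0.02, 0.005)]
print(phrase_entropies(all_synergistic))
print(phrase_entropies(one_bad_context))
```

The per-context bound follows from `m * (p - m) ≤ (p/2)^2` when `p > m`, so averaging over contexts preserves it.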

## Per-metric summary (idioms vs non-idioms)

Means and 95% bootstrap CIs (20k resamples, percentile method, phrase-level). Non-finite values are dropped per metric before bootstrapping; the drop count is shown.
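As a rough sketch of the procedure described above (phrase-level percentile bootstrap of the mean, with non-finite values dropped first), the following uses only the standard library. The function name, the seed, and the exact percentile indexing are assumptions, so its intervals will not reproduce the tables digit for digit.

```python
import math
import random
import statistics

def bootstrap_mean_ci(values, n_resamples=20_000, alpha=0.05, seed=0):
    """Percentile bootstrap CI of the mean over phrase-level values.

    Returns (mean, (lo, hi), n_dropped). Non-finite values are dropped
    before resampling, mirroring the "non-finite dropped" column.
    """
    finite = [v for v in values if math.isfinite(v)]
    dropped = len(values) - len(finite)
    rng = random.Random(seed)  # seed is an arbitrary choice for repeatability
    n = len(finite)
    means = sorted(
        statistics.fmean(rng.choices(finite, k=n)) for _ in range(n_resamples)
    )
    lo = means[int(alpha / 2 * n_resamples)]
    hi = means[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return statistics.fmean(finite), (lo, hi), dropped

# syn_frac values for the 18 non-idioms, copied from the per-phrase table.
syn_frac_non = [1.0, 1.0, 0.8, 1.0, 0.8, 0.8, 1.0, 0.4, 0.8,
                0.4, 0.4, 0.8, 0.6, 0.6, 0.6, 1.0, 0.8, 0.6]
mean, (lo, hi), dropped = bootstrap_mean_ci(syn_frac_non)
print(f"mean={mean:.4f}  CI=[{lo:.4f}, {hi:.4f}]  dropped={dropped}")
```

Resampling whole phrases rather than individual contexts keeps the phrase as the unit of analysis, matching the "phrase-level" note above.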

### H(p)

*base entropy, uniform MC average over contexts (nats). ↓ smaller = idiom more concentrated*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 4.0475 | 4.0517 | 3.2660 | 5.2415 | [3.8087, 4.2932] |
| non-idioms | 18 | 0 | 4.6510 | 4.6364 | 3.7665 | 5.7817 | [4.3866, 4.9324] |

**Cross-dataset gap** (idioms − non-idioms): Δ = -0.6035  (95% CI [-0.9794, -0.2355]) → idioms < non-idioms, **significant**.
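One plausible way to obtain a gap interval like the one above is to resample each dataset independently and take the difference of means. This report does not spell out that procedure, so treat the sketch below as an assumption, including the rule that a gap counts as significant when its 95% CI excludes zero and the helper name `bootstrap_gap_ci`.

```python
import random
import statistics

def bootstrap_gap_ci(a, b, n_resamples=20_000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for mean(a) - mean(b), resampling a and b independently."""
    rng = random.Random(seed)
    diffs = sorted(
        statistics.fmean(rng.choices(a, k=len(a)))
        - statistics.fmean(rng.choices(b, k=len(b)))
        for _ in range(n_resamples)
    )
    lo = diffs[int(alpha / 2 * n_resamples)]
    hi = diffs[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    delta = statistics.fmean(a) - statistics.fmean(b)
    significant = lo > 0 or hi < 0  # assumed rule: CI excludes zero
    return delta, (lo, hi), significant

# H(p) per phrase, copied from the per-phrase tables.
hp_idioms = [4.1571, 3.3007, 3.7012, 3.4719, 3.7034, 3.2660, 4.7003, 3.9053, 3.4835,
             4.0734, 3.6861, 4.4776, 4.5653, 4.0299, 4.1642, 4.3678, 5.2415, 4.5592]
hp_non = [4.0340, 4.3151, 4.0741, 4.0473, 3.7665, 4.7155, 4.7127, 4.7376, 5.3355,
          4.4657, 4.9123, 3.9977, 5.0988, 4.5602, 5.2835, 4.1221, 5.7583, 5.7817]
delta, (lo, hi), sig = bootstrap_gap_ci(hp_idioms, hp_non)
print(f"delta={delta:.4f}  CI=[{lo:.4f}, {hi:.4f}]  significant={sig}")
```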

### H_u

*unique / redundant entropy = -log min{p, max(q,r)} (nats); >= H(p)*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 4.8651 | 4.8042 | 4.0641 | 5.8812 | [4.6382, 5.1005] |
| non-idioms | 18 | 0 | 4.9637 | 4.9909 | 4.1236 | 5.8794 | [4.7232, 5.2129] |

**Cross-dataset gap** (idioms − non-idioms): Δ = -0.0986  (95% CI [-0.4383, 0.2406]) → idioms < non-idioms, not significant.

### H_u / H(p)

*unique-information ratio (>= 1). ↑ bigger = MORE synergy. THE headline metric*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 1.2072 | 1.1966 | 1.1199 | 1.3123 | [1.1794, 1.2361] |
| non-idioms | 18 | 0 | 1.0703 | 1.0648 | 1.0125 | 1.1617 | [1.0529, 1.0890] |

**Cross-dataset gap** (idioms − non-idioms): Δ = 0.1369  (95% CI [0.1038, 0.1709]) → idioms > non-idioms, **significant**.

### syn_frac

*synergy coverage in [0,1] = fraction of contexts with p > m, where m = max(q,r). ↑ bigger = MORE synergy (most intuitive)*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 0.9778 | 1.0000 | 0.6000 | 1.0000 | [0.9333, 1.0000] |
| non-idioms | 18 | 0 | 0.7444 | 0.8000 | 0.4000 | 1.0000 | [0.6444, 0.8333] |

**Cross-dataset gap** (idioms − non-idioms): Δ = 0.2333  (95% CI [0.1222, 0.3333]) → idioms > non-idioms, **significant**.

### H_s^log

*log-space synergy = mean max{0, log p - log m} (nats). ↑ bigger = MORE synergy; finite always*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 0.8176 | 0.7783 | 0.5453 | 1.2982 | [0.7302, 0.9114] |
| non-idioms | 18 | 0 | 0.3127 | 0.3169 | 0.0725 | 0.6523 | [0.2424, 0.3868] |

**Cross-dataset gap** (idioms − non-idioms): Δ = 0.5049  (95% CI [0.3902, 0.6230]) → idioms > non-idioms, **significant**.

### H_s^log / H(p)

*log-space synergy ratio; per phrase it equals H_u/H(p) - 1, so the gap and CI below match those of H_u/H(p). ↑ bigger = MORE synergy*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 0.2072 | 0.1966 | 0.1199 | 0.3123 | [0.1794, 0.2361] |
| non-idioms | 18 | 0 | 0.0703 | 0.0648 | 0.0125 | 0.1617 | [0.0529, 0.0890] |

**Cross-dataset gap** (idioms − non-idioms): Δ = 0.1369  (95% CI [0.1038, 0.1709]) → idioms > non-idioms, **significant**.

### H_s^log signed

*signed log-space synergy = mean(log p - log m); can be negative (net anti-synergistic)*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 0.8115 | 0.7783 | 0.5443 | 1.2982 | [0.7209, 0.9082] |
| non-idioms | 18 | 0 | 0.2670 | 0.2597 | -0.0378 | 0.6523 | [0.1846, 0.3548] |

**Cross-dataset gap** (idioms − non-idioms): Δ = 0.5445  (95% CI [0.4176, 0.6740]) → idioms > non-idioms, **significant**.
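The three finite metrics above (`syn_frac`, `H_s^log`, and its signed variant) can be computed from the same per-context scores. The sketch below assumes the formulas given in the italic descriptions, with m = max(q,r); the function name and input layout are hypothetical.

```python
import math

def finite_synergy_metrics(contexts):
    """Return (syn_frac, H_s^log, H_s^log signed) for one phrase.

    contexts: list of (p, q, r) score triples, one per observed context.
    """
    n = len(contexts)
    syn_frac = h_log = h_signed = 0.0
    for p, q, r in contexts:
        m = max(q, r)
        d = math.log(p) - math.log(m)  # log-ratio of phrase score to best part
        syn_frac += (p > m) / n        # fraction of synergistic contexts
        h_log += max(0.0, d) / n       # clipped at zero, so never negative
        h_signed += d / n              # unclipped, can go negative
    return syn_frac, h_log, h_signed

# Made-up scores: one synergistic context, one non-synergistic.
print(finite_synergy_metrics([(0.02, 0.005, 0.012), (0.01, 0.02, 0.005)]))
```

Because `-log min{p, m} = -log p + max{0, log p - log m}`, these definitions imply `H_u = H(p) + H_s^log` for every phrase, which the per-phrase tables bear out (for example, 4.1571 + 1.2982 ≈ 5.4554 for "bite the dust").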

### H_s^reg

*regularized H_s (eps-floored, finite, continuous). ↑ bigger = LESS synergy*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 4.8183 | 4.6775 | 3.7773 | 6.1694 | [4.4867, 5.1571] |
| non-idioms | 18 | 0 | 6.8554 | 6.8319 | 4.7968 | 8.9615 | [6.3136, 7.3881] |

**Cross-dataset gap** (idioms − non-idioms): Δ = -2.0370  (95% CI [-2.6646, -1.4033]) → idioms < non-idioms, **significant**.
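This report does not state the exact regularizer behind `H_s^reg`. One simple form consistent with "eps-floored, finite" is to floor the synergy gap at a small `eps` before taking the log. The sketch below assumes that form, and the value of `eps` is purely hypothetical; see `INTERPRETATION.md` for the definition actually used.

```python
import math

def h_s_original(contexts):
    """Original H_s: +inf as soon as one context has p <= max(q, r)."""
    total = 0.0
    for p, q, r in contexts:
        gap = p - max(q, r)
        total += -math.log(gap) if gap > 0 else math.inf
    return total / len(contexts)

def h_s_regularized(contexts, eps=1e-4):
    """Assumed eps-floored variant: -log max{eps, p - max(q, r)}, averaged.

    eps is a hypothetical placeholder, not the value used for the tables.
    """
    return sum(
        -math.log(max(eps, p - max(q, r))) for p, q, r in contexts
    ) / len(contexts)

# Made-up scores for illustration only.
all_synergistic = [(0.02, 0.005, 0.012), (0.05, 0.01, 0.02)]
one_bad_context = [(0.02, 0.005, 0.012), (0.01, 0.02, 0.005)]
print(h_s_original(all_synergistic), h_s_regularized(all_synergistic))
print(h_s_original(one_bad_context), h_s_regularized(one_bad_context))
```

Under this form the two metrics coincide whenever every gap exceeds `eps`, which matches the per-phrase tables: `H_s^reg` equals `H_s (orig)` on every row where the original is finite.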

### H_s^reg / H(p)

*regularized synergy ratio (>= 1). ↑ bigger = LESS synergy*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 1.1901 | 1.1742 | 1.0947 | 1.5309 | [1.1550, 1.2386] |
| non-idioms | 18 | 0 | 1.4720 | 1.5090 | 1.1891 | 1.6812 | [1.4000, 1.5386] |

**Cross-dataset gap** (idioms − non-idioms): Δ = -0.2818  (95% CI [-0.3597, -0.1960]) → idioms < non-idioms, **significant**.

### H_s (original)

*synergy entropy = -log max{0, p - max(q,r)} (nats); +inf if ANY context is non-synergistic (1/18 idioms, 13/18 non-idioms)*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 17 | 1 | 4.7389 | 4.5509 | 3.7773 | 6.0544 | [4.4288, 5.0600] |
| non-idioms | 5 | 13 | 5.5824 | 5.4632 | 4.7968 | 6.4748 | [5.0688, 6.1203] |

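**Cross-dataset gap** (idioms − non-idioms): Δ = -0.8436  (95% CI [-1.4715, -0.2422]) → idioms < non-idioms, **significant**. This gap uses finite values only (17 idioms vs. 5 non-idioms); the dropped `+inf` phrases sit at the least-synergistic extreme and are mostly non-idioms, so it likely understates the true difference.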

### H_s / H(p)

*original synergy ratio (+inf for 1/18 idioms and 13/18 non-idioms; use H_s^log or syn_frac instead)*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 17 | 1 | 1.1701 | 1.1690 | 1.0947 | 1.2636 | [1.1498, 1.1905] |
| non-idioms | 5 | 13 | 1.3164 | 1.2803 | 1.1891 | 1.5707 | [1.2085, 1.4508] |

**Cross-dataset gap** (idioms − non-idioms): Δ = -0.1463  (95% CI [-0.2831, -0.0386]) → idioms < non-idioms, **significant**.

## Per-phrase detail

### Idioms (sorted by H_u/H, descending)

| phrase | H(p) | H_u | H_u/H | syn_frac | H_s^log | H_s^log/H | H_s^reg | H_s^reg/H | H_s (orig) | n_idiom | n_head | n_non |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| bite the dust | 4.1571 | 5.4554 | 1.3123 | 1.00 | 1.2982 | 0.3123 | 4.5509 | 1.0947 | 4.5509 | 5 | 5 | 5 |
| break the mold | 3.3007 | 4.2764 | 1.2956 | 1.00 | 0.9757 | 0.2956 | 3.7773 | 1.1444 | 3.7773 | 5 | 5 | 5 |
| cut corners | 3.7012 | 4.7917 | 1.2946 | 1.00 | 1.0905 | 0.2946 | 4.1546 | 1.1225 | 4.1546 | 5 | 5 | 5 |
| call the shots | 3.4719 | 4.4920 | 1.2938 | 1.00 | 1.0201 | 0.2938 | 3.9274 | 1.1312 | 3.9274 | 5 | 5 | 5 |
| strike a chord | 3.7034 | 4.6598 | 1.2583 | 1.00 | 0.9564 | 0.2583 | 4.1933 | 1.1323 | 4.1933 | 5 | 5 | 5 |
| rock the boat | 3.2660 | 4.0641 | 1.2444 | 1.00 | 0.7981 | 0.2444 | 3.8735 | 1.1860 | 3.8735 | 5 | 5 | 5 |
| turn tail | 4.7003 | 5.7247 | 1.2179 | 1.00 | 1.0244 | 0.2179 | 5.2572 | 1.1185 | 5.2572 | 5 | 5 | 5 |
| clear the air | 3.9053 | 4.7060 | 1.2050 | 1.00 | 0.8007 | 0.2050 | 4.5254 | 1.1588 | 4.5254 | 5 | 5 | 5 |
| have a ball | 3.4835 | 4.1708 | 1.1973 | 1.00 | 0.6873 | 0.1973 | 4.2723 | 1.2264 | 4.2723 | 5 | 5 | 5 |
| pull strings | 4.0734 | 4.8716 | 1.1959 | 1.00 | 0.7982 | 0.1959 | 4.8041 | 1.1794 | 4.8041 | 5 | 5 | 5 |
| spill the beans | 3.6861 | 4.4017 | 1.1941 | 1.00 | 0.7156 | 0.1941 | 4.4434 | 1.2054 | 4.4434 | 5 | 5 | 5 |
| run the show | 4.4776 | 5.2324 | 1.1686 | 1.00 | 0.7548 | 0.1686 | 5.3119 | 1.1863 | 5.3119 | 5 | 5 | 5 |
| lose ground | 4.5653 | 5.3238 | 1.1661 | 1.00 | 0.7585 | 0.1661 | 5.3369 | 1.1690 | 5.3369 | 5 | 5 | 5 |
| make waves | 4.0299 | 4.6839 | 1.1623 | 0.60 | 0.6541 | 0.1623 | 6.1694 | 1.5309 | +inf | 5 | 5 | 5 |
| lead the field | 4.1642 | 4.8167 | 1.1567 | 1.00 | 0.6526 | 0.1567 | 4.9916 | 1.1987 | 4.9916 | 5 | 5 | 5 |
| raise hell | 4.3678 | 4.9131 | 1.1248 | 1.00 | 0.5453 | 0.1248 | 5.3255 | 1.2193 | 5.3255 | 5 | 5 | 5 |
| mean business | 5.2415 | 5.8812 | 1.1220 | 1.00 | 0.6396 | 0.1220 | 6.0544 | 1.1551 | 6.0544 | 5 | 5 | 5 |
| get the sack | 4.5592 | 5.1061 | 1.1199 | 1.00 | 0.5469 | 0.1199 | 5.7609 | 1.2636 | 5.7609 | 5 | 5 | 5 |

### Non-idioms (sorted by H_u/H, descending)

| phrase | H(p) | H_u | H_u/H | syn_frac | H_s^log | H_s^log/H | H_s^reg | H_s^reg/H | H_s (orig) | n_idiom | n_head | n_non |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| cut hair | 4.0340 | 4.6862 | 1.1617 | 1.00 | 0.6523 | 0.1617 | 4.7968 | 1.1891 | 4.7968 | 5 | 5 | 5 |
| clear the table | 4.3151 | 4.8920 | 1.1337 | 1.00 | 0.5770 | 0.1337 | 5.1436 | 1.1920 | 5.1436 | 5 | 5 | 5 |
| call the police | 4.0741 | 4.5319 | 1.1124 | 0.80 | 0.4579 | 0.1124 | 5.7187 | 1.4037 | +inf | 5 | 5 | 5 |
| break the window | 4.0473 | 4.4329 | 1.0953 | 1.00 | 0.3856 | 0.0953 | 5.4632 | 1.3498 | 5.4632 | 5 | 5 | 5 |
| build the boat | 3.7665 | 4.1236 | 1.0948 | 0.80 | 0.3572 | 0.0948 | 5.8930 | 1.5646 | +inf | 5 | 5 | 5 |
| eat the apple | 4.7155 | 5.1622 | 1.0947 | 0.80 | 0.4467 | 0.0947 | 6.5328 | 1.3854 | +inf | 5 | 5 | 5 |
| tie knots | 4.7127 | 5.0898 | 1.0800 | 1.00 | 0.3772 | 0.0800 | 6.0338 | 1.2803 | 6.0338 | 5 | 5 | 5 |
| raise children | 4.7376 | 5.1131 | 1.0793 | 0.40 | 0.3756 | 0.0793 | 7.7006 | 1.6254 | +inf | 5 | 5 | 5 |
| turn dials | 5.3355 | 5.6966 | 1.0677 | 0.80 | 0.3611 | 0.0677 | 7.1291 | 1.3362 | +inf | 5 | 5 | 5 |
| make lunch | 4.4657 | 4.7423 | 1.0619 | 0.40 | 0.2766 | 0.0619 | 7.5078 | 1.6812 | +inf | 5 | 5 | 5 |
| see the show | 4.9123 | 5.1775 | 1.0540 | 0.40 | 0.2652 | 0.0540 | 7.9679 | 1.6220 | +inf | 5 | 5 | 5 |
| throw a ball | 3.9977 | 4.1916 | 1.0485 | 0.80 | 0.1938 | 0.0485 | 6.5347 | 1.6346 | +inf | 5 | 5 | 5 |
| lose keys | 5.0988 | 5.3361 | 1.0465 | 0.60 | 0.2373 | 0.0465 | 7.6729 | 1.5048 | +inf | 5 | 5 | 5 |
| get a present | 4.5602 | 4.7466 | 1.0409 | 0.60 | 0.1864 | 0.0409 | 7.3646 | 1.6150 | +inf | 5 | 5 | 5 |
| lead the meeting | 5.2835 | 5.4443 | 1.0304 | 0.60 | 0.1609 | 0.0304 | 7.9948 | 1.5132 | +inf | 5 | 5 | 5 |
| spill the water | 4.1221 | 4.2459 | 1.0300 | 1.00 | 0.1238 | 0.0300 | 6.4748 | 1.5707 | 6.4748 | 5 | 5 | 5 |
| remember details | 5.7583 | 5.8794 | 1.0210 | 0.80 | 0.1211 | 0.0210 | 8.5061 | 1.4772 | +inf | 5 | 5 | 5 |
| strike a drum | 5.7817 | 5.8541 | 1.0125 | 0.60 | 0.0725 | 0.0125 | 8.9615 | 1.5500 | +inf | 5 | 5 | 5 |