# H_u / H_s analysis — llama3.1-8b · full · geo

This report compares the idiom dataset with a parallel dataset of literal verb phrases (non-idioms). All entropies are uniform Monte Carlo (MC) averages over each phrase's observed contexts; scores are unnormalized per-token LM probabilities, combined by geometric mean (or as the joint product). See `README.md` / `code/SCORING_MATH.md` for the derivation.
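To make the two reductions concrete, here is a minimal sketch of how per-token log-probabilities could be collapsed into one phrase score. The function name `phrase_score` and the example probabilities are hypothetical; only the reduction labels `geometric_mean` (from the configuration table) and "joint" come from this report, and the actual scoring code may differ.

```python
import math

def phrase_score(token_logprobs, reduction="geometric_mean"):
    """Collapse per-token log-probabilities (nats) into one phrase score."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    total = sum(token_logprobs)
    if reduction == "geometric_mean":
        # exp(mean of log p_t) equals (product of p_t) ** (1 / n_tokens)
        return math.exp(total / len(token_logprobs))
    if reduction == "joint":
        # exp(sum of log p_t) equals the product of p_t
        return math.exp(total)
    raise ValueError(f"unknown reduction: {reduction!r}")

# Hypothetical three-token phrase with per-token probabilities 0.5, 0.25, 0.5.
logps = [math.log(0.5), math.log(0.25), math.log(0.5)]
geo = phrase_score(logps, "geometric_mean")
joint = phrase_score(logps, "joint")
print(f"geometric mean = {geo:.4f}, joint = {joint:.4f}")
```

The geometric mean removes the length dependence of the joint product, which matters when the compared phrases tokenize to different lengths.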

> **How to read these magnitudes** (directions, why H_s is +inf, the new finite synergy metrics): see [`INTERPRETATION.md`](../INTERPRETATION.md). Quick key — ↑`H_u/H(p)`, ↑`syn_frac`, ↑`H_s^log` mean **more** synergy; ↑`H_s^reg` and the original ↑`H_s` mean **less** synergy.

## Configuration

| field | idioms run | non-idioms run |
|---|---|---|
| model | meta-llama/Llama-3.1-8B | meta-llama/Llama-3.1-8B |
| reduction | geometric_mean | geometric_mean |
| medial_only | False | False |
| dtype | bfloat16 | bfloat16 |
| num_idioms | 18 | 18 |
| dataset | /home/prada/PID_evaluation/data/dataset.tsv | /home/prada/PID_evaluation/data/nonidioms_dataset.tsv |

The bound `H_u + H_s ≥ 2H(p) + 2 log 2 ≥ H(p)` holds for 18/18 idioms and 18/18 non-idioms.
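One way to see where the `2 log 2` term can come from is sketched below. It assumes the per-context definitions quoted in the metric captions of this report (`m = max(q,r)`, with `H_u` and `H_s` the context averages of `-log min{p,m}` and `-log max{0, p-m}`), natural logarithms, and scores in (0, 1]; `code/SCORING_MATH.md` remains the authoritative derivation.

```latex
% Per context, with m = \max(q, r) and 0 < m < p \le 1 (a synergistic context):
\[
  m\,(p - m) \;\le\; \Bigl(\tfrac{p}{2}\Bigr)^{2}
  \quad\Longrightarrow\quad
  -\log m \;-\; \log(p - m) \;\ge\; -2\log p \;+\; 2\log 2 .
\]
% If p \le m, then -\log\max\{0,\,p - m\} = +\infty and the inequality holds trivially.
% Averaging both sides over contexts gives
\[
  H_u + H_s \;\ge\; 2H(p) + 2\log 2 \;\ge\; H(p),
\]
% where the last step uses H(p) \ge 0, which follows from p \le 1.
```

The first inequality is the AM-GM bound on the product of `m` and `p - m`, whose sum is `p`; equality holds only when `m = p/2`.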

`H_s = +inf` (at least one non-synergistic context slot) for 10/18 idioms and 17/18 non-idioms. A *finite* H_s means **every** context of that phrase is synergistic (p > max(q,r) everywhere), which is the same as `syn_frac = 1`.
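The relationship between finite `H_s` and fully synergistic contexts can be checked with a small sketch that applies the per-context formulas from the metric captions in this report. The function name `phrase_metrics`, the dictionary keys, and the example scores are hypothetical; `H_s^reg` is left out because its epsilon floor is not specified here.

```python
import math

def phrase_metrics(contexts):
    """Average per-context quantities for one phrase.

    contexts: list of (p, q, r) probability scores, one tuple per context.
    """
    n = len(contexts)
    totals = {"H_p": 0.0, "H_u": 0.0, "H_s": 0.0,
              "H_s_log": 0.0, "H_s_log_signed": 0.0}
    n_synergistic = 0
    for p, q, r in contexts:
        m = max(q, r)
        log_gap = math.log(p) - math.log(m)
        totals["H_p"] += -math.log(p)
        totals["H_u"] += -math.log(min(p, m))
        totals["H_s_log"] += max(0.0, log_gap)
        totals["H_s_log_signed"] += log_gap
        if p > m:
            n_synergistic += 1
            totals["H_s"] += -math.log(p - m)
        else:
            # -log max{0, p - m} = -log 0 = +inf, so one such context
            # makes the whole average infinite.
            totals["H_s"] = math.inf
    out = {name: total / n for name, total in totals.items()}
    out["syn_frac"] = n_synergistic / n
    return out

# Hypothetical scores: every context synergistic vs. one non-synergistic context.
finite_case = phrase_metrics([(0.020, 0.012, 0.005), (0.030, 0.010, 0.020)])
inf_case = phrase_metrics([(0.020, 0.012, 0.005), (0.010, 0.020, 0.005)])
print(finite_case["syn_frac"], finite_case["H_s"])
print(inf_case["syn_frac"], inf_case["H_s"])
```

In the second call a single context with `p <= m` is enough to push `H_s` to `+inf`, while `syn_frac` and `H_s_log` stay finite and still register the synergistic context.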

## Per-metric summary (idioms vs non-idioms)

Means and 95% bootstrap CIs (20k resamples, percentile method, phrase-level). Non-finite values are dropped per metric before bootstrapping; the drop count is shown.
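The procedure described above can be sketched with the standard library alone. This is an illustration under assumptions, not the pipeline's code: the function name `bootstrap_mean_ci`, the seed, and the use of `statistics.quantiles` for the percentile cut points are choices made here. The input is the idiom `H_u/H` column from the per-phrase table.

```python
import math
import random
import statistics

def bootstrap_mean_ci(values, n_resamples=20_000, seed=0):
    """Mean, 95% percentile-bootstrap CI of the mean, and non-finite drop count."""
    finite = [v for v in values if math.isfinite(v)]
    if not finite:
        raise ValueError("no finite values to bootstrap")
    rng = random.Random(seed)
    n = len(finite)
    # Resample phrases with replacement and record the mean of each resample.
    means = [statistics.fmean(rng.choices(finite, k=n)) for _ in range(n_resamples)]
    # With n=40 the first and last cut points are the 2.5th and 97.5th percentiles.
    cuts = statistics.quantiles(means, n=40)
    return statistics.fmean(finite), (cuts[0], cuts[-1]), len(values) - n

# Idiom H_u/H values copied from the per-phrase table in this report.
h_u_ratio_idioms = [
    1.2361, 1.2273, 1.2195, 1.2179, 1.2067, 1.2035, 1.1908, 1.1766, 1.1681,
    1.1598, 1.1598, 1.1425, 1.1280, 1.1267, 1.1001, 1.0955, 1.0882, 1.0796,
]
mean, (lo, hi), dropped = bootstrap_mean_ci(h_u_ratio_idioms)
print(f"mean={mean:.4f}  95% CI=[{lo:.4f}, {hi:.4f}]  dropped={dropped}")
```

The mean reproduces the tabulated 1.1626; the interval endpoints depend on the random seed and percentile interpolation, so they should land close to, but not exactly on, the reported CI.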

### H(p)

*base entropy, uniform MC average over contexts (nats). ↓ smaller = idiom more concentrated*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 4.2930 | 4.4169 | 3.6428 | 4.9992 | [4.0970, 4.4905] |
| non-idioms | 18 | 0 | 4.8940 | 4.8063 | 4.1070 | 6.0341 | [4.6622, 5.1408] |

**Cross-dataset gap** (idioms − non-idioms): Δ = -0.6010  (95% CI [-0.9161, -0.2945]) → idioms < non-idioms, **significant**.
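A cross-dataset gap of this kind can be sketched as follows. It assumes that the two datasets are resampled independently at the phrase level and that "significant" means the 95% interval excludes zero; both are assumptions about the pipeline, and the function name `bootstrap_gap_ci` is hypothetical. The inputs are the `H(p)` columns from the per-phrase tables.

```python
import random
import statistics

def bootstrap_gap_ci(a, b, n_resamples=20_000, seed=0):
    """Gap of means (a - b), its 95% percentile-bootstrap CI, and a zero-exclusion flag."""
    rng = random.Random(seed)
    gaps = []
    for _ in range(n_resamples):
        # Resample each dataset on its own, then take the difference of means.
        mean_a = statistics.fmean(rng.choices(a, k=len(a)))
        mean_b = statistics.fmean(rng.choices(b, k=len(b)))
        gaps.append(mean_a - mean_b)
    cuts = statistics.quantiles(gaps, n=40)  # 2.5th ... 97.5th percentiles
    lo, hi = cuts[0], cuts[-1]
    delta = statistics.fmean(a) - statistics.fmean(b)
    return delta, (lo, hi), (lo > 0 or hi < 0)

# H(p) values copied from the per-phrase tables in this report.
h_p_idioms = [
    4.5781, 3.6428, 3.7561, 4.4583, 3.8928, 3.9046, 3.8555, 4.7188, 3.7841,
    4.0875, 3.8478, 4.5053, 4.7222, 4.9992, 4.6056, 4.8775, 4.3756, 4.6618,
]
h_p_non_idioms = [
    4.1070, 4.3338, 4.3356, 4.7806, 4.5739, 4.5861, 4.8559, 4.4664, 5.4570,
    5.0956, 4.6945, 4.8320, 5.0007, 4.3605, 5.3221, 5.7128, 6.0341, 5.5434,
]
delta, (lo, hi), significant = bootstrap_gap_ci(h_p_idioms, h_p_non_idioms)
print(f"delta={delta:.4f}  95% CI=[{lo:.4f}, {hi:.4f}]  significant={significant}")
```

Independent resampling treats the two phrase sets as unpaired, which matches the phrase-level description but is not confirmed by this report.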

### H_u

*unique / redundant entropy = -log min{p, max(q,r)} (nats); >= H(p)*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 4.9786 | 4.8970 | 4.4200 | 5.6592 | [4.7920, 5.1731] |
| non-idioms | 18 | 0 | 5.1527 | 5.0831 | 4.5225 | 6.1509 | [4.9530, 5.3667] |

**Cross-dataset gap** (idioms − non-idioms): Δ = -0.1742  (95% CI [-0.4582, 0.1098]) → idioms < non-idioms, not significant.

### H_u / H(p)

*unique-information ratio (>= 1). ↑ bigger = MORE synergy. THE headline metric*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 1.1626 | 1.1640 | 1.0796 | 1.2361 | [1.1397, 1.1857] |
| non-idioms | 18 | 0 | 1.0551 | 1.0512 | 1.0163 | 1.1083 | [1.0431, 1.0680] |

**Cross-dataset gap** (idioms − non-idioms): Δ = 0.1075  (95% CI [0.0814, 0.1337]) → idioms > non-idioms, **significant**.

### syn_frac

*synergy coverage in [0,1] = fraction of contexts with p > m, where m = max(q,r). ↑ bigger = MORE synergy (most intuitive)*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 0.8889 | 0.9000 | 0.3000 | 1.0000 | [0.8000, 0.9500] |
| non-idioms | 18 | 0 | 0.6611 | 0.7000 | 0.3000 | 1.0000 | [0.5611, 0.7611] |

**Cross-dataset gap** (idioms − non-idioms): Δ = 0.2278  (95% CI [0.1000, 0.3500]) → idioms > non-idioms, **significant**.

### H_s^log

*log-space synergy = mean max{0, log p - log m} (nats). ↑ bigger = MORE synergy; finite always*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 0.6856 | 0.6476 | 0.3709 | 1.0811 | [0.6020, 0.7739] |
| non-idioms | 18 | 0 | 0.2588 | 0.2502 | 0.0901 | 0.4661 | [0.2098, 0.3103] |

**Cross-dataset gap** (idioms − non-idioms): Δ = 0.4268  (95% CI [0.3275, 0.5297]) → idioms > non-idioms, **significant**.

### H_s^log / H(p)

*log-space synergy ratio. ↑ bigger = MORE synergy*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 0.1626 | 0.1640 | 0.0796 | 0.2361 | [0.1397, 0.1857] |
| non-idioms | 18 | 0 | 0.0551 | 0.0512 | 0.0163 | 0.1083 | [0.0431, 0.0680] |

**Cross-dataset gap** (idioms − non-idioms): Δ = 0.1075  (95% CI [0.0814, 0.1337]) → idioms > non-idioms, **significant**.
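The gap and CI here are identical to those of `H_u / H(p)`, and the tabulated values differ from that metric by exactly 1. A short derivation explains why, assuming the caption definitions (`H_u` as the context average of `-log min{p,m}` and `H_s^log` as the context average of `max{0, log p - log m}`, with `m = max(q,r)`):

```latex
% Per context, splitting on which of p and m is smaller:
\[
  -\log\min\{p, m\}
  \;=\; -\log p \;+\; \max\{0,\; \log p - \log m\}.
\]
% (If p \le m both sides equal -\log p; if p > m both sides equal -\log m.)
% Averaging over contexts:
\[
  H_u \;=\; H(p) + H_s^{\log}
  \quad\Longrightarrow\quad
  \frac{H_s^{\log}}{H(p)} \;=\; \frac{H_u}{H(p)} - 1 .
\]
```

Because the two ratios differ by a constant for every phrase, any difference of means or bootstrap interval on the gap is the same for both, so this metric carries no information beyond `H_u / H(p)`.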

### H_s^log signed

*signed log-space synergy = mean(log p - log m); can be negative (net anti-synergistic)*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 0.6564 | 0.6476 | 0.1399 | 1.0811 | [0.5517, 0.7593] |
| non-idioms | 18 | 0 | 0.1565 | 0.1790 | -0.0950 | 0.4593 | [0.0757, 0.2408] |

**Cross-dataset gap** (idioms − non-idioms): Δ = 0.4999  (95% CI [0.3644, 0.6331]) → idioms > non-idioms, **significant**.

### H_s^reg

*regularized H_s (eps-floored, finite, continuous). ↑ bigger = LESS synergy*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 5.5104 | 5.2071 | 4.2587 | 8.0078 | [5.1032, 5.9698] |
| non-idioms | 18 | 0 | 7.4515 | 7.4104 | 5.4245 | 9.4169 | [6.9380, 7.9704] |

**Cross-dataset gap** (idioms − non-idioms): Δ = -1.9412  (95% CI [-2.6140, -1.2608]) → idioms < non-idioms, **significant**.

### H_s^reg / H(p)

*regularized synergy ratio (>= 1). ↑ bigger = LESS synergy*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 1.2796 | 1.2418 | 1.1001 | 1.7177 | [1.2215, 1.3516] |
| non-idioms | 18 | 0 | 1.5182 | 1.5191 | 1.3122 | 1.6983 | [1.4626, 1.5724] |

**Cross-dataset gap** (idioms − non-idioms): Δ = -0.2386  (95% CI [-0.3190, -0.1496]) → idioms < non-idioms, **significant**.

### H_s (original)

*synergy entropy = -log max{0, p - max(q,r)} (nats); +inf if ANY slot non-synergistic (mostly +inf)*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 8 | 10 | 4.9599 | 4.9003 | 4.2587 | 6.1856 | [4.5967, 5.3930] |
| non-idioms | 1 | 17 | 5.4245 | 5.4245 | 5.4245 | 5.4245 | [5.4245, 5.4245] |

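**Cross-dataset gap** (idioms − non-idioms): Δ = -0.4646  (95% CI [-0.8286, -0.0418]) → idioms < non-idioms, **significant**. The non-idiom side has a single finite phrase (N = 1), so its bootstrap CI is degenerate and the gap CI reflects idiom variability only; the significance label should not be relied on.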

### H_s / H(p)

*original synergy ratio (mostly +inf; use H_s^log or syn_frac instead)*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 8 | 10 | 1.1965 | 1.1897 | 1.1001 | 1.2682 | [1.1617, 1.2285] |
| non-idioms | 1 | 17 | 1.3208 | 1.3208 | 1.3208 | 1.3208 | [1.3208, 1.3208] |

**Cross-dataset gap** (idioms − non-idioms): Δ = -0.1243  (95% CI [-0.1591, -0.0922]) → idioms < non-idioms, **significant**. With only one finite non-idiom phrase, this interval is not informative about the population gap.

## Per-phrase detail

### Idioms (sorted by H_u/H, descending)

| phrase | H(p) | H_u | H_u/H | syn_frac | H_s^log | H_s^log/H | H_s^reg | H_s^reg/H | H_s (orig) | n_idiom | n_head | n_non |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| strike a chord | 4.5781 | 5.6592 | 1.2361 | 1.00 | 1.0811 | 0.2361 | 5.0366 | 1.1001 | 5.0366 | 10 | 10 | 10 |
| rock the boat | 3.6428 | 4.4707 | 1.2273 | 1.00 | 0.8279 | 0.2273 | 4.2587 | 1.1691 | 4.2587 | 10 | 10 | 10 |
| call the shots | 3.7561 | 4.5806 | 1.2195 | 1.00 | 0.8246 | 0.2195 | 4.4538 | 1.1858 | 4.4538 | 10 | 10 | 10 |
| bite the dust | 4.4583 | 5.4296 | 1.2179 | 0.90 | 0.9713 | 0.2179 | 5.3980 | 1.2108 | +inf | 10 | 10 | 10 |
| cut corners | 3.8928 | 4.6975 | 1.2067 | 0.90 | 0.8047 | 0.2067 | 4.9030 | 1.2595 | +inf | 10 | 10 | 10 |
| break the mold | 3.9046 | 4.6993 | 1.2035 | 0.90 | 0.7947 | 0.2035 | 4.9341 | 1.2637 | +inf | 10 | 10 | 10 |
| clear the air | 3.8555 | 4.5911 | 1.1908 | 1.00 | 0.7356 | 0.1908 | 4.5664 | 1.1844 | 4.5664 | 10 | 10 | 10 |
| turn tail | 4.7188 | 5.5523 | 1.1766 | 0.90 | 0.8334 | 0.1766 | 5.7036 | 1.2087 | +inf | 10 | 10 | 10 |
| have a ball | 3.7841 | 4.4200 | 1.1681 | 0.90 | 0.6359 | 0.1681 | 4.9362 | 1.3045 | +inf | 10 | 10 | 10 |
| pull strings | 4.0875 | 4.7409 | 1.1598 | 1.00 | 0.6534 | 0.1598 | 5.0264 | 1.2297 | 5.0264 | 10 | 10 | 10 |
| spill the beans | 3.8478 | 4.4628 | 1.1598 | 1.00 | 0.6149 | 0.1598 | 4.7741 | 1.2407 | 4.7741 | 10 | 10 | 10 |
| lead the field | 4.5053 | 5.1472 | 1.1425 | 1.00 | 0.6419 | 0.1425 | 5.3777 | 1.1936 | 5.3777 | 10 | 10 | 10 |
| mean business | 4.7222 | 5.3268 | 1.1280 | 0.90 | 0.6046 | 0.1280 | 5.8690 | 1.2429 | +inf | 10 | 10 | 10 |
| lose ground | 4.9992 | 5.6325 | 1.1267 | 0.80 | 0.6334 | 0.1267 | 6.4719 | 1.2946 | +inf | 10 | 10 | 10 |
| run the show | 4.6056 | 5.0664 | 1.1001 | 0.70 | 0.4608 | 0.1001 | 6.7057 | 1.4560 | +inf | 10 | 10 | 10 |
| raise hell | 4.8775 | 5.3431 | 1.0955 | 1.00 | 0.4656 | 0.0955 | 6.1856 | 1.2682 | 6.1856 | 10 | 10 | 10 |
| get the sack | 4.3756 | 4.7613 | 1.0882 | 0.80 | 0.3858 | 0.0882 | 6.5776 | 1.5033 | +inf | 10 | 10 | 10 |
| make waves | 4.6618 | 5.0327 | 1.0796 | 0.30 | 0.3709 | 0.0796 | 8.0078 | 1.7177 | +inf | 10 | 10 | 10 |

### Non-idioms (sorted by H_u/H, descending)

| phrase | H(p) | H_u | H_u/H | syn_frac | H_s^log | H_s^log/H | H_s^reg | H_s^reg/H | H_s (orig) | n_idiom | n_head | n_non |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| call the police | 4.1070 | 4.5517 | 1.1083 | 1.00 | 0.4447 | 0.1083 | 5.4245 | 1.3208 | 5.4245 | 10 | 10 | 10 |
| clear the table | 4.3338 | 4.7998 | 1.1075 | 0.90 | 0.4661 | 0.1075 | 5.6866 | 1.3122 | +inf | 10 | 10 | 10 |
| cut hair | 4.3356 | 4.7209 | 1.0889 | 0.80 | 0.3853 | 0.0889 | 6.3286 | 1.4597 | +inf | 10 | 10 | 10 |
| eat the apple | 4.7806 | 5.1885 | 1.0853 | 0.70 | 0.4079 | 0.0853 | 6.8337 | 1.4295 | +inf | 10 | 10 | 10 |
| break the window | 4.5739 | 4.9024 | 1.0718 | 0.90 | 0.3285 | 0.0718 | 6.5298 | 1.4276 | +inf | 10 | 10 | 10 |
| build the boat | 4.5861 | 4.8608 | 1.0599 | 0.80 | 0.2747 | 0.0599 | 6.9102 | 1.5068 | +inf | 10 | 10 | 10 |
| tie knots | 4.8559 | 5.1287 | 1.0562 | 0.90 | 0.2728 | 0.0562 | 6.9311 | 1.4273 | +inf | 10 | 10 | 10 |
| throw a ball | 4.4664 | 4.7069 | 1.0538 | 0.70 | 0.2404 | 0.0538 | 6.7675 | 1.5152 | +inf | 10 | 10 | 10 |
| turn dials | 5.4570 | 5.7374 | 1.0514 | 0.80 | 0.2804 | 0.0514 | 7.4943 | 1.3733 | +inf | 10 | 10 | 10 |
| raise children | 5.0956 | 5.3556 | 1.0510 | 0.70 | 0.2600 | 0.0510 | 7.7953 | 1.5298 | +inf | 10 | 10 | 10 |
| get a present | 4.6945 | 4.8967 | 1.0431 | 0.50 | 0.2022 | 0.0431 | 7.6837 | 1.6368 | +inf | 10 | 10 | 10 |
| see the show | 4.8320 | 5.0375 | 1.0425 | 0.50 | 0.2054 | 0.0425 | 8.0553 | 1.6671 | +inf | 10 | 10 | 10 |
| make lunch | 5.0007 | 5.2123 | 1.0423 | 0.30 | 0.2116 | 0.0423 | 8.4928 | 1.6983 | +inf | 10 | 10 | 10 |
| spill the water | 4.3605 | 4.5225 | 1.0372 | 0.50 | 0.1621 | 0.0372 | 7.3266 | 1.6802 | +inf | 10 | 10 | 10 |
| remember details | 5.3221 | 5.5042 | 1.0342 | 0.80 | 0.1822 | 0.0342 | 8.1050 | 1.5229 | +inf | 10 | 10 | 10 |
| lose keys | 5.7128 | 5.8391 | 1.0221 | 0.40 | 0.1263 | 0.0221 | 9.1725 | 1.6056 | +inf | 10 | 10 | 10 |
| strike a drum | 6.0341 | 6.1509 | 1.0194 | 0.40 | 0.1169 | 0.0194 | 9.4169 | 1.5606 | +inf | 10 | 10 | 10 |
| lead the meeting | 5.5434 | 5.6335 | 1.0163 | 0.30 | 0.0901 | 0.0163 | 9.1726 | 1.6547 | +inf | 10 | 10 | 10 |