# H_u / H_s analysis — llama3.1-8b · medial · joint

This report compares the idiom dataset against a parallel literal-VP (non-idiom) dataset. All entropies are uniform Monte Carlo (MC) averages over each phrase's observed contexts, in nats; scores are unnormalized per-token LM probabilities, reduced by geometric mean or jointly (this run: `joint`). See `README.md` / `code/SCORING_MATH.md` for the derivation.

> **How to read these magnitudes** (directions, why H_s is +inf, the new finite synergy metrics): see [`INTERPRETATION.md`](../INTERPRETATION.md). Quick key — ↑`H_u/H(p)`, ↑`syn_frac`, ↑`H_s^log` mean **more** synergy; ↑`H_s^reg` and the original ↑`H_s` mean **less** synergy.

## Configuration

| field | idioms run | non-idioms run |
|---|---|---|
| model | meta-llama/Llama-3.1-8B | meta-llama/Llama-3.1-8B |
| reduction | joint | joint |
| medial_only | True | True |
| dtype | bfloat16 | bfloat16 |
| num_idioms | 18 | 18 |
| dataset | /home/prada/PID_evaluation/data/dataset.tsv | /home/prada/PID_evaluation/data/nonidioms_dataset.tsv |

Bound `H_u + H_s ≥ 2H(p) + 2log2 ≥ H(p)` holds for 18/18 idioms and 18/18 non-idioms.

`H_s = +inf` (≥1 non-synergistic slot) for 3/18 idioms and 16/18 non-idioms. A *finite* H_s means **every** context of that phrase is synergistic (p > max(q,r) everywhere).
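To make the `+inf` behaviour concrete, here is a minimal sketch of how the phrase-level quantities could be aggregated from per-context log-probabilities. The function name `phrase_metrics` and its list-based interface are illustrative assumptions, not the repository's API. The formulas follow the metric definitions used in this report, evaluated in log space so that `p - m` stays accurate for probabilities this small.

```python
import math

def phrase_metrics(log_p, log_q, log_r):
    """Average per-context quantities (nats) for one phrase.

    log_p, log_q, log_r: equal-length lists of natural-log scores,
    one entry per context (hypothetical interface).
    """
    n = len(log_p)
    h_p = h_u = h_s = h_s_log = 0.0
    n_syn = 0
    for lp, lq, lr in zip(log_p, log_q, log_r):
        lm = max(lq, lr)                  # log m, where m = max(q, r)
        h_p += -lp                        # -log p
        h_u += -min(lp, lm)               # -log min{p, m}
        h_s_log += max(0.0, lp - lm)      # max{0, log p - log m}
        if lp > lm:
            n_syn += 1
            # -log(p - m) = -log p - log(1 - m/p); expm1 keeps 1 - m/p accurate
            h_s += -lp - math.log(-math.expm1(lm - lp))
        else:
            h_s = math.inf                # one non-synergistic context => +inf
    return {
        "H_p": h_p / n,
        "H_u": h_u / n,
        "H_s": h_s / n,
        "H_s_log": h_s_log / n,
        "syn_frac": n_syn / n,
    }
```

Because the phrase-level value is an average, a single context with p ≤ max(q,r) is enough to push `H_s` to `+inf`, while `H_s_log` and `syn_frac` stay finite.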

## Per-metric summary (idioms vs non-idioms)

Means and 95% bootstrap CIs (20k resamples, percentile method, phrase-level). Non-finite values are dropped per metric before bootstrapping; the drop count is shown.
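The procedure described above can be sketched as follows. This is a minimal standard-library version, not the report's actual code: the function name `bootstrap_mean_ci`, the fixed seed, and the simple index-based percentile rule are assumptions, so endpoints will not match the tables digit for digit.

```python
import math
import random

def bootstrap_mean_ci(values, n_resamples=20_000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean of phrase-level values.

    Non-finite entries (e.g. +inf) are dropped first; the drop count
    is returned alongside the mean and the interval.
    """
    finite = [v for v in values if math.isfinite(v)]
    dropped = len(values) - len(finite)
    if not finite:
        raise ValueError("no finite values to bootstrap")
    rng = random.Random(seed)
    n = len(finite)
    # Resample phrases with replacement and record the mean each time.
    means = sorted(sum(rng.choices(finite, k=n)) / n for _ in range(n_resamples))
    k = round(n_resamples * alpha / 2)      # number of resamples cut from each tail
    lo, hi = means[k], means[n_resamples - 1 - k]
    return sum(finite) / n, (lo, hi), dropped
```

With only two finite values the percentile interval collapses to [min, max], which is the pattern visible in the non-idioms row of the `H_s (original)` table.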

### H(p)

*base entropy, uniform MC average over contexts (nats). ↓ smaller = idiom more concentrated*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 52.8500 | 53.4878 | 40.0568 | 62.3579 | [50.2466, 55.2550] |
| non-idioms | 18 | 0 | 60.1445 | 60.1995 | 48.1474 | 74.8238 | [57.2243, 63.1298] |

**Cross-dataset gap** (idioms − non-idioms): Δ = -7.2944  (95% CI [-11.1818, -3.4995]) → idioms < non-idioms, **significant**.
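The report does not spell out how the gap interval or the "significant" label is computed. One plausible reading, sketched here purely as an assumption, resamples the two phrase sets independently, takes the difference of the resampled means, and calls the gap significant when the percentile interval excludes zero. `bootstrap_gap` is a hypothetical helper name.

```python
import math
import random

def bootstrap_gap(a, b, n_resamples=20_000, alpha=0.05, seed=0):
    """Bootstrap the difference of means mean(a) - mean(b).

    Assumption: the two groups are resampled independently and the
    gap is flagged significant when the CI does not contain 0.
    """
    a = [v for v in a if math.isfinite(v)]
    b = [v for v in b if math.isfinite(v)]
    rng = random.Random(seed)

    def resampled_mean(xs):
        return sum(rng.choices(xs, k=len(xs))) / len(xs)

    diffs = sorted(resampled_mean(a) - resampled_mean(b) for _ in range(n_resamples))
    k = round(n_resamples * alpha / 2)
    lo, hi = diffs[k], diffs[n_resamples - 1 - k]
    delta = sum(a) / len(a) - sum(b) / len(b)
    significant = not (lo <= 0.0 <= hi)
    return delta, (lo, hi), significant
```

Under this reading, every gap in this report labelled **significant** has an interval that lies entirely on one side of zero, and every "not significant" gap has an interval that straddles it.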

### H_u

*unique / redundant entropy = -log min{p, max(q,r)} (nats); >= H(p)*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 61.4343 | 61.9483 | 47.5051 | 74.8390 | [58.4333, 64.4391] |
| non-idioms | 18 | 0 | 62.3194 | 63.3907 | 51.5787 | 76.9897 | [59.4424, 65.2237] |

**Cross-dataset gap** (idioms − non-idioms): Δ = -0.8851  (95% CI [-4.9835, 3.2544]) → idioms < non-idioms, not significant.

### H_u / H(p)

*unique-information ratio (>= 1). ↑ bigger = MORE synergy. THE headline metric*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 1.1642 | 1.1508 | 1.0618 | 1.2697 | [1.1349, 1.1943] |
| non-idioms | 18 | 0 | 1.0371 | 1.0278 | 1.0000 | 1.1098 | [1.0247, 1.0508] |

**Cross-dataset gap** (idioms − non-idioms): Δ = 0.1271  (95% CI [0.0949, 0.1602]) → idioms > non-idioms, **significant**.

### syn_frac

*synergy coverage in [0,1] = fraction of contexts with p > m, where m = max(q,r). ↑ bigger = MORE synergy (most intuitive)*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 0.9444 | 1.0000 | 0.6000 | 1.0000 | [0.8778, 1.0000] |
| non-idioms | 18 | 0 | 0.5778 | 0.6000 | 0.0000 | 1.0000 | [0.4556, 0.7000] |

**Cross-dataset gap** (idioms − non-idioms): Δ = 0.3667  (95% CI [0.2333, 0.5000]) → idioms > non-idioms, **significant**.

### H_s^log

*log-space synergy = mean max{0, log p - log m} (nats). ↑ bigger = MORE synergy; finite always*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 8.5843 | 7.6329 | 3.3784 | 14.9071 | [7.0869, 10.1209] |
| non-idioms | 18 | 0 | 2.1750 | 1.9212 | 0.0000 | 5.9146 | [1.4754, 2.9209] |

**Cross-dataset gap** (idioms − non-idioms): Δ = 6.4093  (95% CI [4.7327, 8.1159]) → idioms > non-idioms, **significant**.

### H_s^log / H(p)

*log-space synergy ratio; by definition equal to H_u/H(p) − 1, so its gap and CI match the H_u / H(p) section. ↑ bigger = MORE synergy*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 0.1642 | 0.1508 | 0.0618 | 0.2697 | [0.1349, 0.1943] |
| non-idioms | 18 | 0 | 0.0371 | 0.0278 | 0.0000 | 0.1098 | [0.0247, 0.0508] |

**Cross-dataset gap** (idioms − non-idioms): Δ = 0.1271  (95% CI [0.0949, 0.1602]) → idioms > non-idioms, **significant**.
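A quick way to sanity-check how this ratio relates to `H_u / H(p)` is to recompute H_u − H(p) from rows of the per-phrase tables and compare it with H_s^log. Per context, -log min{p, m} + log p = max{0, log p − log m}, so the two should agree up to 4-decimal rounding. The helper name `max_identity_gap` is illustrative; the numbers are copied from the per-phrase tables in this report.

```python
def max_identity_gap(rows):
    """Largest |(H_u - H(p)) - H_s^log| over the given rows."""
    return max(abs((h_u - h_p) - h_s_log) for h_p, h_u, h_s_log in rows.values())

# (H(p), H_u, H_s^log) taken from the per-phrase tables
rows = {
    "break the mold":   (55.2785, 70.1856, 14.9071),
    "get the sack":     (54.7076, 58.0860, 3.3784),
    "cut hair":         (53.8866, 59.8012, 5.9146),
    "remember details": (53.2356, 53.2356, 0.0000),
}

# Differences stay within the rounding error of the printed values.
assert max_identity_gap(rows) < 2e-4
```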

### H_s^log signed

*signed log-space synergy = mean(log p - log m); can be negative (net anti-synergistic)*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 8.3453 | 7.6329 | 2.3986 | 14.9071 | [6.6909, 10.0184] |
| non-idioms | 18 | 0 | 0.4121 | 0.2359 | -3.5141 | 5.9146 | [-0.8073, 1.6857] |

**Cross-dataset gap** (idioms − non-idioms): Δ = 7.9332  (95% CI [5.8691, 10.0493]) → idioms > non-idioms, **significant**.

### H_s^reg

*regularized H_s (eps-floored, finite, continuous). ↑ bigger = LESS synergy*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 53.1524 | 53.5627 | 40.0615 | 62.3593 | [50.5165, 55.5745] |
| non-idioms | 18 | 0 | 62.2120 | 61.8917 | 49.9935 | 76.7427 | [59.2795, 65.2244] |

**Cross-dataset gap** (idioms − non-idioms): Δ = -9.0596  (95% CI [-13.0067, -5.2245]) → idioms < non-idioms, **significant**.
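The exact regularization is defined in the project's scoring notes, not in this report. As an illustration only, the sketch below assumes a relative floor, -log max{p − m, eps·p} with eps = 0.01, averaged over contexts. This guess is consistent with the `remember details` row (syn_frac = 0 and H_s^reg − H(p) = 4.6052 ≈ -log 0.01), but the functional form, the value of `eps`, and the name `h_s_reg` are all assumptions.

```python
import math

def h_s_reg(log_p, log_q, log_r, eps=0.01):
    """Assumed eps-floored synergy entropy, averaged over contexts (nats)."""
    total = 0.0
    for lp, lq, lr in zip(log_p, log_q, log_r):
        lm = max(lq, lr)                       # log m, where m = max(q, r)
        # (p - m) / p = 1 - m/p when p > m; treated as 0 otherwise
        frac = -math.expm1(lm - lp) if lp > lm else 0.0
        # Floor the relative synergy at eps so the log stays finite
        total += -lp - math.log(max(frac, eps))
    return total / len(log_p)
```

Under this assumption a fully non-synergistic phrase sits exactly -log(eps) above H(p), and a strongly synergistic one (m much smaller than p) sits almost exactly at H(p), which matches the direction stated above: bigger means less synergy.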

### H_s^reg / H(p)

*regularized synergy ratio (>= 1). ↑ bigger = LESS synergy*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 1.0057 | 1.0002 | 1.0000 | 1.0383 | [1.0009, 1.0115] |
| non-idioms | 18 | 0 | 1.0349 | 1.0346 | 1.0004 | 1.0865 | [1.0254, 1.0449] |

**Cross-dataset gap** (idioms − non-idioms): Δ = -0.0292  (95% CI [-0.0403, -0.0182]) → idioms < non-idioms, **significant**.

### H_s (original)

*synergy entropy = -log max{0, p - max(q,r)} (nats); +inf if ANY slot non-synergistic (mostly +inf)*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 15 | 3 | 52.8261 | 52.8832 | 40.0615 | 62.3593 | [49.8945, 55.5695] |
| non-idioms | 2 | 16 | 57.0311 | 57.0311 | 53.9056 | 60.1565 | [53.9056, 60.1565] |

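**Cross-dataset gap** (idioms − non-idioms): Δ = -4.2049  (95% CI [-9.2139, 0.8122]) → idioms < non-idioms, not significant. Only 15 idioms and 2 non-idioms have a finite H_s, so this comparison rests on a small, self-selected subset (phrases whose every context is synergistic), and the non-idiom CI above is just the range of its two values.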

### H_s / H(p)

*original synergy ratio (mostly +inf; use H_s^log or syn_frac instead)*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 15 | 3 | 1.0008 | 1.0001 | 1.0000 | 1.0051 | [1.0002, 1.0016] |
| non-idioms | 2 | 16 | 1.0006 | 1.0006 | 1.0004 | 1.0008 | [1.0004, 1.0008] |

**Cross-dataset gap** (idioms − non-idioms): Δ = 0.0002  (95% CI [-0.0004, 0.0011]) → idioms > non-idioms, not significant.

## Per-phrase detail

### Idioms (sorted by H_u/H, descending)

| phrase | H(p) | H_u | H_u/H | syn_frac | H_s^log | H_s^log/H | H_s^reg | H_s^reg/H | H_s (orig) | n_idiom | n_head | n_non |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| break the mold | 55.2785 | 70.1856 | 1.2697 | 1.00 | 14.9071 | 0.2697 | 55.2785 | 1.0000 | 55.2785 | 5 | 5 | 5 |
| cut corners | 47.0226 | 58.8339 | 1.2512 | 1.00 | 11.8113 | 0.2512 | 47.0227 | 1.0000 | 47.0227 | 5 | 5 | 5 |
| strike a chord | 44.3236 | 55.3918 | 1.2497 | 1.00 | 11.0682 | 0.2497 | 44.3237 | 1.0000 | 44.3237 | 5 | 5 | 5 |
| call the shots | 50.9307 | 63.1955 | 1.2408 | 1.00 | 12.2649 | 0.2408 | 50.9307 | 1.0000 | 50.9307 | 5 | 5 | 5 |
| bite the dust | 51.2297 | 63.5380 | 1.2403 | 1.00 | 12.3083 | 0.2403 | 51.2300 | 1.0000 | 51.2300 | 5 | 5 | 5 |
| turn tail | 62.3579 | 74.8390 | 1.2002 | 1.00 | 12.4811 | 0.2002 | 62.3593 | 1.0000 | 62.3593 | 5 | 5 | 5 |
| clear the air | 59.2380 | 70.5546 | 1.1910 | 1.00 | 11.3166 | 0.1910 | 59.2387 | 1.0000 | 59.2387 | 5 | 5 | 5 |
| rock the boat | 40.0568 | 47.5051 | 1.1859 | 1.00 | 7.4483 | 0.1859 | 40.0615 | 1.0001 | 40.0615 | 5 | 5 | 5 |
| spill the beans | 54.2353 | 62.7136 | 1.1563 | 1.00 | 8.4783 | 0.1563 | 54.2422 | 1.0001 | 54.2422 | 5 | 5 | 5 |
| have a ball | 50.7636 | 58.1358 | 1.1452 | 1.00 | 7.3722 | 0.1452 | 50.7780 | 1.0003 | 50.7780 | 5 | 5 | 5 |
| lose ground | 58.2284 | 66.0458 | 1.1343 | 1.00 | 7.8174 | 0.1343 | 58.5263 | 1.0051 | 58.5263 | 5 | 5 | 5 |
| pull strings | 56.8018 | 63.8537 | 1.1241 | 1.00 | 7.0519 | 0.1241 | 56.8518 | 1.0009 | 56.8518 | 5 | 5 | 5 |
| make waves | 48.1887 | 54.0558 | 1.1218 | 0.60 | 5.8671 | 0.1218 | 50.0331 | 1.0383 | +inf | 5 | 5 | 5 |
| run the show | 56.7012 | 63.2264 | 1.1151 | 0.60 | 6.5252 | 0.1151 | 58.5443 | 1.0325 | +inf | 5 | 5 | 5 |
| lead the field | 52.1864 | 57.3356 | 1.0987 | 1.00 | 5.1492 | 0.0987 | 52.2550 | 1.0013 | 52.2550 | 5 | 5 | 5 |
| raise hell | 56.3097 | 61.1829 | 1.0865 | 1.00 | 4.8732 | 0.0865 | 56.4103 | 1.0018 | 56.4103 | 5 | 5 | 5 |
| mean business | 52.7402 | 57.1388 | 1.0834 | 1.00 | 4.3986 | 0.0834 | 52.8832 | 1.0027 | 52.8832 | 5 | 5 | 5 |
| get the sack | 54.7076 | 58.0860 | 1.0618 | 0.80 | 3.3784 | 0.0618 | 55.7733 | 1.0195 | +inf | 5 | 5 | 5 |

### Non-idioms (sorted by H_u/H, descending)

| phrase | H(p) | H_u | H_u/H | syn_frac | H_s^log | H_s^log/H | H_s^reg | H_s^reg/H | H_s (orig) | n_idiom | n_head | n_non |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| cut hair | 53.8866 | 59.8012 | 1.1098 | 1.00 | 5.9146 | 0.1098 | 53.9056 | 1.0004 | 53.9056 | 5 | 5 | 5 |
| call the police | 58.7349 | 63.3548 | 1.0787 | 0.80 | 4.6199 | 0.0787 | 59.9234 | 1.0202 | +inf | 5 | 5 | 5 |
| build the boat | 48.1474 | 51.5787 | 1.0713 | 0.60 | 3.4312 | 0.0713 | 49.9935 | 1.0383 | +inf | 5 | 5 | 5 |
| clear the table | 60.1079 | 63.4389 | 1.0554 | 1.00 | 3.3310 | 0.0554 | 60.1565 | 1.0008 | 60.1565 | 5 | 5 | 5 |
| break the window | 63.1806 | 66.5088 | 1.0527 | 0.80 | 3.3282 | 0.0527 | 64.1334 | 1.0151 | +inf | 5 | 5 | 5 |
| tie knots | 60.2910 | 63.4265 | 1.0520 | 0.80 | 3.1355 | 0.0520 | 61.3268 | 1.0172 | +inf | 5 | 5 | 5 |
| raise children | 62.3520 | 65.4813 | 1.0502 | 0.40 | 3.1293 | 0.0502 | 65.1155 | 1.0443 | +inf | 5 | 5 | 5 |
| see the show | 59.6914 | 61.8365 | 1.0359 | 0.40 | 2.1451 | 0.0359 | 62.4566 | 1.0463 | +inf | 5 | 5 | 5 |
| turn dials | 74.8238 | 76.9897 | 1.0289 | 0.60 | 2.1659 | 0.0289 | 76.7427 | 1.0256 | +inf | 5 | 5 | 5 |
| make lunch | 51.2100 | 52.5761 | 1.0267 | 0.40 | 1.3661 | 0.0267 | 53.9897 | 1.0543 | +inf | 5 | 5 | 5 |
| eat the apple | 64.5349 | 66.2321 | 1.0263 | 0.60 | 1.6972 | 0.0263 | 66.4478 | 1.0296 | +inf | 5 | 5 | 5 |
| lose keys | 68.4999 | 70.1421 | 1.0240 | 0.80 | 1.6422 | 0.0240 | 69.9631 | 1.0214 | +inf | 5 | 5 | 5 |
| get a present | 54.2917 | 55.4369 | 1.0211 | 0.40 | 1.1452 | 0.0211 | 57.0836 | 1.0514 | +inf | 5 | 5 | 5 |
| spill the water | 60.5070 | 61.4500 | 1.0156 | 0.60 | 0.9429 | 0.0156 | 62.5300 | 1.0334 | +inf | 5 | 5 | 5 |
| throw a ball | 57.7731 | 58.5317 | 1.0131 | 0.60 | 0.7586 | 0.0131 | 59.8385 | 1.0357 | +inf | 5 | 5 | 5 |
| lead the meeting | 65.4610 | 65.7824 | 1.0049 | 0.40 | 0.3215 | 0.0049 | 68.5804 | 1.0477 | +inf | 5 | 5 | 5 |
| strike a drum | 65.8717 | 65.9467 | 1.0011 | 0.20 | 0.0751 | 0.0011 | 69.7881 | 1.0595 | +inf | 5 | 5 | 5 |
| remember details | 53.2356 | 53.2356 | 1.0000 | 0.00 | 0.0000 | 0.0000 | 57.8408 | 1.0865 | +inf | 5 | 5 | 5 |