# H_u / H_s analysis — llama3.1-8b · full · joint

Idiom vs. parallel literal-VP (non-idiom) datasets. All entropies are uniform Monte Carlo (MC) averages over each phrase's observed contexts; scores are unnormalized per-token LM probabilities, reduced either by geometric mean or jointly (this run uses the `joint` reduction, see Configuration). See `README.md` / `code/SCORING_MATH.md` for the derivation.

> **How to read these magnitudes** (directions, why H_s is +inf, the new finite synergy metrics): see [`INTERPRETATION.md`](../INTERPRETATION.md). Quick key — ↑`H_u/H(p)`, ↑`syn_frac`, ↑`H_s^log` mean **more** synergy; ↑`H_s^reg` and the original ↑`H_s` mean **less** synergy.

## Configuration

| field | idioms run | non-idioms run |
|---|---|---|
| model | meta-llama/Llama-3.1-8B | meta-llama/Llama-3.1-8B |
| reduction | joint | joint |
| medial_only | False | False |
| dtype | bfloat16 | bfloat16 |
| num_idioms | 18 | 18 |
| dataset | /home/prada/PID_evaluation/data/dataset.tsv | /home/prada/PID_evaluation/data/nonidioms_dataset.tsv |

Bound `H_u + H_s ≥ 2H(p) + 2log2 ≥ H(p)` holds for 18/18 idioms and 18/18 non-idioms.
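The first inequality can be recovered context by context from the definitions of H_u and H_s given in the metric sections. The sketch below assumes the per-context base entropy is -log p and writes m = max(q,r); for a non-synergistic context (p ≤ m) H_s is +inf and the bound is trivial, so only the case p > m needs an argument:

$$
\begin{aligned}
H_u + H_s &= -\log m \;-\; \log(p - m) \;=\; -\log\bigl(m\,(p - m)\bigr) \\
          &\ge -\log\frac{p^{2}}{4} \;=\; 2\,(-\log p) + 2\log 2 ,
\end{aligned}
$$

because m(p - m) is maximized at m = p/2 for 0 ≤ m ≤ p. If the reported values are plain averages of these per-context terms, averaging preserves the inequality, and the second step (≥ H(p)) follows from H(p) ≥ 0.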

`H_s = +inf` (at least one non-synergistic context, i.e. p ≤ max(q,r)) for 10/18 idioms and 16/18 non-idioms. A *finite* H_s means **every** context of that phrase is synergistic (p > max(q,r) everywhere), i.e. `syn_frac` = 1.
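To make the all-or-nothing behaviour concrete, here is a minimal sketch of the original H_s as a uniform average over contexts, working from log-scores. The function name and the toy log-probabilities are hypothetical, and the log-space arithmetic is an implementation choice made here, not necessarily what the project code does.

```python
import math


def h_s_original(log_p, log_q, log_r):
    """Uniform average of -log max(0, p - max(q, r)) over contexts.

    Inputs are per-context log-scores; returns +inf as soon as one
    context is non-synergistic (p <= max(q, r)).
    """
    total = 0.0
    for lp, lq, lr in zip(log_p, log_q, log_r):
        lm = max(lq, lr)
        if lp <= lm:
            # max(0, p - m) = 0, so this context contributes -log 0 = +inf.
            return math.inf
        # log(p - m) = log p + log(1 - m/p); expm1 keeps 1 - m/p accurate.
        total -= lp + math.log(-math.expm1(lm - lp))
    return total / len(log_p)


# Toy log-scores (hypothetical, not taken from the datasets).
all_synergistic = h_s_original([-50.0, -48.0], [-55.0, -60.0], [-58.0, -57.0])
one_bad_context = h_s_original(
    [-50.0, -52.0, -48.0], [-55.0, -56.0, -60.0], [-58.0, -51.5, -57.0]
)
print(all_synergistic, one_bad_context)
```

In the second call only the middle context has p ≤ max(q,r), yet the whole average becomes +inf, which is why most phrases drop out of the original H_s tables.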

## Per-metric summary (idioms vs non-idioms)

Means and 95% bootstrap CIs (20k resamples, percentile method, phrase-level). Non-finite values are dropped per metric before bootstrapping; the drop count is shown. A cross-dataset gap is flagged **significant** when its 95% CI excludes zero.
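The procedure can be sketched with NumPy as follows. This is an illustration rather than the project's code: the function names and seed are hypothetical, and resampling the two datasets independently for the gap is an assumption. The input is the `syn_frac` column of the per-phrase tables in this report.

```python
import numpy as np

# syn_frac per phrase, copied from the per-phrase tables in this report.
IDIOM_SYN_FRAC = [1.0, 1.0, 1.0, 1.0, 1.0, 0.9, 0.9, 0.9, 1.0,
                  0.9, 1.0, 0.9, 0.7, 0.9, 0.5, 1.0, 0.3, 0.5]
NONIDIOM_SYN_FRAC = [1.0, 0.9, 1.0, 0.6, 0.4, 0.5, 0.5, 0.4, 0.8,
                     0.3, 0.5, 0.3, 0.3, 0.2, 0.3, 0.2, 0.2, 0.1]


def resampled_means(values, n_resamples, rng):
    """Means of phrase-level resamples drawn with replacement."""
    values = np.asarray(values, dtype=float)
    values = values[np.isfinite(values)]  # drop +inf / nan before resampling
    idx = rng.integers(0, len(values), size=(n_resamples, len(values)))
    return values[idx].mean(axis=1)


def bootstrap_mean_ci(values, n_resamples=20_000, seed=0):
    """95% percentile-method CI of the mean."""
    rng = np.random.default_rng(seed)
    lo, hi = np.percentile(resampled_means(values, n_resamples, rng), [2.5, 97.5])
    return float(lo), float(hi)


def bootstrap_gap_ci(a, b, n_resamples=20_000, seed=0):
    """95% percentile CI of mean(a) - mean(b), resampling a and b independently."""
    rng = np.random.default_rng(seed)
    diffs = resampled_means(a, n_resamples, rng) - resampled_means(b, n_resamples, rng)
    lo, hi = np.percentile(diffs, [2.5, 97.5])
    return float(lo), float(hi)


lo, hi = bootstrap_mean_ci(IDIOM_SYN_FRAC)
gap_lo, gap_hi = bootstrap_gap_ci(IDIOM_SYN_FRAC, NONIDIOM_SYN_FRAC)
significant = gap_lo > 0 or gap_hi < 0  # the CI excludes zero
print(f"idiom mean CI: [{lo:.4f}, {hi:.4f}]")
print(f"gap CI: [{gap_lo:.4f}, {gap_hi:.4f}], significant={significant}")
```

Because the random stream differs from the original run, the intervals should land close to the `syn_frac` rows reported below, not necessarily match them digit for digit.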

### H(p)

*base entropy, uniform MC average over contexts (nats). ↓ smaller = idiom more concentrated*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 54.6661 | 55.4008 | 46.6490 | 60.3954 | [52.7890, 56.4564] |
| non-idioms | 18 | 0 | 62.5048 | 63.3137 | 55.5563 | 72.6323 | [60.4423, 64.6272] |

**Cross-dataset gap** (idioms − non-idioms): Δ = -7.8387  (95% CI [-10.6501, -5.0731]) → idioms < non-idioms, **significant**.

### H_u

*unique / redundant entropy = -log min{p, max(q,r)} (nats); >= H(p)*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 61.5422 | 61.4203 | 54.7067 | 68.8190 | [59.6792, 63.3846] |
| non-idioms | 18 | 0 | 64.1330 | 63.8538 | 55.7439 | 73.9212 | [62.1434, 66.1770] |

**Cross-dataset gap** (idioms − non-idioms): Δ = -2.5908  (95% CI [-5.3041, 0.1513]) → idioms < non-idioms, not significant.
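As a sketch of how H(p) and H_u relate for one phrase, the snippet below averages per-context log-scores, assuming H(p) is the mean of -log p and using -log min{p, m} = -min(log p, log m). The function name and the toy log-scores are hypothetical.

```python
def h_base_and_h_u(log_p, log_q, log_r):
    """Return (H(p), H_u) in nats as uniform averages over contexts."""
    n = len(log_p)
    h_p = -sum(log_p) / n
    # -log min{p, max(q, r)} evaluated in log space, one term per context.
    h_u = -sum(
        min(lp, max(lq, lr)) for lp, lq, lr in zip(log_p, log_q, log_r)
    ) / n
    return h_p, h_u


# Toy log-scores (hypothetical, not taken from the datasets).
h_p, h_u = h_base_and_h_u(
    [-50.0, -52.0, -48.0], [-55.0, -56.0, -60.0], [-58.0, -51.5, -57.0]
)
print(h_p, h_u)
```

Since min{p, m} ≤ p in every context, H_u can never fall below H(p), which is the `>= H(p)` note in the definition above.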

### H_u / H(p)

*unique-information ratio (>= 1). ↑ bigger = MORE synergy. THE headline metric*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 1.1274 | 1.1260 | 1.0413 | 1.1940 | [1.1061, 1.1484] |
| non-idioms | 18 | 0 | 1.0266 | 1.0229 | 1.0001 | 1.0749 | [1.0175, 1.0369] |

**Cross-dataset gap** (idioms − non-idioms): Δ = 0.1007  (95% CI [0.0771, 0.1242]) → idioms > non-idioms, **significant**.

### syn_frac

*synergy coverage in [0,1] = fraction of contexts with p > m, where m = max(q,r). ↑ bigger = MORE synergy (most intuitive)*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 0.8556 | 0.9000 | 0.3000 | 1.0000 | [0.7500, 0.9444] |
| non-idioms | 18 | 0 | 0.4722 | 0.4000 | 0.1000 | 1.0000 | [0.3500, 0.6056] |

**Cross-dataset gap** (idioms − non-idioms): Δ = 0.3833  (95% CI [0.2167, 0.5333]) → idioms > non-idioms, **significant**.

### H_s^log

*log-space synergy = mean max{0, log p - log m} with m = max(q,r) (nats). ↑ bigger = MORE synergy; always finite*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 6.8761 | 7.0852 | 2.4926 | 10.3728 | [5.7838, 7.9566] |
| non-idioms | 18 | 0 | 1.6282 | 1.4446 | 0.0078 | 4.3129 | [1.1015, 2.2056] |

**Cross-dataset gap** (idioms − non-idioms): Δ = 5.2479  (95% CI [4.0159, 6.4792]) → idioms > non-idioms, **significant**.

### H_s^log / H(p)

*log-space synergy ratio. ↑ bigger = MORE synergy*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 0.1274 | 0.1260 | 0.0413 | 0.1940 | [0.1061, 0.1484] |
| non-idioms | 18 | 0 | 0.0266 | 0.0229 | 0.0001 | 0.0749 | [0.0175, 0.0369] |

**Cross-dataset gap** (idioms − non-idioms): Δ = 0.1007  (95% CI [0.0771, 0.1242]) → idioms > non-idioms, **significant**.
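This ratio is H_u / H(p) shifted by exactly 1, which is why the two tables differ by 1.0000 in every cell and share the same Δ and CI. A short derivation, assuming H(p) is the average of -log p and writing m = max(q,r); per context:

$$
-\log\min\{p, m\} \;=\; -\log p \;+\; \max\{0,\ \log p - \log m\}
$$

Averaging over contexts gives

$$
H_u = H(p) + H_s^{\log}
\quad\Longrightarrow\quad
\frac{H_u}{H(p)} = 1 + \frac{H_s^{\log}}{H(p)} .
$$

For example, for `strike a chord` the per-phrase table gives 49.5980 + 9.6228 ≈ 59.2207, up to rounding.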

### H_s^log signed

*signed log-space synergy = mean(log p - log m); can be negative (net anti-synergistic)*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 6.2743 | 6.9507 | -2.3124 | 10.3728 | [4.6889, 7.7159] |
| non-idioms | 18 | 0 | -0.4869 | -0.7531 | -5.5187 | 4.3129 | [-1.6632, 0.7205] |

**Cross-dataset gap** (idioms − non-idioms): Δ = 6.7612  (95% CI [4.7588, 8.6639]) → idioms > non-idioms, **significant**.
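The three finite synergy quantities (`syn_frac`, clipped H_s^log, and signed H_s^log) all derive from the same per-context difference log p - log m. A minimal sketch follows, with a hypothetical function name and toy log-scores.

```python
def log_space_synergy(log_p, log_q, log_r):
    """Return (syn_frac, H_s^log, signed H_s^log) for one phrase."""
    # d = log p - log max(q, r), one value per context.
    diffs = [lp - max(lq, lr) for lp, lq, lr in zip(log_p, log_q, log_r)]
    n = len(diffs)
    syn_frac = sum(d > 0 for d in diffs) / n        # share of synergistic contexts
    h_s_log = sum(max(0.0, d) for d in diffs) / n   # negative differences clipped to 0
    h_s_log_signed = sum(diffs) / n                 # negative differences kept
    return syn_frac, h_s_log, h_s_log_signed


# Toy log-scores (hypothetical): the middle context is non-synergistic.
syn_frac, h_s_log, h_s_log_signed = log_space_synergy(
    [-50.0, -52.0, -48.0], [-55.0, -56.0, -60.0], [-58.0, -51.5, -57.0]
)
print(syn_frac, h_s_log, h_s_log_signed)
```

The signed value can never exceed the clipped one, and the two coincide when `syn_frac` = 1. This matches the summary tables, where both metrics share the idiom maximum 10.3728.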

### H_s^reg

*regularized H_s (eps-floored, finite, continuous). ↑ bigger = LESS synergy*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 55.3957 | 56.4098 | 46.6529 | 62.7307 | [53.3421, 57.3617] |
| non-idioms | 18 | 0 | 65.0589 | 65.7149 | 56.1010 | 75.8688 | [62.7707, 67.3705] |

**Cross-dataset gap** (idioms − non-idioms): Δ = -9.6632  (95% CI [-12.7387, -6.6328]) → idioms < non-idioms, **significant**.
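This report only describes H_s^reg as eps-floored; the authoritative definition is in the referenced docs. One form that keeps the average finite is a relative floor, -log max{p - m, eps·p} with m = max(q,r). That form, the function name, and eps = 0.01 are all assumptions made for this sketch. The choice is at least consistent with the per-phrase tables: for `make waves` (`syn_frac` 0.30), 0.7 × (-ln 0.01) ≈ 3.22 matches H_s^reg - H(p) = 60.1179 - 56.8937 ≈ 3.22.

```python
import math


def h_s_regularized(log_p, log_q, log_r, eps=0.01):
    """Average of -log max(p - m, eps * p) over contexts (assumed form)."""
    total = 0.0
    for lp, lq, lr in zip(log_p, log_q, log_r):
        lm = max(lq, lr)
        # Relative gap (p - m) / p; treated as 0 when the context is non-synergistic.
        rel_gap = -math.expm1(lm - lp) if lp > lm else 0.0
        # -log max(p - m, eps * p) = -(log p + log max(rel_gap, eps))
        total -= lp + math.log(max(rel_gap, eps))
    return total / len(log_p)


# Toy log-scores (hypothetical): the middle context hits the floor.
h_reg = h_s_regularized(
    [-50.0, -52.0, -48.0], [-55.0, -56.0, -60.0], [-58.0, -51.5, -57.0]
)
print(h_reg)
```

Under this form each non-synergistic context adds a fixed penalty of -log eps on top of its -log p, so the metric stays finite and grows as fewer contexts are synergistic, in line with the "bigger = LESS synergy" reading.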

### H_s^reg / H(p)

*regularized synergy ratio (>= 1). ↑ bigger = LESS synergy*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 1.0130 | 1.0085 | 1.0000 | 1.0567 | [1.0060, 1.0212] |
| non-idioms | 18 | 0 | 1.0408 | 1.0464 | 1.0009 | 1.0790 | [1.0312, 1.0499] |

**Cross-dataset gap** (idioms − non-idioms): Δ = -0.0278  (95% CI [-0.0395, -0.0152]) → idioms < non-idioms, **significant**.

### H_s (original)

*synergy entropy = -log max{0, p - max(q,r)} (nats); +inf if ANY context is non-synergistic (mostly +inf)*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 8 | 10 | 53.7268 | 53.2326 | 46.6529 | 59.5698 | [50.4088, 56.9606] |
| non-idioms | 2 | 16 | 58.8669 | 58.8669 | 57.6250 | 60.1088 | [57.6250, 60.1088] |

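**Cross-dataset gap** (idioms − non-idioms): Δ = -5.1401  (95% CI [-8.7879, -1.4534]) → idioms < non-idioms, **significant**. Only 8 idioms and 2 non-idioms have a finite H_s, so this comparison rests on a small, self-selected subset (the non-idiom CI simply spans its two values) and should be read with caution.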

### H_s / H(p)

*original synergy ratio (mostly +inf; use H_s^log or syn_frac instead)*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 8 | 10 | 1.0006 | 1.0003 | 1.0000 | 1.0024 | [1.0002, 1.0012] |
| non-idioms | 2 | 16 | 1.0030 | 1.0030 | 1.0009 | 1.0051 | [1.0009, 1.0051] |

**Cross-dataset gap** (idioms − non-idioms): Δ = -0.0024  (95% CI [-0.0048, 0.0001]) → idioms < non-idioms, not significant.

## Per-phrase detail

### Idioms (sorted by H_u/H, descending)

| phrase | H(p) | H_u | H_u/H | syn_frac | H_s^log | H_s^log/H | H_s^reg | H_s^reg/H | H_s (orig) | n_idiom | n_head | n_non |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| strike a chord | 49.5980 | 59.2207 | 1.1940 | 1.00 | 9.6228 | 0.1940 | 49.6095 | 1.0002 | 49.6095 | 10 | 10 | 10 |
| call the shots | 50.5529 | 59.8466 | 1.1838 | 1.00 | 9.2937 | 0.1838 | 50.5680 | 1.0003 | 50.5680 | 10 | 10 | 10 |
| bite the dust | 58.4462 | 68.8190 | 1.1775 | 1.00 | 10.3728 | 0.1775 | 58.5859 | 1.0024 | 58.5859 | 10 | 10 | 10 |
| rock the boat | 46.6490 | 54.7067 | 1.1727 | 1.00 | 8.0577 | 0.1727 | 46.6529 | 1.0001 | 46.6529 | 10 | 10 | 10 |
| clear the air | 55.8956 | 65.4253 | 1.1705 | 1.00 | 9.5297 | 0.1705 | 55.8973 | 1.0000 | 55.8973 | 10 | 10 | 10 |
| break the mold | 54.9061 | 64.2000 | 1.1693 | 0.90 | 9.2939 | 0.1693 | 55.8395 | 1.0170 | +inf | 10 | 10 | 10 |
| cut corners | 50.1505 | 58.3508 | 1.1635 | 0.90 | 8.2004 | 0.1635 | 50.7486 | 1.0119 | +inf | 10 | 10 | 10 |
| turn tail | 58.2357 | 66.8801 | 1.1484 | 0.90 | 8.6445 | 0.1484 | 58.7674 | 1.0091 | +inf | 10 | 10 | 10 |
| spill the beans | 58.7397 | 66.7057 | 1.1356 | 1.00 | 7.9660 | 0.1356 | 58.7448 | 1.0001 | 58.7448 | 10 | 10 | 10 |
| have a ball | 53.2698 | 59.4741 | 1.1165 | 0.90 | 6.2043 | 0.1165 | 53.7360 | 1.0088 | +inf | 10 | 10 | 10 |
| lead the field | 50.1646 | 55.6578 | 1.1095 | 1.00 | 5.4932 | 0.1095 | 50.1859 | 1.0004 | 50.1859 | 10 | 10 | 10 |
| pull strings | 57.3824 | 63.4485 | 1.1057 | 0.90 | 6.0661 | 0.1057 | 57.8600 | 1.0083 | +inf | 10 | 10 | 10 |
| lose ground | 57.7360 | 63.3456 | 1.0972 | 0.70 | 5.6096 | 0.0972 | 59.1671 | 1.0248 | +inf | 10 | 10 | 10 |
| mean business | 50.9223 | 55.5949 | 1.0918 | 0.90 | 4.6727 | 0.0918 | 51.4184 | 1.0097 | +inf | 10 | 10 | 10 |
| run the show | 54.5450 | 59.0813 | 1.0832 | 0.50 | 4.5362 | 0.0832 | 56.9224 | 1.0436 | +inf | 10 | 10 | 10 |
| raise hell | 59.5071 | 64.1624 | 1.0782 | 1.00 | 4.6553 | 0.0782 | 59.5698 | 1.0011 | 59.5698 | 10 | 10 | 10 |
| make waves | 56.8937 | 59.9528 | 1.0538 | 0.30 | 3.0591 | 0.0538 | 60.1179 | 1.0567 | +inf | 10 | 10 | 10 |
| get the sack | 60.3954 | 62.8879 | 1.0413 | 0.50 | 2.4926 | 0.0413 | 62.7307 | 1.0387 | +inf | 10 | 10 | 10 |

### Non-idioms (sorted by H_u/H, descending)

| phrase | H(p) | H_u | H_u/H | syn_frac | H_s^log | H_s^log/H | H_s^reg | H_s^reg/H | H_s (orig) | n_idiom | n_head | n_non |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| cut hair | 57.5739 | 61.8868 | 1.0749 | 1.00 | 4.3129 | 0.0749 | 57.6250 | 1.0009 | 57.6250 | 10 | 10 | 10 |
| call the police | 55.5563 | 59.5837 | 1.0725 | 0.90 | 4.0274 | 0.0725 | 56.1010 | 1.0098 | +inf | 10 | 10 | 10 |
| clear the table | 59.8062 | 62.8253 | 1.0505 | 1.00 | 3.0191 | 0.0505 | 60.1088 | 1.0051 | 60.1088 | 10 | 10 | 10 |
| break the window | 64.5973 | 67.1046 | 1.0388 | 0.60 | 2.5073 | 0.0388 | 66.4800 | 1.0291 | +inf | 10 | 10 | 10 |
| build the boat | 58.5402 | 60.5388 | 1.0341 | 0.40 | 1.9986 | 0.0341 | 61.3682 | 1.0483 | +inf | 10 | 10 | 10 |
| tie knots | 65.8388 | 67.9675 | 1.0323 | 0.50 | 2.1287 | 0.0323 | 68.1921 | 1.0357 | +inf | 10 | 10 | 10 |
| raise children | 64.3326 | 66.2032 | 1.0291 | 0.50 | 1.8706 | 0.0291 | 67.0944 | 1.0429 | +inf | 10 | 10 | 10 |
| see the show | 56.5163 | 58.1167 | 1.0283 | 0.40 | 1.6004 | 0.0283 | 59.3602 | 1.0503 | +inf | 10 | 10 | 10 |
| eat the apple | 66.7990 | 68.6682 | 1.0280 | 0.80 | 1.8692 | 0.0280 | 68.1183 | 1.0198 | +inf | 10 | 10 | 10 |
| turn dials | 72.6323 | 73.9212 | 1.0177 | 0.30 | 1.2889 | 0.0177 | 75.8688 | 1.0446 | +inf | 10 | 10 | 10 |
| throw a ball | 62.5466 | 63.4349 | 1.0142 | 0.50 | 0.8883 | 0.0142 | 64.9497 | 1.0384 | +inf | 10 | 10 | 10 |
| get a present | 64.0808 | 64.9907 | 1.0142 | 0.30 | 0.9099 | 0.0142 | 67.3261 | 1.0506 | +inf | 10 | 10 | 10 |
| lose keys | 67.7568 | 68.6819 | 1.0137 | 0.30 | 0.9251 | 0.0137 | 71.0383 | 1.0484 | +inf | 10 | 10 | 10 |
| make lunch | 60.5947 | 61.4121 | 1.0135 | 0.20 | 0.8174 | 0.0135 | 64.2829 | 1.0609 | +inf | 10 | 10 | 10 |
| spill the water | 67.2452 | 67.9246 | 1.0101 | 0.30 | 0.6794 | 0.0101 | 70.5093 | 1.0485 | +inf | 10 | 10 | 10 |
| lead the meeting | 60.8190 | 61.1174 | 1.0049 | 0.20 | 0.2985 | 0.0049 | 64.5660 | 1.0616 | +inf | 10 | 10 | 10 |
| strike a drum | 64.1142 | 64.2727 | 1.0025 | 0.20 | 0.1586 | 0.0025 | 67.9311 | 1.0595 | +inf | 10 | 10 | 10 |
| remember details | 55.7361 | 55.7439 | 1.0001 | 0.10 | 0.0078 | 0.0001 | 60.1402 | 1.0790 | +inf | 10 | 10 | 10 |