# H_u / H_s analysis — qwen3-8b · medial · joint

This report compares the idiom dataset with a parallel literal-VP (non-idiom) dataset. All entropies are uniform Monte Carlo (MC) averages over each phrase's observed contexts; scores are unnormalized per-token LM probabilities, reduced by geometric mean or jointly (this run uses `joint`, see Configuration). See `README.md` / `code/SCORING_MATH.md` for the derivation.

> **How to read these magnitudes** (directions, why H_s is +inf, the new finite synergy metrics): see [`INTERPRETATION.md`](../INTERPRETATION.md). Quick key — ↑`H_u/H(p)`, ↑`syn_frac`, ↑`H_s^log` mean **more** synergy; ↑`H_s^reg` and the original ↑`H_s` mean **less** synergy.

## Configuration

| field | idioms run | non-idioms run |
|---|---|---|
| model | Qwen/Qwen3-8B | Qwen/Qwen3-8B |
| reduction | joint | joint |
| medial_only | True | True |
| dtype | bfloat16 | bfloat16 |
| num_idioms | 18 | 18 |
| dataset | /home/prada/PID_evaluation/data/dataset.tsv | /home/prada/PID_evaluation/data/nonidioms_dataset.tsv |

The bound `H_u + H_s ≥ 2H(p) + 2 log 2 ≥ H(p)` holds for 18/18 idioms and 18/18 non-idioms (trivially so wherever `H_s = +inf`).
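
The following is a brief per-context sketch of why the bound holds, not the derivation from `code/SCORING_MATH.md`. It assumes the definitions given in the metric captions of this report, natural logarithms, the shorthand m = max(q, r), and 0 < m, p ≤ 1 (so that H(p) = -log p ≥ 0).

```latex
\begin{aligned}
&\text{Case } p \le m:\quad H_s = -\log\max\{0,\,p-m\} = +\infty,
  \text{ so the bound holds trivially.}\\[4pt]
&\text{Case } p > m:\quad H_u + H_s = -\log m - \log(p-m) = -\log\bigl(m\,(p-m)\bigr).\\
&\qquad \text{Since } m\,(p-m) \le \tfrac{p^2}{4} \text{ (maximum at } m = p/2\text{)},\\
&\qquad H_u + H_s \;\ge\; -\log\tfrac{p^2}{4} \;=\; 2H(p) + 2\log 2 .\\[4pt]
&\text{Finally, } 2H(p) + 2\log 2 \ge H(p) \text{ because } H(p) \ge 0 .
\end{aligned}
```

Because every inequality is linear in the per-context terms, averaging over contexts preserves it, which is consistent with the bound holding for every phrase in both datasets.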

`H_s = +inf` (≥1 non-synergistic slot) for 9/18 idioms and 17/18 non-idioms. A *finite* H_s means **every** context of that phrase is synergistic (p > max(q,r) everywhere).
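
To make the `+inf` behaviour concrete, here is a minimal sketch that computes the per-phrase quantities from per-context log scores, using the formulas quoted in the metric captions of this report. The function name `phrase_metrics`, its inputs, and the toy numbers are hypothetical; this is an illustration, not the code that produced the tables.

```python
import math

def phrase_metrics(log_p, log_q, log_r):
    """Average per-context quantities for one phrase (all in nats).

    log_p, log_q, log_r: equal-length lists of natural-log scores, one entry
    per observed context (hypothetical inputs; the real pipeline may differ).
    """
    n = len(log_p)
    h_p = h_u = h_s = h_s_log = h_s_signed = 0.0
    n_syn = 0
    for lp, lq, lr in zip(log_p, log_q, log_r):
        lm = max(lq, lr)              # log m, with m = max(q, r)
        h_p += -lp                    # -log p
        h_u += -min(lp, lm)           # -log min{p, m}
        h_s_log += max(0.0, lp - lm)  # max{0, log p - log m}
        h_s_signed += lp - lm         # signed variant, may be negative
        if lp > lm:
            n_syn += 1
            # -log(p - m) = -log p - log(1 - m/p), evaluated in log space
            h_s += -lp - math.log(-math.expm1(lm - lp))
        else:
            h_s = math.inf            # one non-synergistic context => +inf
    return {
        "H_p": h_p / n,
        "H_u": h_u / n,
        "H_s": h_s / n,
        "H_s_log": h_s_log / n,
        "H_s_log_signed": h_s_signed / n,
        "syn_frac": n_syn / n,
    }

# Toy phrase with two contexts: the first has p > m, the second has p < m.
mixed = phrase_metrics([-40.0, -50.0], [-45.0, -48.0], [-60.0, -55.0])
print(mixed)
```

In the toy call, the second context has p < m, so `H_s` is `+inf` while `syn_frac` is 0.5 and `H_s_log` stays finite. Working in log space avoids forming very small probabilities directly.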

## Per-metric summary (idioms vs non-idioms)

Means and 95% bootstrap CIs (20k resamples, percentile method, phrase-level). Non-finite values are dropped per metric before bootstrapping; the drop count is shown.
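
The procedure described above can be sketched as follows: drop non-finite values, then take a percentile bootstrap of the mean over phrases. The helper `bootstrap_mean_ci`, its seed, and the exact percentile indexing are assumptions made for illustration. The input values are the `H_s (orig)` column of the idioms table in the per-phrase detail.

```python
import math
import random

def bootstrap_mean_ci(values, n_resamples=20_000, alpha=0.05, seed=0):
    """Percentile-bootstrap CI of the mean after dropping non-finite values."""
    finite = [v for v in values if math.isfinite(v)]
    dropped = len(values) - len(finite)
    n = len(finite)
    rng = random.Random(seed)
    # Resample phrases with replacement and record each resample's mean.
    means = sorted(
        sum(rng.choices(finite, k=n)) / n for _ in range(n_resamples)
    )
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return {
        "n_finite": n,
        "dropped": dropped,
        "mean": sum(finite) / n,
        "ci": (lo, hi),
    }

inf = math.inf
# H_s (orig) for the 18 idioms, in the order of the per-phrase table.
idiom_h_s = [
    43.8912, 53.9715, 46.6953, 53.5921, inf, 47.9325, inf, 58.4471, inf,
    inf, 58.2458, inf, inf, inf, 53.0699, 56.6844, inf, inf,
]
result = bootstrap_mean_ci(idiom_h_s)
print(result)
```

The finite count, drop count, and mean should match the `H_s (original)` idioms row. The interval endpoints depend on the random number generator and seed, so they should land near the reported CI rather than reproduce it digit for digit.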

### H(p)

*base entropy, uniform MC average over contexts (nats). ↓ smaller = phrase more concentrated*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 53.9408 | 53.7758 | 41.6363 | 67.0872 | [50.9665, 56.8620] |
| non-idioms | 18 | 0 | 60.2107 | 61.2629 | 46.7000 | 71.8342 | [57.1512, 63.1402] |

**Cross-dataset gap** (idioms − non-idioms): Δ = -6.2699  (95% CI [-10.3977, -2.0752]) → idioms < non-idioms, **significant**.

### H_u

*unique / redundant entropy = -log min{p, max(q,r)} (nats); >= H(p)*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 61.9966 | 62.3430 | 46.3793 | 75.9778 | [58.8059, 65.1325] |
| non-idioms | 18 | 0 | 62.5098 | 64.0187 | 50.7058 | 75.1086 | [59.4011, 65.4793] |

**Cross-dataset gap** (idioms − non-idioms): Δ = -0.5133  (95% CI [-4.7768, 3.8394]) → idioms < non-idioms, not significant.

### H_u / H(p)

*unique-information ratio (>= 1). ↑ bigger = MORE synergy. THE headline metric*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 1.1527 | 1.1232 | 1.0703 | 1.2941 | [1.1200, 1.1878] |
| non-idioms | 18 | 0 | 1.0388 | 1.0312 | 1.0009 | 1.0989 | [1.0265, 1.0524] |

**Cross-dataset gap** (idioms − non-idioms): Δ = 0.1139  (95% CI [0.0786, 0.1521]) → idioms > non-idioms, **significant**.

### syn_frac

*synergy coverage in [0,1] = fraction of contexts with p > m, where m = max(q,r). ↑ bigger = MORE synergy (most intuitive)*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 0.8556 | 0.9000 | 0.6000 | 1.0000 | [0.7778, 0.9222] |
| non-idioms | 18 | 0 | 0.5444 | 0.6000 | 0.2000 | 1.0000 | [0.4222, 0.6667] |

**Cross-dataset gap** (idioms − non-idioms): Δ = 0.3111  (95% CI [0.1667, 0.4556]) → idioms > non-idioms, **significant**.
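
A gap line like the one above can be approximated with a two-sample percentile bootstrap. This sketch makes two assumptions that the report does not state: the two datasets are resampled independently (not as matched pairs), and "significant" means the 95% CI excludes zero, which is consistent with every gap line in this report. The helper name `bootstrap_gap_ci` is hypothetical. The inputs are the `syn_frac` columns of the two per-phrase tables.

```python
import random

def bootstrap_gap_ci(a, b, n_resamples=20_000, alpha=0.05, seed=0):
    """Percentile-bootstrap CI for mean(a) - mean(b), groups resampled independently."""
    rng = random.Random(seed)
    na, nb = len(a), len(b)
    gaps = sorted(
        sum(rng.choices(a, k=na)) / na - sum(rng.choices(b, k=nb)) / nb
        for _ in range(n_resamples)
    )
    lo = gaps[int((alpha / 2) * n_resamples)]
    hi = gaps[int((1 - alpha / 2) * n_resamples) - 1]
    delta = sum(a) / na - sum(b) / nb
    # Assumed criterion: the interval lies entirely on one side of zero.
    significant = lo > 0 or hi < 0
    return delta, (lo, hi), significant

# syn_frac per phrase, in the order of the per-phrase tables.
idioms = [1.0, 1.0, 1.0, 1.0, 0.8, 1.0, 0.8, 1.0, 0.8,
          0.8, 1.0, 0.6, 0.8, 0.6, 1.0, 1.0, 0.6, 0.6]
non_idioms = [0.8, 0.6, 0.8, 0.6, 0.8, 1.0, 0.8, 0.8, 0.4,
              0.2, 0.4, 0.2, 0.4, 0.8, 0.6, 0.2, 0.2, 0.2]

delta, ci, significant = bootstrap_gap_ci(idioms, non_idioms)
print(delta, ci, significant)
```

The point estimate is the plain difference of the two means, so it should agree with the reported Δ. The interval endpoints vary slightly with the seed.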

### H_s^log

*log-space synergy = mean max{0, log p - log m}, with m = max(q,r) (nats). ↑ bigger = MORE synergy; finite always*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 8.0558 | 7.3765 | 3.8752 | 15.5114 | [6.4916, 9.7062] |
| non-idioms | 18 | 0 | 2.2992 | 1.7144 | 0.0462 | 5.8992 | [1.5893, 3.0547] |

**Cross-dataset gap** (idioms − non-idioms): Δ = 5.7566  (95% CI [4.0135, 7.5770]) → idioms > non-idioms, **significant**.

### H_s^log / H(p)

*log-space synergy ratio. ↑ bigger = MORE synergy*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 0.1527 | 0.1232 | 0.0703 | 0.2941 | [0.1200, 0.1878] |
| non-idioms | 18 | 0 | 0.0388 | 0.0312 | 0.0009 | 0.0989 | [0.0265, 0.0524] |

**Cross-dataset gap** (idioms − non-idioms): Δ = 0.1139  (95% CI [0.0786, 0.1521]) → idioms > non-idioms, **significant**.
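
The rows and the gap in this section equal those of `H_u / H(p)` shifted by exactly 1, with an identical Δ and CI. This follows directly from the definitions in the captions; a short worked identity, writing m = max(q, r):

```latex
\begin{aligned}
-\log\min\{p,\,m\} &= -\log p + \max\{0,\ \log p - \log m\} \\
\Rightarrow\quad H_u &= H(p) + H_s^{\log}
  \qquad\text{(averaging over contexts is linear)} \\
\Rightarrow\quad \frac{H_u}{H(p)} &= 1 + \frac{H_s^{\log}}{H(p)} .
\end{aligned}
```

For example, for "strike a chord" the per-phrase table gives 43.8912 + 12.9087 = 56.7999, and 1 + 0.2941 = 1.2941.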

### H_s^log signed

*signed log-space synergy = mean(log p - log m); can be negative (net anti-synergistic)*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 7.4960 | 7.3765 | -1.3449 | 15.5114 | [5.6056, 9.3990] |
| non-idioms | 18 | 0 | 0.4464 | -0.3664 | -2.8207 | 5.6473 | [-0.7551, 1.7131] |

**Cross-dataset gap** (idioms − non-idioms): Δ = 7.0496  (95% CI [4.7590, 9.3263]) → idioms > non-idioms, **significant**.

### H_s^reg

*regularized H_s (eps-floored, finite, continuous). ↑ bigger = LESS synergy*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 54.6720 | 54.2188 | 42.7059 | 68.0087 | [51.6290, 57.6825] |
| non-idioms | 18 | 0 | 62.4614 | 62.9464 | 48.5436 | 73.4888 | [59.5173, 65.2306] |

**Cross-dataset gap** (idioms − non-idioms): Δ = -7.7894  (95% CI [-11.8579, -3.6549]) → idioms < non-idioms, **significant**.

### H_s^reg / H(p)

*regularized synergy ratio (>= 1). ↑ bigger = LESS synergy*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 1.0135 | 1.0117 | 1.0000 | 1.0405 | [1.0074, 1.0200] |
| non-idioms | 18 | 0 | 1.0385 | 1.0367 | 1.0024 | 1.0781 | [1.0289, 1.0484] |

**Cross-dataset gap** (idioms − non-idioms): Δ = -0.0251  (95% CI [-0.0367, -0.0133]) → idioms < non-idioms, **significant**.

### H_s (original)

*synergy entropy = -log max{0, p - max(q,r)} (nats); +inf if ANY slot non-synergistic (mostly +inf)*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 9 | 9 | 52.5033 | 53.5921 | 43.8912 | 58.4471 | [49.2373, 55.6036] |
| non-idioms | 1 | 17 | 62.2472 | 62.2472 | 62.2472 | 62.2472 | [62.2472, 62.2472] |

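**Cross-dataset gap** (idioms − non-idioms): Δ = -9.7439  (95% CI [-12.9840, -6.6622]) → idioms < non-idioms, **significant**. Caution: the non-idiom side rests on a single finite value (N = 1), so its bootstrap mean cannot vary and the interval reflects idiom variability only.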

### H_s / H(p)

*original synergy ratio (mostly +inf; use H_s^log or syn_frac instead)*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 9 | 9 | 1.0016 | 1.0001 | 1.0000 | 1.0097 | [1.0001, 1.0038] |
| non-idioms | 1 | 17 | 1.0024 | 1.0024 | 1.0024 | 1.0024 | [1.0024, 1.0024] |

**Cross-dataset gap** (idioms − non-idioms): Δ = -0.0008  (95% CI [-0.0023, 0.0014]) → idioms < non-idioms, not significant. The non-idiom side rests on a single finite value (N = 1).

## Per-phrase detail

### Idioms (sorted by H_u/H, descending)

| phrase | H(p) | H_u | H_u/H | syn_frac | H_s^log | H_s^log/H | H_s^reg | H_s^reg/H | H_s (orig) | n_idiom | n_head | n_non |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| strike a chord | 43.8912 | 56.7999 | 1.2941 | 1.00 | 12.9087 | 0.2941 | 43.8912 | 1.0000 | 43.8912 | 5 | 5 | 5 |
| break the mold | 53.9715 | 69.4829 | 1.2874 | 1.00 | 15.5114 | 0.2874 | 53.9715 | 1.0000 | 53.9715 | 5 | 5 | 5 |
| cut corners | 46.6895 | 59.8262 | 1.2814 | 1.00 | 13.1368 | 0.2814 | 46.6953 | 1.0001 | 46.6953 | 5 | 5 | 5 |
| call the shots | 53.5802 | 65.5193 | 1.2228 | 1.00 | 11.9391 | 0.2228 | 53.5921 | 1.0002 | 53.5921 | 5 | 5 | 5 |
| spill the beans | 51.8955 | 62.4203 | 1.2028 | 0.80 | 10.5249 | 0.2028 | 52.8165 | 1.0177 | +inf | 5 | 5 | 5 |
| bite the dust | 47.9287 | 57.2660 | 1.1948 | 1.00 | 9.3373 | 0.1948 | 47.9325 | 1.0001 | 47.9325 | 5 | 5 | 5 |
| clear the air | 62.5644 | 72.1593 | 1.1534 | 0.80 | 9.5949 | 0.1534 | 63.4884 | 1.0148 | +inf | 5 | 5 | 5 |
| lose ground | 58.4404 | 66.6070 | 1.1397 | 1.00 | 8.1666 | 0.1397 | 58.4471 | 1.0001 | 58.4471 | 5 | 5 | 5 |
| turn tail | 67.0872 | 75.9778 | 1.1325 | 0.80 | 8.8906 | 0.1325 | 68.0087 | 1.0137 | +inf | 5 | 5 | 5 |
| rock the boat | 41.6363 | 46.3793 | 1.1139 | 0.80 | 4.7430 | 0.1139 | 42.7059 | 1.0257 | +inf | 5 | 5 | 5 |
| pull strings | 58.2272 | 64.8137 | 1.1131 | 1.00 | 6.5865 | 0.1131 | 58.2458 | 1.0003 | 58.2458 | 5 | 5 | 5 |
| get the sack | 52.6146 | 58.2279 | 1.1067 | 0.60 | 5.6133 | 0.1067 | 54.4662 | 1.0352 | +inf | 5 | 5 | 5 |
| have a ball | 56.3443 | 62.2656 | 1.1051 | 0.80 | 5.9214 | 0.1051 | 57.2695 | 1.0164 | +inf | 5 | 5 | 5 |
| run the show | 58.3229 | 63.3454 | 1.0861 | 0.60 | 5.0224 | 0.0861 | 60.2042 | 1.0323 | +inf | 5 | 5 | 5 |
| mean business | 52.8547 | 57.3056 | 1.0842 | 1.00 | 4.4509 | 0.0842 | 53.0699 | 1.0041 | 53.0699 | 5 | 5 | 5 |
| lead the field | 56.1382 | 60.7719 | 1.0825 | 1.00 | 4.6337 | 0.0825 | 56.6844 | 1.0097 | 56.6844 | 5 | 5 | 5 |
| make waves | 49.7432 | 53.6185 | 1.0779 | 0.60 | 3.8752 | 0.0779 | 51.7593 | 1.0405 | +inf | 5 | 5 | 5 |
| raise hell | 59.0044 | 63.1516 | 1.0703 | 0.60 | 4.1472 | 0.0703 | 60.8471 | 1.0312 | +inf | 5 | 5 | 5 |

### Non-idioms (sorted by H_u/H, descending)

| phrase | H(p) | H_u | H_u/H | syn_frac | H_s^log | H_s^log/H | H_s^reg | H_s^reg/H | H_s (orig) | n_idiom | n_head | n_non |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| call the police | 59.6542 | 65.5534 | 1.0989 | 0.80 | 5.8992 | 0.0989 | 60.5772 | 1.0155 | +inf | 5 | 5 | 5 |
| build the boat | 46.7000 | 51.1941 | 1.0962 | 0.60 | 4.4941 | 0.0962 | 48.5436 | 1.0395 | +inf | 5 | 5 | 5 |
| break the window | 61.7442 | 65.7247 | 1.0645 | 0.80 | 3.9806 | 0.0645 | 63.0118 | 1.0205 | +inf | 5 | 5 | 5 |
| tie knots | 60.0650 | 63.5749 | 1.0584 | 0.60 | 3.5099 | 0.0584 | 61.9091 | 1.0307 | +inf | 5 | 5 | 5 |
| cut hair | 60.2177 | 63.6838 | 1.0576 | 0.80 | 3.4661 | 0.0576 | 61.4127 | 1.0198 | +inf | 5 | 5 | 5 |
| clear the table | 62.0990 | 65.5198 | 1.0551 | 1.00 | 3.4208 | 0.0551 | 62.2472 | 1.0024 | 62.2472 | 5 | 5 | 5 |
| raise children | 62.5933 | 65.5999 | 1.0480 | 0.80 | 3.0066 | 0.0480 | 63.8063 | 1.0194 | +inf | 5 | 5 | 5 |
| turn dials | 71.8342 | 75.1086 | 1.0456 | 0.80 | 3.2744 | 0.0456 | 72.8498 | 1.0141 | +inf | 5 | 5 | 5 |
| make lunch | 48.8996 | 50.7058 | 1.0369 | 0.40 | 1.8063 | 0.0369 | 51.6674 | 1.0566 | +inf | 5 | 5 | 5 |
| see the show | 63.7362 | 65.3588 | 1.0255 | 0.20 | 1.6226 | 0.0255 | 67.4204 | 1.0578 | +inf | 5 | 5 | 5 |
| eat the apple | 60.9388 | 62.4445 | 1.0247 | 0.40 | 1.5056 | 0.0247 | 63.7119 | 1.0455 | +inf | 5 | 5 | 5 |
| get a present | 53.9577 | 55.2301 | 1.0236 | 0.20 | 1.2724 | 0.0236 | 57.6422 | 1.0683 | +inf | 5 | 5 | 5 |
| throw a ball | 63.1881 | 64.3535 | 1.0184 | 0.40 | 1.1653 | 0.0184 | 65.9946 | 1.0444 | +inf | 5 | 5 | 5 |
| lose keys | 71.7230 | 72.7891 | 1.0149 | 0.80 | 1.0661 | 0.0149 | 73.4888 | 1.0246 | +inf | 5 | 5 | 5 |
| lead the meeting | 64.4478 | 65.1632 | 1.0111 | 0.60 | 0.7155 | 0.0111 | 66.6350 | 1.0339 | +inf | 5 | 5 | 5 |
| strike a drum | 61.5869 | 62.2141 | 1.0102 | 0.20 | 0.6272 | 0.0102 | 65.2799 | 1.0600 | +inf | 5 | 5 | 5 |
| spill the water | 59.1802 | 59.6862 | 1.0085 | 0.20 | 0.5060 | 0.0085 | 62.8810 | 1.0625 | +inf | 5 | 5 | 5 |
| remember details | 51.2260 | 51.2722 | 1.0009 | 0.20 | 0.0462 | 0.0009 | 55.2259 | 1.0781 | +inf | 5 | 5 | 5 |