# H_u / H_s analysis — qwen3-8b · full · geo

Idiom vs. parallel literal-VP (non-idiom) datasets. All entropies are uniform Monte Carlo (MC) averages over each phrase's observed contexts; scores are unnormalized geometric-mean (or joint) per-token LM probabilities. Both runs in this report use the geometric-mean reduction (see Configuration). See `README.md` / `code/SCORING_MATH.md` for the derivation.

> **How to read these magnitudes** (directions, why H_s is +inf, the new finite synergy metrics): see [`INTERPRETATION.md`](../INTERPRETATION.md). Quick key — ↑`H_u/H(p)`, ↑`syn_frac`, ↑`H_s^log` mean **more** synergy; ↑`H_s^reg` and the original ↑`H_s` mean **less** synergy.

## Configuration

| field | idioms run | non-idioms run |
|---|---|---|
| model | Qwen/Qwen3-8B | Qwen/Qwen3-8B |
| reduction | geometric_mean | geometric_mean |
| medial_only | False | False |
| dtype | bfloat16 | bfloat16 |
| num_idioms | 18 | 18 |
| dataset | /home/prada/PID_evaluation/data/dataset.tsv | /home/prada/PID_evaluation/data/nonidioms_dataset.tsv |

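Bound `H_u + H_s ≥ 2H(p) + 2 log 2 ≥ H(p)` holds for 18/18 idioms and 18/18 non-idioms. The bound is satisfied trivially wherever `H_s = +inf`, so only the 4 idioms with finite `H_s` test it non-trivially.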

`H_s = +inf` (≥1 non-synergistic slot) for 14/18 idioms and 18/18 non-idioms. A *finite* H_s means **every** context of that phrase is synergistic (p > m everywhere, where m = max(q,r)).
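The `+inf` behaviour follows directly from averaging per-context values. The sketch below is a minimal illustration, assuming each context supplies a triple of probabilities `(p, q, r)` and that every metric is a plain mean of the per-context formulas quoted in this report. The function name `phrase_metrics`, the data layout, and the toy numbers are hypothetical and not taken from the pipeline.

```python
import math

def phrase_metrics(contexts):
    """Average per-context quantities over one phrase.

    contexts: list of (p, q, r) probability triples (assumed layout).
    """
    n = len(contexts)
    h_p = h_u = h_s = h_s_log = 0.0
    n_syn = 0
    for p, q, r in contexts:
        m = max(q, r)
        h_p += -math.log(p)
        h_u += -math.log(min(p, m))
        h_s_log += max(0.0, math.log(p) - math.log(m))
        if p > m:
            n_syn += 1
            h_s += -math.log(p - m)
        else:
            # -log max{0, p - m} = -log 0 = +inf, which the mean cannot absorb.
            h_s = math.inf
    return {
        "H(p)": h_p / n,
        "H_u": h_u / n,
        "syn_frac": n_syn / n,
        "H_s_log": h_s_log / n,
        "H_s": h_s / n,
    }

# Toy probabilities, invented for illustration only.
all_synergistic = [(0.02, 0.01, 0.005), (0.03, 0.01, 0.02)]
one_bad_context = all_synergistic + [(0.01, 0.02, 0.005)]

print(phrase_metrics(all_synergistic)["H_s"])  # finite
print(phrase_metrics(one_bad_context)["H_s"])  # inf
```

Under these assumptions a single context with p ≤ m is enough to push the phrase-level `H_s` to `+inf`, while `syn_frac` and the log-space synergy stay finite.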

## Per-metric summary (idioms vs non-idioms)

Means and 95% bootstrap CIs (20k resamples, percentile method, phrase-level). Non-finite values are dropped per metric before bootstrapping; the drop count is shown.
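As a rough illustration of the procedure described above, the following sketch computes a phrase-level percentile bootstrap CI of the mean after dropping non-finite values. It uses only the standard library; the function name `bootstrap_mean_ci`, the seed, and the nearest-rank percentile rule are assumptions, so its interval will not match the report's digits exactly. The input is the `H_s (orig)` column of the idiom per-phrase table.

```python
import math
import random

def bootstrap_mean_ci(values, n_resamples=20_000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean, resampling phrases with replacement."""
    finite = [v for v in values if math.isfinite(v)]
    dropped = len(values) - len(finite)
    if not finite:
        return {"n_finite": 0, "dropped": dropped, "mean": None, "ci": None}
    rng = random.Random(seed)
    n = len(finite)
    # One resampled mean per bootstrap draw, sorted for percentile lookup.
    means = sorted(sum(rng.choices(finite, k=n)) / n for _ in range(n_resamples))
    lo = means[round((alpha / 2) * (n_resamples - 1))]
    hi = means[round((1 - alpha / 2) * (n_resamples - 1))]
    return {"n_finite": n, "dropped": dropped,
            "mean": sum(finite) / n, "ci": (lo, hi)}

# H_s (orig) for the 18 idioms: 4 finite values, 14 at +inf.
h_s_idioms = [5.4309, 5.1535, 5.0341, 4.7373] + [math.inf] * 14
result = bootstrap_mean_ci(h_s_idioms)
print(result["n_finite"], result["dropped"], round(result["mean"], 4), result["ci"])
```

With only 4 finite phrases the interval is necessarily narrow and bounded by the observed min and max, which is one reason the finite metrics are preferred for comparison.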

### H(p)

*base entropy, uniform MC average over contexts (nats). ↓ smaller = phrase more concentrated*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 4.7172 | 4.7325 | 3.9702 | 5.4387 | [4.4829, 4.9504] |
| non-idioms | 18 | 0 | 5.3205 | 5.1615 | 4.4693 | 6.4391 | [5.0848, 5.5729] |

**Cross-dataset gap** (idioms − non-idioms): Δ = -0.6032  (95% CI [-0.9443, -0.2674]) → idioms < non-idioms, **significant**.
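The gap line can be approximated from the per-phrase `H(p)` columns listed later in this report. The sketch below assumes the two datasets are resampled independently at the phrase level and that "significant" means the 95% percentile interval excludes zero. Both points are assumptions about the pipeline, and the name `bootstrap_gap_ci` is hypothetical.

```python
import random

def bootstrap_gap_ci(a, b, n_resamples=20_000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for mean(a) - mean(b), resampling each group separately."""
    rng = random.Random(seed)
    na, nb = len(a), len(b)
    diffs = sorted(
        sum(rng.choices(a, k=na)) / na - sum(rng.choices(b, k=nb)) / nb
        for _ in range(n_resamples)
    )
    lo = diffs[round((alpha / 2) * (n_resamples - 1))]
    hi = diffs[round((1 - alpha / 2) * (n_resamples - 1))]
    delta = sum(a) / na - sum(b) / nb
    # Assumed significance rule: the interval does not contain zero.
    return delta, (lo, hi), not (lo <= 0.0 <= hi)

# H(p) per phrase, copied from the per-phrase tables in this report.
hp_idioms = [4.8685, 4.1966, 4.8609, 4.0735, 4.1562, 3.9702, 3.9817, 4.3966, 4.6040,
             4.3275, 5.0036, 5.4387, 5.3857, 5.2846, 4.5725, 5.1823, 5.2614, 5.3455]
hp_non_idioms = [4.4693, 4.8523, 4.7749, 5.5760, 4.8559, 5.4626, 5.0767, 5.1181, 5.3250,
                 5.2049, 4.9764, 5.0545, 5.6095, 5.8005, 4.8684, 5.8989, 6.4391, 6.4056]

delta, ci, significant = bootstrap_gap_ci(hp_idioms, hp_non_idioms)
print(round(delta, 4), ci, significant)
```

The point estimate reproduces the reported Δ up to rounding of the tabulated inputs; the interval endpoints depend on the seed and percentile rule and should only be expected to land near the reported CI.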

### H_u

*unique / redundant entropy = -log min{p, max(q,r)} (nats); >= H(p)*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 5.3699 | 5.3884 | 4.6754 | 6.0759 | [5.1595, 5.5829] |
| non-idioms | 18 | 0 | 5.5635 | 5.4193 | 4.8624 | 6.5246 | [5.3565, 5.7849] |

**Cross-dataset gap** (idioms − non-idioms): Δ = -0.1936  (95% CI [-0.4997, 0.1066]) → idioms < non-idioms, not significant.

### H_u / H(p)

*unique-information ratio (>= 1). ↑ bigger = MORE synergy. THE headline metric*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 1.1420 | 1.1333 | 1.0547 | 1.2251 | [1.1185, 1.1662] |
| non-idioms | 18 | 0 | 1.0474 | 1.0482 | 1.0131 | 1.0880 | [1.0369, 1.0582] |

**Cross-dataset gap** (idioms − non-idioms): Δ = 0.0946  (95% CI [0.0687, 0.1208]) → idioms > non-idioms, **significant**.

### syn_frac

*synergy coverage in [0,1] = fraction of contexts with p > m, where m = max(q,r). ↑ bigger = MORE synergy (most intuitive)*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 0.8389 | 0.9000 | 0.3000 | 1.0000 | [0.7500, 0.9111] |
| non-idioms | 18 | 0 | 0.5667 | 0.5500 | 0.3000 | 0.9000 | [0.4833, 0.6500] |

**Cross-dataset gap** (idioms − non-idioms): Δ = 0.2722  (95% CI [0.1500, 0.3833]) → idioms > non-idioms, **significant**.

### H_s^log

*log-space synergy = mean max{0, log p - log m} (nats). ↑ bigger = MORE synergy; finite always*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 0.6527 | 0.6296 | 0.2925 | 1.0958 | [0.5592, 0.7525] |
| non-idioms | 18 | 0 | 0.2431 | 0.2539 | 0.0841 | 0.4247 | [0.1949, 0.2915] |

**Cross-dataset gap** (idioms − non-idioms): Δ = 0.4096  (95% CI [0.3022, 0.5199]) → idioms > non-idioms, **significant**.
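The log-space synergy is tied to `H_u` and `H(p)` by a simple identity. Assuming both are means of per-context log terms, as the definitions in this report suggest, and writing m = max(q,r), each context satisfies:

$$
-\log \min\{p, m\} - (-\log p) \;=\; \log p - \log \min\{p, m\} \;=\; \max\{0,\ \log p - \log m\}
$$

Averaging over contexts gives

$$
H_s^{\log} \;=\; H_u - H(p).
$$

The per-phrase rows agree with this to rounding, for example `strike a chord`: 5.9643 − 4.8685 = 1.0958.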

### H_s^log / H(p)

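*log-space synergy ratio. ↑ bigger = MORE synergy. In these tables it equals H_u/H(p) − 1 for every phrase, so the gap and CI below match those of H_u / H(p)*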

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 0.1420 | 0.1333 | 0.0547 | 0.2251 | [0.1185, 0.1662] |
| non-idioms | 18 | 0 | 0.0474 | 0.0482 | 0.0131 | 0.0880 | [0.0369, 0.0582] |

**Cross-dataset gap** (idioms − non-idioms): Δ = 0.0946  (95% CI [0.0687, 0.1208]) → idioms > non-idioms, **significant**.

### H_s^log signed

*signed log-space synergy = mean(log p - log m); can be negative (net anti-synergistic)*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 0.5886 | 0.6017 | -0.0318 | 1.0958 | [0.4606, 0.7131] |
| non-idioms | 18 | 0 | 0.0516 | 0.1262 | -0.3760 | 0.3966 | [-0.0544, 0.1534] |

**Cross-dataset gap** (idioms − non-idioms): Δ = 0.5370  (95% CI [0.3723, 0.7024]) → idioms > non-idioms, **significant**.

### H_s^reg

*regularized H_s (eps-floored, finite, continuous). ↑ bigger = LESS synergy*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 6.1708 | 5.8585 | 4.7373 | 8.5294 | [5.6976, 6.6717] |
| non-idioms | 18 | 0 | 8.1369 | 7.8979 | 6.3251 | 10.1912 | [7.6559, 8.6287] |

**Cross-dataset gap** (idioms − non-idioms): Δ = -1.9661  (95% CI [-2.6504, -1.2737]) → idioms < non-idioms, **significant**.
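This report describes `H_s^reg` only as eps-floored, finite, and continuous; it does not give the floor value or the exact formula. One plausible reading, shown purely as an assumption, replaces the hard zero in `max{0, p - m}` with a small floor `eps`. The name `h_s_reg`, the value of `EPS`, and the toy contexts are hypothetical.

```python
import math

EPS = 1e-6  # hypothetical floor; the value used by the pipeline is not stated here

def h_s_reg(contexts, eps=EPS):
    """Mean of -log max{eps, p - max(q, r)} over (p, q, r) triples (assumed form)."""
    total = 0.0
    for p, q, r in contexts:
        gap = p - max(q, r)
        total += -math.log(max(eps, gap))  # floored, so never +inf
    return total / len(contexts)

# Toy probabilities, invented for illustration only.
all_synergistic = [(0.02, 0.01, 0.005), (0.03, 0.01, 0.02)]
one_bad_context = all_synergistic + [(0.01, 0.02, 0.005)]

print(h_s_reg(all_synergistic))  # equals the unregularized mean when every gap exceeds eps
print(h_s_reg(one_bad_context))  # finite, but pulled up by the floored context
```

Under this reading, a phrase whose contexts are all synergistic keeps its original `H_s`, which is consistent with the four idioms whose `H_s^reg` and `H_s (orig)` columns coincide. Each non-synergistic context adds a penalty of at most `-log eps`, which is why larger values indicate less synergy.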

### H_s^reg / H(p)

*regularized synergy ratio (>= 1). ↑ bigger = LESS synergy*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 1.3033 | 1.2577 | 1.1155 | 1.6459 | [1.2485, 1.3656] |
| non-idioms | 18 | 0 | 1.5271 | 1.5508 | 1.3758 | 1.7048 | [1.4839, 1.5699] |

**Cross-dataset gap** (idioms − non-idioms): Δ = -0.2237  (95% CI [-0.2943, -0.1465]) → idioms < non-idioms, **significant**.

### H_s (original)

*synergy entropy = -log max{0, p - max(q,r)} (nats); +inf if ANY slot non-synergistic (mostly +inf)*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 4 | 14 | 5.0890 | 5.0938 | 4.7373 | 5.4309 | [4.8414, 5.3317] |
| non-idioms | 0 | 18 | — | — | — | — | (no finite values) |

**Cross-dataset gap**: not computed; the non-idioms dataset has no finite values.

### H_s / H(p)

*original synergy ratio (mostly +inf; use H_s^log or syn_frac instead)*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 4 | 14 | 1.2033 | 1.2149 | 1.1155 | 1.2680 | [1.1466, 1.2540] |
| non-idioms | 0 | 18 | — | — | — | — | (no finite values) |

**Cross-dataset gap**: not computed; the non-idioms dataset has no finite values.

## Per-phrase detail

### Idioms (sorted by H_u/H, descending)

| phrase | H(p) | H_u | H_u/H | syn_frac | H_s^log | H_s^log/H | H_s^reg | H_s^reg/H | H_s (orig) | n_idiom | n_head | n_non |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| strike a chord | 4.8685 | 5.9643 | 1.2251 | 1.00 | 1.0958 | 0.2251 | 5.4309 | 1.1155 | 5.4309 | 10 | 10 | 10 |
| cut corners | 4.1966 | 5.1093 | 1.2175 | 0.90 | 0.9127 | 0.2175 | 5.2063 | 1.2406 | +inf | 10 | 10 | 10 |
| bite the dust | 4.8609 | 5.8842 | 1.2105 | 0.90 | 1.0232 | 0.2105 | 5.7797 | 1.1890 | +inf | 10 | 10 | 10 |
| break the mold | 4.0735 | 4.8791 | 1.1978 | 0.90 | 0.8055 | 0.1978 | 5.1050 | 1.2532 | +inf | 10 | 10 | 10 |
| call the shots | 4.1562 | 4.8985 | 1.1786 | 1.00 | 0.7423 | 0.1786 | 5.1535 | 1.2400 | 5.1535 | 10 | 10 | 10 |
| spill the beans | 3.9702 | 4.6754 | 1.1776 | 1.00 | 0.7052 | 0.1776 | 5.0341 | 1.2680 | 5.0341 | 10 | 10 | 10 |
| rock the boat | 3.9817 | 4.6827 | 1.1760 | 1.00 | 0.7009 | 0.1760 | 4.7373 | 1.1898 | 4.7373 | 10 | 10 | 10 |
| clear the air | 4.3966 | 5.0845 | 1.1565 | 0.90 | 0.6880 | 0.1565 | 5.5494 | 1.2622 | +inf | 10 | 10 | 10 |
| pull strings | 4.6040 | 5.2261 | 1.1351 | 0.90 | 0.6221 | 0.1351 | 5.9374 | 1.2896 | +inf | 10 | 10 | 10 |
| have a ball | 4.3275 | 4.8967 | 1.1315 | 0.90 | 0.5692 | 0.1315 | 5.5963 | 1.2932 | +inf | 10 | 10 | 10 |
| lead the field | 5.0036 | 5.5948 | 1.1181 | 0.90 | 0.5911 | 0.1181 | 6.2240 | 1.2439 | +inf | 10 | 10 | 10 |
| turn tail | 5.4387 | 6.0759 | 1.1172 | 0.90 | 0.6372 | 0.1172 | 6.6861 | 1.2294 | +inf | 10 | 10 | 10 |
| lose ground | 5.3857 | 5.9977 | 1.1136 | 0.70 | 0.6120 | 0.1136 | 7.1834 | 1.3338 | +inf | 10 | 10 | 10 |
| mean business | 5.2846 | 5.8409 | 1.1053 | 0.90 | 0.5564 | 0.1053 | 6.5949 | 1.2480 | +inf | 10 | 10 | 10 |
| get the sack | 4.5725 | 5.0336 | 1.1008 | 0.70 | 0.4611 | 0.1008 | 6.8961 | 1.5082 | +inf | 10 | 10 | 10 |
| make waves | 5.1823 | 5.5507 | 1.0711 | 0.30 | 0.3683 | 0.0711 | 8.5294 | 1.6459 | +inf | 10 | 10 | 10 |
| run the show | 5.2614 | 5.6262 | 1.0693 | 0.60 | 0.3648 | 0.0693 | 7.8056 | 1.4836 | +inf | 10 | 10 | 10 |
| raise hell | 5.3455 | 5.6380 | 1.0547 | 0.70 | 0.2925 | 0.0547 | 7.6252 | 1.4265 | +inf | 10 | 10 | 10 |

### Non-idioms (sorted by H_u/H, descending)

| phrase | H(p) | H_u | H_u/H | syn_frac | H_s^log | H_s^log/H | H_s^reg | H_s^reg/H | H_s (orig) | n_idiom | n_head | n_non |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| call the police | 4.4693 | 4.8624 | 1.0880 | 0.90 | 0.3932 | 0.0880 | 6.3251 | 1.4152 | +inf | 10 | 10 | 10 |
| clear the table | 4.8523 | 5.2770 | 1.0875 | 0.80 | 0.4247 | 0.0875 | 6.7553 | 1.3922 | +inf | 10 | 10 | 10 |
| build the boat | 4.7749 | 5.1477 | 1.0781 | 0.80 | 0.3728 | 0.0781 | 6.7833 | 1.4206 | +inf | 10 | 10 | 10 |
| turn dials | 5.5760 | 5.9391 | 1.0651 | 0.70 | 0.3631 | 0.0651 | 7.6713 | 1.3758 | +inf | 10 | 10 | 10 |
| break the window | 4.8559 | 5.1541 | 1.0614 | 0.70 | 0.2983 | 0.0614 | 7.2268 | 1.4883 | +inf | 10 | 10 | 10 |
| raise children | 5.4626 | 5.7930 | 1.0605 | 0.70 | 0.3304 | 0.0605 | 7.6469 | 1.3999 | +inf | 10 | 10 | 10 |
| cut hair | 5.0767 | 5.3624 | 1.0563 | 0.40 | 0.2856 | 0.0563 | 8.1413 | 1.6036 | +inf | 10 | 10 | 10 |
| eat the apple | 5.1181 | 5.3857 | 1.0523 | 0.60 | 0.2676 | 0.0523 | 7.8642 | 1.5366 | +inf | 10 | 10 | 10 |
| tie knots | 5.3250 | 5.5846 | 1.0488 | 0.70 | 0.2596 | 0.0488 | 7.8209 | 1.4687 | +inf | 10 | 10 | 10 |
| make lunch | 5.2049 | 5.4530 | 1.0477 | 0.40 | 0.2481 | 0.0477 | 8.4070 | 1.6152 | +inf | 10 | 10 | 10 |
| get a present | 4.9764 | 5.1659 | 1.0381 | 0.30 | 0.1895 | 0.0381 | 8.4840 | 1.7048 | +inf | 10 | 10 | 10 |
| throw a ball | 5.0545 | 5.2463 | 1.0380 | 0.70 | 0.1918 | 0.0380 | 7.8024 | 1.5437 | +inf | 10 | 10 | 10 |
| see the show | 5.6095 | 5.7813 | 1.0306 | 0.50 | 0.1718 | 0.0306 | 8.8008 | 1.5689 | +inf | 10 | 10 | 10 |
| remember details | 5.8005 | 5.9621 | 1.0279 | 0.50 | 0.1616 | 0.0279 | 9.0367 | 1.5579 | +inf | 10 | 10 | 10 |
| spill the water | 4.8684 | 5.0019 | 1.0274 | 0.50 | 0.1335 | 0.0274 | 7.9317 | 1.6292 | +inf | 10 | 10 | 10 |
| lead the meeting | 5.8989 | 6.0126 | 1.0193 | 0.30 | 0.1136 | 0.0193 | 9.5269 | 1.6150 | +inf | 10 | 10 | 10 |
| strike a drum | 6.4391 | 6.5246 | 1.0133 | 0.30 | 0.0855 | 0.0133 | 10.1912 | 1.5827 | +inf | 10 | 10 | 10 |
| lose keys | 6.4056 | 6.4898 | 1.0131 | 0.40 | 0.0841 | 0.0131 | 10.0493 | 1.5688 | +inf | 10 | 10 | 10 |