# H_u / H_s analysis — qwen3-8b-base · full · geo

Idiom vs. parallel literal-VP (non-idiom) datasets. All entropies are uniform Monte Carlo (MC) averages over each phrase's observed contexts; scores are unnormalized geometric-mean (or joint) per-token LM probabilities. See `README.md` / `code/SCORING_MATH.md` for the derivation.

> **How to read these magnitudes** (directions, why H_s is +inf, the new finite synergy metrics): see [`INTERPRETATION.md`](../INTERPRETATION.md). Quick key — ↑`H_u/H(p)`, ↑`syn_frac`, ↑`H_s^log` mean **more** synergy; ↑`H_s^reg` and the original ↑`H_s` mean **less** synergy.

## Configuration

| field | idioms run | non-idioms run |
|---|---|---|
| model | Qwen/Qwen3-8B-Base | Qwen/Qwen3-8B-Base |
| reduction | geometric_mean | geometric_mean |
| medial_only | False | False |
| dtype | bfloat16 | bfloat16 |
| num_idioms | 18 | 18 |
| dataset | /home/prada/PID_evaluation/data/dataset.tsv | /home/prada/PID_evaluation/data/nonidioms_dataset.tsv |

Bound `H_u + H_s ≥ 2H(p) + 2 log 2 ≥ H(p)` holds for 18/18 idioms and 18/18 non-idioms.
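
One way to see where the `2 log 2` term comes from, assuming `H_u` and `H_s` are per-context quantities averaged over the same contexts (a sketch only, not the derivation in `code/SCORING_MATH.md`). Write `m = max(q,r)`. In a synergistic context `p > m > 0`, so `min{p, m} = m`, and the two factors `m` and `p − m` sum to `p`:

$$
m\,(p-m) \le \left(\tfrac{p}{2}\right)^{2}
\;\Longrightarrow\;
-\log m - \log(p-m) \ge -2\log p + 2\log 2 .
$$

In a non-synergistic context the `H_s` term is +inf, so the inequality holds trivially. Averaging over contexts gives `H_u + H_s ≥ 2H(p) + 2 log 2`, and the final `≥ H(p)` follows because `p ≤ 1` makes `H(p) ≥ 0`.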

`H_s = +inf` (≥1 non-synergistic slot) for 9/18 idioms and 18/18 non-idioms. A *finite* H_s means **every** context of that phrase is synergistic (p > max(q,r) everywhere).
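
A minimal sketch of why a single non-synergistic context forces `H_s = +inf`, assuming `H_s` is the plain average of `-log max{0, p - max(q,r)}` over contexts. The function name and the toy probabilities are hypothetical and are not taken from the scoring code.

```python
import math


def h_s_original(p, q, r):
    """Average of -log max(0, p - max(q, r)) over contexts, in nats.

    Returns +inf as soon as one context has p <= max(q, r), because that
    context contributes -log 0 to the average.
    """
    total = 0.0
    for p_i, q_i, r_i in zip(p, q, r):
        gap = p_i - max(q_i, r_i)
        if gap <= 0.0:
            return math.inf
        total += -math.log(gap)
    return total / len(p)


# Toy inputs (hypothetical): both contexts synergistic, so the result is finite.
all_synergistic = h_s_original([0.5, 0.4], [0.2, 0.1], [0.1, 0.3])
# Second context has p < r, so the average is +inf.
one_bad_context = h_s_original([0.5, 0.2], [0.2, 0.1], [0.1, 0.3])
```

Under this reading, a finite value is only possible when `syn_frac` is exactly 1.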

## Per-metric summary (idioms vs non-idioms)

Means and 95% bootstrap CIs (20k resamples, percentile method, phrase-level). Non-finite values are dropped per metric before bootstrapping; the drop count is shown.
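
The procedure described above can be sketched as follows. This is an illustration under stated assumptions, not the report's own code: the function name, the seed, and the nearest-rank percentile rule are hypothetical choices, and the input is the `H_s (orig)` column of the idioms table in this report.

```python
import math
import random


def bootstrap_mean_ci(values, n_resamples=20_000, alpha=0.05, seed=0):
    """Return (mean, (lo, hi), n_dropped) for the finite entries of `values`."""
    # Drop non-finite values first and remember how many were removed.
    finite = [v for v in values if math.isfinite(v)]
    n_dropped = len(values) - len(finite)
    if not finite:
        return None, None, n_dropped

    rng = random.Random(seed)
    n = len(finite)
    # Phrase-level resampling: draw n phrases with replacement, take the mean.
    means = sorted(sum(rng.choices(finite, k=n)) / n for _ in range(n_resamples))
    # Percentile method (nearest-rank; the exact interpolation is an assumption).
    lo = means[math.floor((alpha / 2) * (n_resamples - 1))]
    hi = means[math.ceil((1 - alpha / 2) * (n_resamples - 1))]
    return sum(finite) / n, (lo, hi), n_dropped


inf = math.inf
h_s_idioms = [3.9637, 4.9658, 4.5363, inf, 4.5168, 4.4090, 4.6630, 4.6630, inf,
              5.0919, inf, inf, inf, inf, inf, 6.1522, inf, inf]
mean, ci, n_dropped = bootstrap_mean_ci(h_s_idioms)
```

The mean and drop count are deterministic; the interval endpoints depend on the seed and percentile rule, so they will only approximately match the tables.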

### H(p)

*base entropy, uniform MC average over contexts (nats). ↓ smaller = phrase more concentrated*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 4.2743 | 4.2728 | 3.3468 | 5.1653 | [4.0399, 4.5088] |
| non-idioms | 18 | 0 | 4.8678 | 4.7189 | 4.2048 | 6.0180 | [4.6335, 5.1190] |

**Cross-dataset gap** (idioms − non-idioms): Δ = -0.5935  (95% CI [-0.9365, -0.2544]) → idioms < non-idioms, **significant**.
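
A gap line like the one above could be reproduced along these lines. The sketch assumes the two datasets are resampled independently at the phrase level and that "significant" means the 95% CI excludes 0; neither detail is stated in this report, so both are assumptions, as are the function name and seed. The inputs are the `H(p)` columns of the per-phrase tables.

```python
import math
import random


def bootstrap_gap_ci(a, b, n_resamples=20_000, alpha=0.05, seed=0):
    """Return (delta, (lo, hi), significant) for mean(a) - mean(b)."""
    rng = random.Random(seed)
    gaps = []
    for _ in range(n_resamples):
        # Assumption: each dataset is resampled independently, with replacement.
        res_a = rng.choices(a, k=len(a))
        res_b = rng.choices(b, k=len(b))
        gaps.append(sum(res_a) / len(res_a) - sum(res_b) / len(res_b))
    gaps.sort()
    lo = gaps[math.floor((alpha / 2) * (n_resamples - 1))]
    hi = gaps[math.ceil((1 - alpha / 2) * (n_resamples - 1))]
    delta = sum(a) / len(a) - sum(b) / len(b)
    # Assumption: the gap is called significant when the CI excludes zero.
    return delta, (lo, hi), (lo > 0 or hi < 0)


h_p_idioms = [3.3468, 4.5004, 3.8571, 4.3001, 3.7412, 3.5265, 3.8577, 3.8423,
              4.7377, 4.3678, 3.9072, 4.2456, 4.1702, 4.9520, 5.1653, 4.8379,
              4.7728, 4.8093]
h_p_nonidioms = [4.2048, 4.6068, 4.2984, 4.3842, 4.5365, 4.3066, 5.0214, 5.4262,
                 4.8309, 5.0053, 4.5261, 4.5520, 4.9623, 4.3282, 5.3838, 5.7634,
                 5.4656, 6.0180]
delta, gap_ci, significant = bootstrap_gap_ci(h_p_idioms, h_p_nonidioms)
```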

### H_u

*unique / redundant entropy = -log min{p, max(q,r)} (nats); >= H(p)*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 4.9810 | 4.9882 | 4.1945 | 5.7506 | [4.7687, 5.1975] |
| non-idioms | 18 | 0 | 5.1304 | 5.0552 | 4.4972 | 6.1041 | [4.9240, 5.3499] |

**Cross-dataset gap** (idioms − non-idioms): Δ = -0.1494  (95% CI [-0.4568, 0.1553]) → idioms < non-idioms, not significant.
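
For reference, `H(p)` and `H_u` for one phrase might be computed as below, assuming `p`, `q` and `r` are per-context probabilities in (0, 1] and both entropies are plain averages over contexts. The function name and the toy inputs are hypothetical.

```python
import math


def h_p_and_h_u(p, q, r):
    """Return (H(p), H_u, H_u / H(p)) in nats for one phrase."""
    n = len(p)
    # Base entropy: average surprisal of the full-phrase score p.
    h_p = sum(-math.log(p_i) for p_i in p) / n
    # H_u: same average, but each p is capped at m = max(q, r).
    h_u = sum(
        -math.log(min(p_i, max(q_i, r_i))) for p_i, q_i, r_i in zip(p, q, r)
    ) / n
    # The ratio is undefined if H(p) is 0, i.e. if every p equals 1.
    return h_p, h_u, h_u / h_p


# Toy inputs (hypothetical): first context has p > m, second has p < m.
h_p, h_u, ratio = h_p_and_h_u([0.02, 0.01], [0.01, 0.02], [0.005, 0.004])
```

Because `min{p, m} ≤ p` in every context, `H_u ≥ H(p)` and the ratio is at least 1, which matches the `>= H(p)` note in the metric description.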

### H_u / H(p)

*unique-information ratio (>= 1). ↑ bigger = MORE synergy. THE headline metric*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 1.1702 | 1.1785 | 1.0724 | 1.2533 | [1.1428, 1.1984] |
| non-idioms | 18 | 0 | 1.0561 | 1.0540 | 1.0143 | 1.1167 | [1.0446, 1.0682] |

**Cross-dataset gap** (idioms − non-idioms): Δ = 0.1141  (95% CI [0.0837, 0.1447]) → idioms > non-idioms, **significant**.

### syn_frac

*synergy coverage in [0,1] = fraction of contexts with p > m, where m = max(q,r). ↑ bigger = MORE synergy (most intuitive)*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 0.8833 | 0.9500 | 0.3000 | 1.0000 | [0.7889, 0.9556] |
| non-idioms | 18 | 0 | 0.6222 | 0.7000 | 0.3000 | 0.9000 | [0.5389, 0.7056] |

**Cross-dataset gap** (idioms − non-idioms): Δ = 0.2611  (95% CI [0.1389, 0.3722]) → idioms > non-idioms, **significant**.

### H_s^log

*log-space synergy = mean max{0, log p - log m} (nats). ↑ bigger = MORE synergy; always finite*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 0.7067 | 0.7333 | 0.3484 | 1.1386 | [0.6101, 0.8084] |
| non-idioms | 18 | 0 | 0.2626 | 0.2655 | 0.0861 | 0.4909 | [0.2160, 0.3112] |

**Cross-dataset gap** (idioms − non-idioms): Δ = 0.4441  (95% CI [0.3342, 0.5560]) → idioms > non-idioms, **significant**.

### H_s^log / H(p)

*log-space synergy ratio. ↑ bigger = MORE synergy*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 0.1702 | 0.1785 | 0.0724 | 0.2533 | [0.1428, 0.1984] |
| non-idioms | 18 | 0 | 0.0561 | 0.0540 | 0.0143 | 0.1167 | [0.0446, 0.0682] |

**Cross-dataset gap** (idioms − non-idioms): Δ = 0.1141  (95% CI [0.0837, 0.1447]) → idioms > non-idioms, **significant**.
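
The Δ and CI here coincide with those of `H_u / H(p)`, and every entry in the two ratio tables differs by exactly 1. Assuming both metrics average over the same contexts, this follows directly from the definitions, with `m = max(q,r)`:

$$
-\log\min\{p, m\} \;=\; -\log p + \max\{0,\ \log p - \log m\}
\;\Longrightarrow\;
H_u = H(p) + H_s^{\log}
\;\Longrightarrow\;
\frac{H_u}{H(p)} = 1 + \frac{H_s^{\log}}{H(p)} .
$$

So the two ratios carry the same information, and only one of them needs to be reported as a headline.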

### H_s^log signed

*signed log-space synergy = mean(log p - log m); can be negative (net anti-synergistic)*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 0.6568 | 0.7333 | 0.0004 | 1.1386 | [0.5260, 0.7824] |
| non-idioms | 18 | 0 | 0.1152 | 0.1496 | -0.2374 | 0.4749 | [0.0270, 0.2042] |

**Cross-dataset gap** (idioms − non-idioms): Δ = 0.5416  (95% CI [0.3834, 0.6964]) → idioms > non-idioms, **significant**.
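
The three finite synergy metrics defined so far (`syn_frac`, `H_s^log`, and its signed variant) all derive from the per-context log ratio `log p - log m`. A minimal sketch, assuming plain averages over contexts; the function name, dictionary keys, and toy inputs are hypothetical.

```python
import math


def log_space_synergy(p, q, r):
    """Return syn_frac, clipped H_s^log and signed H_s^log for one phrase."""
    # Per-context log ratio between the phrase score p and m = max(q, r).
    diffs = [
        math.log(p_i) - math.log(max(q_i, r_i)) for p_i, q_i, r_i in zip(p, q, r)
    ]
    n = len(diffs)
    return {
        # Fraction of contexts with p > m.
        "syn_frac": sum(d > 0 for d in diffs) / n,
        # Clipped version: negative contexts contribute 0.
        "h_s_log": sum(max(0.0, d) for d in diffs) / n,
        # Signed version: negative contexts can cancel positive ones.
        "h_s_log_signed": sum(diffs) / n,
    }


# Toy inputs (hypothetical): one context with p = 2m, one with p = m / 2.
metrics = log_space_synergy([0.4, 0.1], [0.2, 0.2], [0.1, 0.05])
```

The signed value can never exceed the clipped one, and the two coincide when `syn_frac` is 1, as in the idiom rows with `syn_frac = 1.00`.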

### H_s^reg

*regularized H_s (eps-floored, finite, continuous). ↑ bigger = LESS synergy*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 5.5187 | 5.1999 | 3.9637 | 8.1564 | [5.0411, 6.0411] |
| non-idioms | 18 | 0 | 7.4809 | 7.3725 | 5.5175 | 9.5220 | [7.0086, 7.9593] |

**Cross-dataset gap** (idioms − non-idioms): Δ = -1.9623  (95% CI [-2.6468, -1.2640]) → idioms < non-idioms, **significant**.
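
This report does not spell out the exact form of the regularization. One plausible reading of "eps-floored", shown purely as an assumption, is to floor the per-context gap at a small `eps` before taking the log. The function name and the `eps` value in the usage lines are hypothetical and are not taken from the scoring code.

```python
import math


def h_s_regularized(p, q, r, eps):
    """Average of -log max(eps, p - max(q, r)) over contexts, in nats.

    Assumed form only: the floor keeps every term finite and the result
    changes continuously as a gap crosses eps.
    """
    total = 0.0
    for p_i, q_i, r_i in zip(p, q, r):
        gap = p_i - max(q_i, r_i)
        total += -math.log(max(eps, gap))
    return total / len(p)


eps = 1e-6  # hypothetical value, for illustration only
# Both contexts synergistic with gaps above eps: the floor never triggers.
reg_all_syn = h_s_regularized([0.5, 0.4], [0.2, 0.1], [0.1, 0.3], eps)
# Second context is non-synergistic: its term is capped at -log(eps).
reg_one_bad = h_s_regularized([0.5, 0.2], [0.2, 0.1], [0.1, 0.3], eps)
```

Under this form the regularized value equals the original `H_s` whenever every gap exceeds `eps`, which is consistent with the idiom rows where `H_s^reg` and `H_s (orig)` agree.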

### H_s^reg / H(p)

*regularized synergy ratio (>= 1). ↑ bigger = LESS synergy*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 18 | 0 | 1.2845 | 1.2319 | 1.1034 | 1.6960 | [1.2235, 1.3554] |
| non-idioms | 18 | 0 | 1.5350 | 1.5573 | 1.3122 | 1.7131 | [1.4845, 1.5839] |

**Cross-dataset gap** (idioms − non-idioms): Δ = -0.2505  (95% CI [-0.3295, -0.1624]) → idioms < non-idioms, **significant**.

### H_s (original)

*synergy entropy = -log max{0, p - max(q,r)} (nats); +inf if ANY slot non-synergistic (mostly +inf)*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 9 | 9 | 4.7735 | 4.6630 | 3.9637 | 6.1522 | [4.4404, 5.1824] |
| non-idioms | 0 | 18 | — | — | — | — | (no finite values) |

**Cross-dataset gap**: insufficient finite values in one dataset.

### H_s / H(p)

*original synergy ratio (mostly +inf; use H_s^log or syn_frac instead)*

| dataset | N finite | non-finite dropped | mean | median | min | max | 95% CI of mean |
|---|---:|---:|---:|---:|---:|---:|---|
| idioms | 9 | 9 | 1.1979 | 1.2073 | 1.1034 | 1.2717 | [1.1674, 1.2272] |
| non-idioms | 0 | 18 | — | — | — | — | (no finite values) |

**Cross-dataset gap**: insufficient finite values in one dataset.

## Per-phrase detail

### Idioms (sorted by H_u/H, descending)

| phrase | H(p) | H_u | H_u/H | syn_frac | H_s^log | H_s^log/H | H_s^reg | H_s^reg/H | H_s (orig) | n_idiom | n_head | n_non |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| rock the boat | 3.3468 | 4.1945 | 1.2533 | 1.00 | 0.8477 | 0.2533 | 3.9637 | 1.1843 | 3.9637 | 10 | 10 | 10 |
| strike a chord | 4.5004 | 5.6390 | 1.2530 | 1.00 | 1.1386 | 0.2530 | 4.9658 | 1.1034 | 4.9658 | 10 | 10 | 10 |
| cut corners | 3.8571 | 4.7944 | 1.2430 | 1.00 | 0.9373 | 0.2430 | 4.5363 | 1.1761 | 4.5363 | 10 | 10 | 10 |
| bite the dust | 4.3001 | 5.3424 | 1.2424 | 0.90 | 1.0423 | 0.2424 | 5.1872 | 1.2063 | +inf | 10 | 10 | 10 |
| break the mold | 3.7412 | 4.5793 | 1.2240 | 1.00 | 0.8380 | 0.2240 | 4.5168 | 1.2073 | 4.5168 | 10 | 10 | 10 |
| spill the beans | 3.5265 | 4.2775 | 1.2129 | 1.00 | 0.7510 | 0.2129 | 4.4090 | 1.2502 | 4.4090 | 10 | 10 | 10 |
| call the shots | 3.8577 | 4.6589 | 1.2077 | 1.00 | 0.8012 | 0.2077 | 4.6630 | 1.2087 | 4.6630 | 10 | 10 | 10 |
| clear the air | 3.8423 | 4.5580 | 1.1863 | 1.00 | 0.7157 | 0.1863 | 4.6630 | 1.2136 | 4.6630 | 10 | 10 | 10 |
| turn tail | 4.7377 | 5.6142 | 1.1850 | 0.90 | 0.8765 | 0.1850 | 5.6708 | 1.1970 | +inf | 10 | 10 | 10 |
| lead the field | 4.3678 | 5.1192 | 1.1720 | 1.00 | 0.7515 | 0.1720 | 5.0919 | 1.1658 | 5.0919 | 10 | 10 | 10 |
| have a ball | 3.9072 | 4.4859 | 1.1481 | 0.90 | 0.5787 | 0.1481 | 5.2127 | 1.3341 | +inf | 10 | 10 | 10 |
| pull strings | 4.2456 | 4.8572 | 1.1440 | 0.90 | 0.6116 | 0.1440 | 5.4246 | 1.2777 | +inf | 10 | 10 | 10 |
| get the sack | 4.1702 | 4.6656 | 1.1188 | 0.80 | 0.4954 | 0.1188 | 6.2359 | 1.4953 | +inf | 10 | 10 | 10 |
| mean business | 4.9520 | 5.5370 | 1.1181 | 0.90 | 0.5850 | 0.1181 | 6.2728 | 1.2667 | +inf | 10 | 10 | 10 |
| lose ground | 5.1653 | 5.7506 | 1.1133 | 0.70 | 0.5853 | 0.1133 | 7.0151 | 1.3581 | +inf | 10 | 10 | 10 |
| raise hell | 4.8379 | 5.2621 | 1.0877 | 1.00 | 0.4242 | 0.0877 | 6.1522 | 1.2717 | 6.1522 | 10 | 10 | 10 |
| run the show | 4.7728 | 5.1649 | 1.0821 | 0.60 | 0.3920 | 0.0821 | 7.1987 | 1.5083 | +inf | 10 | 10 | 10 |
| make waves | 4.8093 | 5.1576 | 1.0724 | 0.30 | 0.3484 | 0.0724 | 8.1564 | 1.6960 | +inf | 10 | 10 | 10 |

### Non-idioms (sorted by H_u/H, descending)

| phrase | H(p) | H_u | H_u/H | syn_frac | H_s^log | H_s^log/H | H_s^reg | H_s^reg/H | H_s (orig) | n_idiom | n_head | n_non |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| clear the table | 4.2048 | 4.6957 | 1.1167 | 0.90 | 0.4909 | 0.1167 | 5.5175 | 1.3122 | +inf | 10 | 10 | 10 |
| eat the apple | 4.6068 | 5.0155 | 1.0887 | 0.70 | 0.4086 | 0.0887 | 6.7631 | 1.4680 | +inf | 10 | 10 | 10 |
| call the police | 4.2984 | 4.6628 | 1.0848 | 0.80 | 0.3645 | 0.0848 | 6.1412 | 1.4287 | +inf | 10 | 10 | 10 |
| cut hair | 4.3842 | 4.7445 | 1.0822 | 0.70 | 0.3603 | 0.0822 | 6.8837 | 1.5701 | +inf | 10 | 10 | 10 |
| break the window | 4.5365 | 4.8590 | 1.0711 | 0.80 | 0.3224 | 0.0711 | 6.6494 | 1.4657 | +inf | 10 | 10 | 10 |
| build the boat | 4.3066 | 4.5942 | 1.0668 | 0.80 | 0.2876 | 0.0668 | 6.4697 | 1.5023 | +inf | 10 | 10 | 10 |
| raise children | 5.0214 | 5.3343 | 1.0623 | 0.70 | 0.3128 | 0.0623 | 7.4367 | 1.4810 | +inf | 10 | 10 | 10 |
| turn dials | 5.4262 | 5.7276 | 1.0555 | 0.80 | 0.3014 | 0.0555 | 7.4661 | 1.3759 | +inf | 10 | 10 | 10 |
| make lunch | 4.8309 | 5.0950 | 1.0547 | 0.30 | 0.2641 | 0.0547 | 8.2756 | 1.7131 | +inf | 10 | 10 | 10 |
| tie knots | 5.0053 | 5.2722 | 1.0533 | 0.80 | 0.2669 | 0.0533 | 7.1150 | 1.4215 | +inf | 10 | 10 | 10 |
| get a present | 4.5261 | 4.7601 | 1.0517 | 0.50 | 0.2339 | 0.0517 | 7.5221 | 1.6619 | +inf | 10 | 10 | 10 |
| throw a ball | 4.5520 | 4.7838 | 1.0509 | 0.70 | 0.2318 | 0.0509 | 7.0719 | 1.5536 | +inf | 10 | 10 | 10 |
| see the show | 4.9623 | 5.1616 | 1.0402 | 0.40 | 0.1993 | 0.0402 | 8.1768 | 1.6478 | +inf | 10 | 10 | 10 |
| spill the water | 4.3282 | 4.4972 | 1.0391 | 0.50 | 0.1690 | 0.0391 | 7.3083 | 1.6885 | +inf | 10 | 10 | 10 |
| remember details | 5.3838 | 5.5515 | 1.0311 | 0.60 | 0.1676 | 0.0311 | 8.4041 | 1.5610 | +inf | 10 | 10 | 10 |
| lose keys | 5.7634 | 5.8993 | 1.0236 | 0.40 | 0.1358 | 0.0236 | 9.0523 | 1.5706 | +inf | 10 | 10 | 10 |
| lead the meeting | 5.4656 | 5.5895 | 1.0227 | 0.40 | 0.1240 | 0.0227 | 8.8812 | 1.6249 | +inf | 10 | 10 | 10 |
| strike a drum | 6.0180 | 6.1041 | 1.0143 | 0.40 | 0.0861 | 0.0143 | 9.5220 | 1.5823 | +inf | 10 | 10 | 10 |

