# Master sweep summary — H_u / H_s across models, reductions, context-modes

Configs discovered: 20 (from `results/sweep/json`). Each row pairs an idiom run with its parallel non-idiom run. Δ is the difference of means (idiom − nonidiom), reported with a 95% CI from an independent bootstrap with 20k resamples; `*` in the `sig` column marks a CI that excludes 0.

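Means drop non-finite values (chiefly `H_s = +inf` on phrases with a non-synergistic slot); the `(fin/N)` count beside each mean shows how many of the N phrases were finite. When a group has no finite values (0/N), its mean, Δ, and CI are replaced by a dash placeholder.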
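
The procedure described above can be sketched as follows. This is a minimal illustration, not the sweep's actual code: the function names `finite_mean` and `bootstrap_gap_ci` are hypothetical, "independent bootstrap" is assumed to mean that each group is resampled with replacement on its own, and the CI is assumed to be a plain percentile interval.

```python
import math
import random


def finite_mean(values):
    """Return (mean over finite values, finite count, total count)."""
    finite = [v for v in values if math.isfinite(v)]
    mean = sum(finite) / len(finite) if finite else None
    return mean, len(finite), len(values)


def bootstrap_gap_ci(idiom, nonidiom, n_resamples=20_000, level=0.95, seed=0):
    """Percentile bootstrap CI for mean(idiom) - mean(nonidiom).

    Assumption: the two groups are resampled independently of each other.
    Returns (delta, lo, hi, significant), or None if either group has no
    finite values.
    """
    a = [v for v in idiom if math.isfinite(v)]
    b = [v for v in nonidiom if math.isfinite(v)]
    if not a or not b:
        return None

    rng = random.Random(seed)
    diffs = []
    for _ in range(n_resamples):
        # Draw each group with replacement, keeping its original size.
        ra = rng.choices(a, k=len(a))
        rb = rng.choices(b, k=len(b))
        diffs.append(sum(ra) / len(ra) - sum(rb) / len(rb))
    diffs.sort()

    # Read the interval bounds off the sorted resampled gaps.
    tail = (1.0 - level) / 2.0
    lo = diffs[int(tail * (n_resamples - 1))]
    hi = diffs[int((1.0 - tail) * (n_resamples - 1))]

    delta = sum(a) / len(a) - sum(b) / len(b)
    significant = lo > 0 or hi < 0  # CI excludes 0
    return delta, lo, hi, significant
```

Under these assumptions, a row's `(fin/N)` entry corresponds to the last two values returned by `finite_mean`, and the `sig` column corresponds to the `significant` flag. The `None` return mirrors the rows where one group has no finite values.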

## H_u/H  (↑ more synergy; headline)

| model | mode | reduction | idiom mean (fin/N) | nonidiom mean (fin/N) | Δ (idiom−nonidiom) | 95% CI | sig |
|---|---|---|---:|---:|---:|---|:---:|
| gemma2-9b | full | geo | 1.197 (18/18) | 1.075 (18/18) | 0.122 | [0.087, 0.157] | * |
| gemma2-9b | full | joint | 1.130 (18/18) | 1.027 (18/18) | 0.103 | [0.079, 0.127] | * |
| gemma2-9b | medial | geo | 1.240 (18/18) | 1.091 (18/18) | 0.148 | [0.106, 0.192] | * |
| gemma2-9b | medial | joint | 1.173 (18/18) | 1.041 (18/18) | 0.132 | [0.095, 0.171] | * |
| gpt2 | full | geo | 1.110 (18/18) | 1.047 (18/18) | 0.063 | [0.037, 0.091] | * |
| gpt2 | full | joint | 1.075 (18/18) | 1.018 (18/18) | 0.057 | [0.034, 0.082] | * |
| gpt2 | medial | geo | 1.143 (18/18) | 1.059 (18/18) | 0.084 | [0.048, 0.122] | * |
| gpt2 | medial | joint | 1.105 (18/18) | 1.028 (18/18) | 0.077 | [0.046, 0.113] | * |
| llama3.1-8b | full | geo | 1.163 (18/18) | 1.055 (18/18) | 0.108 | [0.081, 0.134] | * |
| llama3.1-8b | full | joint | 1.127 (18/18) | 1.027 (18/18) | 0.101 | [0.077, 0.124] | * |
| llama3.1-8b | medial | geo | 1.207 (18/18) | 1.070 (18/18) | 0.137 | [0.104, 0.171] | * |
| llama3.1-8b | medial | joint | 1.164 (18/18) | 1.037 (18/18) | 0.127 | [0.095, 0.160] | * |
| qwen3-8b | full | geo | 1.142 (18/18) | 1.047 (18/18) | 0.095 | [0.069, 0.121] | * |
| qwen3-8b | full | joint | 1.109 (18/18) | 1.022 (18/18) | 0.087 | [0.064, 0.112] | * |
| qwen3-8b | medial | geo | 1.201 (18/18) | 1.072 (18/18) | 0.130 | [0.091, 0.170] | * |
| qwen3-8b | medial | joint | 1.153 (18/18) | 1.039 (18/18) | 0.114 | [0.079, 0.152] | * |
| qwen3-8b-base | full | geo | 1.170 (18/18) | 1.056 (18/18) | 0.114 | [0.084, 0.145] | * |
| qwen3-8b-base | full | joint | 1.134 (18/18) | 1.025 (18/18) | 0.108 | [0.081, 0.136] | * |
| qwen3-8b-base | medial | geo | 1.217 (18/18) | 1.076 (18/18) | 0.141 | [0.100, 0.182] | * |
| qwen3-8b-base | medial | joint | 1.168 (18/18) | 1.037 (18/18) | 0.132 | [0.095, 0.170] | * |

## syn_frac  (↑ more synergy; coverage 0..1)

| model | mode | reduction | idiom mean (fin/N) | nonidiom mean (fin/N) | Δ (idiom−nonidiom) | 95% CI | sig |
|---|---|---|---:|---:|---:|---|:---:|
| gemma2-9b | full | geo | 0.883 (18/18) | 0.617 (18/18) | 0.267 | [0.122, 0.406] | * |
| gemma2-9b | full | joint | 0.794 (18/18) | 0.422 (18/18) | 0.372 | [0.211, 0.522] | * |
| gemma2-9b | medial | geo | 0.933 (18/18) | 0.700 (18/18) | 0.233 | [0.122, 0.344] | * |
| gemma2-9b | medial | joint | 0.856 (18/18) | 0.511 (18/18) | 0.344 | [0.200, 0.489] | * |
| gpt2 | full | geo | 0.806 (18/18) | 0.617 (18/18) | 0.189 | [0.061, 0.311] | * |
| gpt2 | full | joint | 0.750 (18/18) | 0.411 (18/18) | 0.339 | [0.178, 0.494] | * |
| gpt2 | medial | geo | 0.944 (18/18) | 0.767 (18/18) | 0.178 | [0.044, 0.333] | * |
| gpt2 | medial | joint | 0.822 (18/18) | 0.567 (18/18) | 0.256 | [0.089, 0.422] | * |
| llama3.1-8b | full | geo | 0.889 (18/18) | 0.661 (18/18) | 0.228 | [0.100, 0.350] | * |
| llama3.1-8b | full | joint | 0.856 (18/18) | 0.472 (18/18) | 0.383 | [0.217, 0.533] | * |
| llama3.1-8b | medial | geo | 0.978 (18/18) | 0.744 (18/18) | 0.233 | [0.122, 0.333] | * |
| llama3.1-8b | medial | joint | 0.944 (18/18) | 0.578 (18/18) | 0.367 | [0.233, 0.500] | * |
| qwen3-8b | full | geo | 0.839 (18/18) | 0.567 (18/18) | 0.272 | [0.150, 0.383] | * |
| qwen3-8b | full | joint | 0.778 (18/18) | 0.333 (18/18) | 0.444 | [0.317, 0.567] | * |
| qwen3-8b | medial | geo | 0.956 (18/18) | 0.733 (18/18) | 0.222 | [0.100, 0.333] | * |
| qwen3-8b | medial | joint | 0.856 (18/18) | 0.544 (18/18) | 0.311 | [0.167, 0.456] | * |
| qwen3-8b-base | full | geo | 0.883 (18/18) | 0.622 (18/18) | 0.261 | [0.139, 0.372] | * |
| qwen3-8b-base | full | joint | 0.811 (18/18) | 0.439 (18/18) | 0.372 | [0.211, 0.522] | * |
| qwen3-8b-base | medial | geo | 0.956 (18/18) | 0.756 (18/18) | 0.200 | [0.100, 0.311] | * |
| qwen3-8b-base | medial | joint | 0.833 (18/18) | 0.556 (18/18) | 0.278 | [0.156, 0.411] | * |

## H_s^log/H  (↑ more synergy)

| model | mode | reduction | idiom mean (fin/N) | nonidiom mean (fin/N) | Δ (idiom−nonidiom) | 95% CI | sig |
|---|---|---|---:|---:|---:|---|:---:|
| gemma2-9b | full | geo | 0.197 (18/18) | 0.075 (18/18) | 0.122 | [0.087, 0.157] | * |
| gemma2-9b | full | joint | 0.130 (18/18) | 0.027 (18/18) | 0.103 | [0.079, 0.127] | * |
| gemma2-9b | medial | geo | 0.240 (18/18) | 0.091 (18/18) | 0.148 | [0.106, 0.192] | * |
| gemma2-9b | medial | joint | 0.173 (18/18) | 0.041 (18/18) | 0.132 | [0.095, 0.171] | * |
| gpt2 | full | geo | 0.110 (18/18) | 0.047 (18/18) | 0.063 | [0.037, 0.091] | * |
| gpt2 | full | joint | 0.075 (18/18) | 0.018 (18/18) | 0.057 | [0.034, 0.082] | * |
| gpt2 | medial | geo | 0.143 (18/18) | 0.059 (18/18) | 0.084 | [0.048, 0.122] | * |
| gpt2 | medial | joint | 0.105 (18/18) | 0.028 (18/18) | 0.077 | [0.046, 0.113] | * |
| llama3.1-8b | full | geo | 0.163 (18/18) | 0.055 (18/18) | 0.108 | [0.081, 0.134] | * |
| llama3.1-8b | full | joint | 0.127 (18/18) | 0.027 (18/18) | 0.101 | [0.077, 0.124] | * |
| llama3.1-8b | medial | geo | 0.207 (18/18) | 0.070 (18/18) | 0.137 | [0.104, 0.171] | * |
| llama3.1-8b | medial | joint | 0.164 (18/18) | 0.037 (18/18) | 0.127 | [0.095, 0.160] | * |
| qwen3-8b | full | geo | 0.142 (18/18) | 0.047 (18/18) | 0.095 | [0.069, 0.121] | * |
| qwen3-8b | full | joint | 0.109 (18/18) | 0.022 (18/18) | 0.087 | [0.064, 0.112] | * |
| qwen3-8b | medial | geo | 0.201 (18/18) | 0.072 (18/18) | 0.130 | [0.091, 0.170] | * |
| qwen3-8b | medial | joint | 0.153 (18/18) | 0.039 (18/18) | 0.114 | [0.079, 0.152] | * |
| qwen3-8b-base | full | geo | 0.170 (18/18) | 0.056 (18/18) | 0.114 | [0.084, 0.145] | * |
| qwen3-8b-base | full | joint | 0.134 (18/18) | 0.025 (18/18) | 0.108 | [0.081, 0.136] | * |
| qwen3-8b-base | medial | geo | 0.217 (18/18) | 0.076 (18/18) | 0.141 | [0.100, 0.182] | * |
| qwen3-8b-base | medial | joint | 0.168 (18/18) | 0.037 (18/18) | 0.132 | [0.095, 0.170] | * |

## H_s^log  (↑ more synergy, nats)

| model | mode | reduction | idiom mean (fin/N) | nonidiom mean (fin/N) | Δ (idiom−nonidiom) | 95% CI | sig |
|---|---|---|---:|---:|---:|---|:---:|
| gemma2-9b | full | geo | 0.892 (18/18) | 0.389 (18/18) | 0.503 | [0.360, 0.650] | * |
| gemma2-9b | full | joint | 7.526 (18/18) | 1.794 (18/18) | 5.732 | [4.414, 7.031] | * |
| gemma2-9b | medial | geo | 1.016 (18/18) | 0.446 (18/18) | 0.570 | [0.404, 0.737] | * |
| gemma2-9b | medial | joint | 9.732 (18/18) | 2.653 (18/18) | 7.079 | [5.042, 9.162] | * |
| gpt2 | full | geo | 0.507 (18/18) | 0.231 (18/18) | 0.276 | [0.168, 0.398] | * |
| gpt2 | full | joint | 4.097 (18/18) | 1.074 (18/18) | 3.023 | [1.895, 4.223] | * |
| gpt2 | medial | geo | 0.597 (18/18) | 0.268 (18/18) | 0.330 | [0.200, 0.468] | * |
| gpt2 | medial | joint | 5.423 (18/18) | 1.577 (18/18) | 3.847 | [2.358, 5.395] | * |
| llama3.1-8b | full | geo | 0.686 (18/18) | 0.259 (18/18) | 0.427 | [0.328, 0.530] | * |
| llama3.1-8b | full | joint | 6.876 (18/18) | 1.628 (18/18) | 5.248 | [4.016, 6.479] | * |
| llama3.1-8b | medial | geo | 0.818 (18/18) | 0.313 (18/18) | 0.505 | [0.390, 0.623] | * |
| llama3.1-8b | medial | joint | 8.584 (18/18) | 2.175 (18/18) | 6.409 | [4.733, 8.116] | * |
| qwen3-8b | full | geo | 0.653 (18/18) | 0.243 (18/18) | 0.410 | [0.302, 0.520] | * |
| qwen3-8b | full | joint | 5.928 (18/18) | 1.361 (18/18) | 4.566 | [3.373, 5.789] | * |
| qwen3-8b | medial | geo | 0.867 (18/18) | 0.348 (18/18) | 0.519 | [0.376, 0.662] | * |
| qwen3-8b | medial | joint | 8.056 (18/18) | 2.299 (18/18) | 5.757 | [4.013, 7.577] | * |
| qwen3-8b-base | full | geo | 0.707 (18/18) | 0.263 (18/18) | 0.444 | [0.334, 0.556] | * |
| qwen3-8b-base | full | joint | 6.563 (18/18) | 1.425 (18/18) | 5.137 | [3.854, 6.416] | * |
| qwen3-8b-base | medial | geo | 0.853 (18/18) | 0.341 (18/18) | 0.512 | [0.376, 0.651] | * |
| qwen3-8b-base | medial | joint | 8.100 (18/18) | 2.004 (18/18) | 6.096 | [4.425, 7.808] | * |

## H_s^reg/H  (↑ less synergy; finite)

| model | mode | reduction | idiom mean (fin/N) | nonidiom mean (fin/N) | Δ (idiom−nonidiom) | 95% CI | sig |
|---|---|---|---:|---:|---:|---|:---:|
| gemma2-9b | full | geo | 1.234 (18/18) | 1.433 (18/18) | -0.199 | [-0.290, -0.101] | * |
| gemma2-9b | full | joint | 1.017 (18/18) | 1.039 (18/18) | -0.023 | [-0.033, -0.012] | * |
| gemma2-9b | medial | geo | 1.187 (18/18) | 1.416 (18/18) | -0.228 | [-0.312, -0.137] | * |
| gemma2-9b | medial | joint | 1.012 (18/18) | 1.036 (18/18) | -0.024 | [-0.034, -0.013] | * |
| gpt2 | full | geo | 1.365 (18/18) | 1.536 (18/18) | -0.171 | [-0.250, -0.089] | * |
| gpt2 | full | joint | 1.022 (18/18) | 1.048 (18/18) | -0.026 | [-0.038, -0.013] | * |
| gpt2 | medial | geo | 1.264 (18/18) | 1.468 (18/18) | -0.204 | [-0.298, -0.107] | * |
| gpt2 | medial | joint | 1.017 (18/18) | 1.039 (18/18) | -0.022 | [-0.036, -0.008] | * |
| llama3.1-8b | full | geo | 1.280 (18/18) | 1.518 (18/18) | -0.239 | [-0.319, -0.150] | * |
| llama3.1-8b | full | joint | 1.013 (18/18) | 1.041 (18/18) | -0.028 | [-0.039, -0.015] | * |
| llama3.1-8b | medial | geo | 1.190 (18/18) | 1.472 (18/18) | -0.282 | [-0.360, -0.196] | * |
| llama3.1-8b | medial | joint | 1.006 (18/18) | 1.035 (18/18) | -0.029 | [-0.040, -0.018] | * |
| qwen3-8b | full | geo | 1.303 (18/18) | 1.527 (18/18) | -0.224 | [-0.294, -0.147] | * |
| qwen3-8b | full | joint | 1.019 (18/18) | 1.050 (18/18) | -0.031 | [-0.040, -0.021] | * |
| qwen3-8b | medial | geo | 1.194 (18/18) | 1.442 (18/18) | -0.248 | [-0.344, -0.151] | * |
| qwen3-8b | medial | joint | 1.013 (18/18) | 1.039 (18/18) | -0.025 | [-0.037, -0.013] | * |
| qwen3-8b-base | full | geo | 1.284 (18/18) | 1.535 (18/18) | -0.250 | [-0.329, -0.162] | * |
| qwen3-8b-base | full | joint | 1.018 (18/18) | 1.047 (18/18) | -0.030 | [-0.042, -0.016] | * |
| qwen3-8b-base | medial | geo | 1.216 (18/18) | 1.469 (18/18) | -0.253 | [-0.333, -0.170] | * |
| qwen3-8b-base | medial | joint | 1.016 (18/18) | 1.041 (18/18) | -0.025 | [-0.037, -0.013] | * |

## H_s^reg  (↑ less synergy, nats)

| model | mode | reduction | idiom mean (fin/N) | nonidiom mean (fin/N) | Δ (idiom−nonidiom) | 95% CI | sig |
|---|---|---|---:|---:|---:|---|:---:|
| gemma2-9b | full | geo | 5.754 (18/18) | 7.903 (18/18) | -2.149 | [-3.012, -1.264] | * |
| gemma2-9b | full | joint | 60.119 (18/18) | 72.350 (18/18) | -12.231 | [-16.562, -7.847] | * |
| gemma2-9b | medial | geo | 5.199 (18/18) | 7.318 (18/18) | -2.119 | [-2.936, -1.310] | * |
| gemma2-9b | medial | joint | 58.195 (18/18) | 69.151 (18/18) | -10.956 | [-16.106, -5.805] | * |
| gpt2 | full | geo | 6.511 (18/18) | 7.884 (18/18) | -1.373 | [-2.109, -0.666] | * |
| gpt2 | full | joint | 57.583 (18/18) | 63.359 (18/18) | -5.776 | [-9.450, -2.116] | * |
| gpt2 | medial | geo | 5.577 (18/18) | 6.979 (18/18) | -1.401 | [-2.217, -0.628] | * |
| gpt2 | medial | joint | 54.688 (18/18) | 59.067 (18/18) | -4.378 | [-9.102, 0.168] |  |
| llama3.1-8b | full | geo | 5.510 (18/18) | 7.452 (18/18) | -1.941 | [-2.614, -1.261] | * |
| llama3.1-8b | full | joint | 55.396 (18/18) | 65.059 (18/18) | -9.663 | [-12.739, -6.633] | * |
| llama3.1-8b | medial | geo | 4.818 (18/18) | 6.855 (18/18) | -2.037 | [-2.665, -1.403] | * |
| llama3.1-8b | medial | joint | 53.152 (18/18) | 62.212 (18/18) | -9.060 | [-13.007, -5.224] | * |
| qwen3-8b | full | geo | 6.171 (18/18) | 8.137 (18/18) | -1.966 | [-2.650, -1.274] | * |
| qwen3-8b | full | joint | 56.448 (18/18) | 65.905 (18/18) | -9.457 | [-12.559, -6.406] | * |
| qwen3-8b | medial | geo | 5.333 (18/18) | 7.263 (18/18) | -1.930 | [-2.645, -1.207] | * |
| qwen3-8b | medial | joint | 54.672 (18/18) | 62.461 (18/18) | -7.789 | [-11.858, -3.655] | * |
| qwen3-8b-base | full | geo | 5.519 (18/18) | 7.481 (18/18) | -1.962 | [-2.647, -1.264] | * |
| qwen3-8b-base | full | joint | 51.096 (18/18) | 60.236 (18/18) | -9.140 | [-12.221, -6.128] | * |
| qwen3-8b-base | medial | geo | 4.971 (18/18) | 6.767 (18/18) | -1.796 | [-2.479, -1.123] | * |
| qwen3-8b-base | medial | joint | 50.115 (18/18) | 57.523 (18/18) | -7.408 | [-11.787, -3.121] | * |

## H_s/H  (original; mostly +inf)

| model | mode | reduction | idiom mean (fin/N) | nonidiom mean (fin/N) | Δ (idiom−nonidiom) | 95% CI | sig |
|---|---|---|---:|---:|---:|---|:---:|
| gemma2-9b | full | geo | 1.148 (9/18) | 1.214 (1/18) | -0.066 | [-0.092, -0.040] | * |
| gemma2-9b | full | joint | 1.000 (3/18) | 1.001 (1/18) | -0.001 | [-0.001, -0.001] | * |
| gemma2-9b | medial | geo | 1.124 (13/18) | 1.302 (3/18) | -0.178 | [-0.414, 0.006] |  |
| gemma2-9b | medial | joint | 1.000 (10/18) | 1.000 (1/18) | 0.000 | [-0.000, 0.001] |  |
| gpt2 | full | geo | 1.194 (5/18) | 1.382 (1/18) | -0.188 | [-0.239, -0.152] | * |
| gpt2 | full | joint | 1.000 (4/18) | 1.004 (2/18) | -0.004 | [-0.005, -0.002] | * |
| gpt2 | medial | geo | 1.211 (15/18) | 1.352 (8/18) | -0.141 | [-0.213, -0.069] | * |
| gpt2 | medial | joint | 1.001 (10/18) | 1.004 (2/18) | -0.003 | [-0.008, 0.001] |  |
| llama3.1-8b | full | geo | 1.196 (8/18) | 1.321 (1/18) | -0.124 | [-0.159, -0.092] | * |
| llama3.1-8b | full | joint | 1.001 (8/18) | 1.003 (2/18) | -0.002 | [-0.005, 0.000] |  |
| llama3.1-8b | medial | geo | 1.170 (17/18) | 1.316 (5/18) | -0.146 | [-0.283, -0.039] | * |
| llama3.1-8b | medial | joint | 1.001 (15/18) | 1.001 (2/18) | 0.000 | [-0.000, 0.001] |  |
| qwen3-8b | full | geo | 1.203 (4/18) | — (0/18) | — | — |  |
| qwen3-8b | full | joint | 1.001 (2/18) | — (0/18) | — | — |  |
| qwen3-8b | medial | geo | 1.153 (15/18) | 1.256 (6/18) | -0.103 | [-0.151, -0.054] | * |
| qwen3-8b | medial | joint | 1.002 (9/18) | 1.002 (1/18) | -0.001 | [-0.002, 0.001] |  |
| qwen3-8b-base | full | geo | 1.198 (9/18) | — (0/18) | — | — |  |
| qwen3-8b-base | full | joint | 1.001 (5/18) | — (0/18) | — | — |  |
| qwen3-8b-base | medial | geo | 1.177 (15/18) | 1.354 (4/18) | -0.177 | [-0.329, -0.062] | * |
| qwen3-8b-base | medial | joint | 1.001 (8/18) | — (0/18) | — | — |  |

## H_u  (nats)

| model | mode | reduction | idiom mean (fin/N) | nonidiom mean (fin/N) | Δ (idiom−nonidiom) | 95% CI | sig |
|---|---|---|---:|---:|---:|---|:---:|
| gemma2-9b | full | geo | 5.540 (18/18) | 5.896 (18/18) | -0.357 | [-0.832, 0.110] |  |
| gemma2-9b | full | joint | 66.644 (18/18) | 71.386 (18/18) | -4.741 | [-8.562, -0.865] | * |
| gemma2-9b | medial | geo | 5.404 (18/18) | 5.627 (18/18) | -0.222 | [-0.791, 0.346] |  |
| gemma2-9b | medial | joint | 67.242 (18/18) | 69.420 (18/18) | -2.179 | [-7.442, 2.991] |  |
| gpt2 | full | geo | 5.255 (18/18) | 5.355 (18/18) | -0.100 | [-0.412, 0.204] |  |
| gpt2 | full | joint | 60.424 (18/18) | 61.565 (18/18) | -1.141 | [-4.379, 2.284] |  |
| gpt2 | medial | geo | 5.003 (18/18) | 5.001 (18/18) | 0.002 | [-0.352, 0.355] |  |
| gpt2 | medial | joint | 59.207 (18/18) | 58.475 (18/18) | 0.732 | [-4.048, 5.556] |  |
| llama3.1-8b | full | geo | 4.979 (18/18) | 5.153 (18/18) | -0.174 | [-0.458, 0.110] |  |
| llama3.1-8b | full | joint | 61.542 (18/18) | 64.133 (18/18) | -2.591 | [-5.304, 0.151] |  |
| llama3.1-8b | medial | geo | 4.865 (18/18) | 4.964 (18/18) | -0.099 | [-0.438, 0.241] |  |
| llama3.1-8b | medial | joint | 61.434 (18/18) | 62.319 (18/18) | -0.885 | [-4.984, 3.254] |  |
| qwen3-8b | full | geo | 5.370 (18/18) | 5.564 (18/18) | -0.194 | [-0.500, 0.107] |  |
| qwen3-8b | full | joint | 61.273 (18/18) | 64.110 (18/18) | -2.837 | [-5.511, -0.111] | * |
| qwen3-8b | medial | geo | 5.329 (18/18) | 5.379 (18/18) | -0.050 | [-0.412, 0.312] |  |
| qwen3-8b | medial | joint | 61.997 (18/18) | 62.510 (18/18) | -0.513 | [-4.777, 3.839] |  |
| qwen3-8b-base | full | geo | 4.981 (18/18) | 5.130 (18/18) | -0.149 | [-0.457, 0.155] |  |
| qwen3-8b-base | full | joint | 56.744 (18/18) | 58.946 (18/18) | -2.202 | [-4.842, 0.431] |  |
| qwen3-8b-base | medial | geo | 4.935 (18/18) | 4.948 (18/18) | -0.013 | [-0.392, 0.367] |  |
| qwen3-8b-base | medial | joint | 57.427 (18/18) | 57.338 (18/18) | 0.089 | [-4.414, 4.643] |  |

## H(p)  (↓ smaller = more concentrated, nats)

| model | mode | reduction | idiom mean (fin/N) | nonidiom mean (fin/N) | Δ (idiom−nonidiom) | 95% CI | sig |
|---|---|---|---:|---:|---:|---|:---:|
| gemma2-9b | full | geo | 4.648 (18/18) | 5.507 (18/18) | -0.860 | [-1.353, -0.373] | * |
| gemma2-9b | full | joint | 59.119 (18/18) | 69.592 (18/18) | -10.473 | [-14.544, -6.382] | * |
| gemma2-9b | medial | geo | 4.388 (18/18) | 5.180 (18/18) | -0.792 | [-1.358, -0.227] | * |
| gemma2-9b | medial | joint | 57.510 (18/18) | 66.768 (18/18) | -9.258 | [-14.355, -4.175] | * |
| gpt2 | full | geo | 4.748 (18/18) | 5.124 (18/18) | -0.376 | [-0.723, -0.043] | * |
| gpt2 | full | joint | 56.326 (18/18) | 60.490 (18/18) | -4.164 | [-7.626, -0.648] | * |
| gpt2 | medial | geo | 4.405 (18/18) | 4.733 (18/18) | -0.327 | [-0.728, 0.068] |  |
| gpt2 | medial | joint | 53.784 (18/18) | 56.898 (18/18) | -3.114 | [-7.793, 1.533] |  |
| llama3.1-8b | full | geo | 4.293 (18/18) | 4.894 (18/18) | -0.601 | [-0.916, -0.294] | * |
| llama3.1-8b | full | joint | 54.666 (18/18) | 62.505 (18/18) | -7.839 | [-10.650, -5.073] | * |
| llama3.1-8b | medial | geo | 4.047 (18/18) | 4.651 (18/18) | -0.604 | [-0.979, -0.235] | * |
| llama3.1-8b | medial | joint | 52.850 (18/18) | 60.144 (18/18) | -7.294 | [-11.182, -3.499] | * |
| qwen3-8b | full | geo | 4.717 (18/18) | 5.320 (18/18) | -0.603 | [-0.944, -0.267] | * |
| qwen3-8b | full | joint | 55.345 (18/18) | 62.749 (18/18) | -7.404 | [-10.265, -4.538] | * |
| qwen3-8b | medial | geo | 4.462 (18/18) | 5.031 (18/18) | -0.569 | [-0.967, -0.165] | * |
| qwen3-8b | medial | joint | 53.941 (18/18) | 60.211 (18/18) | -6.270 | [-10.398, -2.075] | * |
| qwen3-8b-base | full | geo | 4.274 (18/18) | 4.868 (18/18) | -0.593 | [-0.936, -0.254] | * |
| qwen3-8b-base | full | joint | 50.181 (18/18) | 57.521 (18/18) | -7.340 | [-10.174, -4.578] | * |
| qwen3-8b-base | medial | geo | 4.083 (18/18) | 4.608 (18/18) | -0.525 | [-0.932, -0.113] | * |
| qwen3-8b-base | medial | joint | 49.327 (18/18) | 55.334 (18/18) | -6.007 | [-10.345, -1.712] | * |

