Runs

Canonical runs are listed below by run ID.

All other valid runs are preserved for rerun and robustness analysis, with exclusions logged.

Claim Evidence

Before listing run rows, the run index links each canonical-run claim to the pages that document the release artifacts behind it.

Claim | Evidence
Canonical runs are one selected full-suite run per model slug. | Run table, Canonical responses
Other valid runs and exclusions are preserved for rerun and robustness analysis. | Duplicate resolution, Exclusions
Completion, parse validity, and evidence levels trace back to release audit artifacts. | Truth gate, Validation manifest
Run Model Date Completion Parse Evidence
jn745fmm7et4q7nq87x5r7yh1586ajta Qwen3.5 397B A17B 2026-05-08T04:55:17.687Z 100% 100% Level 2
jn74kq8ve2jy1ms5czap7sqd4d86apss Qwen3.5 Plus 20260420 2026-05-08T03:04:51.116Z 100% 100% Level 2
jn7a14xnb15xckyfvyy41q6k4s86bhv1 GLM 4.7 2026-05-08T02:28:40.685Z 100% 100% Level 2
jn76nae0e9j4pqakz7zwtj1yn186abyv DeepSeek V3.2 2026-05-08T01:12:03.192Z 100% 100% Level 2
jn793dym6gfm0tssrp6mgh8es986b8nq Devstral 2512 2026-05-08T00:53:11.370Z 100% 100% Level 2
jn74fwhmj5ehh9xmb5jy2rrxqx868gse GPT OSS 120B 2026-05-08T00:17:56.885Z 100% 100% Level 2
jn70ff7ys17t6z347339a788kh868hka GPT-4.1 Nano 2026-05-08T00:08:43.626Z 100% 100% Level 2
jn7ae2rzspfcdav901hm50bf71868yjf GPT-4.1 Mini 2026-05-08T00:07:11.197Z 100% 100% Level 2
jn7f2kxcg3mg932p5txnza06tx8697w7 MiniMax M2.1 2026-05-07T23:50:33.145Z 100% 100% Level 2
jn77hqg7j6vmamae2r3hwnv1t1869rww GLM 5 2026-05-07T23:47:38.034Z 100% 100% Level 2
jn705tsdrjcd0np9gz91ct01ks868wpr MiniMax M2 2026-05-07T23:42:48.435Z 100% 100% Level 2
jn7deergpw9v49fk6rj2s0xwb1868tky Llama 4 Scout 2026-05-07T23:01:21.083Z 100% 100% Level 2
jn76fvf4gfy08hdmsnfxmxdrbx869qc3 Trinity Large Preview 2026-05-07T23:00:39.527Z 100% 100% Level 2
jn7fgjneas7stn448cbt35fbcs8690g9 Kimi K2.5 2026-05-07T22:49:45.952Z 100% 100% Level 2
jn7bkw0f7z6fa6z1anr0t4xf2x869xp5 Trinity Large Thinking 2026-05-07T22:45:28.713Z 100% 100% Level 2
jn72y2njy87gzbvvymbanmm0b18686rt Phi 4 2026-05-07T21:58:10.041Z 100% 100% Level 2
jn75x7fnknr6d2xwgq8px34pg5869a6m Solar Pro 3 2026-05-07T21:56:21.617Z 100% 100% Level 2
jn7bh1gqd6p23gdq346rc3jd1n869bqs Llama 3.3 70b Instruct 2026-05-07T21:39:01.541Z 100% 100% Level 2
jn7d4w89z1jma790phnzz9d8qh869t8a Kimi K2.6 2026-05-07T21:19:08.439Z 100% 100% Level 2
jn797b19f23bqm4ey1n6tr2z0h86980h MiniMax M2.7 2026-05-07T18:01:07.122Z 100% 100% Level 2
jn7aybgpr67x8zswpmegfqytyx869wkp DeepSeek V4 Pro 2026-05-07T18:00:26.704Z 100% 100% Level 2
jn77es7pyamhprdbm0bb3dntz1869ydp GLM 5.1 2026-05-07T17:59:44.592Z 100% 100% Level 2
jn7ejrx66rgxbgpy086rg1m665869aam Mercury 2 2026-05-07T17:04:41.198Z 100% 100% Level 2
jn74gs2nnjg5q8n7ysvsfmdhzh8662ad Gemma 3 4B 2026-05-06T05:27:21.023Z 100% 100% Level 2
jn794e1cgp5ecmz53cx08v2vhx866t4w Gemma 3 12B 2026-05-06T05:18:38.292Z 100% 100% Level 2
jn79cccx49g16xgxtftyamstsx866ay9 Gemma 3 27B 2026-05-06T04:53:22.363Z 100% 100% Level 2
jn7ftf922rmzy7k0ad1m8e18h5866wed Grok 3 Mini 2026-05-06T04:31:52.429Z 100% 100% Level 2
jn7322gpqzvnrj0p808perdjrn867ssk Grok 4.1 Fast 2026-05-06T04:30:15.855Z 100% 100% Level 2
jn71419q7s8pmwrg8y9095xx9n867qp2 Gemini 2.0 Flash Lite 2026-05-06T04:25:12.254Z 100% 100% Level 2
jn77f88qts7had7ywj89ncd1yd86715s Gemini 2.0 Flash 2026-05-06T04:18:38.901Z 100% 100% Level 2
jn75acm3ttqh3n44gzgafkfqm58660ee Grok Code Fast 1 2026-05-06T04:18:33.205Z 100% 100% Level 2
jn7367rpjc0ar1m1mcpkwq1ahs867sk5 Gemini 2.5 Flash 2026-05-06T04:11:22.568Z 100% 100% Level 2
jn7a7eaaja7pmzfc76pq7syqc18679yc Gemini 2.5 Flash Lite 2026-05-06T04:01:14.926Z 100% 100% Level 2
jn7ez731knc67nfs7gfshenwhd86777p Gemini 3.1 Flash Lite Preview 2026-05-06T03:55:07.107Z 100% 100% Level 2
jn7bpp64vsa9s9n3fj3g0mkdb18677j3 Grok 4.20 2026-05-06T03:47:50.112Z 100% 100% Level 2
jn79vwtvy4ew6phzgp37bkxncx866ngb Nemotron 3 Super 2026-05-06T03:30:12.726Z 100% 100% Level 2
jn790737dwtwgx14e1s31j6ycx867k67 LFM2 24B A2B 2026-05-06T03:06:50.156Z 100% 100% Level 2
jn715rwtcrnae9trwpm6kwq74d867kba Nemotron Nano 9B V2 2026-05-06T02:53:09.598Z 100% 100% Level 2
jn73p8tfdvn74nyaytf37zjve9867g4t Nemotron 3 Nano 30B A3B 2026-05-06T02:39:23.536Z 100% 100% Level 2
jn77fygwkbh1tcwk4sz25kmey58675ct MiniMax M2.5 2026-05-06T02:06:06.312Z 100% 100% Level 2
jn70sha61z7rvxw9m1ebac4w298669rz OLMo 3.1 32B Instruct 2026-05-06T01:13:14.334Z 100% 100% Level 2
jn77znvpt5wtay1jkv1jp7y3an867fk7 Mistral Large 3 2512 2026-05-06T01:07:56.988Z 100% 100% Level 2
jn7egfgd1waqa15wwzyatnjk698673e7 Mistral Medium 3.1 2026-05-06T01:04:48.108Z 100% 100% Level 2
jn7cneszh8h4h169wp9m6ftj818669wz Ministral 3 3B 2512 2026-05-06T00:57:52.578Z 100% 100% Level 2
jn72eek26kmhcfna69zsg8m0qs8667a2 Ministral 3 8B 2512 2026-05-06T00:55:32.313Z 100% 100% Level 2
jn78b7c7pyv9bwxfz63p58xrjs867dg5 Ministral 3 14B 2512 2026-05-06T00:51:50.541Z 100% 100% Level 2
jn7dn0ckwrdgp6nksteazb2rps866c2a Mistral Saba 2026-05-06T00:48:28.640Z 100% 100% Level 2
jn78dd95913j8fhzm2wpf1wxa1866pjq Mistral Small 4 2026-05-06T00:44:45.481Z 100% 100% Level 2
jn7ca900bsv6ychdfzf2r879js864j6a Reka Edge 2026-05-06T00:03:00.665Z 100% 100% Level 2
jn77m1pvwyaed4n2v6nb6btn1s864kct DeepSeek V4 Flash 2026-05-05T23:20:57.419Z 100% 100% Level 2
jn7ayysp9g6ett652tdhefmpj586550k Trinity Mini 2026-05-05T21:30:38.001Z 100% 100% Level 2
jn7fdzd95ngzbwn6j42yfs5kzx864qrk Gemma 4 31B 2026-05-05T20:40:09.933Z 100% 100% Level 2
jn770jvh5bx4s74v63yhxmnxnh865vgq Gemma 4 26B A4B 2026-05-05T20:25:15.509Z 100% 100% Level 2
jn7839n5vcfsf5zsyqg0098rwd864xxv Claude Opus 4.5 2026-05-05T17:46:36.587Z 100% 100% Level 2
jn7fbr2nfw808e81z8aszvp391864cr9 GPT-5.1 2026-05-05T17:32:08.830Z 100% 100% Level 2
jn72r49hq5g308wv4xv0y6rf9s864bjp Qwen3.6 35B A3B 2026-05-05T17:03:19.274Z 100% 100% Level 2
jn72zhsrwq9m571zcf6mesd5rs865qss Mistral Medium 3.5 2026-05-05T16:22:24.792Z 100% 100% Level 2
jn70fpqyr7an1bca1cn7fq93ys864cx0 Claude Haiku 4.5 2026-05-05T15:30:50.905Z 100% 100% Level 2
jn74qyaygktq550zw4metb3xt5864hfv Claude Sonnet 4.6 2026-05-05T15:21:28.080Z 100% 100% Level 2
jn7bedafgemk6hecfqtpd6e309864xhq Claude Opus 4.7 2026-05-05T15:03:26.640Z 100% 100% Level 2
jn7ccgp27qt3ecc85dq0vaabhh8659q1 Qwen3.6 Flash 2026-05-05T14:13:50.999Z 100% 100% Level 2
jn7ed5ge4j9xakj11zcvnsx2jd865y2k Ling 2.6 Flash 2026-05-05T13:38:35.511Z 100% 100% Level 2
jn7cfwkqn38wj9715mw02tdxxh8630yc Grok 4.3 2026-05-05T05:37:04.503Z 100% 100% Level 2
jn7b14sehdj2rhgte9k6pw824d864pmp Qwen3.6 Max Preview 2026-05-05T04:43:40.761Z 100% 100% Level 2
jn73khjf12dxsp17a9t1eg68ks863rwm Granite 4.1 8b 2026-05-04T23:46:39.231Z 100% 100% Level 2
jn7dsp0pxw3zk1yhe7swg846kh8624vq Grok 4 Fast 2026-05-04T20:30:22.295Z 100% 100% Level 2
jn7a64svgza9ah7n809x2cbqrx862g9r Llama 4 Maverick 2026-05-04T20:13:26.050Z 100% 100% Level 2
jn7f62k61w8er0kyjr36fpph0n862d85 GPT OSS 20B 2026-05-04T19:59:25.052Z 100% 100% Level 2
jn72p8yhqm85173fhr4xxvctgh862prm Llama 4 Maverick 2026-05-04T19:30:59.957Z 100% 100% Level 2
jn72d06xfqwj8pds5qgdq6t2gs8623kp Gemini 3 Flash Preview 2026-05-04T19:16:18.863Z 100% 100% Level 2
jn74r11x1a5denvej7jbyc4p8h8633y8 Gemini 3.1 Pro Preview 2026-05-04T19:13:01.026Z 100% 100% Level 2
jn7frn897xwxymwwpnbck45ejn8626rf GPT-5.5 2026-05-04T18:47:29.330Z 100% 100% Level 2
jn78dxdtvkys549w6ad5sfh6vh863tbm GPT-5.4 2026-05-04T18:37:12.233Z 100% 100% Level 2
jn74tnhtxnqj7q7b7pqmg5a7nx863xhv GPT-5.4 Mini 2026-05-04T18:13:22.445Z 100% 100% Level 2
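
The run rows above follow a regular layout: run ID, model display name, ISO 8601 UTC timestamp, completion rate, parse rate, and evidence level. As a minimal sketch, assuming the rows are copied as whitespace-separated plain text exactly as shown on this page, they can be parsed in Python as follows. The names `RunRow`, `parse_row`, and `repeated_models` are hypothetical helpers for illustration and are not part of any PoliBench tooling; the release artifacts remain the authoritative source.

```python
import re
from collections import Counter
from dataclasses import dataclass
from datetime import datetime

# Assumed row layout: "<run id> <model name> <ISO timestamp> <n>% <n>% Level <n>".
# The model name may contain spaces and digits, so it is matched lazily
# up to the first token that looks like a full ISO 8601 timestamp.
ROW = re.compile(
    r"^(?P<run_id>\S+)\s+(?P<model>.+?)\s+"
    r"(?P<date>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}Z)\s+"
    r"(?P<completion>\d+)%\s+(?P<parse>\d+)%\s+Level (?P<level>\d+)$"
)


@dataclass(frozen=True)
class RunRow:
    run_id: str
    model: str
    date: datetime
    completion: int      # percent
    parse: int           # percent
    evidence_level: int


def parse_row(line: str) -> RunRow:
    """Parse one plain-text run row; raise ValueError if it does not fit."""
    m = ROW.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognized run row: {line!r}")
    return RunRow(
        run_id=m["run_id"],
        model=m["model"],
        # %z accepts the trailing "Z" and yields a timezone-aware UTC datetime.
        date=datetime.strptime(m["date"], "%Y-%m-%dT%H:%M:%S.%f%z"),
        completion=int(m["completion"]),
        parse=int(m["parse"]),
        evidence_level=int(m["level"]),
    )


def repeated_models(rows: list[RunRow]) -> list[str]:
    """Return display names that appear on more than one row."""
    counts = Counter(row.model for row in rows)
    return sorted(name for name, n in counts.items() if n > 1)


# Three rows copied verbatim from the table above.
SAMPLE = [
    "jn745fmm7et4q7nq87x5r7yh1586ajta Qwen3.5 397B A17B 2026-05-08T04:55:17.687Z 100% 100% Level 2",
    "jn7a64svgza9ah7n809x2cbqrx862g9r Llama 4 Maverick 2026-05-04T20:13:26.050Z 100% 100% Level 2",
    "jn72p8yhqm85173fhr4xxvctgh862prm Llama 4 Maverick 2026-05-04T19:30:59.957Z 100% 100% Level 2",
]

rows = [parse_row(line) for line in SAMPLE]
print(rows[0].model, rows[0].date.isoformat())
print(repeated_models(rows))
```

A display name is not necessarily a model slug, so a name that appears on two rows is a prompt to check the Duplicate resolution page, not proof that the one-run-per-slug claim fails.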

Evidence note

PoliBench is a public benchmark surface for model outputs under fixed political prompts. Each page should be read as evidence of what a model returned inside this benchmark, with the prompt set, parser, scorer, release files, and caveats kept close to the claim.

The site keeps the claims narrow on purpose. Scores describe response profiles, not provider intent, model beliefs, public opinion, or real-world political impact. Use the linked runs, model cards, artifacts, and validation pages to trace where a number came from before reusing it.

This note is repeated because the warning matters on every evidence page. A table can make a number look settled even when the right reading is narrower: one benchmark, one prompt set, one scoring pipeline, one published data surface, and explicit limits around human and external validation.