Downloads
The public site now reads from live benchmark data.
Old paper packs and raw response dumps are no longer published from this frontend repository. Current model profiles are loaded from the Convex backend and filtered to completed full-suite runs from the active benchmark system.
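As a rough illustration of the filter described above, the sketch below keeps only completed full-suite runs from the active benchmark system. The row shape, field names (`status`, `suite`, `system`), and function names are assumptions made for this example; the actual Convex schema and query code are not shown on this page.

```typescript
// Hypothetical row shape: the real backend schema is not documented here.
type RunStatus = "queued" | "running" | "completed" | "failed";

interface RunRow {
  modelId: string;
  status: RunStatus;
  suite: "full" | "partial";
  system: "active" | "legacy";
}

// A run is shown publicly only if it finished, covered the full suite,
// and came from the active benchmark system.
function isPublishable(run: RunRow): boolean {
  return (
    run.status === "completed" &&
    run.suite === "full" &&
    run.system === "active"
  );
}

function publishableRuns(runs: readonly RunRow[]): RunRow[] {
  return runs.filter(isPublishable);
}

// Made-up sample rows, for illustration only.
const sampleRuns: RunRow[] = [
  { modelId: "model-a", status: "completed", suite: "full", system: "active" },
  { modelId: "model-b", status: "running", suite: "full", system: "active" },
  { modelId: "model-c", status: "completed", suite: "partial", system: "active" },
  { modelId: "model-d", status: "completed", suite: "full", system: "legacy" },
];

const visibleRuns = publishableRuns(sampleRuns);
```

With these sample rows, only `model-a` passes all three conditions. In a real deployment the same predicate would more likely run inside a backend query than in the browser, so that incomplete or legacy rows never reach the client.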
Data Policy
This app is the frontend for current backend results. Paper-analysis exports, raw response archives, and large CSV packs should be generated and stored outside this web repo when needed for writing or offline review. Current evidence lives on the live runs, models, and items pages, and validation status is documented on the validity page.
| Surface | Status |
|---|---|
| Compass, Explorer, Models, Runs | Live Convex data, completed full-suite rows only. |
| Raw response archives | Kept out of the frontend repository. |
| Paper release packs | Kept out of the frontend repository. |
| Public downloads | No large benchmark artifacts are published from Cloudflare Pages. |
Evidence note
PoliBench is a public benchmark surface for model outputs under fixed political prompts. Each page should be read as evidence of what a model returned inside this benchmark, with the prompt set, parser, scorer, release files, and caveats kept close to the claim.
The site keeps the claims narrow on purpose. Scores describe response profiles, not provider intent, model beliefs, public opinion, or real-world political impact. Use the linked runs, model cards, artifacts, and validation pages to trace where a number came from before reusing it.
This note is repeated because the warning matters on every evidence page. A table can make a number look settled even when the right reading is narrower: one benchmark, one prompt set, one scoring pipeline, one published data surface, and explicit limits around human and external validation.