Fallback frozen profile rows.
PoliBench dashboard
Model-output profiles with visible limits.
PoliBench scores responses under fixed prompts and parser rules. It does not measure model beliefs, provider intent, training-data ideology, or real-world political impact.
Full, completed, parse-valid profiles shown in the frontend.
Finished full-suite runs available to the public interface.
Current benchmark identity used across the live pages.
Responses represented by the current completed profile set.
Fallback paper export loaded because the live backend was unavailable during this build.
Evidence first
Start with what can be verified.
The dashboard now routes from each claim toward its proof surface, so users do not have to infer where the evidence lives.
Profiles describe model responses under this benchmark, not provider intent or real-world political impact.
Read method
Data source
Frozen export
Fallback paper export loaded because the live backend was unavailable during this build.
Review validity
Missing evidence
0 open limits
Human baseline, external validation, and version certainty remain visible instead of being hidden behind rank.
Review limits
Question bank
qb.v1.3.00 scored prompts per completed full-suite model profile.
Read method
Research paths
Move from summary to evidence without hunting.
Missing Evidence
These warnings are intentionally public. PoliBench should not look more certain than its evidence, and the public page should not read as leaderboard-first.
Release Position
Each public positioning statement links to its evidence surface. Missing validation stays labeled as pending or not collected.
| Claim | Evidence |
|---|---|
| Political-response profiles, not model beliefs, provider intent, or real-world political impact. | Methodology |
| Displayed profiles come from completed full-suite runs with complete response and parser validity. | Explorer, Runs |
| Human and external validation remain separate pending work, not completed evidence. | Human status, External status |
| Open-ended diagnostics are visible for inspection, but excluded from official placement claims. | Explorer |
| Paid execution stays behind preflight, validation, canary, audit, and dead-code gates. | Paid readiness |
Dashboard summary
The full table lives where it can breathe.
This page keeps the release overview compact. Detailed model rows stay in the Explorer, Models, Runs, and evidence pages.