The latest completed full-suite profile for each model, taken from the checked-in live snapshot.
PoliBench dashboard
Model-output profiles with visible limits.
PoliBench scores model responses under fixed prompts, current parser rules, and public run receipts. It profiles benchmark output behavior, not model beliefs, provider intent, training-data ideology, or real-world political impact.
Current benchmark identity used across the live pages.
Responses represented by those public profiles.
Evidence first
Start with the evidence boundary.
The dashboard keeps scope and instrument details close to the pages where they can be checked.
Research paths
Move from summary to raw evidence.
Explore models: Search profiles, filter by posture, then inspect axis and answer evidence.
Model cards: Canonical cards with provider, run health, evidence level, and caveats.
Runs: Canonical full-suite runs by ID with direct raw-run links.
Items: Question-level surfaces for checking the instrument itself.
Release position
Each positioning statement points to the page that carries the supporting detail.
| Claim | Evidence |
|---|---|
| Profiles describe political responses, not model beliefs, provider intent, or real-world political impact. | Methodology |
| Displayed profiles come from completed full-suite runs with complete response and parser validity. | Explorer, Runs |
| Human and external validation remain separate pending work, not completed evidence. | Human status, External status |
| Open-ended diagnostics are visible for inspection, but excluded from official placement claims. | Explorer |
| Paid execution stays behind preflight, validation, canary, audit, and dead-code gates. | Paid readiness |
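
The display rule stated above (one latest profile per model, drawn only from completed full-suite runs with complete response and parser validity) can be sketched as a small filter. This is an illustration under stated assumptions, not PoliBench's actual code: the `Run` record, its field names, and the `"full"` / `"completed"` values are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Run:
    # Hypothetical record; these field names are assumptions,
    # not the PoliBench run schema.
    run_id: str
    model: str
    suite: str             # assumed values such as "full" or "canary"
    status: str            # assumed values such as "completed" or "failed"
    finished_at: datetime
    responses_valid: bool  # every item received a response
    parser_valid: bool     # every response was parsed successfully


def is_displayable(run: Run) -> bool:
    """True only for completed full-suite runs with full validity."""
    return (
        run.suite == "full"
        and run.status == "completed"
        and run.responses_valid
        and run.parser_valid
    )


def latest_profiles(runs: list[Run]) -> dict[str, Run]:
    """Keep the most recently finished displayable run for each model."""
    latest: dict[str, Run] = {}
    for run in runs:
        if not is_displayable(run):
            continue
        current = latest.get(run.model)
        if current is None or run.finished_at > current.finished_at:
            latest[run.model] = run
    return latest
```

Under these assumptions, a newer run that fails parser validity does not displace an older valid one; the model keeps showing its last fully valid full-suite profile.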