PoliBench dashboard

Model-output profiles with visible limits.

PoliBench scores model responses under fixed prompts, current parser rules, and public run receipts. It profiles benchmark output behavior, not model beliefs, provider intent, training-data ideology, or real-world political impact.

Public profiles 73

One latest completed full-suite profile per model from the checked-in live snapshot.
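The selection rule above can be sketched as follows. This is a minimal illustration, not PoliBench's actual code: the `Run` record and its fields (`model`, `completed_at`, `status`, `full_suite`) are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Run:
    # Hypothetical record shape; field names are assumptions for illustration.
    model: str
    completed_at: datetime
    status: str        # e.g. "completed" or "failed"
    full_suite: bool   # True when the run covered the whole question bank


def latest_full_suite_profiles(runs):
    """Return one run per model: the most recent completed full-suite run."""
    latest = {}
    for run in runs:
        if run.status != "completed" or not run.full_suite:
            continue  # unfinished or partial runs are never selected
        current = latest.get(run.model)
        if current is None or run.completed_at > current.completed_at:
            latest[run.model] = run
    return latest
```

Keying the result by model name is what guarantees a single profile per model; older completed runs for the same model are simply overwritten by newer ones.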

Question bank qb.v1.3.0

Current benchmark identity used across the live pages.

Responses shown 19,710

Responses represented by those public profiles.
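A quick consistency check on the two counts shown on this page, assuming (the page does not state this) that every public profile contains the same number of responses. The constant names are illustrative only.

```python
# Counts copied from the dashboard tiles.
PUBLIC_PROFILES = 73
RESPONSES_SHOWN = 19_710

# If each profile holds the same number of responses, the total must divide evenly.
per_profile, remainder = divmod(RESPONSES_SHOWN, PUBLIC_PROFILES)

print(per_profile, remainder)  # prints "270 0"
```

Under that assumption the totals are consistent with 270 responses per profile; a nonzero remainder would indicate that at least one displayed profile is incomplete or that profiles differ in size.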

Evidence first

Start with the evidence boundary.

The dashboard keeps scope and instrument details close to the pages where they can be checked.

Research paths

Move from summary to raw evidence.

Release position

Each positioning statement points to the page that carries the supporting detail.

Claim | Evidence
Political-response profiles, not model beliefs, provider intent, or real-world political impact. | Methodology
Displayed profiles come from completed full-suite runs with complete response and parser validity. | Explorer, Runs
Human and external validation remain separate pending work, not completed evidence. | Human status, External status
Open-ended diagnostics are visible for inspection, but excluded from official placement claims. | Explorer
Paid execution stays behind preflight, validation, canary, audit, and dead-code gates. | Paid readiness