PoliBench dashboard

Model-output profiles with visible limits.

PoliBench scores responses under fixed prompts and parser rules. It does not measure model beliefs, provider intent, training-data ideology, or real-world political impact.

Completed profiles: 88

Current completed full-suite model profiles.

Eligible profiles: 88

Full, completed, parse-valid profiles shown in the frontend.

Completed runs: 99

Finished full-suite runs available to the public interface.

Question bank: qb.v1.3.0

Current benchmark identity used across the live pages.

Responses shown: 23,760

Responses represented by the current completed profile set.
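As a quick consistency check on the figures above, the response count can be divided by the number of completed profiles. This is only a sketch: it assumes every profile contributes the same number of responses, which the page does not state, and the variable names are illustrative.

```python
# Figures copied from the dashboard cards above.
responses_shown = 23_760
completed_profiles = 88

# Assumption: each profile contributes an equal share of the responses.
per_profile, remainder = divmod(responses_shown, completed_profiles)

print(per_profile, remainder)  # 270 0
```

A zero remainder is consistent with each profile covering the same fixed set of prompts, but it does not by itself confirm that.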

Data source: Live backend

Current completed full-suite Convex profiles, filtered to finished and fully parse-valid runs.
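The filter described above (finished, full-suite, fully parse-valid) might look roughly like the following. This is a minimal sketch, not the PoliBench backend code: the `Run` record, its field names, and the `"finished"` status value are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    # All field names are hypothetical; the real backend schema is not shown on the page.
    model: str
    status: str                 # assumed values such as "finished", "running", "failed"
    full_suite: bool            # True when the run covers the whole question bank
    expected_responses: int     # number of responses the run should contain
    parse_valid_responses: int  # number of responses the parser accepted


def is_eligible(run: Run) -> bool:
    """A run is shown only if it finished, is full-suite, and every response parsed."""
    return (
        run.status == "finished"
        and run.full_suite
        and run.expected_responses > 0
        and run.parse_valid_responses == run.expected_responses
    )


def eligible_runs(runs: list[Run]) -> list[Run]:
    """Return the runs that pass the eligibility filter, preserving input order."""
    return [run for run in runs if is_eligible(run)]


# Illustrative data only; these are not real PoliBench runs.
sample = [
    Run("model-a", "finished", True, 10, 10),   # eligible
    Run("model-b", "finished", True, 10, 8),    # two responses failed parsing
    Run("model-c", "running", True, 10, 10),    # not finished
    Run("model-d", "finished", False, 5, 5),    # partial suite
]
print([run.model for run in eligible_runs(sample)])  # ['model-a']
```

Requiring an exact match between expected and parse-valid counts, rather than a threshold, mirrors the "fully parse-valid" wording: a single unparsed response excludes the run.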

Evidence first

Start with what can be verified.

The dashboard links each claim to its evidence surface, so users do not have to infer where the evidence lives.

Research paths

Move from summary to evidence without hunting.

Missing Evidence

These warnings are intentionally public. PoliBench should not look more certain than its evidence, and the public page should not read as leaderboard-first.

Release Position

Each public positioning statement links to its evidence surface. Missing validation stays labeled as pending or not collected.

Claim | Evidence
Political-response profiles, not model beliefs, provider intent, or real-world political impact. | Methodology
Displayed profiles come from completed full-suite runs with complete response and parser validity. | Explorer, Runs
Human and external validation remain separate pending work, not completed evidence. | Human status, External status
Open-ended diagnostics are visible for inspection, but excluded from official placement claims. | Explorer
Paid execution stays behind preflight, validation, canary, audit, and dead-code gates. | Paid readiness

Dashboard summary

The full table lives where it can breathe.

This page keeps the release overview compact. Detailed model rows stay in the Explorer, Models, Runs, and evidence pages.