PoliBench dashboard

Model-output profiles with visible limits.

PoliBench scores responses under fixed prompts and parser rules. It does not measure model beliefs, provider intent, training-data ideology, or real-world political impact.

| Metric | Value | Description |
|---|---|---|
| Completed profiles | 0 | Fallback frozen profile rows. |
| Eligible profiles | 0 | Full, completed, parse-valid profiles shown in the frontend. |
| Completed runs | 0 | Finished full-suite runs available to the public interface. |
| Question bank | qb.v1.3.0 | Current benchmark identity used across the live pages. |
| Responses shown | 0 | Responses represented by the current completed profile set. |
| Data source | Frozen export | Fallback paper export loaded because the live backend was unavailable during this build. |
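The eligibility rule and the data-source fallback above can be sketched in a few lines. This is a minimal Python sketch under stated assumptions, not PoliBench's implementation: `Profile`, its fields, `load_profiles`, `is_eligible`, and `summarize` are hypothetical names; the live fetch is modeled as any callable that raises `ConnectionError` when the backend is down; and the sample rows are illustrative, not real data. It shows how "Eligible profiles" could be a strict filter (full suite, completed, parse-valid) and how "Responses shown" could be a sum over only the eligible rows.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Profile:
    # Hypothetical profile record; field names are illustrative, not PoliBench's schema.
    model: str
    full_suite: bool   # run covered the whole question bank
    completed: bool    # run finished
    parse_valid: bool  # every response passed the parser rules
    responses: int     # number of scored responses in the profile


def load_profiles(
    fetch_live: Callable[[], Sequence[Profile]],
    frozen_export: Sequence[Profile],
) -> tuple[list[Profile], str]:
    """Try the live backend first; fall back to the frozen export if it is unreachable."""
    try:
        return list(fetch_live()), "Live backend"
    except ConnectionError:
        return list(frozen_export), "Frozen export"


def is_eligible(p: Profile) -> bool:
    # All three conditions must hold; there is no partial credit.
    return p.full_suite and p.completed and p.parse_valid


def summarize(profiles: Sequence[Profile]) -> dict[str, int]:
    eligible = [p for p in profiles if is_eligible(p)]
    return {
        "eligible_profiles": len(eligible),
        "responses_shown": sum(p.responses for p in eligible),
    }


# Usage with illustrative rows and a backend that is down.
def offline_backend() -> list[Profile]:
    raise ConnectionError("live backend unavailable")


frozen = [
    Profile("model-a", True, True, True, 120),
    Profile("model-b", True, True, False, 120),  # parser failure: excluded
    Profile("model-c", False, True, True, 40),   # partial suite: excluded
]
profiles, source = load_profiles(offline_backend, frozen)
print(source, summarize(profiles))
# Frozen export {'eligible_profiles': 1, 'responses_shown': 120}
```

Because the filter requires all three conditions, a run that finished but failed parsing drops out of both counts rather than appearing with partial data.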

Evidence first

Start with what can be verified.

The dashboard links each claim to its evidence surface, so users do not have to infer where the evidence lives.

Research paths

Move from summary to evidence without hunting.

Missing Evidence

These warnings are intentionally public. PoliBench should not look more certain than its evidence, and the public page should not read as leaderboard-first.
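One way to keep such warnings honest is to derive them from explicit status values instead of writing them by hand. The Python sketch below assumes a hypothetical `EvidenceStatus` enum whose values mirror the page's "pending" and "not collected" labels, and a hypothetical `missing_evidence_warnings` helper; the track names and statuses in the example are illustrative, with human and external validation shown as pending, as the release table states. Any track not marked completed produces a public warning.

```python
from enum import Enum


class EvidenceStatus(Enum):
    # Hypothetical status values mirroring the page's labels.
    COMPLETED = "completed"
    PENDING = "pending"
    NOT_COLLECTED = "not collected"


def missing_evidence_warnings(tracks: dict[str, EvidenceStatus]) -> list[str]:
    """Return one public warning per evidence track that is not completed, sorted by name."""
    return [
        f"{name}: {status.value}"
        for name, status in sorted(tracks.items())
        if status is not EvidenceStatus.COMPLETED
    ]


# Illustrative tracks; only the two validation statuses come from the page.
tracks = {
    "Human validation": EvidenceStatus.PENDING,
    "External validation": EvidenceStatus.PENDING,
    "Parser validity": EvidenceStatus.COMPLETED,
}
for warning in missing_evidence_warnings(tracks):
    print(warning)
# External validation: pending
# Human validation: pending
```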

Release Position

Each public positioning statement links to its evidence surface. Missing validation stays labeled as pending or not collected.

| Claim | Evidence |
|---|---|
| PoliBench reports political-response profiles, not model beliefs, provider intent, or real-world political impact. | Methodology |
| Displayed profiles come from completed full-suite runs with complete response and parser validity. | Explorer, Runs |
| Human and external validation remain separate pending work, not completed evidence. | Human status, External status |
| Open-ended diagnostics are visible for inspection but are excluded from official placement claims. | Explorer |
| Paid execution stays behind preflight, validation, canary, audit, and dead-code gates. | Paid readiness |
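The paid-execution row can be read as a fail-closed gate chain. The sketch below is a minimal Python illustration, assuming each gate is a hypothetical check that returns `None` to pass or a reason string to block, and that gates run in the order the table lists them; `check_paid_execution`, `GateCheck`, and the config keys are hypothetical, and neither the order nor the fail-closed handling of a missing gate is confirmed by the source.

```python
from typing import Callable, Optional

# Hypothetical gate signature: None means pass, a string is the blocking reason.
GateCheck = Callable[[dict], Optional[str]]

# Order taken from the release table; the real execution order is an assumption.
GATE_ORDER = ("preflight", "validation", "canary", "audit", "dead-code")


def check_paid_execution(
    gates: dict[str, GateCheck], run_config: dict
) -> tuple[bool, Optional[str]]:
    """Run every gate in order and stop at the first one that blocks."""
    for name in GATE_ORDER:
        check = gates.get(name)
        if check is None:
            # Fail closed: an unregistered gate blocks instead of passing.
            return False, f"{name}: gate not configured"
        reason = check(run_config)
        if reason is not None:
            return False, f"{name}: {reason}"
    return True, None


# Illustrative gates; "budget_usd" and "canary_passed" are hypothetical config keys.
gates: dict[str, GateCheck] = {
    "preflight": lambda cfg: None if cfg.get("budget_usd", 0) > 0 else "no budget set",
    "validation": lambda cfg: None,
    "canary": lambda cfg: None if cfg.get("canary_passed") else "canary not run",
    "audit": lambda cfg: None,
    "dead-code": lambda cfg: None,
}
print(check_paid_execution(gates, {"budget_usd": 50}))
# (False, 'canary: canary not run')
```

Failing closed on an unconfigured gate means a new gate cannot be skipped silently by forgetting to register it.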

Dashboard summary

The full table lives where it can breathe.

This page keeps the release overview compact. Detailed model rows stay in the Explorer, Models, Runs, and evidence pages.