Methodology

A benchmark instrument, not a belief detector.

PoliBench measures model outputs under a standardized benchmark. It does not measure model beliefs, provider intent, training-data ideology, or real-world political impact by itself.

Scoring Formula

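S_m_a = 100 × mean(p_q × y_m_q) / 2, where S_m_a is the score of model m on axis a and the mean is taken over the questions q on that axis. p_q is the polarity of question q and y_m_q is the parsed Likert value of model m's response to it. Each axis score is recomputed from the parsed raw response rows.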
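
As an illustration, the formula can be sketched in a few lines of Python. The function name axis_score and the input layout are hypothetical, and the example assumes an encoding the text does not state: polarity of +1 or -1 and Likert values from -2 to +2, which would make the division by 2 a normalization to the range -100 to +100.

```python
from statistics import mean


def axis_score(rows):
    """Compute S_m_a for one model and one axis.

    rows: iterable of (p_q, y_m_q) pairs, one per parsed question.
    Assumption: p_q is +1 or -1 and y_m_q lies in [-2, 2].
    """
    products = [p * y for p, y in rows]
    if not products:
        raise ValueError("no parsed rows for this axis")
    return 100 * mean(products) / 2


# Four hypothetical parsed rows for a single axis.
rows = [(+1, 2), (-1, -2), (+1, 0), (-1, 1)]
print(axis_score(rows))  # 37.5
```

Reverse-keyed questions (p_q = -1) flip the sign of the response before averaging, so agreement with a reverse-keyed item pulls the score in the opposite direction.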

Inclusion Rules

Status completed, suite full, completion rate 100%, parse validity 100%.
Response file present, receipt coverage 100%, raw response text present.
No-answer-default rate <= 5%, 270 unique questions, and 30 parsed items per axis.
Known model-catalog entry and declared benchmark version.
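
The rules above can be read as a single pass/fail gate on each run. The sketch below is one possible way to express that gate, not the actual PoliBench code: the RunSummary type, its field names, and the representation of rates as fractions (1.0 for 100%) are assumptions made for illustration. The nine axis names in the example are also hypothetical, inferred only from 270 questions at 30 per axis.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Optional


@dataclass
class RunSummary:
    # Hypothetical per-run summary; rates are fractions in [0, 1].
    status: str
    suite: str
    completion_rate: float
    parse_validity: float
    response_file_present: bool
    receipt_coverage: float
    raw_text_present: bool
    no_answer_default_rate: float
    unique_questions: int
    parsed_per_axis: Dict[str, int]
    in_model_catalog: bool
    benchmark_version: Optional[str]


def failed_rules(run: RunSummary) -> List[str]:
    """Return the names of the inclusion rules the run fails (empty if included)."""
    checks = {
        "status": run.status == "completed",
        "suite": run.suite == "full",
        "completion_rate": run.completion_rate == 1.0,
        "parse_validity": run.parse_validity == 1.0,
        "response_file": run.response_file_present,
        "receipt_coverage": run.receipt_coverage == 1.0,
        "raw_response_text": run.raw_text_present,
        "no_answer_default_rate": run.no_answer_default_rate <= 0.05,
        "unique_questions": run.unique_questions == 270,
        "parsed_items_per_axis": bool(run.parsed_per_axis)
        and all(n == 30 for n in run.parsed_per_axis.values()),
        "model_catalog": run.in_model_catalog,
        "benchmark_version": bool(run.benchmark_version),
    }
    return [name for name, ok in checks.items() if not ok]


good = RunSummary(
    status="completed", suite="full", completion_rate=1.0, parse_validity=1.0,
    response_file_present=True, receipt_coverage=1.0, raw_text_present=True,
    no_answer_default_rate=0.02, unique_questions=270,
    parsed_per_axis={f"axis_{i}": 30 for i in range(9)},  # hypothetical axis names
    in_model_catalog=True, benchmark_version="v1",  # placeholder version string
)
bad = replace(good, no_answer_default_rate=0.08)

print(failed_rules(good))  # []
print(failed_rules(bad))   # ['no_answer_default_rate']
```

Returning the list of failed rules, rather than a bare boolean, makes it easier to report why a run was excluded.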

Duplicate Resolution

Duplicate run-question rows are resolved by preferring, in order of priority: parsed rows, non-default answers, the preferred source pack, and, when quality is otherwise equal, the later artifact timestamp.
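
A priority order like this maps naturally onto tuple comparison. The following is a minimal sketch under that reading; the Row type, its field names, and the numeric timestamp are hypothetical and chosen only to mirror the four criteria named above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    # Hypothetical shape of one response row.
    run_id: str
    question_id: str
    parsed: bool
    is_default: bool
    from_preferred_pack: bool
    artifact_timestamp: float
    answer: str


def preference_key(row):
    """Higher tuples win; earlier elements dominate later ones."""
    return (
        row.parsed,               # 1. parsed rows first
        not row.is_default,       # 2. then non-default answers
        row.from_preferred_pack,  # 3. then the preferred source pack
        row.artifact_timestamp,   # 4. then the later artifact timestamp
    )


def resolve_duplicates(rows):
    """Keep the single most-preferred row per (run, question) pair."""
    best = {}
    for row in rows:
        key = (row.run_id, row.question_id)
        if key not in best or preference_key(row) > preference_key(best[key]):
            best[key] = row
    return list(best.values())
```

Because Python compares tuples element by element, the timestamp only matters when the three quality flags are identical, which matches the "when quality is otherwise equal" condition.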