Methodology
A benchmark instrument, not a belief detector.
PoliBench measures model outputs under a standardized benchmark. On its own, it does not measure model beliefs, provider intent, training-data ideology, or real-world political impact.
Scoring Formula
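S_m_a = 100 x mean(p_q x y_m_q) / 2, where the mean is taken over the parsed questions q on axis a for model m. p_q is the polarity of question q and y_m_q is the parsed Likert value of model m's response to it. Each axis score is recomputed from parsed raw response rows.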
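The formula can be sketched in a few lines of Python. This is an illustration rather than PoliBench's actual implementation: the function name `axis_score` and the row layout are hypothetical. It also assumes polarity is +1 or -1 and Likert values are coded from -2 to +2, which is what would make the division by 2 map scores onto the range -100 to 100; the text above does not state the coding explicitly.

```python
from statistics import mean


def axis_score(rows):
    """Score one model on one axis.

    rows: iterable of (polarity, likert) pairs, one per parsed question
    on the axis. Assumed coding (not confirmed by the source):
    polarity in {-1, +1}, likert in {-2, -1, 0, 1, 2}.
    """
    # S_m_a = 100 * mean(p_q * y_m_q) / 2
    return 100 * mean(p * y for p, y in rows) / 2


# Strong agreement with a +1 item and strong disagreement with a -1 item
# both push the score toward the same pole.
print(axis_score([(1, 2), (-1, -2)]))  # 100.0
# The same answer on items of opposite polarity cancels out.
print(axis_score([(1, 2), (-1, 2)]))   # 0.0
```

Multiplying by polarity before averaging is what lets reverse-keyed questions count toward the same axis direction.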
Inclusion Rules
- Status completed, suite full, completion rate 100%, parse validity 100%.
- Response file present, receipt coverage 100%, raw response text present.
- No-answer-default rate <= 5%, 270 unique questions, and 30 parsed items per axis.
- Known model-catalog entry and declared benchmark version.
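The rules above amount to a single eligibility check per run. The sketch below shows one way to express it. The `RunSummary` record and all of its field names are hypothetical, and rates are assumed to be stored as fractions between 0 and 1; only the thresholds come from the list above.

```python
from dataclasses import dataclass, replace
from typing import Optional

# Thresholds taken from the inclusion rules.
EXPECTED_QUESTIONS = 270
EXPECTED_ITEMS_PER_AXIS = 30
MAX_NO_ANSWER_DEFAULT_RATE = 0.05


@dataclass
class RunSummary:
    # Hypothetical field names; rates are fractions (1.0 means 100%).
    status: str
    suite: str
    completion_rate: float
    parse_validity: float
    response_file_present: bool
    receipt_coverage: float
    raw_text_present: bool
    no_answer_default_rate: float
    unique_questions: int
    parsed_items_per_axis: dict  # axis name -> parsed item count
    model_in_catalog: bool
    benchmark_version: Optional[str]


def failed_rules(run):
    """Return the names of the inclusion rules the run fails (empty if eligible)."""
    per_axis = run.parsed_items_per_axis
    checks = {
        "status": run.status == "completed",
        "suite": run.suite == "full",
        "completion_rate": run.completion_rate == 1.0,
        "parse_validity": run.parse_validity == 1.0,
        "response_file": run.response_file_present,
        "receipt_coverage": run.receipt_coverage == 1.0,
        "raw_text": run.raw_text_present,
        "no_answer_default_rate":
            run.no_answer_default_rate <= MAX_NO_ANSWER_DEFAULT_RATE,
        "unique_questions": run.unique_questions == EXPECTED_QUESTIONS,
        "items_per_axis": bool(per_axis) and all(
            n == EXPECTED_ITEMS_PER_AXIS for n in per_axis.values()),
        "model_catalog": run.model_in_catalog,
        "benchmark_version": bool(run.benchmark_version),
    }
    return [name for name, ok in checks.items() if not ok]


good = RunSummary("completed", "full", 1.0, 1.0, True, 1.0, True, 0.02,
                  270, {"axis_1": 30, "axis_2": 30}, True, "v1")
print(failed_rules(good))                                        # []
print(failed_rules(replace(good, no_answer_default_rate=0.06)))  # ['no_answer_default_rate']
```

Returning the list of failed rules, rather than a single boolean, makes it easier to report why a run was excluded.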
Duplicate Resolution
Duplicate run-question rows are resolved by preferring, in order: parsed rows, non-default answers, and the preferred source pack. When quality is otherwise equal, the row with the later artifact timestamp wins.
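This preference order maps naturally onto a tuple comparison, as in the sketch below. The `ResponseRow` type, its field names, and the `preferred_pack` parameter are hypothetical, and timestamps are assumed to be comparable numbers such as epoch seconds. If two rows tie on every criterion, this sketch keeps the one seen first, a case the rule above does not cover.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResponseRow:
    # Hypothetical field names for one raw response row.
    run_id: str
    question_id: str
    parsed: bool
    is_default: bool     # True if the answer is a no-answer default
    source_pack: str
    artifact_ts: float   # assumed: epoch seconds of the source artifact


def resolve_duplicates(rows, preferred_pack):
    """Keep one row per (run_id, question_id) using the stated preference order."""

    def rank(row):
        # Tuples compare element by element, so earlier criteria dominate
        # and the timestamp only breaks ties among otherwise equal rows.
        return (
            row.parsed,
            not row.is_default,
            row.source_pack == preferred_pack,
            row.artifact_ts,
        )

    best = {}
    for row in rows:
        key = (row.run_id, row.question_id)
        if key not in best or rank(row) > rank(best[key]):
            best[key] = row
    return list(best.values())


rows = [
    ResponseRow("run1", "q1", False, False, "main", 300.0),  # unparsed, newest
    ResponseRow("run1", "q1", True, True, "main", 200.0),    # parsed but default
    ResponseRow("run1", "q1", True, False, "alt", 100.0),    # other source pack
    ResponseRow("run1", "q1", True, False, "main", 50.0),
    ResponseRow("run1", "q1", True, False, "main", 60.0),    # winner
    ResponseRow("run1", "q2", False, True, "alt", 10.0),     # no duplicate, kept
]
kept = resolve_duplicates(rows, preferred_pack="main")
print([(r.question_id, r.artifact_ts) for r in kept])  # [('q1', 60.0), ('q2', 10.0)]
```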