Paper release

polibench-paper-v1.0.1

PoliBench measures model outputs under a standardized benchmark. It does not measure model beliefs, provider intent, training-data ideology, or real-world political impact by itself.

Canonical models: 84
Resolved responses: 22680
Duplicate decisions: 24
Code commit: c0f92bd
Frozen response rows: 34875
File checksums: 28
Truth-gated rows: 22680
Static schemas: 7
Field dictionary: 65
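
The summary counts can be cross-checked against the row counts listed under Checksums. The sketch below is only a sanity check under stated assumptions: it assumes the 270 rows of item_diagnostics.csv correspond to one row per scored item, that every canonical model has exactly one resolved response per item, and that axis_scores_recomputed.csv holds one row per model per axis. None of these relationships is stated by the release itself; the constant names and `check_counts` are illustrative.

```python
# Counts copied from the release summary and the checksum table.
CANONICAL_MODELS = 84       # "Canonical models" / canonical_sample.csv rows
AXES = 9                    # axis_definitions.csv rows
ITEMS = 270                 # item_diagnostics.csv rows (ASSUMED: one row per scored item)
RESOLVED_RESPONSES = 22680  # "Resolved responses" / canonical_responses.csv rows
AXIS_SCORE_ROWS = 756       # axis_scores_recomputed.csv rows


def check_counts() -> dict[str, bool]:
    """Compare published counts with the products implied by the assumptions above."""
    return {
        # ASSUMED: one resolved response per (model, item) pair.
        "responses = models x items": CANONICAL_MODELS * ITEMS == RESOLVED_RESPONSES,
        # ASSUMED: one recomputed score per (model, axis) pair.
        "axis scores = models x axes": CANONICAL_MODELS * AXES == AXIS_SCORE_ROWS,
    }


if __name__ == "__main__":
    for label, ok in check_counts().items():
        print(f"{label}: {'consistent' if ok else 'inconsistent'}")
```

If either product failed to match, that would suggest one of the assumptions is wrong rather than that the release is invalid.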

Validation Gate

Status: pass
Warnings: Exploratory release; no human baseline collected; not externally validated.
Errors: None

Release Lineage

Immutable base: polibench-paper-v1.0.0
Generated release: polibench-paper-v1.0.1
Source export: exports/polibench-clean-csv-2026-04-26
Scope: 9-axis political-response profile benchmark
Bill benchmark: poliscore-bill-benchmark is separate future work, not part of this release

Schema Manifest

Artifact Schema Status Errors
runs.csv schemas/run.schema.json pass None
canonical_responses.csv schemas/response.schema.json pass None
questions.csv schemas/question.schema.json pass None
axis_scores_recomputed.csv schemas/score.schema.json pass None
exclusions.csv schemas/exclusion.schema.json pass None
duplicate_resolution.csv schemas/duplicate-resolution.schema.json pass None
paper_release.json schemas/paper-release.schema.json pass None

Release Files

Checksums

File Rows SHA-256
runs.csv 147 225b2c784d97660c490b65a8075e287402971be60962e159891fc2c4bc27ab66
responses.csv 34875 71e1b3e8807b067e3ced97e4cc36a5b6ce3dd7c04c1d93a9517bde0efbebdead
questions.csv 522 f774a33f5894bddb8cb444c3f7b99c13660c860b6aea1756fb1734b835ac01e8
axis_scores.csv 1009 32c7c1581a59ace148a34afbb36df193d5a9e0d1508b0ec5024ee7150b881851
axis_stats.csv 1314 f12c19d44790ec4e501fb717cf56b1f83f2283c215a38a0ad963e668d59463a6
axis_definitions.csv 9 07df22fed304479cac658efe5b04f66b3086c5f4410742d64aea362ef3d21ade
model_catalog.csv 127 f99cd3db01c90192c3fd84d2743c6483d6506735eb1019ece16df8e9fb7e3ecc
artifact_packs.csv 50 7bcc4d28c1da87ddf5cf1745fbfa8779fdefee50a0dc9f0789d626aa0ec9bdbf
pack_runs.csv 158 8d36b3ab362db4bdd36626a812c5e67bf69133918c107a5ccd07ffd36f30532f
canonical_sample.csv 84 afc3bddc9d4480a5c49d0f3c0d6941faa5a74e7cf8333a033041414c535996c5
canonical_responses.csv 22680 9e902ee604844482bbdb728040def04d9f4d4c318989c8a852f54ad762b9e6ce
axis_scores_recomputed.csv 756 f20df6554b42fff9e71fa71c09def6d35f6179c8f49f708c9b62d3acb3e40d7f
axis_intervals.csv 756 71841a6eb0590ffc936c3fe6ab3c88092a5baa4ba65c8c7cf7972b6ddac40b49
response_style_controls.csv 84 f36234c504de7944a8c30f8626c324e5bcda6a6bb93b87cab3c2e75d21df01c3
item_diagnostics.csv 270 e153955252cffb1747edfcc1ed05b6fff10bcf9dc94dd477f394323cee83c365
duplicate_resolution.csv 24 09737a66e9c225852afb870e3f932419a3d564ebb31afe6a0e5eaeeff4924266
exclusions.csv 63 635bccf966bb1e13c502c1c53342abef6f10ae23e3cea61ac4e672260d54cde8
data_dictionary.csv 65 40dfd18813a7d7ad32f75b425c6edbca201a1fc3851f661411b2cc8539dedd0c
schema_manifest.csv 7 2b401f0034087c8f45cf0dcdc3d4961f547425a9035e1266221e043815dd87d2
scoring_config.json Unknown f4b558c903bcc8c0957fde6c0dd11df2f8b52cf30a27cc606b8b0a31bc2307f3
benchmark_version.json Unknown c2dcffaab0096e3955348ee367a90536cab950f897c21c2b3e73b8b76b069a99
release_validation.json Unknown d9da7f6f59cbdfe7887c8cb0ece0b5d1323346cd25cb8b2a825618d955c33db1
human_expert_coding.csv 0 9e2a843e528300150c9eef8f979ad3510ae7c77701e7503b05a8bf6977a6fa81
external_anchors.csv 0 71172b617481954cba819a14c609ae8ab2f0f9b0d885417239da267bf0da6f17
open_ended_responses.csv 0 71695289bed8d8602fe097b1b4eb0493d7fc64ee0c3ac3ee7f317dd1f61a9b21
paper_reproduction_script.md Unknown 3d84193bb4e9378e17d27468621d0d4e99a0ae3902cc38eebe51a4e50a65ce34
validation_manifest.json Unknown bb4d69f2a808ef507f9fdcf8e0ef334de7cd4d75d3e06e80b88c593d8f5423cc
paper_release.json Unknown 83116db54cd72d348d31804e6a8c06b1fcfe8654ea3b00658514ab0176750429
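
The checksums above can be checked against a local copy of the release. The following is a minimal sketch, not the project's own tooling: the helper names `sha256_of` and `verify_release` and the directory name `polibench-paper-v1.0.1` are assumptions, and only two of the 28 entries are copied into `EXPECTED` for brevity.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_release(release_dir: Path, expected: dict[str, str]) -> dict[str, str]:
    """Map each expected file name to 'ok', 'mismatch', or 'missing'."""
    results = {}
    for name, want in expected.items():
        path = release_dir / name
        if not path.is_file():
            results[name] = "missing"
        elif sha256_of(path) == want.lower():
            results[name] = "ok"
        else:
            results[name] = "mismatch"
    return results


# Two entries copied from the checksum table; extend with the remaining files.
EXPECTED = {
    "runs.csv": "225b2c784d97660c490b65a8075e287402971be60962e159891fc2c4bc27ab66",
    "axis_definitions.csv": "07df22fed304479cac658efe5b04f66b3086c5f4410742d64aea362ef3d21ade",
}

if __name__ == "__main__":
    # ASSUMED local directory name; point this at wherever the release was unpacked.
    for name, status in verify_release(Path("polibench-paper-v1.0.1"), EXPECTED).items():
        print(f"{name}: {status}")
```

Any file reported as `mismatch` differs byte-for-byte from the published artifact, which includes differences in line endings or a re-exported CSV.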

Limitations