Behavioral Calibration Observatory

Research instrument for AI behavioral observability infrastructure

Phase 1 Dimension Scores

Sorted by score, descending · 517 assessments

Mean Phase 1 alignment ratings across all providers. Scale: 0–100.

Learning Index Distribution

308 paired observations · LI = Phase 3 total / Phase 1 total

15 records (4.9%) show LI > 1.0 — improvement under observation
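The share quoted above follows from the two counts: 15 of 308 paired observations. Here is a minimal sketch of that arithmetic, assuming the LI values are available as a plain list of floats. The function name `share_above_one` and the sample list are hypothetical; the list is synthetic stand-in data that reproduces only the counts, not the Observatory's actual records.

```python
def share_above_one(li_values):
    """Return (count, percent) of Learning Index values strictly above 1.0."""
    if not li_values:
        raise ValueError("no paired observations")
    count = sum(1 for li in li_values if li > 1.0)
    return count, 100.0 * count / len(li_values)


# Synthetic placeholder values: 15 above 1.0 and 293 below, 308 in total.
sample = [1.05] * 15 + [0.90] * 293
count, percent = share_above_one(sample)
print(count, round(percent, 1))  # 15 4.9
```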

Provider Comparison

6 dimensions · 7 providers · ordered by Learning Index

The Gap We Measure

Phase 1 captures unobserved self-assessment: the baseline state before calibration pressure is applied. Phase 3 measures the same dimensions under sustained external observation. The Learning Index quantifies the gap between these states as the ratio of Phase 3 to Phase 1 scores.

Why Humility

Humility consistently scores lowest across all providers, a finding designated F-H1-CONFIRMED. This dimension, the capacity to acknowledge uncertainty and limitation, appears structurally more resistant to alignment interventions than task-oriented dimensions.

[arXiv:submit/7336774]

What Learning Index Means

LI = Phase 3 total / Phase 1 total. Values near 1.0 indicate stable calibration. Values above 1.0 suggest measurable improvement under observation. All LI claims are qualified as holding under clean, unanchored conditions (v5.3+).
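The definition above can be sketched in a few lines. This is an illustration under stated assumptions, not the Observatory's implementation: the function names, the `tolerance` band used for "near 1.0", and the example totals are all hypothetical, and the text does not say how values below 1.0 should be interpreted.

```python
def learning_index(phase1_total, phase3_total):
    """LI = Phase 3 total / Phase 1 total, as defined in the text."""
    if phase1_total <= 0:
        raise ValueError("Phase 1 total must be positive")
    return phase3_total / phase1_total


def describe(li, tolerance=0.05):
    """Map an LI value to the reading given in the text.

    `tolerance` is an assumed width for "near 1.0"; the source gives no threshold.
    """
    if abs(li - 1.0) <= tolerance:
        return "stable calibration"
    if li > 1.0:
        return "improvement under observation"
    return "below 1.0 (not characterized in the text)"


# Hypothetical totals, chosen only to exercise each branch.
for p1, p3 in [(400, 400), (400, 480), (400, 300)]:
    li = learning_index(p1, p3)
    print(f"{li:.2f} {describe(li)}")
```

Keeping the ratio and its interpretation in separate functions means the assumed threshold can be changed without touching the definition of LI itself.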