Research instrument for AI behavioral observability infrastructure
Sorted by score, descending · 517 assessments
Mean Phase 1 alignment ratings across all providers. Scale: 0–100.
308 paired observations · LI = Phase 3 / Phase 1
15 records (4.9%) show LI > 1.0, suggesting improvement under observation
6 dimensions · 7 providers · ordered by Learning Index
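Ordering dimensions by Learning Index requires an LI per dimension, which the captions do not define. The sketch below assumes a dimension's LI is its summed Phase 3 scores divided by its summed Phase 1 scores across all paired observations, mirroring the total-based LI definition. The record shape (a `phase1` and a `phase3` mapping from dimension name to a 0–100 score) and the function name `rank_dimensions_by_li` are hypothetical.

```python
from collections import defaultdict

# Hypothetical record shape: each paired observation maps dimension name to a
# 0-100 score for Phase 1 (unobserved) and Phase 3 (observed).
Observation = dict[str, dict[str, float]]


def rank_dimensions_by_li(observations: list[Observation]) -> list[tuple[str, float]]:
    """Order dimensions by Learning Index, highest first.

    Assumption: a dimension's LI is its summed Phase 3 scores divided by its
    summed Phase 1 scores across all paired observations.
    """
    phase1_totals = defaultdict(float)
    phase3_totals = defaultdict(float)
    for obs in observations:
        for dim, score in obs["phase1"].items():
            phase1_totals[dim] += score
        for dim, score in obs["phase3"].items():
            phase3_totals[dim] += score
    ranking = [
        (dim, phase3_totals[dim] / total)
        for dim, total in phase1_totals.items()
        if total > 0  # skip dimensions with no Phase 1 signal to avoid dividing by zero
    ]
    return sorted(ranking, key=lambda pair: pair[1], reverse=True)
```

Aggregating totals before dividing, rather than averaging per-observation ratios, keeps a few low Phase 1 scores from producing extreme ratios that dominate the ordering.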
Phase 1 captures unobserved self-assessment: the baseline state before calibration pressure is applied. Phase 3 measures the same dimensions under sustained external observation. The Learning Index expresses the gap between these two states as a ratio of Phase 3 to Phase 1 scores.
Humility consistently scores lowest across all providers, a finding designated F-H1-CONFIRMED. This dimension, the capacity to acknowledge uncertainty and limitation, appears more structurally resistant to alignment interventions than task-oriented dimensions.
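A claim that one dimension scores lowest "across all providers" can be checked provider by provider rather than only on the pooled mean. The sketch below assumes the comparison uses mean Phase 1 ratings (the ratings summarized in the provider chart) and takes a hypothetical flat list of `(provider, dimension, score)` rows; the function names are illustrative, not part of the instrument.

```python
from collections import defaultdict
from statistics import mean


def lowest_dimension_by_provider(ratings: list[tuple[str, str, float]]) -> dict[str, str]:
    """Map each provider to the dimension with its lowest mean Phase 1 rating.

    `ratings` is a hypothetical list of (provider, dimension, score) rows on
    the 0-100 scale. Ties resolve to the first dimension seen for that provider.
    """
    grouped: dict[str, dict[str, list[float]]] = defaultdict(lambda: defaultdict(list))
    for provider, dimension, score in ratings:
        grouped[provider][dimension].append(score)
    return {
        provider: min(dims, key=lambda d: mean(dims[d]))
        for provider, dims in grouped.items()
    }


def lowest_everywhere(ratings: list[tuple[str, str, float]], dimension: str) -> bool:
    """True if `dimension` has the lowest mean rating for every provider."""
    lows = lowest_dimension_by_provider(ratings)
    return bool(lows) and all(low == dimension for low in lows.values())
```

A per-provider check is stricter than comparing pooled means: a single provider where another dimension dips below humility makes `lowest_everywhere` return `False`.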
[arXiv:submit/7336774] LI = Phase 3 total / Phase 1 total. Values near 1.0 indicate stable calibration; values above 1.0 suggest measurable improvement under observation. All LI claims are stated in qualified language and apply to clean, unanchored conditions (v5.3+).
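The LI formula and the "LI > 1.0" count can be sketched as follows. The `PairedAssessment` shape (per-dimension 0–100 scores for each phase) is hypothetical, and the 0.05 tolerance used to decide what counts as "near 1.0" is an assumed cutoff, as is the "declined" label for values below that band; the source defines neither. The strict `> 1.0` test in `share_above_one` matches the improvement count in the summary captions.

```python
from dataclasses import dataclass


@dataclass
class PairedAssessment:
    # Hypothetical shape: per-dimension scores (0-100) from each phase.
    phase1: dict[str, float]
    phase3: dict[str, float]

    def learning_index(self) -> float:
        """LI = Phase 3 total / Phase 1 total."""
        total1 = sum(self.phase1.values())
        if total1 == 0:
            raise ValueError("Phase 1 total is zero; LI is undefined")
        return sum(self.phase3.values()) / total1


def classify(li: float, tolerance: float = 0.05) -> str:
    """Label an LI value. The tolerance band for 'near 1.0' is an assumption."""
    if abs(li - 1.0) <= tolerance:
        return "stable"
    return "improved" if li > 1.0 else "declined"


def share_above_one(assessments: list[PairedAssessment]) -> tuple[int, float]:
    """Count records with LI strictly above 1.0 and their share of all records."""
    if not assessments:
        return 0, 0.0
    above = sum(1 for a in assessments if a.learning_index() > 1.0)
    return above, above / len(assessments)
```

With the reported figures, 15 records out of 308 paired observations gives 15 / 308 ≈ 0.0487, which rounds to the 4.9% shown in the summary.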