Models That Know How Evaluations Are Designed Score Safer

Published in arXiv preprint 2026, 2026