Benchmarks are broken. Here's why frontier labs treat them as PR.
surgehq.ai

Superintelligence teams rely on human evaluations. Humans can measure nuance, creativity, and wisdom – things benchmarks can’t.