Engineering

We often hear that AI developer teams delay creating evals because they believe that only large evals with hundreds of test cases are useful. However, it’s best to start with small-scale testing right away with a few examples, rather than delaying until you can build more thorough evals.

anthropic.com • How We Built Our Multi-Agent Research System

It turns out that we were almost there with the original Rainbow Deploy idea. The key was simple: instead of using fixed colors, we used git hashes. Instead of a Deployment called chat-olark-com-$COLOR we deploy chat-olark-com-$SHA . As a bonus, since the first six characters of a git sha are also a valid hex color, the name still makes sense. You... See more

anthropic.com • How We Built Our Multi-Agent Research System

Rainbow Deploys with Kubernetes | Brandon Dimcheff