What exactly are evals?
Evals are how you measure the quality and effectiveness of your AI system. They act like regression tests or benchmarks, clearly defining what “good” actually looks like for your AI product beyond the kind of simple latency or pass/fail checks you’d usually use for software.
Evaluating AI systems is less like traditional software…
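To make the regression-test analogy concrete, here is a minimal sketch of an eval harness in Python. The `model` function is a hypothetical stand-in for a real LLM call, and exact-match grading is only the simplest possible rubric; real evals often use fuzzier scorers or LLM-as-judge.

```python
def model(prompt: str) -> str:
    # Hypothetical stand-in for a real LLM call, assumed for illustration.
    return "Paris" if "France" in prompt else "unknown"

def exact_match(output: str, expected: str) -> float:
    # Simplest possible grader: 1.0 if the answer matches, else 0.0.
    return 1.0 if output.strip().lower() == expected.lower() else 0.0

# A tiny eval dataset: each case defines what "good" looks like.
dataset = [
    {"prompt": "What is the capital of France?", "expected": "Paris"},
    {"prompt": "What is the capital of Peru?", "expected": "Lima"},
]

# Run the model over every case and aggregate, like a test suite.
scores = [exact_match(model(case["prompt"]), case["expected"]) for case in dataset]
pass_rate = sum(scores) / len(scores)
print(f"pass rate: {pass_rate:.0%}")
```

Tracking this pass rate across model or prompt changes is what turns a one-off check into a regression test for AI behavior.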