AgentBench: Evaluating LLMs as Agents
Introduces AgentBench, a benchmark for evaluating Large Language Models (LLMs) as agents in interactive environments, and highlights the performance gap between API-based and open-source models.
arxiv.org