AgentBench: Evaluating LLMs as Agents


A paper evaluating Large Language Models (LLMs) as agents in interactive environments. It introduces the AgentBench benchmark and highlights the performance gap between API-based and open-source models.

arxiv.org


Xiao Liu et al.