AgentBench: Evaluating LLMs as Agents

Evaluating Large Language Models (LLMs) as agents in interactive environments, highlighting the performance gap between API-based and open-source models, and introducing the AgentBench benchmark.

added by Darren LI · updated 1y ago

  • LLM Powered Autonomous Agents

    by Lilian Weng

    1 highlight

    Darren LI added

  • LLM agents: the next platform shift in B2B software

    by Chris Rainville

    Nicolay Gerold added

  • A Survey on Large Language Model based Autonomous Agents

    A comprehensive survey on large language model (LLM)-based autonomous agents, including their architecture design, application domains, and evaluation strategies.

    by Chen Ma

    Darren LI added