AgentBench: Evaluating LLMs as Agents

This paper introduces AgentBench, a benchmark for evaluating Large Language Models (LLMs) as agents in interactive environments, and highlights the performance gap between API-based and open-source models.
