AgentBench: Evaluating LLMs as Agents

Introduces AgentBench, a benchmark for evaluating Large Language Models (LLMs) as agents in interactive environments, and highlights the performance gap between API-based and open-source models.

arxiv.org

[12112024, Latent Space @ NeurIPS] Reasoning

docs.google.com

OASIS | Open Agents Social Interaction Simulations on One Million Agents

oasis.camel-ai.org

GitHub - OpenBMB/XAgent: An Autonomous LLM Agent for Complex Task Solving

github.com

Large language models are proficient in solving and creating emotional intelligence tests - Communications Psychology

Nils R. Sommer
nature.com

Reasoning models - OpenAI API

platform.openai.com