AgentBench: Evaluating LLMs as Agents

This paper introduces AgentBench, a benchmark for evaluating Large Language Models (LLMs) as agents in interactive environments, and highlights the performance gap between API-based and open-source models.
