AgentBench: Evaluating LLMs as Agents

Evaluates large language models (LLMs) as agents in interactive environments, introduces the AgentBench benchmark, and highlights the performance gap between API-based and open-source models.

arxiv.org

Saved by Darren LI

A Survey on Large Language Model based Autonomous Agents

The paper surveys large language model-based autonomous agents, discussing their construction, applications across various domains, and evaluation strategies, while proposing a unified framework and identifying future research directions.

arxiv.org

LLM Powered Autonomous Agents

Lilian Weng

lilianweng.github.io

Autonomous Agents & Agent Simulations

LangChain

blog.langchain.dev

A practical guide to building agents

Guide to building AI agents using large language models, covering agent definition, use case selection, design components, single/multi-agent orchestration, tool integration, instruction setup, safety guardrails, and deployment best practices.

cdn.openai.com

How to think about agent frameworks

blog.langchain.dev

LLMs Are Not All You Need

James Briggs, Pinecone