AgentBench: Evaluating LLMs as Agents

Evaluates large language models (LLMs) as agents in interactive environments, highlights the performance gap between API-based and open-source models, and introduces the AgentBench benchmark.

arxiv.org

Saved by Darren LI

A Survey on Large Language Model based Autonomous Agents

The paper surveys large language model-based autonomous agents, discussing their construction, applications across various domains, and evaluation strategies, while proposing a unified framework and identifying future research directions.

arxiv.org

LLMs Are Not All You Need

James Briggs, Pinecone

LLM Powered Autonomous Agents

Lilian Weng, lilianweng.github.io

Autonomous Agents & Agent Simulations

LangChain, blog.langchain.dev

A practical guide to building agents

Guide to building AI agents using large language models, covering agent definition, use case selection, design components, single/multi-agent orchestration, tool integration, instruction setup, safety guardrails, and deployment best practices.

cdn.openai.com