AgentBench: Evaluating LLMs as Agents

Introduces AgentBench, a benchmark for evaluating Large Language Models (LLMs) as agents in interactive environments, and highlights the performance gap between API-based and open-source models.

arxiv.org

Saved by Darren LI

A Survey on Large Language Model based Autonomous Agents

A comprehensive survey on large language model (LLM)-based autonomous agents, including their architecture design, application domains, and evaluation strategies.

arxiv.org

Darren LI added

LLM Powered Autonomous Agents

Lilian Weng • lilianweng.github.io

Darren LI added

OpenBMB • GitHub - OpenBMB/XAgent: An Autonomous LLM Agent for Complex Task Solving

James Briggs • LLMs Are Not All You Need | Pinecone

Nicolay Gerold added

Shortwave - rajhesh.panchanadhan@gmail.com [Gmail alternative]

How Far Are Large Language Models from Agents with Theory-of-Mind?

A study examining whether large language models can go beyond understanding mental states and use that understanding effectively when making decisions and taking actions in social scenarios.

arxiv.org

r/singularity - Reddit

Nicolay Gerold added

Zhaofeng Wu • Reasoning skills of large language models are often overestimated