AgentBench: Evaluating LLMs as Agents
Introduces AgentBench, a benchmark for evaluating Large Language Models (LLMs) as agents in interactive environments, and highlights the performance gap between API-based and open-source models.
arxiv.org