AgentBench: Evaluating LLMs as Agents

Introduces AgentBench, a benchmark for evaluating Large Language Models (LLMs) as agents in interactive environments, and highlights the performance gap between API-based and open-source models.

arxiv.org

[12112024, Latent Space @ NeurIPS] Reasoning

docs.google.com

OASIS | Open Agents Social Interaction Simulations on One Million Agents

oasis.camel-ai.org

GitHub - OpenBMB/XAgent: An Autonomous LLM Agent for Complex Task Solving

github.com

Large language models are proficient in solving and creating emotional intelligence tests - Communications Psychology

Nils R. Sommer
nature.com

Reasoning models - OpenAI API

platform.openai.com