Autonomous agents

Autonomous Agents Directory

AgentBench: Evaluating LLMs as Agents

Evaluates large language models (LLMs) as agents in interactive environments, highlights the performance gap between API-based and open-source models, and introduces the AgentBench benchmark.

arxiv.org

Darren LI


Main OpenAI dev day takeaways: (1) Existential crisis for AI wrappers and middleware. (2) Opportunity for people with good taste in UI and can figure out distribution creatively. Launching something useful has never been easier. The likelihood of...

Alex Ker 🔭 (x.com)

phoebe