GitHub - THUDM/AgentTuning: AgentTuning: Enabling Generalized Agent Abilities for LLMs

Related Highlights

AgentBench: Evaluating LLMs as Agents

Xiao Liu • AgentBench: Evaluating LLMs as Agents

Our extensive test over 25 LLMs (including APIs and open-sourced models) shows that, while top commercial LLMs present a strong ability of acting as agents in complex environments, there is a significant disparity in performance between them and open-sourced competitors.

Darren LI added

AutoGen's design offers multiple advantages: a) it gracefully navigates the strong but imperfect generation and reasoning abilities of these LLMs; b) it leverages human understanding and intelligence, while providing valuable automation through conversations between agents; c) it simplifies and unifies the implementation of complex LLM workflows as...

r/singularity - Reddit

Nicolay Gerold added

Chaining LLM agents instead of LLM calls. It seems like a pretty heavy prompt-engineering effort. They are pushing for agents that are specialized in certain tasks through RAG / fine-tuning, which is where CAMEL and other frameworks failed. One interesting area for exploration might be fine-tuning LLMs for collaboration before fine-tuning them for tasks.
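Chaining agents rather than raw LLM calls means each step is an agent with its own role (and possibly its own RAG setup or fine-tuned model) whose output becomes the next agent's input. A minimal sketch, assuming stubbed backends in place of real models; `Agent`, `run_chain`, and the prompt layout are hypothetical and are not the API of CAMEL, AutoGen, or any other framework.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical backend: takes a prompt string and returns a completion.
# In practice this would wrap an LLM (possibly RAG-augmented or fine-tuned).
Backend = Callable[[str], str]


@dataclass
class Agent:
    name: str
    role: str          # instruction that specializes this agent
    backend: Backend

    def act(self, task: str) -> str:
        # Role on the first line, task on the rest (an assumed prompt layout).
        return self.backend(f"[{self.role}]\n{task}")


def run_chain(agents: List[Agent], task: str) -> List[str]:
    """Feed each agent's output to the next agent; return the transcript."""
    transcript: List[str] = []
    current = task
    for agent in agents:
        current = agent.act(current)
        transcript.append(f"{agent.name}: {current}")
    return transcript


# Stub backends standing in for real models.
def planner_model(prompt: str) -> str:
    task = prompt.split("\n", 1)[1]
    return f"plan for '{task}'"


def writer_model(prompt: str) -> str:
    plan = prompt.split("\n", 1)[1]
    return f"draft based on {plan}"


chain = [
    Agent("planner", "Break the task into steps.", planner_model),
    Agent("writer", "Write the answer following the plan.", writer_model),
]
for line in run_chain(chain, "summarize AgentBench"):
    print(line)
```

Returning the whole transcript rather than only the final answer makes it easier to see which agent in the chain went wrong.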

SteerLM leverages a supervised fine-tuning method that empowers you to control responses during inference. It overcomes the limitations of prior alignment techniques and consists of four key steps:

Train an attribute prediction model on human-annotated datasets to evaluate response quality on any number of attributes like helpfulness, humor, and creativity.

Yi Dong, Zhilin Wang • NVIDIA Technical Blog

Nicolay Gerold added
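The SteerLM highlight describes an attribute prediction model that scores responses on attributes such as helpfulness, humor, and creativity. One way such scores could support control at inference is to attach them to each training example, so the model learns to follow attribute values the user later sets by hand. A minimal sketch, assuming a toy keyword heuristic in place of a trained predictor; `predict_attributes`, `steer_prompt`, the 0-4 scale, and the `<attributes>` tag format are all hypothetical, not NVIDIA's actual implementation.

```python
from typing import Dict, Tuple

# Attributes named in the SteerLM highlight; the 0-4 scale is an assumption.
ATTRIBUTES = ["helpfulness", "humor", "creativity"]


def predict_attributes(prompt: str, response: str) -> Dict[str, int]:
    """Hypothetical stand-in for a trained attribute prediction model.

    A real predictor would be trained on human-annotated data and would look at
    both prompt and response; this toy version scores the response with crude
    keyword heuristics only, so `prompt` is unused here.
    """
    text = response.lower()
    return {
        "helpfulness": min(4, len(response.split()) // 5),
        "humor": 4 if "joke" in text else 0,
        "creativity": 2 if "imagine" in text else 0,
    }


def steer_prompt(prompt: str, attrs: Dict[str, int]) -> str:
    """Append attribute values to a prompt in a fixed, hypothetical format."""
    tags = ",".join(f"{a}:{attrs[a]}" for a in ATTRIBUTES)
    return f"{prompt}\n<attributes>{tags}</attributes>"


def build_training_pair(prompt: str, response: str) -> Tuple[str, str]:
    """Label a (prompt, response) pair with predicted attributes for SFT."""
    return steer_prompt(prompt, predict_attributes(prompt, response)), response


# Training time: the model sees predicted attributes next to each response.
src, tgt = build_training_pair("Tell me about cats.", "Here is a joke about cats.")
print(src)

# Inference time: the user chooses the attribute values instead.
print(steer_prompt("Tell me about cats.", {"helpfulness": 4, "humor": 0, "creativity": 3}))
```

Routing both training and inference through the same `steer_prompt` helper keeps the two prompt formats identical, which the conditioning idea depends on.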

A couple off the top of my head:

LLM in the loop with preference optimization

synthetic data generation

cross-modality "distillation" / dictionary remapping

constrained decoding

r/MachineLearning - Reddit

Nicolay Gerold added
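Of the paradigms in the Reddit comment, constrained decoding is the most mechanical: at each step, tokens that would violate the constraint get their logit set to negative infinity before the next token is chosen. A minimal greedy sketch, assuming a toy scorer in place of a real LLM and a short list of allowed token sequences standing in for a grammar or regex; `constrained_greedy`, `allowed_next`, and the `<eos>` token are hypothetical names.

```python
from typing import Callable, Dict, List, Sequence, Set

# A scorer maps the tokens generated so far to a logit per vocabulary token.
# In a real system these logits come from the LLM; here they are assumed.
Scorer = Callable[[Sequence[str]], Dict[str, float]]
EOS = "<eos>"


def allowed_next(prefix: Sequence[str], allowed: List[List[str]]) -> Set[str]:
    """Tokens that keep `prefix` on the path to at least one allowed sequence."""
    n = len(prefix)
    return {seq[n] for seq in allowed
            if len(seq) > n and list(seq[:n]) == list(prefix)}


def constrained_greedy(score: Scorer, allowed: List[List[str]],
                       max_len: int = 16) -> List[str]:
    """Greedy decoding where disallowed tokens get a logit of -inf."""
    out: List[str] = []
    for _ in range(max_len):
        ok = allowed_next(out, allowed)
        if not ok:
            break  # no allowed continuation; stop rather than emit garbage
        logits = score(out)
        masked = {tok: (val if tok in ok else float("-inf"))
                  for tok, val in logits.items()}
        best = max(masked, key=masked.get)
        if masked[best] == float("-inf") or best == EOS:
            break  # nothing allowed is in the vocabulary, or sequence is done
        out.append(best)
    return out


def toy_scores(prefix: Sequence[str]) -> Dict[str, float]:
    # Unconstrained, this toy model would always pick "maybe".
    return {"yes": 1.0, "no": 2.0, "maybe": 5.0, EOS: 0.5}


allowed = [["yes", EOS], ["no", EOS]]
print(constrained_greedy(toy_scores, allowed))  # → ['no']
```

The model's favorite token, "maybe", is never emitted because the mask removes it; the highest-scoring allowed token wins instead.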

Additional LLM paradigms beyond RAG

AgentBench: Evaluating LLMs as Agents

Evaluating Large Language Models (LLMs) as agents in interactive environments, highlighting the performance gap between API-based and open-source models, and introducing the AgentBench benchmark.

arxiv.org

Darren LI added

Fine-Tuning Large Language Models with Sequential Instructions

ar5iv.labs.arxiv.org

Ayoola John added

LLM Powered Autonomous Agents

Lilian Weng • lilianweng.github.io

Darren LI and added