GitHub - sqrkl/lm-evaluation-harness: A framework for few-shot evaluation of language models.

GitHub - sqrkl/lm-evaluation-harness: A framework for few-shot evaluation of language models.

github.com
Thumbnail of GitHub - sqrkl/lm-evaluation-harness: A framework for few-shot evaluation of language models.
Eugene Yanx.com

Xiao Liu • AgentBench: Evaluating LLMs as Agents