GitHub - sqrkl/lm-evaluation-harness: A framework for few-shot evaluation of language models.
promptfoo is a tool for testing and evaluating LLM output quality.
With promptfoo, you can:
Systematically test prompts & models against predefined test cases
Evaluate quality and catch regressions by comparing LLM outputs side-by-side
Speed up evaluations with caching and concurrency
Score outputs automatically by defining test cases
Use as a CLI, a library, or in CI/CD (see the library-usage sketch below)
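A minimal sketch of the library usage, assuming promptfoo's Node API and its `evaluate` entry point; the prompts, the `openai:gpt-4o-mini` provider id, and the assertion values are illustrative, not taken from the entry above.

```typescript
import promptfoo from 'promptfoo';

// Two prompt variants are tested against one provider; the test case
// supplies template variables and an automatic assertion, so outputs
// are scored without manual review.
const results = await promptfoo.evaluate({
  prompts: [
    'Summarize in one sentence: {{text}}',
    'Summarize for a five-year-old: {{text}}',
  ],
  providers: ['openai:gpt-4o-mini'], // assumed provider id
  tests: [
    {
      vars: { text: 'Photosynthesis converts sunlight into chemical energy.' },
      // Case-insensitive substring check; a missing match fails the case,
      // which is how regressions are caught automatically.
      assert: [{ type: 'icontains', value: 'sunlight' }],
    },
  ],
});

console.log(results);
```

The same test suite can instead be declared in a `promptfooconfig.yaml` and run with the `promptfoo eval` CLI, which is the usual route to the caching, concurrency, and side-by-side comparison features listed above.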