Creating test suites - bench documentation

"My benchmark for large language models"
https://t.co/YZBuwpL0tl
Nice post, but even more than the 100 tests specifically, the GitHub code looks excellent: a full-featured test evaluation framework, easy to extend with further tests and run against many...
GitHub - arthur-ai/bench: A tool for evaluating LLMs
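As a rough illustration of what that framework looks like in use, here is a minimal sketch based on bench's quickstart. The suite name, run name, and example data are placeholders, and the exact import path and scorer names are assumptions that may differ between versions.

```python
from arthur_bench.run.testsuite import TestSuite

# Define a test suite: inputs plus expected reference outputs,
# scored here with exact string matching (other scorers exist).
suite = TestSuite(
    "bench_quickstart",   # suite name (placeholder)
    "exact_match",        # scoring method (assumed available)
    input_text_list=["What year was FDR elected?", "What is the opposite of down?"],
    reference_output_list=["1932", "up"],
)

# Score a set of candidate model outputs against the references.
suite.run(
    "quickstart_run",     # run name (placeholder)
    candidate_output_list=["1932", "up is the opposite of down"],
)
```

Additional test cases can be added by extending the input and reference lists, and further runs against the same suite let you compare candidate models side by side.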
Take a look at our official page for user documentation and examples: langtest.org
Key Features
- Generate and execute more than 50 distinct types of tests with only one line of code (see the sketch after this list)
- Test all aspects of model quality: robustness, bias, representation, fairness and accuracy.
- Automatically augment training data based on test results (for select models)
- Support for popular NLP frameworks and LLM providers
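As a hedged sketch of that one-line workflow, the snippet below assumes langtest's Harness API wrapping a Hugging Face NER model; the model name is illustrative and the exact constructor arguments vary across langtest versions.

```python
from langtest import Harness

# Build a test harness around an existing model (model name is illustrative).
harness = Harness(
    task="ner",
    model={"model": "dslim/bert-base-NER", "hub": "huggingface"},
)

# Generate the default test cases, run them, and summarize pass/fail rates.
harness.generate().run().report()
```

In supported versions, the training-data augmentation mentioned above is exposed through a method on the same harness object once the report has identified failing tests.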