Creating test suites - bench documentation
Key Features
- Generate and execute more than 50 distinct types of tests with just one line of code
- Test all aspects of model quality: robustness, bias, representation, fairness and accuracy.
- Automatically augment training data based on test results (for select models)
- Sup
GitHub - BrunoScaglione/langtest: Deliver safe & effective language models
Nicolay Gerold added
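To make the idea of a robustness test from the feature list above concrete, here is a minimal sketch in plain Python. It does not use langtest's API. The toy model, the perturbation set, and the function names are all hypothetical and chosen only for illustration. The sketch assumes a robustness test means: apply a meaning-preserving change to the input and check whether the prediction stays the same.

```python
from typing import Callable, Dict, List


def toy_sentiment_model(text: str) -> str:
    """Hypothetical stand-in model: a case-sensitive keyword lookup."""
    return "positive" if "good" in text else "negative"


# Perturbations that should not change the meaning of the input (illustrative set).
PERTURBATIONS: Dict[str, Callable[[str], str]] = {
    "uppercase": str.upper,
    "lowercase": str.lower,
    "strip_punctuation": lambda s: "".join(
        c for c in s if c.isalnum() or c.isspace()
    ),
}


def robustness_report(
    model: Callable[[str], str], samples: List[str]
) -> Dict[str, float]:
    """Return, per perturbation, the share of samples whose prediction is unchanged."""
    if not samples:
        raise ValueError("samples must not be empty")
    report = {}
    for name, perturb in PERTURBATIONS.items():
        # A sample passes when the prediction on the perturbed text matches the original.
        passed = sum(model(s) == model(perturb(s)) for s in samples)
        report[name] = passed / len(samples)
    return report


samples = ["The service was good.", "The food was bad."]
report = robustness_report(toy_sentiment_model, samples)
print(report)
```

A pass rate below 1.0 for a perturbation (here, uppercasing breaks the keyword match) points to inputs the model handles inconsistently.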
The goal of a benchmark usability test is to describe how usable an application is relative to a set of benchmark goals.
Jeff Sauro • Quantifying the User Experience: Practical Statistics for User Research
You want to ensure that you run your tests for at least a hundred conversions on each variation, but the exact number might be unique to your own website.
Alex Harris • Small Business Big Money Online: A Proven System to Optimize eCommerce Websites and Increase Internet Profits
Test systems at production scale:
Amazon Web Services • AWS Well-Architected Framework (AWS Whitepaper)
Objective #3 (20%): Execute at least 15 unique tests during the fiscal year. Document what was learned in each test, and share results with the executive team.
Kevin Hillstrom • Hillstrom's Email Marketing Excellence
The size of your test will be constrained by the traffic to your landing page and its data rate (the number of conversion actions per unit time). Changing the granularity of your tests allows you to include all or most of your important ideas while still fitting into a reasonable test size.
Maura Ginty • Landing Page Optimization: The Definitive Guide to Testing and Tuning for Conversions
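The relationship between test size and data rate can be sketched as simple arithmetic. The example below is an illustration only: the function name and parameters are hypothetical, the 100-conversions-per-variation threshold is taken from the Alex Harris quote above, and it assumes conversions are split evenly across variations and arrive at a constant daily rate.

```python
import math


def estimated_test_days(
    variations: int,
    conversions_per_variation: int,
    conversions_per_day: float,
) -> int:
    """Estimate how many days a test needs to collect enough conversions.

    Assumes an even split across variations and a constant data rate.
    """
    if variations < 1 or conversions_per_variation < 1 or conversions_per_day <= 0:
        raise ValueError("all inputs must be positive")
    total_needed = variations * conversions_per_variation
    return math.ceil(total_needed / conversions_per_day)


# A simple A/B test versus a finer-grained test with 8 variations,
# both on a page that produces 20 conversions per day.
ab_days = estimated_test_days(2, 100, 20)
fine_days = estimated_test_days(8, 100, 20)
print(ab_days, fine_days)
```

Under these assumptions, each added variation lengthens the test in proportion, which is why coarser granularity keeps a test within a reasonable size on a low-traffic page.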
New updates and features include:
- Internal refactoring
- Config-based task creation and configuration
- Easier import and sharing of externally-defined task config YAMLs
- Support for Jinja2 prompt design, easy modification of prompts + prompt imports from Promptsource
- More advanced configuration opt
GitHub - sqrkl/lm-evaluation-harness: A framework for few-shot evaluation of language models.
Nicolay Gerold added