GitHub - confident-ai/deepeval: The LLM Evaluation Framework
github.com
Related
Take a look at our official page for user documentation and examples:
langtest.org
Key Features
Generate and execute more than 50 distinct types of tests with only 1 line of code (see the sketch below this card)
Test all aspects of model quality: robustness, bias, representation, fairness and accuracy.
Automatically augment training data based on test results (for select models)
...
GitHub - BrunoScaglione/langtest: Deliver safe & effective language models
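The "1 line of code" feature above refers to langtest's chained Harness workflow. The following is a minimal, hedged sketch assuming the Harness API described in the langtest README; the task, model name, and hub values here are illustrative assumptions, not taken from this page.

    from langtest import Harness  # assumes the langtest package is installed

    # Assumed example: wrap a Hugging Face NER model in a test Harness.
    harness = Harness(
        task="ner",
        model={"model": "dslim/bert-base-NER", "hub": "huggingface"},
    )

    # Generate test cases, run them, and print a pass/fail report
    # in one chained call, matching the one-liner feature above.
    harness.generate().run().report()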
Open LLM Leaderboard - a Hugging Face Space by open-llm-leaderboard
huggingface.co
Laminar
lmnr.ai
AgentBench: Evaluating LLMs as Agents
Evaluating Large Language Models (LLMs) as agents in interactive environments, highlighting the performance gap between API-based and open-source models, and introducing the AgentBench benchmark.
arxiv.org
LLM evaluation framework https://t.co/2KZiZyMz9I
Tom Dörr
x.com