LLM Evaluation

A large variety of ready-to-use LLM evaluation metrics (all with explanations), powered by ANY LLM of your choice, statistical methods, or NLP models that run locally on your machine:

G-Eval

Summarization

Answer Relevancy

Faithfulness

Contextual Recall

Contextual Precision

RAGAS

Hallucination

Toxicity

Bias

etc.
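
To make the "score plus explanation" shape of these metrics concrete, here is a minimal sketch of a purely statistical metric in plain Python. It is not deepeval's implementation or API: the names `MetricResult` and `contextual_recall_overlap`, the token-overlap heuristic, and the default threshold are all assumptions made for illustration.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class MetricResult:
    """Hypothetical result type: a score, a pass/fail flag, and an explanation."""
    score: float   # fraction of expected tokens found in the context, in [0, 1]
    passed: bool   # True when score >= threshold
    reason: str    # human-readable explanation of the score


def _tokens(text: str) -> set[str]:
    # Lowercase and split into alphanumeric tokens.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def contextual_recall_overlap(
    expected_output: str,
    retrieval_context: list[str],
    threshold: float = 0.5,  # assumed default, chosen for the example only
) -> MetricResult:
    """Toy recall: share of expected-output tokens that appear in the retrieved context."""
    expected = _tokens(expected_output)
    if not expected:
        return MetricResult(0.0, False, "Expected output contains no tokens.")

    context: set[str] = set()
    for chunk in retrieval_context:
        context |= _tokens(chunk)

    missing = sorted(expected - context)
    found = len(expected) - len(missing)
    score = found / len(expected)
    reason = (
        f"{found}/{len(expected)} expected tokens found in context; "
        f"missing: {', '.join(missing) or 'none'}"
    )
    return MetricResult(score, score >= threshold, reason)


if __name__ == "__main__":
    result = contextual_recall_overlap(
        "Paris is the capital of France",
        ["France's capital city is Paris."],
    )
    print(result.score, result.passed, result.reason)
```

An LLM-powered metric would replace the token-overlap step with a model judgment, but the returned structure (score, verdict, reason) would look much the same.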

GitHub - confident-ai/deepeval: The LLM Evaluation Framework

Take a look at our official page for user documentation and examples: langtest.org

Key Features

Generate and execute more than 50 distinct types of tests with only one line of code

Test all aspects of model quality: robustness, bias, representation, fairness and accuracy.

Automatically augment training data based on test results (for select models)
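
The robustness category listed above can be pictured as: perturb each input, rerun the model, and check whether the prediction changes. The following is a minimal sketch of that idea in plain Python. It does not use LangTest's API; `toy_sentiment_model`, the perturbation names, and `run_robustness_tests` are hypothetical and exist only for this example.

```python
from typing import Callable, Dict, List

Perturbation = Callable[[str], str]

# Hypothetical perturbations; a real test suite would offer many more.
PERTURBATIONS: Dict[str, Perturbation] = {
    "uppercase": str.upper,
    "lowercase": str.lower,
    "strip_punctuation": lambda s: "".join(
        ch for ch in s if ch.isalnum() or ch.isspace()
    ),
    "add_trailing_whitespace": lambda s: s + "  ",
}


def toy_sentiment_model(text: str) -> str:
    # Deliberately case-sensitive stand-in model, so the uppercase test can fail.
    return "positive" if "good" in text else "negative"


def run_robustness_tests(
    model: Callable[[str], str],
    samples: List[str],
    perturbations: Dict[str, Perturbation] = PERTURBATIONS,
) -> Dict[str, float]:
    """Return, per perturbation, the fraction of samples whose prediction is unchanged."""
    if not samples:
        raise ValueError("samples must not be empty")
    report: Dict[str, float] = {}
    for name, perturb in perturbations.items():
        unchanged = sum(model(s) == model(perturb(s)) for s in samples)
        report[name] = unchanged / len(samples)
    return report


if __name__ == "__main__":
    samples = ["The food was good.", "Terrible service!"]
    for test_name, pass_rate in run_robustness_tests(toy_sentiment_model, samples).items():
        print(f"{test_name}: {pass_rate:.0%} of predictions unchanged")
```

A low pass rate for a perturbation points to the kind of input where augmented training data is most likely to help.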

GitHub - BrunoScaglione/langtest: Deliver safe & effective language models