GitHub - sqrkl/lm-evaluation-harness: A framework for few-shot evaluation of language models.



  • Large variety of ready-to-use LLM evaluation metrics (all with explanations) powered by ANY LLM of your choice, statistical methods, or NLP models that run locally on your machine (a usage sketch follows this list):
    • G-Eval

    • Summarization

    • Answer Relevancy

    • Faithfulness

    • Contextual Recall

    • Contextual Precision

    • RAGAS

    • Hallucination

    • Toxicity

    • Bias

    • etc.
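
    Since these metrics come from DeepEval, here is a minimal sketch of how one of them might be invoked. It assumes DeepEval's documented AnswerRelevancyMetric, LLMTestCase, and evaluate entry points, plus a configured LLM backend (e.g. an OpenAI API key); the example strings are illustrative placeholders, not part of the quoted README excerpt.

    ```python
    # Minimal sketch: scoring answer relevancy with DeepEval.
    # Assumes `pip install deepeval` and an LLM backend configured
    # (e.g. OPENAI_API_KEY set in the environment).
    from deepeval import evaluate
    from deepeval.metrics import AnswerRelevancyMetric
    from deepeval.test_case import LLMTestCase

    # A single test case: the model's answer plus the retrieval context it used.
    test_case = LLMTestCase(
        input="What if these shoes don't fit?",
        actual_output="We offer a 30-day full refund at no extra cost.",
        retrieval_context=[
            "All customers are eligible for a 30-day full refund at no extra cost."
        ],
    )

    # Answer relevancy judges how well actual_output addresses the input.
    metric = AnswerRelevancyMetric(threshold=0.7)

    # Runs the metric against the test case and prints a pass/fail report.
    evaluate(test_cases=[test_case], metrics=[metric])
    ```

    The same pattern applies to the other metrics listed above (Faithfulness, Contextual Recall, Bias, etc.): construct an LLMTestCase, pick the metric class, and pass both to evaluate.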

from GitHub - confident-ai/deepeval: The LLM Evaluation Framework

Nicolay Gerold added

  • from GitHub - arthur-ai/bench: A tool for evaluating LLMs

    BA Builder added

  • from Testing framework for LLM Part

    Nicolay Gerold added