• Large variety of ready-to-use LLM evaluation metrics (all with explanations) powered by ANY LLM of your choice, statistical methods, or NLP models that run locally on your machine:

    • G-Eval
    • Summarization
    • Answer Relevancy
    • Faithfulness
    • Contextual Recall
    • Contextual Precision
    • RAGAS
    • Hallucination
    • Toxicity
    • Bias
    • etc.

from GitHub - confident-ai/deepeval: The LLM Evaluation Framework

Nicolay Gerold added 4mo ago
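
To make "statistical methods" and "all with explanations" concrete, here is a minimal sketch of a metric that returns both a score and a reason. It is not DeepEval's implementation or API: `MetricResult` and `token_overlap_recall` are hypothetical names, and plain token overlap is an assumed stand-in for a real contextual-recall metric.

```python
from dataclasses import dataclass


@dataclass
class MetricResult:
    score: float  # fraction in [0, 1]
    reason: str   # human-readable explanation of the score


def token_overlap_recall(expected: str, context: str) -> MetricResult:
    """Toy recall: share of expected-answer tokens that also occur in the context."""
    expected_tokens = set(expected.lower().split())
    context_tokens = set(context.lower().split())
    if not expected_tokens:
        return MetricResult(0.0, "expected answer is empty")

    hits = expected_tokens & context_tokens
    missing = sorted(expected_tokens - context_tokens)
    score = len(hits) / len(expected_tokens)

    reason = f"{len(hits)} of {len(expected_tokens)} expected tokens appear in the context"
    if missing:
        reason += f"; missing: {', '.join(missing)}"
    return MetricResult(score, reason)


# Hypothetical example data, made up for illustration.
result = token_overlap_recall(
    expected="refunds are accepted within 30 days",
    context="Our policy: refunds are accepted within 30 days of purchase.",
)
print(result.score, "|", result.reason)
```

An LLM-powered metric would keep the same shape (score plus reason) but ask a model to judge the pair instead of counting tokens.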

  • from Shortwave — rajhesh.panchanadhan@gmail.com [Gmail alternative]

    Nicolay Gerold added 10mo ago

  • from langchain-ai/opengpts by langchain-ai

    Nicolay Gerold added 1y ago

  • from DeepSpeed-FastGen by microsoft

    Nicolay Gerold added 1y ago

  • In some applications, such as inline code suggestions, the best AI models are too expensive, so tools like GitHub Copilot use carefully tuned smaller models and various search heuristics to provide results. In other applications, even the largest models, like GPT-4, are too cheap!

  • from The Shift From Models to Compound AI Systems by Matei Zaharia, Omar Khattab, Lingjiao Chen, et al.

    Nicolay Gerold added 7mo ago

  • from The Q* hypothesis: Tree-of-thoughts reasoning, process reward models, and supercharging synthetic data by Nathan Lambert

    Nicolay Gerold added 10mo ago

  • Systems can be dynamic. Machine learning models are inherently limited because they are trained on static datasets, so their “knowledge” is fixed. Therefore, developers need to combine models with other components, such as search and retrieval, to incorporate timely data. In addition, training lets a model “see” the whole training set, so more com...

    from The Shift From Models to Compound AI Systems by Matei Zaharia, Omar Khattab, Lingjiao Chen, et al.

    Nicolay Gerold added 7mo ago
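
The idea of combining a model with retrieval to bring in timely data can be sketched roughly as below. Everything here is an assumption for illustration: `DOCUMENTS`, `retrieve`, `answer`, and `echo_model` are hypothetical names, word overlap stands in for a real search component, and the "model" is a placeholder callable rather than an actual LLM.

```python
from typing import Callable

# Hypothetical in-memory document store; a real system would query a search index.
DOCUMENTS = [
    "2024-05-01: The office is closed on Friday for maintenance.",
    "2023-11-12: The cafeteria menu now includes vegan options.",
]


def retrieve(query: str, documents: list[str], k: int = 1) -> list[str]:
    """Rank documents by the number of words they share with the query."""
    query_words = set(query.lower().split())
    ranked = sorted(
        documents,
        key=lambda doc: len(query_words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def answer(query: str, model: Callable[[str], str]) -> str:
    """Build a prompt from retrieved context, then pass it to the model."""
    context = "\n".join(retrieve(query, DOCUMENTS))
    prompt = f"Context:\n{context}\n\nQuestion: {query}"
    return model(prompt)


def echo_model(prompt: str) -> str:
    """Placeholder model: returns the prompt so the retrieved context is visible."""
    return prompt


print(answer("is the office closed on friday", echo_model))
```

Because the model only sees what `retrieve` returns at query time, updating the document store changes the answers without retraining anything.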