GitHub - Giskard-AI/giskard: 🐢 The testing framework for ML models, from tabular to LLMs

Giskard-AI github.com

Related Highlights

Take a look at our official page for user documentation and examples: langtest.org

Key Features

Generate and execute more than 50 distinct types of tests with just one line of code

Test all aspects of model quality: robustness, bias, representation, fairness and accuracy.

Automatically augment training data based on test results (for select models)
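To make the robustness idea in the feature list above concrete, here is a minimal sketch of a perturbation-based test. It does not use langtest's API: the model, the perturbation names, and the `run_robustness_tests` helper are all hypothetical stand-ins chosen for illustration.

```python
from typing import Callable, Dict, List


def toy_sentiment_model(text: str) -> str:
    """Hypothetical stand-in model: case-sensitive keyword matching."""
    return "positive" if "good" in text else "negative"


# Illustrative perturbations; these names are assumptions, not langtest test types.
PERTURBATIONS: Dict[str, Callable[[str], str]] = {
    "uppercase": str.upper,
    "lowercase": str.lower,
    "add_punctuation": lambda s: s + "!",
}


def run_robustness_tests(
    model: Callable[[str], str], samples: List[str]
) -> Dict[str, float]:
    """For each perturbation, return the fraction of samples whose
    prediction is unchanged after the input is perturbed."""
    report = {}
    for name, perturb in PERTURBATIONS.items():
        unchanged = sum(model(s) == model(perturb(s)) for s in samples)
        report[name] = unchanged / len(samples)
    return report


samples = ["The food was good", "The service was slow"]
report = run_robustness_tests(toy_sentiment_model, samples)
for name, pass_rate in report.items():
    print(f"{name}: {pass_rate:.0%} of predictions unchanged")
```

A pass rate below 100% for a perturbation (here, uppercasing breaks the case-sensitive keyword match) is the kind of signal a robustness test is meant to surface.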

GitHub - BrunoScaglione/langtest: Deliver safe & effective language models

Nicolay Gerold added

Welcome to prompttools created by Hegel AI! This repo offers a set of open-source, self-hostable tools for experimenting with, testing, and evaluating LLMs, vector databases, and prompts. The core idea is to enable developers to evaluate using familiar interfaces like code, notebooks, and a local playground.

In just a few lines of code, you can t

Testing framework for LLM Part

Nicolay Gerold added

Creative AI Lab

creative-ai.org

Isabelle Levent added

🎄 lakera.ai

An Overview of Lakera Guard — Bringing Enterprise-Grade Security to LLMs with One Line of Code

At Lakera, we supercharge AI developers by enabling them to swiftly identify and eliminate their AI applications’ security threats so that they can focus on building the most exciting applications securely.

Businesses around the world are in

Testing framework for LLM Part

Nicolay Gerold added

baserun.ai💪💪💪

Testing & Observability Platform for LLM Apps

From prompt playground to end-to-end tests, baserun helps you ship your LLM apps with confidence and speed.

Testing framework for LLM Part

Nicolay Gerold added

GitHub - AI4Finance-Foundation/FinRobot: FinRobot: An Open-Source AI Agent Platform for Financial Applications using LLMs 🚀 🚀 🚀

Steve Werber added

GitHub - arthur-ai/bench: A tool for evaluating LLMs

BA Builder added

Creating test suites - bench documentation

BA Builder added

Best way to manage AI experiments. Very generic and extensible. Should make something similar.