GitHub - arthur-ai/bench: A tool for evaluating LLMs

github.com

Related

GitHub - mit-han-lab/streaming-llm: Efficient Streaming Language Models with Attention Sinks

mit-han-lab github.com

Darren LI and added

GitHub - MadcowD/ell: A language model programming library.

github.com

added

DeepBench will be the go-to tool that people visit to learn things that aren’t easily discoverable via Google.

DeepBench • Knowledge as an addictive drug—an investor’s guide to the expert network industry

sari added

GitHub - FlowiseAI/Flowise: Drag & drop UI to build your customized LLM flow

github.com

Andrés added

AgentBench: Evaluating LLMs as Agents

Xiao Liu • AgentBench: Evaluating LLMs as Agents

Darren LI added

Deep-ML

deep-ml.com

and added

GitHub - romkatv/zsh-bench: Benchmark for interactive Zsh

romkatv github.com

Daniel Bauke added

Welcome to prompttools created by Hegel AI! This repo offers a set of open-source, self-hostable tools for experimenting with, testing, and evaluating LLMs, vector databases, and prompts. The core idea is to enable developers to evaluate using familiar interfaces like code, notebooks, and a local playground.

In just a few lines of code, you can t

Testing framework for LLM Part

Nicolay Gerold added