Testing framework for LLMs
Take a look at our official page for user documentation and examples: langtest.org
Key Features
- Generate and execute more than 50 distinct types of tests with just one line of code (see the sketch below)
- Test all aspects of model quality: robustness, bias, representation, fairness, and accuracy
- Automatically augment training data based on test results (for select models)
- Sup
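A minimal sketch of that one-line workflow, based on langtest's documented Harness API; the task, model name, and hub below are placeholder choices, not from the original snippet:

```python
from langtest import Harness

# Wrap a model in a test harness (here: an assumed NER model from the
# Hugging Face hub; any supported task/hub combination works).
harness = Harness(
    task="ner",
    model={"model": "dslim/bert-base-NER", "hub": "huggingface"},
)

harness.generate()  # generate test cases (robustness, bias, ...)
harness.run()       # execute the tests against the model
harness.report()    # summarize pass/fail rates per test type
```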
GitHub - BrunoScaglione/langtest: Deliver safe & effective language models
Nicolay Gerold added
Metrics are evaluated using ANY LLM of your choice, statistical methods, or NLP models that run locally on your machine (a minimal usage sketch follows the list):
- G-Eval
- Summarization
- Answer Relevancy
- Faithfulness
- Contextual Recall
- Contextual Precision
- RAGAS
- Hallucination
- Toxicity
- Bias
- etc.
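For illustration, a minimal deepeval check might look like the sketch below; the test-case strings and threshold are assumptions, not from the source:

```python
from deepeval import evaluate
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase

# A single test case: the prompt and the model's actual answer.
test_case = LLMTestCase(
    input="What is the capital of France?",
    actual_output="Paris is the capital of France.",
)

# LLM-as-judge metric; the test fails if the score drops below 0.7.
metric = AnswerRelevancyMetric(threshold=0.7)
evaluate(test_cases=[test_case], metrics=[metric])
```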
GitHub - confident-ai/deepeval: The LLM Evaluation Framework
Nicolay Gerold added
Introducing Prompts: LLM Monitoring
W&B Prompts – LLM Monitoring provides large language model usage monitoring and diagnostics. Start simply, then customize and evolve your monitoring analytics over time.
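A hedged sketch of logging a single LLM call with the W&B Prompts Trace API (wandb.sdk.data_types.trace_tree at the time of writing); the project name, prompt, and response are illustrative:

```python
import time
import wandb
from wandb.sdk.data_types.trace_tree import Trace

run = wandb.init(project="llm-monitoring")  # assumed project name

start = int(time.time() * 1000)
# ... call your LLM here ...
end = int(time.time() * 1000)

# One span per LLM call; inputs/outputs appear in the Prompts UI.
span = Trace(
    name="chat_completion",
    kind="llm",
    status_code="success",
    start_time_ms=start,
    end_time_ms=end,
    inputs={"prompt": "Summarize this document."},
    outputs={"response": "..."},
)
span.log(name="llm_trace")

run.finish()
```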
Monitoring
Nicolay Gerold added
Monitoring Tools
Some techniques I have seen in the wild:
- Create a semi-strict programming language like promptlang (https://github.com/ruvnet/promptlang). This style was popular back in the early days of GPT-3.5, but for some reason it fell out of fashion. An illustrative sketch follows the list.
- Write a super long persona-mapping prompt with a well-defined action space. Best example of this is the V...
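For illustration only, here is a made-up prompt in the semi-strict, promptlang-flavored style the first bullet describes; promptlang's actual grammar is defined in the linked repo, so treat this purely as a sketch of the idea:

```python
# Hypothetical example of a "semi-strict programming language" prompt:
# a constrained grammar the model is instructed to follow exactly.
prompt = """
DEFINE task = "extract_entities"
INPUT text: string
OUTPUT entities: list[{name: string, type: string}]

RULES:
  - Return ONLY valid JSON matching OUTPUT.
  - If no entities are found, return [].

RUN task ON: "Apple hired Jane Doe in London last May."
"""
# Send `prompt` to the model of your choice and parse the JSON reply.
```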
r/LocalLLaMA - Reddit
Nicolay Gerold added
Amplify Partners ran a survey of 800+ AI engineers to bring transparency to the AI Engineering space. The report is concise, yet it provides a wealth of insight into the technologies and methods companies use to implement AI products.
Highlights
👉 Top AI use cases are code intelligence, data extraction and workflow a...
Feed | LinkedIn
Our extensive test over 25 LLMs (including APIs and open-sourced models) shows that, while top commercial LLMs present a strong ability of acting as agents in complex environments, there is a significant disparity in performance between them and open-sourced competitors.
Xiao Liu • AgentBench: Evaluating LLMs as Agents
Darren LI added
OpenAI just dropped their Prompt Engineering guide.
Here are 6 strategies they recommend for getting better results from LLMs:
1. Write clear instructions
2. Provide reference text
3. Split complex tasks into simpler subtasks
4. Give the model time to "think"
5. Use external tools
6. Test changes systematically
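As a sketch of the first strategy (clear instructions plus delimiters), using the openai v1 Python client; the model name and messages are placeholders:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4",  # placeholder; use whichever model you have access to
    messages=[
        # Clear instructions: state the role, output format, and constraints.
        {
            "role": "system",
            "content": "You are a terse technical editor. "
                       "Reply with exactly three bullet points.",
        },
        # Delimiters separate the instruction from the text to operate on.
        {
            "role": "user",
            "content": "Summarize the text between triple quotes.\n"
                       '"""LLM evaluation frameworks generate test cases, '
                       'run them, and report failures."""',
        },
    ],
)
print(response.choices[0].message.content)
```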