Testing framework for LLMs
Take a look at our official page for user documentation and examples: langtest.org
Key Features
- Generate and execute more than 50 distinct types of tests with just one line of code (see the sketch below)
- Test all aspects of model quality: robustness, bias, representation, fairness, and accuracy
- Automatically augment training data based on test results (for select models)
- Sup
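A minimal sketch of that one-line workflow, based on langtest's documented Harness API; the task, model name, and hub below are placeholder choices, not from the original snippet:

```python
from langtest import Harness

# Wrap a model in a test harness (here: an assumed NER model from the
# Hugging Face hub; any supported task/hub combination works).
harness = Harness(
    task="ner",
    model={"model": "dslim/bert-base-NER", "hub": "huggingface"},
)

harness.generate()  # generate test cases (robustness, bias, ...)
harness.run()       # execute the tests against the model
harness.report()    # summarize pass/fail rates per test type
```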
GitHub - BrunoScaglione/langtest: Deliver safe & effective language models
Nicolay Gerold added
Metrics are evaluated using ANY LLM of your choice, statistical methods, or NLP models that run locally on your machine (a minimal usage sketch follows the list):
- G-Eval
- Summarization
- Answer Relevancy
- Faithfulness
- Contextual Recall
- Contextual Precision
- RAGAS
- Hallucination
- Toxicity
- Bias
- etc.
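For illustration, a minimal deepeval check might look like the sketch below; the test-case strings and threshold are assumptions, not from the source:

```python
from deepeval import evaluate
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase

# A single test case: the prompt and the model's actual answer.
test_case = LLMTestCase(
    input="What is the capital of France?",
    actual_output="Paris is the capital of France.",
)

# LLM-as-judge metric; the test fails if the score drops below 0.7.
metric = AnswerRelevancyMetric(threshold=0.7)
evaluate(test_cases=[test_case], metrics=[metric])
```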
GitHub - confident-ai/deepeval: The LLM Evaluation Framework
Nicolay Gerold added
Introducing Prompts: LLM Monitoring
W&B Prompts – LLM Monitoring provides large language model usage monitoring and diagnostics. Start simply, then customize and evolve your monitoring analytics over time.
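A hedged sketch of logging a single LLM call with the W&B Prompts Trace API (wandb.sdk.data_types.trace_tree at the time of writing); the project name, prompt, and response are illustrative:

```python
import time
import wandb
from wandb.sdk.data_types.trace_tree import Trace

run = wandb.init(project="llm-monitoring")  # assumed project name

start = int(time.time() * 1000)
# ... call your LLM here ...
end = int(time.time() * 1000)

# One span per LLM call; inputs/outputs appear in the Prompts UI.
span = Trace(
    name="chat_completion",
    kind="llm",
    status_code="success",
    start_time_ms=start,
    end_time_ms=end,
    inputs={"prompt": "Summarize this document."},
    outputs={"response": "..."},
)
span.log(name="llm_trace")

run.finish()
```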
Monitoring
Nicolay Gerold added
Monitoring Tools
Some techniques I have seen in the wild:
- Create a semi-strict programming language like promptlang (https://github.com/ruvnet/promptlang). This style was popular back in the early days of GPT-3.5, but for some reason it fell out of fashion. An illustrative sketch follows the list.
- Write a super long persona-mapping prompt with a well-defined action space. Best example of this is the V...
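For illustration only, here is a made-up prompt in the semi-strict, promptlang-flavored style the first bullet describes; promptlang's actual grammar is defined in the linked repo, so treat this purely as a sketch of the idea:

```python
# Hypothetical example of a "semi-strict programming language" prompt:
# a constrained grammar the model is instructed to follow exactly.
prompt = """
DEFINE task = "extract_entities"
INPUT text: string
OUTPUT entities: list[{name: string, type: string}]

RULES:
  - Return ONLY valid JSON matching OUTPUT.
  - If no entities are found, return [].

RUN task ON: "Apple hired Jane Doe in London last May."
"""
# Send `prompt` to the model of your choice and parse the JSON reply.
```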
r/LocalLLaMA - Reddit
Nicolay Gerold added
Amplify Partners ran a survey of 800+ AI engineers to bring transparency to the AI Engineering space. The report is concise, yet it provides a wealth of insight into the technologies and methods companies use to implement AI products.
Highlights
👉 Top AI use cases are code intelligence, data extraction and workflow a...
Feed | LinkedIn
Our extensive test over 25 LLMs (including APIs and open-sourced models) shows that, while top commercial LLMs present a strong ability of acting as agents in complex environments, there is a significant disparity in performance between them and open-sourced competitors.
Xiao Liu • AgentBench: Evaluating LLMs as Agents
Darren LI added
OpenAI just dropped their Prompt Engineering guide.
Here are 6 strategies they recommend for getting better results from LLMs:
1. Write clear instructions
2. Provide reference text
3. Split complex tasks into simpler subtasks
4. Give the model time to "think"
5. Use external tools
6. Test changes systematically
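As a sketch of the first strategy (clear instructions plus delimiters), using the openai v1 Python client; the model name and messages are placeholders:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4",  # placeholder; use whichever model you have access to
    messages=[
        # Clear instructions: state the role, output format, and constraints.
        {
            "role": "system",
            "content": "You are a terse technical editor. "
                       "Reply with exactly three bullet points.",
        },
        # Delimiters separate the instruction from the text to operate on.
        {
            "role": "user",
            "content": "Summarize the text between triple quotes.\n"
                       '"""LLM evaluation frameworks generate test cases, '
                       'run them, and report failures."""',
        },
    ],
)
print(response.choices[0].message.content)
```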