vectara/hallucination_evaluation_model · Hugging Face
“Primarily it’s a closed system where what you’re really doing is relying on a large language model to interpret between you...”
Claudia added
ANY LLM of your choice, statistical methods, or NLP models that run locally on your machine:
- G-Eval
- Summarization
- Answer Relevancy
- Faithfulness
- Contextual Recall
- Contextual Precision
- RAGAS
- Hallucination
- Toxicity
- Bias
- etc.
GitHub - confident-ai/deepeval: The LLM Evaluation Framework
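The list above mixes LLM-judged metrics (G-Eval, Faithfulness) with ones that can be computed by purely statistical methods running locally. As a hedged illustration of the latter category (not DeepEval's actual implementation), a minimal faithfulness check can score how many of a claim's tokens are grounded in the retrieval context; the function name and scoring rule here are hypothetical:

```python
import re


def token_overlap_faithfulness(claim: str, context: str) -> float:
    """Crude lexical faithfulness score: the fraction of the claim's
    tokens that also appear in the context. A purely statistical
    stand-in for illustration, not DeepEval's metric."""
    def tokenize(s: str) -> set[str]:
        return set(re.findall(r"[a-z0-9]+", s.lower()))

    claim_tokens = tokenize(claim)
    if not claim_tokens:
        return 0.0
    return len(claim_tokens & tokenize(context)) / len(claim_tokens)


# Example: 4 of the 6 claim tokens appear in the context, so the score is 4/6
score = token_overlap_faithfulness(
    "Paris is the capital of France",
    "France's capital city is Paris",
)
```

Real faithfulness metrics in frameworks like DeepEval go further, using an LLM or NLI model to check entailment claim by claim, but the input/output shape is the same: a generated claim plus its retrieval context in, a 0–1 score out.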
Nicolay Gerold added
Paper Link : https://arxiv.org/abs/2305.14045
kaistAI • GitHub - kaistAI/CoT-Collection: [Under Review] The CoT Collection: Improving Zero-shot and Few-shot Learning of Language Models via Chain-of-Thought Fine-Tuning
Nicolay Gerold added
Zephyr is a series of language models that are trained to act as helpful assistants. Zephyr-7B-β is the second model in the series, and is a fine-tuned version of mistralai/Mistral-7B-v0.1 that was trained on a mix of publicly available, synthetic datasets using Direct Preference Optimization (DPO). We found that removi...
HuggingFaceH4/zephyr-7b-beta · Hugging Face
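The DPO objective mentioned in the model card has a simple closed form: it pushes the policy to assign a larger log-probability margin to the chosen answer (relative to a frozen reference model) than to the rejected one. A minimal sketch of the per-pair loss, with made-up log-probabilities for illustration:

```python
import math


def dpo_loss(policy_logp_chosen: float, policy_logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    """DPO loss for one preference pair, given sequence log-probabilities
    under the trained policy and the frozen reference model."""
    # Implicit reward margin: how much more the policy (vs. the reference)
    # favors the chosen answer over the rejected one, scaled by beta
    margin = beta * ((policy_logp_chosen - ref_logp_chosen)
                     - (policy_logp_rejected - ref_logp_rejected))
    # -log sigmoid(margin): small when the policy clearly prefers the chosen answer
    return -math.log(1.0 / (1.0 + math.exp(-margin)))


# When policy and reference agree, the margin is zero and the loss is log(2)
baseline = dpo_loss(-10.0, -10.0, -10.0, -10.0)
# When the policy favors the chosen answer more than the reference does, the loss drops
improved = dpo_loss(-5.0, -15.0, -10.0, -10.0)
```

In practice Zephyr's training used the TRL library's DPO trainer over batched token-level log-probabilities, but this scalar form is the loss being minimized per preference pair.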
Nicolay Gerold added
GPT-4V Safety and Deployment Preparation
Analysis of safety preparations and evaluations for GPT-4V, a multimodal language model with image analysis capabilities, including early access testing, red teaming, and mitigations for potential risks and limitations.
cdn.openai.com
Darren LI added
This AI newsletter is all you need #68
Nicolay Gerold added
Untitled
trysunlight.ai
Sunlight AI uses Large Language Models (LLMs) to analyze the word, substance, and structural choices of online text content. Submit a URL to generate a report.
DEMO VERSION
I tried it on my own blog piece; it’s accurate and correctly labelled as “introspective”. A lot of speculation and exaggeration, true, but that was the point. Not bad at all.
It’s built more for subjective articles, to clear the air between facts, opinions, and speculation. It also focuses on political biases, so this would be geared towards op-eds imo.
Managing the risks of inevitably biased visual artificial intelligence systems
brookings.edu
Laura Pike Seeley added