LLMs
The new seed parameter enables reproducible outputs by making the model return consistent completions most of the time. This beta feature is useful for use cases such as replaying requests for debugging, writing more comprehensive unit tests, and generally having a higher degree of control over the model behavior. We at OpenAI have been using this ... See more
New models and developer products announced at DevDay
To train LLMs, you need data that is:
Large — Sufficiently large LMs require trillions of tokens.
Clean — Noisy data reduces performance.
Diverse — Data should come from different sources and different knowledge bases.
What does clean data look like?
You can de-duplicate data with simple heuristics. The most basic would be removing any exact duplicates ... See more
Large — Sufficiently large LMs require trillions of tokens.
Clean — Noisy data reduces performance.
Diverse — Data should come from different sources and different knowledge bases.
What does clean data look like?
You can de-duplicate data with simple heuristics. The most basic would be removing any exact duplicates ... See more
Shortwave — rajhesh.panchanadhan@gmail.com [Gmail alternative]
Today, we’re releasing the Assistants API, our first step towards helping developers build agent-like experiences within their own applications. An assistant is a purpose-built AI that has specific instructions, leverages extra knowledge, and can call models and tools to perform tasks. The new Assistants API provides new capabilities such as Code I... See more
New models and developer products announced at DevDay
Menlo Ventures released a report on ‘The State of Generative AI in the Enterprise’ and found that adoption is trailing the hype. Details below:
Generative AI still represents less than 1% of cloud spend by surveyed enterprises, including just an 8% increase in 2023.
Safety and ROI continue to be prime concerns, and the tangible advantages of being fi... See more
Generative AI still represents less than 1% of cloud spend by surveyed enterprises, including just an 8% increase in 2023.
Safety and ROI continue to be prime concerns, and the tangible advantages of being fi... See more
Shortwave — rajhesh.panchanadhan@gmail.com [Gmail alternative]
Langfuse is an open source observability & analytics solution for LLM-based applications. It is mostly geared towards production usage but some users also use it for local development of their LLM applications.
Langfuse is focused on applications built on top of LLMs. Many new abstractions and common best practices evolved recently, e.g. agents,... See more
Langfuse is focused on applications built on top of LLMs. Many new abstractions and common best practices evolved recently, e.g. agents,... See more
langfuse • GitHub - langfuse/langfuse: Open source observability and analytics for LLM applications
LLM-PowerHouse: A Curated Guide for Large Language Models with Custom Training and Inferencing
Welcome to LLM-PowerHouse, your ultimate resource for unleashing the full potential of Large Language Models (LLMs) with custom training and inferencing. This GitHub repository is a comprehensive and curated guide designed to empower developers, researche... See more
Welcome to LLM-PowerHouse, your ultimate resource for unleashing the full potential of Large Language Models (LLMs) with custom training and inferencing. This GitHub repository is a comprehensive and curated guide designed to empower developers, researche... See more
ghimiresunil • GitHub - ghimiresunil/LLM-PowerHouse-A-Curated-Guide-for-Large-Language-Models-with-Custom-Training-and-Inferencing: LLM-PowerHouse: Unleash LLMs' potential through curated tutorials, best practices, and ready-to-use code for custom training and inferencing.
In general, I see LLMs to be used in two broad categories: data processing, which is more of a worker use-cases, where the latency isn't the biggest issue but rather quality, and in user-interactions, where latency is a big factor. I think for the faster case a faster fallback is necessary. Or you escalate upwards, you first rely on a smaller more ... See more
Discord - A New Way to Chat with Friends & Communities
Clean & curate your data with LLMs
databonsai is a Python library that uses LLMs to perform data cleaning tasks.
Features
databonsai is a Python library that uses LLMs to perform data cleaning tasks.
Features
- Suite of tools for data processing using LLMs including categorization, transformation, and extraction
- Validation of LLM outputs
- Batch processing for token savings
- Retry logic with exponential backoff for handling rate limits an
databonsai • GitHub - databonsai/databonsai: clean & curate your data with LLMs.
The xAI PromptIDE is an integrated development environment for prompt engineering and interpretability research. It accelerates prompt engineering through an SDK that allows implementing complex prompting techniques and rich analytics that visualize the network's outputs. We use it heavily in our continuous development of Grok.