LLMs
- Multiple indices. Splitting the document corpus up into multiple indices and then routing queries based on some criteria. This means that the search is over a much smaller set of documents rather than the entire dataset. Again, it is not always useful, but it can be helpful for certain datasets. The same approach works with the LLMs themselves.
Matt Rickard • Improving RAG: Strategies
DeepSpeed-FastGen: High-throughput Text Generation for LLMs via MII and DeepSpeed-Inference
Table of Contents
1. Introduction
Large... See more
Table of Contents
- Introduction
- Key LLM Serving Techniques
- Dynamic SplitFuse: A Novel Prompt and Generation Composition Strategy
- Performance Evaluation
- DeepSpeed-FastGen: Implementation and Usage
- Try out DeepSpeed-FastGen
- Acknowledgements
1. Introduction
Large... See more
microsoft • DeepSpeed-FastGen
To train LLMs, you need data that is:
Large — Sufficiently large LMs require trillions of tokens.
Clean — Noisy data reduces performance.
Diverse — Data should come from different sources and different knowledge bases.
What does clean data look like?
You can de-duplicate data with simple heuristics. The most basic would be removing any exact duplicates... See more
Large — Sufficiently large LMs require trillions of tokens.
Clean — Noisy data reduces performance.
Diverse — Data should come from different sources and different knowledge bases.
What does clean data look like?
You can de-duplicate data with simple heuristics. The most basic would be removing any exact duplicates... See more
Shortwave — rajhesh.panchanadhan@gmail.com [Gmail alternative]
memary: Open-Source Longterm Memory for Autonomous Agents
memary demo
Why use memary?
Agents use LLMs that are currently constrained to finite context windows. memary overcomes this limitation by allowing your agents to store a large corpus of information in knowledge graphs, infer user knowledge through our memory modules, and only retrieve... See more
memary demo
Why use memary?
Agents use LLMs that are currently constrained to finite context windows. memary overcomes this limitation by allowing your agents to store a large corpus of information in knowledge graphs, infer user knowledge through our memory modules, and only retrieve... See more
GitHub - kingjulio8238/memary: Longterm Memory for Autonomous Agents.
Data
- Mistral AI shows a promising alternative to the GPT 3.5 model using prompt engineering .
- Mistral AI can be used where it requires high volume and faster processing time with very little cost .
- Mistral AI can be used as pre-filtering to GPT 4 to reduce cost i.e. can be used to filter down search results .
Mistral 7B is 187x cheaper compared to GPT-4
The xAI PromptIDE is an integrated development environment for prompt engineering and interpretability research. It accelerates prompt engineering through an SDK that allows implementing complex prompting techniques and rich analytics that visualize the network's outputs. We use it heavily in our continuous development of Grok.
PromptIDE
However development time, and maintenance can offset these savings. Hiring skilled data scientists, machine learning engineers, and DevOps professionals can be expensive and time consuming. Using available resources for “reimplementing” solutions hinder innovation and lead to a lack of focus. Since You not longer work on improving your model or... See more
Understanding the Cost of Generative AI Models in Production
One thing that is still confusing to me, is that we've been building products with machine learning pretty heavily for a decade now and somehow abandoned all that we have learned about the process now that we're building "AI".
The biggest thing any ML practitioner realizes when they step out of a research setting is that for most tasks accuracy has... See more
The biggest thing any ML practitioner realizes when they step out of a research setting is that for most tasks accuracy has... See more
Ask HN: What are some actual use cases of AI Agents right now? | Hacker News
You are assuming that the probability of failure is independent, which couldn't be further from the truth. If a digit recogniser can recognise one of your "hard" handwritten digits, such as a 4 or a 9, it will likely be able to recognise all of them.
The same happens with AI agents. They are not good at some tasks, but really really food at others.