LLMs
Table of Contents
- Introduction
- Key LLM Serving Techniques
- Dynamic SplitFuse: A Novel Prompt and Generation Composition Strategy
- Performance Evaluation
- DeepSpeed-FastGen: Implementation and Usage
- Try out DeepSpeed-FastGen
- Acknowledgements
1. Introduction
Large...
microsoft • DeepSpeed-FastGen
Sean Sheng • Scaling AI Models Like You Mean It
ANY LLM of your choice, statistical methods, or NLP models that run locally on your machine:
- G-Eval
- Summarization
- Answer Relevancy
- Faithfulness
- Contextual Recall
- Contextual Precision
- RAGAS
- Hallucination
- Toxicity
- Bias
- etc.
GitHub - confident-ai/deepeval: The LLM Evaluation Framework
LinuxSpinach • 5h ago
^ this. And especially classification as a task, because businesses don’t want to pay llm...
r/MachineLearning - Reddit
We're doing NER on hundreds of millions of documents in a specialised niche. LLMs are terrible for this. Slow, expensive and horrifyingly inaccurate. Even with agents, pydantic parsing and the like. Supervised methods are the way to go. Hell, I'd take an old school rule based approach over LLMs for this.
- You have access to a proprietary asset (like data) that others don’t have easy access to. In our “write job postings” example, perhaps you have a corpus of thousands of job postings including some outcome scores (as to how well they did). You could use this data to create better job postings. Others don’t have ready access to this data. Note: The
Dharmesh Shah • How To Build a Defensible A.I. Startup
Protecting LLM products:
(1) Is hard to bootstrap. It already presupposes existing customers, or you need to get a group of customers to co-develop (insurance model → companies pooling their data to solve a problem they all share). This runs into several issues: competition between the companies, data privacy, and security.
(2) Reserved for existing companies. This is the co-pilot model.
(3) This might be the most sustainable one, but it is also the hardest one. I have not seen anything in that direction yet besides OpenAI.
- Right now, GPTs are the easiest way of sharing structured prompts, which are programs, written in plain English (or another language), that can get the AI to do useful things. I discussed creating structured prompts last week, and all the same techniques apply, but the GPT system makes structured prompts more powerful and much easier to create,
Ethan Mollick • Almost an Agent: What GPTs can do
When to use context caching
Context caching is particularly well suited to scenarios where a substantial initial context is referenced repeatedly by shorter requests. Consider using context caching for use cases...
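To make the "large context, many short requests" pattern concrete, here is a minimal toy sketch in Python. It is not any provider's API: `ContextCache`, `answer`, and the whitespace "tokenizer" are hypothetical stand-ins invented for illustration. It only shows that the expensive context is processed once, while each short request reuses the stored result.

```python
import hashlib


class ContextCache:
    """Toy in-memory stand-in for a context cache (hypothetical, not a real API)."""

    def __init__(self):
        self._store = {}
        # Counts context tokens that had to be processed at full cost.
        self.context_tokens_processed = 0

    def _process(self, context):
        # Stand-in for the expensive step; here we simply split on whitespace.
        tokens = context.split()
        self.context_tokens_processed += len(tokens)
        return tokens

    def get(self, context):
        # Key the cache on a hash of the context so identical contexts are reused.
        key = hashlib.sha256(context.encode("utf-8")).hexdigest()
        if key not in self._store:
            self._store[key] = self._process(context)
        return self._store[key]


def answer(cache, context, question):
    # A short request that refers to the (possibly cached) large context.
    tokens = cache.get(context)
    return f"{question} (answered against {len(tokens)} context tokens)"


if __name__ == "__main__":
    cache = ContextCache()
    long_context = "word " * 10_000  # assumed large shared context
    for question in ["Q1", "Q2", "Q3"]:
        answer(cache, long_context, question)
    # The context is processed once, not once per question.
    print(cache.context_tokens_processed)
```

In this toy setup, three questions against the same 10,000-token context process 10,000 context tokens in total instead of 30,000; the benefit shrinks when the shared context is small or rarely reused.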
