GitHub - microsoft/LLMLingua: To speed up LLM inference and enhance the model's perception of key information, LLMLingua compresses the prompt and KV-Cache, achieving up to 20x compression with minimal performance loss.
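As a rough illustration of how prompt compression with LLMLingua is typically invoked (a minimal sketch based on the library's documented PromptCompressor usage; the prompt text and token budget below are illustrative assumptions):

```python
# Sketch: compressing a long prompt with LLMLingua before sending it to an LLM.
# PromptCompressor / compress_prompt follow LLMLingua's documented interface;
# the concrete prompt, question, and token budget are assumptions.
from llmlingua import PromptCompressor

compressor = PromptCompressor()  # loads a small causal LM used to score tokens

long_prompt = "..."  # e.g. retrieved documents plus few-shot demonstrations
result = compressor.compress_prompt(
    long_prompt,
    instruction="Answer the question using the context.",
    question="What does the contract say about termination?",
    target_token=500,  # rough budget for the compressed prompt
)

print(result["compressed_prompt"])                     # send this shorter prompt to the LLM
print(result["origin_tokens"], "->", result["compressed_tokens"])
```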
GitHub - mit-han-lab/streaming-llm: Efficient Streaming Language Models with Attention Sinks
mit-han-lab • github.com
Darren LI added
Data-Juicer: A One-Stop Data Processing System for Large Language Models
Data-Juicer is a one-stop data processing system to make data higher-quality, juicier, and more digestible for LLMs. This project is being actively updated and maintained, and we will periodically enhance and add more features and data recipes. We welcome you to join us in pro…
alibaba • GitHub - alibaba/data-juicer: A one-stop data processing system to make data higher-quality, juicier, and more digestible for LLMs! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷 (Providing higher-quality, richer, and more easily "digestible" data for large language models!)
Nicolay Gerold added
SGLang is a structured generation language designed for large language models (LLMs). It makes your interaction with LLMs faster and more controllable by co-designing the frontend language and the runtime system.
The core features of SGLang include:
- A Flexible Front-End Language: This allows for easy programming of LLM applications with multiple ch…
sgl-project • GitHub - sgl-project/sglang: SGLang is a structured generation language designed for large language models (LLMs). It makes your interaction with models faster and more controllable.
Nicolay Gerold added
They have a fast JSON decoding feature based on a finite state machine.
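A small sketch of what the frontend language looks like in practice, following SGLang's @sgl.function / sgl.gen interface; the endpoint URL, regex, and prompt wording are illustrative assumptions:

```python
# Sketch of SGLang's frontend language: a program is a decorated function that
# interleaves prompt text with gen() calls executed by the runtime.
import sglang as sgl


@sgl.function
def extract_city(s, sentence):
    s += "Extract the city mentioned in the sentence as JSON.\n"
    s += "Sentence: " + sentence + "\n"
    # regex= constrains decoding; the runtime compiles the pattern into a
    # finite state machine so invalid tokens are masked out while generating.
    s += sgl.gen("answer", max_tokens=64, regex=r'\{"city": "[A-Za-z ]+"\}')


# Assumes a local SGLang runtime is already serving a model at this port.
sgl.set_default_backend(sgl.RuntimeEndpoint("http://localhost:30000"))

state = extract_city.run(sentence="I moved from Berlin to Lisbon last year.")
print(state["answer"])
```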
a couple off the top of my head:
- LLM in the loop with preference optimization
- synthetic data generation
- cross modality "distillation" / dictionary remapping
- constrained decoding
r/MachineLearning - Reddit
Nicolay Gerold added
Additional LLM paradigms beyond RAG
StreamingLLM can enable Llama-2, MPT, Falcon, and Pythia to perform stable and efficient language modeling with up to 4 million tokens and more.
mit-han-lab • GitHub - mit-han-lab/streaming-llm: Efficient Streaming Language Models with Attention Sinks
Darren LI added
Meta just quietly released Meta Lingua, a minimal and fast library for training and running inference on large language models. The goal is to let more people train Llama-style models: a Llama 7B can be trained in 24 hours and reaches 48% on MMLU, with very strong performance on many downstream tasks, matching the DCLM baseline 1.0.
Chinese LLM teams should be delighted: training a domestic large model would cost only about 100,000 RMB. Cost calculation: $2.50/h (renting one H100) × 256 H100 GPUs × 24 h = $15,360.
Key features of Meta Lingua:
- Lets users get started quickly without installing and configuring a large number of dependencies.
- A minimal and fast LLM training/inference library intended for research.
- Uses modifiable PyTorch components to experiment with architectures, losses, and data.
- Supports end-to-end training, inference, and evaluation…
Simply accessing LLMs via APIs has limitations. Instead, combining them with other data sources and tools can enable more powerful applications. In this chapter, we will introduce LangChain as a way to overcome LLM limitations and build innovative language-based applications.
Ben Auffarth • Generative AI with LangChain: Build large language model (LLM) apps with Python, ChatGPT, and other LLMs
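As a rough sketch of the "LLM plus external data and tools" idea using LangChain's expression language (not an example from the book; the model name, prompt wording, and the toy lookup function are assumptions):

```python
# Sketch: combining an LLM with an external data source via LangChain,
# rather than calling the model API directly.
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_openai import ChatOpenAI


def lookup_orders(customer_id: str) -> str:
    """Hypothetical data source: pretend this queries an orders database."""
    return "Order 1042: 2x espresso machine, shipped 2024-05-12"


prompt = ChatPromptTemplate.from_template(
    "You are a support assistant.\n"
    "Customer data:\n{orders}\n\n"
    "Question: {question}"
)
llm = ChatOpenAI(model="gpt-4o-mini")  # requires OPENAI_API_KEY in the environment

chain = prompt | llm | StrOutputParser()

answer = chain.invoke({
    "orders": lookup_orders("customer-123"),
    "question": "When was my espresso machine shipped?",
})
print(answer)
```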
There is often a notable gap between state-of-the-art research and what practitioners can reasonably use. However, I'm glad to say that attention sinks can be added to any pretrained LLM with next to no additional effort.
I have released the attention_sinks Python module, which acts as a drop-in replacement for the transformers API. This Python module…
Tom Aarsen • 🕳️ Attention Sinks in LLMs for endless fluency
Nicolay Gerold added
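The drop-in usage described in the post looks roughly like this; the model name and the sink/window sizes are illustrative assumptions:

```python
# Sketch: attention_sinks mirrors the transformers API, so the only change
# relative to a normal transformers script is the import (plus optional
# sink/window sizes that control how the KV cache is kept).
from transformers import AutoTokenizer
from attention_sinks import AutoModelForCausalLM  # instead of transformers

model_name = "meta-llama/Llama-2-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    attention_sink_size=4,            # keep the first few "sink" tokens
    attention_sink_window_size=1020,  # sliding window over the most recent tokens
)

inputs = tokenizer("The endless stream begins:", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```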