Sublime
An inspiration engine for ideas
Cherlyn Hsing-Hsin Liu
cherlynhsinghsinliu.com
Jen Yuan
linkedin.com
Jason Liu
jxnl.co
Liang Wang
bywangliang.com
TL;DR
LLMLingua utilizes a compact, well-trained language model (e.g., GPT2-small, LLaMA-7B) to identify and remove non-essential tokens in prompts. This approach enables efficient inference with large language models (LLMs), achieving up to 20x compression with minimal performance loss.
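The core idea above can be sketched in a few lines: score each prompt token with a small language model, then drop the tokens the model predicts easily (low surprisal), since they carry little information. This is a minimal, self-contained illustration of the principle, not the real LLMLingua API; the `logprobs` input stands in for per-token log-probabilities that would come from a compact model such as GPT2-small.

```python
def compress_prompt(tokens, logprobs, keep_ratio=0.5):
    """Keep the most informative tokens of a prompt.

    LLMLingua-style idea (simplified): a token that a small LM
    assigns high log-probability is easy to predict, hence
    nearly redundant, and can be removed with little loss.

    tokens    -- list of prompt tokens
    logprobs  -- per-token log-probabilities from a small LM
                 (hypothetical input, supplied by the caller)
    keep_ratio -- fraction of tokens to retain
    """
    n_keep = max(1, int(len(tokens) * keep_ratio))
    # Rank indices by log-probability ascending: the lowest
    # log-prob (highest surprisal) tokens are the most informative.
    by_surprisal = sorted(range(len(tokens)), key=lambda i: logprobs[i])
    # Keep the top-n most surprising tokens, restoring original order.
    keep = sorted(by_surprisal[:n_keep])
    return [tokens[i] for i in keep]


tokens = ["the", "quick", "brown", "fox", "jumps"]
logprobs = [-0.1, -2.0, -3.0, -2.5, -0.2]
compressed = compress_prompt(tokens, logprobs, keep_ratio=0.6)
# "the" and "jumps" are easy to predict here, so they are dropped.
```

The actual library additionally uses budget allocation across prompt segments and iterative token-level compression; this sketch only shows the perplexity-based filtering intuition.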
microsoft • GitHub - microsoft/LLMLingua: To speed up LLM inference and improve the model's perception of key information, compress the prompt and KV-Cache, achieving up to 20x compression with minimal performance loss.
Liya Jin
@liyajin
yin li
@luna
Yongjin Li
@yjinnk