GitHub - microsoft/LLMLingua: To speed up LLM inference and help the model focus on key information, LLMLingua compresses the prompt and KV-Cache, achieving up to 20x compression with minimal performance loss.
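A minimal sketch of what prompt compression with LLMLingua looks like, following the usage pattern shown in the repo's README; the sample prompt, question, and token budget below are illustrative assumptions, and parameter names or return keys may vary by version:

```python
# pip install llmlingua
from llmlingua import PromptCompressor

# Loads the default small language model used to score token importance
# (the README uses a LLaMA-2-7B variant by default; assumption here).
compressor = PromptCompressor()

# A long context with only a few key facts (placeholder text, assumption).
long_context = "LLMLingua is a prompt compression library from Microsoft. " * 200

# Compress the context toward a token budget; instruction and question
# are passed separately so the task itself stays intact.
result = compressor.compress_prompt(
    long_context,
    instruction="Answer the question using the context.",
    question="What does LLMLingua do?",
    target_token=200,  # illustrative budget, not a recommended value
)

# Per the README's example output, the result includes the compressed
# prompt and token counts before/after compression.
print(result["compressed_prompt"])
print(result["origin_tokens"], "->", result["compressed_tokens"])
```

The compressed prompt is then sent to the target LLM in place of the original, which is where the inference speedup and cost savings come from.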



Added by Nicolay Gerold, from:

  • Prodigy in 2023: LLMs, task routers, QA and plugins · Explosion
  • Shortwave — rajhesh.panchanadhan@gmail.com [Gmail alternative]
  • DeepSpeed-FastGen by microsoft