GitHub - unslothai/unsloth: 5X faster 50% less memory LLM finetuning
- 2-5x faster, 50% less memory local LLM finetuning
- Manual autograd engine - hand-derived backprop steps.
- 2x to 5x faster than QLoRA, with 50% less memory usage.
- All kernels written in OpenAI's Triton language.
- 0% loss in accuracy - no approximation methods - all exact.
- No change of hardware necessary. Supports NVIDIA GPUs from 2018 onward (minimum CUDA Compute Capability 7.0).
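The "manual autograd engine" in the list above refers to writing backward passes by hand instead of relying on framework autograd tracing. A minimal sketch of that idea for a plain linear layer, using NumPy; the function names and shapes here are illustrative assumptions, not unsloth's actual Triton kernels:

```python
import numpy as np

# Hand-derived backprop for y = x @ W.
# From the chain rule:
#   dL/dx = dL/dy @ W.T
#   dL/dW = x.T @ dL/dy
# unsloth applies the same principle to full transformer layers,
# with the kernels written in Triton instead of NumPy.

def linear_forward(x, W):
    return x @ W

def linear_backward(x, W, grad_y):
    grad_x = grad_y @ W.T   # gradient w.r.t. the input
    grad_W = x.T @ grad_y   # gradient w.r.t. the weight
    return grad_x, grad_W

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 3))
W = rng.standard_normal((3, 2))
grad_y = np.ones((4, 2))    # stand-in upstream gradient

grad_x, grad_W = linear_backward(x, W, grad_y)
print(grad_x.shape, grad_W.shape)  # (4, 3) (3, 2)
```

Because the gradients are written out explicitly rather than approximated, this style of backward pass is exact, which is what the "0% loss in accuracy" bullet refers to.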
from GitHub - unslothai/unsloth: 5X faster 50% less memory LLM finetuning by unslothai
Nicolay Gerold added 10mo ago