GitHub - turboderp/exllamav2: A fast inference library for running LLMs locally on modern consumer-class GPUs
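For context, a minimal sketch of loading a quantized model and generating text with exllamav2, following the pattern of the project's own example scripts; the model path, sampling settings, and prompt are placeholders, not values from the source.

```python
# Hedged sketch of the exllamav2 load-and-generate flow (per the repo's examples).
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

config = ExLlamaV2Config()
config.model_dir = "/path/to/exl2-quantized-model"  # placeholder path to an EXL2-quantized model
config.prepare()

model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)   # KV cache, allocated as layers load
model.load_autosplit(cache)                # split layers across available GPUs

tokenizer = ExLlamaV2Tokenizer(config)
generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)

settings = ExLlamaV2Sampler.Settings()     # placeholder sampling parameters
settings.temperature = 0.8
settings.top_p = 0.9

print(generator.generate_simple("Explain KV caching in one sentence.", settings, 128))
```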
7. LangChain Integrates NVIDIA NIM for GPU-optimized LLM Inference in RAG https://t.co/3EKkRGY2HZ
Shubham Saboo (x.com)
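As a rough illustration of the kind of wiring the integration enables, here is a hedged sketch of a small RAG chain that uses NIM-hosted endpoints through LangChain's langchain-nvidia-ai-endpoints package; the model id, the toy documents, and the prompt are assumptions for illustration, and an NVIDIA_API_KEY is assumed to be set in the environment.

```python
# Hedged sketch: NIM-backed chat model + embeddings inside a simple LangChain RAG chain.
from langchain_nvidia_ai_endpoints import ChatNVIDIA, NVIDIAEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

# Toy corpus stands in for real documents.
docs = ["NIM serves GPU-optimized LLM endpoints.", "RAG grounds answers in retrieved context."]
vectorstore = FAISS.from_texts(docs, NVIDIAEmbeddings())
retriever = vectorstore.as_retriever()

def format_docs(retrieved):
    return "\n".join(d.page_content for d in retrieved)

prompt = ChatPromptTemplate.from_template(
    "Answer using only this context:\n{context}\n\nQuestion: {question}"
)
llm = ChatNVIDIA(model="meta/llama3-8b-instruct")  # placeholder model id from the NVIDIA API catalog

chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
print(chain.invoke("What does NIM provide?"))
```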



Intel researchers have proposed an efficient, low-latency, high-throughput LLM inference solution that achieves up to 7x lower token latency and 27x higher throughput for some popular LLMs on Intel GPUs, compared with the HuggingFace implementation.
https://t.co/VQuYG2c798 https://t.co/EMD45dGXpC
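To make the comparison concrete, here is a minimal sketch of how token latency and throughput are typically measured against a plain HuggingFace Transformers baseline; the model id, prompt, and generation length are placeholders and do not reflect the configurations used in the Intel work.

```python
# Hedged sketch: measuring average per-token latency and throughput with transformers.
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "facebook/opt-1.3b"  # placeholder model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)

inputs = tokenizer("The key bottlenecks in LLM inference are", return_tensors="pt")
max_new_tokens = 128

start = time.perf_counter()
outputs = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
elapsed = time.perf_counter() - start

generated = outputs.shape[-1] - inputs["input_ids"].shape[-1]
print(f"avg token latency: {elapsed / generated * 1000:.1f} ms")
print(f"throughput: {generated / elapsed:.1f} tokens/s")
```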

Achieving Peak Performance for LLMs
A systematic review of methods for improving and speeding up LLMs from three perspectives: training, inference, and system serving.
It also summarizes the latest optimization and acceleration strategies around training, hardware, scalability, and more.