GitHub - KomputeProject/kompute: General purpose GPU compute framework built on Vulkan to support 1000s of cross vendor graphics cards (AMD, Qualcomm, NVIDIA & friends). Blazing fast, mobile-enabled, asynchronous and optimized...
Lots of hot takes on whether it's possible that DeepSeek made training 45x more efficient, but @doodlestein wrote a very clear explanation of how they did it. Once someone breaks it down, it's not hard to understand. Rough summary:
* Use 8 bit instead of 32 bit floating point numbers, which gives massive memory... See more
Jared Friedmanx.comDeep learning at the speed of light.
Luminal is a deep learning library that uses composable compilers to achieve high performance.
use luminal::prelude::*;
// Setup graph and tensors
let mut cx = Graph::new();
let a = cx.tensor().set([[1.0], [2.0], [3.0]]);
let b = cx.tensor().set([[1.0, 2.0, 3.0, 4.0]]);
// Do math...
let mut c = a.matmul(b).retrieve();... See more
Luminal is a deep learning library that uses composable compilers to achieve high performance.
use luminal::prelude::*;
// Setup graph and tensors
let mut cx = Graph::new();
let a = cx.tensor().set([[1.0], [2.0], [3.0]]);
let b = cx.tensor().set([[1.0, 2.0, 3.0, 4.0]]);
// Do math...
let mut c = a.matmul(b).retrieve();... See more

