FastAPI: Thread Pool and Event Loop
- One of the first things Data Scientists learn as they run predictions is to avoid the use of loops. That’s because most ML libraries support vectorized inference, combining many inputs into a batch and more efficiently calculating the results. This specialized technique combines framework-level features with specialized hardware like GPUs, making p...
from Breaking Up With Flask & FastAPI: Why ML Model Serving Requires A Specialized Framework by Tim Liu
Nicolay Gerold added
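The highlight contrasts looping over inputs with vectorized, batched inference. Below is a minimal NumPy sketch of that contrast. It assumes a hypothetical stand-in "model", a single linear layer `y = x @ W + b`, rather than any real ML library's API. The names `predict_one`, `predict_loop`, and `predict_batch` are illustrative only.

```python
import numpy as np

# Hypothetical stand-in for an ML model: one linear layer, y = x @ W + b.
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 2))  # 4 input features -> 2 outputs
b = rng.standard_normal(2)

def predict_one(x: np.ndarray) -> np.ndarray:
    """Score a single input vector of shape (4,)."""
    return x @ W + b

def predict_loop(X: np.ndarray) -> np.ndarray:
    """Per-item Python loop: one model call per row."""
    return np.stack([predict_one(row) for row in X])

def predict_batch(X: np.ndarray) -> np.ndarray:
    """Vectorized: the whole (n, 4) batch in a single matrix multiply."""
    return X @ W + b  # b broadcasts across all n rows

X = rng.standard_normal((1000, 4))
# Both paths compute the same predictions.
assert np.allclose(predict_loop(X), predict_batch(X))
```

Both functions return the same `(1000, 2)` result. The batched version replaces 1,000 Python-level calls with one matrix multiply that NumPy dispatches to optimized native code. That is the same idea a GPU framework exploits at larger scale.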
- Another way to improve cycle time is by having fewer levels to traverse when a decision does need vertical leaps and reducing the latency of escalations. One element of this is escalating early and often rather than letting issues drag out. One thing that helps with that is assigning a Single Threaded Owner for each critical area of work. That is s...
from Cycle Time by Andrew Bosworth
Pritesh added
- Currently we have single-threaded execution running at ~10Hz (tok/s) and enjoy looking at the assembly-level execution traces stream by.
from Tweet by Andrej Karpathy
Darren LI added
This is exactly the reason why these agents are soooo slow executing tasks
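The page title points at the thread pool and event loop, and the note above blames single-threaded execution for slow tasks. Below is a minimal asyncio sketch of that difference. It assumes a hypothetical blocking call, `blocking_task`, simulated with `time.sleep`.

If that call runs directly inside a coroutine, it blocks the single-threaded event loop and the calls execute one after another. If each call is handed to a `ThreadPoolExecutor` via `loop.run_in_executor`, the calls overlap. FastAPI's documentation describes a similar split: plain `def` endpoints run in a thread pool, while `async def` endpoints run on the event loop. The code below, however, is plain asyncio, not FastAPI's internals.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

def blocking_task(seconds: float) -> float:
    """Hypothetical blocking call (sync library, I/O wait), simulated with sleep."""
    time.sleep(seconds)
    return seconds

async def run_sequential(n: int, seconds: float) -> float:
    # Calling the blocking function inside a coroutine blocks the event loop,
    # so the n calls run strictly one after another.
    start = time.perf_counter()
    for _ in range(n):
        blocking_task(seconds)
    return time.perf_counter() - start

async def run_in_pool(n: int, seconds: float, pool: ThreadPoolExecutor) -> float:
    # Offloading each call to a worker thread keeps the event loop free
    # and lets the blocking calls overlap in time.
    loop = asyncio.get_running_loop()
    start = time.perf_counter()
    await asyncio.gather(
        *(loop.run_in_executor(pool, blocking_task, seconds) for _ in range(n))
    )
    return time.perf_counter() - start

async def main() -> tuple[float, float]:
    with ThreadPoolExecutor(max_workers=4) as pool:
        seq = await run_sequential(4, 0.1)
        par = await run_in_pool(4, 0.1, pool)
    return seq, par

seq, par = asyncio.run(main())
print(f"sequential: {seq:.2f}s, thread pool: {par:.2f}s")
```

The sequential run takes roughly four times the single-call latency, and the pooled run takes roughly one. This works because `time.sleep`, like most I/O waits, releases the GIL. Pure-Python CPU-bound work would not overlap this way, which is one reason heavy model inference is usually batched or moved to a separate process.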