Your GPU Probably Isn't Helping Your Retrieval System
I benchmarked a small embedding model across five hardware backends. DirectML was break-even with CPU. CUDA only won by 20 percent. The reason is the part that generalizes.
Rob · 5 min read
Practical writing on deploy risk, runbooks, incident response, dependency changes, and the habits that keep small engineering teams out of avoidable fire drills.