Field notes for deploy-safety teams.

Practical writing on deploy risk, runbooks, incident response, dependency changes, and the habits that keep small engineering teams out of avoidable fire drills.

Written for: Lean engineering teams

Focus: Deploys, incidents, runbooks

Bias: Useful over theoretical

Latest writing


Featured · performance benchmarks ml gpu llama-cpp

An AMD GPU Beat My Mac on Llama 8B. The Same GPU Lost on Phi-3.

Same hardware. Same benchmark. Opposite winner depending on model size. Cross-platform inference numbers on Mac M2 Pro Metal, Linux RTX 2080 Ti CUDA, and Windows RX 6600 XT Vulkan, plus the hardware tier ceiling none of them clear.

Rob · Jun 2, 2026 · 6 min read

Featured · performance benchmarks ml gpu

Your GPU Probably Isn't Helping Your Retrieval System

I benchmarked a small embedding model across five hardware backends. DirectML was break-even with CPU. CUDA only won by 20 percent. The reason is the part that generalizes.

Rob · Jun 1, 2026 · 5 min read