Can You Tell When an LLM API Swaps in a Cheaper Model?
Providers have every reason to serve a smaller or more heavily quantized model under load. I ran an experiment to see if you can catch it from the outside. The obvious method fails backwards, and the one that works needs to accumulate evidence.
Rob · 4 min read