Navigating the Complexities of LLM Quantization: Techniques, Trade-offs, and Real-World Implications
The latest advancements and challenges in LLM compression and quantization techniques to optimize latency, model size, and deployment costs for LLM applications
In our last blog we talked about the challenges of building an LLM product. Two of the challenges we mentioned were that it takes time to get a response from these models, and that their large size often means you have to deploy them across multiple GPUs.
One method people are using to solve these challenges is to make the models smaller with model compression techniques like quantization, pruning, and knowledge distillation. In this article we want to talk about the current state of LLM compression. The key takeaways are as follows:
Compression can reduce the size of your neural network by 4x with very little drop in perplexity and other quality metrics
However, compression may not actually provide the latency savings you expect
LLMs with 10B+ parameters show a decrease in quality when quantizing activations (which is what helps reduce latency)
The effects of LLM compression on bias are not known. If previous work on smaller models holds up for really large LLMs, then compression will have adverse effects
Using a smaller LLM finetuned on your specific task may be better than compressing a really large LLM.
Why do we need to compress LLMs
Getting a response from an LLM is a computationally intensive task, and execution time can be quite long. This is problematic in real-time applications where immediate responses are required. By compressing LLMs, we can significantly reduce their execution time, allowing for faster responses and the ability to handle more load.
LLMs are incredibly large, with billions or even hundreds of billions of parameters. Many of these models require multiple GPUs to run efficiently (some will not even load on 1 GPU). Compressing LLMs can significantly reduce their size, making them more deployable on less powerful devices, even mobile phones!
Serving LLMs requires multiple GPUs, leading to communication bottlenecks between them. By compressing LLMs, we can reduce the amount of memory required to execute these models and alleviate communication bottlenecks.
By optimizing our models so that they can be deployed on less powerful machines (which in this context can still mean machines with a GPU), we can reduce deployment costs, which are a major bottleneck as LLM products grow in complexity, and make it possible to use these models in a wider range of contexts.
So far three main compression techniques have been applied to LLMs: quantization, pruning and knowledge distillation.
Quantization reduces the number of bits used to represent parameters, which can significantly shrink the model's size and memory footprint. Pruning removes unnecessary parameters from the model, reducing its size while trying to maintain accuracy. Distillation trains a smaller model to mimic the behavior of a larger one, again reducing size while trying to maintain accuracy.
Reducing the number of parameters in a model (by any technique) generally comes at a cost to quality metrics like perplexity and accuracy. The goal of all these techniques is to shrink the model while minimizing that drop in quality. In this blog we will focus on quantization, which is the most popular compression technique for LLMs.
In the realm of quantization, there are two primary categories: weight-only quantization, where only the precision of the weights is reduced, and combined weight-and-activation quantization, where the precision of both weights and activations is reduced. With weight-only quantization, the weights must be dequantized during inference; failing to do so causes the activations to grow larger with each successive layer.
On the other hand, if we manage to quantize both the activations and the weights, the entire network can be executed at lower precision. This offers numerous benefits, including reduced model size, decreased memory consumption, and shorter inference time. Conversely, while weight-only quantization does lead to a smaller model, the need to dequantize weights during inference prevents a reduction in memory consumption, and in some cases may even increase latency due to the dequantization overhead.
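To make the weight-only round trip concrete, here is a minimal numpy sketch (an illustration, not any particular paper's scheme) of symmetric per-tensor INT8 quantization, with the weights dequantized back to fp32 before the matmul:

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor INT8 quantization: int8 weights plus one fp32 scale."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an fp32 approximation of the original weights."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4)).astype(np.float32)   # toy weight matrix
x = rng.normal(size=(1, 4)).astype(np.float32)   # toy activation

q, scale = quantize_int8(w)
y_ref = x @ w                       # full-precision reference
y_wq = x @ dequantize(q, scale)     # weight-only quantization path
wq_err = float(np.abs(y_ref - y_wq).max())
```

The stored model only keeps `q` and `scale` (roughly a 4x size reduction from fp32), but the matmul itself still runs in fp32, which is why weight-only quantization alone does not reduce compute or activation memory.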
LLM.int8() was one of the first papers that outlined the challenges of deploying LLMs and suggested quantization as a way to reduce model size. Their quantization scheme was able to bring models down to INT8 precision with very little change in perplexity scores. However, while they were able to reduce model size, their scheme had an overhead that increased model latency. You can actually use this quantization technique in Hugging Face Transformers, thanks to the integration of bitsandbytes, a project by Tim Dettmers (one of the authors of the paper).
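The core trick in LLM.int8() is mixed-precision decomposition: activation feature dimensions containing outliers stay in higher precision, while everything else goes through an INT8 path. Below is a simplified numpy sketch of that idea, with per-tensor scales standing in for the paper's vector-wise scheme and fp32 standing in for fp16:

```python
import numpy as np

def mixed_precision_matmul(x, w, threshold=6.0):
    """Sketch of LLM.int8()-style mixed-precision decomposition (simplified):
    feature dimensions of x with outliers above `threshold` stay in fp32,
    the rest go through a symmetric INT8 path."""
    outlier_cols = np.abs(x).max(axis=0) > threshold          # find outlier feature dims
    x_out, w_out = x[:, outlier_cols], w[outlier_cols, :]     # high-precision part
    x_reg, w_reg = x[:, ~outlier_cols], w[~outlier_cols, :]   # int8 part

    sx = np.abs(x_reg).max() / 127.0
    sw = np.abs(w_reg).max() / 127.0
    qx = np.round(x_reg / sx).astype(np.int8)
    qw = np.round(w_reg / sw).astype(np.int8)

    # int8 matmul accumulated in int32, then rescaled back to fp32
    y_int8 = (qx.astype(np.int32) @ qw.astype(np.int32)) * (sx * sw)
    return y_int8 + x_out @ w_out

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 8)).astype(np.float32)
x[0, 3] = 12.0                      # inject one outlier feature dimension
w = rng.normal(size=(8, 4)).astype(np.float32)
mp_err = float(np.abs(mixed_precision_matmul(x, w) - x @ w).max())
```

Separating out the outlier columns is what keeps the INT8 scales small; quantizing everything with one scale would let a single outlier wash out the precision of all other features.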
The GPTQ paper took it a step further and quantized LLMs to as low as 3-bit precision. They found that as the model size gets larger, the precision loss due to quantization has a negligible effect on the quality of the model. This is an interesting result, and we will discuss its implications more in the next section. However, similar to the LLM.int8() paper, they do not quantize activations, meaning that while we see savings in model size, we don't see any savings in latency or memory consumption during execution.
One of the reasons for a drop in quality with quantization is the loss in precision of the weights. The RPTQ method tries to limit this loss by clustering similar weights and activations before quantizing. This allowed the authors to quantize both activations and weights to 3 bits. To implement this approach, they used a calibration dataset to determine the maximum and minimum values of each activation channel, grouped the channels into clusters with the KMeans algorithm based on those values, and then rearranged the channels so that members of the same cluster sit next to each other and can share the same quantization parameters. This reduced the quantization precision loss. However, it's unclear whether this adds significant overhead to inference, as the authors did not discuss the latency of their quantized models in depth.
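A toy numpy sketch of the reordering idea: collect (min, max) calibration statistics per channel, cluster the channels (a tiny hand-rolled two-cluster k-means here, standing in for the paper's KMeans setup), and derive a permutation that places same-cluster channels next to each other:

```python
import numpy as np

def cluster_channels(acts, iters=10):
    """RPTQ-style channel reordering sketch: group activation channels by
    their calibration (min, max) statistics with a tiny 2-cluster k-means,
    then return a permutation placing same-cluster channels next to each
    other so they can share quantization parameters."""
    feats = np.stack([acts.min(axis=0), acts.max(axis=0)], axis=1)  # (channels, 2)
    ranges = feats[:, 1] - feats[:, 0]
    # deterministic init: one center on the narrowest channel, one on the widest
    centers = feats[np.array([ranges.argmin(), ranges.argmax()])].copy()
    for _ in range(iters):
        dists = ((feats[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(axis=1)            # assign channels to clusters
        for j in range(2):
            if (labels == j).any():
                centers[j] = feats[labels == j].mean(axis=0)
    return np.argsort(labels, kind="stable")     # same-cluster channels adjacent

# Calibration activations: 6 channels, two of them with a much wider range.
calib = np.random.default_rng(1).normal(size=(128, 6)).astype(np.float32)
calib[:, [1, 4]] *= 10.0
order = cluster_channels(calib)   # the two wide channels end up adjacent
```

Grouping channels with similar ranges means each shared scale only has to cover values of a similar magnitude, which is where the precision win comes from.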
Another project I have been following closely is a great example of INT4 quantization in practice. Its quantization spec is really simple and was hacked together overnight: groups of 32 consecutive weights share a scaling factor and a zero offset. You can get better performance with a variable zero offset (which it supports). It can shrink the 6B-parameter GPT-J model from 11543 MB down to 3610 MB, reducing inference time on an M1 Pro MacBook from 125ms to 46ms!
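Here is a numpy sketch of that group-wise scheme. This is illustrative rather than the project's actual packed format: real implementations pack two 4-bit values per byte, while this stores one value per uint8 for clarity.

```python
import numpy as np

GROUP = 32  # consecutive weights sharing one scale and zero offset

def quantize_q4(w):
    """Group-wise 4-bit quantization: each group of 32 consecutive weights
    gets its own scale and zero offset (asymmetric quantization)."""
    groups = w.reshape(-1, GROUP)
    lo = groups.min(axis=1, keepdims=True)           # zero offset per group
    hi = groups.max(axis=1, keepdims=True)
    scale = (hi - lo) / 15.0                         # 4 bits -> 16 levels (0..15)
    scale[scale == 0] = 1.0                          # guard constant groups
    q = np.clip(np.round((groups - lo) / scale), 0, 15).astype(np.uint8)
    return q, scale, lo

def dequantize_q4(q, scale, lo, shape):
    return (q.astype(np.float32) * scale + lo).reshape(shape)

w = np.random.default_rng(0).normal(size=(64, 64)).astype(np.float32)
q, scale, lo = quantize_q4(w)
w_hat = dequantize_q4(q, scale, lo, w.shape)
q4_err = float(np.abs(w - w_hat).max())   # bounded by half a step per group
```

Small groups are the key design choice: a per-group scale only has to cover 32 values, so one unusually large weight can only hurt the precision of its own group rather than the whole tensor.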
Compressing LLMs with quantization can bring about several benefits. However, there are also some challenges associated with this process.
Large LLMs vs Small LLMs
The biggest challenge is the unintuitive results we get when comparing weight quantization (WQ) with weight-and-activation quantization (WAQ), which have different impacts on model accuracy. INT8 WQ leads to very little loss in accuracy, particularly for large models. INT4 WQ, on the other hand, disproportionately affects smaller models.
However, WAQ can result in a larger degradation in performance for larger models. In some cases, the accuracy degradation may be higher than the advantage of using a larger model! Therefore, it is essential to carefully evaluate the trade-offs between these two techniques when compressing LLMs. You can read more about this here.
Quantization Aware Training
In TinyML, another field where model compression is a key part of deploying applications, Quantization Aware Training (QAT) is a common technique for recovering lost accuracy. QAT works by simulating the loss of precision caused by quantization during the training process. However, this can be difficult, expensive, and time-consuming, especially for really large LLMs.
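A toy numpy sketch of the fake-quantization idea behind QAT: the forward pass rounds weights to a uniform grid, while the gradient is applied straight through to the full-precision weights (the straight-through estimator). Everything here, including the tiny linear-regression "model", is a made-up illustration.

```python
import numpy as np

def fake_quant(w, bits=4):
    """Simulate quantization in the forward pass: round to a uniform grid."""
    scale = np.abs(w).max() / (2 ** (bits - 1) - 1)
    if scale == 0:
        return w
    return np.round(w / scale) * scale

rng = np.random.default_rng(0)
w_true = rng.normal(size=(3, 1))
x = rng.normal(size=(256, 3))
y = x @ w_true

# QAT loop: the forward pass sees quantized weights, but the update is
# applied straight through to the full-precision copy.
w = np.zeros((3, 1))
for _ in range(200):
    err = x @ fake_quant(w) - y
    grad = x.T @ err / len(x)   # gradient w.r.t. the quantized weight ...
    w -= 0.1 * grad             # ... applied to the full-precision weight

qat_loss = float(np.mean((x @ fake_quant(w) - y) ** 2))
```

Because training already "sees" the rounding error, the full-precision weights drift to positions whose quantized values fit the data well, which is exactly the accuracy that post-training quantization gives up.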
Compressing LLMs can lead to an increase in bias. When we remove parameters from the model, we risk losing some of the diversity in the data. This can result in a biased model that may not perform well in real-world scenarios. To the best of our knowledge, there has not been a study on the effects of quantization on LLM bias. Previous works have shown that compressed models tend to amplify existing algorithmic bias and disproportionately impact performance on underrepresented features. Another work on BERT-based models has shown similar results. They also suggest methods to mitigate these issues.
For generative LLMs, compression can lead to a loss of vocabulary and richness in the output. As we reduce the number of parameters in the model, we risk losing some of the nuances and details in the data. This can result in less expressive and less accurate output.
Quantization Time and Latency
LLMs are so large that it can take a few hours to quantize some of these models. Even though quantization is a one-time activity, it is still computationally intensive and may need access to GPUs to run quickly. The results below show the time it took to quantize models using GPTQ on an Nvidia A100 GPU. Even the smallest model took nearly 3 minutes to quantize!
Finally, it is essential to note that quantizing a model sometimes yields no latency reduction at all, and can even increase latency. Inference times vary with the model's architecture and your hardware.
Quantizing large LLMs vs finetuning or training a smaller LLM
With these unintuitive compression effects, you might be wondering if it makes sense to compress a 10B+ parameter LLM at all. I think more work needs to be done to resolve the issues surrounding quantization and to get better hardware support. Other compression methods, like knowledge distillation (e.g., Alpaca), show more promising results than quantization.
While recent methods like RPTQ have shown ways to reduce the drop in perplexity of compressed 10B+ LLMs, the effects of compression on bias are still unknown. I think a better approach is to combine finetuning with quantization. A pattern I have seen some people use is to quantize a medium-sized LLM (<10B parameters) and then finetune it with LoRA. This approach should help with the aforementioned issues.
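To illustrate the quantize-then-finetune pattern, here is a toy numpy sketch: a frozen INT8 base weight plus trainable low-rank LoRA factors, fit to a synthetic low-rank "task" shift. In practice you would use a library such as Hugging Face's peft rather than hand-rolled gradients; every name and number below is made up for the illustration.

```python
import numpy as np

def lora_forward(x, w_q, scale, A, B):
    """Forward pass: frozen INT8 base weight plus a trainable low-rank update."""
    return x @ (w_q.astype(np.float32) * scale + A @ B)

rng = np.random.default_rng(0)
d, r = 8, 2                                    # hidden size, LoRA rank
w = rng.normal(size=(d, d)).astype(np.float32)

# Freeze a symmetric INT8 copy of the base weight.
scale = np.abs(w).max() / 127.0
w_q = np.round(w / scale).astype(np.int8)

# The "task" shifts the weight by a low-rank perturbation the adapters can learn.
u = (0.3 * rng.normal(size=(d, r))).astype(np.float32)
v = (0.3 * rng.normal(size=(r, d))).astype(np.float32)
x = rng.normal(size=(512, d)).astype(np.float32)
y = x @ (w + u @ v)

# LoRA adapters: B starts at zero so training begins from the quantized base.
A = (0.1 * rng.normal(size=(d, r))).astype(np.float32)
B = np.zeros((r, d), dtype=np.float32)

loss_before = float(np.mean((lora_forward(x, w_q, scale, A, B) - y) ** 2))
for _ in range(300):
    err = lora_forward(x, w_q, scale, A, B) - y
    g = x.T @ err / len(x)       # gradient w.r.t. the effective weight
    gA, gB = g @ B.T, A.T @ g    # only the adapters receive gradients
    A -= 0.05 * gA
    B -= 0.05 * gB
loss_after = float(np.mean((lora_forward(x, w_q, scale, A, B) - y) ** 2))
```

The base weight stays compressed and untouched throughout; only the small `A` and `B` matrices (2 x 8 values each here) are trained, which is what makes this pattern cheap enough to run after quantization.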