Running a language model with the same quality as ChatGPT inside your own company requires expensive hardware.
Fortunately, there is an optimization: quantization can drastically reduce the hardware requirements of LLMs, in some cases by up to 80%. In this post, I’ll explain how it works.
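To make the idea concrete up front, here is a minimal sketch of one common scheme, absolute-maximum int8 quantization, in plain Python. The function names are my own for illustration; real LLM quantization libraries work per-layer or per-channel on tensors, but the core idea is the same: store each weight in fewer bits and keep a scale factor to recover an approximation of the original value.

```python
# Minimal sketch of absmax int8 quantization (illustrative only;
# function names are hypothetical, not from a specific library).

def quantize_int8(weights):
    """Map float weights to int8 range [-127, 127] via a per-tensor scale."""
    scale = max(abs(w) for w in weights) / 127
    quantized = [round(w / scale) for w in weights]
    return quantized, scale

def dequantize(quantized, scale):
    """Approximately recover the original floats from int8 values."""
    return [q * scale for q in quantized]

weights = [0.42, -1.30, 0.07, 0.91]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
# Each value now fits in 1 byte instead of 4 bytes (float32):
# a 75% memory reduction; 4-bit schemes push this toward ~87%.
```

The savings come from storage size alone: an int8 weight occupies one byte versus four for float32, at the cost of a small rounding error that dequantization cannot undo.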