Tag: gpu
All articles tagged "gpu".
- Sovereign AI: Running GPUs On-Prem When the Cloud Isn't an Option
For regulated workloads where the data legally cannot leave a building, on-prem GPU inference is back. The build-vs-rent math, the constraints nobody prices in, and the software that makes a fixed fleet feel like a platform.
- vLLM, Quantization, and Serving LLMs on a Budget
Self-hosting an open model when GPUs are scarce and finance is reading the bill. Continuous batching, KV-cache, what quantization actually costs you, and when to just call a hosted API instead.