Tag: gpu

All the articles with the tag "gpu".

Sovereign AI: Running GPUs On-Prem When the Cloud Isn't an Option

20 May, 2025

For regulated workloads where the data legally cannot leave a building, on-prem GPU inference is back. The build-vs-rent math, the constraints nobody prices in, and the software that makes a fixed fleet feel like a platform.

vLLM, Quantization, and Serving LLMs on a Budget

16 Apr, 2024

Self-hosting an open model when GPUs are scarce and finance is reading the bill. Continuous batching, KV-cache, what quantization actually costs you, and when to just call a hosted API instead.
