Discussion Solved: Cloud vs. On-Prem

There is a decade-old discussion: Cloud vs. On-Prem, which one is better? This article settles the debate and gives you guidelines on what to do for your business.

Infrastructure exists on a spectrum of trade-offs, and there is no universally “right” or “wrong”. Here is the hard nut to crack: from a reliability perspective, the decision isn’t just about where the servers sit; it’s about which operational burdens you are prepared to own and which you would rather pay a premium to outsource.

The industry generally operates across four distinct models, each optimizing for different variables:

  1. Public Cloud (AWS, GCP, Azure): This model minimizes capital expenditure (CapEx), as you do not have to buy (really expensive) server hardware or rent/build server farms. On top of that, you reduce hiring risk, because you don’t have to employ the people who manually take care of the servers or build the infrastructure on top of them. (And believe me: all the good professionals in this field are quite expensive.) You maximize agility, but that comes with the highest ongoing operational cost and variable usage-based billing.
  2. Managed Private Cloud: A middle ground often favored by SMEs with baseline loads. A provider manages the bare metal and the orchestration layer (like Kubernetes) using open-source stacks. This typically reduces costs by ~50% compared to hyperscalers while retaining a “managed” experience.
  3. Rented Bare Metal (e.g., Hetzner): Here, you rent the physical hardware but own the software stack, OS, and networking. It is roughly 90% cheaper in raw infrastructure costs than the public cloud, but it requires a high degree of internal skill to manage hardware failures and network routing (and you pay for that highly skilled labor).
  4. Buy and Colocate: The most cost-efficient long-term option (typically a 3–5 year ROI). It requires significant upfront CapEx and a team capable of physical hardware lifecycle management. (A back-of-envelope cost sketch follows right after this list.)
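
Here is that back-of-envelope sketch. Every figure in it is an illustrative assumption of mine for a hypothetical steady workload, not a vendor quote; only the relative percentages follow the list above. Plug in your own offers before drawing conclusions.

```python
# Back-of-envelope comparison of the four models over a 3-year horizon.
# All prices are illustrative assumptions, not vendor quotes.

MONTHS = 36  # evaluation horizon

models = {
    # name: (monthly_infra_eur, upfront_capex_eur)
    "Public Cloud":          (6_000, 0),       # on-demand VMs, managed services, egress
    "Managed Private Cloud": (3_000, 0),       # ~50% of the hyperscaler bill
    "Rented Bare Metal":     (600,   0),       # ~90% cheaper raw infrastructure
    "Buy and Colocate":      (200,  15_000),   # colo fees + power; hardware bought upfront
}

for name, (monthly, capex) in models.items():
    total = capex + monthly * MONTHS
    print(f"{name:<22} total over {MONTHS} months: €{total:>9,}")
```

With these assumptions, colocation overtakes rented bare metal just past the 3-year mark, which is consistent with the 3–5 year ROI mentioned above.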

The Cost of Convenience

The “Cloud Tax” is a well-known phenomenon. Hyperscalers often mark up resources like bandwidth and RAM by 10x to 20x compared to bare metal prices. They also lean into proprietary APIs and high egress fees to keep you locked in, turning a migration out into a multi-month (if not multi-year) engineering project.
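
To see how just the egress portion of this tax adds up, here is a quick worked example. Both rates are assumptions on my part, roughly in line with typical list prices; check your own bill:

```python
# Rough egress cost comparison; both rates are assumptions, not current list prices.
GB_PER_TB = 1_000  # decimal TB, as providers typically bill

monthly_egress_tb = 50
cloud_rate_per_gb = 0.09   # assumed hyperscaler internet-egress rate
cloud_egress = monthly_egress_tb * GB_PER_TB * cloud_rate_per_gb

included_tb = 20           # bandwidth often bundled with a bare-metal server
overage_per_tb = 1.0       # assumed overage price per additional TB
metal_egress = max(0, monthly_egress_tb - included_tb) * overage_per_tb

print(f"Public cloud egress: ${cloud_egress:,.0f}/month")  # -> $4,500
print(f"Bare metal egress:   ${metal_egress:,.0f}/month")  # -> $30
```

On identical traffic, the difference spans two orders of magnitude with these assumptions, because the bare-metal bundle already includes most of the volume.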

The counter-argument to “the cloud is expensive” is Total Cost of Ownership (TCO). While raw hardware is cheaper, the labor required to manage it is not. As industry skills shift toward cloud-native orchestration (Terraform, IAM, API-driven workflows), engineers with deep expertise in the “ancient” arts of kernel tuning, RAID configurations, or physical datacenter logistics are becoming rarer.

When a team chooses the public cloud, they aren’t just buying VMs; they are buying the 24/7 SRE team and the automated redundancy that would be difficult and risky for a small team to replicate manually.
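
A minimal sketch of that TCO math, assuming a fully loaded SRE costs €8k/month and that any sane on-call rotation needs at least two people; both bills are illustrative as well:

```python
# TCO sketch: cheap hardware vs. the labor needed to run it.
# Salary figures and both bills are illustrative assumptions.

cloud_bill = 15_000        # €/month, fully managed, no extra ops headcount
bare_metal_infra = 1_500   # €/month, ~90% cheaper raw infrastructure
sre_cost = 8_000           # €/month, fully loaded cost per SRE (assumption)
sres_needed = 2            # minimum for a sane on-call rotation

cloud_tco = cloud_bill
bare_metal_tco = bare_metal_infra + sres_needed * sre_cost

print(f"Cloud TCO:      €{cloud_tco:,}/month")       # -> €15,000
print(f"Bare metal TCO: €{bare_metal_tco:,}/month")  # -> €17,500
```

With these numbers, bare metal loses despite a 90% hardware discount; the equation only flips once the cloud bill comfortably exceeds the cost of the team.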

What should you use?

Choosing where to sit on this spectrum is a strategic decision that usually follows a company’s lifecycle:

  • Startups: Generally benefit from the Public Cloud. Speed to market and low initial CapEx are more valuable than infrastructure margins.
  • Scale-ups (Spend >€20k/month): This is often the “tipping point.” When the “convenience tax” equals or exceeds the cost of a dedicated engineer, moving to Managed Private Cloud or Bare Metal becomes a viable financial strategy (see the breakeven sketch after this list).
  • Steady-State Businesses: If your workload is predictable and doesn’t require “bursting,” the public cloud’s elasticity is an unnecessary expense. Providers like Hetzner shine here, where hardware is run for years at a fraction of the cost.
  • Infrastructure-Heavy Products: If your core product is the platform itself, staying as close to the metal as possible is often a competitive necessity for performance and margin.
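
The scale-up tipping point can be written down as a one-line breakeven, shown in the sketch below. The 50% savings rate and the €8k/month fully loaded engineer cost are my assumptions; with them, the breakeven lands in the same ballpark as the ~€20k/month figure above.

```python
def repatriation_tipping_point(engineer_cost_per_month: float,
                               savings_rate: float) -> float:
    """Monthly cloud bill at which the saved 'convenience tax'
    pays for one dedicated infrastructure engineer."""
    return engineer_cost_per_month / savings_rate

# Assumptions: €8k/month fully loaded engineer, Managed Private Cloud
# saving ~50% versus a hyperscaler (per the list of models above).
print(f"€{repatriation_tipping_point(8_000, 0.5):,.0f}/month")  # -> €16,000
```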

The most successful teams are those that remain “cloud-flexible”—using the cloud for what it’s good at (experimentation and bursting) while owning the steady-state workloads that define their business.

If you are currently stuck in this decision and want someone to look at your setup and help you evaluate it from multiple angles, feel free to reach out.


Here are a few more concrete scenarios on when to use which technology.

When to leave the cloud

  • The 30% Rule of Revenue: For many SaaS companies, once cloud costs approach 30% of the cost of revenue, the impact on EBITDA becomes too large to ignore. At this stage, repatriation to bare metal can often triple your margins. (Several thresholds from this list are wired into a small self-assessment sketch after the list.)
  • The “One Engineer” Breakeven (a rule I found on the internet and am highly sceptical about): For an SME, if your monthly cloud bill exceeds €10k–€15k, the raw savings from moving to a Managed Private Cloud (typically 50% cheaper) or Rented Bare Metal (up to 90% cheaper) roughly equal the salary of a full-time SRE. My scepticism: keeping a platform up and running without a proper on-call rotation burns out your team. I would advise AGAINST moving out of the cloud if you can only afford a single engineer.
  • Egress Anomalies: If you are building a data-intensive platform (e.g., AdTech, Analytics, or Media), keep an eye on your networking line item. If egress fees exceed 20% of your total bill, you are being “taxed” for your own growth. Bare metal providers often include massive bandwidth allocations (20TB+) as standard.
  • Unpredictable Latency Tails: If your application requires p99 latencies under 10ms (e.g., high-frequency trading or real-time bidding), the virtualization layer (hypervisor) in the public cloud introduces jitter. Moving to Bare Metal removes this layer, providing direct access to the silicon.
  • Persistent Resource Saturation: If your monitoring shows CPU or RAM usage consistently at 80-90% on your largest available cloud instances, you’ve hit a vertical wall. Upgrading to a dedicated physical server often provides double the performance for a fraction of the cost because you aren’t paying for “burst capacity” you’re already using.
  • The Skills Threshold: Don’t move to Rented Bare Metal until your team is comfortable with “infrastructure-as-code” (Terraform/OpenTofu) and has a solid handle on Linux internals. If your team only knows how to click “Create Database” in a console, start with a Managed Private Cloud first.
  • Compliance & Data Sovereignty: If you are entering highly regulated markets (e.g., Healthcare or Finance in the EU), you may find that Option 4 (Colocation) is the only way to meet strict hardware-level isolation requirements that public cloud “shared responsibility” models can’t satisfy.

When to stay IN the cloud

  • Kicking the blame-can down the road: If, say, AWS has downtime, basically the entire internet is broken. Your customers understand that, and most of the time they have other issues of their own in that moment. This argues for staying in the cloud.
  • Database “Magic”: Tools like Amazon Aurora or Google Cloud Spanner offer features—instant cross-region replication, serverless scaling, and millisecond failovers—that are notoriously difficult to maintain manually using standard PostgreSQL or MySQL.
  • The AI Ecosystem: In 2026, the cloud has become the primary gateway to specialized AI hardware. Accessing TPUs on GCP or deep OpenAI integration on Azure is far easier than trying to procure and maintain your own H100 GPU clusters (which are super expensive to set up).
  • Handling Spikes: If you run a ticketing site or a gaming platform that sees 100x traffic during a product launch, the cloud’s ability to provision 1,000 nodes in minutes is unmatched. On bare metal, you’d be paying for those 1,000 nodes all year just to handle a few hours of peak load. That said, most businesses DO NOT need this elasticity, and it is often still far more efficient to rent 20% more servers and simply keep them around for spikes (see the arithmetic after this list).
  • Global Footprint: If you need to launch in a new region (say, Tokyo or São Paulo) tomorrow, the public cloud lets you do it with a click. Setting up a new colocation presence in a foreign country involves months of contracts, shipping, and local legal compliance.
  • Blast Radius Reduction: Cloud providers spend billions on physical security and redundant power/networking. Their Availability Zones (AZs) are physically separate data centers with low-latency links. Replicating this “multi-AZ” setup in a private environment requires renting space in multiple, independent Tier-IV data centers.
  • Compliance Shortcuts: Getting SOC-2, HIPAA, or PCI-DSS certified is significantly faster when your provider is already compliant at the physical and hypervisor layers. You only have to audit your application, not the entire stack down to the rack locks.
  • The “Hiring Liquidity” Factor: It is easier to hire for AWS than for Hetzner. When you use a hyperscaler, you are tapping into a global pool of talent that already knows how to navigate IAM, VPCs, and CloudWatch. If you move to a custom bare-metal stack with a bespoke Proxmox or OpenStack layer, every new hire faces a steep learning curve. You aren’t just managing servers; you’re managing a “specialized knowledge” silo.
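
As referenced in the “Handling Spikes” bullet, here is the arithmetic behind both sides of that trade-off. Node prices, spike size, and durations are illustrative assumptions:

```python
# Burst vs. overprovision, using illustrative prices.
CLOUD_NODE_HOUR = 0.50    # € per on-demand cloud node-hour (assumption)
METAL_NODE_MONTH = 50.0   # € per rented bare-metal node per month (assumption)

# Case 1: a true 100x launch spike (1,000 extra nodes, 6 hours, twice a year)
burst_on_cloud = 1_000 * 6 * CLOUD_NODE_HOUR * 2
idle_on_metal = 1_000 * METAL_NODE_MONTH * 12  # those nodes sit idle all year
print(f"100x spike: cloud €{burst_on_cloud:,.0f}/yr vs. metal €{idle_on_metal:,.0f}/yr")

# Case 2: modest spikes, absorbed by keeping 20% headroom on bare metal
baseline_nodes = 50
headroom_cost = int(baseline_nodes * 0.2) * METAL_NODE_MONTH * 12
print(f"20% headroom on {baseline_nodes} metal nodes: €{headroom_cost:,.0f}/yr")
```

With these numbers, the cloud wins genuine 100x spikes by two orders of magnitude, while a fixed 20% headroom on bare metal costs about as much per year as two burst events.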
To never miss an article, subscribe to my newsletter.
No ads. One-click unsubscribe.