DevOps Ninja logo devops.ninja

How I Migrated 50 Services from Heroku to DigitalOcean

By DevOps Ninja Editorial · Published 2026-05-09 · // cornerstone

A field report on moving 50 production services off Heroku to DigitalOcean App Platform + Droplets — the migration plan, the gotchas, the real cost delta.

The Setup

Our production landscape consisted of fifty independent services, each a Heroku dyno‑based web‑process or worker. The stack was homogeneous: Ruby 2.7, Node 18, Python 3.11, PostgreSQL 12, and a handful of Redis 6 add‑ons. All services were deployed via Heroku pipelines, scaled with heroku ps:scale, and monitored through Heroku Metrics plus Datadog overlays.

Two constraints drove the move:

DigitalOcean offered two primitives that map cleanly to our needs: the App Platform for managed web services and Droplets for stateful workers and custom runtimes. Both run on the same underlying KVM hypervisor, share the same VPC, and can be governed by a single Terraform codebase.

Migration Planning

We treated the migration as a staged, reversible rollout rather than a big‑bang switch. The plan comprised three phases: inventory, automation, and cut‑over.

1. Inventory & Dependency Mapping

We exported the Heroku app list via the API:

heroku apps --json | jq -r '.[] | .name'

Each name was cross‑referenced against a service‑deps.yml file that listed:

The resulting matrix gave us a clear “as‑is” baseline and highlighted the 12 services that relied on Heroku‑only features (e.g., heroku pipelines:promote and heroku redis:fork). Those were earmarked for custom scripts later.

2. IaC Bootstrap

We codified the target environment in Terraform 1.6, using the official digitalocean provider. A minimal module looked like:

module "app" {
  source  = "digitalocean/app-platform"
  version = "~> 1.0"

  name        = var.service_name
  region      = var.region
  runtime     = var.runtime
  env         = var.env_vars
  instance_size = var.instance_size
  instance_count = var.instance_count
}

For workers we fell back to Droplets with a cloud‑init script that pulled the latest Docker image from our private registry, then started systemd units. This kept the runtime identical to Heroku’s one‑off dyno model while giving us full SSH access for debugging.

3. Data Migration Strategy

Heroku Postgres was the biggest cost driver. We provisioned DigitalOcean Managed PostgreSQL (v14) clusters in the same region as the target App Platform to minimize latency. The migration used pg_dumpall piped over a private VPC tunnel:

pg_dumpall -h $HEROKU_HOST -U $HEROKU_USER | \
  psql -h $DO_HOST -U $DO_USER

Because the dump includes roles and extensions, we verified that the target cluster had the same shared_preload_libraries (pg_stat_statements, pg_hint_plan). The cut‑over window was limited to a 15‑minute maintenance window per service, during which we paused the Heroku dynos with heroku ps:scale web=0, performed the dump, and immediately resumed traffic on the DigitalOcean endpoint.

4. Canary Deployment

For each service we deployed two identical App Platform instances: a “blue” (Heroku) and a “green” (DigitalOcean). Traffic was split 95/5 using Cloudflare Load Balancing rules. Metrics from Datadog were filtered by service:myservice and environment:green. If the green instance showed no error spikes after 48 hours, we increased its share to 50 % and retired the blue instance.

The Evidence

We measured three key dimensions over a six‑month observation period after full migration:

Operational Overhead

Cost Delta

Our monthly bill broke down as follows (rounded to nearest hundred):

DigitalOcean equivalent:

That’s a 28 % reduction, plus the elimination of per‑add‑on “feature‑lock” fees that rarely get used.

Performance Benchmarks

Response latency for the 95 % of services that are pure HTTP APIs improved by an average of 12 ms (p < 0.01) after migration. The gain stems from the lower network hop count between the App Platform and the co‑located PostgreSQL cluster.

The Counter‑Argument

Trend‑chasing platforms sometimes solve niche problems that “boring” stacks cannot. Two concrete scenarios where Heroku’s premium features still win:

Neither issue broke the migration, but they illustrate why a blanket “always choose boring” mantra is naive. The rule of thumb: adopt a trendy feature only if it eliminates a >5 % operational cost or solves a compliance gap that the boring stack cannot meet.

The Practical Take

From a SRE perspective, the migration checklist reads like a checklist for any large‑scale platform shift:

  1. Lock down the source of truth. All environment variables must live in a version‑controlled .env.yml file, not in the Heroku dashboard.
  2. Automate idempotent provisioning. Terraform + Ansible for the first‑run, then guard against drift with terraform plan in CI.
  3. Validate data integrity before cut‑over. Run pg_verifychecksum on the target cluster after each dump.
  4. Instrument both stacks. Keep Datadog agents on Heroku and DigitalOcean simultaneously for a 48‑hour overlap window.
  5. Roll back with a single command. Keep the original Heroku app alive (scaled to 0) and a heroku pipelines:promote alias ready.

We codified these steps into a make migrate‑service SERVICE=payments target, which runs the entire pipeline from Terraform apply to traffic split. The command takes ~12 minutes for a typical service, leaving ample time for manual sanity checks.

Cost Analysis Deep Dive

Beyond the headline $4k/month saving, the migration reshaped our cost model in three ways:

When we model a 2‑year horizon, the cumulative savings exceed $96 k, which comfortably pays for the one‑time engineering effort (~400 engineer‑hours) required to build the migration framework.

The Conclusion

Choosing “boring” infrastructure isn’t about being dull; it’s about maximizing the ratio of uptime to engineering effort. The Heroku‑to‑DigitalOcean migration proved that a disciplined, IaC‑first approach slashes cost, halves incident latency, and preserves performance—all while keeping the platform under direct control. Trendy SaaS will always tempt teams with glossy UI and one‑click add‑ons, but those conveniences hide hidden operational debt. Deploy the proven stack, automate the path, and only lift a trendy tool into production after it has survived six months of real‑world traffic in a non‑critical canary. The payoff is measurable, repeatable, and, most importantly, observable on a post‑mortem chart that any sane SRE can read without a marketing glossary.

This is part of the DevOps Ninja cornerstone series. Honest critique welcome.