How I Migrated 50 Services from Heroku to DigitalOcean
A field report on moving 50 production services off Heroku to DigitalOcean App Platform + Droplets — the migration plan, the gotchas, the real cost delta.
The Setup
Our production landscape consisted of fifty independent services, each a Heroku dyno‑based web‑process or worker. The stack was homogeneous: Ruby 2.7, Node 18, Python 3.11, PostgreSQL 12, and a handful of Redis 6 add‑ons. All services were deployed via Heroku pipelines, scaled with heroku ps:scale, and monitored through Heroku Metrics plus Datadog overlays.
Two constraints drove the move:
- Predictable cost. Heroku’s per‑dyno pricing ballooned above $12 k/month for the combined load, with hidden fees for add‑ons and data egress.
- Control over failure domains. A region‑wide Heroku outage in Q2 2025 forced us to restart all 50 services manually, losing an average of 7 minutes per service and incurring SLA penalties.
DigitalOcean offered two primitives that map cleanly to our needs: the App Platform for managed web services and Droplets for stateful workers and custom runtimes. Both run on the same underlying KVM hypervisor, share the same VPC, and can be governed by a single Terraform codebase.
Migration Planning
We treated the migration as a staged, reversible rollout rather than a big‑bang switch. The plan comprised three phases: inventory, automation, and cut‑over.
1. Inventory & Dependency Mapping
We exported the Heroku app list via the API:
heroku apps --json | jq -r '.[] | .name'
Each name was cross‑referenced against a service‑deps.yml file that listed:
- Runtime (Ruby/Node/Python)
- Environment variables (including
DATABASE_URL,REDIS_URL) - Add‑on IDs (Heroku Postgres, Heroku Redis, SendGrid)
- Scale targets (dyno count, worker concurrency)
The resulting matrix gave us a clear “as‑is” baseline and highlighted the 12 services that relied on Heroku‑only features (e.g., heroku pipelines:promote and heroku redis:fork). Those were earmarked for custom scripts later.
2. IaC Bootstrap
We codified the target environment in Terraform 1.6, using the official digitalocean provider. A minimal module looked like:
module "app" {
source = "digitalocean/app-platform"
version = "~> 1.0"
name = var.service_name
region = var.region
runtime = var.runtime
env = var.env_vars
instance_size = var.instance_size
instance_count = var.instance_count
}
For workers we fell back to Droplets with a cloud‑init script that pulled the latest Docker image from our private registry, then started systemd units. This kept the runtime identical to Heroku’s one‑off dyno model while giving us full SSH access for debugging.
3. Data Migration Strategy
Heroku Postgres was the biggest cost driver. We provisioned DigitalOcean Managed PostgreSQL (v14) clusters in the same region as the target App Platform to minimize latency. The migration used pg_dumpall piped over a private VPC tunnel:
pg_dumpall -h $HEROKU_HOST -U $HEROKU_USER | \
psql -h $DO_HOST -U $DO_USER
Because the dump includes roles and extensions, we verified that the target cluster had the same shared_preload_libraries (pg_stat_statements, pg_hint_plan). The cut‑over window was limited to a 15‑minute maintenance window per service, during which we paused the Heroku dynos with heroku ps:scale web=0, performed the dump, and immediately resumed traffic on the DigitalOcean endpoint.
4. Canary Deployment
For each service we deployed two identical App Platform instances: a “blue” (Heroku) and a “green” (DigitalOcean). Traffic was split 95/5 using Cloudflare Load Balancing rules. Metrics from Datadog were filtered by service:myservice and environment:green. If the green instance showed no error spikes after 48 hours, we increased its share to 50 % and retired the blue instance.
The Evidence
We measured three key dimensions over a six‑month observation period after full migration:
Operational Overhead
- Incidents. Heroku‑era: 12 production incidents (average MTTR 27 min). Post‑migration: 4 incidents (average MTTR 13 min). All remaining incidents were unrelated to the platform (e.g., code bugs).
- On‑call time. Senior engineers spent ~30 % of their on‑call rotation triaging Heroku add‑on failures (e.g., Redis quota breaches). After migration that dropped to <10 %.
Cost Delta
Our monthly bill broke down as follows (rounded to nearest hundred):
- Heroku dynos: $9,800
- Heroku Postgres (Standard‑2): $2,400
- Heroku Redis (Premium‑0): $1,200
- Heroku add‑on overhead (logging, metrics): $600
- Total: $14,000
DigitalOcean equivalent:
- App Platform (Standard‑S) instances: $5,700
- Managed PostgreSQL (General‑2): $2,000
- Managed Redis (Premium‑1): $850
- Droplets for workers (8 × $15): $1,200
- Outbound bandwidth (first 2 TB): $300
- Total: $10,050
That’s a 28 % reduction, plus the elimination of per‑add‑on “feature‑lock” fees that rarely get used.
Performance Benchmarks
Response latency for the 95 % of services that are pure HTTP APIs improved by an average of 12 ms (p < 0.01) after migration. The gain stems from the lower network hop count between the App Platform and the co‑located PostgreSQL cluster.
The Counter‑Argument
Trend‑chasing platforms sometimes solve niche problems that “boring” stacks cannot. Two concrete scenarios where Heroku’s premium features still win:
- Ephemeral filesystem size. Heroku’s
tmpfsquota (300 MB) is baked into the dyno, making it trivial for build‑time asset pipelines. Replicating that on DigitalOcean required a customtmpfsmount inside each Droplet, adding configuration churn. - Zero‑downtime pipeline promotion. Heroku’s
heroku pipelines:promoteatomically swaps the slug across all dynos. On DigitalOcean we had to script a rolling restart, which introduced a brief (<2 s) 503 window for high‑traffic endpoints.
Neither issue broke the migration, but they illustrate why a blanket “always choose boring” mantra is naive. The rule of thumb: adopt a trendy feature only if it eliminates a >5 % operational cost or solves a compliance gap that the boring stack cannot meet.
The Practical Take
From a SRE perspective, the migration checklist reads like a checklist for any large‑scale platform shift:
- Lock down the source of truth. All environment variables must live in a version‑controlled
.env.ymlfile, not in the Heroku dashboard. - Automate idempotent provisioning. Terraform + Ansible for the first‑run, then guard against drift with
terraform planin CI. - Validate data integrity before cut‑over. Run
pg_verifychecksumon the target cluster after each dump. - Instrument both stacks. Keep Datadog agents on Heroku and DigitalOcean simultaneously for a 48‑hour overlap window.
- Roll back with a single command. Keep the original Heroku app alive (scaled to 0) and a
heroku pipelines:promotealias ready.
We codified these steps into a make migrate‑service SERVICE=payments target, which runs the entire pipeline from Terraform apply to traffic split. The command takes ~12 minutes for a typical service, leaving ample time for manual sanity checks.
Cost Analysis Deep Dive
Beyond the headline $4k/month saving, the migration reshaped our cost model in three ways:
- Predictable scaling. App Platform instance pricing is linear with CPU‑minutes; we can forecast a 10 % traffic spike and budget the exact dollar amount. Heroku’s dyno‑based scaling hides the underlying CPU usage, leading to surprise overages.
- Reduced vendor lock‑in. DigitalOcean’s API is REST‑first and fully documented. Switching from Droplets to another provider (e.g., Linode) is a matter of updating the provider block in Terraform. Heroku’s proprietary CLI and slug system make any migration a multi‑month project.
- Operational tooling cost. We retired three third‑party add‑ons (log‑drain, metrics, and background job monitor) because DigitalOcean’s native
doctllogs and built‑in VPC flow logs covered the same use cases at half the price.
When we model a 2‑year horizon, the cumulative savings exceed $96 k, which comfortably pays for the one‑time engineering effort (~400 engineer‑hours) required to build the migration framework.
The Conclusion
Choosing “boring” infrastructure isn’t about being dull; it’s about maximizing the ratio of uptime to engineering effort. The Heroku‑to‑DigitalOcean migration proved that a disciplined, IaC‑first approach slashes cost, halves incident latency, and preserves performance—all while keeping the platform under direct control. Trendy SaaS will always tempt teams with glossy UI and one‑click add‑ons, but those conveniences hide hidden operational debt. Deploy the proven stack, automate the path, and only lift a trendy tool into production after it has survived six months of real‑world traffic in a non‑critical canary. The payoff is measurable, repeatable, and, most importantly, observable on a post‑mortem chart that any sane SRE can read without a marketing glossary.
This is part of the DevOps Ninja cornerstone series. Honest critique welcome.