Database Engineering

Zero-Downtime Database Migrations

By Journal

Birds flying across a pale blue sky

Schema changes are routine until one blocks writes or breaks an older application version. This sample post describes an expand-and-contract migration plan for shipping database changes without downtime.

Migration sequence

  1. Add the new nullable column or table.
  2. Deploy application code that writes both old and new shapes.
  3. Backfill historical rows in small batches.
  4. Validate parity between old and new reads.
  5. Switch reads to the new shape.
  6. Remove the old column or table in a later deploy.
alter table projects
add column archived_reason text;
create index concurrently projects_archived_reason_idx
on projects (archived_reason)
where archived_reason is not null;

Dual-write example

async function archiveProject(projectId: string, reason: string) {
await db.transaction(async (tx) => {
await tx.projectArchiveEvents.insert({ projectId, reason });
await tx.projects.update({
id: projectId,
archivedAt: new Date(),
archivedReason: reason,
});
});
}

Backfill job

Use small batches and a durable cursor so the job can stop and resume safely.

Terminal window
bun run scripts/backfill-archived-reason.ts --batch-size 500 --sleep-ms 250
type BackfillCursor = {
migration: 'projects.archived_reason.2026_06_08';
lastProjectId: string | null;
processedRows: number;
};

Safety limits

ControlSuggested valueWhy it matters
Batch size100-1000 rowsKeeps locks short and predictable.
Pause between batches100-500 msLeaves capacity for user traffic.
Statement timeout5-15 secondsPrevents one query from blocking the migration.
Rollback markerPer deployLets the team identify the last safe version.

Validation query

select count(*) as mismatched_projects
from projects p
left join project_archive_events e on e.project_id = p.id
where p.archived_at is not null
and coalesce(p.archived_reason, '') <> coalesce(e.reason, '');

Release notes template

### Database migration
- Expanded schema with `projects.archived_reason`.
- Enabled dual writes in the application layer.
- Backfill is resumable and rate-limited.
- Contract step is scheduled after validation remains clean for 7 days.

A zero-downtime migration is less about one clever SQL statement and more about sequencing compatible application versions.

The safest migrations are reversible, observable, and boring enough that an on-call engineer can reason about them during an incident.