Database Engineering
Zero-Downtime Database Migrations
By Journal
Schema changes are routine until one blocks writes or breaks an older application version. This sample post describes an expand-and-contract migration plan for shipping database changes without downtime.
Migration sequence
- Add the new nullable column or table.
- Deploy application code that writes both old and new shapes.
- Backfill historical rows in small batches.
- Validate parity between old and new reads.
- Switch reads to the new shape.
- Remove the old column or table in a later deploy.
alter table projects add column archived_reason text;
create index concurrently projects_archived_reason_idx on projects (archived_reason) where archived_reason is not null;Dual-write example
async function archiveProject(projectId: string, reason: string) { await db.transaction(async (tx) => { await tx.projectArchiveEvents.insert({ projectId, reason }); await tx.projects.update({ id: projectId, archivedAt: new Date(), archivedReason: reason, }); });}Backfill job
Use small batches and a durable cursor so the job can stop and resume safely.
bun run scripts/backfill-archived-reason.ts --batch-size 500 --sleep-ms 250type BackfillCursor = { migration: 'projects.archived_reason.2026_06_08'; lastProjectId: string | null; processedRows: number;};Safety limits
| Control | Suggested value | Why it matters |
|---|---|---|
| Batch size | 100-1000 rows | Keeps locks short and predictable. |
| Pause between batches | 100-500 ms | Leaves capacity for user traffic. |
| Statement timeout | 5-15 seconds | Prevents one query from blocking the migration. |
| Rollback marker | Per deploy | Lets the team identify the last safe version. |
Validation query
select count(*) as mismatched_projectsfrom projects pleft join project_archive_events e on e.project_id = p.idwhere p.archived_at is not null and coalesce(p.archived_reason, '') <> coalesce(e.reason, '');Release notes template
### Database migration
- Expanded schema with `projects.archived_reason`.- Enabled dual writes in the application layer.- Backfill is resumable and rate-limited.- Contract step is scheduled after validation remains clean for 7 days.A zero-downtime migration is less about one clever SQL statement and more about sequencing compatible application versions.
The safest migrations are reversible, observable, and boring enough that an on-call engineer can reason about them during an incident.