Infrastructure

Incremental Search Indexing for Product Workspaces

By Journal


Search quality depends on freshness as much as ranking. In a collaborative workspace, documents, comments, tasks, and attachments can change quickly, so a full reindex after every write is too expensive.

This sample post describes an incremental indexing loop that favors small jobs, durable checkpoints, and idempotent workers.

Architecture sketch

Writer service
     |
     v
Change log table ---> Index queue ---> Index worker ---> Search backend
     |                                      |
     +------------ checkpoint store <-------+

Each write appends a change record. Workers consume ordered ranges from the change log and update the search backend with the latest document projection.

Change record

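// One row in the change log, appended by the writer service for each write.
type ChangeRecord = {
  sequence: number;               // position in the change log; workers order and checkpoint by it
  workspaceId: string;
  entityType: 'document' | 'task' | 'comment';
  entityId: string;
  operation: 'upsert' | 'delete'; // 'delete' becomes a tombstone that removes the entity from the index
  committedAt: string;            // commit time of the originating write
};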
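To make the change log contract concrete, here is a minimal in-memory stand-in, assuming a single process and one global, monotonically increasing sequence counter. The class name `InMemoryChangeLog` and its `append` method are hypothetical; only the `readAfter({ workspaceId, cursor, limit })` shape mirrors the call the worker loop makes. It is synchronous for brevity, which still works with the worker's `await`, since awaiting a plain value simply returns it.

```typescript
type ChangeRecord = {
  sequence: number;
  workspaceId: string;
  entityType: 'document' | 'task' | 'comment';
  entityId: string;
  operation: 'upsert' | 'delete';
  committedAt: string;
};

// Hypothetical in-memory stand-in for the change log table.
class InMemoryChangeLog {
  private records: ChangeRecord[] = [];
  private nextSequence = 1;

  // Assign the next sequence number and append; sequences only ever increase.
  append(change: Omit<ChangeRecord, 'sequence' | 'committedAt'>): ChangeRecord {
    const record: ChangeRecord = {
      ...change,
      sequence: this.nextSequence++,
      committedAt: new Date().toISOString(),
    };
    this.records.push(record);
    return record;
  }

  // Return up to `limit` records for one workspace with sequence > cursor, oldest first.
  readAfter(args: { workspaceId: string; cursor: number; limit: number }): ChangeRecord[] {
    return this.records
      .filter((r) => r.workspaceId === args.workspaceId && r.sequence > args.cursor)
      .sort((a, b) => a.sequence - b.sequence)
      .slice(0, args.limit);
  }
}
```

Because sequences are shared across workspaces in this sketch, a single workspace sees gaps in its sequence numbers (for example 1, 3, 7). The worker only relies on the numbers increasing, not on them being contiguous.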

Worker loop

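async function indexWorkspace(workspaceId: string) {
  // Resume from the last sequence number that was fully applied.
  let cursor = await checkpoints.read(workspaceId);
  while (true) {
    // Read the next ordered batch of changes after the cursor.
    const changes = await changeLog.readAfter({ workspaceId, cursor, limit: 100 });
    if (changes.length === 0) {
      return; // caught up
    }
    for (const change of changes) {
      await applyChange(change); // must be idempotent: a crash re-applies this batch
      cursor = change.sequence;
    }
    // Persist progress only after the whole batch has been applied.
    await checkpoints.write(workspaceId, cursor);
  }
}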
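The ordering of `applyChange` and `checkpoints.write` is what makes a crash recoverable: progress is saved only after a full batch, so a restarted worker re-applies at most one batch. A synchronous, in-memory rendition can show this. `runWorker`, the `crashAt` hook, and the plain `Map` used as a checkpoint store are hypothetical stand-ins, and the log is assumed to be sorted by sequence already.

```typescript
type Change = { sequence: number; entityId: string };

// Ordered read after the cursor; assumes `log` is sorted by sequence.
function readAfter(log: Change[], cursor: number, limit: number): Change[] {
  return log.filter((c) => c.sequence > cursor).slice(0, limit);
}

// Mirrors indexWorkspace: apply a batch, then checkpoint.
// `crashAt` simulates the worker dying just before applying that sequence.
function runWorker(
  log: Change[],
  checkpoints: Map<string, number>,
  workspaceId: string,
  apply: (change: Change) => void,
  batchSize: number,
  crashAt?: number,
): void {
  let cursor = checkpoints.get(workspaceId) ?? 0;
  while (true) {
    const changes = readAfter(log, cursor, batchSize);
    if (changes.length === 0) return;
    for (const change of changes) {
      if (change.sequence === crashAt) throw new Error('simulated crash');
      apply(change);
      cursor = change.sequence;
    }
    checkpoints.set(workspaceId, cursor); // only reached when the whole batch succeeded
  }
}
```

With a batch size of 2 and a crash just before sequence 4, the checkpoint stays at 2 even though sequence 3 was applied. On restart the worker reads from 2 again, so sequence 3 is applied twice. That duplicate is harmless only if the apply step is idempotent, which is the point of the "Duplicate work" item below.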

Retry policy

Use bounded retries for transient failures and move poison messages into a review queue.

{
  "maxAttempts": 5,
  "backoff": "exponential",
  "initialDelayMs": 500,
  "maxDelayMs": 30000,
  "deadLetterQueue": "search-indexing-review"
}
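One way to interpret that policy: after the n-th consecutive failure, wait `initialDelayMs * 2^(n-1)` milliseconds, capped at `maxDelayMs`, and once `maxAttempts` attempts have failed, route the message to the dead-letter queue. The exact formula and the `nextAction` helper are assumptions; the post only specifies exponential backoff with these bounds, and production schedulers commonly add random jitter on top.

```typescript
type RetryPolicy = {
  maxAttempts: number;
  backoff: 'exponential';
  initialDelayMs: number;
  maxDelayMs: number;
  deadLetterQueue: string;
};

type NextAction =
  | { kind: 'retry'; delayMs: number }
  | { kind: 'dead-letter'; queue: string };

// The policy from the config above.
const searchIndexingRetryPolicy: RetryPolicy = {
  maxAttempts: 5,
  backoff: 'exponential',
  initialDelayMs: 500,
  maxDelayMs: 30000,
  deadLetterQueue: 'search-indexing-review',
};

// Decide what to do after `failedAttempts` consecutive failures (1-based).
function nextAction(policy: RetryPolicy, failedAttempts: number): NextAction {
  if (failedAttempts >= policy.maxAttempts) {
    return { kind: 'dead-letter', queue: policy.deadLetterQueue };
  }
  const delayMs = Math.min(
    policy.initialDelayMs * Math.pow(2, failedAttempts - 1),
    policy.maxDelayMs,
  );
  return { kind: 'retry', delayMs };
}
```

With these values the cap never binds: the retries wait 500, 1000, 2000, and 4000 ms, and the fifth failure sends the message to `search-indexing-review`. The 30-second ceiling only matters if `maxAttempts` is raised.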

Failure modes

  • Out-of-order delivery: use sequence numbers and checkpoint only after successful application.
  • Duplicate work: make search backend writes idempotent by document ID and version.
  • Deleted entities: emit tombstones so stale documents are removed from the index.
  • Large workspaces: partition queue work by workspace and entity type.
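The idempotency and tombstone items work together: if the index remembers the highest version it has seen for each document, including deleted ones, then duplicate or late-arriving upserts cannot overwrite newer state or bring a deleted document back. A minimal in-memory sketch, assuming the change log's `sequence` serves as the version (the post does not say where versions come from); `VersionedIndex` and its methods are hypothetical.

```typescript
type IndexedDoc = { id: string; version: number; body: string };

// Per-document state: a live document, or a tombstone that remembers the delete's version.
type IndexEntry = { version: number; deleted: boolean; body?: string };

class VersionedIndex {
  private entries = new Map<string, IndexEntry>();

  // A write applies only if its version is newer than what is stored.
  private isNewer(id: string, version: number): boolean {
    const current = this.entries.get(id);
    return current === undefined || version > current.version;
  }

  // Returns false for duplicates and stale writes, which leave the index unchanged.
  upsert(doc: IndexedDoc): boolean {
    if (!this.isNewer(doc.id, doc.version)) return false;
    this.entries.set(doc.id, { version: doc.version, deleted: false, body: doc.body });
    return true;
  }

  // Keep a tombstone instead of forgetting the ID, so an older upsert cannot resurrect it.
  delete(id: string, version: number): boolean {
    if (!this.isNewer(id, version)) return false;
    this.entries.set(id, { version, deleted: true });
    return true;
  }

  // What a search query would see: tombstoned documents are hidden.
  get(id: string): string | undefined {
    const entry = this.entries.get(id);
    return entry && !entry.deleted ? entry.body : undefined;
  }
}
```

Replaying a range that ends in a delete is therefore safe in either direction: re-applying the old upserts loses to the tombstone, and re-applying the delete is a no-op.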

Freshness service-level objective

Workspace tier   Target freshness   Rebuild cadence
Free             15 minutes         Weekly
Team             5 minutes          Daily
Enterprise       60 seconds         Hourly safety scan
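The freshness column translates directly into an alert check. A sketch, assuming indexing lag is measured as the age of the oldest change that has been committed but not yet indexed (the post does not define the measurement); the tier keys, `indexingLagMs`, and `meetsFreshnessTarget` are hypothetical names.

```typescript
type Tier = 'free' | 'team' | 'enterprise';

// Target freshness per tier, taken from the table above.
const targetFreshnessMs: Record<Tier, number> = {
  free: 15 * 60 * 1000,
  team: 5 * 60 * 1000,
  enterprise: 60 * 1000,
};

// Lag = age of the oldest committed-but-unindexed change; zero when the index is caught up.
function indexingLagMs(oldestUnindexedCommittedAt: string | undefined, now: Date): number {
  if (oldestUnindexedCommittedAt === undefined) return 0;
  return Math.max(0, now.getTime() - Date.parse(oldestUnindexedCommittedAt));
}

function meetsFreshnessTarget(tier: Tier, lagMs: number): boolean {
  return lagMs <= targetFreshnessMs[tier];
}
```

A six-minute lag, for instance, is within the Free target but breaches both the Team and Enterprise targets, so the same queue state can page for one tier and not another.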

Operator runbook

When indexing falls behind:

  1. Check worker error rate and queue age.
  2. Pause non-critical rebuild jobs.
  3. Increase workers for hot partitions.
  4. Inspect dead-letter records for schema drift.
  5. Re-run affected ranges after the fix ships.

For example, to replay an affected range for one workspace:

journal-search replay --workspace ws_123 --from-sequence 98110 --to-sequence 98420
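The post does not describe what `journal-search replay` does internally. One plausible shape, consistent with the worker loop, is to re-apply a bounded sequence range for one workspace in order, without moving the live checkpoint (which is already past the range), and to rely on idempotent writes to make re-application safe. `replayRange` and its types are hypothetical, and treating `--to-sequence` as inclusive is an assumption.

```typescript
type Change = { sequence: number; workspaceId: string; entityId: string };

// Re-apply changes with fromSequence <= sequence <= toSequence for one workspace, oldest first.
// The live checkpoint is not touched; idempotent writes make re-applying safe.
function replayRange(
  log: Change[],
  workspaceId: string,
  fromSequence: number,
  toSequence: number,
  apply: (change: Change) => void,
): number {
  if (fromSequence > toSequence) {
    throw new RangeError('fromSequence must be <= toSequence');
  }
  const range = log
    .filter((c) => c.workspaceId === workspaceId && c.sequence >= fromSequence && c.sequence <= toSequence)
    .sort((a, b) => a.sequence - b.sequence);
  range.forEach((change) => apply(change));
  return range.length; // number of changes re-applied
}
```

Leaving the checkpoint alone keeps replay independent of the live worker: moving the cursor backward would force the worker to reprocess everything after the range as well.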

Design tradeoffs

Incremental indexing is a consistency strategy. It accepts that search can lag behind writes briefly, then makes that lag observable and recoverable.

A good indexing pipeline should be easy to replay, safe to retry, and explicit about freshness expectations. Those constraints matter more than a clever ranking algorithm when users simply need their newest work to appear.