Infrastructure
Incremental Search Indexing for Product Workspaces
By Journal
Search quality depends on freshness as much as ranking. In a collaborative workspace, documents, comments, tasks, and attachments can change quickly, so a full reindex after every write is too expensive.
This sample post describes an incremental indexing loop that favors small jobs, durable checkpoints, and idempotent workers.
Architecture sketch
    Writer service
          |
          v
    Change log table ---> Index queue ---> Index worker ---> Search backend
          |                                      |
          +----------- checkpoint store <--------+

Each write appends a change record. Workers consume ordered ranges from the change log and update the search backend with the latest document projection.
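The append-and-read contract described above can be sketched with an in-memory stand-in for the change log table. This is only an illustration: `InMemoryChangeLog`, `LoggedChange`, and the globally increasing sequence counter are assumptions made for the example, not the production schema, and a real log would be a durable table read asynchronously.

```typescript
// Hypothetical, trimmed-down record shape used only for this sketch.
type LoggedChange = {
  sequence: number;
  workspaceId: string;
  entityId: string;
  operation: "upsert" | "delete";
};

// In-memory stand-in for the change log table (assumption: one global sequence).
class InMemoryChangeLog {
  private records: LoggedChange[] = [];
  private nextSequence = 1;

  // Each write appends exactly one record and receives the next sequence number.
  append(change: Omit<LoggedChange, "sequence">): LoggedChange {
    const record: LoggedChange = { ...change, sequence: this.nextSequence++ };
    this.records.push(record);
    return record;
  }

  // Return up to `limit` records for one workspace with sequence > cursor.
  // Records are stored in append order, so the result is already ordered.
  readAfter(query: { workspaceId: string; cursor: number; limit: number }): LoggedChange[] {
    return this.records
      .filter((r) => r.workspaceId === query.workspaceId && r.sequence > query.cursor)
      .slice(0, query.limit);
  }
}
```

Reading strictly after a cursor is what lets a worker stop and resume without missing records or rescanning the whole log.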
Change record
    type ChangeRecord = {
      sequence: number;
      workspaceId: string;
      entityType: 'document' | 'task' | 'comment';
      entityId: string;
      operation: 'upsert' | 'delete';
      committedAt: string;
    };

Worker loop
    async function indexWorkspace(workspaceId: string) {
      // Resume from the last durable checkpoint for this workspace.
      let cursor = await checkpoints.read(workspaceId);

      while (true) {
        const changes = await changeLog.readAfter({ workspaceId, cursor, limit: 100 });

        // No more changes: the workspace is caught up.
        if (changes.length === 0) {
          return;
        }

        for (const change of changes) {
          await applyChange(change);
          cursor = change.sequence;
        }

        // Persist progress only after the whole batch has been applied.
        await checkpoints.write(workspaceId, cursor);
      }
    }

Retry policy
Use bounded retries for transient failures and move poison messages into a review queue.
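As a rough sketch of how a worker might apply such a policy, the helper below retries a task with capped exponential backoff and hands the final error to a dead-letter callback. The names `RetryPolicy`, `backoffDelayMs`, `withRetries`, `deadLetter`, and `sleep` are hypothetical, and doubling the delay on each retry is an assumption about what "exponential" means here.

```typescript
// Field names mirror the numeric fields of the sample policy in this section.
type RetryPolicy = {
  maxAttempts: number;
  initialDelayMs: number;
  maxDelayMs: number;
};

// Delay before retry number `retry` (1-based): initialDelayMs * 2^(retry - 1),
// capped at maxDelayMs. Doubling is an assumption for this sketch.
function backoffDelayMs(retry: number, policy: RetryPolicy): number {
  return Math.min(policy.initialDelayMs * Math.pow(2, retry - 1), policy.maxDelayMs);
}

// Run `task` up to maxAttempts times. If every attempt fails, pass the last
// error to `deadLetter` (a hypothetical hook for the review queue).
async function withRetries<T>(
  task: () => Promise<T>,
  policy: RetryPolicy,
  deadLetter: (error: unknown) => Promise<void>,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T | undefined> {
  for (let attempt = 1; attempt <= policy.maxAttempts; attempt++) {
    try {
      return await task();
    } catch (error) {
      if (attempt === policy.maxAttempts) {
        await deadLetter(error); // retries exhausted: park the message for review
        return undefined;
      }
      await sleep(backoffDelayMs(attempt, policy));
    }
  }
  return undefined;
}
```

Under these assumptions, the sample policy that follows would wait 500, 1000, 2000, and 4000 ms between attempts, then dead-letter the message after the fifth failure.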
    {
      "maxAttempts": 5,
      "backoff": "exponential",
      "initialDelayMs": 500,
      "maxDelayMs": 30000,
      "deadLetterQueue": "search-indexing-review"
    }

Failure modes
- Out-of-order delivery: use sequence numbers and checkpoint only after successful application.
- Duplicate work: make search backend writes idempotent by document ID and version.
- Deleted entities: emit tombstones so stale documents are removed from the index.
- Large workspaces: partition queue work by workspace and entity type.
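The duplicate-work and deleted-entity items above can be illustrated with a small versioned index. This is a minimal sketch rather than any real search backend API: `VersionedIndex` and `IndexedDoc` are hypothetical names, and using the change sequence as the document version is an assumption.

```typescript
// Hypothetical projection stored in the index; `version` could be the
// change sequence number (an assumption for this sketch).
type IndexedDoc = { id: string; version: number; deleted: boolean; body?: string };

class VersionedIndex {
  private docs = new Map<string, IndexedDoc>();

  // Apply a write only if it is newer than what is stored.
  // Returns true when the index changed, false for duplicates and stale writes.
  apply(incoming: IndexedDoc): boolean {
    const current = this.docs.get(incoming.id);
    if (current && current.version >= incoming.version) {
      return false; // replayed or out-of-order delivery: safe to ignore
    }
    this.docs.set(incoming.id, incoming);
    return true;
  }

  // Tombstones stay in the map so an older upsert cannot resurrect the
  // document, but they never appear in search results.
  search(term: string): string[] {
    return Array.from(this.docs.values())
      .filter((d) => !d.deleted && (d.body ?? "").includes(term))
      .map((d) => d.id);
  }
}
```

Because `apply` ignores anything at or below the stored version, replaying a range of the change log leaves the index in the same state as applying it once.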
Freshness service-level objective
| Workspace tier | Target freshness | Rebuild cadence |
|---|---|---|
| Free | 15 minutes | Weekly |
| Team | 5 minutes | Daily |
| Enterprise | 60 seconds | Hourly safety scan |
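One way to make these targets checkable is to measure lag as the age of the oldest change that has not yet been indexed. The sketch below assumes that definition; `Tier`, `targetFreshnessMs`, `freshnessLagMs`, and `isWithinSlo` are hypothetical names, and only the target values come from the table.

```typescript
type Tier = "free" | "team" | "enterprise";

// Target freshness per tier, taken from the table above, in milliseconds.
const targetFreshnessMs: Record<Tier, number> = {
  free: 15 * 60 * 1000,
  team: 5 * 60 * 1000,
  enterprise: 60 * 1000,
};

// Lag is the age of the oldest unindexed change (ISO 8601 committedAt),
// or 0 when the workspace is fully caught up (null).
function freshnessLagMs(oldestUnindexedCommittedAt: string | null, now: Date): number {
  if (oldestUnindexedCommittedAt === null) {
    return 0;
  }
  return Math.max(0, now.getTime() - Date.parse(oldestUnindexedCommittedAt));
}

// True when the measured lag is within the tier's target.
function isWithinSlo(tier: Tier, lagMs: number): boolean {
  return lagMs <= targetFreshnessMs[tier];
}
```

For example, a change committed six minutes ago that is still unindexed would be within the Free target but would breach both the Team and Enterprise targets.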
Operator runbook
When indexing falls behind:
- Check worker error rate and queue age.
- Pause non-critical rebuild jobs.
- Increase workers for hot partitions.
- Inspect dead-letter records for schema drift.
- Re-run affected ranges after the fix ships.
    journal-search replay --workspace ws_123 --from-sequence 98110 --to-sequence 98420

Design tradeoffs
Incremental indexing is a consistency strategy. It accepts that search can lag behind writes briefly, then makes that lag observable and recoverable.
A good indexing pipeline should be easy to replay, safe to retry, and explicit about freshness expectations. Those constraints matter more than a clever ranking algorithm when users simply need their newest work to appear.