AI Engineering
Observable Agent Workflows
By Journal
Long-running agent workflows are easier to debug when every step leaves a structured trail. This sample post outlines a lightweight observability model for agent runs that mix tool calls, user-visible updates, retries, and generated artifacts.
Dummy content note: this article is placeholder copy for exercising blog rendering and Markdown content blocks.
Trace shape
A trace should answer three questions quickly:
- What did the agent try to accomplish?
- Which tools or systems did it touch?
- Where did latency, retries, or errors occur?
```typescript
type AgentTrace = {
  runId: string;
  workflow: 'build' | 'research' | 'triage';
  actorId: string;
  startedAt: string;
  completedAt?: string;
  spans: AgentSpan[];
};

type AgentSpan = {
  spanId: string;
  parentSpanId?: string;
  name: string;
  status: 'ok' | 'error' | 'cancelled';
  durationMs: number;
  attributes: Record<string, string | number | boolean>;
};
```

Event vocabulary
| Event | Purpose | Example attribute |
|---|---|---|
| `tool.started` | A tool call was requested. | `toolName` |
| `tool.completed` | A tool call returned successfully. | `durationMs` |
| `artifact.created` | A durable output was generated. | `artifactType` |
| `retry.scheduled` | A transient failure will be retried. | `attempt` |
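
One way to keep this vocabulary consistent is to encode it as a discriminated union, so the compiler rejects unknown event names and mismatched attributes. The sketch below is only an illustration: `AgentEvent` and `describeEvent` are hypothetical names, not part of any particular tracing library, and each variant carries just the example attribute from the table.

```typescript
// Hypothetical event union built from the table above. Each variant carries
// only the example attribute listed for that event.
type AgentEvent =
  | { type: 'tool.started'; toolName: string }
  | { type: 'tool.completed'; durationMs: number }
  | { type: 'artifact.created'; artifactType: string }
  | { type: 'retry.scheduled'; attempt: number };

// Render an event as a one-line log message. The switch is exhaustive:
// adding a variant without handling it here fails type checking.
function describeEvent(event: AgentEvent): string {
  switch (event.type) {
    case 'tool.started':
      return `tool.started toolName=${event.toolName}`;
    case 'tool.completed':
      return `tool.completed durationMs=${event.durationMs}`;
    case 'artifact.created':
      return `artifact.created artifactType=${event.artifactType}`;
    case 'retry.scheduled':
      return `retry.scheduled attempt=${event.attempt}`;
    default: {
      // Reached only if an untyped value slips through at runtime.
      const unreachable: never = event;
      throw new Error(`Unhandled event: ${JSON.stringify(unreachable)}`);
    }
  }
}
```

A real system would likely attach shared fields such as a run ID and timestamp to every variant; they are left out here to keep the shape easy to read.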
Minimal span emitter
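```typescript
// emitSpan is assumed to be defined elsewhere, for example by an exporter
// that buffers spans for the current run.
async function withSpan<T>(name: string, run: () => Promise<T>): Promise<T> {
  const startedAt = performance.now();

  try {
    const result = await run();
    emitSpan({ name, status: 'ok', durationMs: performance.now() - startedAt });
    return result;
  } catch (error) {
    // Record the failure, then rethrow so callers still see the original error.
    emitSpan({ name, status: 'error', durationMs: performance.now() - startedAt });
    throw error;
  }
}
```

Replay metadata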
Use replay metadata sparingly. Store enough context to reproduce the run without copying sensitive user data into logs.
```json
{
  "runId": "run_01jz9w2",
  "model": "agent-large",
  "toolVersion": "2026-06-10.1",
  "inputSnapshotId": "snap_8k2",
  "redactionPolicy": "workspace-default"
}
```

Logging guardrails
- Redact user-provided secrets before exporting traces.
- Prefer identifiers over full document bodies.
- Keep prompt and response captures behind stricter retention policies.
- Link traces to durable artifacts instead of embedding large payloads inline.
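
The first two guardrails can be approximated with a small filter applied to span attributes before export. This is a rough sketch under stated assumptions: `redactAttributes`, the key pattern, and the 256-character cap are all hypothetical choices for illustration, and a production policy would need to be stricter than key-name matching.

```typescript
type AttributeValue = string | number | boolean;
type Attributes = Record<string, AttributeValue>;

// Assumed policy, for illustration only: key names that suggest a secret,
// and a length cap above which a string is treated as a payload.
const SECRET_KEY_PATTERN = /(secret|token|password|api[_-]?key)/i;
const MAX_INLINE_LENGTH = 256;

// Return a copy of the attributes that is safer to export. The input object
// is not modified.
function redactAttributes(attributes: Attributes): Attributes {
  const redacted: Attributes = {};
  for (const [key, value] of Object.entries(attributes)) {
    if (SECRET_KEY_PATTERN.test(key)) {
      // Drop the value entirely when the key name looks sensitive.
      redacted[key] = '[REDACTED]';
    } else if (typeof value === 'string' && value.length > MAX_INLINE_LENGTH) {
      // Keep only the size; the full payload belongs in a linked artifact.
      redacted[key] = `[TRUNCATED ${value.length} chars]`;
    } else {
      redacted[key] = value;
    }
  }
  return redacted;
}
```

Matching on key names alone will miss secrets stored under innocuous keys, so a filter like this works best as a backstop behind the habit of logging identifiers rather than content.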
A useful observability system does not need perfect detail. It needs consistent structure, predictable retention, and enough context to turn a failed run into a reproducible bug report.
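
As a rough illustration of that last point, a trace with consistent structure can be condensed into a short report mechanically. The sketch below assumes a reduced form of the trace and span shapes from this post; `toBugReport` and the report layout are hypothetical.

```typescript
// Reduced versions of the trace and span shapes, repeated so the sketch
// stands alone.
type SpanSummary = {
  name: string;
  status: 'ok' | 'error' | 'cancelled';
  durationMs: number;
};

type TraceSummary = {
  runId: string;
  workflow: string;
  spans: SpanSummary[];
};

// Build a plain-text report naming the run, the slowest span, and every
// span that ended in an error.
function toBugReport(trace: TraceSummary): string {
  const failed = trace.spans.filter((span) => span.status === 'error');
  const slowest = trace.spans.reduce<SpanSummary | undefined>(
    (max, span) => (max === undefined || span.durationMs > max.durationMs ? span : max),
    undefined,
  );

  const lines = [
    `Run ${trace.runId} (${trace.workflow})`,
    `Spans: ${trace.spans.length}, failed: ${failed.length}`,
  ];
  if (slowest !== undefined) {
    lines.push(`Slowest span: ${slowest.name} (${slowest.durationMs} ms)`);
  }
  for (const span of failed) {
    lines.push(`Failed span: ${span.name} after ${span.durationMs} ms`);
  }
  return lines.join('\n');
}
```

Pairing a report like this with the replay metadata for the same `runId` gives whoever picks up the bug both the symptom and the inputs needed to reproduce it.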