AI Engineering

Observable Agent Workflows

By Journal

Long-running agent workflows are easier to debug when every step leaves a structured trail. This sample post outlines a lightweight observability model for agent runs that mix tool calls, user-visible updates, retries, and generated artifacts.

Dummy content note: this article is placeholder copy for exercising blog rendering and Markdown content blocks.

Trace shape

A trace should answer three questions quickly:

  1. What did the agent try to accomplish?
  2. Which tools or systems did it touch?
  3. Where did latency, retries, or errors occur?

type AgentTrace = {
  runId: string;
  workflow: 'build' | 'research' | 'triage';
  actorId: string;
  startedAt: string;
  completedAt?: string;
  spans: AgentSpan[];
};

type AgentSpan = {
  spanId: string;
  parentSpanId?: string;
  name: string;
  status: 'ok' | 'error' | 'cancelled';
  durationMs: number;
  attributes: Record<string, string | number | boolean>;
};
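
As a rough illustration of how the trace shape answers the three questions, the sketch below folds a trace into a small summary. The `TraceSummary` type, the `summarizeTrace` helper, and the sample data are hypothetical, and reading the tool name from a `toolName` attribute is an assumption rather than a fixed convention.

```typescript
type AgentSpan = {
  spanId: string;
  parentSpanId?: string;
  name: string;
  status: 'ok' | 'error' | 'cancelled';
  durationMs: number;
  attributes: Record<string, string | number | boolean>;
};

type AgentTrace = {
  runId: string;
  workflow: 'build' | 'research' | 'triage';
  actorId: string;
  startedAt: string;
  completedAt?: string;
  spans: AgentSpan[];
};

// Hypothetical summary shape: roughly one field per question.
type TraceSummary = {
  goal: string;
  toolsTouched: string[];
  slowestSpan?: string;
  failedSpans: string[];
};

function summarizeTrace(trace: AgentTrace): TraceSummary {
  const tools = new Set<string>();
  let slowest: AgentSpan | undefined;
  for (const span of trace.spans) {
    // Assumption: tool spans carry their tool name in a `toolName` attribute.
    const toolName = span.attributes['toolName'];
    if (typeof toolName === 'string') tools.add(toolName);
    if (!slowest || span.durationMs > slowest.durationMs) slowest = span;
  }
  return {
    goal: `${trace.workflow} run ${trace.runId}`,
    toolsTouched: Array.from(tools).sort(),
    slowestSpan: slowest?.name,
    failedSpans: trace.spans
      .filter((span) => span.status === 'error')
      .map((span) => span.name),
  };
}

// Made-up sample data, for illustration only.
const exampleTrace: AgentTrace = {
  runId: 'run_demo',
  workflow: 'research',
  actorId: 'actor_demo',
  startedAt: '2026-01-01T00:00:00Z',
  spans: [
    { spanId: 's1', name: 'tool.search', status: 'ok', durationMs: 420, attributes: { toolName: 'search' } },
    { spanId: 's2', parentSpanId: 's1', name: 'tool.summarize', status: 'error', durationMs: 95, attributes: { toolName: 'summarize' } },
  ],
};

const summary = summarizeTrace(exampleTrace);
console.log(summary);
```

A summary like this is cheap to compute at export time and gives a reader the slowest and failed steps without opening every span.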

Event vocabulary

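| Event            | Purpose                              | Example attribute |
|------------------|--------------------------------------|-------------------|
| tool.started     | A tool call was requested.           | toolName          |
| tool.completed   | A tool call returned successfully.   | durationMs        |
| artifact.created | A durable output was generated.      | artifactType      |
| retry.scheduled  | A transient failure will be retried. | attempt           |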
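
One way to keep this vocabulary consistent is to encode it as a discriminated union, so the compiler flags any event name that is not handled. The sketch below is illustrative only: the `AgentEvent` type and the `describeEvent` helper are hypothetical, and each variant carries just the example attribute listed above.

```typescript
// Hypothetical event union: one variant per event in the vocabulary.
type AgentEvent =
  | { type: 'tool.started'; toolName: string }
  | { type: 'tool.completed'; durationMs: number }
  | { type: 'artifact.created'; artifactType: string }
  | { type: 'retry.scheduled'; attempt: number };

function describeEvent(event: AgentEvent): string {
  switch (event.type) {
    case 'tool.started':
      return `tool ${event.toolName} requested`;
    case 'tool.completed':
      return `tool call returned in ${event.durationMs} ms`;
    case 'artifact.created':
      return `artifact of type ${event.artifactType} created`;
    case 'retry.scheduled':
      return `retry scheduled (attempt ${event.attempt})`;
    default: {
      // Exhaustiveness check: adding a new event type without a case
      // makes this assignment a compile-time error.
      const unreachable: never = event;
      return unreachable;
    }
  }
}

console.log(describeEvent({ type: 'retry.scheduled', attempt: 2 }));
```

Adding a fifth event then means extending the union and the switch together, which helps stop the vocabulary from drifting across emitters.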

Minimal span emitter

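// Wraps an async step and emits one span whether it succeeds or fails.
// emitSpan is assumed to be defined elsewhere (for example, by the tracing backend).
async function withSpan<T>(name: string, run: () => Promise<T>): Promise<T> {
  const startedAt = performance.now();
  try {
    const result = await run();
    emitSpan({ name, status: 'ok', durationMs: performance.now() - startedAt });
    return result;
  } catch (error) {
    // Record the failure, then rethrow so the caller still sees the error.
    emitSpan({ name, status: 'error', durationMs: performance.now() - startedAt });
    throw error;
  }
}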
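
To show the emitter in use, here is a self-contained sketch that pairs `withSpan` with an in-memory `emitSpan`. The `EmittedSpan` type, the `emittedSpans` array, and the `demo` function are assumptions made for illustration; a real system would presumably forward spans to a tracing backend instead of an array.

```typescript
type EmittedSpan = { name: string; status: 'ok' | 'error'; durationMs: number };

// Hypothetical in-memory sink, standing in for a real exporter.
const emittedSpans: EmittedSpan[] = [];

function emitSpan(span: EmittedSpan): void {
  emittedSpans.push(span);
}

async function withSpan<T>(name: string, run: () => Promise<T>): Promise<T> {
  const startedAt = performance.now();
  try {
    const result = await run();
    emitSpan({ name, status: 'ok', durationMs: performance.now() - startedAt });
    return result;
  } catch (error) {
    emitSpan({ name, status: 'error', durationMs: performance.now() - startedAt });
    throw error;
  }
}

// One step succeeds and one fails; both should leave a span behind.
async function demo(): Promise<EmittedSpan[]> {
  await withSpan('tool.search', async () => 'three results');
  try {
    await withSpan('tool.summarize', async () => {
      throw new Error('upstream timeout');
    });
  } catch {
    // withSpan rethrows, so the caller decides how to handle the failure.
    // The error span was already recorded before the rethrow.
  }
  return emittedSpans;
}
```

Calling `demo()` resolves to two spans, the first with status `ok` and the second with status `error`. Because the wrapper rethrows, adding tracing this way does not change the control flow of the wrapped step.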

Replay metadata

Use replay metadata sparingly. Store enough context to reproduce the run without copying sensitive user data into logs.

{
  "runId": "run_01jz9w2",
  "model": "agent-large",
  "toolVersion": "2026-06-10.1",
  "inputSnapshotId": "snap_8k2",
  "redactionPolicy": "workspace-default"
}
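
One way to keep replay metadata sparse is to check records against an allow-list of fields before they are stored. The sketch below assumes the five fields shown above are the complete set; `REPLAY_KEYS` and `checkReplayMetadata` are hypothetical names, not part of any existing API.

```typescript
// Assumption: these five fields are the full allow-list for replay metadata.
const REPLAY_KEYS: readonly string[] = [
  'runId',
  'model',
  'toolVersion',
  'inputSnapshotId',
  'redactionPolicy',
];

// Returns a list of problems; an empty list means the record is acceptable.
function checkReplayMetadata(candidate: Record<string, unknown>): string[] {
  const problems: string[] = [];
  for (const key of REPLAY_KEYS) {
    const value = candidate[key];
    if (typeof value !== 'string' || value.length === 0) {
      problems.push(`missing or non-string field: ${key}`);
    }
  }
  for (const key of Object.keys(candidate)) {
    // Anything outside the allow-list is flagged, so raw prompts or
    // document bodies cannot be attached to the record unnoticed.
    if (REPLAY_KEYS.indexOf(key) === -1) {
      problems.push(`unexpected field: ${key}`);
    }
  }
  return problems;
}

const problems = checkReplayMetadata({
  runId: 'run_01jz9w2',
  model: 'agent-large',
  toolVersion: '2026-06-10.1',
  inputSnapshotId: 'snap_8k2',
  redactionPolicy: 'workspace-default',
  prompt: 'full user prompt text',
});
console.log(problems);
```

Rejecting unknown fields is stricter than ignoring them, but it makes an accidental copy of user data into replay metadata fail visibly at write time.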

Logging guardrails

  • Redact user-provided secrets before exporting traces.
  • Prefer identifiers over full document bodies.
  • Keep prompt and response captures behind stricter retention policies.
  • Link traces to durable artifacts instead of embedding large payloads inline.
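
The first two guardrails can be approximated with a small filter applied to span attributes before export. This is a minimal sketch under stated assumptions: the key pattern, the 256-character cap, and the `redactAttributes` name are all hypothetical, and a real policy would likely be driven by configuration rather than a hard-coded pattern.

```typescript
type SpanAttributes = Record<string, string | number | boolean>;

// Assumed deny-list of attribute key names.
const SENSITIVE_KEY_PATTERN = /(secret|token|password|api[_-]?key)/i;

// Assumed cap on inline string values; longer values are treated as payloads.
const MAX_INLINE_LENGTH = 256;

function redactAttributes(attributes: SpanAttributes): SpanAttributes {
  const safe: SpanAttributes = {};
  for (const key of Object.keys(attributes)) {
    const value = attributes[key];
    if (SENSITIVE_KEY_PATTERN.test(key)) {
      // Drop the value entirely when the key name looks like a secret.
      safe[key] = '[REDACTED]';
    } else if (typeof value === 'string' && value.length > MAX_INLINE_LENGTH) {
      // Keep only the size, not the body, of large string payloads.
      safe[key] = `[TRUNCATED ${value.length} chars]`;
    } else {
      safe[key] = value;
    }
  }
  return safe;
}

const exported = redactAttributes({
  toolName: 'search',
  apiKey: 'sk-not-a-real-key',
  documentBody: 'x'.repeat(300),
  attempt: 2,
});
console.log(exported);
```

Key-name matching only catches secrets stored under recognizable names, so it complements, rather than replaces, stricter retention for prompt and response captures.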

A useful observability system does not need perfect detail. It needs consistent structure, predictable retention, and enough context to turn a failed run into a reproducible bug report.