02 critical output format

Part I: Output & Parsing | Challenge §2

2. Output Format & Parseability

The Problem

Agents parse command output to determine what happened and extract values for subsequent steps. Unparseable, inconsistent, or human-only output forces the agent to do fragile regex parsing or hallucinate results.

Human-formatted output agents cannot reliably parse:

$ tool list-users
┌────────────────┬─────┬──────────────┐
│ Name           │ ID  │ Status       │
├────────────────┼─────┼──────────────┤
│ Alice Johnson  │ 42  │ active       │
│ Bob Smith      │ 43  │ suspended    │
└────────────────┴─────┴──────────────┘
Total: 2 users

# Agent tries: grep for numbers, split on │, strip whitespace...
# Breaks on: names with special chars, different terminal widths,
#             localized output, color codes embedded in text

Output that changes format based on result count:

$ tool get-item --id 1
name: foo, value: bar   # single item: flat format

$ tool get-items
name: foo               # multiple items: different structure
  value: bar
name: baz
  value: qux

Mixed content in stdout:

Initializing... done
Connecting to database... done
{"result": "ok", "id": 42}   # ← the actual data is buried in prose
Operation completed in 1.2s

Locale-dependent output:

# On en_US system:
$ tool show-size
File size: 1,234,567 bytes

# On de_DE system:
$ tool show-size
Dateigröße: 1.234.567 Bytes
# agent's number parsing breaks

Impact

Agent extracts wrong values, propagates errors downstream
Token waste: agent must reason about format before extracting data
Non-deterministic behavior across environments

Solutions

Machine-readable output flag:

# Always provide a structured output mode
tool list-users --output json
tool list-users --output jsonl   # one JSON object per line for streaming
tool list-users --output tsv     # tab-separated, good for piping
tool list-users --output plain   # minimal, no decoration (for humans too)

# Some CLIs use --json as a shorthand (gh, az, etc.)
tool list-users --json           # equivalent to --output json

# Some CLIs use -o as a short flag (kubectl, helm, etc.)
tool list-users -o json

JSON output schema:

{
  "ok": true,
  "data": [...],      // always present, even if empty array/null
  "error": null,      // always present
  "meta": {
    "count": 2,
    "duration_ms": 45
  }
}

On failure:

{
  "ok": false,
  "data": null,
  "error": {
    "code": "NOT_FOUND",
    "message": "User with id=999 does not exist",
    "details": {}
  }
}

Rules for agent-compatible output: 1. Same schema whether 0, 1, or N results 2. No prose mixed into data output (prose goes to stderr) 3. No color codes in --output json mode (detect NO_COLOR env var) 4. Numbers always in invariant locale (. decimal, no thousands separator) 5. Dates always in ISO 8601 (2024-03-11T14:30:00Z) 6. Boolean as true/false, never yes/no/1/0 in JSON mode

For framework design: - Auto-detect output format based on --output flag or CI=true env - Provide output formatters as first-class framework primitives - Emit a JSON schema for every command's output via --output-schema

Merged from §48: The following content was originally a separate challenge. It is consolidated here because it describes a specific case of the same root problem.

Subsection: Structured Output Envelope Absence

The Problem

Even when a CLI tool supports --format json or --output json, the JSON output is typically ad-hoc per command — each command invents its own top-level structure, its own error format, its own success/failure indicator, and its own metadata placement. Agents consuming a suite of tools must learn a different JSON schema for each command's success case, each command's error case, and have no reliable way to extract common metadata (timing, version, request ID) across tools.

This is distinct from challenge #2 (Output Format & Parseability), which covers whether JSON output exists at all. This challenge concerns the envelope — the consistent wrapper structure around any JSON response — which challenge #2 does not require.

// Tool A success:
{"status": "ok", "result": {...}, "timestamp": "..."}

// Tool B success:
{"data": [...], "count": 42}

// Tool C success:
[{"id": 1, ...}, {"id": 2, ...}]   // bare array, no envelope

// Tool D error:
{"error": "not found"}

// Tool E error:
{"message": "Resource not found", "code": 404}

// Tool F error (exit code 1, stderr):
"Error: resource 'xyz' not found\n"

An agent consuming all six tools must handle six different response shapes for what are semantically identical outcomes. More critically, there is no standard way to: - Distinguish success from failure without inspecting field names and values (not just exit codes) - Extract the error message in a format-agnostic way - Find pagination metadata (has_more, next_cursor) when it exists - Determine whether output data is complete or truncated - Read request timing (duration_ms) or correlation IDs (request_id)

The comparison matrix shows REQ-F-004 (Consistent JSON Response Envelope) is ✗ across all evaluated solutions — including agentyper, which provides JSON output but without a standardized envelope.

Impact

Agent must write per-tool parsing logic for each command's success and error formats
No generic error handler: agents cannot uniformly check response.ok or response.error.code across tools
Pagination metadata lives in different fields or different locations per tool, making generic pagination handling impossible
Correlating requests across tools (for debugging) requires per-tool knowledge of where timestamps and IDs live
LLM token spend increases when agents must reason about schema variations rather than applying a known pattern

Solutions

Standard envelope format:

{
  "ok": true,
  "data": { ... },
  "error": null,
  "warnings": [],
  "meta": {
    "request_id": "req_abc123",
    "duration_ms": 142,
    "schema_version": "1.0.0",
    "truncated": false,
    "has_more": false,
    "next_cursor": null
  }
}

Error envelope:

{
  "ok": false,
  "data": null,
  "error": {
    "code": "NOT_FOUND",
    "message": "Resource 'xyz' does not exist",
    "field": null,
    "retryable": false,
    "retry_after_ms": null
  },
  "warnings": [],
  "meta": { "request_id": "req_abc124", "duration_ms": 23, "schema_version": "1.0.0" }
}

For framework design: - Make the ok/data/error/warnings/meta envelope mandatory for all structured JSON output; prohibit raw arrays or bare objects as top-level responses - Framework-generated output functions (output(), echo()) must serialize through the envelope automatically; direct print() / console.log() must be prohibited in command handlers - Define a JSON Schema for the envelope itself and publish it as a standard (analogous to JSON:API or JSON-LD) so agents can validate responses against it - The meta section must always include request_id, duration_ms, and schema_version without any author effort (framework auto-injects these) - error.code must be from the standard exit code taxonomy (challenge #1) — machine-readable string constant, not a free-form message

Evaluation

Score	Condition
0	No structured output mode; human-formatted tables, prose, or locale-dependent text only
1	`--output json` exists on some commands but format varies; prose mixed into stdout; no consistent envelope
2	`--output json` on all commands; consistent `ok`/`data`/`error` top-level structure; no prose on stdout
3	Full envelope with `ok`, `data`, `error`, `warnings`, `meta` (including `request_id`, `duration_ms`); auto-activated when `CI=true` or stdout not a TTY

Check: Run any command with --output json and redirect stderr to /dev/null — verify stdout is valid JSON with ok and data fields regardless of result count (0, 1, or N items).

Agent Workaround

Always request structured output and detect format violations before parsing:

# Discover the JSON flag from --help before invoking the real command
_help = subprocess.run([*cmd, "--help"], capture_output=True, text=True)
help_text = _help.stdout + _help.stderr

JSON_FLAG_PATTERNS = [
    (r"--output\s+\w*json", ["--output", "json"]),  # --output json / --output-format json
    (r"--json\b",            ["--json"]),             # gh, az
    (r"-o\b",                ["-o", "json"]),         # kubectl, helm
]
json_flag = next(
    (flag for pattern, flag in JSON_FLAG_PATTERNS if re.search(pattern, help_text)),
    None,
)
if json_flag is None:
    raise ValueError(f"No JSON output flag found in --help for: {cmd}")

result = subprocess.run(
    [*cmd, *json_flag],
    capture_output=True, text=True,
    env={**os.environ, "NO_COLOR": "1", "CI": "true"},
)

stdout = result.stdout.strip()

# Detect help text pollution (invocation error)
if result.returncode != 0 and any(kw in stdout for kw in ("Usage:", "Options:", "Commands:")):
    raise ValueError(f"Received help text instead of JSON — likely a usage error: {cmd}")

# Parse the last valid JSON line (guards against leading prose)
for line in reversed(stdout.splitlines()):
    try:
        parsed = json.loads(line)
        break
    except json.JSONDecodeError:
        continue
else:
    raise ValueError(f"No valid JSON in output: {stdout[:200]}")

ok = parsed.get("ok", parsed.get("status") == "ok")
data = parsed.get("data") or parsed.get("result") or parsed

Limitation: If the tool has no --output json flag and mixes prose with data in stdout, regex extraction is fragile and environment-dependent — there is no reliable agent-side fix; treat the tool as unstructured and require human review of any extracted values