Part III: Errors & Discoverability | Challenge §18
18. Error Message Quality
Severity: High | Frequency: Very Common | Detectability: Easy | Token Spend: High | Time: Medium | Context: High
The Problem
When a command fails, the agent needs to understand: what failed, why, and what to do next. Vague, undirected, or human-only error messages force the agent to guess.
Errors that don't help the agent:
```
$ tool deploy
Error: Something went wrong
exit 1
# Agent has zero actionable information

$ tool connect --host db.example.com
Connection failed.
exit 1
# Was it DNS? Auth? Firewall? Timeout? Agent doesn't know which to fix.

$ tool validate config.yaml
Validation error on line 14
exit 1
# Agent doesn't know which field failed, what rule it broke, or how to fix it
```
Stack traces as error output:
```
$ tool process file.csv
Traceback (most recent call last):
  File "tool.py", line 234, in process
    result = parser.parse(row)
  File "tool.py", line 89, in parse
    return int(row['count'])
ValueError: invalid literal for int() with base 10: 'N/A'
exit 1
# Agent receives a Python traceback: high token cost, low actionability
```
Errors that require human interpretation:
```
$ tool sync
SQLSTATE[23000]: Integrity constraint violation: 1062 Duplicate entry '42' for key 'PRIMARY'
exit 1
# Agent must decode vendor-specific SQL error codes to infer that record 42 already exists
```
Impact
- Agent retries with identical parameters (no basis to change anything)
- Agent escalates to user with no useful information
- Tokens wasted reasoning about unparseable error text
Solutions
Structured error format (the `code` field is the machine-readable identifier):
```json
{
  "ok": false,
  "error": {
    "code": "CONNECTION_REFUSED",
    "message": "Cannot connect to database at db.example.com:5432",
    "cause": "Connection refused (ECONNREFUSED)",
    "suggestion": "Verify the database is running: `tool db status`",
    "docs_url": "https://docs.example.com/errors/CONNECTION_REFUSED",
    "context": {
      "host": "db.example.com",
      "port": 5432,
      "timeout_ms": 5000
    }
  }
}
```
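One way a tool could produce this envelope consistently is a single helper that every failure path calls. The sketch below assumes a Python CLI; the names `error_envelope` and `fail` are hypothetical, and unset optional fields are simply omitted rather than emitted as `null`.

```python
import json
import sys


def error_envelope(code, message, *, cause=None, suggestion=None,
                   docs_url=None, context=None):
    """Build the {"ok": false, "error": {...}} envelope, omitting unset fields."""
    error = {"code": code, "message": message}
    optional = {
        "cause": cause,
        "suggestion": suggestion,
        "docs_url": docs_url,
        "context": context,
    }
    # Include only the optional fields the caller actually supplied.
    error.update({key: value for key, value in optional.items() if value is not None})
    return {"ok": False, "error": error}


def fail(code, message, exit_code=1, **fields):
    """Print the envelope as one JSON document on stdout, then exit non-zero."""
    print(json.dumps(error_envelope(code, message, **fields)))
    sys.exit(exit_code)
```

A command handler would then call, for example, `fail("CONNECTION_REFUSED", "Cannot connect to database at db.example.com:5432", cause="Connection refused (ECONNREFUSED)", context={"host": "db.example.com", "port": 5432})`, so the JSON shape is defined in one place instead of being hand-assembled at each call site.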
Error code taxonomy:
```
{DOMAIN}_{NOUN}_{CONDITION}
```
Examples:
```
DB_CONNECTION_REFUSED
AUTH_TOKEN_EXPIRED
FILE_CONFIG_NOT_FOUND
API_RATE_LIMIT_EXCEEDED
INPUT_PARAM_INVALID
```
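The taxonomy can be enforced mechanically, for instance in a test that walks every code a tool defines. This is a minimal sketch: the allowed-domain set is a hypothetical example, and because NOUN and CONDITION may each span several segments (`RATE_LIMIT`, `NOT_FOUND`), the pattern only requires at least three UPPER_SNAKE segments with a known first segment.

```python
import re

# At least three UPPER_SNAKE segments: DOMAIN, then NOUN and CONDITION,
# either of which may itself contain underscores (e.g. RATE_LIMIT, NOT_FOUND).
ERROR_CODE_RE = re.compile(r"[A-Z][A-Z0-9]*(?:_[A-Z][A-Z0-9]*){2,}")

# Hypothetical set of domains; a real tool would define its own.
KNOWN_DOMAINS = {"DB", "AUTH", "FILE", "API", "INPUT"}


def is_valid_error_code(code: str) -> bool:
    """True if code matches {DOMAIN}_{NOUN}_{CONDITION} with a known DOMAIN."""
    if ERROR_CODE_RE.fullmatch(code) is None:
        return False
    return code.split("_", 1)[0] in KNOWN_DOMAINS
```

With this check, a two-segment code such as `CONNECTION_REFUSED` is rejected because it has no domain prefix, which is exactly the kind of drift the convention is meant to catch.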
Suggestion field for common errors:
```
"suggestion": "Run `tool login` to refresh your credentials"
"suggestion": "Use --force to overwrite existing file"
"suggestion": "Check network connectivity with: ping db.example.com"
```
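Suggestions like the last one embed values (the host) that already live in the error's `context`. A sketch of one way to keep them accurate: store a template per code and fill it from `context` at emit time. The mapping, the `FILE_OUTPUT_EXISTS` code, and `suggestion_for` are hypothetical illustrations.

```python
# Hypothetical suggestion templates keyed by error code; placeholders are
# filled from the error's context dict when the error is emitted.
SUGGESTIONS = {
    "AUTH_TOKEN_EXPIRED": "Run `tool login` to refresh your credentials",
    "FILE_OUTPUT_EXISTS": "Use --force to overwrite existing file",
    "DB_CONNECTION_REFUSED": "Check network connectivity with: ping {host}",
}


def suggestion_for(code, context):
    """Return the filled-in suggestion, or None if there is no template for
    this code or the context lacks a value the template needs."""
    template = SUGGESTIONS.get(code)
    if template is None:
        return None
    try:
        return template.format(**context)
    except KeyError:
        # Missing placeholder value: better no suggestion than a wrong one.
        return None
```

Returning `None` when a placeholder is missing means the field is simply omitted, rather than telling the agent to `ping {host}` literally.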
For framework design:
- All errors MUST have a `code` (machine-readable) and a `message` (human-readable)
- The `suggestion` field is encouraged for recoverable errors
- Never emit raw stack traces to stdout; log them to stderr or a file
- Provide an error code registry queryable via `tool errors list`
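The rules above can be enforced at a single top-level boundary rather than in every command. The sketch below assumes a Python CLI; `ERROR_REGISTRY`, `ToolError`, `run_command`, `errors_list`, the `INTERNAL_EXCEPTION_UNHANDLED` code, and the success shape `{"ok": true, "data": ...}` are all hypothetical. Known failures carry a registered code; anything unexpected still yields JSON on stdout while its traceback goes to stderr.

```python
import json
import sys
import traceback

# Hypothetical registry of every code this tool can emit.
ERROR_REGISTRY = {
    "INPUT_PARAM_INVALID": "An argument failed validation",
    "FILE_CONFIG_NOT_FOUND": "The config file does not exist",
    "INTERNAL_EXCEPTION_UNHANDLED": "Unexpected failure; traceback on stderr",
}


class ToolError(Exception):
    """Raised by command code for known failures with a registered code."""

    def __init__(self, code, message, suggestion=None, context=None):
        if code not in ERROR_REGISTRY:
            raise ValueError(f"unregistered error code: {code}")
        super().__init__(message)
        self.code = code
        self.message = message
        self.suggestion = suggestion
        self.context = context or {}


def run_command(command, *args):
    """Run a command, print a JSON envelope on stdout, return the exit code."""
    try:
        data = command(*args)
    except ToolError as exc:
        error = {"code": exc.code, "message": exc.message, "context": exc.context}
        if exc.suggestion:
            error["suggestion"] = exc.suggestion
        print(json.dumps({"ok": False, "error": error}))
        return 1
    except Exception as exc:
        # Full traceback for humans goes to stderr, never stdout.
        traceback.print_exc(file=sys.stderr)
        print(json.dumps({"ok": False, "error": {
            "code": "INTERNAL_EXCEPTION_UNHANDLED",
            "message": f"{type(exc).__name__}: {exc}",
            "context": {},
        }}))
        return 1
    print(json.dumps({"ok": True, "data": data}))
    return 0


def errors_list():
    """Data behind a `tool errors list --output json` subcommand."""
    return [{"code": code, "summary": summary}
            for code, summary in sorted(ERROR_REGISTRY.items())]
```

A `main()` would end with `sys.exit(run_command(handler, *args))`. Under this design, the `tool process file.csv` crash shown earlier would reach the agent as a one-line `ValueError: invalid literal for int() ...` message with a stable code, while the traceback remains available on stderr.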
Evaluation
| Score | Condition |
|---|---|
| 0 | Errors are prose strings or stack traces; no machine-readable code; agent cannot act without human interpretation |
| 1 | Some errors have a code field; message is present but suggestion and context are absent |
| 2 | All errors have code, message, and context; stack traces go to stderr, not stdout |
| 3 | `suggestion` field on all recoverable errors; `docs_url` per error code; `tool errors list --output json` enumerates all codes |
Check: Trigger a known error (e.g., pass an invalid argument) and verify that stdout contains `{"ok": false, "error": {"code": "...", "message": "...", "suggestion": "..."}}` with no stack trace.
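The check can be scripted so it runs against every command in CI. A minimal sketch, assuming the invalid invocation is supplied by the caller; `check_error_output` and `run_check` are hypothetical names, and the required fields mirror the check above.

```python
import json
import subprocess


def check_error_output(stdout: str) -> list[str]:
    """List the problems with a failing command's stdout; empty means it passes."""
    if "Traceback (most recent call last)" in stdout:
        return ["stack trace on stdout"]
    try:
        doc = json.loads(stdout)
    except json.JSONDecodeError:
        return ["stdout is not JSON"]
    if not isinstance(doc, dict) or doc.get("ok") is not False:
        return ['missing "ok": false']
    error = doc.get("error")
    if not isinstance(error, dict):
        return ['missing "error" object']
    return [f"error.{key} missing"
            for key in ("code", "message", "suggestion") if not error.get(key)]


def run_check(argv):
    """Run a deliberately invalid invocation and check its failure output."""
    proc = subprocess.run(argv, capture_output=True, text=True)
    problems = [] if proc.returncode != 0 else ["exit code was 0"]
    return problems + check_error_output(proc.stdout)
```

For example, `run_check(["tool", "deploy", "--no-such-flag", "--output", "json"])` (the flag is a placeholder for any invalid argument) should return an empty list for a tool scoring 2 or higher.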
Agent Workaround
Extract and act on error.code and error.suggestion rather than parsing message text:
```python
import json
import subprocess

host = "db.example.com"  # example target host


def refresh_token():
    """Placeholder for the agent's own re-authentication flow (hypothetical)."""
    ...


result = subprocess.run(
    ["tool", "connect", "--host", host, "--output", "json"],
    capture_output=True, text=True,
)
try:
    parsed = json.loads(result.stdout)
except json.JSONDecodeError:
    # No structured output: raw crash or prose error on stdout
    raise RuntimeError(f"Tool produced no JSON: {result.stdout[:200]}")

if not parsed.get("ok"):
    error = parsed.get("error", {})
    code = error.get("code", "UNKNOWN")
    suggestion = error.get("suggestion", "")
    context = error.get("context", {})
    if code == "CONNECTION_REFUSED":
        # Surface the tool's own suggestion and the failing endpoint
        target = f"{context.get('host', host)}:{context.get('port', '?')}"
        raise RuntimeError(f"Connection to {target} failed: {suggestion or 'check host/port'}")
    elif code == "AUTH_TOKEN_EXPIRED":
        # Recoverable: re-authenticate, then retry the command
        refresh_token()
    else:
        raise RuntimeError(f"[{code}] {error.get('message')} | {suggestion}")
```
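When stdout carries no JSON, check stderr for stack traces. Run this check before the `json.loads` call above, because an empty stdout would otherwise raise `JSONDecodeError` first:
```python
if result.returncode != 0 and not result.stdout.strip():
    # Unstructured failure: check stderr for clues
    stderr_lines = [line for line in result.stderr.splitlines() if line.strip()]
    if stderr_lines and "Traceback" in result.stderr:
        # Unhandled exception: the last line names the exception type and message
        raise RuntimeError(f"Tool crash: {stderr_lines[-1]}")
    raise RuntimeError(f"Tool failed (exit {result.returncode}) with no JSON output")
```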
Limitation: If the tool emits only prose error messages with no `code` field, the agent must pattern-match against message text. This is fragile and breaks whenever the tool rewords its error messages.
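If pattern matching is unavoidable, keeping the patterns in one table at least makes the fragility visible and easy to update. A minimal sketch; the patterns and the codes they map to are hypothetical, and anything unmatched falls through to `UNKNOWN` rather than being guessed.

```python
import re

# Hypothetical message patterns mapped to synthetic codes. These silently stop
# matching if the tool rewords its messages, as the limitation above warns.
PROSE_PATTERNS = [
    (re.compile(r"connection refused", re.IGNORECASE), "DB_CONNECTION_REFUSED"),
    (re.compile(r"token (has )?expired", re.IGNORECASE), "AUTH_TOKEN_EXPIRED"),
    (re.compile(r"rate limit", re.IGNORECASE), "API_RATE_LIMIT_EXCEEDED"),
]


def classify_prose_error(text: str) -> str:
    """Map unstructured error text to a code, or UNKNOWN if nothing matches."""
    for pattern, code in PROSE_PATTERNS:
        if pattern.search(text):
            return code
    return "UNKNOWN"
```

Note that the bare `Connection failed.` message from the problem examples classifies as `UNKNOWN`: prose that omits the cause gives the matcher nothing to work with.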