Ecosystem, Runtime, and Agent-Specific

Agent-specific patterns discovered from real frameworks, libraries, and multi-agent deployments.

Failure modes: 37 active · 3 merged elsewhere  |  🔴 12 critical · 🟠 21 high · 🟡 4 medium


| File | Severity | Summary |
|---|---|---|
| 34-critical-shell-injection.md | 🔴 Critical | When an AI agent constructs CLI invocations — either as shell strings or by assembling argument arrays from LLM-gener... |
| 37-critical-repl-triggering.md | 🔴 Critical | Some CLI tools expose a REPL (Read-Eval-Print Loop) or interactive shell mode — either as an explicit subcommand (`my... |
| 42-critical-debug-secret-leakage.md | 🔴 Critical | CLI frameworks often provide debug/trace modes that dump full invocation context to aid debugging |
| 43-critical-output-size-unboundedness.md | 🔴 Critical | Challenge #5 (Pagination & Large Output) addresses paginated list commands that return many items |
| 45-critical-headless-auth.md | 🔴 Critical | Many modern CLI tools implement authentication via OAuth flows that require a browser — typically an OAuth authorizat... |
| 50-critical-stdin-deadlock.md | 🔴 Critical | Distinct from §10 (interactive prompts), some CLI tools silently read from stdin as a default fallback — not as a del... |
| 53-critical-credential-expiry.md | 🔴 Critical | Agents often operate over sessions longer than credential lifetimes |
| 60-critical-output-buffer-deadlock.md | 🔴 Critical | When a CLI tool's stdout is connected to a pipe rather than a TTY, the OS switches from line-buffered to fully-buffer... |
| 61-critical-pipe-payload-deadlock.md | 🔴 Critical | UNIX pipes have a finite kernel buffer (typically 64KB on Linux) |
| 62-critical-editor-trap.md | 🔴 Critical | Distinct from §37 (REPL triggering), many CLI tools invoke the user's $EDITOR or $VISUAL environment variable to ... |
| 64-critical-headless-gui.md | 🔴 Critical | Distinct from §45 (OAuth browser flow), many CLI tools launch GUI applications for operations unrelated to authentica... |
| 71-critical-noninteractive-installation.md | 🔴 Critical | Agents operating in fresh environments must install the CLI before use; interactive install steps (license prompts, w... |
| 35-high-hallucination-inputs.md | 🟠 High | AI agents make systematically different input errors than human operators |
| 38-high-dependency-version-mismatch.md | 🟠 High | CLI tools written in interpreted languages (Python, Node |
| 40-high-async-race-condition.md | 🟠 High | Commander |
| 41-high-update-notifier.md | 🟠 High | Many widely-deployed CLI tools (particularly in the npm/Commander |
| 46-high-api-translation-loss.md | 🟠 High | CLI tools that wrap HTTP APIs (the majority of developer-facing CLIs) suffer from "translation loss" — the API's nati... |
| 47-high-mcp-schema-staleness.md | 🟠 High | The MCP-wrapped CLI pattern is the most effective approach for making legacy CLIs agent-compatible: wrap an existing ... |
| 49-high-async-job-polling.md | 🟠 High | Many CLI operations are inherently asynchronous — deployments, builds, data migrations, batch exports |
| 51-high-glob-expansion.md | 🟠 High | When agents construct CLI invocations as shell strings and pass them to a shell executor, the shell performs word spl... |
| 54-high-conditional-args.md | 🟠 High | Many commands have arguments only required when another argument takes a specific value: --auth-type oauth requires... |
| 55-high-silent-truncation.md | 🟠 High | CLI tools that write to remote APIs often silently truncate field values that exceed API limits: descriptions > 255 c... |
| 56-high-pipeline-exit-masking.md | 🟠 High | When a CLI tool is used in a shell pipeline (`tool |
| 58-high-multiagent-conflict.md | 🟠 High | Distinct from §15 (race conditions within a single invocation), this is about multiple independent agent instances in... |
| 59-high-high-entropy-tokens.md | 🟠 High | JWTs, API keys, UUIDs, base64 blobs, and cryptographic hashes in tool output consume hundreds of LLM tokens each — ye... |
| 65-high-global-config-contamination.md | 🟠 High | Distinct from §28 (config file shadowing on READ), this challenge is about tools that WRITE to global configuration f... |
| 66-high-symlink-loop.md | 🟠 High | When a CLI tool performs recursive directory traversal (copy, delete, archive, search) and encounters a circular syml... |
| 67-high-json5-input.md | 🟠 High | LLMs frequently generate near-valid structured input that strict parsers reject: JSON with trailing commas, inline co... |
| 68-high-stdout-pollution.md | 🟠 High | Distinct from §3 (command author stream discipline) and §41 (update notifiers), this challenge is about deeply embedd... |
| 69-high-argument-order-ambiguity.md | 🟠 High | CLI parsers differ on whether options may appear after positional arguments or subcommands — agents construct invocations in LLM-natural order, causing silent misparsing or outright rejection |
| 70-high-single-argument-arity.md | 🟠 High | Commands that accept only one positional argument force agents to loop N times for N items — each iteration a separate process launch, auth check, and round trip — instead of one variadic call |
| 72-high-integration-artifact-drift.md | 🟠 High | Agent-facing integration artifacts (OpenAPI specs, AGENTS.md, skill files) drift from the CLI binary as it evolves — ... |
| 73-high-documentation-accuracy-drift.md | 🟠 High | AGENTS.md and agent-facing docs become inaccurate over time — flag names change, commands are removed, env vars rename... |
| 44-medium-knowledge-packaging.md | 🟡 Medium | Agents consuming a CLI tool have two information sources: the tool's --help text (or --schema if available) and a... |
| 52-medium-command-tree-discovery.md | 🟡 Medium | Most CLIs require N+1 help calls to discover the full command surface: one call to list top-level subcommands, then o... |
| 57-medium-locale-errors.md | 🟡 Medium | Distinct from §2 (locale-invariant serialization of numbers/dates), many CLI tools embed raw OS or runtime error mess... |
| 63-medium-column-width-corruption.md | 🟡 Medium | Tools that format output based on terminal width ($COLUMNS, `shutil |
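
Several of the critical entries above (§34 shell injection, §50 stdin deadlock, §61 pipe payload deadlock, §62 editor trap) can be partly mitigated on the calling side. The following is a minimal sketch of what such a defensive invocation wrapper might look like, not a method taken from the indexed files. The name `run_cli`, its parameters, and the default limits are hypothetical, and the `EDITOR`/`VISUAL`/`PAGER` overrides assume a POSIX host where `true` and `cat` are on `PATH`.

```python
import os
import subprocess
import sys


def run_cli(argv, timeout=30.0, max_bytes=64_000):
    """Run a CLI with agent-oriented defaults (hypothetical helper).

    argv must be a list of strings. It is passed to the OS as-is, so no
    shell ever sees it and metacharacters or globs stay literal.
    Returns (exit_code, stdout_text, truncated).
    """
    env = dict(os.environ)
    # Assumption: POSIX host. A tool that spawns $EDITOR gets a no-op
    # instead of an interactive editor; a pager becomes a pass-through.
    env.update({"EDITOR": "true", "VISUAL": "true", "PAGER": "cat"})

    proc = subprocess.run(
        argv,
        shell=False,               # argument array, never a shell string
        stdin=subprocess.DEVNULL,  # a silent stdin read sees EOF at once
        stdout=subprocess.PIPE,    # run() drains stdout and stderr together,
        stderr=subprocess.PIPE,    # so a full pipe buffer cannot stall the child
        timeout=timeout,           # raises subprocess.TimeoutExpired on a hang
        env=env,
    )

    # Note: the full output is still held in memory here. The cap only
    # bounds what is handed back to the agent.
    truncated = len(proc.stdout) > max_bytes
    text = proc.stdout[:max_bytes].decode("utf-8", errors="replace")
    return proc.returncode, text, truncated


if __name__ == "__main__":
    # The second argument looks dangerous to a shell but arrives untouched.
    code, out, truncated = run_cli(
        [sys.executable, "-c", "import sys; print(sys.argv[1])", "$(echo pwned); *"]
    )
    print(code, out.strip(), truncated)
```

This sketch does not address the other entries (for example §45 headless auth or §60 output buffering inside the child process), and a production wrapper would likely stream output rather than buffer it in full.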

Merged (redirect stubs):

Detailed Metrics

| Challenge | Severity | Frequency | Detectability | Token Spend | Time | Context |
|---|---|---|---|---|---|---|
| §34 | 🔴 Critical | Common | Hard | High | High | Medium |
| §37 | 🔴 Critical | Situational | Hard | High | Critical | Low |
| §42 | 🔴 Critical | Situational | Hard | Low | Low | High |
| §43 | 🔴 Critical | Common | Hard | Critical | High | Critical |
| §45 | 🔴 Critical | Common | Hard | High | Critical | Low |
| §50 | 🔴 Critical | Common | Hard | High | Critical | Low |
| §53 | 🔴 Critical | Common | Hard | High | High | Low |
| §60 | 🔴 Critical | Common | Hard | High | Critical | Low |
| §61 | 🔴 Critical | Situational | Hard | High | Critical | Low |
| §62 | 🔴 Critical | Common | Hard | High | Critical | Low |
| §64 | 🔴 Critical | Common | Hard | High | Critical | Low |
| §35 | 🟠 High | Common | Hard | Medium | Medium | Low |
| §38 | 🟠 High | Common | Medium | High | High | Low |
| §40 | 🟠 High | Common (Node.js ecosystem) | Hard | High | High | Low |
| §41 | 🟠 High | Common (Node.js/npm ecosystem) | Medium | Medium | Medium | Medium |
| §46 | 🟠 High | Common | Medium | High | Medium | Medium |
| §47 | 🟠 High | Common | Hard | High | High | Low |
| §49 | 🟠 High | Common | Hard | High | High | Medium |
| §51 | 🟠 High | Common | Medium | Medium | Medium | Low |
| §54 | 🟠 High | Common | Hard | High | Medium | Low |
| §55 | 🟠 High | Common | Hard | Medium | Medium | Low |
| §56 | 🟠 High | Common | Hard | Medium | Low | Low |
| §58 | 🟠 High | Situational | Hard | Medium | High | Low |
| §59 | 🟠 High | Common | Medium | High | Low | High |
| §65 | 🟠 High | Common | Hard | Medium | High | Low |
| §66 | 🟠 High | Situational | Hard | Medium | Critical | Low |
| §67 | 🟠 High | Common | Easy | High | Medium | Low |
| §68 | 🟠 High | Common | Medium | Medium | Low | High |
| §69 | 🟠 High | Common | Medium | Medium | Medium | Low |
| §70 | 🟠 High | Common | Easy | Medium | Medium | Low |
| §71 | 🔴 Critical | Common | Easy | Low | Critical | Low |
| §72 | 🟠 High | Common | Medium | High | Medium | Low |
| §73 | 🟠 High | Common | Hard | High | Medium | Low |
| §44 | 🟡 Medium | Very Common | Easy | High | High | Medium |
| §52 | 🟡 Medium | Very Common | Easy | High | Medium | High |
| §57 | 🟡 Medium | Situational | Easy | High | Low | Medium |
| §63 | 🟡 Medium | Common | Easy | Medium | Low | Medium |