docuseal-cli - Issues Report
Generated: 2026-05-20
CLI version: 1.0.3
Scope: all
Findings in scope: 71 failure modes
Observed Bugs (from evaluation notes)
These were witnessed directly when running checks against this CLI.
§2, §18 - Common failures produce stack traces instead of structured JSON
Discovered during: §2 evaluation - 2026-05-20
Symptom: Four failure cases were observed: templates list without credentials, invalid JSON in -d, a missing upload file, and a network failure. All four exit 1 with Commander prose or a Node stack trace rather than a structured error envelope.
Impact: Parse failure and brittle error handling for agents.
Trigger: node bin/run.js templates list --output json
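One possible shape for the missing envelope is sketched below. Everything in it is an assumption for illustration: the function name toErrorEnvelope, the field names (ok, error.code, error.message, error.context), and the code strings are hypothetical and not part of the current CLI. The classification relies only on standard Node error properties (SyntaxError from JSON.parse, and err.code values such as ENOENT or ECONNREFUSED).

```javascript
// Hypothetical error envelope; field names and code strings are illustrative only.
function toErrorEnvelope(err) {
  const networkCodes = ['ECONNREFUSED', 'ENOTFOUND', 'ETIMEDOUT', 'ECONNRESET'];
  let code = 'INTERNAL_ERROR';
  if (err instanceof SyntaxError) {
    code = 'INVALID_JSON';        // e.g. JSON.parse failing on a -d value
  } else if (err && err.code === 'ENOENT') {
    code = 'FILE_NOT_FOUND';      // e.g. a missing upload file
  } else if (err && networkCodes.includes(err.code)) {
    code = 'NETWORK_ERROR';       // standard Node network error codes
  } else if (err && err.code === 'AUTH_REQUIRED') {
    code = 'AUTH_REQUIRED';       // assumed to be set by the CLI's own auth check
  }
  return {
    ok: false,
    error: { code, message: String(err && err.message), context: {} },
  };
}
```

An agent can then branch on error.code instead of pattern-matching stack-trace text.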
§10 - configure prompt path exits 0 under non-TTY stdin
Discovered during: §10 evaluation - 2026-05-20
Symptom: Running node bin/run.js configure with empty stdin printed the server prompt and exited 0 without writing configuration.
Impact: False success signal for setup automation.
Trigger: node bin/run.js configure with empty stdin
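A guard along the following lines would avoid the false success signal. This is a sketch, not the CLI's code: planConfigure, the flag names apiKey and server, the INTERACTIVE_REQUIRED code, and the use of exit code 2 are all assumptions. The only real API involved is process.stdin.isTTY, which Node sets to true only when stdin is attached to a terminal.

```javascript
// Decide how configure should proceed. Kept as a pure function so it can be
// tested without a terminal; isTTY would come from Boolean(process.stdin.isTTY).
function planConfigure({ isTTY, flags }) {
  const hasAllFlags = Boolean(flags.apiKey && flags.server);
  if (hasAllFlags) return { action: 'write', exitCode: 0 };
  if (isTTY) return { action: 'prompt', exitCode: 0 };
  // Non-interactive and incomplete: fail loudly instead of exiting 0.
  return {
    action: 'fail',
    exitCode: 2,
    error: {
      code: 'INTERACTIVE_REQUIRED',
      message: 'stdin is not a TTY; pass configuration via flags',
    },
  };
}
```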
§40 - Async handlers are registered under program.parse()
Discovered during: §40 evaluation - 2026-05-20
Symptom: src/index.js calls program.parse() while command actions are async; observed network/auth failures surfaced as unhandled stack traces.
Impact: Unhandled async errors break structured recovery.
Trigger: Source inspection of src/index.js.
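Commander provides program.parseAsync() for programs whose actions are async; it returns a promise that can be awaited and caught. The sketch below shows the general pattern without depending on Commander. runMain is a hypothetical helper, and the envelope fields are assumptions; in the real entry point the main argument would be something like () => program.parseAsync(process.argv).

```javascript
// Await an async entry point and convert any rejection into a JSON error
// line plus a nonzero exit code, instead of an unhandled stack trace.
async function runMain(main, write = (line) => process.stderr.write(line)) {
  try {
    await main();
    return 0;
  } catch (err) {
    const envelope = { ok: false, error: { message: String(err && err.message) } };
    write(JSON.stringify(envelope) + '\n');
    return 1;
  }
}

// Intended use at the bottom of an entry file (commented out because it
// depends on the CLI's own program object):
// runMain(() => program.parseAsync(process.argv))
//   .then((code) => { process.exitCode = code; });
```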
§72 - Integration artifact version drift
Discovered during: §72 evaluation - 2026-05-20
Symptom: package.json and node bin/run.js --version report 1.0.3, while skills/docuseal-cli/SKILL.md metadata reports 1.0.6.
Impact: Agent integration docs can describe a different CLI than the installed binary.
Trigger: Compare node bin/run.js --version with skills/docuseal-cli/SKILL.md.
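A drift check of this kind could run in CI. The sketch below assumes the SKILL.md metadata contains a line of the form "version: 1.2.3"; the report does not show the file's actual layout, so the regular expression is a guess and findVersionDrift is a hypothetical name.

```javascript
// Compare the package.json version with a version field in SKILL.md metadata.
// Assumes the metadata contains a line of the form "version: 1.2.3".
function findVersionDrift(packageJsonText, skillMdText) {
  const pkgVersion = JSON.parse(packageJsonText).version;
  const match = skillMdText.match(
    /^\s*version:\s*["']?([0-9]+\.[0-9]+\.[0-9]+)["']?\s*$/m
  );
  const skillVersion = match ? match[1] : null;
  return { pkgVersion, skillVersion, drift: pkgVersion !== skillVersion };
}
```

With the versions reported above (1.0.3 and 1.0.6) the drift field would be true, and a CI step could fail on it.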
Failure-Mode Gaps (score 0-2, sorted: score asc, severity desc)
These are not confirmed bugs but verified gaps - the CLI does not meet the bar for reliable agent use.
§1 - Exit Codes & Status Signaling [Critical · score 0/3]
What fails: Failures exit with a generic code 1; there is no documented semantic exit-code table and no JSON error body.
Frequency: Very Common
Token/time cost when it triggers: Token Spend: High · Time: High
Workaround exists: Partial
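A semantic exit-code table could look like the sketch below. The names and numbers are an assumed convention chosen for illustration; the CLI documents no such table today.

```javascript
// Hypothetical semantic exit-code table; the numbers are an assumed convention.
const EXIT_CODES = Object.freeze({
  OK: 0,
  INTERNAL_ERROR: 1,
  USAGE_ERROR: 2,
  AUTH_REQUIRED: 3,
  NOT_FOUND: 4,
  NETWORK_ERROR: 5,
  TIMEOUT: 6,
});

// Map an error code string to its exit code, defaulting to the generic 1.
function exitCodeFor(errorCode) {
  return Object.prototype.hasOwnProperty.call(EXIT_CODES, errorCode)
    ? EXIT_CODES[errorCode]
    : EXIT_CODES.INTERNAL_ERROR;
}
```

Publishing the table in help output or a schema would let agents distinguish an auth failure from a crash without parsing text.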
§11 - Timeouts & Hanging Processes [Critical · score 0/3]
What fails: Network failure produced an uncaught Node stack trace; no timeout flag or TIMEOUT JSON.
Frequency: Common
Token/time cost when it triggers: Token Spend: High · Time: Critical
Workaround exists: Partial
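A timeout wrapper is one way to bound a hanging call. The sketch below is a minimal version under assumptions: withTimeout and the TIMEOUT code are hypothetical names. It only stops waiting; it does not cancel the underlying request, which would need an AbortController signal passed to the HTTP client.

```javascript
// Reject with a TIMEOUT-coded error if the promise does not settle in time.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => {
      const err = new Error(`timed out after ${ms} ms`);
      reject(Object.assign(err, { code: 'TIMEOUT' }));
    }, ms);
  });
  // Clear the timer either way so the process can exit promptly.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}
```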
§12 - Idempotency & Safe Retries [Critical · score 0/3]
What fails: Mutating commands have no idempotency key or effect/noop field.
Frequency: Common
Token/time cost when it triggers: Token Spend: High · Time: High
Workaround exists: Partial
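An idempotency key can be derived on the client so that a retried command carries the same key as the original attempt. The sketch below shows only that derivation; idempotencyKey and canonicalize are hypothetical helpers, and whether the server honors such a key is not established by this report.

```javascript
const crypto = require('node:crypto');

// Serialize with sorted object keys so equal payloads hash identically.
function canonicalize(value) {
  if (Array.isArray(value)) return `[${value.map(canonicalize).join(',')}]`;
  if (value && typeof value === 'object') {
    const body = Object.keys(value).sort()
      .map((k) => `${JSON.stringify(k)}:${canonicalize(value[k])}`);
    return `{${body.join(',')}}`;
  }
  return JSON.stringify(value);
}

// Derive a deterministic key from the command name and its payload.
function idempotencyKey(command, payload) {
  return crypto.createHash('sha256')
    .update(`${command}\n${canonicalize(payload)}`)
    .digest('hex');
}
```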
§13 - Partial Failure & Atomicity [Critical · score 0/3]
What fails: No partial-failure/resume protocol.
Frequency: Common
Token/time cost when it triggers: Token Spend: High · Time: High
Workaround exists: Partial
§23 - Side Effects & Destructive Operations [Critical · score 0/3]
What fails: Destructive archive operations have no --dry-run or machine-readable danger declaration.
Frequency: Common
Token/time cost when it triggers: Token Spend: Medium · Time: High
Workaround exists: Partial
§24 - Authentication & Secret Handling [Critical · score 0/3]
What fails: Secrets can be supplied via a hidden --api-key CLI flag; there is no standard redaction framework.
Frequency: Common
Token/time cost when it triggers: Token Spend: Medium · Time: Medium
Workaround exists: Partial
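A redaction pass over anything printed is one small piece of such a framework. The sketch below is illustrative only: redactSecrets and the [REDACTED] marker are assumptions, and it covers only secrets the process already knows, not secrets embedded in API responses.

```javascript
// Replace every occurrence of each known secret with a fixed marker.
function redactSecrets(text, secrets) {
  let out = text;
  for (const secret of secrets) {
    if (!secret) continue; // skip empty or undefined values
    out = out.split(secret).join('[REDACTED]');
  }
  return out;
}
```

Applying it to error messages and stack traces before they are written would keep a key passed on the command line out of logs.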
§25 - Prompt Injection via Output [Critical · score 0/3]
What fails: External API data is returned raw without a trusted/untrusted envelope.
Frequency: Situational
Token/time cost when it triggers: Token Spend: High · Time: High
Workaround exists: Partial
§43 - Tool Output Result Size Unboundedness [Critical · score 0/3]
What fails: No output limit, truncation metadata, or schema max-output declaration.
Frequency: Common
Token/time cost when it triggers: Token Spend: Critical · Time: High
Workaround exists: Partial
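Truncation metadata can be as small as three fields. The sketch below assumes a hypothetical limitItems helper and hypothetical meta field names (truncated, returned, total); none of these exist in the CLI today.

```javascript
// Cap a list result and report the truncation in metadata.
function limitItems(items, maxItems) {
  const data = items.slice(0, maxItems);
  return {
    data,
    meta: {
      truncated: items.length > maxItems,
      returned: data.length,
      total: items.length,
    },
  };
}
```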
§53 - Credential Expiry Mid-Session [Critical · score 0/3]
What fails: No distinct credential-expiry code, reauth command, or expiry metadata.
Frequency: Common
Token/time cost when it triggers: Token Spend: High · Time: High
Workaround exists: Partial
§60 - OS Output Buffer Deadlock [Critical · score 0/3]
What fails: No streaming protocol or heartbeat for long-running commands.
Frequency: Common
Token/time cost when it triggers: Token Spend: High · Time: Critical
Workaround exists: Partial
§74 - Credential Scope Declaration Absence [Critical · score 0/3]
What fails: No machine-readable required scopes or permission check command.
Frequency: Common
Token/time cost when it triggers: Token Spend: Low · Time: Medium
Workaround exists: Partial
§15 - Race Conditions & Concurrency [High · score 0/3]
What fails: No lock protocol for mutating/config operations.
Frequency: Situational
Token/time cost when it triggers: Token Spend: Medium · Time: Medium
Workaround exists: Partial
§16 - Signal Handling & Graceful Cancellation [High · score 0/3]
What fails: No SIGTERM partial-result protocol.
Frequency: Situational
Token/time cost when it triggers: Token Spend: Medium · Time: Medium
Workaround exists: Partial
§18 - Error Message Quality [High · score 0/3]
What fails: Validation/auth/file/network errors are prose or stack traces without code, suggestion, or context.
Frequency: Very Common
Token/time cost when it triggers: Token Spend: High · Time: Medium
Workaround exists: Partial
§19 - Retry Hints in Error Responses [High · score 0/3]
What fails: No retryable or retry_after_ms fields.
Frequency: Very Common
Token/time cost when it triggers: Token Spend: High · Time: High
Workaround exists: Partial
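Retry hints could be derived from the HTTP response the CLI already receives. The sketch below treats 429 and 5xx as retryable, which is a common but assumed policy, and reads a Retry-After header given in seconds. retryHint is a hypothetical helper; the field names follow the retryable and retry_after_ms fields named above.

```javascript
// Derive retry hints from an HTTP status and an optional Retry-After header
// given in seconds (the HTTP-date form of the header is not handled here).
function retryHint(status, retryAfterHeader) {
  const retryable = status === 429 || (status >= 500 && status <= 599);
  const hasHeader = retryAfterHeader != null && retryAfterHeader !== '';
  const seconds = Number(retryAfterHeader);
  const retry_after_ms = retryable && hasHeader && Number.isFinite(seconds)
    ? Math.max(0, seconds) * 1000
    : null;
  return { retryable, retry_after_ms };
}
```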
§22 - Schema Versioning & Output Stability [High · score 0/3]
What fails: No meta.schema_version in responses.
Frequency: Common
Token/time cost when it triggers: Token Spend: High · Time: High
Workaround exists: Partial
§26 - Stateful Commands & Session Management [High · score 0/3]
What fails: Implicit global config/env state; no status --output json context report.
Frequency: Common
Token/time cost when it triggers: Token Spend: Medium · Time: Medium
Workaround exists: Partial
§31 - Network Proxy Unawareness [High · score 0/3]
What fails: Network errors include no proxy context.
Frequency: Situational
Token/time cost when it triggers: Token Spend: Medium · Time: High
Workaround exists: Partial
§35 - Agent Hallucination Input Patterns [High · score 0/3]
What fails: Percent-encoded/path-like values are not rejected with structured validation suggestions.
Frequency: Common
Token/time cost when it triggers: Token Spend: Medium · Time: Medium
Workaround exists: Partial
§38 - Runtime Dependency Version Mismatch [High · score 0/3]
What fails: No engines declaration or startup runtime-version JSON check.
Frequency: Common
Token/time cost when it triggers: Token Spend: High · Time: High
Workaround exists: Partial
§40 - parse() vs parseAsync() Silent Race Condition [High · score 0/3]
What fails: Source uses program.parse() with async action handlers.
Frequency: Common (Node.js ecosystem)
Token/time cost when it triggers: Token Spend: High · Time: High
Workaround exists: Partial
§47 - MCP Wrapper Schema Staleness [High · score 0/3]
What fails: No MCP wrapper health, schema version, or stale-schema mapping.
Frequency: Common
Token/time cost when it triggers: Token Spend: High · Time: High
Workaround exists: Partial
§49 - Async Job / Polling Protocol Absence [High · score 0/3]
What fails: No async job/status protocol or distinct running/done exit codes.
Frequency: Common
Token/time cost when it triggers: Token Spend: High · Time: High
Workaround exists: Partial
§54 - Conditional / Dependent Argument Requirements [High · score 0/3]
What fails: No machine-readable arg groups or all-at-once dependent-argument validation.
Frequency: Common
Token/time cost when it triggers: Token Spend: High · Time: Medium
Workaround exists: Partial
§55 - Silent Data Truncation [High · score 0/3]
What fails: No schema max lengths or FIELD_TRUNCATED/validation warning protocol.
Frequency: Common
Token/time cost when it triggers: Token Spend: Medium · Time: Medium
Workaround exists: Partial
§56 - Exit Code Masking in Shell Pipelines [High · score 0/3]
What fails: No ok, meta.ok, or meta.exit_code fields.
Frequency: Common
Token/time cost when it triggers: Token Spend: Medium · Time: Low
Workaround exists: Partial
§58 - Multi-Agent Concurrent Invocation Conflict [High · score 0/3]
What fails: Config writes use direct writes to shared config; no locking or conflict code.
Frequency: Situational
Token/time cost when it triggers: Token Spend: Medium · Time: High
Workaround exists: Partial
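A lock file is one simple way to serialize config writes between concurrent invocations. The sketch below uses the standard 'wx' flag of fs.openSync, which fails with EEXIST when the file already exists. withConfigLock, the .lock suffix, and the CONFIG_LOCKED code are assumptions, and the sketch does not handle stale locks left behind by a crashed process.

```javascript
const fs = require('node:fs');

// Run fn while holding an exclusive lock file next to the config file.
// The 'wx' flag makes openSync fail with EEXIST if the lock already exists.
function withConfigLock(configPath, fn) {
  const lockPath = `${configPath}.lock`;
  let fd;
  try {
    fd = fs.openSync(lockPath, 'wx');
  } catch (err) {
    if (err.code === 'EEXIST') {
      const locked = new Error('config is locked by another process');
      throw Object.assign(locked, { code: 'CONFIG_LOCKED' });
    }
    throw err;
  }
  try {
    return fn();
  } finally {
    // Release the lock even if fn throws.
    fs.closeSync(fd);
    fs.unlinkSync(lockPath);
  }
}
```

A second invocation that hits CONFIG_LOCKED can report a distinct conflict code instead of silently overwriting the first writer's changes.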
§65 - Global Configuration State Contamination [High · score 0/3]
What fails: Config writes default to global user config without --global or write-scope metadata.
Frequency: Common
Token/time cost when it triggers: Token Spend: Medium · Time: High
Workaround exists: Partial
§67 - Agent-Generated Input Syntax Rejection [High · score 0/3]
What fails: Strict JSON parse errors produce raw stack traces; there is no INVALID_JSON error carrying a corrected-input suggestion.
Frequency: Common
Token/time cost when it triggers: Token Spend: High · Time: Medium
Workaround exists: Partial
§68 - Third-Party Library Stdout Pollution [High · score 0/3]
What fails: No stdout interception or warnings envelope.
Frequency: Common
Token/time cost when it triggers: Token Spend: Medium · Time: Low
Workaround exists: Partial
§70 - Single-Argument Arity Forcing Agent Loop Overhead [High · score 0/3]
What fails: Single-ID commands do not accept variadic IDs with per-item results.
Frequency: Common
Token/time cost when it triggers: Token Spend: Medium · Time: Medium
Workaround exists: Partial
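Per-item results for a variadic command could be collected as in the sketch below. forEachId and the result shape are hypothetical. The sketch runs all operations concurrently via Promise.allSettled, which is a simplification; a real implementation might cap concurrency.

```javascript
// Apply an async operation to each ID and collect per-item outcomes,
// so one failure does not hide the results of the others.
async function forEachId(ids, operation) {
  const settled = await Promise.allSettled(ids.map(async (id) => operation(id)));
  const results = settled.map((s, i) => (s.status === 'fulfilled'
    ? { id: ids[i], ok: true, data: s.value }
    : { id: ids[i], ok: false, error: { message: String(s.reason && s.reason.message) } }));
  return { ok: results.every((r) => r.ok), results };
}
```

The top-level ok flag lets an agent detect a partial failure in one call and retry only the failed IDs.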
§72 - Integration Artifact Version Drift [High · score 0/3]
What fails: Skill metadata version 1.0.6 differs from binary/package version 1.0.3, confirming integration artifact drift.
Frequency: Common
Token/time cost when it triggers: Token Spend: High · Time: Medium
Workaround exists: Partial
§6 - Command Composition & Piping [Medium · score 0/3]
What fails: No --output id mode and no protocol for reading IDs from stdin via a - argument.
Frequency: Common
Token/time cost when it triggers: Token Spend: Medium · Time: Low
Workaround exists: Partial
§7 - Output Non-Determinism [Medium · score 0/3]
What fails: Raw API output has no stable-output mode or volatile-field isolation.
Frequency: Common
Token/time cost when it triggers: Token Spend: Medium · Time: Medium
Workaround exists: Partial
§20 - Environment & Dependency Discovery [Medium · score 0/3]
What fails: No doctor --output json or structured dependency preflight.
Frequency: Common
Token/time cost when it triggers: Token Spend: Medium · Time: Medium
Workaround exists: Partial
§21 - Schema & Help Discoverability [Medium · score 0/3]
What fails: No --schema --output json; help is prose only.
Frequency: Very Common
Token/time cost when it triggers: Token Spend: High · Time: Medium
Workaround exists: Partial
§29 - Working Directory Sensitivity [Medium · score 0/3]
What fails: File paths are resolved relative to the CWD, with no meta.cwd field and no framework-level --cwd flag.
Frequency: Common
Token/time cost when it triggers: Token Spend: Medium · Time: Low
Workaround exists: Partial
§30 - Undeclared Filesystem Side Effects [Medium · score 0/3]
What fails: Config filesystem side effects are not declared or inventoried.
Frequency: Common
Token/time cost when it triggers: Token Spend: Low · Time: Low
Workaround exists: Partial
§33 - Observability & Audit Trail [Medium · score 0/3]
What fails: No request_id, duration_ms, trace propagation, or audit log.
Frequency: Very Common
Token/time cost when it triggers: Token Spend: Medium · Time: High
Workaround exists: Partial
§52 - Recursive Command Tree Discovery Cost [Medium · score 0/3]
What fails: No --schema command tree; agents must recurse through help text.
Frequency: Very Common
Token/time cost when it triggers: Token Spend: High · Time: Medium
Workaround exists: Partial
§57 - Locale-Dependent Error Messages [Medium · score 0/3]
What fails: OS/file errors surface as raw stack traces, not normalized structured errors.
Frequency: Situational
Token/time cost when it triggers: Token Spend: High · Time: Low
Workaround exists: Partial
§63 - Terminal Column Width Output Corruption [Medium · score 0/3]
What fails: No JSON mode; help prose wraps at terminal width.
Frequency: Common
Token/time cost when it triggers: Token Spend: Medium · Time: Low
Workaround exists: Partial
§2 - Output Format & Parseability [Critical · score 1/3]
What fails: API commands emit JSON on success, but there is no --output json and no ok/data/error envelope; many errors are prose/stack traces.
Frequency: Very Common
Token/time cost when it triggers: Token Spend: High · Time: Medium
Workaround exists: Partial
§10 - Interactivity & TTY Requirements [Critical · score 1/3]
What fails: configure has flags for non-interactive setup, but the prompt path still runs in non-TTY and can exit 0 without configuring.
Frequency: Common
Token/time cost when it triggers: Token Spend: High · Time: Critical
Workaround exists: Partial
§34 - Shell Injection via Agent-Constructed Commands [Critical · score 1/3]
What fails: No shell execution path found, but suspicious name/path values are not validated into structured errors.
Frequency: Common
Token/time cost when it triggers: Token Spend: High · Time: High
Workaround exists: Partial
§45 - Headless Authentication / OAuth Browser Flow Blocking [Critical · score 1/3]
What fails: Missing auth exits immediately, but as an uncaught stack trace rather than AUTH_REQUIRED with auth_methods.
Frequency: Common
Token/time cost when it triggers: Token Spend: High · Time: Critical
Workaround exists: Partial
§3 - Stderr vs Stdout Discipline [High · score 1/3]
What fails: Data is normally stdout, but help/prose success/error output can also appear on stdout.
Frequency: Very Common
Token/time cost when it triggers: Token Spend: Medium · Time: Low
Workaround exists: Partial
§5 - Pagination & Large Output [High · score 1/3]
What fails: List commands expose limit/cursor flags, but no standard pagination metadata envelope.
Frequency: Common
Token/time cost when it triggers: Token Spend: High · Time: High
Workaround exists: Partial
§14 - Argument Validation Before Side Effects [High · score 1/3]
What fails: Commander validates some arguments before execution, but exit code is generic and errors are not structured JSON.
Frequency: Common
Token/time cost when it triggers: Token Spend: Medium · Time: Medium
Workaround exists: Partial
§28 - Config File Shadowing & Precedence [High · score 1/3]
What fails: README documents precedence and configure --list shows config, but sources are not machine-readable.
Frequency: Common
Token/time cost when it triggers: Token Spend: High · Time: High
Workaround exists: Partial
§46 - API Schema to CLI Flag Translation Loss [High · score 1/3]
What fails: -d accepts JSON/bracket notation, but there is no full --json body flag or API-schema validation.
Frequency: Common
Token/time cost when it triggers: Token Spend: High · Time: Medium
Workaround exists: Partial
§51 - Shell Word Splitting and Glob Expansion Interference [High · score 1/3]
What fails: Exec-array invocation preserves spaced file paths, but missing files become unstructured ENOENT stack traces.
Frequency: Common
Token/time cost when it triggers: Token Spend: Medium · Time: Medium
Workaround exists: Partial
§59 - High-Entropy String Token Poisoning [High · score 1/3]
What fails: configure --list masks stored api_key, but there is no semantic token summary/unmask protocol.
Frequency: Common
Token/time cost when it triggers: Token Spend: High · Time: Low
Workaround exists: Partial
§69 - Argument Order Ambiguity [High · score 1/3]
What fails: Subcommand-level global flags work after the subcommand; root-level placement is rejected.
Frequency: Common
Token/time cost when it triggers: Token Spend: Medium · Time: Medium
Workaround exists: Partial
§73 - Documentation Accuracy Drift [High · score 1/3]
What fails: No AGENTS.md; available CLAUDE/skill docs are useful but version drift exists.
Frequency: Common
Token/time cost when it triggers: Token Spend: High · Time: Medium
Workaround exists: Partial
§4 - Verbosity & Token Cost [Medium · score 1/3]
What fails: No progress spam observed, but there is no quiet/fields control and CI does not activate structured mode.
Frequency: Very Common
Token/time cost when it triggers: Token Spend: High · Time: Low
Workaround exists: Partial
§27 - Platform & Shell Portability [Medium · score 1/3]
What fails: Node CLI is portable in principle, but there is no doctor command and failures are raw.
Frequency: Common
Token/time cost when it triggers: Token Spend: Medium · Time: Medium
Workaround exists: Partial
§44 - Agent Knowledge Packaging Absence [Medium · score 1/3]
What fails: Repository ships CLAUDE.md and a skill, but no AGENTS.md/CONTEXT.md and no --schema danger/requires fields.
Frequency: Very Common
Token/time cost when it triggers: Token Spend: High · Time: High
Workaround exists: Partial
§42 - Debug / Trace Mode Secret Leakage [Critical · score 2/3]
What fails: No debug/trace mode found to leak secrets, but no sensitive schema/redaction declaration exists.
Frequency: Situational
Token/time cost when it triggers: Token Spend: Low · Time: Low
Workaround exists: Partial
§71 - Non-Interactive Installation Absence [Critical · score 2/3]
What fails: README documents non-interactive npm install/use, but there is no AGENTS.md install protocol, and global install idempotency was not exercised.
Frequency: Common
Token/time cost when it triggers: Token Spend: Low · Time: Critical
Workaround exists: Partial
§9 - Binary & Encoding Safety [High · score 2/3]
What fails: File uploads use Buffer/base64 for binary content; error handling remains unstructured.
Frequency: Situational
Token/time cost when it triggers: Token Spend: Low · Time: Medium
Workaround exists: Partial
§41 - Update Notifier Side-Channel Output Pollution [High · score 2/3]
What fails: No update notifier found; CI/NO_UPDATE_NOTIFIER produced no side-channel notice.
Frequency: Common (Node.js/npm ecosystem)
Token/time cost when it triggers: Token Spend: Medium · Time: Medium
Workaround exists: Partial
Passing (score 3/3 - safe to use without special handling)
§37 REPL / Interactive Mode Accidental Triggering, §50 Stdin Consumption Deadlock, §61 Bidirectional Pipe Payload Deadlock, §62 $EDITOR and $VISUAL Trap, §64 Headless Display and GUI Launch Blocking, §66 Symlink Loop and Recursive Traversal Exhaustion, §17 Child Process Leakage, §8 ANSI & Color Code Leakage, §32 Self-Update & Auto-Upgrade Behavior
Risk Summary
| Category | Count | §N list |
|---|---|---|
| Observed bugs | 4 | §2 and §18 (one shared bug), §10, §40, §72 |
| Score 0 - complete failure | 42 | §1, §6, §7, §11, §12, §13, §15, §16, §18, §19, §20, §21, §22, §23, §24, §25, §26, §29, §30, §31, §33, §35, §38, §40, §43, §47, §49, §52, §53, §54, §55, §56, §57, §58, §60, §63, §65, §67, §68, §70, §72, §74 |
| Score 1 - major gap | 16 | §2, §3, §4, §5, §10, §14, §27, §28, §34, §44, §45, §46, §51, §59, §69, §73 |
| Score 2 - minor gap | 4 | §9, §41, §42, §71 |
| Score 3 - passing | 9 | §8, §17, §32, §37, §50, §61, §62, §64, §66 |
| Indeterminate (?/3 - timed out) | 0 | None |
Highest-risk combination: §1 and §2 combine generic exit codes with no stable output envelope, so agents cannot reliably distinguish success, validation failure, auth failure, network failure, or crashes.