shopify — CLI Agent Evaluation

Evaluated against the CLI Agent Spec, a specification defining 71 failure modes for CLI tools used under AI agent orchestration.

CLI version: @shopify/cli/4.1.0 darwin-arm64 node-v25.9.0 Evaluated: 2026-05-28 Scope: Critical (22 of 71 failure modes)

Scores

Metric	Result
Failure mode score	0.6/3 — 1 passing · 9 partial · 11 failing · 1 indeterminate
Readiness score	6/15 [D]
Observed bugs	5 confirmed during live evaluation
Worst gaps	§10 Interactivity, §37 REPL triggering, §43 output size, §45 headless auth, §74 credential scopes

Headless auth is unsafe for agents: shopify auth login printed a device-code URL and kept running until terminated.
Interactive command paths are not consistently guarded: shopify theme console did not exit before a 3s alarm killed it.
Output is not parse-stable: release notes, preference errors, and prose boxes can appear before command-specific output.
Mutating and destructive commands lack a universal dry-run, idempotency key, and effect field.
The CLI has useful Oclif JSON command metadata, but no full schema with exit codes, credential scopes, interactivity, or safe-default declarations.

File	What it is
report-index.md	Full scorecard, readiness breakdown, links to all reports
report-issues.md	Concrete bugs and gaps agents will hit when using this CLI as-is
report-runtime.md	Compact operational brief — what to set, what to avoid, what to watch for
report-agent-dev.md	Integration guide — invocation invariants and per-gap workarounds for agent developers
report-dev.md	Fix list for CLI authors — what to implement
findings.md	Raw scorecard — one row per evaluated failure mode
issues.md	Observed bugs recorded during live evaluation
trace.md	Audit trail — exact check commands, exit codes, stdout/stderr per §N
environment.md	CLI environment profile — binary path, version, flags, timeout method
readiness.md	Proactive readiness scores across 5 dimensions

Generated by cli-agent-audit · CLI Agent Spec