shopify — CLI Agent Evaluation
Evaluated against the CLI Agent Spec, a specification defining 71 failure modes for CLI tools used under AI agent orchestration.
CLI version: @shopify/cli/4.1.0 darwin-arm64 node-v25.9.0
Evaluated: 2026-05-28
Scope: Critical (22 of 71 failure modes)
Scores
| Metric | Result |
|---|---|
| Failure mode score | 0.6/3 — 1 passing · 9 partial · 11 failing · 1 indeterminate |
| Readiness score | 6/15 [D] |
| Observed bugs | 5 confirmed during live evaluation |
| Worst gaps | §10 Interactivity, §37 REPL triggering, §43 output size, §45 headless auth, §74 credential scopes |
Key Findings
- Headless auth is unsafe for agents: `shopify auth login` printed a device-code URL and kept running until terminated.
- Interactive command paths are not consistently guarded: `shopify theme console` did not exit before a 3s alarm killed it.
- Output is not parse-stable: release notes, preference errors, and prose boxes can appear before command-specific output.
- Mutating and destructive commands lack a universal dry-run, idempotency key, and effect field.
- The CLI has useful Oclif JSON command metadata, but no full schema with exit codes, credential scopes, interactivity, or safe-default declarations.
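
Two of the findings above (commands that never exit, and prose appearing before command output) can be guarded against on the caller's side. The following is a minimal sketch, not part of the Shopify CLI or of cli-agent-audit: `run_guarded`, `GuardedResult`, and `first_json_value` are hypothetical helper names, the 3-second default simply mirrors the alarm used in the evaluation, and the JSON extraction assumes the command in question emits a JSON object or array somewhere in stdout.

```python
import json
import subprocess
from dataclasses import dataclass
from typing import Any, Optional, Sequence


@dataclass
class GuardedResult:
    timed_out: bool
    exit_code: Optional[int]  # None when the process was killed on timeout
    stdout: str
    stderr: str


def run_guarded(argv: Sequence[str], timeout_s: float = 3.0) -> GuardedResult:
    """Run a command with stdin closed and a hard wall-clock limit (hypothetical helper)."""
    proc = subprocess.Popen(
        list(argv),
        stdin=subprocess.DEVNULL,  # reads from stdin hit EOF instead of blocking
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
    )
    try:
        out, err = proc.communicate(timeout=timeout_s)
        return GuardedResult(False, proc.returncode, out, err)
    except subprocess.TimeoutExpired:
        # Kill the direct child only; grandchildren are not handled in this sketch.
        proc.kill()
        out, err = proc.communicate()
        return GuardedResult(True, None, out, err)


def first_json_value(text: str) -> Any:
    """Return the first JSON object or array embedded in text, or None.

    Heuristic: leading banners and notices are skipped, but a bracketed
    fragment that happens to be valid JSON (for example "[1]") would match.
    """
    decoder = json.JSONDecoder()
    for index, char in enumerate(text):
        if char in "{[":
            try:
                value, _end = decoder.raw_decode(text, index)
                return value
            except json.JSONDecodeError:
                continue  # not JSON at this position, keep scanning
    return None
```

A caller might invoke `run_guarded(["shopify", "theme", "console"])` and treat `timed_out=True` as a signal that the command path is interactive, then pass `stdout` from non-interactive commands through `first_json_value` rather than parsing it directly. Closing stdin does not by itself stop a flow that waits on the network, such as a device-code login, so the timeout remains the real safeguard.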
Files
| File | What it is |
|---|---|
| report-index.md | Full scorecard, readiness breakdown, links to all reports |
| report-issues.md | Concrete bugs and gaps agents will hit when using this CLI as-is |
| report-runtime.md | Compact operational brief — what to set, what to avoid, what to watch for |
| report-agent-dev.md | Integration guide — invocation invariants and per-gap workarounds for agent developers |
| report-dev.md | Fix list for CLI authors — what to implement |
| findings.md | Raw scorecard — one row per evaluated failure mode |
| issues.md | Observed bugs recorded during live evaluation |
| trace.md | Audit trail — exact check commands, exit codes, stdout/stderr per §N |
| environment.md | CLI environment profile — binary path, version, flags, timeout method |
| readiness.md | Proactive readiness scores across 5 dimensions |
Generated by cli-agent-audit · CLI Agent Spec