gh — CLI Agent Evaluation
Evaluated against the CLI Agent Spec — a specification defining 71 failure modes for CLI tools used under AI agent orchestration.
CLI version: 2.88.1
Evaluated: 2026-05-07
Scope: Critical failure modes (13 of 71)
Scores
| Metric | Result |
|---|---|
| Failure mode score | 1.8/3 — 4 passing · 7 partial · 2 failing |
| Readiness score | 7/15 [D] |
| Observed bugs | 3 confirmed during live evaluation |
| Worst gaps | §1 Exit Codes (0/3), §53 Credential Expiry (0/3) |
Key Findings
ghexits 0 on every HTTP error — 401s, 404s, GraphQL failures all look identical to success; agents cannot detect failure without parsing stderr prose- When a token expires mid-session,
ghprints a human-readable message and exits 0; the suggested recovery (gh auth login) requires a browser the agent cannot open gh issue createwith all flags provided creates a real resource immediately with no dry-run and no idempotency key — every retry on a failed call produces a duplicate
Files
| File | What it is |
|---|---|
| report-index.md | Full scorecard — all failure modes, readiness breakdown, links to all reports |
| report-issues.md | Concrete bugs and gaps agents will hit when using this CLI as-is |
| report-runtime.md | Compact operational brief — what to set, what to avoid, what to watch for |
| report-agent-dev.md | Integration guide — invocation invariants and per-gap workarounds for agent developers |
| report-dev.md | Fix list for CLI authors — what to implement, mapped to spec requirements |
| findings.md | Raw scorecard — one row per evaluated failure mode |
| issues.md | Observed bugs recorded during live evaluation |
| trace.md | Audit trail — exact check commands, exit codes, stdout/stderr per §N |
| environment.md | CLI environment profile — binary path, version, flags, timeout method |
| readiness.md | Proactive readiness scores across 5 dimensions |
Generated by cli-agent-audit · CLI Agent Spec