gh — Issues Report
Generated: 2026-05-07
CLI version: 2.88.1
Scope: Critical failure modes
Findings in scope: 13 failure modes
Observed Bugs (from evaluation notes)
§1 candidate — HTTP error responses exit 0
Discovered during: §1 and §53 evaluation — 2026-05-07
Symptom: gh issue view 999999999, GH_TOKEN=invalid gh repo view, and HTTP 404 responses all exit 0 despite being errors
Impact: Agent cannot branch on exit code to detect failures — must parse stderr prose for error detection
Trigger: Any command that produces an HTTP 4xx response
§62 candidate — gh issue create creates live resources during checks
Discovered during: §62 evaluation — 2026-05-07
Symptom: gh issue create --title X --body Y immediately creates a permanent resource with no confirmation and no dry-run option; prints only the URL
Impact: Agent testing invocation patterns will create real issues, PRs, or gists in production repos
Trigger: gh issue create, gh pr create, gh gist create with all required flags supplied
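Because there is no dry-run option, one way an agent harness might avoid creating live resources during checks is to refuse the creating commands unless the caller opts in. The following is a minimal sketch, not part of gh: `MUTATING_COMMANDS`, `is_mutating`, and `check_allowed` are hypothetical names, and the command list covers only the three commands named in the trigger above.

```python
# Hypothetical guard: only the three resource-creating commands named in
# this report are listed; extend the set for other mutating commands.
MUTATING_COMMANDS = {("issue", "create"), ("pr", "create"), ("gist", "create")}


def is_mutating(argv):
    """Return True if argv (e.g. ["gh", "issue", "create", ...]) creates a resource."""
    # Drop the leading program name if present, then compare the first two words.
    # Flags placed before the subcommand are not handled by this sketch.
    words = argv[1:] if argv and argv[0] == "gh" else argv
    return tuple(words[:2]) in MUTATING_COMMANDS


def check_allowed(argv, allow_mutations=False):
    """Raise before execution if a creating command is run without opt-in."""
    if is_mutating(argv) and not allow_mutations:
        raise PermissionError("refusing to run mutating command: " + " ".join(argv))
    return argv
```

A harness would call `check_allowed(argv)` before spawning the process and pass `allow_mutations=True` only when the target is a throwaway repository.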
§45/§53 candidate — Auth failure not machine-readable
Discovered during: §45 and §53 evaluation — 2026-05-07
Symptom: Invalid token produces HTTP 401: Bad credentials on stderr and exits 0; suggested fix is gh auth login (interactive-only)
Impact: Agent cannot distinguish auth failure from network error or permissions error; cannot self-remediate
Trigger: Any command run with an expired or invalid GH_TOKEN
Failure-Mode Gaps (score < 3, sorted: score asc, severity desc)
§1 — Exit Codes & Status Signaling [Critical · score 0/3]
What fails: Error responses (HTTP 4xx, resource not found, GraphQL errors) all exit 0; the agent cannot detect failure without parsing stderr
Frequency: Very Common
Token/time cost when it triggers: Token Spend: High · Time: High
Workaround exists: Partial (must parse stderr; fragile)
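The stderr-parsing workaround could be sketched as below. This assumes error lines follow the shape of the "HTTP 401: Bad credentials" sample quoted in this report; the pattern and the helper names `http_error_status` and `run_gh` are illustrative, not part of gh.

```python
import re
import subprocess

# Assumption: error lines look like "HTTP 404: Not Found ...", modeled on the
# "HTTP 401: Bad credentials" sample quoted in this report.
HTTP_ERROR_RE = re.compile(r"\bHTTP (4\d\d|5\d\d)\b")


def http_error_status(stderr):
    """Return the first HTTP 4xx/5xx status found in stderr, or None."""
    match = HTTP_ERROR_RE.search(stderr)
    return int(match.group(1)) if match else None


def run_gh(args):
    """Run gh and report failure on a nonzero exit OR an HTTP error in stderr."""
    proc = subprocess.run(["gh", *args], capture_output=True, text=True)
    status = http_error_status(proc.stderr)
    failed = proc.returncode != 0 or status is not None
    return {"failed": failed, "http_status": status, "stdout": proc.stdout}
```

Errors that carry no HTTP status in their message (for example, GraphQL errors) would slip past this pattern, which is part of why the workaround is rated fragile.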
§53 — Credential Expiry Mid-Session [Critical · score 0/3]
What fails: Expired or invalid token produces a human-readable stderr message and exits 0; the agent has no machine-readable signal that re-auth is needed and no non-interactive re-auth path
Frequency: Common
Token/time cost when it triggers: Token Spend: High · Time: High
Workaround exists: Partial (grep stderr for "401" or "Bad credentials")
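A harness could turn the grep workaround into a small machine-readable record. The sketch below is an assumption-laden illustration: `auth_signal` and its return shape are hypothetical, and it matches "HTTP 401" rather than a bare "401" so that an issue number or URL containing 401 does not trigger it.

```python
# Markers derived from the workaround above; "HTTP 401" is used instead of a
# bare "401" to avoid matching unrelated numbers in stderr.
AUTH_MARKERS = ("HTTP 401", "Bad credentials")


def auth_signal(stderr):
    """Translate stderr prose into a small machine-readable auth record."""
    reauth = any(marker in stderr for marker in AUTH_MARKERS)
    return {
        "reauth_required": reauth,
        # gh auth login is interactive, so the only headless remedy assumed
        # here is supplying a fresh token through the GH_TOKEN variable.
        "remediation": "refresh GH_TOKEN" if reauth else None,
    }
```

The caller can branch on `reauth_required` and stop retrying instead of re-running the same command with the same dead token.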
§12 — Idempotency & Safe Retries [Critical · score 1/3]
What fails: No idempotency guarantees or keys on mutating commands — retrying a failed gh issue create may create duplicate issues
Frequency: Common
Token/time cost when it triggers: Token Spend: High · Time: High
Workaround exists: No
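No true idempotency key exists, but a harness can reduce (not eliminate) duplicates by checking for an issue with the same title before retrying. The sketch below is a best-effort mitigation under stated assumptions: `search_command` and `find_existing` are hypothetical helpers, the title is assumed to contain no double quotes, and a just-created issue may not yet appear in search results, so a race remains.

```python
import json


def search_command(title, limit=100):
    """Build a gh command listing issues whose title matches `title`."""
    return [
        "gh", "issue", "list",
        "--state", "all",
        "--search", f'"{title}" in:title',
        "--json", "number,title",
        "--limit", str(limit),
    ]


def find_existing(list_stdout, title):
    """Return the number of an issue with exactly this title, or None."""
    for issue in json.loads(list_stdout or "[]"):
        if issue.get("title") == title:
            return issue["number"]
    return None
```

Before retrying a failed create, the agent would run `search_command(title)`, pass its stdout to `find_existing`, and skip the retry when a number comes back.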
§45 — Headless Authentication / OAuth Browser Flow Blocking [Critical · score 1/3]
What fails: Invalid/expired token error is not machine-readable (plain text stderr, exit 0); suggested recovery (gh auth login) requires a browser
Frequency: Common
Token/time cost when it triggers: Token Spend: High · Time: Critical
Workaround exists: Partial (pre-set GH_TOKEN; grep stderr for auth error patterns)
§2 — Output Format & Parseability [Critical · score 2/3]
What fails: --json works but is not auto-activated in non-TTY; no response envelope (ok/error/meta); field list must be specified manually per command
Frequency: Very Common
Token/time cost when it triggers: Token Spend: High · Time: Medium
Workaround exists: Yes (always pass --json <fields> explicitly)
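Since gh emits no response envelope, a harness can build one around the raw process result. The following sketch assumes the command was run with `--json <fields>`; the `to_envelope` name and the ok/error/meta shape are illustrative choices, not a gh format.

```python
import json


def to_envelope(returncode, stdout, stderr):
    """Wrap raw gh results in an ok/error/meta envelope (hypothetical shape)."""
    meta = {"exit_code": returncode}
    if returncode != 0:
        return {"ok": False, "error": stderr.strip(), "meta": meta}
    try:
        data = json.loads(stdout)
    except json.JSONDecodeError:
        # Covers the exit-0-on-error case: no JSON arrived on stdout.
        error = stderr.strip() or "stdout was not valid JSON"
        return {"ok": False, "error": error, "meta": meta}
    return {"ok": True, "data": data, "meta": meta}
```

Treating unparseable stdout as a failure gives the agent a second failure signal that does not depend on the exit code.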
§10 — Interactivity & TTY Requirements [Critical · score 2/3]
What fails: No --non-interactive flag; interactive suppression requires env vars (GH_PROMPT_DISABLED=1, GH_PAGER=cat) that are not prominently documented for agents
Frequency: Common
Token/time cost when it triggers: Token Spend: High · Time: Critical
Workaround exists: Yes (set env vars; see runtime brief)
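The environment-variable workaround can be applied once, where the harness spawns the process. A minimal sketch, assuming the two variables named above are sufficient for the commands in use; `agent_env` is a hypothetical helper.

```python
import os


def agent_env(base=None):
    """Return a copy of the environment with gh prompts and paging disabled."""
    env = dict(os.environ if base is None else base)
    env["GH_PROMPT_DISABLED"] = "1"  # never prompt for missing input
    env["GH_PAGER"] = "cat"          # never hand output to an interactive pager
    return env
```

The result is passed as `env=agent_env()` to `subprocess.run`, so the variables apply to every gh invocation without changing the parent environment.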
§11 — Timeouts & Hanging Processes [Critical · score 2/3]
What fails: No built-in timeout flag; long API operations can hang indefinitely; no structured timeout exit code
Frequency: Common
Token/time cost when it triggers: Token Spend: High · Time: Critical
Workaround exists: Yes (wrap with perl -e 'alarm(N); exec(...)' on macOS)
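Where the harness is itself a program, the same effect can be had without perl by bounding the child process directly. This sketch assumes Python's standard `subprocess` module; the exit code 124 is borrowed from the GNU `timeout` convention and is not something gh emits, and `run_with_timeout` is a hypothetical name.

```python
import subprocess

TIMEOUT_EXIT = 124  # convention borrowed from GNU timeout; not a gh exit code


def run_with_timeout(argv, seconds):
    """Run argv, killing it after `seconds`; return (exit_code, stdout, stderr)."""
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=seconds)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the direct child before re-raising.
        return TIMEOUT_EXIT, "", f"timed out after {seconds}s"
    return proc.returncode, proc.stdout, proc.stderr
```

This gives the agent a structured timeout code to branch on. Processes spawned by the child (a pager or editor, for instance) are not covered by this sketch.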
§43 — Tool Output Result Size Unboundedness [Critical · score 2/3]
What fails: Default limit is 30 items (good), but no JSON pagination metadata — agent cannot detect whether results were truncated without knowing the default limit
Frequency: Common
Token/time cost when it triggers: Token Spend: Critical · Time: High
Workaround exists: Partial (always pass --limit N explicitly)
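With an explicit limit, truncation can at least be inferred from the result count. A minimal sketch under that assumption; `list_command` and `parse_page` are hypothetical helpers and the field list is only an example.

```python
import json


def list_command(limit):
    """Build a list command with an explicit limit so the cap is known."""
    return ["gh", "issue", "list", "--json", "number,title", "--limit", str(limit)]


def parse_page(stdout, limit):
    """Parse a --json list and flag results that may have been cut off."""
    items = json.loads(stdout)
    # Exactly `limit` items means more may exist; fewer means the list is complete.
    return {"items": items, "maybe_truncated": len(items) >= limit}
```

When `maybe_truncated` is true the agent can rerun with a larger limit. A result of exactly `limit` items is ambiguous, which is why the workaround is only partial.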
§62 — $EDITOR and $VISUAL Trap [Critical · score 2/3]
What fails: Commands that open an editor (e.g. gh issue create without --body) will invoke $EDITOR and block; bypass requires supplying all content flags — not documented for agent use
Frequency: Common
Token/time cost when it triggers: Token Spend: High · Time: Critical
Workaround exists: Yes (always pass --title and --body; set GH_EDITOR=/bin/false)
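A pre-flight check can catch an invocation that would fall through to the editor. The sketch below assumes the flag spellings listed in its comment, which should be verified against `gh issue create --help` for the installed version; `missing_content_flags` is a hypothetical helper.

```python
# Assumed flag spellings for gh issue create; --body-file/-F is treated as an
# alternative body source. Verify against the installed version's help text.
TITLE_FLAGS = ("--title", "-t")
BODY_FLAGS = ("--body", "-b", "--body-file", "-F")


def _has_flag(argv, names):
    # Accept both "--title X" and "--title=X" spellings.
    return any(arg.split("=", 1)[0] in names for arg in argv)


def missing_content_flags(argv):
    """Return the content flags an issue-create invocation still lacks."""
    missing = []
    if not _has_flag(argv, TITLE_FLAGS):
        missing.append("--title")
    if not _has_flag(argv, BODY_FLAGS):
        missing.append("--body")
    return missing
```

An empty list means the command supplies both title and body; anything else should be rejected before the process is spawned.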
Passing (score 3/3 — safe to use without special handling)
§8 ANSI & Color Code Leakage, §50 Stdin Consumption Deadlock, §60 OS Output Buffer Deadlock, §64 Headless Display and GUI Launch Blocking
Risk Summary
| Category | Count | §N list |
|---|---|---|
| Observed bugs | 3 | §1, §45/§53, §62 |
| Score 0 — complete failure | 2 | §1, §53 |
| Score 1 — major gap | 2 | §12, §45 |
| Score 2 — minor gap | 5 | §2, §10, §11, §43, §62 |
| Score 3 — passing | 4 | §8, §50, §60, §64 |
Highest-risk combination: §1 (exit 0 on all errors) combined with §53 (auth failure exits 0) means an agent has no reliable exit-code signal that any command failed; it must parse stderr prose on every invocation.