Skip to content

gws — CLI Agent Evaluation

Evaluated against the CLI Agent Spec — a specification defining 71 failure modes for CLI tools used under AI agent orchestration.

CLI version: 0.17.0 Evaluated: 2026-05-14 Scope: Critical (22 of 71 failure modes)

Scores

Metric Result
Failure mode score 1.23/3 — 3 passing · 15 partial · 4 failing
Observed bugs 7 confirmed during live evaluation
Worst gaps §53 Credential Expiry (0/3), §11 Timeouts (0/3), §25 Prompt Injection (0/3), §13 Partial Failure (0/3)

Key Findings

  • Credential expiry returns the same error as permanent permission denial — agents cannot distinguish retryable from fatal auth failures, causing either silent retry loops or premature task abandonment (§53)
  • No --timeout flag — a single hung API call blocks the entire agent pipeline indefinitely until OS TCP timeout fires, with no structured error or exit code (§11)
  • Auth errors on list commands exit 0 — agents that branch on exit code treat a failed call as success and proceed with an empty or missing result (§1)
  • Email bodies, document content, and file names are returned as raw untagged strings — LLMs consuming gws output are exposed to prompt injection from external data with no structural protection (§25)
  • gws auth login opens a browser with no --print-url alternative — headless agents must obtain OAuth tokens externally before first use (§64)

Files

File What it is
report-index.md Full scorecard — all failure modes, links to all reports
report-issues.md Concrete bugs and gaps agents will hit when using gws as-is
report-runtime.md Compact operational brief — what to set, what to avoid, what to watch for
report-agent-dev.md Integration guide — invocation invariants and per-gap workarounds for agent developers
report-dev.md Fix list for gws authors — what to implement, mapped to spec requirements
findings.md Raw scorecard — one row per evaluated failure mode
issues.md Observed bugs recorded during live evaluation
trace.md Audit trail — exact check commands, exit codes, stdout/stderr per §N
environment.md CLI environment profile — binary path, version, flags, timeout method

Generated by cli-agent-report · CLI Agent Spec