gws — CLI Agent Evaluation
Evaluated against the CLI Agent Spec — a specification defining 71 failure modes for CLI tools used under AI agent orchestration.
CLI version: 0.17.0 Evaluated: 2026-05-14 Scope: Critical (22 of 71 failure modes)
Scores
| Metric | Result |
|---|---|
| Failure mode score | 1.23/3 — 3 passing · 15 partial · 4 failing |
| Observed bugs | 7 confirmed during live evaluation |
| Worst gaps | §53 Credential Expiry (0/3), §11 Timeouts (0/3), §25 Prompt Injection (0/3), §13 Partial Failure (0/3) |
Key Findings
- Credential expiry returns the same error as permanent permission denial — agents cannot distinguish retryable from fatal auth failures, causing either silent retry loops or premature task abandonment (§53)
- No
--timeoutflag — a single hung API call blocks the entire agent pipeline indefinitely until OS TCP timeout fires, with no structured error or exit code (§11) - Auth errors on list commands exit 0 — agents that branch on exit code treat a failed call as success and proceed with an empty or missing result (§1)
- Email bodies, document content, and file names are returned as raw untagged strings — LLMs consuming gws output are exposed to prompt injection from external data with no structural protection (§25)
gws auth loginopens a browser with no--print-urlalternative — headless agents must obtain OAuth tokens externally before first use (§64)
Files
| File | What it is |
|---|---|
| report-index.md | Full scorecard — all failure modes, links to all reports |
| report-issues.md | Concrete bugs and gaps agents will hit when using gws as-is |
| report-runtime.md | Compact operational brief — what to set, what to avoid, what to watch for |
| report-agent-dev.md | Integration guide — invocation invariants and per-gap workarounds for agent developers |
| report-dev.md | Fix list for gws authors — what to implement, mapped to spec requirements |
| findings.md | Raw scorecard — one row per evaluated failure mode |
| issues.md | Observed bugs recorded during live evaluation |
| trace.md | Audit trail — exact check commands, exit codes, stdout/stderr per §N |
| environment.md | CLI environment profile — binary path, version, flags, timeout method |
Generated by cli-agent-report · CLI Agent Spec