gws — Issues Report
Generated: 2026-05-14 · CLI version: 0.17.0 · Scope: Critical (22 failure modes) · Findings in scope: 22 failure modes
Observed Bugs (from evaluation notes)
§1 candidate — Auth failures exit 0 on list commands
Discovered during: §1 evaluation — 2026-05-14
Symptom: gws drive files list returns a JSON auth error body but exits with code 0. gws drive files get correctly exits 2 on the same auth error. The inconsistency is per-command.
Impact: Agent has no failure signal from exit code on list operations. An agent that branches on exit code will treat a failing list call as success and proceed with an empty or missing result.
Trigger: gws drive files list --params '{"pageSize":1}' with expired credentials
§53 — Credential expiry indistinguishable from permission denial
Discovered during: §53 evaluation — 2026-05-14
Symptom: Both expired token (invalid_rapt) and permanent permission denial return {"error":{"code":401,"reason":"authError"}}. The only distinction is buried in the message field as a long OAuth error string.
Impact: Agent cannot safely decide whether to retry (expiry = transient) or abort (denial = permanent). Blind retry on permanent denial loops indefinitely. Blind abort on expiry abandons a recoverable task.
Trigger: gws drive files list with an expired OAuth token (invalid_rapt state)
§45 candidate — Missing AUTH_REQUIRED structured error
Discovered during: §45 evaluation — 2026-05-14
Symptom: Running without credentials returns reason: "authError" with no auth_methods array. The agent cannot programmatically determine how to authenticate — it must read the error message prose.
Impact: Agent cannot self-recover from a missing-credentials state. No machine-readable guidance on which env var to set.
Trigger: GOOGLE_WORKSPACE_CLI_TOKEN="" GOOGLE_WORKSPACE_CLI_CREDENTIALS_FILE="" gws drive files list < /dev/null
§43 candidate — No response body size limit
Discovered during: §43 evaluation — 2026-05-14
Symptom: --page-limit caps pagination page count but not the size of individual response bodies. A single large document, email, or spreadsheet is returned in full.
Impact: A single gws docs documents get on a large document can return hundreds of kilobytes — overflowing the agent's context window with a single call.
Trigger: gws docs documents get --params '{"documentId":"<large-doc-id>"}'
§11 candidate — No timeout mechanism
Discovered during: §11 evaluation — 2026-05-14
Symptom: --timeout flag does not exist (returns validation error). Network hangs (unreachable API, DNS timeout) block indefinitely.
Impact: A single hung gws call blocks the agent's entire pipeline until the OS TCP timeout (up to 2 minutes) fires. When it does, there is no structured error and no JSON output, and the exit code comes from the OS kill signal rather than from gws.
Trigger: gws drive files list --timeout 1 → {"error":{"code":400,"message":"error: unexpected argument '--timeout' found..."}}
§42 candidate — ANSI codes in debug stderr
Discovered during: §42 evaluation — 2026-05-14
Symptom: GOOGLE_WORKSPACE_CLI_LOG=gws=debug emits ANSI escape sequences (\x1b[2m, \x1b[34m, etc.) to stderr. Agents capturing stderr for error parsing receive polluted output.
Impact: Agents that parse stderr to extract error messages will receive ANSI-polluted strings. Pattern matching on error codes breaks. ANSI stripping adds complexity.
Trigger: GOOGLE_WORKSPACE_CLI_LOG=gws=debug gws drive files list < /dev/null
§64 — No headless auth alternative for gws auth login
Discovered during: §64 evaluation — 2026-05-14
Symptom: gws auth login explicitly opens a browser ("Authenticate via OAuth2 (opens browser)"). No --print-url, --no-browser, or device-code flow alternative.
Impact: Agent cannot perform initial authentication without human intervention. There is no in-band path for headless token acquisition. Agents must be pre-seeded with GOOGLE_WORKSPACE_CLI_TOKEN set externally before first use.
Trigger: gws auth --help → login Authenticate via OAuth2 (opens browser)
Failure-Mode Gaps (score 0–2, sorted: score asc, severity desc)
§11 — Timeouts & Hanging Processes [Critical · 0/3]
What fails: Network hangs block indefinitely — the agent receives no output and no timeout error; the pipeline stalls until OS TCP timeout (~2 minutes).
Frequency: Common
Token/time cost when it triggers: Token Spend: High · Time: Critical
Workaround exists: Partial (external timeout via perl alarm or subprocess.run(timeout=N) — does not produce structured JSON error)
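The external-timeout workaround can be sketched as a small wrapper that turns a hang into a JSON-shaped error instead of a bare exception. This is a minimal sketch, not gws behavior: the function name `run_with_timeout` and the `"TIMEOUT"` reason value are hypothetical, chosen only to resemble the `{"error": {...}}` bodies gws already emits.

```python
import subprocess


def run_with_timeout(argv, timeout_s):
    """Run argv with a hard deadline and return a dict in both outcomes."""
    try:
        proc = subprocess.run(
            argv,
            stdin=subprocess.DEVNULL,  # never wait on interactive input
            capture_output=True,
            text=True,
            timeout=timeout_s,  # child is killed when the deadline passes
        )
    except subprocess.TimeoutExpired:
        # Hypothetical error shape; gws itself does not produce this.
        return {
            "error": {
                "reason": "TIMEOUT",
                "message": f"no result after {timeout_s}s",
            }
        }
    return {
        "exit_code": proc.returncode,
        "stdout": proc.stdout,
        "stderr": proc.stderr,
    }
```

A call such as `run_with_timeout(["gws", "drive", "files", "list"], 30)` would then return within roughly 30 seconds whether or not the network responds.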
§13 — Partial Failure & Atomicity [Critical · 0/3]
What fails: Workflow commands (gws workflow +standup-report, etc.) fail with a generic error and no indication of which steps completed — agents retry the entire workflow and duplicate side effects.
Frequency: Common
Token/time cost when it triggers: Token Spend: High · Time: High
Workaround exists: Partial (decompose workflow into individual calls; track state manually in agent)
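One way to track state manually is a step ledger kept by the agent. The sketch below is illustrative only: `run_steps`, the step names, and the callables are hypothetical, and each callable is assumed to wrap a single gws call. It records which steps finished so that a retry resumes after the last completed step instead of repeating the whole workflow.

```python
def run_steps(steps, completed=()):
    """Run (name, callable) steps in order, skipping names already completed.

    Returns a dict with the completed step names, the name of the step
    that failed (or None), and the error text (or None).
    """
    done = list(completed)
    for name, action in steps:
        if name in done:
            continue  # finished in an earlier attempt; do not repeat it
        try:
            action()
        except Exception as exc:
            return {"completed": done, "failed": name, "error": str(exc)}
        done.append(name)
    return {"completed": done, "failed": None, "error": None}
```

A step that fails after its side effect was partly applied is still ambiguous, so this narrows the duplicate-side-effect window rather than closing it.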
§25 — Prompt Injection via Output [Critical · 0/3]
What fails: Email bodies, document content, and file names are returned as raw untagged strings in the JSON response, so LLMs consuming this output may execute injected instructions from external data.
Frequency: Situational
Token/time cost when it triggers: Token Spend: High · Time: High
Workaround exists: Partial (manually extract fields and never route raw external content to the LLM, but with no structural guarantee)
§53 — Credential Expiry Mid-Session [Critical · 0/3]
What fails: When credentials expire mid-session, every subsequent call returns reason:"authError" — identical to permanent denial — and the agent cannot distinguish retryable expiry from permanent failure.
Frequency: Common
Token/time cost when it triggers: Token Spend: High · Time: High
Workaround exists: Partial (inspect message string for invalid_rapt/invalid_grant patterns; treat as potentially retriable — but cannot auto-refresh without human browser interaction)
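The message-inspection workaround might look like the following. It is a heuristic sketch: the function name and the returned labels are hypothetical, and the only markers it checks are the two substrings named above (`invalid_rapt`, `invalid_grant`), so any other expiry wording would be classified as `denied_or_unknown`.

```python
import json

# Substrings named in the workaround; treated as "probably expired".
EXPIRY_MARKERS = ("invalid_rapt", "invalid_grant")


def classify_auth_error(stdout_text):
    """Return 'ok', 'expired', 'denied_or_unknown', 'other_error', or 'unparseable'."""
    try:
        body = json.loads(stdout_text)
    except json.JSONDecodeError:
        return "unparseable"
    err = body.get("error") if isinstance(body, dict) else None
    if not isinstance(err, dict):
        return "ok"
    if err.get("reason") != "authError":
        return "other_error"
    message = str(err.get("message", ""))
    if any(marker in message for marker in EXPIRY_MARKERS):
        return "expired"
    return "denied_or_unknown"
```

Because the classification rests on prose in the `message` field, a caller would likely still cap retries on `expired` rather than treat it as guaranteed to be recoverable.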
§1 — Exit Codes & Status Signaling [Critical · 1/3]
What fails: Agent branches on exit code and treats list commands with auth errors as success (exit 0), proceeding with an empty result as if the call succeeded.
Frequency: Very Common
Token/time cost when it triggers: Token Spend: High · Time: High
Workaround exists: Yes (always parse stdout JSON; check for "error" key regardless of exit code)
§2 — Output Format & Parseability [Critical · 1/3]
What fails: No top-level ok/data/meta envelope — agent must handle two different structures (raw API JSON on success; {error:{...}} on failure) with no common discriminator field.
Frequency: Very Common
Token/time cost when it triggers: Token Spend: Medium · Time: Medium
Workaround exists: Yes (normalize in wrapper: ok = "error" not in data)
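A wrapper-side normalizer along these lines would give the agent a single discriminator field. This is a sketch under assumptions: the `ok`/`data`/`error` envelope is the agent's own convention (gws does not emit it), and the `unparseable_output` and `nonzero_exit` reason values are hypothetical. It checks for an `"error"` key before looking at the exit code, so an auth error that exits 0 is still reported as a failure.

```python
import json


def normalize(returncode, stdout_text):
    """Fold the two gws output shapes into one {ok, data, error} envelope."""
    try:
        data = json.loads(stdout_text)
    except json.JSONDecodeError:
        return {
            "ok": False,
            "data": None,
            "error": {"reason": "unparseable_output", "exit_code": returncode},
        }
    # An error body wins even when the process exited 0.
    if isinstance(data, dict) and "error" in data:
        return {"ok": False, "data": None, "error": data["error"]}
    if returncode != 0:
        return {
            "ok": False,
            "data": data,
            "error": {"reason": "nonzero_exit", "exit_code": returncode},
        }
    return {"ok": True, "data": data, "error": None}
```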
§12 — Idempotency & Safe Retries [Critical · 1/3]
What fails: Retrying a failed email send or calendar event creation causes duplicates — no idempotency key and no effect: "noop" signal to detect second writes.
Frequency: Common
Token/time cost when it triggers: Token Spend: High · Time: High
Workaround exists: Partial (read-before-write pattern; track operation IDs in agent state — subject to TOCTOU race)
§23 — Side Effects & Destructive Operations [Critical · 1/3]
What fails: --dry-run validates params locally but does not return the affected scope — agent cannot confirm what would be deleted before executing.
Frequency: Common
Token/time cost when it triggers: Token Spend: Medium · Time: High
Workaround exists: Partial (fetch resource metadata before destructive call; confirm identity manually)
§34 — Shell Injection via Agent-Constructed Commands [Critical · 1/3]
What fails: LLM-generated values passed to --params with metacharacters (../../, %2F, ?) reach the Google API without CLI-level validation — may cause unexpected API behavior or data access.
Frequency: Common
Token/time cost when it triggers: Token Spend: High · Time: Medium
Workaround exists: Yes (validate params with regex before passing; use exec-array form — never shell=True)
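The validate-then-exec-array workaround could be sketched as follows. The allow-list pattern `SAFE_ID` is an assumption and should be tightened or loosened to match the IDs a given workflow really uses. The `fileId` parameter name is also assumed, on the basis that `--params` mirrors the underlying Drive API.

```python
import json
import re

# Assumed allow-list for ID-like values: letters, digits, underscore, hyphen.
SAFE_ID = re.compile(r"[A-Za-z0-9_-]{1,128}")


def build_get_argv(file_id):
    """Return an argv list for `gws drive files get`, or raise ValueError."""
    if not SAFE_ID.fullmatch(file_id):
        raise ValueError(f"rejected fileId: {file_id!r}")
    params = json.dumps({"fileId": file_id})
    # A list argv goes to the OS without shell parsing (no shell=True).
    return ["gws", "drive", "files", "get", "--params", params]
```

The returned list can be handed directly to `subprocess.run(argv, ...)`. Values containing `../../`, `%2F`, or `?` fail the pattern and never reach the CLI.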
§42 — Debug / Trace Mode Secret Leakage [Critical · 1/3]
What fails: Enabling GOOGLE_WORKSPACE_CLI_LOG=gws=debug produces ANSI-polluted stderr; no guaranteed redaction of token values in debug output.
Frequency: Situational
Token/time cost when it triggers: Token Spend: Low · Time: Low
Workaround exists: Yes (never enable debug logging in production agent code; strip ANSI from stderr if captured)
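If stderr has to be captured while debug logging is on, the ANSI stripping mentioned above can be done with one regular expression. The sketch below removes CSI sequences such as `\x1b[2m` and `\x1b[34m`. It does not redact tokens, so it addresses only the parsing problem and not the secret-leakage risk.

```python
import re

# CSI sequence: ESC [ parameter bytes, intermediate bytes, one final byte.
ANSI_CSI = re.compile(r"\x1b\[[0-9;?]*[ -/]*[@-~]")


def strip_ansi(text):
    """Remove ANSI CSI escape sequences (colour and style codes) from text."""
    return ANSI_CSI.sub("", text)
```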
§43 — Tool Output Result Size Unboundedness [Critical · 1/3]
What fails: A single gws docs documents get or gws gmail users messages get on a large document returns the entire content — overflowing the agent's context window.
Frequency: Common
Token/time cost when it triggers: Token Spend: Critical · Time: High
Workaround exists: Partial (use fields param to limit response; manually truncate body fields — no gws-native size limit)
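Manual truncation on the agent side might be sketched as below. The function names, the per-string cap, and the total byte budget are all hypothetical defaults. The idea is to cap every string in the decoded response and then, if the result is still over budget, return a small stub instead of the full body.

```python
import json


def truncate_strings(value, max_chars=2000):
    """Recursively cap every string inside a decoded JSON value."""
    if isinstance(value, str):
        if len(value) <= max_chars:
            return value
        dropped = len(value) - max_chars
        return value[:max_chars] + f"...[truncated {dropped} chars]"
    if isinstance(value, list):
        return [truncate_strings(v, max_chars) for v in value]
    if isinstance(value, dict):
        return {k: truncate_strings(v, max_chars) for k, v in value.items()}
    return value


def bound_response(data, max_chars=2000, max_total_bytes=50_000):
    """Return the trimmed data, or a stub if it still exceeds the byte budget."""
    trimmed = truncate_strings(data, max_chars)
    size = len(json.dumps(trimmed).encode("utf-8"))
    if size > max_total_bytes:
        keys = sorted(trimmed) if isinstance(trimmed, dict) else None
        return {"truncated": True, "size_bytes": size, "top_level_keys": keys}
    return trimmed
```

This bounds what reaches the context window but not what gws downloads, so the `fields` parameter remains the cheaper first line of defence.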
§45 — Headless Authentication / OAuth Browser Flow Blocking [Critical · 1/3]
What fails: A call made without credentials exits with code 0 and returns no AUTH_REQUIRED code, so the agent has no machine-readable signal to distinguish "never authenticated" from other errors, and no auth_methods to guide recovery.
Frequency: Common
Token/time cost when it triggers: Token Spend: High · Time: Critical
Workaround exists: Partial (pre-check with gws auth status; inject GOOGLE_WORKSPACE_CLI_TOKEN before any call)
§60 — OS Output Buffer Deadlock [Critical · 1/3]
What fails: No heartbeat on long-running workflow calls, so the agent cannot tell if a multi-second workflow call is running, stuck, or crashed until it exits.
Frequency: Common
Token/time cost when it triggers: Token Spend: High · Time: Critical
Workaround exists: Partial (external timeout with progress logging thread; no step-level visibility)
§64 — Headless Display and GUI Launch Blocking [Critical · 1/3]
What fails: gws auth login opens a browser; no --print-url alternative means agent cannot perform initial authentication headlessly.
Frequency: Common
Token/time cost when it triggers: Token Spend: High · Time: Critical
Workaround exists: Partial (pre-seed GOOGLE_WORKSPACE_CLI_TOKEN from external token service; no in-band headless auth path)
§71 — Non-Interactive Installation Absence [Critical · 1/3]
What fails: No AGENTS.md documents the install command; agents bootstrapping a fresh environment may use the wrong formula (gws vs googleworkspace-cli) or skip the verify step.
Frequency: Common
Token/time cost when it triggers: Token Spend: Low · Time: Critical
Workaround exists: Yes (brew install googleworkspace-cli && gws --version)
§74 — Credential Scope Declaration Absence [Critical · 1/3]
What fails: gws schema returns all possible OAuth scopes (up to 8 per method) but not the minimal required set — agent cannot create a minimally-scoped credential for a specific workflow.
Frequency: Common
Token/time cost when it triggers: Token Spend: Low · Time: Medium
Workaround exists: Partial (use gws auth login -s <service> --readonly to limit scopes at login time; no per-command required_scopes available)
§10 — Interactivity & TTY Requirements [Critical · 2/3]
What fails: No --non-interactive flag — future commands that add interactive prompts would hang on stdin=DEVNULL without a guaranteed non-interactive mode.
Frequency: Common
Token/time cost when it triggers: Token Spend: High · Time: Critical
Workaround exists: Yes (always pass stdin=subprocess.DEVNULL; never call gws auth login from agent code)
§24 — Authentication & Secret Handling [Critical · 2/3]
What fails: No --secret-from-file for container environments where env vars are harder to manage than mounted secrets files.
Frequency: Common
Token/time cost when it triggers: Token Spend: Medium · Time: Medium
Workaround exists: Yes (read secrets file at runtime and inject as env var, keeping the rest of the environment so PATH survives: env={**os.environ, "GOOGLE_WORKSPACE_CLI_TOKEN": open(path).read().strip()})
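The secrets-file workaround can be written as a small helper. One detail worth noting: the `env` argument of `subprocess.run` replaces the child's entire environment, so this sketch copies the existing environment and adds the token rather than passing a one-key dict. The function name `env_with_token` is hypothetical.

```python
import os


def env_with_token(secret_path, base_env=None):
    """Return a copy of the environment with the token read from a mounted file."""
    env = dict(os.environ if base_env is None else base_env)
    with open(secret_path, encoding="utf-8") as fh:
        token = fh.read().strip()
    if not token:
        raise ValueError(f"secret file is empty: {secret_path}")
    env["GOOGLE_WORKSPACE_CLI_TOKEN"] = token
    return env
```

The result would be passed as `subprocess.run(argv, env=env_with_token(path), ...)`, which keeps `PATH` intact so the `gws` binary can still be located.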
§61 — Bidirectional Pipe Payload Deadlock [Critical · 2/3]
What fails: Large --json request bodies (batch spreadsheet updates, large JSON payloads) may exceed OS argument limits — no --json-file alternative.
Frequency: Situational
Token/time cost when it triggers: Token Spend: High · Time: Critical
Workaround exists: Partial (keep --json payloads under 64KB; batch large operations manually)
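Manual batching under the 64KB guideline could be sketched as a greedy splitter. The `{"requests": [...]}` body shape is an assumption modelled on batch-update style payloads, and the helper names are hypothetical. Each emitted body is measured after JSON encoding, since that is what ends up on the command line.

```python
import json

MAX_JSON_ARG_BYTES = 64 * 1024  # the 64KB ceiling suggested above


def encoded_size(obj):
    """Size in bytes of obj once serialised as UTF-8 JSON."""
    return len(json.dumps(obj).encode("utf-8"))


def chunk_requests(requests, max_bytes=MAX_JSON_ARG_BYTES):
    """Split requests into {"requests": [...]} bodies of at most max_bytes each."""
    batches, current = [], []
    for req in requests:
        candidate = current + [req]
        if encoded_size({"requests": candidate}) <= max_bytes:
            current = candidate
            continue
        if not current:
            raise ValueError("a single request exceeds the byte budget")
        batches.append({"requests": current})
        current = [req]
        if encoded_size({"requests": current}) > max_bytes:
            raise ValueError("a single request exceeds the byte budget")
    if current:
        batches.append({"requests": current})
    return batches
```

Splitting one batch into several calls gives up whatever all-or-nothing behavior the single call had, so a failure partway through leaves earlier batches applied.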
Passing (score 3/3 — safe to use without special handling)
§37 REPL / Interactive Mode Accidental Triggering, §50 Stdin Consumption Deadlock, §62 $EDITOR and $VISUAL Trap
Risk Summary
| Category | Count | §N list |
|---|---|---|
| Observed bugs | 7 | §1, §53, §45, §43, §11, §42, §64 |
| Score 0 — complete failure | 4 | §11, §13, §25, §53 |
| Score 1 — major gap | 12 | §1, §2, §12, §23, §34, §42, §43, §45, §60, §64, §71, §74 |
| Score 2 — minor gap | 3 | §10, §24, §61 |
| Score 3 — passing | 3 | §37, §50, §62 |
| Indeterminate (?/3 — timed out) | 0 | — |
Highest-risk combination: §53 (credential expiry indistinguishable from permanent denial) combined with §11 (no timeout). A mid-session expiry causes every subsequent call to fail ambiguously, and any call that hangs has no upper bound on its duration. The result is a cascading failure with no recovery signal.