53 critical credential expiry
Part VII: Ecosystem, Runtime & Agent-Specific | Challenge §53
53. Credential Expiry Mid-Session
Severity: Critical | Frequency: Common | Detectability: Hard | Token Spend: High | Time: High | Context: Low
The Problem
Agents often operate over sessions longer than credential lifetimes. Short-lived IAM roles (15min), JWTs (1hr), API keys (rotated by policy), and OAuth access tokens (1hr) may expire mid-task. When they do, commands start failing with 401/403, but error messages typically say "unauthorized" or "access denied" — identical to "never had permission" errors. The agent cannot distinguish "credential was never valid" (permanent failure requiring escalation) from "credential expired" (transient failure recoverable by refresh).
# Session starts — IAM role valid for 15 minutes
$ tool deploy --env prod
{"ok": true, "data": {"job_id": "dep_123"}}
# 16 minutes later
$ tool job status dep_123
{"ok": false, "error": {"code": "FORBIDDEN", "message": "Access denied"}}
# Is this: token expired? wrong permissions? job doesn't exist?
# Agent cannot tell. May retry (burns time), may abort (wrong call).
# Worse: cascade failure — every subsequent call also fails
$ tool list-resources
{"ok": false, "error": {"code": "UNAUTHORIZED", "message": "Invalid credentials"}}
Impact
- Agent retries indefinitely on a non-retryable permanent permission denial
- Agent abandons a recoverable expired-token situation that just needs a refresh
- Cascading failures across all subsequent commands in the session
- Agent may attempt rollback with the same expired credential, causing rollback to also fail
Solutions
Auth errors MUST distinguish expiry from permission denial:
{
"ok": false,
"error": {
"code": "CREDENTIALS_EXPIRED",
"message": "Access token expired at 2024-03-11T14:15:00Z.",
"expired": true,
"expired_at": "2024-03-11T14:15:00Z",
"retryable": true,
"reauth_command": "tool auth refresh",
"reauth_env_var": "TOOL_TOKEN"
}
}
For framework design:
- Add exit 10 to the standard exit code table: 10 = credentials expired (retryable with refresh). Exit 8 = permanent permission denied
- Framework MUST intercept HTTP 401/403 responses and attempt to classify expiry vs permission denial before surfacing the error
- error.reauth_command is a mandatory field for all auth errors — the exact command to run to recover credentials
Evaluation
| Score | Condition |
|---|---|
| 0 | Expired credentials produce FORBIDDEN or UNAUTHORIZED identical to permission denial; no way to distinguish; no reauth hint |
| 1 | Error message mentions "expired" in human-readable text; no structured expired field; reauth_command absent |
| 2 | CREDENTIALS_EXPIRED code distinct from PERMISSION_DENIED; expired_at field present; reauth_command provided |
| 3 | Exit code 10 (expiry) distinct from exit 8 (permanent denied); retryable: true on expiry errors; reauth_env_var listed |
Check: Let credentials expire (or mock expiry) and run any authenticated command — verify the response contains "code": "CREDENTIALS_EXPIRED" (not FORBIDDEN) and a reauth_command field.
Agent Workaround
Distinguish CREDENTIALS_EXPIRED from permanent auth failures; auto-refresh when reauth_command is provided:
import subprocess, json, os
CREDENTIAL_EXPIRY_CODES = {"CREDENTIALS_EXPIRED", "AUTH_EXPIRED", "TOKEN_EXPIRED"}
PERMANENT_AUTH_CODES = {"PERMISSION_DENIED", "FORBIDDEN", "UNAUTHORIZED"}
def run_with_auth_retry(cmd: list[str], max_auth_retries: int = 1) -> dict:
for attempt in range(max_auth_retries + 1):
result = subprocess.run(cmd, capture_output=True, text=True)
try:
parsed = json.loads(result.stdout)
except json.JSONDecodeError:
raise RuntimeError(f"No JSON output: {result.stdout[:200]}")
if parsed.get("ok"):
return parsed
error = parsed.get("error", {})
code = error.get("code", "")
if code in CREDENTIAL_EXPIRY_CODES and attempt < max_auth_retries:
reauth_cmd = error.get("reauth_command")
reauth_env = error.get("reauth_env_var")
if reauth_cmd:
# Run the reauth command
reauth_result = subprocess.run(
reauth_cmd.split(), capture_output=True, text=True
)
if reauth_result.returncode == 0:
continue # retry the original command
elif reauth_env:
raise RuntimeError(
f"Credentials expired. Re-set {reauth_env} to refresh."
)
raise RuntimeError(f"Credentials expired and no reauth path available: {error}")
if code in PERMANENT_AUTH_CODES:
raise PermissionError(f"Permanent auth failure [{code}]: {error.get('message')}")
raise RuntimeError(f"Command failed: {parsed}")
raise RuntimeError("Auth retry limit reached")
Limitation: If the tool does not distinguish expiry from permission denial (both use FORBIDDEN or UNAUTHORIZED), the agent cannot safely auto-retry — check the expired_at field if available; if absent, treat all 401/403 as non-retryable to avoid infinite retry loops