Skip to content

53 critical credential expiry

Part VII: Ecosystem, Runtime & Agent-Specific | Challenge §53

53. Credential Expiry Mid-Session

Severity: Critical | Frequency: Common | Detectability: Hard | Token Spend: High | Time: High | Context: Low

The Problem

Agents often operate over sessions longer than credential lifetimes. Short-lived IAM roles (15min), JWTs (1hr), API keys (rotated by policy), and OAuth access tokens (1hr) may expire mid-task. When they do, commands start failing with 401/403, but error messages typically say "unauthorized" or "access denied" — identical to "never had permission" errors. The agent cannot distinguish "credential was never valid" (permanent failure requiring escalation) from "credential expired" (transient failure recoverable by refresh).

# Session starts — IAM role valid for 15 minutes
$ tool deploy --env prod
{"ok": true, "data": {"job_id": "dep_123"}}

# 16 minutes later
$ tool job status dep_123
{"ok": false, "error": {"code": "FORBIDDEN", "message": "Access denied"}}
# Is this: token expired? wrong permissions? job doesn't exist?
# Agent cannot tell. May retry (burns time), may abort (wrong call).

# Worse: cascade failure — every subsequent call also fails
$ tool list-resources
{"ok": false, "error": {"code": "UNAUTHORIZED", "message": "Invalid credentials"}}

Impact

  • Agent retries indefinitely on a non-retryable permanent permission denial
  • Agent abandons a recoverable expired-token situation that just needs a refresh
  • Cascading failures across all subsequent commands in the session
  • Agent may attempt rollback with the same expired credential, causing rollback to also fail

Solutions

Auth errors MUST distinguish expiry from permission denial:

{
  "ok": false,
  "error": {
    "code": "CREDENTIALS_EXPIRED",
    "message": "Access token expired at 2024-03-11T14:15:00Z.",
    "expired": true,
    "expired_at": "2024-03-11T14:15:00Z",
    "retryable": true,
    "reauth_command": "tool auth refresh",
    "reauth_env_var": "TOOL_TOKEN"
  }
}

For framework design: - Add exit 10 to the standard exit code table: 10 = credentials expired (retryable with refresh). Exit 8 = permanent permission denied - Framework MUST intercept HTTP 401/403 responses and attempt to classify expiry vs permission denial before surfacing the error - error.reauth_command is a mandatory field for all auth errors — the exact command to run to recover credentials

Evaluation

Score Condition
0 Expired credentials produce FORBIDDEN or UNAUTHORIZED identical to permission denial; no way to distinguish; no reauth hint
1 Error message mentions "expired" in human-readable text; no structured expired field; reauth_command absent
2 CREDENTIALS_EXPIRED code distinct from PERMISSION_DENIED; expired_at field present; reauth_command provided
3 Exit code 10 (expiry) distinct from exit 8 (permanent denied); retryable: true on expiry errors; reauth_env_var listed

Check: Let credentials expire (or mock expiry) and run any authenticated command — verify the response contains "code": "CREDENTIALS_EXPIRED" (not FORBIDDEN) and a reauth_command field.


Agent Workaround

Distinguish CREDENTIALS_EXPIRED from permanent auth failures; auto-refresh when reauth_command is provided:

import subprocess, json, os

CREDENTIAL_EXPIRY_CODES = {"CREDENTIALS_EXPIRED", "AUTH_EXPIRED", "TOKEN_EXPIRED"}
PERMANENT_AUTH_CODES = {"PERMISSION_DENIED", "FORBIDDEN", "UNAUTHORIZED"}

def run_with_auth_retry(cmd: list[str], max_auth_retries: int = 1) -> dict:
    for attempt in range(max_auth_retries + 1):
        result = subprocess.run(cmd, capture_output=True, text=True)
        try:
            parsed = json.loads(result.stdout)
        except json.JSONDecodeError:
            raise RuntimeError(f"No JSON output: {result.stdout[:200]}")

        if parsed.get("ok"):
            return parsed

        error = parsed.get("error", {})
        code = error.get("code", "")

        if code in CREDENTIAL_EXPIRY_CODES and attempt < max_auth_retries:
            reauth_cmd = error.get("reauth_command")
            reauth_env = error.get("reauth_env_var")
            if reauth_cmd:
                # Run the reauth command
                reauth_result = subprocess.run(
                    reauth_cmd.split(), capture_output=True, text=True
                )
                if reauth_result.returncode == 0:
                    continue   # retry the original command
            elif reauth_env:
                raise RuntimeError(
                    f"Credentials expired. Re-set {reauth_env} to refresh."
                )
            raise RuntimeError(f"Credentials expired and no reauth path available: {error}")

        if code in PERMANENT_AUTH_CODES:
            raise PermissionError(f"Permanent auth failure [{code}]: {error.get('message')}")

        raise RuntimeError(f"Command failed: {parsed}")

    raise RuntimeError("Auth retry limit reached")

Limitation: If the tool does not distinguish expiry from permission denial (both use FORBIDDEN or UNAUTHORIZED), the agent cannot safely auto-retry — check the expired_at field if available; if absent, treat all 401/403 as non-retryable to avoid infinite retry loops