Skip to content

59 high high entropy tokens

Part VII: Ecosystem, Runtime & Agent-Specific | Challenge §59

59. High-Entropy String Token Poisoning

Source: Gemini 02_output_context.md (RA)

Severity: High | Frequency: Common | Detectability: Medium | Token Spend: High | Time: Low | Context: High

The Problem

JWTs, API keys, UUIDs, base64 blobs, and cryptographic hashes in tool output consume hundreds of LLM tokens each — yet provide zero useful signal to the agent. A single Authorization: Bearer eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9... in a debug dump wastes 200–400 tokens on an opaque string the agent cannot interpret. Over a session with dozens of tool calls, high-entropy fields silently consume a significant fraction of the context budget.

$ tool auth token --show
{
  "token": "eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJ1c2VyXzEyMyIsImlhdCI6MTcxMDAwMDAwMCwiZXhwIjoxNzEwMDAzNjAwfQ.SflKxwRJSMeKKF2QT4fwpMeJf36POk6yJV_adQssw5c",
  "expires_at": "2024-03-11T15:00:00Z"
}
# The JWT consumes ~300 tokens. The agent only needed "expires_at".

Worse: the same high-entropy strings appear across multiple tool responses (resource IDs, session tokens, correlation IDs), each adding wasteful repetition to the context window.

Impact

  • Context budget eroded silently by opaque strings the agent cannot reason about
  • Long tokens increase per-call API cost directly
  • Repeated high-entropy fields across multiple calls fill context with noise
  • Agents may attempt to reason about or pattern-match JWT segments, wasting reasoning tokens

Solutions

Auto-mask high-entropy fields in structured output:

{
  "token": "[JWT: expires 2024-03-11T15:00:00Z, sub=user_123]",
  "token_raw": "<available via: tool auth token --show --unmask>"
}

Schema marks fields as high_entropy: true:

{ "name": "token", "type": "string", "high_entropy": true, "mask_in_output": true }

Framework detects high-entropy strings automatically: - Strings matching ^[A-Za-z0-9+/]{40,}={0,2}$ (base64) or JWT pattern (xxx.yyy.zzz) are masked unless --unmask is passed - Instead of the raw value, output: entropy type, meaningful metadata extracted from the payload (expiry, subject), and the flag to retrieve the raw value

For framework design: - Framework MUST provide a high_entropy field type with automatic masking in non---unmask mode - The mask replacement MUST include the semantic metadata from the string (JWT: expiry + claims summary; UUID: just the ID truncated; API key: first 8 chars + ...) - --unmask flag explicitly opts into showing raw high-entropy values

Evaluation

Score Condition
0 JWTs, API keys, and base64 blobs returned verbatim in all output; no masking; high token cost per call
1 Some fields marked sensitive and omitted; no semantic replacement showing expiry or subject
2 High-entropy fields replaced with semantic summaries (e.g., [JWT: expires 2024-03-11, sub=user_123]); --unmask available
3 high_entropy: true declared in schema; automatic detection of base64/JWT patterns; masking applied by default without explicit declaration

Check: Run tool auth token --show --output json and verify the token field contains a semantic summary (not the full JWT), with the full value accessible via --unmask.


Agent Workaround

Extract only the semantic metadata the agent needs; request --unmask only when the raw value is operationally required:

import subprocess, json, base64, re

def decode_jwt_claims(token: str) -> dict:
    """Extract claims from a JWT without verification — for metadata only."""
    try:
        parts = token.split(".")
        if len(parts) != 3:
            return {}
        # Pad base64 to multiple of 4
        payload = parts[1] + "=" * (4 - len(parts[1]) % 4)
        claims = json.loads(base64.urlsafe_b64decode(payload))
        return {"sub": claims.get("sub"), "exp": claims.get("exp")}
    except Exception:
        return {}

# When the tool returns a raw JWT, extract only what the agent needs
result = subprocess.run(
    ["tool", "auth", "token", "--show", "--output", "json"],
    capture_output=True, text=True,
)
parsed = json.loads(result.stdout)
token = parsed.get("data", {}).get("token", "")

if token.startswith("eyJ"):
    # It's a raw JWT — extract only the expiry
    claims = decode_jwt_claims(token)
    expiry = claims.get("exp")
    print(f"Token expiry: {expiry} (not storing full JWT in context)")
    # Store only the expiry and whether we have a token; not the token itself
    parsed["data"]["token"] = f"[JWT: exp={expiry}]"
    parsed["data"]["token_available"] = True

Limitation: If the tool returns raw JWTs or API keys without masking and there is no --unmask flag (meaning they are always returned in full), extract only the fields the agent needs and discard the high-entropy value immediately after use — do not store it in variables that persist across many tool calls