44 medium knowledge packaging

Part VII: Ecosystem, Runtime & Agent-Specific | Challenge §44

44. Agent Knowledge Packaging Absence

The Problem

Agents consuming a CLI tool have two information sources: the tool's --help text (or --schema if available) and any contextual documentation. JSON Schema describes what arguments a tool accepts; it cannot describe when to use it, what order to run commands in, which combinations are dangerous, or what the known gotchas are in the specific deployment environment.

The absence of agent-facing documentation (AGENTS.md, CONTEXT.md, skill files) means agents must rely on: 1. General training knowledge about the tool (often outdated or wrong for non-default configurations) 2. Trial-and-error (expensive in tokens, time, and potential side effects) 3. Inferring behavioral constraints from --help text that was written for human readers

The jpoehnelt rubric makes this a scored axis (Axis 7), with four levels ranging from "only --help" (score 0) to "comprehensive skill library with versioned, discoverable skill files" (score 3). No other framework or specification in the research treats agent knowledge packaging as a first-class concern.

Specifically absent from most tools: - Which subcommand to use for a given task ("use user invite, not user create") - What must be set up before the tool will work ("requires auth login to have been run") - Which flags are dangerous and require --dry-run first - Common error patterns and their solutions - Rate limits and retry guidance specific to this tool - The difference between staging and production behavior

# Example: a tool with a known gotcha, but no AGENTS.md
$ my-tool deploy --env production
Error: invalid token  # What token? Which env var? Why now?
# Agent retries with different args, can't find solution
# Human: "Oh, you need to run `my-tool auth refresh` first after 8 hours"
# This is not in --help; it's not in the schema; only in an internal wiki

Impact

Agent fails silently on tasks it could complete with appropriate context
Agents make dangerous operations without understanding that --dry-run should be run first
Significant token spend on trial-and-error exploration of tool behavior
Agents learn incorrect generalizations from failed invocations (e.g., "this tool requires --force to work" when the real issue was a missing prerequisite)
Knowledge locked in human documentation (READMEs, wikis, runbooks) is invisible to agents
Without versioned skill files, agents may use outdated guidance after tool updates

Solutions

Minimum viable AGENTS.md:

# AGENTS.md

## Quick Reference
- Deploy: `my-tool deploy --env <staging|production> --dry-run` (always dry-run first)
- Auth: `my-tool auth login` must be run before any other command; tokens expire in 8 hours
- Status check: `my-tool status --json` returns current system health

## Known Gotchas
- If you see "invalid token", run `my-tool auth refresh` (tokens expire every 8 hours)
- `deploy` to production requires `--confirm` flag; staging does not
- The `--region` flag defaults to us-east-1 in CI, eu-west-1 locally

## Safe Operations
Read-only: `list`, `get`, `status`, `logs`
Mutating: `deploy`, `delete`, `update` (run with --dry-run first)
Irreversible: `delete --permanent` (no dry-run available)

OpenClaw skill file with machine-readable metadata:

---
name: my-tool
version: "1.0.0"
triggers:
  - "deploy to production"
  - "my-tool"
tools: [bash]
---
[Skill body with agent-specific guidance]

For framework design: - Auto-generate a minimal AGENTS.md template from schema metadata at my-tool --generate-agents-md - Include in --schema output: "danger_level", "requires" (prerequisite commands), "read_only", and "docs_url" fields - Provide a CLI hook to load and display skill files: my-tool --skill returns the tool's OpenClaw skill - Score frameworks against Axis 7 and require at least level 1 (CONTEXT.md or AGENTS.md present) before an "agent-ready" designation

Evaluation

Score	Condition
0	No AGENTS.md or CONTEXT.md; no machine-readable prerequisite or danger level; agent must discover behavior through trial-and-error
1	README mentions prerequisites; no structured machine-readable format; `--schema` lacks `danger_level` and `requires` fields
2	AGENTS.md or CONTEXT.md present with quick reference, known gotchas, and safe vs. mutating operation list
3	`--schema` includes `danger_level`, `requires`, `read_only`, `docs_url`; versioned skill file available via `--skill`; auto-generated from schema metadata

Check: Run tool --schema --output json and verify at least one command has a danger_level field and a requires array listing prerequisite commands.

Agent Workaround

Read AGENTS.md before first use; extract danger_level and requires from schema for safe operation planning:

import subprocess, json, os

def load_agent_knowledge(tool: str, tool_dir: str | None = None) -> dict:
    knowledge = {"prereqs": [], "dangerous_commands": [], "safe_commands": []}

    # Check for AGENTS.md in tool's directory or current dir
    for search_dir in filter(None, [tool_dir, os.getcwd()]):
        agents_md = os.path.join(search_dir, "AGENTS.md")
        if os.path.exists(agents_md):
            with open(agents_md) as f:
                knowledge["agents_md"] = f.read()
            break

    # Extract structured knowledge from schema
    result = subprocess.run(
        [tool, "--schema", "--output", "json"],
        capture_output=True, text=True,
    )
    try:
        schema = json.loads(result.stdout)
        for cmd in schema.get("commands", []):
            name = cmd["name"]
            danger = cmd.get("danger_level", "unknown")
            requires = cmd.get("requires", [])
            if requires:
                knowledge["prereqs"].extend(requires)
            if danger in ("mutating", "destructive"):
                knowledge["dangerous_commands"].append(name)
            elif danger in ("read_only", "safe"):
                knowledge["safe_commands"].append(name)
    except (json.JSONDecodeError, KeyError):
        pass

    return knowledge

knowledge = load_agent_knowledge("tool")
# Run prerequisites before starting work
for prereq in knowledge["prereqs"]:
    subprocess.run(["tool"] + prereq.split(), capture_output=True)

Limitation: If the tool has no AGENTS.md and no danger_level in schema, the agent must infer safety from command name patterns (get/list/show = read, create/update/delete = mutating) — always run with --dry-run first for any mutating operation and verify an explicit "effect" field before proceeding