
Skills — Technical Reference

Ten installable agent skills for evaluating CLIs, implementing the spec, and diagnosing failures at the tool-execution layer. Compatible with any Agent Skills-enabled agent: Claude Code, Cursor, Gemini CLI, Copilot, and others.


Skill Inventory

| Skill | Category | Purpose |
| --- | --- | --- |
| cli-agent-onboard | Evaluation pipeline | Profile a CLI tool: detects runtime, binary, non-interactive flags, timeout method |
| cli-agent-evaluate | Evaluation pipeline | Score a CLI against one failure mode (0–3), return applicable agent workaround |
| cli-agent-evaluate-batch | Evaluation pipeline | Evaluate a CLI across multiple failure modes in one resumable run |
| cli-agent-readiness | Evaluation pipeline | Score proactive agent-readiness across 5 dimensions (0–15) |
| cli-agent-report | Evaluation pipeline | Transform findings into perspective-specific reports for 4 audiences |
| cli-agent-audit | Evaluation pipeline | Autonomous end-to-end pipeline: install → onboard → readiness → evaluate → report |
| cli-agent-implement | Implementation | Guide implementing the spec in a CLI framework, tier by tier |
| cli-agent-diagnose | Diagnosis | Classify a failed agent CLI call against the §N taxonomy; return workaround + memory string |
| fixlayer-report | Diagnosis | Render a self-contained HTML audit report from a tool-execution trace file |
| validate-links | Spec maintenance | Validate cross-links and schema↔requirement symmetry in the spec |

Skill Relations

Call graph

Skills invoke other skills automatically or on demand. The arrows show delegation, not data flow (data flow is covered in Artifact Flow below).

cli-agent-audit
  ├─→ cli-agent-onboard          (Phase 2 — always)
  ├─→ cli-agent-readiness        (Phase 3 — skippable with --skip-readiness)
  ├─→ cli-agent-evaluate-batch   (Phase 4)
  └─→ cli-agent-report           (Phase 5, mode=all)

cli-agent-evaluate
  └─→ cli-agent-onboard          (Step 0 — auto-invoked if environment.md missing)

cli-agent-evaluate-batch
  └─→ cli-agent-onboard          (Step 0 — auto-invoked if environment.md missing)

cli-agent-readiness
  └─→ cli-agent-onboard          (Step 0 — auto-invoked if environment.md missing)

fixlayer-report
  └─→ cli-agent-diagnose         (scripts/diagnose.py --explain)

cli-agent-implement, cli-agent-diagnose, and validate-links have no skill-level dependencies.
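
The delegation edges above can be written down as a small adjacency map. The sketch below is illustrative only: `CALL_GRAPH` and `delegates` are hypothetical names, not part of any skill, and the map simply restates the edges listed in the call graph.

```python
# Delegation edges copied from the call graph above (hypothetical structure).
CALL_GRAPH = {
    "cli-agent-audit": [
        "cli-agent-onboard",
        "cli-agent-readiness",
        "cli-agent-evaluate-batch",
        "cli-agent-report",
    ],
    "cli-agent-evaluate": ["cli-agent-onboard"],
    "cli-agent-evaluate-batch": ["cli-agent-onboard"],
    "cli-agent-readiness": ["cli-agent-onboard"],
    "fixlayer-report": ["cli-agent-diagnose"],
}


def delegates(skill: str) -> set[str]:
    """Return every skill reachable from `skill` by following delegation edges."""
    seen: set[str] = set()
    stack = [skill]
    while stack:
        current = stack.pop()
        for callee in CALL_GRAPH.get(current, []):
            if callee not in seen:
                seen.add(callee)
                stack.append(callee)
    return seen
```

Skills with no outgoing edges, such as validate-links, yield an empty set.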


Artifact flow

All evaluation artifacts live under evaluations/<cli-name>/. Skills exchange data only through these files: each skill reads and writes specific files, and no skill depends on another skill's live state at runtime (they share the directory, not memory).

cli-agent-onboard
  writes → evaluations/<cli>/environment.md

cli-agent-evaluate  /  cli-agent-evaluate-batch
  reads  → evaluations/<cli>/environment.md
  writes → evaluations/<cli>/findings.md
  writes → evaluations/<cli>/issues.md
  writes → evaluations/<cli>/trace.md

cli-agent-readiness
  reads  → evaluations/<cli>/environment.md
  writes → evaluations/<cli>/readiness.md

cli-agent-report (mode=all)
  reads  → evaluations/<cli>/findings.md
  reads  → evaluations/<cli>/issues.md
  reads  → evaluations/<cli>/trace.md
  reads  → evaluations/<cli>/readiness.md       (optional)
  reads  → evaluations/<cli>/environment.md
  writes → evaluations/<cli>/report-dev.md
  writes → evaluations/<cli>/report-agent-dev.md
  writes → evaluations/<cli>/report-runtime.md
  writes → evaluations/<cli>/report-issues.md
  writes → evaluations/<cli>/report-index.md
  writes → evaluations/<cli>/README.md
  writes → evaluations/<cli>/linkedin.md        (gitignored)
  writes → evaluations/<cli>/x.md               (gitignored)
  writes → docs/evaluations/<cli>/.pages

Single-mode cli-agent-report runs (dev, agent-dev, runtime, issues) emit output to the conversation only — they never write files.
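
As a rough illustration of the flow above, the following sketch checks whether the inputs of cli-agent-report in mode=all are present before a run. It assumes that every input not marked optional is required; `missing_report_inputs` and the two constants are hypothetical names, not part of the skill.

```python
from pathlib import Path

# Inputs of cli-agent-report (mode=all), per the artifact flow above.
# Assumption: anything not marked "(optional)" is treated as required.
REQUIRED_INPUTS = ["findings.md", "issues.md", "trace.md", "environment.md"]
OPTIONAL_INPUTS = ["readiness.md"]


def missing_report_inputs(cli_dir: Path) -> list[str]:
    """List required input files that do not exist in evaluations/<cli>/."""
    return [name for name in REQUIRED_INPUTS if not (cli_dir / name).is_file()]
```

An empty result means all required inputs exist; readiness.md is deliberately not checked because the flow marks it optional.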


Artifact reference table

File Produced by Consumed by
environment.md cli-agent-onboard cli-agent-evaluate, cli-agent-evaluate-batch, cli-agent-readiness, cli-agent-report
findings.md cli-agent-evaluate, cli-agent-evaluate-batch cli-agent-report
issues.md cli-agent-evaluate, cli-agent-evaluate-batch cli-agent-report (modes: issues, all)
trace.md cli-agent-evaluate, cli-agent-evaluate-batch cli-agent-report (modes: issues, all)
readiness.md cli-agent-readiness cli-agent-report (modes: all)
report-dev.md cli-agent-report humans — CLI authors
report-agent-dev.md cli-agent-report humans — agent builders
report-runtime.md cli-agent-report AI agents at runtime
report-issues.md cli-agent-report agent users — concrete bug list
report-index.md cli-agent-report humans — navigation
README.md cli-agent-report humans — entry point
linkedin.md cli-agent-report humans — social post draft
x.md cli-agent-report humans — social post draft

Skill Descriptions

cli-agent-onboard

Role: foundation. Every evaluation skill reads environment.md before running any check. Run once per CLI per machine; re-run with --force to refresh.

Steps: reads agent docs (AGENTS.md, CODING_AGENTS.md, README.md) → detects runtime and toolchain from manifest files → locates and verifies the binary → detects OS timeout method → discovers non-interactive and output-format flags → saves evaluations/<cli>/environment.md.

Invoked automatically by: cli-agent-evaluate, cli-agent-evaluate-batch, cli-agent-readiness (Step 0) and cli-agent-audit (Phase 2).


cli-agent-evaluate

Role: targeted single-failure-mode evaluation. Use when investigating one specific §N, or when building a custom evaluation sequence.

Steps: loads environment profile → locates failure mode file → reads ### Evaluation section → runs **Check:** command → assigns score 0–3 (or ?/3 on timeout) → reads ### Agent Workaround if score < 3 → emits structured result block → saves to findings.md and trace.md.

Score semantics: 3 = fully passing, 0 = failing, ?/3 = check timed out (indeterminate — not a fail).
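
The distinction between a failing score and an indeterminate one can be made explicit in code. This is a minimal sketch that assumes scores are serialized as text of the form N/3 (the source only shows ?/3 in that form); `parse_score` and `is_failing` are hypothetical helpers.

```python
from typing import Optional


def parse_score(text: str) -> Optional[int]:
    """Parse a score such as '2/3'; return None for the indeterminate '?/3'."""
    value, _, maximum = text.strip().partition("/")
    if maximum != "3":
        raise ValueError(f"expected a score out of 3, got {text!r}")
    if value == "?":
        return None  # check timed out: indeterminate, not a fail
    score = int(value)
    if not 0 <= score <= 3:
        raise ValueError(f"score out of range: {text!r}")
    return score


def is_failing(text: str) -> bool:
    """Only a definite 0 counts as failing here; a timeout does not."""
    return parse_score(text) == 0
```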

Resumable: skips §N rows already present in findings.md unless --refresh is passed.


cli-agent-evaluate-batch

Role: efficient multi-failure-mode evaluation in a single run. Accepts a severity filter (critical, high, medium, all), a part number (part 1 through part 7), or an explicit §N list.

Difference from cli-agent-evaluate: builds a queue from the failure mode index, processes all entries in order, saves findings after each §N (resumable), emits a final scorecard table sorted by severity then §N.

Resumable: skips §N rows that have a complete row in findings.md AND a complete block in trace.md. Re-evaluate with --refresh.
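
The resume rule above amounts to a set intersection. The sketch below assumes the IDs of completed findings rows and completed trace blocks have already been extracted from findings.md and trace.md (parsing those files is out of scope here); `pending` is a hypothetical helper name.

```python
def pending(queue: list[str], findings_done: set[str], trace_done: set[str],
            refresh: bool = False) -> list[str]:
    """Return the failure-mode IDs that still need evaluation, in queue order."""
    if refresh:
        return list(queue)  # --refresh re-evaluates everything
    complete = findings_done & trace_done  # both artifacts must be complete
    return [mode for mode in queue if mode not in complete]
```

A failure mode with a findings row but no complete trace block stays in the queue, which matches the AND condition stated above.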


cli-agent-readiness

Role: positive readiness score — measures what the CLI provides, not just what it avoids breaking. Complements cli-agent-evaluate-batch.

Five dimensions (3 points each, 15 total):

| # | Dimension | What it measures |
| --- | --- | --- |
| 1 | Documentation Quality | Can an agent learn correct invocation from docs alone |
| 2 | Self-Description | Does --schema/manifest return a machine-readable ManifestResponse |
| 3 | Pre-built Integrations | MCP server, OpenAPI spec, Claude skill, workflow recipes, with co-versioning check |
| 4 | Setup Reproducibility | Non-interactive idempotent install; dependency manifest; health-check command |
| 5 | Workflow Coverage | Copy-pasteable examples covering CRUD operations, verified by execution |

Depth: quick evaluates dimensions 1–2 only (score out of 6); full evaluates all five.

Grade scale: A 13–15 · B 10–12 · C 7–9 · D 4–6 · F 0–3.
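
The grade scale can be expressed as a simple threshold lookup. This sketch covers full-depth totals only (0–15); how quick-depth scores out of 6 map to grades is not stated here, so it is not modeled. `grade` is a hypothetical function name.

```python
def grade(total: int) -> str:
    """Map a full-depth readiness total (0-15) to a letter grade."""
    if not 0 <= total <= 15:
        raise ValueError(f"total must be between 0 and 15, got {total}")
    # Lower bound of each band, highest first: A 13-15, B 10-12, C 7-9, D 4-6.
    for floor, letter in ((13, "A"), (10, "B"), (7, "C"), (4, "D")):
        if total >= floor:
            return letter
    return "F"  # 0-3
```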


cli-agent-report

Role: translate raw findings into actionable output for a specific audience. Never re-runs CLI checks — reads only pre-existing artifacts.

Five modes:

| Mode | Audience | Output |
| --- | --- | --- |
| dev | CLI authors | Fix list: failing §N → ### Solutions section from challenge file |
| agent-dev | Agent builders | Integration guide: binary invocation, env vars, per-§N workarounds, invocation invariants |
| runtime | AI agents | Compact brief: always-include flags, never-do actions, output patterns to watch |
| issues | Agent users | Concrete bugs from issues.md + gap list from failing §N |
| all | All above | Runs all four modes, writes 8 files, generates index + README + LinkedIn/X drafts |

all mode validation: runs scripts/validate_report_bundle.py before declaring the bundle complete.


cli-agent-audit

Role: autonomous end-to-end pipeline. Single command, zero human steps in the happy path.

Six phases:

| Phase | Action | Halt on failure |
| --- | --- | --- |
| 0 | Pre-flight: check existing artifacts, determine what to skip | n/a |
| 1 | Install: non-interactive package install, binary verification | yes |
| 2 | Onboard: delegate to cli-agent-onboard | yes |
| 3 | Readiness: delegate to cli-agent-readiness | no (warn and continue) |
| 4 | Evaluate: delegate to cli-agent-evaluate-batch | no (note skipped §N) |
| 5 | Report: delegate to cli-agent-report mode=all | no (fix reported errors) |

Scope: critical (default), critical+high, or all.

Flags: --skip-install, --skip-readiness, --refresh.

Artifacts produced: 13 files under evaluations/<cli>/ — see the cli-agent-audit SKILL.md for the full table.
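
The halt-on-failure column of the phase table can be read as a small control loop. The sketch below is not the skill's implementation: `PHASES`, `run_pipeline`, and the `step` callback are hypothetical, and phase 0 (listed as n/a) is treated as non-halting purely as an assumption.

```python
from typing import Callable

# (phase number, name, halt on failure), following the phase table.
# Assumption: phase 0 is "n/a" in the table and is treated as non-halting.
PHASES = [
    (0, "pre-flight", False),
    (1, "install", True),
    (2, "onboard", True),
    (3, "readiness", False),
    (4, "evaluate", False),
    (5, "report", False),
]


def run_pipeline(step: Callable[[str], bool]) -> list[str]:
    """Run phases in order; `step(name)` returns True on success.

    Returns a log of outcomes and stops at the first failed halting phase.
    """
    log = []
    for number, name, halt in PHASES:
        if step(name):
            log.append(f"{number} {name}: ok")
        elif halt:
            log.append(f"{number} {name}: failed, halting")
            break
        else:
            log.append(f"{number} {name}: failed, continuing")
    return log
```

With this policy a readiness failure still produces a report, while an install or onboard failure ends the run before any evaluation starts.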


cli-agent-implement

Role: guides a CLI framework author through implementing the spec, tier by tier.

Steps: naming audit (corpus alignment, verb gaps, flag gaps) → identify target language and framework → generate language-specific types from JSON schemas → implement REQ-F (Framework-Automatic) → REQ-C (Command Contract) → REQ-O (Opt-In) → verify acceptance criteria.

Key invariant to add post-codegen: retryable: true implies side_effects: "none" in ExitCodeEntry — code generators do not enforce this; add a validation snippet manually.
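
A validation snippet for this invariant might look like the following. It is a minimal sketch that assumes an ExitCodeEntry has been loaded as a plain dict with the `retryable` and `side_effects` keys named above; `check_retry_invariant` is a hypothetical helper, not generated code.

```python
def check_retry_invariant(entry: dict) -> None:
    """Raise ValueError if an entry is retryable but declares side effects."""
    if entry.get("retryable") is True and entry.get("side_effects") != "none":
        raise ValueError(
            'retryable: true requires side_effects: "none", '
            f"got side_effects={entry.get('side_effects')!r}"
        )
```

Calling this on every ExitCodeEntry after deserialization catches the cases that generated types alone would accept.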

Standalone: does not read evaluation artifacts or invoke other skills.


cli-agent-diagnose

Role: post-hoc failure classifier. Given a failed CLI call (command, stdout, stderr, exit code), identifies the matching §N failure mode and returns an actionable workaround, a memory string, and a skill patch.

Input forms: failed call visible in the current conversation, a JSON object, or a path to a message history file (OpenAI format, LangSmith, or Langfuse).

Scripts:
- scripts/diagnose.py: classifier; output is always JSON regardless of exit code
- scripts/runner.py: drop-in subprocess.run wrapper that applies §N fixes transparently
- scripts/preflight_hook.py: Claude Code PreToolUse hook that intercepts Bash calls before they run

Exit codes of diagnose.py: 0 match found · 2 trace too sparse · 3 no match.
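
A caller can combine the exit code with the JSON payload along these lines. The sketch takes the return code and captured stdout as inputs rather than invoking the script, because the command-line arguments of diagnose.py are not described here; `interpret` and `EXIT_MEANINGS` are hypothetical names, and the payload schema is left opaque.

```python
import json

# Exit codes documented for diagnose.py.
EXIT_MEANINGS = {0: "match found", 2: "trace too sparse", 3: "no match"}


def interpret(returncode: int, stdout: str) -> tuple[str, object]:
    """Pair the exit-code meaning with the parsed JSON payload."""
    meaning = EXIT_MEANINGS.get(returncode, "unexpected exit code")
    # Output is documented as JSON for every exit code, so parse it always.
    return meaning, json.loads(stdout)
```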

Scope: tool execution layer only. Reasoning failures, wrong answers, and incorrect domain logic are out of scope — diagnose.py returns no_match for those.


fixlayer-report

Role: visual counterpart to cli-agent-diagnose. Renders a self-contained HTML audit report in the FixLayer design system from a tool-execution trace file.

Input formats: Claude Code PostToolUse hook log (JSONL), OpenAI message history (JSON array), or single-event JSON.

Script: scripts/generate_report.py <trace-file> — calls diagnose.py --explain, collects session stats (tool counts, durations, retries), renders HTML, prints output path.

Report sections: verdict strip (pass/fail + §N codes) → match cards (evidence, triggering tool calls, workaround, limitation) → memory strings and skill patches → action items.

Depends on: cli-agent-diagnose scripts installed alongside it.


validate-links

Role: spec consistency validator. Detects broken file references and symmetry violations before they accumulate.

Five checks:

| # | Check | What it validates |
| --- | --- | --- |
| 1 | Broken file links | Every relative link in every .md file resolves to an existing file |
| 2 | Schema↔requirement symmetry | Used by in schema .md ↔ ## Schema link in requirement; bidirectional |
| 3 | Index completeness | Every file linked from requirements/index.md and schemas/index.md exists; every file in those directories is listed |
| 4 | Content completeness | Challenge and requirement files have all required sections (informational, not a hard error) |
| 5 | Counter consistency | Numbers declared in README.md, AGENTS.md, challenges/index.md, requirements/index.md match actual file counts on disk |

Standalone: reads spec files only; does not invoke other skills or write any artifacts.

When to run: after adding or editing any failure mode, requirement, schema, or guide file.
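
To make check 1 (broken file links) concrete, here is a rough sketch of how such a check could work. It is not the skill's actual implementation: `broken_links` and the regular expression are assumptions, only inline Markdown links of the form [text](target) are considered, and targets with a URL scheme or a bare in-page anchor are skipped.

```python
import re
from pathlib import Path

# Inline Markdown links: [text](target). Reference-style links are ignored.
LINK = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")
# Targets starting with a URL scheme such as https: or mailto:.
SCHEME = re.compile(r"[a-zA-Z][a-zA-Z0-9+.-]*:")


def broken_links(root: Path) -> list[tuple[Path, str]]:
    """Return (file, target) pairs whose relative target does not exist."""
    broken = []
    for md in sorted(root.rglob("*.md")):
        for target in LINK.findall(md.read_text(encoding="utf-8")):
            if SCHEME.match(target) or target.startswith("#"):
                continue  # skip external URLs and in-page anchors
            path = target.split("#", 1)[0]  # drop any fragment
            if not (md.parent / path).exists():
                broken.append((md, target))
    return broken
```

Targets are resolved relative to the directory of the file that contains the link, which is how relative Markdown links are normally interpreted.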


Quick-Reference: Which Skill to Use

| Goal | Skill |
| --- | --- |
| Full autonomous audit of a CLI | cli-agent-audit |
| Evaluate one specific §N | cli-agent-evaluate |
| Evaluate a set of failure modes | cli-agent-evaluate-batch |
| Profile a CLI before evaluation | cli-agent-onboard |
| Score what a CLI provides to agents | cli-agent-readiness |
| Generate a fix list for the CLI author | cli-agent-report mode=dev |
| Generate an integration guide for an agent builder | cli-agent-report mode=agent-dev |
| Generate a runtime brief for an AI agent | cli-agent-report mode=runtime |
| Generate all reports + index + social drafts | cli-agent-report mode=all |
| Understand why a CLI call failed | cli-agent-diagnose |
| Visual HTML report from a trace file | fixlayer-report |
| Implement the spec in a CLI framework | cli-agent-implement |
| Check spec cross-links after editing | validate-links |