Skip to content

13 critical partial failure

Part II: Execution & Reliability | Challenge §13

13. Partial Failure & Atomicity

Severity: Critical | Frequency: Common | Detectability: Hard | Token Spend: High | Time: High | Context: Medium

The Problem

Multi-step commands can fail mid-execution, leaving the system in an unknown intermediate state. The agent receives a failure but doesn't know what was completed.

Partial failure with no rollback:

$ tool migrate-database
Step 1/4: backup... done
Step 2/4: apply schema changes... done
Step 3/4: migrate data... FAILED (disk full)
Step 4/4: (not reached)
exit 1

Agent retries. Step 2 runs again. Now schema is applied twice → error.

Batch operations with partial success:

$ tool send-notifications --users 1,2,3,4,5
Sent to user 1: ok
Sent to user 2: ok
Sent to user 3: FAILED (invalid email)
Sent to user 4: ok
Sent to user 5: FAILED (rate limited)
exit 1

Exit code 1 tells agent "failed" but 3/5 succeeded. Retry sends duplicates to 1, 2, 4.

Impact

  • Retry causes double execution of already-completed steps
  • Agent cannot determine safe recovery path
  • Manual intervention required to resolve unknown state

Solutions

Structured partial failure output:

{
  "ok": false,
  "partial": true,
  "completed_steps": ["backup", "apply_schema"],
  "failed_step": "migrate_data",
  "error": {"code": "DISK_FULL", "message": "..."},
  "resume_from": "migrate_data",
  "rollback_available": true
}

Batch result per item:

{
  "ok": false,
  "partial": true,
  "results": [
    {"id": 1, "ok": true,  "effect": "sent"},
    {"id": 2, "ok": true,  "effect": "sent"},
    {"id": 3, "ok": false, "error": {"code": "INVALID_EMAIL"}},
    {"id": 4, "ok": true,  "effect": "sent"},
    {"id": 5, "ok": false, "error": {"code": "RATE_LIMITED"}}
  ],
  "summary": {"total": 5, "succeeded": 3, "failed": 2}
}

Resumable commands:

tool migrate-database --resume-from migrate_data
# Only runs remaining steps

For framework design: - All multi-step commands emit a step manifest at start - Each step emits its result as it completes (streaming JSON lines) - Final summary always includes completed, failed, skipped counts - --rollback-on-failure flag as standard option

Evaluation

Score Condition
0 Multi-step command exits 1 on mid-execution failure with no indication of what completed
1 Completed steps mentioned in stderr text but not structured; no resume or rollback support
2 Structured completed_steps / failed_step in JSON output; exit code distinguishes partial failure
3 Per-item results for batch commands; resume_from token; --rollback-on-failure flag; step manifest emitted at start

Check: Trigger a deliberate mid-run failure (e.g., bad step 3 of 5) — the response must contain partial: true, list completed steps, and identify where to resume.


Agent Workaround

Parse structured partial failure output to determine safe retry scope:

result = run(["tool", "migrate-database"])
parsed = json.loads(result.stdout)

if parsed.get("partial"):
    completed = parsed.get("completed_steps", [])
    resume_from = parsed.get("resume_from")
    rollback_available = parsed.get("rollback_available", False)

    if rollback_available:
        # Roll back to clean state before retrying from scratch
        run(["tool", "migrate-database", "--rollback"])
    elif resume_from:
        # Resume from the failed step only
        run(["tool", "migrate-database", f"--resume-from={resume_from}"])
    else:
        # No structured resume info — do not retry; requires manual investigation
        raise RuntimeError(f"Partial failure at unknown step. Completed: {completed}")

For batch commands, collect failed IDs and retry only those:

results = parsed.get("results", [])
failed_ids = [r["id"] for r in results if not r["ok"]]
# Retry only failed items
run(["tool", "send-notifications", "--users", ",".join(map(str, failed_ids))])

Limitation: If the tool emits only a text error with no structured step information, the agent cannot determine what succeeded — do not retry the full operation without verifying current state first, as re-running completed steps may cause duplicate side effects