Part II: Execution & Reliability | Challenge §16
16. Signal Handling & Graceful Cancellation
Severity: High | Frequency: Situational | Detectability: Hard | Token Spend: Medium | Time: Medium | Context: Low
The Problem
Agents enforce time budgets by killing processes (SIGTERM, then SIGKILL). Most CLI tools install no signal handlers, so they die instantly: no cleanup, no output, and no indication of what state was left behind.
Default signal behavior (the bad path):
$ tool migrate-database &
PID=1234
# Agent times out, kills the process:
$ kill -TERM 1234
# Tool dies immediately:
# - No output emitted
# - Temp files left on disk
# - Lock file not released
# - Database partially migrated
# - Agent receives: exit code 143 (128+SIGTERM), empty stdout
SIGPIPE on broken pipe:
$ tool list-logs | head -5
# After head exits, the tool's next write to stdout hits a closed pipe
# Python (ignores SIGPIPE by default): raises BrokenPipeError → ugly traceback to stderr
# Go: a failed write to stdout or stderr terminates the process with SIGPIPE (exit 141 via a shell)
# Agent sees an error that isn't really an error
No grace period between SIGTERM and SIGKILL:
# Agent sends SIGTERM, waits 0ms, sends SIGKILL
# Tool had no chance to write partial results or clean up
Impact
- Unknown intermediate state after cancellation
- Lock files and temp files accumulate, causing failures on next run
- Agent gets no information about what was completed before kill
- SIGPIPE masquerades as error, causing unnecessary retries
Solutions
Register signal handlers that emit JSON then exit cleanly:
import atexit
import json
import signal
import sys

_cleanup_done = False

# get_completed_steps(), get_current_step(), cleanup_temp_files(), and
# release_locks() are application-defined helpers.
def handle_sigterm(signum, frame):
    global _cleanup_done
    if _cleanup_done:
        return  # a second SIGTERM arrived while shutdown was already in progress
    _cleanup_done = True
    # Emit partial result to stdout before exit
    result = {
        "ok": False,
        "partial": True,
        "error": {"code": "CANCELLED", "message": "Process received SIGTERM"},
        "completed_steps": get_completed_steps(),
        "resume_from": get_current_step(),
    }
    sys.stdout.write(json.dumps(result) + "\n")
    sys.stdout.flush()
    cleanup_temp_files()
    release_locks()
    sys.exit(143)  # 128 + 15 (SIGTERM)

signal.signal(signal.SIGTERM, handle_sigterm)
# Also runs after sys.exit(143) above, so cleanup_temp_files() must be idempotent
atexit.register(cleanup_temp_files)
SIGPIPE handling:
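# Python: restore the default SIGPIPE action (POSIX only) so a closed pipe
# ends the process quietly instead of raising BrokenPipeError
import signal
signal.signal(signal.SIGPIPE, signal.SIG_DFL)
# Alternative: wrap all stdout writes in try/except BrokenPipeError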
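The try/except alternative can be sketched as follows. This is a minimal illustration, not a definitive implementation: `emit_lines` and `silence_stdout` are hypothetical helper names, and treating a closed reader as success is an assumption about the tool's policy.

```python
import os
import sys


def emit_lines(lines, out=None):
    """Write lines to `out`; return False if the reader closed the pipe early."""
    out = sys.stdout if out is None else out
    try:
        for line in lines:
            out.write(line + "\n")
        out.flush()
        return True
    except BrokenPipeError:
        return False


def silence_stdout():
    """Point fd 1 at /dev/null so the interpreter's exit-time flush cannot fail."""
    devnull = os.open(os.devnull, os.O_WRONLY)
    os.dup2(devnull, sys.stdout.fileno())
    os.close(devnull)
```

A command's entry point would call `emit_lines(...)` and, when it returns `False`, call `silence_stdout()` and exit with status 0, so that `tool list-logs | head -5` leaves no traceback on stderr.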
Advertise cancellation support in schema:
{
"command": "migrate-database",
"cancellable": true,
"cancel_signal": "SIGTERM",
"cancel_grace_period_ms": 5000,
"on_cancel": "emits partial result + rollback available"
}
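An agent could read this manifest entry to decide which signal to send and how long to wait before escalating. The sketch below assumes the field names shown above; `cancellation_plan` and the 2000 ms fallback are hypothetical.

```python
import json
import signal

# Manifest entry mirroring the fields shown above.
MANIFEST_ENTRY = json.loads("""
{
  "command": "migrate-database",
  "cancellable": true,
  "cancel_signal": "SIGTERM",
  "cancel_grace_period_ms": 5000,
  "on_cancel": "emits partial result + rollback available"
}
""")


def cancellation_plan(entry, default_grace_ms=2000):
    """Return (signal, grace_seconds), or None if the command is not cancellable."""
    if not entry.get("cancellable", False):
        return None
    # signal.Signals is an enum, so a lookup by name raises KeyError on typos.
    sig = signal.Signals[entry.get("cancel_signal", "SIGTERM")]
    grace_s = entry.get("cancel_grace_period_ms", default_grace_ms) / 1000
    return sig, grace_s
```

Commands that return `None` here would have to be treated as unsafe to interrupt, or killed with the expectation of an unknown partial state.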
For framework design:
- Framework installs SIGTERM and SIGPIPE handlers automatically for every command
- Every command declares a cleanup() hook called on signal
- Grace period: framework sends SIGTERM, waits cancel_grace_period_ms, then SIGKILL
- Partial result always emitted to stdout before exit, even on cancellation
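One way a framework might wire these points together is to turn SIGTERM into an exception so that the command's `cleanup()` hook runs in a `finally` block. This is a sketch under stated assumptions: `Command`, `Cancelled`, and `execute` are hypothetical names, and the code is POSIX-only because of SIGPIPE.

```python
import json
import signal
import sys


class Cancelled(Exception):
    """Raised in the main thread when SIGTERM arrives."""


def _on_sigterm(signum, frame):
    raise Cancelled()


class Command:
    """Hypothetical base class; subclasses override run() and cleanup()."""

    name = "command"

    def __init__(self):
        self.completed_steps = []

    def run(self):
        raise NotImplementedError

    def cleanup(self):
        """Release locks and delete temp files."""

    def partial_result(self):
        return {
            "ok": False,
            "partial": True,
            "error": {"code": "CANCELLED", "message": "Process received SIGTERM"},
            "completed_steps": list(self.completed_steps),
        }


def execute(command, out=None):
    """Run a command with signal handlers installed; return its exit code."""
    out = sys.stdout if out is None else out
    signal.signal(signal.SIGTERM, _on_sigterm)
    signal.signal(signal.SIGPIPE, signal.SIG_DFL)  # POSIX only
    try:
        result = command.run()
        out.write(json.dumps({"ok": True, "result": result}) + "\n")
        return 0
    except Cancelled:
        # Ignore repeated SIGTERMs while the partial result and cleanup run.
        signal.signal(signal.SIGTERM, signal.SIG_IGN)
        out.write(json.dumps(command.partial_result()) + "\n")
        return 143  # 128 + 15 (SIGTERM)
    finally:
        out.flush()
        command.cleanup()
```

The entry point would end with `sys.exit(execute(MyCommand()))`. Raising from the handler keeps the handler itself free of I/O, at the cost that the exception can surface at any point inside `run()`, so commands should record each step in `completed_steps` only after it has finished.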
Evaluation
| Score | Condition |
|---|---|
| 0 | SIGTERM causes immediate death with no output; SIGPIPE produces traceback on stderr |
| 1 | SIGTERM handled for cleanup only (lock release, temp deletion) but no JSON output emitted |
| 2 | SIGTERM emits partial JSON result before exit; exits 143; SIGPIPE suppressed |
| 3 | SIGTERM emits partial result with completed_steps and resume_from; all locks released; cancellable: true declared in manifest |
Check: Start a long-running command, send SIGTERM after 1s — verify it emits valid JSON to stdout within 2s and exits 143 (not 1 or 130).
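The check can be automated along these lines. The sketch uses a small child Python script as a stand-in for a real long-running command; `CHILD` and `check_sigterm` are hypothetical names, and the timings follow the check described above.

```python
import json
import signal
import subprocess
import sys
import time

# Stand-in for a long-running command that handles SIGTERM correctly.
CHILD = r"""
import json, signal, sys, time

def on_term(signum, frame):
    print(json.dumps({"ok": False, "partial": True}), flush=True)
    sys.exit(143)

signal.signal(signal.SIGTERM, on_term)
print("ready", flush=True)
time.sleep(60)
"""


def check_sigterm(cmd, delay_s=1.0, deadline_s=2.0):
    """Send SIGTERM after delay_s; return (exit_code, last_json_line_or_None)."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)
    time.sleep(delay_s)
    proc.send_signal(signal.SIGTERM)
    try:
        stdout, _ = proc.communicate(timeout=deadline_s)
    except subprocess.TimeoutExpired:
        proc.kill()
        proc.communicate()
        return None, None  # the tool missed the deadline
    payload = None
    for line in reversed(stdout.splitlines()):
        try:
            payload = json.loads(line)
            break
        except json.JSONDecodeError:
            continue
    return proc.returncode, payload
```

Calling `check_sigterm([sys.executable, "-c", CHILD])` should return exit code 143 with the parsed payload. A tool with no handler would instead yield a return code of -15, because Python's subprocess reports death by signal as the negated signal number, and a payload of `None`.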
Agent Workaround
Send SIGTERM and collect any partial JSON emitted during the grace period:
import json
import signal
import subprocess

proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
try:
    # Drain the pipes while waiting; returns early if the tool finishes within budget
    stdout, stderr = proc.communicate(timeout=budget_seconds)
except subprocess.TimeoutExpired:
    # Budget exhausted: cancel gracefully
    proc.send_signal(signal.SIGTERM)
    try:
        # Give the tool up to 5s to flush partial output
        stdout, stderr = proc.communicate(timeout=5)
    except subprocess.TimeoutExpired:
        proc.kill()
        stdout, stderr = proc.communicate()

# Try to parse any partial result flushed before exit
partial = None
for line in reversed(stdout.decode(errors="replace").strip().splitlines()):
    try:
        partial = json.loads(line)
        # Use partial["completed_steps"] and partial["resume_from"] to plan next step
        break
    except json.JSONDecodeError:
        continue
Suppress SIGPIPE errors when piping tool output:
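# Python: start the child with the default SIGPIPE action instead of the agent's inherited SIG_IGN.
# restore_signals=True is already Popen's default on POSIX; it is spelled out here for clarity.
# A tool written in Python ignores SIGPIPE again at startup, so this only helps other tools.
proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, restore_signals=True)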
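The agent can also classify return codes so that a SIGPIPE exit is not retried as a failure. A minimal sketch, assuming POSIX signal numbering; `classify_exit` and its labels are hypothetical.

```python
import signal


def classify_exit(returncode):
    """Map a child's return code to a coarse outcome label."""
    # Python's subprocess reports death by signal as -N; shells report 128 + N.
    if returncode < 0:
        signum = -returncode
    elif returncode > 128:
        signum = returncode - 128
    else:
        return "ok" if returncode == 0 else "error"
    if signum == signal.SIGPIPE:
        return "reader_closed"  # downstream stopped reading; not a real failure
    if signum == signal.SIGTERM:
        return "cancelled"
    if signum == signal.SIGKILL:
        return "killed"
    return "error"
```

Only the `"error"` label would normally justify a retry. `"cancelled"` and `"killed"` call for a state check first, and `"reader_closed"` can usually be ignored.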
Limitation: If the tool installs no SIGTERM handler, it dies instantly with no output. The agent receives exit code 143 (reported as returncode -15 by Python's subprocess) with empty stdout and cannot determine what state was left behind. Assume the operation is in an unknown partial state and verify before retrying.