A Skill is a prompt. Under long context, the model can drift from it, skip it, or misinterpret it.
A Playbook gate is code: the executor calls waitForUserInput() and the LLM is not in the loop. The model cannot skip the gate, hallucinate past it, or decide it knows better.
Skill: a suggestion inside a prompt. Works for personal workflows where soft enforcement is fine.
Playbook: code. Hard enforcement. For teams where a skipped gate could push a broken image to production.
Before a playbook runs a single step, it validates every input parameter it needs. Not at runtime, mid-step — before anything starts. Wrong input never reaches the agent.
Every parameter declares a type. The executor enforces it before the LLM sees anything.
You literally cannot pass production instead of prod. The schema rejects it, not the agent.
allow-destructive defaults to false. Unless you set it explicitly, the playbook aborts the moment it finds a DROP TABLE. The LLM doesn't decide that. The schema does.
You cannot start incident response without knowing what broke. The playbook won't run. No guessing, no hallucinated defaults.
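The pre-execution checks described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual schema format: the Param class, the validate_inputs function, the parameter names environment and incident-summary, and the dev and staging choices are all hypothetical. Only allow-destructive and the prod value come from the text.

```python
from dataclasses import dataclass
from typing import Any

_MISSING = object()  # sentinel: no default declared, so the parameter is required


@dataclass(frozen=True)
class Param:
    kind: str                      # "enum", "bool", or "string"
    choices: tuple[str, ...] = ()  # only consulted when kind == "enum"
    default: Any = _MISSING


# Hypothetical schema; names and choices are assumptions for illustration.
SCHEMA = {
    "environment": Param("enum", choices=("dev", "staging", "prod")),
    "allow-destructive": Param("bool", default=False),
    "incident-summary": Param("string"),
}


def validate_inputs(schema: dict[str, Param], raw: dict[str, Any]) -> dict[str, Any]:
    """Check every input before any step runs; raise ValueError listing all problems."""
    errors: list[str] = []
    resolved: dict[str, Any] = {}

    for name in raw:
        if name not in schema:
            errors.append(f"{name}: unknown parameter")

    for name, param in schema.items():
        if name not in raw:
            if param.default is _MISSING:
                errors.append(f"{name}: required parameter is missing")
            else:
                resolved[name] = param.default
            continue

        value = raw[name]
        if param.kind == "enum" and value not in param.choices:
            errors.append(f"{name}: {value!r} is not one of {list(param.choices)}")
        elif param.kind == "bool" and not isinstance(value, bool):
            errors.append(f"{name}: expected true or false, got {value!r}")
        elif param.kind == "string" and (not isinstance(value, str) or not value.strip()):
            errors.append(f"{name}: expected a non-empty string")
        else:
            resolved[name] = value

    if errors:
        raise ValueError("; ".join(errors))
    return resolved
```

Because the function raises before returning anything, a caller that only starts steps after validate_inputs succeeds never hands a wrong value to the model. Collecting all errors first, rather than stopping at the first one, is a design choice made here for friendlier feedback.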
Each step has an ID, declares what it needs, and declares what it produces. State is saved by step ID — crash at step 4, resume from step 4.
Step name. State is checkpointed here. Crash at step 4, resume from step 4 — not from the beginning.
Dependency graph. The executor topologically sorts steps. apply-staging cannot run until review-schema completes.
The LLM instructions for this step. The model only sees this file when the step runs — nothing from other steps leaks in.
Shell script that validates output after the LLM finishes. Exit code zero means pass, non-zero halts the playbook. The LLM doesn't validate itself.
The LLM result is saved under this key. Later steps reference it with {{state.staging_result}} in their prompts.
Human-in-the-loop checkpoint. One of four levels: None, Confirm, Review, or Approve. Enforced by the executor, not the LLM.
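To make the step fields above concrete, here is a small sketch of an executor that sorts steps by their dependencies, checkpoints after each one, and halts when validation fails. It is an assumption-laden illustration, not the project's implementation: the Step class, its field names, the JSON checkpoint layout, and the run and validate callables (stand-ins for the LLM call and the shell script's exit code) are all hypothetical.

```python
import json
from dataclasses import dataclass, field
from graphlib import TopologicalSorter
from pathlib import Path
from typing import Callable, Optional


@dataclass
class Step:
    id: str
    run: Callable[[dict], str]              # stand-in for the LLM call; receives saved state
    depends_on: list[str] = field(default_factory=list)
    output_key: str = ""                    # where the result is stored in state
    validate: Optional[Callable[[str], int]] = None  # stand-in for a script's exit code


def execution_order(steps: list[Step]) -> list[str]:
    """Return step IDs with every dependency ahead of the steps that need it."""
    known = {s.id for s in steps}
    for s in steps:
        for dep in s.depends_on:
            if dep not in known:
                raise ValueError(f"step {s.id!r} depends on unknown step {dep!r}")
    graph = {s.id: set(s.depends_on) for s in steps}
    return list(TopologicalSorter(graph).static_order())  # raises CycleError on cycles


def run_playbook(steps: list[Step], checkpoint: Path) -> dict:
    """Run steps in dependency order, saving progress to disk after each one."""
    by_id = {s.id: s for s in steps}
    if checkpoint.exists():
        saved = json.loads(checkpoint.read_text())
    else:
        saved = {"done": [], "state": {}}

    for step_id in execution_order(steps):
        if step_id in saved["done"]:
            continue  # completed in an earlier run; resume past it
        step = by_id[step_id]
        result = step.run(saved["state"])
        # Non-zero exit code halts the playbook; the model does not grade itself.
        if step.validate is not None and step.validate(result) != 0:
            raise RuntimeError(f"step {step_id!r} failed validation; playbook halted")
        if step.output_key:
            saved["state"][step.output_key] = result
        saved["done"].append(step_id)
        checkpoint.write_text(json.dumps(saved))
    return saved["state"]
```

If a step raises partway through, the checkpoint file still lists every step that finished, so calling run_playbook again with the same path picks up at the step that failed.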
Every step declares a gate. The executor enforces it. The LLM is not in the loop at that moment — it cannot skip, misinterpret, or hallucinate past it.
None: The step runs, completes, and moves on. Zero interruption. Used for safe reads and analysis steps.
Confirm: The agent pauses and shows you what it's about to do. You type yes or no. Used for reversible but significant operations like staging applies.
Review: The agent produces output, such as a schema diff or a PR description, and presents it. You read it, edit it, then approve or reject. The agent doesn't publish until you sign off.
Approve: Used for production database applies, git tag pushes, and PR creation. The agent shows a full summary of what will happen if you approve, and waits for explicit sign-off.
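The four gate levels can be pictured as a plain function in the executor that runs between steps, with no model call involved. The sketch below is one possible shape under stated assumptions: the Gate enum, GateRejected, enforce_gate, and the prompt wording are all hypothetical, and the injectable ask callable stands in for the executor's waitForUserInput().

```python
from enum import Enum
from typing import Callable


class Gate(Enum):
    NONE = "none"
    CONFIRM = "confirm"
    REVIEW = "review"
    APPROVE = "approve"


class GateRejected(Exception):
    """Raised when the human declines; the executor stops the playbook."""


def enforce_gate(gate: Gate, summary: str, ask: Callable[[str], str] = input) -> str:
    """Block until a human responds. Returns the text that is cleared to proceed."""
    if gate is Gate.NONE:
        return summary  # no interruption

    if gate is Gate.CONFIRM:
        answer = ask(f"{summary}\nProceed? [yes/no] ")
        if answer.strip().lower() != "yes":
            raise GateRejected("confirmation declined")
        return summary

    if gate is Gate.REVIEW:
        edited = ask(f"{summary}\nPress Enter to accept, type a replacement, or type 'reject': ")
        if edited.strip().lower() == "reject":
            raise GateRejected("review rejected")
        return edited if edited.strip() else summary  # human edits win over agent output

    # Gate.APPROVE: anything other than the explicit word is treated as a refusal.
    answer = ask(f"{summary}\nType 'approve' to continue: ")
    if answer.strip().lower() != "approve":
        raise GateRejected("explicit approval not given")
    return summary
```

Since the function only ever reads from ask, the model's output can influence what is shown to the human but not the decision itself.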
For a P0 incident, every gate is enforced and the agent cannot skip any of them. The LLM is instructed not to auto-approve its own actions in a production outage, and the executor enforces that rule regardless of what the model does. This is how you give an AI agent real production access without losing your mind.
Traditional agents load your entire system prompt, all tool definitions, and all history into every single LLM call. Playbooks don't do that.
Each step sees only its own prompt, and earlier results are passed in through {{state.key}} references, not by restuffing the entire context history.
In Skills you have one big markdown file. In Playbooks, an agent is divided into focused sub-files, with each step editable, versionable, and reviewable in isolation.
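As a rough picture of how {{state.key}} references might be resolved, the sketch below substitutes only the keys a step's prompt actually names, so nothing else from earlier steps reaches the model. The render_prompt function and the exact placeholder grammar are assumptions for illustration, not the project's documented behavior.

```python
import re

# Matches {{state.some_key}}, allowing optional spaces inside the braces.
_STATE_REF = re.compile(r"\{\{\s*state\.([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def render_prompt(template: str, state: dict) -> str:
    """Fill in only the state keys the template references; fail on unknown keys."""

    def substitute(match: re.Match) -> str:
        key = match.group(1)
        if key not in state:
            raise KeyError(f"prompt references state.{key}, but no earlier step produced it")
        return str(state[key])

    return _STATE_REF.sub(substitute, template)
```

For example, a prompt file containing "Summarize this result: {{state.staging_result}}" would receive the staging output and nothing else, even if state holds results from a dozen other steps.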
Each step is an isolated file. PRs that change only the production apply logic are reviewable without reading 500 lines of skill.
pr-ready calls commit as a sub-step. release can also call commit. Logic is shared, not copy-pasted.
Version individual steps in git. Diff just the smoke-test script. Composable units scale where monolithic skills can't.
Pre-execution validation. Named steps with dependency graphs. Four levels of human-in-the-loop gates. Selective context loading. Folder-based composition. All in the OpenMonoAgent.ai repo — self-hosted, free forever.