// openmonoagent.ai · playbooks · BETA

Your agent
needs guarantees,
not suggestions.

A Skill is a prompt. The model can drift from it, skip it, or misinterpret it under long context. A Playbook gate is code: the executor calls waitForUserInput() and the LLM is not in the loop. The model cannot skip the gate, hallucinate past it, or decide it knows better.

Skill

A suggestion inside a prompt. Works for personal workflows where soft enforcement is fine.

vs
Playbook Gate

Code. Hard enforcement. For teams where a skipped gate could push a broken image to production.

// 01 · pre-execution

Fail before you
start wrong.

Before a playbook runs a single step, it validates every input parameter it needs. Not at runtime, mid-step — before anything starts. Wrong input never reaches the agent.

db-migrate/PLAYBOOK.yaml
# parameters block: validated before step 1 runs
parameters:
  target:
    type: String
    required: true
    enum: [dev, staging, prod, all]
  allow-destructive:
    type: Boolean
    required: false
    default: false
  migration-path:
    type: String
    required: false
    default: "./migrations"
Typed schemas, not hints

Every parameter declares a type. The executor enforces it before the LLM sees anything.

Enum rejects typos at the gate

You literally cannot pass production instead of prod. The schema rejects it, not the agent.

Safe defaults without negotiation

allow-destructive defaults to false. With that default in place, the playbook aborts the moment it finds a DROP TABLE. The LLM doesn't choose the default. The schema does.

Incident response requires severity + service

You cannot start incident response without knowing what broke. The playbook won't run. No guessing, no hallucinated defaults.
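
A minimal sketch of what pre-execution validation could look like for the db-migrate parameters block, written in Python for illustration. The names validate_params, ParameterError, TYPES, and SCHEMA are hypothetical and are not taken from the OpenMonoAgent.ai codebase; the schema is a hand-copied, in-memory version of the YAML above.

```python
from typing import Any

# Assumed mapping from the schema's type names to Python types.
TYPES = {"String": str, "Boolean": bool}

# Hand-copied from the db-migrate parameters block (illustrative only).
SCHEMA = {
    "target": {"type": "String", "required": True,
               "enum": ["dev", "staging", "prod", "all"]},
    "allow-destructive": {"type": "Boolean", "required": False, "default": False},
    "migration-path": {"type": "String", "required": False,
                       "default": "./migrations"},
}


class ParameterError(ValueError):
    """Raised before any step runs when inputs do not match the schema."""


def validate_params(schema: dict[str, dict[str, Any]],
                    supplied: dict[str, Any]) -> dict[str, Any]:
    """Return the resolved parameters, or raise ParameterError listing every problem."""
    errors: list[str] = []
    resolved: dict[str, Any] = {}

    for name in supplied:
        if name not in schema:
            errors.append(f"unknown parameter: {name}")

    for name, spec in schema.items():
        if name not in supplied:
            if spec.get("required", False):
                errors.append(f"missing required parameter: {name}")
            elif "default" in spec:
                resolved[name] = spec["default"]  # default comes from the schema
            continue
        value = supplied[name]
        if type(value) is not TYPES[spec["type"]]:
            errors.append(f"{name}: expected {spec['type']}, got {type(value).__name__}")
        elif "enum" in spec and value not in spec["enum"]:
            errors.append(f"{name}: {value!r} is not one of {spec['enum']}")
        else:
            resolved[name] = value

    if errors:
        raise ParameterError("; ".join(errors))
    return resolved
```

Under these assumptions, validate_params(SCHEMA, {"target": "prod"}) fills in both defaults, while {"target": "production"} raises ParameterError before any step or model call takes place.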

// 02 · named steps

Every step is a
first-class object.

Each step has an ID, declares what it needs, and declares what it produces. State is saved by step ID — crash at step 4, resume from step 4.

db-migrate/PLAYBOOK.yaml — steps
steps:
  - id: validate
    file: steps/01-validate.md
    gate: None
    script: scripts/validate.sh
    output: validation_report

  - id: dry-run-dev
    requires: [validate]
    file: steps/02-dry-run.md
    gate: None
    output: dry_run_report

  - id: apply-staging
    requires: [review-schema]
    file: steps/04-apply.md
    gate: Confirm
    script: scripts/apply.sh
    output: staging_result

  - id: apply-prod
    requires: [smoke-test]
    file: steps/06-apply-prod.md
    gate: Approve
    script: scripts/apply-prod.sh
    output: prod_result
id

Step name. State is checkpointed here. Crash at step 4, resume from step 4 — not from the beginning.

requires

Dependency graph. The executor topologically sorts steps. apply-staging cannot run until review-schema completes.

file

The LLM instructions for this step. The model only sees this file when the step runs — nothing from other steps leaks in.

script

Shell script that validates output after the LLM finishes. Exit code zero means pass, non-zero halts the playbook. The LLM doesn't validate itself.

output

The LLM result is saved under this key. Later steps reference it in their prompts, for example {{state.staging_result}}.

gate

Human-in-the-loop checkpoint. None · Confirm · Review · Approve. Enforced by the executor, not the LLM.
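
The requires and id fields can be sketched as a dependency-ordered run with checkpointing. This is a rough illustration, not the executor's actual code: execution_order and run are hypothetical names, state is an in-memory dict keyed by step ID rather than a persisted store, and the requires entries for review-schema and smoke-test are assumptions because those steps are not shown in the excerpt.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Any, Callable

# Step ID -> list of step IDs it requires.
REQUIRES = {
    "validate": [],
    "dry-run-dev": ["validate"],
    "review-schema": ["dry-run-dev"],   # assumed, not shown in the excerpt
    "apply-staging": ["review-schema"],
    "smoke-test": ["apply-staging"],    # assumed, not shown in the excerpt
    "apply-prod": ["smoke-test"],
}


def execution_order(requires: dict[str, list[str]]) -> list[str]:
    """Topologically sort steps so every step comes after the steps it requires."""
    return list(TopologicalSorter(requires).static_order())


def run(requires: dict[str, list[str]],
        run_step: Callable[[str, dict[str, Any]], Any],
        state: dict[str, Any]) -> list[str]:
    """Run steps in order, skipping any step already checkpointed in `state`.

    Returns the IDs of the steps executed during this call.
    """
    executed = []
    for step_id in execution_order(requires):
        if step_id in state:
            continue  # finished in an earlier run, so resume past it
        # The checkpoint is written only after the step returns successfully.
        state[step_id] = run_step(step_id, state)
        executed.append(step_id)
    return executed
```

If run_step raises partway through, state still holds every completed step, so calling run again with the same state picks up at the step that failed instead of starting over.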

// 03 · human in the loop

Four levels of
trust enforcement.

Every step declares a gate. The executor enforces it. The LLM is not in the loop at that moment — it cannot skip, misinterpret, or hallucinate past it.

None

Fully automated

The step runs, completes, and moves on. Zero interruption. Used for safe reads and analysis steps.

→ validate.sh passed (0 errors)
✓ Proceeding to next step
Confirm

Checkpoint before action

The agent pauses and shows you what it's about to do. You type yes or no. Used for reversible but significant operations like staging applies.

⚠ About to apply to staging. Proceed? [y/N]
Review

Generate, then hand off

The agent produces output — a schema diff, PR description — and presents it. You read it, edit it, then approve or reject. The agent doesn't publish until you sign off.

→ PR description generated (from actual commits)
◉ Review and approve to continue
Approve

Irreversible — explicit sign-off

For production database applies, git tag pushes, and PR creation. The agent shows a full summary of what will happen if you approve, and waits for explicit sign-off.

→ Will apply 3 migrations to prod (142 rows affected)
⛔ APPROVE required to continue [type APPROVE]
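
The four gate levels can be sketched as a plain function that asks a human and never calls a model. This is an illustrative assumption about the shape of the check, not the project's implementation: Gate and gate_allows are hypothetical names, the ask callable stands in for the executor's waitForUserInput(), and the Review gate is simplified to an approve/reject answer without the editing step.

```python
from enum import Enum
from typing import Callable


class Gate(Enum):
    NONE = "None"
    CONFIRM = "Confirm"
    REVIEW = "Review"
    APPROVE = "Approve"


def gate_allows(gate: Gate, summary: str,
                ask: Callable[[str], str] = input) -> bool:
    """Return True only if the human response satisfies the gate.

    No LLM call happens here, so the model has no opportunity to skip it.
    """
    if gate is Gate.NONE:
        return True
    if gate is Gate.CONFIRM:
        answer = ask(f"{summary} Proceed? [y/N] ")
        return answer.strip().lower() in ("y", "yes")  # empty input means no
    if gate is Gate.REVIEW:
        answer = ask(f"{summary}\nReview and approve to continue [approve/reject] ")
        return answer.strip().lower() == "approve"
    # Gate.APPROVE: require the exact word, in capitals.
    answer = ask(f"{summary}\nAPPROVE required to continue [type APPROVE] ")
    return answer.strip() == "APPROVE"
```

Passing ask as a parameter keeps the sketch testable: gate_allows(Gate.APPROVE, "...", ask=lambda prompt: "approve") returns False, because only the exact string APPROVE passes the strictest gate.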
incident-response/PLAYBOOK.yaml — constraints
constraints:
  inline:
    - "If severity is P0, every gate is mandatory: never auto-proceed regardless of context."

For a P0 incident, every gate is enforced and the agent cannot skip any of them. The constraint tells the LLM not to auto-approve its own actions in a production outage, and the executor enforces the gates regardless of what the model decides. This is how you give an AI agent real production access without losing your mind.

// 04 · context optimization

Less context.
Less hallucination.

Traditional agents load your entire system prompt, all tool definitions, and all history into every single LLM call. Playbooks don't do that.

db-migrate/PLAYBOOK.yaml
context-mode: Selective
max-context-tokens: 6000
steps/01-validate.md
# Step: Validate Migration Files

Parse all pending migration files and flag issues before touching any database.

1. List all migration files in order:
   find {{params.migration-path}} -name "*.sql" | sort

2. If {{params.allow-destructive}} is false and any destructive operation found:
   Abort immediately.
Traditional agent context: 100k+ tokens per call (full history, all tools, all instructions loaded every time)
Playbook step context: 6k tokens per step (only the step instructions and relevant state)
Selective context mode: each step gets only its own instructions. When validate runs, the LLM sees only the validate file. As far as the model is concerned, the production apply instructions don't exist yet.
Hallucination is a concentration problem. Give an LLM a 50-page document and ask about page 3, and it can pull noise from pages 20-40. Narrow scope, accurate output.
State threads forward cleanly. Previous step outputs are injected via {{state.key}}, not by restuffing the entire context history.
Lower cost per call. At 6k versus 100k+ tokens, each call sends at least 16x fewer input tokens, which means dramatically lower API spend for cloud-backed runs.
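
How a step prompt might be assembled under Selective context mode can be sketched as a small placeholder substitution. The function name render_step and the regular expression are assumptions made for illustration; only the {{params.…}} and {{state.…}} placeholder syntax comes from the step files shown above.

```python
import re

# Matches {{params.name}} and {{state.name}} placeholders.
PLACEHOLDER = re.compile(r"\{\{\s*(params|state)\.([A-Za-z0-9_-]+)\s*\}\}")


def render_step(template: str, params: dict, state: dict) -> str:
    """Fill in only the values the step template names explicitly.

    Anything in `params` or `state` that the template does not reference
    never reaches the rendered prompt.
    """
    scopes = {"params": params, "state": state}

    def substitute(match: re.Match) -> str:
        scope, key = match.group(1), match.group(2)
        if key not in scopes[scope]:
            raise KeyError(f"{scope}.{key} is not available to this step")
        return str(scopes[scope][key])

    return PLACEHOLDER.sub(substitute, template)
```

With this approach the prompt grows only with what a step asks for: a later step that references {{state.validation_report}} receives that one value, and every other saved output stays out of its context.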
// 05 · composition

Folder-based, not
monolithic.

A Skill is one big markdown file. A Playbook divides an agent into focused sub-files, with each step editable, versionable, and reviewable in isolation.

~/playbooks/
db-migrate/
├── PLAYBOOK.yaml          # definition, params, manifest
├── steps/
│   ├── 01-validate.md
│   ├── 02-dry-run.md
│   ├── 03-review-schema.md
│   ├── 04-apply.md
│   ├── 05-smoke-test.md
│   ├── 06-apply-prod.md
│   └── 07-verify-counts.md
└── scripts/
    ├── validate.sh
    ├── dry-run.sh
    ├── apply.sh
    ├── smoke-test.sh
    └── apply-prod.sh

pr-ready/
├── PLAYBOOK.yaml
└── steps/

commit/
├── PLAYBOOK.yaml
└── steps/
pr-ready/PLAYBOOK.yaml — composition
- id: stage-remaining
  requires: [lint]
  inline-prompt: >
    Check for uncommitted changes.
    If any, run the commit playbook.
  playbook: commit

depends-on:
  - commit
Edit step 4 without touching step 1

Each step is an isolated file. PRs that change only the production apply logic are reviewable without reading 500 lines of skill.

Reusable sub-playbooks

pr-ready calls commit as a sub-step. release can also call commit. Logic is shared, not copy-pasted.

Software engineering applied to agents

Version individual steps in git. Diff just the smoke-test script. Composable units scale where monolithic skills can't.
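
One way to picture sub-playbook reuse is to resolve each playbook's depends-on list into a load order before anything runs. This is a sketch under stated assumptions: load_order is a hypothetical helper, the DEPENDS_ON table is hand-written from the pr-ready, release, and commit examples in the text, and the real executor may resolve dependencies differently.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Playbook name -> playbooks listed under its depends-on key (illustrative).
DEPENDS_ON = {
    "commit": [],
    "pr-ready": ["commit"],
    "release": ["commit"],
}


def load_order(depends_on: dict[str, list[str]], root: str) -> list[str]:
    """Return the playbooks `root` needs, dependencies first, ending with `root`.

    Raises KeyError for an unknown playbook and graphlib.CycleError if the
    playbooks depend on each other in a loop.
    """
    needed: dict[str, list[str]] = {}
    stack = [root]
    while stack:
        name = stack.pop()
        if name in needed:
            continue
        if name not in depends_on:
            raise KeyError(f"unknown playbook: {name}")
        needed[name] = depends_on[name]
        stack.extend(depends_on[name])
    return list(TopologicalSorter(needed).static_order())
```

Here both pr-ready and release resolve to the same commit playbook, which is the point of composition: one definition, loaded wherever it is needed.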

// open source

Reliable agents start
with hard enforcement.

Pre-execution validation. Named steps with dependency graphs. Four levels of human-in-the-loop gates. Selective context loading. Folder-based composition. All in the OpenMonoAgent.ai repo — self-hosted, free forever.