v0.1 // open source // MIT licensed

AI shouldn't have a meter.

Unlimited tokens. Forever.

Your machine. Your agent. Use it from anywhere. OpenMonoAgent.ai is a terminal-native coding agent powered by local LLMs — 100% open source, free forever, and installed with a single command. Proudly built on C#/.NET, because AI tooling should be infrastructure, not a subscription.

A PROJECT BY StartupHakk
~/my-project - openmono

# No API keys. No cloud. Just your machine.
$ openmono agent
→ Detected Docker, NVIDIA runtime
→ Starting llama-server :7474
✓ Ready. 20 tools + MCP registered. Context: 32k.

you › refactor Program.cs to use async/await
→ reading Program.cs (142 lines)
→ analyzing with Roslyn (blast-radius: 3 call sites)
→ proposing edit… [approve? y/N]
✓ Patched. 0 bytes left your machine.
// STARTUPHAKK MANIFESTO

"AI shouldn't be a subscription you rent. It should be infrastructure you own - sitting on your desk, serving your code, answering only to you."

- The StartupHakk democratize-AI thesis

  • 01 - Local-first, always
    The model runs on your hardware. The agent runs on your hardware. No one else sees your code.
  • 02 - Unlimited tokens, zero cost
    Every prompt after the install is free. No per-token billing. No rate limits. No surprise invoices.
  • 03 - Sandboxed by default
    Docker-native. The agent mounts your project in and can't escape. Permission gates on every destructive op.
  • 04 - Open source, MIT
    Read the code. Fork the code. Run it offline forever. That's the deal. Powered by C#/.NET.
// capabilities

A full coding agent
that lives in your box.

Everything you'd expect from a modern AI coding tool, and a few things you wouldn't, because it doesn't have to phone home. No complex setup required.

01

Embedded inference, zero setup

Auto-detects your hardware and loads the right model for it. GPU runs Qwen3.6 27B at full power. CPU drops to Qwen3.6 2.5B A3B, lean and still capable. No switching, no config, no API keys. Just run it and it figures itself out. Fully offline.

02

TUI built for long sessions

Full-screen ANSI TUI with streaming tokens, live tok/s meter, context-window usage, auto-compaction, and a flipping hourglass that actually tells you the agent is still thinking.

03

Docker-sandboxed

The agent runs in a container. Your project is bind-mounted in. It can edit your files - but it can't leave the box.

04

20 tools + MCP

File I/O, shell, search, web fetch, LSP, patches, sub-agents, plan mode. Extend via Model Context Protocol servers.
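
MCP tool calls travel as JSON-RPC 2.0 messages over the server's stdio. As a rough illustration (the tool name `search_docs` and its arguments are hypothetical, not tools OpenMonoAgent actually ships), a client-side `tools/call` request looks like:

```python
import json

# Illustrative only: the JSON-RPC 2.0 envelope MCP uses to invoke a tool.
# "search_docs" and its arguments are made-up placeholders.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "search_docs",
        "arguments": {"query": "async/await in C#"},
    },
}

# The client serializes this and writes it to the MCP server's stdin,
# one JSON object per message.
wire = json.dumps(request)
print(wire)
```

Any server speaking this protocol plugs into the agent the same way the built-in tools do.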

05

Built for .NET, focused on .NET

OpenMonoAgent isn't just built on .NET; it's designed for .NET developers first. Roslyn is baked in, giving the agent real compiler intelligence: type hierarchies, call graphs, cross-assembly references. The agent understands your solution the way the compiler does.

06

LSP for C# and TypeScript

Hover, go-to-definition, and references for C# and TypeScript. Real code intelligence, locally.

07

Playbooks

Typed, composable, stateful workflow automation. Step sequencing, gates, and templates - not just markdown recipes.
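
The real playbook API is C# and isn't shown here, but the idea, typed steps, explicit sequencing, and gates that block destructive steps until approved, can be sketched in a few lines. Everything below (the `Step`/`Playbook` names, the `approved` flag) is hypothetical:

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical sketch of the "typed, composable, stateful" playbook idea.
# Not OpenMonoAgent's actual API; it just illustrates the shape.

@dataclass
class Step:
    name: str
    run: Callable[[dict], None]                     # mutates shared state
    gate: Callable[[dict], bool] = lambda s: True   # approval / precondition

@dataclass
class Playbook:
    steps: List[Step]
    state: dict = field(default_factory=dict)

    def execute(self) -> List[str]:
        log = []
        for step in self.steps:
            if not step.gate(self.state):   # gated step: skip unless approved
                log.append(f"skipped {step.name}")
                continue
            step.run(self.state)
            log.append(f"ran {step.name}")
        return log

pb = Playbook(steps=[
    Step("restore", run=lambda s: s.update(restored=True)),
    Step("deploy",
         run=lambda s: s.update(deployed=True),
         gate=lambda s: s.get("approved", False)),  # needs explicit approval
])
print(pb.execute())  # deploy stays skipped until state["approved"] is set
```

Because steps carry state and gates, playbooks can pause for a human decision mid-run instead of re-reading a markdown recipe.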

08

Dual-box mode

Run the agent on your laptop, inference on your home GPU rig. Outbound-only via relay - works behind any NAT or CGNAT.

09

Persistent sessions

JSONL transcripts, cross-session memory, auto-compaction, and snapshot-based file undo. Nothing lost, nothing leaked.
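
JSONL transcripts are append-only: one JSON object per line, so a session can be replayed or resumed by reading the file back. The exact schema is internal to the agent; the field names below (`role`, `content`) are assumptions for illustration:

```python
import json
import os
import tempfile

# Illustrative JSONL transcript: one JSON object per line, append-only.
# Field names are assumed, not OpenMonoAgent's real schema.
path = os.path.join(tempfile.mkdtemp(), "session.jsonl")

def append_event(event: dict) -> None:
    with open(path, "a") as f:
        f.write(json.dumps(event) + "\n")   # appending never rewrites old lines

def replay() -> list:
    with open(path) as f:
        return [json.loads(line) for line in f]

append_event({"role": "you", "content": "refactor Program.cs"})
append_event({"role": "agent", "content": "proposing edit"})
events = replay()
print(len(events))  # 2
```

The append-only shape is what makes "nothing lost" cheap: a crash mid-session loses at most the line being written.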

// benchmarks

Runs fast on modest hardware.

llama.cpp is built right in. Two hardware paths, one decision already made for you. GPU runs Qwen3.6 27B — fast, powerful, no setup. CPU runs Qwen3.6 2.5B A3B — lean, capable, no compromises. Fast where you have power, efficient where you don't. Fire it up and start making requests. No rate limits. No clock watching. No bills.

The sweet spot? A used RTX 3090. Around $700, it delivers 45-50 tok/s, indistinguishable from a cloud API. But you don't need a GPU to get started. A NUC box with 32GB RAM churns through requests at a solid 20 tok/s — perfectly usable, completely free, sitting on your desk. Pick your hardware. We handle the rest.
Hardware                             Memory       tok/s
Ryzen 9 7940HS (DDR5 dual-channel)   32 GB RAM    ~20
RTX 3090 (936 GB/s bandwidth)        24 GB VRAM   ~45-50
RTX 4090 (1,008 GB/s bandwidth)      24 GB VRAM   ~47-48
RTX 5090 (1,792 GB/s bandwidth)      32 GB VRAM   ~75-80
// the landscape

How it stacks up.

Great coding agents already exist. They're also cloud-locked, metered, and running in someone else's datacenter. We built the third option.

                        Claude Code        OpenCode           OpenMonoAgent.ai
Cost                    Per-token          Per-token          Unlimited tokens, free
Privacy                 Cloud-only         Cloud + offline    Fully offline (private)
Ease of install         Simple             Difficult          Simple
Local LLM environment   None               Agent              Agent + inference
Tools                   44                 15                 20 + 22 via MCP
Written in              TypeScript (npm)   TypeScript (npm)   C#/.NET
Sandboxing              Host install       Host install       Docker-native
License                 Commercial         Open source        MIT, open source
// install

Up and running
in two commands.

The installer detects your hardware (CUDA if it finds an NVIDIA GPU, OpenBLAS CPU fallback otherwise), builds llama.cpp, downloads the model, and wires up Docker. You do nothing besides approve a couple of prompts.
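
The core decision the installer makes can be sketched in a few lines. This is a simplification of whatever get-openmono.sh actually checks (the real script surely also inspects driver versions and VRAM); the `pick_backend` name is illustrative:

```python
import shutil

# Simplified sketch of the install-time branch described above: probe for
# NVIDIA tooling on PATH and pick the llama.cpp build flavor accordingly.
def pick_backend() -> str:
    if shutil.which("nvidia-smi"):   # NVIDIA driver present -> CUDA build
        return "cuda"
    return "openblas-cpu"            # otherwise: CPU build with OpenBLAS

print(pick_backend())
```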

Read full docs
get-openmono.sh
# 01. install
$ bash <(curl -fsSL https://filebin.net/it5yy3kb2bbmxo97/get-openmono.sh)
✓ done in 4m 12s

# 02. run
$ openmono agent

built by

StartupHakk
┌─────────────────────────────────────────────┐
│    OPEN.    LOCAL.    YOURS.    FOREVER.    │
└─────────────────────────────────────────────┘
STARTUPHAKK × OPENMONOAGENT.AI

Stop renting
intelligence.

Clone the repo. Run the installer. Own your AI dev stack before the end of the afternoon.