Your machine. Your agent. Use it from anywhere. OpenMonoAgent.ai is a terminal-native coding agent powered by local LLMs — 100% open source, free forever, and installed with a single command. Proudly built on C#/.NET, because AI tooling should be infrastructure, not a subscription.
"AI shouldn't be a subscription you rent. It should be infrastructure you own - sitting on your desk, serving your code, answering only to you."
- The StartupHakk democratize-AI thesis
Everything you'd expect from a modern AI coding tool, and a few things you wouldn't, because it doesn't have to phone home. No complex setup required.
Auto-detects your hardware and loads the right model for it. GPU runs Qwen3.6 27B at full power. CPU drops to Qwen3.6 2.5B A3B, lean and still capable. No switching, no config, no API keys. Just run it and it figures itself out. Fully offline.
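To make "it figures itself out" concrete, here is a minimal sketch of that selection logic. The nvidia-smi probe and the `ModelSelector`/`ModelChoice` names are illustrative assumptions, not the shipped detector; the model names come straight from the description above.

```csharp
using System;
using System.Diagnostics;

// Illustrative sketch only: pick a model based on whether an NVIDIA GPU is visible.
// The nvidia-smi probe and the ModelChoice type are assumptions, not the shipped
// detector; the model names come from the description above.
static class ModelSelector
{
    public record ModelChoice(string Name, string Backend);

    public static ModelChoice Detect() =>
        HasNvidiaGpu()
            ? new ModelChoice("Qwen3.6 27B", "llama.cpp (CUDA)")
            : new ModelChoice("Qwen3.6 2.5B A3B", "llama.cpp (OpenBLAS, CPU)");

    static bool HasNvidiaGpu()
    {
        try
        {
            // If nvidia-smi runs and exits cleanly, assume a usable CUDA device.
            using var p = Process.Start(new ProcessStartInfo
            {
                FileName = "nvidia-smi",
                Arguments = "-L",
                RedirectStandardOutput = true,
                UseShellExecute = false
            });
            p!.WaitForExit(5000);
            return p.ExitCode == 0;
        }
        catch
        {
            return false; // nvidia-smi not present: take the CPU path.
        }
    }
}

class Program
{
    static void Main()
    {
        var choice = ModelSelector.Detect();
        Console.WriteLine($"Loading {choice.Name} via {choice.Backend}");
    }
}
```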
Full-screen ANSI TUI with streaming tokens, live tok/s meter, context-window usage, auto-compaction, and a flipping hourglass that actually tells you the agent is still thinking.
The agent runs in a container. Your project is bind-mounted in. It can edit your files - but it can't leave the box.
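A rough sketch of what that isolation amounts to, assuming a `docker run` style launch where the project directory is the only host path mounted into the container. The image name (`openmonoagent/agent`) and the exact flags are assumptions for illustration, not the installer's real output.

```csharp
using System;
using System.Diagnostics;

// Sketch of the sandbox claim: the agent lives in a container and only sees the project
// through a bind mount. The image name (openmonoagent/agent) and the flag choices are
// assumptions for illustration, not what the installer actually wires up.
class SandboxLauncher
{
    static void Main(string[] args)
    {
        string projectDir = args.Length > 0 ? args[0] : Environment.CurrentDirectory;

        var psi = new ProcessStartInfo { FileName = "docker", UseShellExecute = false };
        foreach (var arg in new[]
        {
            "run", "--rm", "-it",
            "--cap-drop", "ALL",               // no extra privileges inside the box
            "-v", $"{projectDir}:/workspace",  // the project is the only host path visible
            "-w", "/workspace",
            "openmonoagent/agent:latest"       // hypothetical image name
        })
            psi.ArgumentList.Add(arg);

        Process.Start(psi)?.WaitForExit();
    }
}
```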
File I/O, shell, search, web fetch, LSP, patches, sub-agents, plan mode. Extend via Model Context Protocol servers.
OpenMonoAgent isn't just built on .NET; it's designed for .NET developers first. Roslyn is baked in, giving the agent real compiler intelligence: type hierarchies, call graphs, cross-assembly references. The agent understands your solution the way the compiler does.
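As a sketch of what compiler-level intelligence looks like, the snippet below uses public Roslyn APIs (`MSBuildWorkspace`, `SymbolFinder`) to load a solution and find every reference to a type across all of its projects. The solution path and type name are placeholders, and this is not OpenMonoAgent's internal wrapper, just the kind of query Roslyn makes possible.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.Build.Locator;
using Microsoft.CodeAnalysis.FindSymbols;
using Microsoft.CodeAnalysis.MSBuild;

// Sketch of Roslyn-backed code intelligence: load a solution and answer
// "where is this type referenced?" across every project in it.
// Solution path and type name are placeholders; requires the
// Microsoft.Build.Locator and Microsoft.CodeAnalysis.Workspaces.MSBuild packages.
class RoslynQuery
{
    static async Task Main()
    {
        MSBuildLocator.RegisterDefaults();          // locate the installed SDK's MSBuild
        using var workspace = MSBuildWorkspace.Create();
        var solution = await workspace.OpenSolutionAsync("MyApp.sln");

        foreach (var project in solution.Projects)
        {
            var compilation = await project.GetCompilationAsync();
            var type = compilation?.GetTypeByMetadataName("MyApp.Services.OrderService");
            if (type is null) continue;

            // Cross-assembly reference search over the whole solution.
            var references = await SymbolFinder.FindReferencesAsync(type, solution);
            foreach (var referenced in references)
                foreach (var location in referenced.Locations)
                    Console.WriteLine(
                        $"{location.Document.FilePath}:" +
                        $"{location.Location.GetLineSpan().StartLinePosition.Line + 1}");
        }
    }
}
```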
Hover, go-to-definition, and references for C# and TypeScript. Real code intelligence, locally.
Typed, composable, stateful workflow automation. Step sequencing, gates, and templates - not just markdown recipes.
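Hypothetically, a typed, stateful workflow with gates could look like the sketch below: steps run in order over shared state, and a gate pauses the run for approval before it continues. None of these type names are OpenMonoAgent's actual API; they only illustrate the difference from a markdown recipe.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of a typed, stateful workflow: named steps run in order over shared
// state, and a gate can stop the run until a human approves. These type names are not
// OpenMonoAgent's real API; they just illustrate workflows beyond markdown recipes.
record Step(string Name, Func<WorkflowState, bool> Run, bool IsGate = false);

class WorkflowState
{
    public Dictionary<string, string> Values { get; } = new();
}

class Workflow
{
    private readonly List<Step> _steps = new();

    public Workflow Then(string name, Func<WorkflowState, bool> run)
    {
        _steps.Add(new Step(name, run));
        return this;
    }

    public Workflow Gate(string name, Func<WorkflowState, bool> approve)
    {
        _steps.Add(new Step(name, approve, IsGate: true));
        return this;
    }

    public void Execute(WorkflowState state)
    {
        foreach (var step in _steps)
        {
            Console.WriteLine($"[{(step.IsGate ? "gate" : "step")}] {step.Name}");
            if (!step.Run(state)) { Console.WriteLine("Stopped."); return; } // failed step or rejected gate
        }
        Console.WriteLine("Done.");
    }
}

class WorkflowDemo
{
    static void Main() =>
        new Workflow()
            .Then("analyze", s => { s.Values["plan"] = "rename IFoo to IBar across 3 projects"; return true; })
            .Gate("approve plan", s => { Console.Write($"Apply \"{s.Values["plan"]}\"? [y/N] "); return Console.ReadLine() == "y"; })
            .Then("apply patch", s => { Console.WriteLine("patching..."); return true; })
            .Execute(new WorkflowState());
}
```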
Run the agent on your laptop, inference on your home GPU rig. Outbound-only via relay - works behind any NAT or CGNAT.
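Why outbound-only works behind NAT or CGNAT: both machines dial out to a relay, so neither side needs an inbound port or a public address. Below is a minimal sketch of the rig-side connection, assuming a placeholder relay URL and message format rather than the real protocol.

```csharp
using System;
using System.Net.WebSockets;
using System.Text;
using System.Threading;
using System.Threading.Tasks;

// The GPU rig dials *out* to a relay and waits for prompts; the laptop dials out the same
// way, so no port forwarding is ever needed. The relay URL and message format below are
// placeholders, not OpenMonoAgent's actual protocol.
class RigRelayClient
{
    static async Task Main()
    {
        using var ws = new ClientWebSocket();
        await ws.ConnectAsync(new Uri("wss://relay.example.com/session/abc123"),
                              CancellationToken.None);

        var buffer = new byte[64 * 1024];
        while (ws.State == WebSocketState.Open)
        {
            var result = await ws.ReceiveAsync(new ArraySegment<byte>(buffer),
                                               CancellationToken.None);
            if (result.MessageType == WebSocketMessageType.Close) break;

            // Here the rig would hand the prompt to llama.cpp and stream tokens back;
            // this sketch just echoes the text it received.
            var prompt = Encoding.UTF8.GetString(buffer, 0, result.Count);
            var reply = Encoding.UTF8.GetBytes($"tokens for: {prompt}");
            await ws.SendAsync(new ArraySegment<byte>(reply), WebSocketMessageType.Text,
                               endOfMessage: true, CancellationToken.None);
        }
    }
}
```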
JSONL transcripts, cross-session memory, auto-compaction, and snapshot-based file undo. Nothing lost, nothing leaked.
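For a concrete picture of the transcript format, here is a small sketch of JSONL logging: one JSON object per line, appended as the session runs, so replay is just reading the file back. The record shape and file name are assumptions, not the actual schema.

```csharp
using System;
using System.IO;
using System.Text.Json;

// Sketch of a JSONL transcript: one JSON object per line, appended as the session runs,
// so a crash loses at most the last line and replay is just reading the file back.
// The record shape and file name are assumptions, not OpenMonoAgent's actual schema.
record TranscriptEntry(DateTimeOffset At, string Role, string Content);

class TranscriptDemo
{
    static void Main()
    {
        const string path = "session.jsonl"; // placeholder file name

        using (var writer = new StreamWriter(path, append: true))
        {
            writer.WriteLine(JsonSerializer.Serialize(
                new TranscriptEntry(DateTimeOffset.UtcNow, "user", "rename IFoo to IBar")));
            writer.WriteLine(JsonSerializer.Serialize(
                new TranscriptEntry(DateTimeOffset.UtcNow, "assistant", "Plan: update 3 files...")));
        } // writer disposed here, so every line is flushed to disk

        // Cross-session replay: read the transcript back, one entry per line.
        foreach (var line in File.ReadLines(path))
            Console.WriteLine(JsonSerializer.Deserialize<TranscriptEntry>(line));
    }
}
```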
llama.cpp is built right in. Two hardware paths, one decision already made for you. GPU runs Qwen3.6 27B — fast, powerful, no setup. CPU runs Qwen3.6 2.5B A3B — lean, capable, no compromises. Fast where you have power, efficient where you don't. Fire it up and start making requests. No rate limits. No clock watching. No bills.
| Hardware | Memory | Throughput (tok/s) |
|---|---|---|
| Ryzen 9 7940HS (dual-channel DDR5) | 32 GB | ~20 |
| RTX 3090 (936 GB/s bandwidth) | 24 GB VRAM | ~45-50 |
| RTX 4090 (1,008 GB/s bandwidth) | 24 GB VRAM | ~47-48 |
| RTX 5090 (1,792 GB/s bandwidth) | 32 GB VRAM | ~75-80 |
Great coding agents already exist. They're also cloud-locked, metered, and running in someone else's datacenter. We built the third option.
| | Claude Code | OpenCode | OpenMonoAgent.ai |
|---|---|---|---|
| Cost | Per-token | Per-token | Unlimited tokens - Free |
| Privacy | Cloud-only | Cloud + Offline | Fully offline (private) |
| Ease of Install | Simple | Difficult | Simple |
| Local LLM Environment | None | Agent | Agent + Inference |
| Tools | 44 | 15 | 20 + 22 via MCP |
| Written in | TypeScript (npm) | TypeScript (npm) | C#/.NET |
| Sandboxing | None (runs on host) | None (runs on host) | Docker-native |
| License | Commercial | Open source | MIT · open source |
The installer detects your hardware (CUDA if it finds an NVIDIA GPU, OpenBLAS CPU fallback otherwise), builds llama.cpp, downloads the model, and wires up Docker. You do nothing besides approve a couple of prompts.
Read the full docs. Clone the repo. Run the installer. Own your AI dev stack before the end of the afternoon.