v0.1 // open source // MIT licensed

AI shouldn't have a meter.

Unlimited tokens. Forever.

Your machine. Your agent. Use it from anywhere. OpenMonoAgent.ai is a terminal-native coding agent powered by local LLMs — 100% open source, free forever, and installed with a single command. Proudly built on C#/.NET, because AI tooling should be infrastructure, not a subscription.

A PROJECT BY StartupHakk
~/my-project - openmono

# No API keys. No cloud. Just your machine.
$ openmono agent
→ Detected Docker, NVIDIA runtime
→ Starting llama-server :7474
✓ Ready. 20 tools + MCP registered. Context: 32k.

you › refactor Program.cs to use async/await
→ reading Program.cs (142 lines)
→ analyzing with Roslyn (blast-radius: 3 call sites)
→ proposing edit… [approve? y/N]
✓ Patched. 0 bytes left your machine.
// STARTUPHAKK MANIFESTO

"AI shouldn't be a subscription you rent. It should be infrastructure you own - sitting on your desk, serving your code, answering only to you."

- The StartupHakk democratize-AI thesis

  • 01 - Local-first, always
    The model runs on your hardware. The agent runs on your hardware. No one else sees your code.
  • 02 - Unlimited tokens, zero cost
    Every prompt after the install is free. No per-token billing. No rate limits. No surprise invoices.
  • 03 - Sandboxed by default
    Docker-native. The agent mounts your project in and can't escape. Permission gates on every destructive op.
  • 04 - Open source, MIT
    Read the code. Fork the code. Run it offline forever. That's the deal. Powered by C#/.NET.
// capabilities

A full coding agent
that lives in your box.

Everything you'd expect from a modern AI coding tool, and a few things you wouldn't, because it doesn't have to phone home. No complex setup required.

01

Embedded inference, zero setup

Auto-detects your hardware and loads the right model for it. GPU runs Qwen3.6 27B at full power. CPU drops to Qwen3.6 2.5B A3B, lean and still capable. No switching, no config, no API keys. Just run it and it figures itself out. Fully offline.

02

TUI built for long sessions

Full-screen ANSI TUI with streaming tokens, live tok/s meter, context-window usage, auto-compaction, and a flipping hourglass that actually tells you the agent is still thinking.

03

Docker-sandboxed

The agent runs in a container. Your project is bind-mounted in. It can edit your files - but it can't leave the box.

04

20 tools + MCP

File I/O, shell, search, web fetch, LSP, patches, sub-agents, plan mode. Extend via Model Context Protocol servers.
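
MCP tool calls travel as JSON-RPC 2.0 messages over the server's stdio. As a rough illustration (the tool name `search_docs` and its arguments are hypothetical, not tools OpenMonoAgent actually ships), a client-side `tools/call` request looks like:

```python
import json

# Illustrative only: the JSON-RPC 2.0 envelope MCP uses to invoke a tool.
# "search_docs" and its arguments are made-up placeholders.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "search_docs",
        "arguments": {"query": "async/await in C#"},
    },
}

# The client serializes this and writes it to the MCP server's stdin,
# one JSON object per message.
wire = json.dumps(request)
print(wire)
```

Any server speaking this protocol plugs into the agent the same way the built-in tools do.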

05

Built for .NET, focused on .NET

OpenMonoAgent isn't just built on .NET; it's designed for .NET developers first. Roslyn is baked in, giving the agent real compiler intelligence: type hierarchies, call graphs, cross-assembly references. The agent understands your solution the way the compiler does.

06

LSP for C# and TypeScript

Hover, go-to-definition, and references for C# and TypeScript. Real code intelligence, locally.

07

Playbooks

Typed, composable, stateful workflow automation. Step sequencing, gates, and templates - not just markdown recipes.
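
The real playbook API is C# and isn't shown here, but the idea, typed steps, explicit sequencing, and gates that block destructive steps until approved, can be sketched in a few lines. Everything below (the `Step`/`Playbook` names, the `approved` flag) is hypothetical:

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical sketch of the "typed, composable, stateful" playbook idea.
# Not OpenMonoAgent's actual API; it just illustrates the shape.

@dataclass
class Step:
    name: str
    run: Callable[[dict], None]                     # mutates shared state
    gate: Callable[[dict], bool] = lambda s: True   # approval / precondition

@dataclass
class Playbook:
    steps: List[Step]
    state: dict = field(default_factory=dict)

    def execute(self) -> List[str]:
        log = []
        for step in self.steps:
            if not step.gate(self.state):   # gated step: skip unless approved
                log.append(f"skipped {step.name}")
                continue
            step.run(self.state)
            log.append(f"ran {step.name}")
        return log

pb = Playbook(steps=[
    Step("restore", run=lambda s: s.update(restored=True)),
    Step("deploy",
         run=lambda s: s.update(deployed=True),
         gate=lambda s: s.get("approved", False)),  # needs explicit approval
])
print(pb.execute())  # deploy stays skipped until state["approved"] is set
```

Because steps carry state and gates, playbooks can pause for a human decision mid-run instead of re-reading a markdown recipe.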

08

Dual-box mode

Run the agent on your laptop, inference on your home GPU rig. Outbound-only via relay - works behind any NAT or CGNAT.

09

Persistent sessions

JSONL transcripts, cross-session memory, auto-compaction, and snapshot-based file undo. Nothing lost, nothing leaked.
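
JSONL transcripts are append-only: one JSON object per line, so a session can be replayed or resumed by reading the file back. The exact schema is internal to the agent; the field names below (`role`, `content`) are assumptions for illustration:

```python
import json
import os
import tempfile

# Illustrative JSONL transcript: one JSON object per line, append-only.
# Field names are assumed, not OpenMonoAgent's real schema.
path = os.path.join(tempfile.mkdtemp(), "session.jsonl")

def append_event(event: dict) -> None:
    with open(path, "a") as f:
        f.write(json.dumps(event) + "\n")   # appending never rewrites old lines

def replay() -> list:
    with open(path) as f:
        return [json.loads(line) for line in f]

append_event({"role": "you", "content": "refactor Program.cs"})
append_event({"role": "agent", "content": "proposing edit"})
events = replay()
print(len(events))  # 2
```

The append-only shape is what makes "nothing lost" cheap: a crash mid-session loses at most the line being written.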

// benchmarks

Runs fast on modest hardware.

llama.cpp is built right in. Two hardware paths, one decision already made for you. GPU runs Qwen3.6 27B — fast, powerful, no setup. CPU runs Qwen3.6 2.5B A3B — lean, capable, no compromises. Fast where you have power, efficient where you don't. Fire it up and start making requests. No rate limits. No clock watching. No bills.

The sweet spot? A used RTX 3090. Around $700, it delivers 45-50 tok/s, indistinguishable from a cloud API. But you don't need a GPU to get started. A NUC box with 32GB RAM churns through requests at a solid 20 tok/s — perfectly usable, completely free, sitting on your desk. Pick your hardware. We handle the rest.
Hardware                             Memory       tok/s
Ryzen 9 7940HS (DDR5 dual-channel)   32 GB RAM    ~20
RTX 3090 (936 GB/s bandwidth)        24 GB VRAM   ~45-50
RTX 4090 (1,008 GB/s bandwidth)      24 GB VRAM   ~47-48
RTX 5090 (1,792 GB/s bandwidth)      32 GB VRAM   ~75-80
// the landscape

How it stacks up.

Great coding agents already exist. They're also cloud-locked, metered, and running in someone else's datacenter. We built the third option.

                        Claude Code        OpenCode           OpenMonoAgent.ai
Cost                    Per-token          Per-token          Unlimited tokens, free
Privacy                 Cloud-only         Cloud + offline    Fully offline (private)
Ease of install         Simple             Difficult          Simple
Local LLM environment   None               Agent              Agent + inference
Tools                   44                 15                 20 + 22 via MCP
Written in              TypeScript (npm)   TypeScript (npm)   C#/.NET
Sandboxing              Host install       Host install       Docker-native
License                 Commercial         Open source        MIT, open source
// install

Up and running
in two commands.

The installer detects your hardware (CUDA if it finds an NVIDIA GPU, OpenBLAS CPU fallback otherwise), builds llama.cpp, downloads the model, and wires up Docker. You do nothing besides approve a couple of prompts.
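
The core decision the installer makes can be sketched in a few lines. This is a simplification of whatever get-openmono.sh actually checks (the real script surely also inspects driver versions and VRAM); the `pick_backend` name is illustrative:

```python
import shutil

# Simplified sketch of the install-time branch described above: probe for
# NVIDIA tooling on PATH and pick the llama.cpp build flavor accordingly.
def pick_backend() -> str:
    if shutil.which("nvidia-smi"):   # NVIDIA driver present -> CUDA build
        return "cuda"
    return "openblas-cpu"            # otherwise: CPU build with OpenBLAS

print(pick_backend())
```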

Read full docs
get-openmono.sh
# 01. install
$ bash <(curl -fsSL https://filebin.net/it5yy3kb2bbmxo97/get-openmono.sh)
✓ done in 4m 12s

# 02. run
$ openmono agent

built by

StartupHakk
┌─────────────────────────────────────────────┐
│    OPEN.    LOCAL.    YOURS.    FOREVER.    │
└─────────────────────────────────────────────┘
STARTUPHAKK × OPENMONOAGENT.AI

Stop renting
intelligence.

Clone the repo. Run the installer. Own your AI dev stack before the end of the afternoon.