A local-first coding agent for laptops that should not be running coding agents.
n0x reads your repo, edits files, runs shell commands, writes tests, explains code,
and can generate commits. The default setup runs a tiny Bonsai GGUF model through
llama-server, so it works on machines where normal coding models would swap
the system to death.
Most coding agents assume one of two things:
- you are fine sending your repo to a hosted model
- your machine has enough RAM for a serious local model
That leaves out a lot of people: students, cheap VPS users, old ThinkPads, offline setups, privacy-sensitive work, and anyone who just does not want another monthly bill.
n0x is the opposite bet. It is small, local, hackable, and honest about the tradeoff. Bonsai is not a frontier model. It will not beat Claude Code on a hard multi-file refactor. But it can run locally, for free, on modest hardware, and it can still do useful work.
npm install -g n0x-cli
Then run it inside a project:
cd ~/my-project
n0x run "add a dark mode toggle"
On first run, n0x creates ~/.n0x/config.toml, helps you choose a model, downloads it, and starts the local backend.
Check your setup any time:
n0x doctor
# Core agent loop
n0x run "fix the login redirect bug"
n0x run "write tests for src/auth.ts" --dry
n0x run "refactor this module" -i
n0x run --model qwen2.5-coder:7b "clean up the API layer"
# Interactive session
n0x chat
# One-shot utilities
n0x explain src/server.ts
n0x fix "TypeError: Cannot read properties of undefined"
n0x commit
# Project context
n0x init
n0x map
n0x symbols
n0x memory
n0x reflections
n0x checkpoint "before refactor"
n0x checkpoints
n0x restore latest
# Models and backends
n0x setup
n0x models
n0x use llama-server
n0x use ollama
n0x use auto
Inside n0x chat, use slash commands:
/help
/status
/model qwen2.5-coder:7b
/memory
/checkpoint before risky edit
/checkpoints
/restore latest
/clear
/exit
n0x is a ReAct-style agent:
read goal
build context
make a plan
call a tool
observe output
repeat
The model gets tools for reading files, editing files, applying patches, searching with ripgrep/glob, running bash, and inspecting the repo. The loop is plain TypeScript, not a hidden service.
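
To make the loop above concrete, here is a minimal sketch of a ReAct-style loop in TypeScript. It is not the code in src/agent/loop.ts: the names `runAgent`, `Model`, `Tool`, and `ModelReply` are hypothetical, and the transcript format is an assumption chosen for illustration. The real loop also plans, compresses context, and checkpoints, none of which is shown here.

```typescript
// Hypothetical types: none of these names come from the n0x source.
type ToolCall = { tool: string; input: string };
type ModelReply = { thought: string; call?: ToolCall; final?: string };
type Tool = (input: string) => Promise<string>;
type Model = (transcript: string[]) => Promise<ModelReply>;

async function runAgent(
  goal: string,
  model: Model,
  tools: Record<string, Tool>,
  maxSteps = 20, // same default as max_steps in the config example
): Promise<string> {
  const transcript: string[] = [`goal: ${goal}`];

  for (let step = 0; step < maxSteps; step++) {
    // Ask the model what to do next, given everything seen so far.
    const reply = await model(transcript);
    transcript.push(`thought: ${reply.thought}`);

    // The model decides when it is done.
    if (reply.final !== undefined) return reply.final;

    if (!reply.call) {
      transcript.push("observation: no tool call and no final answer");
      continue;
    }

    // Call the tool and feed its output (or its error) back as an observation.
    const tool = tools[reply.call.tool];
    const observation = tool
      ? await tool(reply.call.input).catch((e) => `error: ${String(e)}`)
      : `error: unknown tool ${reply.call.tool}`;
    transcript.push(`action: ${reply.call.tool}(${reply.call.input})`);
    transcript.push(`observation: ${observation}`);
  }

  return "stopped: step limit reached";
}
```

The step limit matters more with a small model than with a large one: when the model gets stuck repeating itself, the loop ends instead of burning CPU.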
The default backend is:
n0x -> llama-server -> local GGUF model
You can also point it at Ollama or any OpenAI-compatible endpoint:
n0x -> Ollama
n0x -> OpenAI-compatible local or remote server
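
All three paths can speak the same wire format. As a rough sketch (not the actual src/llm/client.ts), this is what a request to an OpenAI-compatible chat endpoint looks like. `buildChatRequest` and `BackendConfig` are hypothetical names; the field names mirror the keys in ~/.n0x/config.toml, and the `/chat/completions` path and `Bearer` header follow the OpenAI chat completions convention.

```typescript
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

// Hypothetical shape, mirroring keys from ~/.n0x/config.toml.
interface BackendConfig {
  base_url: string; // e.g. "http://localhost:8080/v1"
  default_model: string;
  api_key: string; // "none" in the local examples
}

function buildChatRequest(cfg: BackendConfig, messages: ChatMessage[]) {
  // Strip trailing slashes so base_url works with or without one.
  const url = `${cfg.base_url.replace(/\/+$/, "")}/chat/completions`;
  return {
    url,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${cfg.api_key}`,
      },
      body: JSON.stringify({ model: cfg.default_model, messages, stream: false }),
    },
  };
}
```

On Node 18 or newer, the result can be passed straight to the built-in fetch: `const { url, init } = buildChatRequest(cfg, messages); const res = await fetch(url, init);`.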
n0x defaults to Bonsai through llama-server.
| Model | Approx model size | Use it for |
|---|---|---|
| Ternary Bonsai 1.7B | 370MB | very low RAM, small edits |
| Ternary Bonsai 4B | 1025MB | default for 4GB-ish machines |
| Ternary Bonsai 8B | 1.75GB | better quality if you have room |
If you already use Ollama:
ollama pull qwen2.5-coder:3b
n0x use ollama
n0x run --model qwen2.5-coder:3b "write tests for utils"
For a custom OpenAI-compatible server, edit ~/.n0x/config.toml:
default_model = "your-model"
base_url = "http://localhost:8000/v1"
backend = "openai-compatible"
api_key = "none"
The config lives at:
~/.n0x/config.toml
Typical llama-server config:
default_provider = "local"
default_model = "ternary-bonsai-4b"
base_url = "http://localhost:8080/v1"
backend = "llama-cpp"
api_key = "none"
max_steps = 20
bash_timeout_ms = 120000
llm_timeout_ms = 300000
stream_output = true
sandbox_docker = false
sandbox_image = "node:22-alpine"
model_path = "/home/you/.n0x/models/ternary-bonsai-4b-q2.gguf"
tavily_enabled = false
tavily_search_depth = "basic"
tavily_extract_depth = "basic"
# tavily_api_key = "tvly-..."
Backend rules are simple:
- backend = "llama-cpp" uses your GGUF file and can auto-start llama-server
- backend = "ollama" calls Ollama on port 11434
- backend = "openai-compatible" calls whatever base_url points to
- backend = "auto" probes common local ports and uses what is alive
n0x use ... updates both base_url and backend.
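
The backend rules can be sketched as one small function. This is an illustration, not the logic in src/config.ts: `resolveBackend`, `Probe`, and `AUTO_CANDIDATES` are hypothetical names, and the probe list is an assumption. Ports 8080 and 11434 appear elsewhere in this README, but the real probe order and the exact Ollama URL that n0x uses are not documented here.

```typescript
type Backend = "llama-cpp" | "ollama" | "openai-compatible" | "auto";
type Resolved = { backend: Exclude<Backend, "auto">; baseUrl: string };
// Injected health check, so the function is testable without a network.
type Probe = (baseUrl: string) => Promise<boolean>;

// Assumed candidates for "auto"; the real list may differ.
const AUTO_CANDIDATES: Resolved[] = [
  { backend: "llama-cpp", baseUrl: "http://localhost:8080/v1" },
  { backend: "ollama", baseUrl: "http://localhost:11434/v1" },
];

async function resolveBackend(
  backend: Backend,
  configuredUrl: string,
  isAlive: Probe,
): Promise<Resolved | null> {
  // Explicit backends trust the configured base_url.
  if (backend !== "auto") return { backend, baseUrl: configuredUrl };

  // "auto" takes the first candidate that answers.
  for (const candidate of AUTO_CANDIDATES) {
    if (await isAlive(candidate.baseUrl)) return candidate;
  }
  return null; // nothing is running
}
```

Passing the probe in as a parameter keeps the decision logic pure: a test can fake "only Ollama is up" without starting anything.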
n0x is an agent that can edit files and run commands. Treat it like a junior developer with shell access.
What is built in:
- --dry previews changes without writing files
- -i asks before applying writes, edits, patches, deletes, and renames
- apply/interactive runs create a checkpoint before the agent can edit
- n0x restore latest reverts the workspace to the last checkpoint
- file tools are confined to the current workspace
- symlink traversal is blocked
- risky shell patterns like rm -rf /, fork bombs, and curl | bash are denied
- existing files are backed up under ~/.n0x/backups/ before mutation
- Docker sandboxing is available with sandbox_docker = true
There is no magic trust layer. Review diffs for important code.
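
Two of the guards above, workspace confinement and the shell denylist, are simple enough to sketch. This is a simplified illustration, not the n0x implementation: `isInsideWorkspace`, `isDeniedCommand`, and the three patterns are hypothetical. The path check is purely lexical, so it does not cover symlinks; blocking those would also need the real path resolved on disk (for example with fs.realpath).

```typescript
import * as path from "node:path";

// True if `target` (relative or absolute) resolves to a path under `workspace`.
function isInsideWorkspace(workspace: string, target: string): boolean {
  const root = path.resolve(workspace);
  const resolved = path.resolve(root, target);
  const rel = path.relative(root, resolved);
  if (rel === "") return true; // the workspace root itself
  // Escapes show up as ".." or "../..."; a different drive shows up as absolute.
  const escapes = rel === ".." || rel.startsWith(".." + path.sep);
  return !escapes && !path.isAbsolute(rel);
}

// Illustrative patterns only. A regex denylist is easy to bypass,
// which is why --dry, -i, and Docker sandboxing exist.
const DENIED_PATTERNS: RegExp[] = [
  /\brm\s+-rf\s+\/(\s|$)/, // rm -rf /
  /:\(\)\s*\{\s*:\|:&\s*\};:/, // classic bash fork bomb
  /\bcurl\b[^|]*\|\s*(ba)?sh\b/, // curl ... | bash
];

function isDeniedCommand(command: string): boolean {
  return DENIED_PATTERNS.some((pattern) => pattern.test(command));
}
```

A denylist catches the obvious accidents, not a determined attacker. That is the reason the advice above is to review diffs rather than to trust the filter.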
Good fits:
- small apps
- tests
- one-file refactors
- bug fixes from stack traces
- code explanation
- learning how coding agents work
- offline or private repos
Bad fits:
- big architecture rewrites
- production-critical edits without review
- huge monorepos on tiny context windows
- tasks where you need frontier-model reasoning
If you have a strong hosted agent and do not care about local/offline use, use that. n0x is for the other cases.
Run this first:
n0x doctor
If the model file exists but n0x says it is not configured, check:
backend = "llama-cpp"
model_path = "/absolute/path/to/model.gguf"
base_url = "http://localhost:8080/v1"
If n0x is using Ollama when you wanted llama-server:
n0x use llama-server
n0x doctor
If you want Ollama:
ollama serve
ollama pull qwen2.5-coder:3b
n0x use ollama
n0x run --model qwen2.5-coder:3b "your task"
If llama-server is missing:
which llama-server
n0x setup tries to download a matching llama-server build. If that fails, install llama.cpp so llama-server is on PATH, or point n0x at a manual binary:
export N0X_LLAMA_SERVER=/absolute/path/to/llama-server
Then rerun:
n0x doctor
If the model gets lost in context, use a narrower prompt:
n0x run "only edit src/auth.ts: add validation for empty email"
Or reduce the run:
n0x run --max-steps 8 --dry "inspect the bug and propose a patch"
If an agent run made bad edits:
n0x checkpoints
n0x restore latest
Restore is destructive by design: it returns the workspace to the checkpoint and removes files created after that checkpoint. Use git too.
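
To show what "destructive" means here, this sketch computes what a restore would have to do, given a checkpoint and the current workspace. It is a model of the behavior described above, not the n0x implementation: `planRestore` and `Snapshot` are hypothetical, and real checkpoints are unlikely to be stored as in-memory path-to-content maps.

```typescript
// Hypothetical representation: file path -> file content.
type Snapshot = Record<string, string>;

function hasFile(snapshot: Snapshot, file: string): boolean {
  return Object.prototype.hasOwnProperty.call(snapshot, file);
}

function planRestore(checkpoint: Snapshot, current: Snapshot) {
  // Files created after the checkpoint are removed.
  const remove = Object.keys(current)
    .filter((file) => !hasFile(checkpoint, file))
    .sort();

  // Files that were edited or deleted since the checkpoint are rewritten.
  const write = Object.keys(checkpoint)
    .filter((file) => !hasFile(current, file) || current[file] !== checkpoint[file])
    .sort();

  return { remove, write };
}
```

Anything in the `remove` list is gone after the restore, including files you created by hand after the checkpoint. That is the case where a git commit or stash helps.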
src/
  agent/    loop, planner, memory, reflection
  tools/    read, write, edit, patch, bash, grep, glob
  llm/      OpenAI-compatible client, backend detection, health checks
  context/  repo context, symbols, session, compression
  setup/    model download, llama-server lifecycle, terminal UI
  config/   schema and config parsing
The important files:
- src/agent/loop.ts is the main ReAct loop
- src/tools/ is the tool surface
- src/llm/client.ts is the OpenAI-compatible chat client
- src/config.ts owns config loading and backend selection
- src/setup/manager.ts starts llama-server
git clone https://github.com/ixchio/n0x-cli.git
cd n0x-cli
npm install
npm run dev -- run "read the repo and summarize it" --dry
Checks:
npm run typecheck
npm run lint
npm test
npm run build
One command:
npm run check
No. Claude Code is better for hard agentic coding. n0x is for local, cheap, offline, low-RAM work. Different constraint, different product.
Because memory is the bottleneck on cheap machines. Ternary weights make the model small enough to run where normal local coding models do not fit.
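
A back-of-envelope calculation shows the scale of the saving. A ternary weight has three possible values, so it carries at most log2(3), about 1.58, bits of information, against 16 bits for an FP16 weight. The sketch below uses parameter counts taken from the model names (an assumption) and gives a lower bound only: real GGUF files also hold embeddings, scales, and metadata, and on-disk packing is less dense than the theoretical limit, which would explain why the sizes in the table above are somewhat larger.

```typescript
const TERNARY_BITS = Math.log2(3); // about 1.585 bits per weight
const FP16_BITS = 16;

// Megabytes needed to store `params` weights at a given bit width.
function weightMegabytes(params: number, bitsPerWeight: number): number {
  return (params * bitsPerWeight) / 8 / 1e6;
}

// Nominal parameter counts, read off the model names (assumed, not measured).
const models: Array<[string, number]> = [
  ["Ternary Bonsai 1.7B", 1.7e9],
  ["Ternary Bonsai 4B", 4e9],
  ["Ternary Bonsai 8B", 8e9],
];

for (const [name, params] of models) {
  const ternary = Math.round(weightMegabytes(params, TERNARY_BITS));
  const fp16 = Math.round(weightMegabytes(params, FP16_BITS));
  console.log(`${name}: at least ${ternary} MB ternary vs ${fp16} MB FP16`);
}
```

The ratio is roughly 10x in every row, which is the difference between fitting in 4 GB of RAM and swapping.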
Yes. Use Ollama or point base_url at another OpenAI-compatible server.
ollama pull qwen2.5-coder:7b
n0x use ollama
n0x run --model qwen2.5-coder:7b "refactor the router"
Yes, after the model is downloaded. Web search is off by default.
The model can run locally, and tools are constrained, but this is still an
agent with filesystem and shell access. Use --dry, -i, git, and code review.
MIT