NeuronAI — Controlled autonomous AI for Linux

What NeuronAI can do

A commander, a memory, and a team of troops.

Built end-to-end, with a Sacred Layer of identity and rules sitting above a working runtime.

Commands a team of troops

A CEO-style commander auto-selects and deploys the best-fit specialist “troop” — each its own local model — for the mission. A second, always-warm reasoner gives an independent opinion.

Remembers with a neural graph

A neural memory and knowledge graph — remember, recall, link and explore — plus cross-session memory and decision logs. Context that compounds, entirely on-device.

Learns from every mission

A lessons-and-feedback loop captures what worked and feeds it back; per-troop after-action reviews make the next run better than the last.

Compresses its own models

TurboQuant compression runs capable models inside a laptop’s memory — benchmarked, estimated and validated — so commodity hardware punches above its weight.

Governs itself

Red-team challenges, approval gates, an audit trail, policies and a taxonomy sit under a Sacred Layer of fixed identity and golden rules — autonomy with a leash.

Operates your machine

Git, Postgres, SQLite, the filesystem, the shell, Kubernetes and HTTP — NeuronAI does real dev and ops work, not just conversation. It profiles the hardware and picks the right model for the job.

Retrieval-grounded answers

Retrieval-augmented generation over an offline corpus, with each persona paired to its own scoped knowledge — specialists that actually know their field, anchored to your documents.

Briefs you, tracks the work

A daily brief and an operator cockpit; project goals, progress and next actions are tracked so the system stays accountable to the mission.

The engineering story

Optimising the half of latency everyone misses.

On CPU-only hardware, a “fast” query took ~90 seconds. The obvious suspect is token generation — and it’s the wrong place to look.

Diagnose

Profiling showed the latency was dominated by prompt prefill: the model reading the system prompt and context before writing a single token. On CPU you feel every token of it; generation wasn’t the bottleneck, the prompt was.
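A rough way to see why prefill can dominate on CPU is to treat total latency as prompt-reading time plus answer-writing time. The sketch below is a back-of-envelope model, not NeuronAI's profiler: the function name `estimate_latency` and every throughput and token figure are illustrative assumptions, not measurements from the project.

```python
def estimate_latency(prompt_tokens, output_tokens, prefill_tps, decode_tps):
    """Split a simple latency estimate into prefill and generation seconds.

    prefill_tps and decode_tps are tokens per second for reading the prompt
    and for writing the answer. All inputs here are assumed values.
    """
    prefill_s = prompt_tokens / prefill_tps
    decode_s = output_tokens / decode_tps
    total_s = prefill_s + decode_s
    return {
        "prefill_s": prefill_s,
        "decode_s": decode_s,
        "total_s": total_s,
        "prefill_share": prefill_s / total_s,
    }


# Hypothetical numbers, chosen only to show the shape of the problem.
long_prompt = estimate_latency(prompt_tokens=2000, output_tokens=100,
                               prefill_tps=50, decode_tps=10)
lean_prompt = estimate_latency(prompt_tokens=500, output_tokens=100,
                               prefill_tps=50, decode_tps=10)

print(f"long prompt: {long_prompt['total_s']:.0f}s, "
      f"{long_prompt['prefill_share']:.0%} spent in prefill")
print(f"lean prompt: {lean_prompt['total_s']:.0f}s, "
      f"{lean_prompt['prefill_share']:.0%} spent in prefill")
```

With these assumed figures the answer is the same length in both cases; only the prompt shrinks, and the total time shrinks with it.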

Re-architect

An effort-routing layer keeps the prompt path lean for common queries and only escalates context and model size when the question genuinely warrants it.
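One way such a router could look is sketched below. The source does not describe how NeuronAI decides when to escalate, so everything here is an assumption made for illustration: the `Route` type, the `route_query` function, the tier and model labels, the context budgets, and the keyword and length heuristics are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    tier: str            # "lean" or "escalated" (hypothetical tier names)
    model: str           # placeholder label, not a real NeuronAI model
    context_budget: int  # maximum prompt tokens of context to attach


LEAN = Route(tier="lean", model="small-model", context_budget=512)
ESCALATED = Route(tier="escalated", model="large-model", context_budget=4096)

# Assumed signals that a question needs more effort; purely illustrative.
ESCALATION_HINTS = ("debug", "design", "compare", "explain why", "step by step")


def route_query(query: str, max_lean_words: int = 40) -> Route:
    """Pick the cheapest route unless the query looks like it needs more."""
    text = query.lower()
    looks_complex = any(hint in text for hint in ESCALATION_HINTS)
    is_long = len(text.split()) > max_lean_words
    return ESCALATED if (looks_complex or is_long) else LEAN


print(route_query("what time zone is the server in?").tier)
print(route_query("Debug why the nightly deploy fails").tier)
```

The design point is that the default is the cheap path: a short, plain question gets a small model and a small context budget, and the larger prompt is only paid for when a signal says it is needed.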

Result

Common-query latency dropped roughly four-fold, from ~90s to ~22s, on the same 14 GB CPU laptop. No new hardware; the compute is simply spent where the time actually goes.

One real number, honestly stated. The ~4× win is measured on a 14 GB CPU-only laptop for common queries — not a synthetic benchmark, and not extrapolated to hardware we don’t run. NeuronAI is in active development; the commander, memory and ops layers are working today.