The open-source private-cloud AI coworker.
Same product shape as Claude Cowork, Gemini Enterprise Agent, and GPT‑6 Atlas. None of the data egress. Single‑tenant, deployed end-to-end into the customer's own Northflank project — control plane and H100 GPU under one roof. Every line auditable. Data locality is mechanically verifiable, not marketed.
- GPU per tenant: 1× H100 (80 GB · sm_90 · native FP8)
- License: Apache 2.0 (OSI-approved · patent grant)
- Vendor egress: 0 bytes (provable with tcpdump)
Same product shape. None of the data egress.
Between January and April 2026, every frontier lab converged on the same product: an agentic AI coworker with a task inbox, saved schedules, document memory, and direct access to local files and connected apps. Claude Cowork defined the category. Gemini Enterprise Agent is the identical-shaped response. GPT‑6 + Atlas is the unified version.
The problem
Every one of those products is structurally cloud-hosted and sends your data to the vendor's servers on every request. For firms whose data contractually or legally cannot leave their own infrastructure — legal, healthcare, accounting, finance, government, and everyone adjacent — that category is unreachable.
The answer
FlatClaw is the same product shape, built out of open-source components, running entirely inside infrastructure the operator controls. Apache 2.0. Pulled, audited, deployed. Every line is yours.
A complete coworker stack — not a framework.
Eight pre-integrated components. Pull, deploy, use. Each one is replaceable and auditable on its own.
- Portal: Next.js 16 + React 19 product surface. Chat, agent fleet, approvals, cron, MCP services, workspace files, Memory, Admin (RBAC). Tool use is streamed over SSE.
- OpenClaw Gateway: self-hosted agent loop. Sessions, multi-step planning, sandboxed tool execution, RBAC enforced at every tool call. Owns per-agent memory.
- Inference: patched SGLang + Gemma 4 31B Dense on a single NVIDIA H100 (80 GB, native FP8) on Northflank's managed GPU fleet, served at the model's native 256K context.
- Memory: OpenClaw's built-in per-agent SQLite memory, with keyword (BM25) search over each agent's MEMORY.md and memory/ files, seeded automatically for every agent. The agent maintains it across sessions. Semantic recall via bge-m3 lands in v0.3.
- MCP services: first-party Model Context Protocol servers for Google (Gmail/Calendar/Drive/Docs/Sheets), CalDAV/IMAP, and Jira. Credentials are scoped per user to (tenant, user, service), never tenant-wide.
- Multi-user: multiple users per tenant. Per-user Tool Access (allow/deny over built-in + MCP tools) built on OpenClaw's native tools.deny, plus always-on cross-user isolation.
- Tenancy: each customer gets their own Northflank project. Strict isolation, dedicated H100, no shared state across tenants.
- Inference image: ghcr.io/skytruax/flatclaw-inference:latest, public on GHCR, ~18 GB, no baked weights. Every deployment pulls the same image.
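To make the keyword memory concrete, here is a minimal pure-Python BM25 ranker over an agent's memory notes. It is a sketch of the ranking idea only, not OpenClaw's SQLite-backed implementation; the tokenizer, the k1=1.5 / b=0.75 defaults, and the sample notes are all assumptions for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumed tokenizer: lowercase, keep runs of letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_rank(docs: dict[str, str], query: str, k1: float = 1.5, b: float = 0.75) -> list[tuple[str, float]]:
    """Return (doc_id, score) pairs sorted best-first, dropping docs with no query term."""
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized.values()) / n_docs
    # Document frequency: how many docs contain each term at least once.
    df = Counter(term for toks in tokenized.values() for term in set(toks))

    scores = {}
    for doc_id, toks in tokenized.items():
        tf = Counter(toks)
        score = 0.0
        for term in set(tokenize(query)):
            if term not in tf:
                continue
            # IDF with a +1 inside the log so common terms never score negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            freq = tf[term]
            # k1 saturates repeated terms; b normalises for document length.
            score += idf * freq * (k1 + 1) / (freq + k1 * (1 - b + b * len(toks) / avg_len))
        if score > 0:
            scores[doc_id] = score
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical memory files for one agent.
notes = {
    "MEMORY.md": "Client prefers invoices in EUR. Quarterly review is every March.",
    "memory/2026-04-02.md": "Drafted the quarterly review deck; the quarterly numbers need a second review.",
    "memory/2026-04-03.md": "Booked travel to Berlin for the audit.",
}
print(bm25_rank(notes, "quarterly review")[0][0])  # → memory/2026-04-02.md
```

The daily note wins because it repeats both query terms; BM25's saturation keeps that advantage bounded rather than linear in repetition count.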
Single-tenant. Customer-owned. End-to-end.
Everything — Portal, OpenClaw Gateway, Inference (H100), and the weights-server — lives in one Northflank project. Customer holds the Northflank account directly. Nothing leaves their tenancy.
- Tool access: tools.deny, with always-on cross-user roster isolation (each agent sees only its own servers' tools) plus a per-user Tool Access panel that toggles built-in and MCP tools off. Denied tools are filtered from the roster before the model sees them.
- Memory: MEMORY.md + memory/ files. Semantic recall via bge-m3 (on its own GPU card) and RAGFlow cited-document retrieval land in v0.3.
- Weights: a weights-server pod over a Northflank-managed volume, staged once via Kaggle and never moved at boot.
- Cost: ≈ $2,000 / month per tenant. Flat rate, not per token.
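The two access-control layers compose as filters applied before the roster reaches the model. The sketch below shows that ordering: always-on cross-user isolation first, then the admin's per-user deny list. The `Tool` dataclass, `visible_tools`, the `server_owner` map, and every tool and server name are hypothetical; OpenClaw's actual tools.deny configuration format is not reproduced here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tool:
    name: str
    server: Optional[str] = None  # None = built-in tool; otherwise the MCP server it comes from


def visible_tools(roster: list[Tool], agent_user: str, server_owner: dict[str, str], denied: set[str]) -> list[Tool]:
    """Hypothetical roster filter run before the model sees any tool list."""
    allowed = []
    for tool in roster:
        # Layer 1, always on: MCP tools from servers another user connected are never exposed.
        if tool.server is not None and server_owner.get(tool.server) != agent_user:
            continue
        # Layer 2, per-user Tool Access: the admin's deny list over built-in and MCP tools.
        if tool.name in denied:
            continue
        allowed.append(tool)
    return allowed


roster = [
    Tool("read_file"),
    Tool("exec"),
    Tool("gmail.send", "google-alice"),
    Tool("jira.create", "jira-bob"),
]
owners = {"google-alice": "alice", "jira-bob": "bob"}
names = [t.name for t in visible_tools(roster, "alice", owners, denied={"exec"})]
print(names)  # → ['read_file', 'gmail.send']
```

Filtering the roster, rather than rejecting calls after the fact, means a denied tool never appears in the prompt, so the model cannot plan around it.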
Indicative monthly cost at Northflank's published list pricing, single tenant, production held warm 24/7. The H100 dominates; everything else combined is under $200. The rate is per tenant and tracks that tenant's GPU footprint, not tokens or seats.
Monthly cost breakdown
| Line item | Monthly cost |
|---|---|
| Inference (H100 80GB, held warm) | ~$1,800 |
| Portal: nf-compute-400 | ~$50 |
| OpenClaw Gateway: nf-compute-400 | ~$50 |
| RAGFlow + corpus volume | ~$30 |
| weights-server + 200 GB nvme | ~$30 |
| Egress · TLS · observability | included |
| Total per tenant, all-in | ~$2,000 / mo |
List prices, round numbers. Committed-use or annual deals on Northflank typically reduce the GPU line. No second cloud, no BYOC plumbing — one bill, one vendor relationship.
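The cost table's arithmetic, as a small script you could adapt to your own quote. The line items are the table's round list prices; the 20% committed-use discount in the usage line is a hypothetical figure, not a Northflank rate.

```python
# Indicative monthly line items from the cost table (USD, list price, rounded).
GPU_LINE = "Inference (H100 80GB, held warm)"
LINE_ITEMS = {
    GPU_LINE: 1800,
    "Portal, nf-compute-400": 50,
    "OpenClaw Gateway, nf-compute-400": 50,
    "RAGFlow + corpus volume": 30,
    "weights-server + 200 GB nvme": 30,
    # Egress, TLS and observability are included, so they add nothing here.
}


def monthly_total(items: dict[str, float], gpu_discount: float = 0.0) -> float:
    """Sum the line items, discounting only the GPU line.

    gpu_discount is a hypothetical committed-use discount (0.2 = 20% off).
    """
    return round(sum(items.values()) - items[GPU_LINE] * gpu_discount, 2)


total = monthly_total(LINE_ITEMS)
non_gpu = total - LINE_ITEMS[GPU_LINE]
print(total, non_gpu)                                  # → 1960.0 160.0
print(monthly_total(LINE_ITEMS, gpu_discount=0.2))     # → 1600.0
```

The itemised list prices sum to $1,960, which the table rounds to ~$2,000; the non-GPU lines come to $160, consistent with "under $200".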
How one H100 carries a tenant — and how it scales
The GPU serves peak concurrent active sessions, not the tenant's total user count. Between turns, people skim a result, edit a doc, take a call, and only then ask a follow-up, so at any moment most seats are idle. The H100 is sized to that concurrent peak; the per-tenant rate doesn't move with seat count.
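A back-of-envelope way to turn a user count into concurrent sessions is the interactive response-time law (Little's law applied to a closed population): each user alternates between waiting on a streamed reply and "thinking". The sketch below assumes that model; the 50 users, 20 s stream time, 180 s think time, and a per-card capacity of 10 chats are illustrative inputs, not measurements. It also yields an average, and peaks run above it, so leave headroom.

```python
import math


def expected_concurrency(active_users: int, stream_s: float, think_s: float) -> float:
    """Average number of chats streaming at once.

    Throughput is active_users / (stream_s + think_s) turns per second; by
    Little's law the number in service is that throughput times stream_s.
    """
    return active_users * stream_s / (stream_s + think_s)


def max_active_users(card_capacity: int, stream_s: float, think_s: float) -> int:
    """Largest user count whose *average* concurrency fits card_capacity."""
    return math.floor(card_capacity * (stream_s + think_s) / stream_s)


# Illustrative assumptions only.
print(expected_concurrency(50, stream_s=20, think_s=180))  # → 5.0
print(max_active_users(10, stream_s=20, think_s=180))      # → 100
```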
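One H100 SGLang process running Gemma 4 31B FP8 sustains ~8–12 concurrent streaming chats with first-token latency in the 1–2 s range. Most of that comes from SGLang's RadixAttention prefix cache: successive turns of a conversation share a long prefix (system prompt, tool definitions, earlier turns), and the KV cache for that prefix is reused instead of recomputed.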
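The reuse RadixAttention exploits can be shown with a toy token trie: when a new request arrives, walk the cached sequences to see how many leading tokens are already computed. This is a simplified illustration, not SGLang's data structure: SGLang uses a compressed radix tree that also holds KV tensors and evicts entries, all of which is omitted. The tokens below are made-up words standing in for real token IDs.

```python
class PrefixCache:
    """Toy token trie: how many leading tokens of a request are already cached?"""

    def __init__(self) -> None:
        self.root: dict = {}

    def insert(self, tokens: list[str]) -> None:
        # Record a finished request so later requests can reuse its prefix.
        node = self.root
        for tok in tokens:
            node = node.setdefault(tok, {})

    def cached_prefix_len(self, tokens: list[str]) -> int:
        # Walk the trie as far as the new request matches a cached sequence.
        node, matched = self.root, 0
        for tok in tokens:
            if tok not in node:
                break
            node = node[tok]
            matched += 1
        return matched


system = ["<sys>", "tool", "list"]
turn1 = system + ["<user>", "summarise", "inbox"]
cache = PrefixCache()
cache.insert(turn1 + ["<asst>", "done"])

# The follow-up turn resends the whole conversation; only the new tail needs prefill.
turn2 = turn1 + ["<asst>", "done", "<user>", "draft", "reply"]
print(cache.cached_prefix_len(turn2), len(turn2))  # → 8 11
```

A second session that shares only the system prompt still hits the cache for those leading tokens, which is why a shared system prompt and tool list pay off across users.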
Memory recall, RAG retrieval, file reads, OAuth tool invocations — all gateway- or skill-side. The LLM is invoked for chat turns and tool-call planning. A typical coworker session is a handful of LLM calls, not hundreds.
Gemma 4 31B FP8 (~33 GB) + KV cache + bge-m3 fits in 80 GB with ~25 GB free. The v0.3 cascade lands a co-resident smaller Gemma in that headroom for fast-turn / planning traffic — same hardware, ~2× concurrent capacity.
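Using only the figures in the paragraph above, the 80 GB budget works out as follows (FP8 stores roughly one byte per parameter, which is why a 31B model lands near the quoted ~33 GB):

$$
\begin{aligned}
\text{weights} &\approx 31\times10^{9}\ \text{params}\times 1\ \text{byte/param}\approx 31\ \text{GB}\quad(\text{quoted: }{\sim}33\ \text{GB})\\
\text{KV cache} + \text{bge-m3} &\approx 80 - 33 - 25 = 22\ \text{GB}
\end{aligned}
$$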
When a tenant outgrows one card, the next step is a higher-tier Northflank GPU plan or a multi-GPU node — or a second inference service for triage. Same project, same architecture, same per-tenant model.
Mechanically provable, not marketed.
The privacy story is not a marketing claim. It is a test you can run yourself.
1. Provision a tenant in your own Northflank project.
2. Exercise the v0.1.0-shipped features end-to-end (chat, memory recall/write, scheduled-task fire, GPU cold-boot). As skills land in v0.2, each is added to this test loop.
3. Run tcpdump on the tenant's Northflank project egress for the full session.
4. Confirm zero packets to Anthropic, OpenAI, Google AI, Hugging Face, ElevenLabs, Chroma Cloud, or any third-party inference endpoint. Inference traffic stays inside the project: Portal → Gateway → H100 runs entirely over Northflank's internal network. The only external egress goes to services the user explicitly connected via OAuth.
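Step 4 reduces to comparing observed destinations against a vendor list. The sketch assumes you have already extracted destination hostnames from the capture (for example from DNS queries or TLS SNI; `tcpdump -n` itself prints IP addresses). The vendor host list is illustrative and incomplete, the internal hostname is hypothetical, and this is not the project's release check.

```python
# Illustrative, incomplete list of third-party AI endpoints; extend it for your audit.
VENDOR_HOSTS = {
    "api.anthropic.com",
    "api.openai.com",
    "generativelanguage.googleapis.com",
    "huggingface.co",
    "api.elevenlabs.io",
}


def matches(host: str, domain: str) -> bool:
    # Exact host or any subdomain of it (cdn-lfs.huggingface.co matches huggingface.co).
    return host == domain or host.endswith("." + domain)


def vendor_egress(observed_hosts, vendor_hosts=VENDOR_HOSTS, oauth_hosts=frozenset()):
    """Return observed destinations that hit a vendor host and were not user-connected via OAuth."""
    hits = set()
    for host in observed_hosts:
        host = host.lower().rstrip(".")  # DNS names may carry a trailing dot
        if any(matches(host, d) for d in oauth_hosts):
            continue
        if any(matches(host, d) for d in vendor_hosts):
            hits.add(host)
    return sorted(hits)


observed = ["gmail.googleapis.com", "API.OpenAI.com.", "cdn-lfs.huggingface.co", "inference.internal"]
print(vendor_egress(observed, oauth_hosts={"gmail.googleapis.com"}))
# → ['api.openai.com', 'cdn-lfs.huggingface.co']
```

Vendor entries are specific hosts rather than broad suffixes like googleapis.com, because OAuth-connected Google Workspace APIs legitimately share that suffix with Google's model endpoints.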
This check runs mechanically on every release. It is the promise the project exists to keep.
Best-in-class open-source, end to end.
Every dependency is under an MIT-, Apache- or BSD-compatible license. Nothing here locks you into a vendor.
Shipping in the open.
What's working today vs. what's coming next is honest, enumerated, and verifiable.
v0.2.0 · This release
- Live inference: Gemma 4 31B-IT FP8 on a dedicated H100 at the native 256K context
- MCP service integrations: Google, CalDAV/IMAP, Jira, per-user credentials
- Matured Portal: streamed chat with token + compaction meter, sessions, workspace files, MCP services, Admin
- Per-agent memory: built-in SQLite keyword search, seeded for every agent
- Per-user Tool Access: admin allow/deny over built-in + connected-MCP tools via OpenClaw's native tools.deny, on top of always-on cross-user isolation
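Scoping credentials to (tenant, user, service) means the lookup has no fallback: a user who never connected a service gets nothing, never a colleague's or a tenant-wide token. A minimal sketch under that assumption; `CredKey`, `CredentialStore`, and the tenant and user names are hypothetical, and a real store would also encrypt tokens at rest.

```python
from typing import NamedTuple, Optional


class CredKey(NamedTuple):
    tenant: str
    user: str
    service: str


class CredentialStore:
    """Hypothetical store: the full (tenant, user, service) triple is the only key."""

    def __init__(self) -> None:
        self._tokens: dict[CredKey, str] = {}

    def put(self, tenant: str, user: str, service: str, token: str) -> None:
        self._tokens[CredKey(tenant, user, service)] = token

    def get(self, tenant: str, user: str, service: str) -> Optional[str]:
        # Exact-match lookup only: no wildcard user and no tenant-wide default.
        return self._tokens.get(CredKey(tenant, user, service))


store = CredentialStore()
store.put("acme", "alice", "google", "token-for-alice")
print(store.get("acme", "alice", "google"))  # → token-for-alice
print(store.get("acme", "bob", "google"))    # → None
```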
v0.3 · Next
- One-command tenant provisioning: provision-tenant.sh / destroy-tenant.sh (full Northflank tenant lifecycle)
- RAGFlow: cited document retrieval behind a stable interface
- Semantic memory + embeddings via bge-m3, on its own GPU card
- Scrapling web fetch + a first CRM connector, as MCP services
- Voice: VoxCPM2 open-weight cloning + TTS
- Image: ComfyUI + SDXL
- Cascade routing + TurboQuant turbo4: 1M-token context on a single card
v0.4+ · Future
- WorkOS SSO for enterprise tenants (Okta / Azure AD / Google Workspace)
- Optional shared-GPU multi-tenancy for an entry tier below the dedicated-GPU threshold
- Audio/video transcription ingest in RAGFlow
- A "studio" for users to author their own skills
Pull it. Audit it. Run it.
Apache 2.0 — explicit patent grant. OSI-approved. Bring your own infra.