🌸 21st Century Counterculture

Run AI in a herd,
not a data centre

OpenHydra splits big language models across volunteer laptops — BitTorrent-style. Your machine serves a slice of Qwen 3.5. Nobody needs a credit card. No cloud required.

$ git clone https://github.com/samtroberts/openhydra && cd openhydra && pip install -r requirements.txt
Get started → · ⇩ Desktop app (coming soon)

Needs Python 3.11+ and the Rust toolchain (rustup) to build the P2P wheel.
macOS: also run pip install -r requirements-mlx.txt.
Windows: use WSL2 and follow the Linux path.
Start a node with python3 -m coordinator.node --peer-id my-node --p2p-enabled.
Full install guide →
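Once your node is up, a quick smoke test from Python (the port, model id, and prompt here are assumptions; check the install guide for your node’s actual values):

# Smoke test against a freshly started node.
# localhost:8000 and the model id are hypothetical; match your install.
import requests

resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "qwen-3.5-0.8b",
        "messages": [{"role": "user", "content": "Say hi to the herd."}],
    },
    timeout=60,
)
print(resp.json()["choices"][0]["message"]["content"])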

🍏

Calling all Mac Mini “OpenClaw” buyers

Did you buy a stack of M4 Mac Minis to run local models? Welcome home. OpenHydra’s architecture is explicitly designed to pool Apple Silicon’s Unified Memory across the internet. Leave your Mac running in the background, seed the swarm, and let your hardware earn HYDRA credits while you sleep.


The five-year-old version

OK but what actually is this?

Glad you asked. It’s honestly not that complicated once you stop calling it “AI infrastructure”.

🎒
Big models are heavy

A 70-billion-parameter model weighs ~140 GB in fp16 (two bytes per parameter). That’s not fitting on your laptop. But split across 8 laptops? Now we’re talking.
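The arithmetic, for the sceptical (fp16 stores two bytes per parameter; the 8-way split is just our example):

# Back-of-envelope model weight maths.
params = 70e9
total_gb = params * 2 / 1e9      # fp16: 2 bytes per parameter -> ~140 GB
per_peer_gb = total_gb / 8       # split 8 ways -> ~17.5 GB per laptop
print(total_gb, per_peer_gb)     # 140.0 17.5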

🌊
Swarm it, don’t hoard it

Like BitTorrent, everyone in the herd serves a shard. Your laptop handles one piece of the inference. Together the whole model runs.
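If you squint, that sharding is just a layer split. A toy version in Python (illustrative only; the real coordinator also weighs RAM, latency, and peer churn):

# Toy shard assignment: split n_layers contiguously across peers.
def assign_shards(n_layers: int, peers: list[str]) -> dict[str, range]:
    per, extra = divmod(n_layers, len(peers))
    shards, start = {}, 0
    for i, peer in enumerate(peers):
        size = per + (1 if i < extra else 0)
        shards[peer] = range(start, start + size)   # this peer's layer slice
        start += size
    return shards

print(assign_shards(80, ["alice", "bob", "carol", "dee"]))
# {'alice': range(0, 20), 'bob': range(20, 40), 'carol': range(40, 60), 'dee': range(60, 80)}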

🪙
And you earn credits

Your node earns HYDRA credits for every request it serves. They’re in-network barter credits — not crypto, not fiat — redeemable for inference on the swarm. A mystery-shopper bot checks quality. Good llamas get priority routing. Cheaters get slashed.
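The incentive loop as a toy, with invented constants (the real economics live in the coordinator):

# Toy incentive loop: reward served requests, slash failed audits.
# REWARD and PENALTY are invented for illustration.
REWARD = 1
PENALTY = 50
ledger: dict[str, int] = {}

def settle(peer: str, passed_mystery_shopper: bool) -> None:
    delta = REWARD if passed_mystery_shopper else -PENALTY
    ledger[peer] = ledger.get(peer, 0) + delta

settle("good-llama", True)     # +1 HYDRA credit
settle("cheater", False)       # slashed
print(ledger)                  # {'good-llama': 1, 'cheater': -50}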

Small models like Qwen 3.5 0.8B run on a single laptop. Bigger ones like Qwen 3 72B need 8 peers. The default install gets you going with Qwen 3.5 0.8B immediately — no beefy GPU required.


How it works

Three steps. Zero configuration.

We eliminated every barrier between you and the swarm. No terminal. No VRAM calculations. No model selection anxiety. You just open the app.

01
Open the app — you’re already in

Double-click. That’s the hard part. The app auto-detects your hardware and immediately joins the base swarm running Qwen 3.5 0.8B — just 2 GB of RAM. Green light. You’re contributing to the global network. You didn’t need to know what a “parameter” is.

02
Get auto-promoted

Got a beefy Mac or GPU? The app notices. It gently nudges you: “Your hardware can power the Frontier Swarm (27B). Upgrade your contribution and earn 4× rewards.” One click. The network just got smarter because you showed up. (Curious what the check looks like? There’s a sketch right after step 03.)

03
Earn while you idle

Your laptop serves AI tokens in the background. You earn HYDRA credits for every request. The swarm handles routing, verification, and privacy. You just leave the app open. Nobody needed a data centre.
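The step-02 tier check, sketched (thresholds taken from the default catalog below; the real app’s detection is richer, and the fallback mode is our invention):

# Toy version of the auto-promotion check. 2 GB for Qwen 3.5 0.8B and
# 16 GB/peer for Frontier come from the default catalog; the real app
# also looks at GPU type, thermals, and battery. This is just the shape.
def pick_swarm(free_memory_gb: float) -> str:
    if free_memory_gb >= 16:
        return "Frontier Swarm (Qwen 3.5 27B shard, 4x rewards)"
    if free_memory_gb >= 2:
        return "Base swarm (Qwen 3.5 0.8B)"
    return "relay only"   # hypothetical fallback, not a confirmed mode

print(pick_swarm(24.0))   # a beefy Mac gets the Frontier nudge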


What you get

Features, listed professionally

We also have a proper features page in the docs, but here’s the version where we’re allowed to be slightly smug.

Drop-in OpenAI & Ollama API

Change one URL. Your existing code works. /v1/chat/completions with SSE streaming, plus Ollama-compatible /api/chat for Open WebUI and Continue.dev.
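Concretely, with the official openai Python client (the localhost port and model id are assumptions; point base_url at your own node):

# Existing OpenAI code, repointed at a local OpenHydra node.
# base_url, port, and model id are assumptions; match your node.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

stream = client.chat.completions.create(
    model="qwen-3.5-0.8b",
    messages=[{"role": "user", "content": "Explain swarms in one line."}],
    stream=True,   # SSE streaming, as advertised
)
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="")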

🧠
KV cache compaction

4-phase Attention Matching keeps long conversations alive without nuking your VRAM. Based on arXiv:2602.16284 — we read the papers so you don’t have to.
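For flavour, here’s the general shape of attention-based KV eviction. This is the technique family, emphatically not the paper’s 4-phase Attention Matching:

# Simplified KV-cache eviction by cumulative attention mass.
# A generic illustration, NOT the algorithm OpenHydra actually ships.
import numpy as np

def compact_kv(keys, values, attn_scores, keep: int):
    """Keep the `keep` cache entries with the highest summed attention."""
    importance = attn_scores.sum(axis=0)     # total attention per cached token
    top = np.argsort(importance)[-keep:]     # indices of the heavy hitters
    top.sort()                               # preserve positional order
    return keys[top], values[top]

keys = np.random.randn(1024, 64)
values = np.random.randn(1024, 64)
attn = np.random.rand(32, 1024)              # recent queries x cached tokens
k, v = compact_kv(keys, values, attn, keep=256)   # 4x smaller cache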

🔗
Dual-stack DHT routing

HTTP DHT + Hivemind Kademlia across three continents. Auto-join on startup. If one bootstrap goes down, the llamas find another way. No single point of failure.
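The “find another way” part is mundane and robust: walk a bootstrap list until one answers. A sketch with hypothetical hostnames:

# Toy bootstrap fallback. Hostnames are placeholders, not real
# OpenHydra infrastructure; the health endpoint is assumed too.
import requests

BOOTSTRAPS = [
    "https://boot-eu.example.org",
    "https://boot-us.example.org",
    "https://boot-ap.example.org",
]

def find_bootstrap() -> str:
    for url in BOOTSTRAPS:
        try:
            if requests.get(f"{url}/health", timeout=3).ok:
                return url
        except requests.RequestException:
            continue   # this head is down; the hydra has others
    raise RuntimeError("no bootstrap reachable; check your network")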

🖥
Desktop app + Local Mode

Tauri v2 app for macOS and Windows. Runs offline at 75 tok/s in Local Mode (your own private LLM), or flip the toggle to join the global swarm. One-click switch, zero restart.

🛡
Onion routing & encryption

Ed25519 identity, X25519 ECDH + AES-256-GCM per hop, concentric onion routing, and differential privacy noise. No peer sees your full query. Overhead: 0.02%.
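One hop of that, sketched with Python’s cryptography library (the info label and message framing are illustrative; the node defines the real wire format):

# One onion hop: X25519 ECDH -> HKDF -> AES-256-GCM, per the card above.
# Key handling and labels here are illustrative only.
import os
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric.x25519 import X25519PrivateKey
from cryptography.hazmat.primitives.ciphers.aead import AESGCM
from cryptography.hazmat.primitives.kdf.hkdf import HKDF

my_key = X25519PrivateKey.generate()
hop_key = X25519PrivateKey.generate()            # stand-in for the relay's key

shared = my_key.exchange(hop_key.public_key())   # ECDH shared secret
aes_key = HKDF(algorithm=hashes.SHA256(), length=32,
               salt=None, info=b"openhydra-hop").derive(shared)

nonce = os.urandom(12)
ciphertext = AESGCM(aes_key).encrypt(nonce, b"inner onion layer", None)
# Wrap once per hop, innermost layer first, so no relay ever holds the
# plaintext query and the final destination at the same time.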

🌎
Python & TypeScript SDKs · Coming soon

Zero-dependency Python client. Browser-native TypeScript SDK. The internal SDK scaffolding exists — public release and docs are coming in v1.1.


Standing on the shoulders of giants

We didn’t invent this. We just added llamas.

OpenHydra builds directly on two brilliant ideas. We want to be upfront about our inspirations, because intellectual honesty is cool (and mandatory if you don’t want to get ratio’d on Hacker News).

Academic inspiration
🌸 Petals

“Run large language models at home, BitTorrent‑style.” Petals proved that volunteer compute can serve real LLM inference across the internet. We took that idea and bolted on a token economy, a desktop app, and a very strong llama motif.

petals.dev →
Protocol inspiration
🌊 BitTorrent

Since 2001, BitTorrent has proved you can distribute enormous files to billions of people without a central server. If it works for a band’s entire discography, it can work for Qwen 3.5 tokens. Same energy.

bittorrent.com →

🦙 Fun llama fact #1: Real llamas are pack animals because they share the load across the herd. The weakest llama doesn’t carry the whole tent. This is also the core architectural principle of OpenHydra.

🦙 Fun llama fact #2: A group of llamas is called a herd. OpenHydra’s network of peers is also called a herd. We are very consistent in our metaphors and proud of this.

🦙 Fun llama fact #3: The Hydra in Greek mythology had multiple heads — cut one off and two grow back. Our bootstrap nodes work the same way. (Please don’t cut our bootstrap nodes.)

🦙 Fun llama fact #4: Llamas can spit up to 10 feet when stressed. Our nodes politely return HTTP 503 instead. Both are valid responses to being overwhelmed.


What runs on it

It’s Qwen all the way down (mostly)

The default is Qwen 3.5 0.8B — tiny enough for any laptop. Larger models shard automatically across multiple peers. NF4 4-bit quantisation cuts VRAM by ~4× versus fp16. Add any HuggingFace model by editing models.catalog.json.

Default · 1 peer · 2 GB
Qwen 3.5 0.8B
Runs on a potato. The default.
Compact · 1 peer · 5 GB
Qwen 3.5 2B
Strong multilingual. Single peer.
Mid-range · 1 peer · 9 GB
Qwen 3.5 4B
Reasoning on a single peer.
Advanced · 2 peers · 18 GB
Qwen 3.5 9B
High-quality reasoning. int8 quantised.
Frontier · 4 peers · 16 GB/peer
Qwen 3.5 27B
int4 quantised. Bring your friends.

5 models in the default catalog. Add any HuggingFace model to models.catalog.json. If the requested model lacks peers, the coordinator gracefully degrades to the nearest available smaller model.
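Adding a model is a JSON edit. A hypothetical sketch of what that might look like (field names are guesses at the schema; mirror an existing entry from your install):

# Append a hypothetical entry to models.catalog.json.
# Field names are invented for illustration, and we assume the
# catalog's top level is a JSON array; copy a real entry as template.
import json

entry = {
    "id": "my-org/my-model-7b",      # any HuggingFace repo id
    "display_name": "My Model 7B",
    "min_peers": 2,
    "quantisation": "nf4",
    "ram_gb_per_peer": 8,
}

with open("models.catalog.json") as f:
    catalog = json.load(f)
catalog.append(entry)
with open("models.catalog.json", "w") as f:
    json.dump(catalog, f, indent=2)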


Measured, not promised

Benchmarks from real hardware

Push mode + KV-aware caching on real peers in the wild. Cross-ISP means a Mac on home broadband talking to a GPU on AWS through libp2p Circuit Relay — no VPN, no port forwarding, no SSH tunnel.

LAN · 2 × MacBook Air M1
6.9 TPS
Qwen 3.5 2B sharded across two M1 8GB Macs (MLX 8-bit), peer‑to‑peer push mode on local network.
VPC · 2 × NVIDIA T4
9.8 TPS
Qwen 3.5 2B sharded across two T4 GPUs (PyTorch CUDA), auto‑discovered via Kademlia DHT.
VPC · 2 × NVIDIA T4 · 9B model
7.3 TPS
Qwen 3.5 9B sharded across two T4 GPUs. Bigger model, same pipeline, barely breaks a sweat.
🌎 Cross‑ISP · Mac MLX ↔ T4 CUDA
1.09 TPS
Mac behind home NAT + T4 on AWS, sharded via Circuit Relay v2 + fire‑and‑forget push ring. 128‑token run from a clean GitHub install. No tunnel. Just llamas on different continents.

All numbers measured end‑to‑end with real tokens, real prompts, fresh install from GitHub. Full methodology →

Your laptop is a supercomputer waiting to happen.

It’s sitting there doing nothing useful right now. One click and it joins a global swarm of volunteers running frontier AI models together. No cloud. No credit card. Just llamas.