🌸 21st Century Counterculture
OpenHydra splits big language models across volunteer laptops — BitTorrent-style. Your machine serves a slice of Qwen 3.5. Nobody needs a credit card. No cloud required.
Needs Python 3.11+ and the Rust toolchain (rustup) to build the P2P wheel.
macOS: also pip install -r requirements-mlx.txt.
Windows: use WSL2 and follow the Linux path.
Run with python3 -m coordinator.node --peer-id my-node --p2p-enabled.
Full install guide →
Did you buy a stack of M4 Mac Minis to run local models? Welcome home. OpenHydra’s architecture is explicitly designed to pool Apple Silicon’s Unified Memory across the internet. Leave your Mac running in the background, seed the swarm, and let your hardware earn HYDRA credits while you sleep.
The five-year-old version
Glad you asked. It’s honestly not that complicated once you stop calling it “AI infrastructure”.
A 70-billion parameter model weighs ~140 GB. That’s not fitting on your laptop. But split across 8 laptops? Now we’re talking.
Like BitTorrent, everyone in the herd serves a shard. Your laptop handles one piece of the inference. Together the whole model runs.
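The back-of-the-envelope math, as a quick Python sketch (fp16 byte width and an even 8-way split are illustrative assumptions, not OpenHydra's exact scheduler logic):

```python
def shard_size_gb(params_billion: float, peers: int = 1, bytes_per_param: int = 2) -> float:
    """Approximate per-peer memory for an evenly sharded model.

    bytes_per_param=2 assumes fp16 weights: 1e9 params * 2 bytes = 2 GB per billion.
    """
    return params_billion * bytes_per_param / peers

print(shard_size_gb(70))           # 140.0 GB: not fitting on your laptop
print(shard_size_gb(70, peers=8))  # 17.5 GB per peer: now we're talking
```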
Your node earns barter credits and HYDRA tokens for every request it serves. These are in-network credits — not crypto, not fiat — redeemable for inference on the swarm. A mystery-shopper bot checks quality. Good llamas get priority routing. Cheaters get slashed.
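A minimal sketch of how a credit ledger like that could be kept. Everything here is an assumption for illustration: the reward rate, the slash penalty, and the `Node`/`settle` names are invented, not OpenHydra's real bookkeeping.

```python
from dataclasses import dataclass

@dataclass
class Node:
    credits: float = 0.0     # in-network HYDRA / barter credits, not crypto
    reputation: float = 1.0  # drives priority routing

def settle(node: Node, served_ok: bool, reward: float = 1.0, slash: float = 5.0) -> None:
    """Update one node after the mystery-shopper bot grades a request (hypothetical values)."""
    if served_ok:
        node.credits += reward * node.reputation            # good llamas earn more
        node.reputation = min(2.0, node.reputation + 0.01)  # and climb the routing queue
    else:
        node.credits = max(0.0, node.credits - slash)       # cheaters get slashed
        node.reputation = max(0.1, node.reputation - 0.25)
```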
Small models like Qwen 3.5 0.8B run on a single laptop. Bigger ones like Qwen 3 72B need 8 peers. The default install gets you going with Qwen 3.5 immediately — no beefy GPU required.
How it works
We eliminated every barrier between you and the swarm. No terminal. No VRAM calculations. No model selection anxiety. You just open the app.
Double-click. That’s the hard part. The app auto-detects your hardware and immediately joins the base swarm running Qwen 3.5 0.8B — just 2 GB of RAM. Green light. You’re contributing to the global network. You didn’t need to know what a “parameter” is.
Got a beefy Mac or GPU? The app notices. It gently nudges you: “Your hardware can power the Frontier Swarm (27B). Upgrade your contribution and earn 4× rewards.” One click. The network just got smarter because you showed up.
Your laptop serves AI tokens in the background. You earn HYDRA credits for every request. The swarm handles routing, verification, and privacy. You just leave the app open. Nobody needed a data centre.
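The "app notices your hardware" step boils down to a routing decision like the sketch below. The thresholds, tier names, and function name are all guesses for illustration; the real app inspects your machine in more detail.

```python
def pick_swarm(ram_gb: float, vram_gb: float = 0.0) -> str:
    """Route a machine to the biggest swarm it can plausibly serve (hypothetical cutoffs)."""
    if vram_gb >= 24 or ram_gb >= 48:
        return "frontier-27b"   # the one-click, 4x-rewards upgrade
    return "qwen-3.5-0.8b"      # base swarm, ~2 GB of RAM

print(pick_swarm(ram_gb=16))             # qwen-3.5-0.8b
print(pick_swarm(ram_gb=8, vram_gb=24))  # frontier-27b
```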
What you get
We also have a proper features page in the docs, but here’s the version where we’re allowed to be slightly smug.
Change one URL. Your existing code works. /v1/chat/completions with SSE streaming, plus Ollama-compatible /api/chat for Open WebUI and Continue.dev.
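"Change one URL" looks roughly like this in practice. The payload shape is the standard OpenAI chat format; the localhost URL and port are assumptions about your local node, so check your own config.

```python
import json

SWARM_URL = "http://localhost:8000/v1/chat/completions"  # port is an assumption

def chat_request(prompt: str, model: str = "qwen-3.5-0.8b", stream: bool = True) -> dict:
    """Build a standard OpenAI-style chat payload; only the base URL changes."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,  # served back as SSE by the swarm endpoint
    }

body = json.dumps(chat_request("Why do llamas hum?"))
# POST `body` to SWARM_URL with your usual HTTP client; existing OpenAI-client
# code works by pointing its base_url at the swarm instead of the cloud.
```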
4-phase Attention Matching keeps long conversations alive without nuking your VRAM. Based on arXiv:2602.16284 — we read the papers so you don’t have to.
HTTP DHT + Hivemind Kademlia across three continents. Auto-join on startup. If one bootstrap goes down, the llamas find another way. No single point of failure.
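The "llamas find another way" behaviour is, at its core, a failover loop over the bootstrap list. A minimal sketch, where the hostnames and `join_swarm` name are placeholders (the real bootstrap list ships with the app):

```python
import random

# Placeholder hostnames; the real bootstrap list ships with the app.
BOOTSTRAPS = ["boot-eu.example.org", "boot-us.example.org", "boot-ap.example.org"]

def join_swarm(bootstraps, try_connect):
    """Try bootstraps in random order until one answers: no single point of failure."""
    for host in random.sample(bootstraps, k=len(bootstraps)):
        if try_connect(host):
            return host
    raise ConnectionError("no bootstrap reachable; retry with backoff")
```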
Tauri v2 app for macOS and Windows. Runs offline at 75 tok/s in Local Mode (your own private LLM), or flip the toggle to join the global swarm. One-click switch, zero restart.
Ed25519 identity, X25519 ECDH + AES-256-GCM per hop, concentric onion routing, and differential privacy noise. No peer sees your full query. Overhead: 0.02%.
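The onion-routing idea is easiest to see with a toy: wrap the query in one encryption layer per hop, and let each relay peel exactly one. To stay dependency-free, this sketch substitutes a hash-derived XOR keystream for the real X25519 + AES-256-GCM; it shows the layering only and is not secure cryptography.

```python
import hashlib

def _keystream(key: bytes, n: int) -> bytes:
    """Deterministic toy keystream (stand-in for AES-256-GCM; NOT secure)."""
    out, ctr = b"", 0
    while len(out) < n:
        out += hashlib.sha256(key + ctr.to_bytes(8, "big")).digest()
        ctr += 1
    return out[:n]

def _xor(data: bytes, key: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(data, _keystream(key, len(data))))

def onion_wrap(query: bytes, hop_keys: list[bytes]) -> bytes:
    """Layer one cipher per hop; the exit hop's layer goes on first (innermost)."""
    blob = query
    for key in reversed(hop_keys):
        blob = _xor(blob, key)
    return blob

def peel(blob: bytes, key: bytes) -> bytes:
    return _xor(blob, key)  # a relay removes only its own layer

keys = [b"hop-1", b"hop-2", b"hop-3"]
blob = onion_wrap(b"my private prompt", keys)
for k in keys:
    assert blob != b"my private prompt"  # no intermediate hop sees the query
    blob = peel(blob, k)
assert blob == b"my private prompt"      # only the exit recovers plaintext
```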
Zero-dependency Python client. Browser-native TypeScript SDK. The internal SDK scaffolding exists — public release and docs are coming in v1.1.
Standing on the shoulders of giants
OpenHydra builds directly on two brilliant ideas. We want to be upfront about our inspirations, because intellectual honesty is cool (and mandatory if you don’t want to get ratio’d on HackerNews).
“Run large language models at home, BitTorrent‑style.” Petals proved that volunteer compute can serve real LLM inference across the internet. We took that idea and bolted on a token economy, a desktop app, and a very strong llama motif.
petals.dev →

Since 2001, BitTorrent has proved you can distribute enormous files to billions of people without a central server. If it works for a band’s entire discography, it can work for Qwen 3.5 tokens. Same energy.

bittorrent.com →

🦙 Fun llama fact #1: Real llamas are pack animals because they share the load across the herd. The weakest llama doesn’t carry the whole tent. This is also the core architectural principle of OpenHydra.
🦙 Fun llama fact #2: A group of llamas is called a herd. OpenHydra’s network of peers is also called a herd. We are very consistent in our metaphors and proud of this.
🦙 Fun llama fact #3: The Hydra in Greek mythology had multiple heads — cut one off and two grow back. Our bootstrap nodes work the same way. (Please don’t cut our bootstrap nodes.)
🦙 Fun llama fact #4: Llamas can spit up to 10 feet when stressed. Our nodes politely return HTTP 503 instead. Both are valid responses to being overwhelmed.
What runs on it
The default is Qwen 3.5 0.8B — tiny enough for any laptop. Larger models shard automatically across multiple peers. NF4 quantisation cuts VRAM by 4x. Add any HuggingFace model by editing models.catalog.json.
5 models in the default catalog. Add any HuggingFace model to models.catalog.json.
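A hypothetical models.catalog.json entry, to show the shape of an addition. The field names and the repo path are guesses based on the behaviour described above; check the shipped catalog for the real schema.

```json
{
  "id": "qwen-3.5-0.8b",
  "repo": "Qwen/Qwen3.5-0.8B",
  "min_peers": 1,
  "quantization": "nf4",
  "ram_gb": 2
}
```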
If the requested model lacks peers, the coordinator gracefully degrades to the nearest available smaller model.
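That degradation rule can be sketched in a few lines. The catalog contents, sizes, and function names here are illustrative assumptions, not the coordinator's actual code:

```python
# Hypothetical catalog: model name -> size in billions of parameters.
CATALOG = {"qwen-3-72b": 72.0, "frontier-27b": 27.0, "qwen-3.5-0.8b": 0.8}

def pick_model(requested: str, peers_for) -> str:
    """Serve the requested model, else the largest smaller model that has peers."""
    limit = CATALOG[requested]
    live = [m for m, size in CATALOG.items() if size <= limit and peers_for(m) > 0]
    if not live:
        raise LookupError("no suitable model has peers right now")
    return max(live, key=CATALOG.get)

# e.g. the 72B swarm has no peers, so a 27B swarm answers instead:
print(pick_model("qwen-3-72b", peers_for=lambda m: 0 if m == "qwen-3-72b" else 8))  # frontier-27b
```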
Measured, not promised
We benchmark push mode + KV-aware caching on real peers in the wild. Cross-ISP means a Mac on home broadband talking to a GPU on AWS through libp2p Circuit Relay — no VPN, no port forwarding, no SSH tunnel.
All numbers measured end‑to‑end with real tokens, real prompts, fresh install from GitHub. Full methodology →
Your laptop is sitting there doing nothing useful right now. One click and it joins a global swarm of volunteers running frontier AI models together. No cloud. No credit card. Just llamas.