🌸 21st Century Counterculture
Plug in whatever you already run — Ollama, vLLM, LM Studio, llama.cpp, Exo — and join a global network that runs any model on Hugging Face. Serve inference to earn priority when you consume. No credit card. No crypto. No cloud required.
PyPI wheels coming soon; build from source for now. Needs Rust + a C toolchain — see the Quick Start for per-platform steps and Troubleshooting.
OHV2 binary ring mode: 17 tok/s across two T4 GPUs on a LAN, 12–13 tok/s over a cross-ISP relay, and 8 tok/s on two M1 MacBook Airs.
Did you buy a stack of M4 Mac Minis to run local models? Welcome home. OpenHydra’s architecture is explicitly designed to pool Apple Silicon’s Unified Memory across the internet. Leave your Mac running in the background and seed the swarm while you sleep.
Why OpenHydra
A peer-to-peer protocol for free, distributed AI inference — BitTorrent, for AI. Plug in what you have, use what the herd has.
Ollama, vLLM, LM Studio, llama.cpp, Exo — any engine, any OS, any GPU. A thin adapter joins your stack to the network. Keep the tools you know; reach the whole herd.
Requests route to whoever already has the model loaded — it runs there at full local speed. Models too big for any one node shard across peers, BitTorrent-style.
Give-to-get: serve inference to earn priority when you consume. No tokens, no crypto, no $20/month. Contribute when you’re idle, draw on the herd when you need it.
The five-year-old version
Glad you asked. It’s honestly not that complicated once you stop calling it “AI infrastructure”.
A 70-billion parameter model weighs ~140 GB. That’s not fitting on your laptop. But split across 8 laptops? Now we’re talking.
Like BitTorrent, everyone in the herd serves a shard. Your laptop handles one piece of the inference. Together the whole model runs.
Responses are spot-checked and cross-verified across peers. Good llamas build reputation and get routed more work; cheaters get downranked until the swarm stops trusting them.
Small models like Qwen 3.5 0.8B run on a single laptop. Bigger ones like Qwen 3 72B need 8 peers. The default install gets you going with Qwen 3.5 immediately — no beefy GPU required.
How it works
We eliminated every barrier between you and the swarm. No VRAM calculations. No model selection anxiety. Clone, install, and you’re in — a one-click desktop app is on the way.
Clone, install, run. That’s the hard part. OpenHydra auto-detects your hardware and immediately joins the base swarm running Qwen 3.5 0.8B — just 2 GB of RAM. You’re contributing to the global network. You didn’t need to know what a “parameter” is.
Got a beefy Mac or GPU? OpenHydra notices and nudges you: “Your hardware can power the Frontier Swarm (27B).” Bump up your contribution. The network just got smarter because you showed up.
Your laptop serves AI tokens in the background. The swarm handles routing, verification, and privacy. You just leave it running. Nobody needed a data center.
What you get
We also have a proper features page in the docs, but here’s the version where we’re allowed to be slightly smug.
Change one URL. Your existing code works. /v1/chat/completions with SSE streaming, plus Ollama-compatible /api/chat for Open WebUI and Continue.dev.
4-phase Attention Matching keeps long conversations alive without nuking your VRAM. Based on arXiv:2602.16284 — we read the papers so you don’t have to.
A Rust libp2p stack: Kademlia DHT discovery across three continents, QUIC + TCP, DCUtR hole-punching, and Circuit Relay v2 fallback. No central broker. If one bootstrap goes down, the llamas find another way.
A Tauri v2 desktop app for macOS and Windows is on the way: run offline in Local Mode (your own private LLM), or flip a toggle to join the global swarm — one-click switch, zero restart. For now, build from source.
Ed25519 peer identities; every connection encrypted in transit via libp2p Noise (and QUIC). Optional per-hop AES-256-GCM adds a second layer. For full prompt privacy, run LAN-only or in sharded mode, where no single peer sees your whole prompt.
Zero-dependency Python client. Browser-native TypeScript SDK. The internal SDK scaffolding exists — public release and docs are coming in v1.1.
Standing on the shoulders of giants
OpenHydra builds directly on two brilliant ideas. We want to be upfront about our inspirations, because intellectual honesty is cool (and mandatory if you don’t want to get ratio’d on HackerNews).
“Run large language models at home, BitTorrent‑style.” Petals proved that volunteer compute can serve real LLM inference across the internet. We took that idea and bolted on a desktop app and a very strong llama motif.
petals.dev →Since 2001, BitTorrent has proved you can distribute enormous files to billions of people without a central server. If it works for a band’s entire discography, it can work for Qwen 3.5 tokens. Same energy.
bittorrent.com →🦎 Fun llama fact #1: Real llamas are pack animals because they share the load across the herd. The weakest llama doesn’t carry the whole tent. This is also the core architectural principle of OpenHydra.
🦎 Fun llama fact #2: A group of llamas is called a herd. OpenHydra’s network of peers is also called a herd. We are very consistent in our metaphors and proud of this.
🦎 Fun llama fact #3: The Hydra in Greek mythology had multiple heads — cut one off and two grow back. Our bootstrap nodes work the same way. (Please don’t cut our bootstrap nodes.)
🦎 Fun llama fact #4: Llamas can spit up to 10 feet when stressed. Our nodes politely return HTTP 503 instead. Both are valid responses to being overwhelmed.
What runs on it
The default is Qwen 3.5 0.8B — tiny enough for any laptop. Larger models shard automatically across multiple peers. NF4 quantisation cuts VRAM by 4x. Add any HuggingFace model by editing models.catalog.json.
21 models in the default catalog (Qwen 2.5/3/3.5, Gemma 3/4, SmolLM2, TinyLLaMA — base + instruct variants). Add any HuggingFace model to models.catalog.json.
If the requested model lacks peers, the coordinator gracefully degrades to the nearest available smaller model.
It’s sitting there doing nothing useful right now. One command and it joins a global swarm of volunteers running large AI models together. No cloud. No credit card. No crypto. Just llamas.