Warzones — a real-time NFT-battle backend that's run in production for 2+ years
Role: Lead backend developer (sole author, 925 commits / 1,837 all-branch) · Client: 3BUX LLC
Stack: Node.js · Express · MongoDB (Mongoose, 63 models) · Redis (ioredis + Bull + Socket.io adapter + Redlock) · Firebase · Algorand JS SDK · Cloudinary · PM2
Live: warzones.ghettopigeon.com · marketplace at market.ghettopigeon.com
On-chain (Algorand mainnet): MGP "Mutant Ghetto Pigeon" — 666 NFTs, creator UN3E…BWFRWYQ · DNKY "Dapper Dogs" — 900 NFTs, creator DDAO…NBVLDI — ~1,566 enumerable assets (proof-index)
Timeline: Dec 2023 → present (~28 months, still active)
The problem
Warzones (Ghetto Pigeons) is an NFT-battle gaming platform: players own NFTs, enter battles and auctions, soft-stake into a Vault for yield, trade on a marketplace, and earn on-chain rewards. The backend has to do three awkward things at once:
- Be real-time — auctions, bids, and battle state must update instantly for every connected client.
- Settle on-chain — rewards, marketplace sales, and staking payouts are real Algorand transactions that must be reliable and idempotent even when the chain is slow or a node hiccups.
- Stay up for years — this isn't a hackathon demo; it's been continuously operated since 2023, which means the real engineering is in the incidents, not the happy path.
This is the largest system I own: 63 Mongoose models, 26 controllers, and ~264 endpoints covering auctions, bids, assets, collections, NFT metrics, pools, whitelists, the Vault, admin, IP filtering, rate limiting, file upload, and logging.
Architecture
The API runs as a PM2 cluster (multiple instances) behind one domain. All on-chain operations happen server-side — clients never sign contract calls — which centralizes custody, lets me batch and retry transactions, and keeps the trust boundary in one place.
The hardest decision: real-time across a multi-instance cluster
A single Node process can hold every websocket in memory and broadcast trivially. The moment you scale to a PM2 cluster for throughput and zero-downtime deploys, that breaks: a bid that lands on instance A has to reach a client connected to instance B.
Decision: put a Redis adapter under Socket.io so every instance shares one pub/sub fabric, and treat Redis as the source of truth for ephemeral real-time state. The tradeoff I accepted: every cross-instance broadcast pays a Redis round-trip, and the system depends on Redis being healthy. Redis therefore became a first-class operational concern rather than a "nice to have"; it also backs the Bull queues, rate limiting, and caching. In exchange I got horizontal scale and the ability to deploy without dropping live auctions.
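To make the fan-out concrete, here is a minimal sketch that uses an in-memory bus as a stand-in for Redis pub/sub. The names (SharedBus, ApiInstance, the "bid" event and its payload) are hypothetical and this is not the production code; in the real system the Socket.io Redis adapter plays the role of the bus.

```typescript
// In-memory stand-in for the Redis pub/sub channel shared by all instances.
type BusHandler = (event: string, payload: unknown) => void;

class SharedBus {
  private subscribers: BusHandler[] = [];

  subscribe(handler: BusHandler): void {
    this.subscribers.push(handler);
  }

  // Every subscribed instance receives every message, including the sender.
  publish(event: string, payload: unknown): void {
    for (const handler of this.subscribers) handler(event, payload);
  }
}

// One API process in the cluster. It only knows about its own local sockets.
class ApiInstance {
  readonly name: string;
  private bus: SharedBus;
  private inboxes: Record<string, string[]> = {}; // clientId -> messages received

  constructor(name: string, bus: SharedBus) {
    this.name = name;
    this.bus = bus;
    bus.subscribe((event, payload) => this.deliverLocally(event, payload));
  }

  connect(clientId: string): void {
    this.inboxes[clientId] = [];
  }

  // Broadcasts go through the bus rather than straight to local sockets,
  // so clients connected to other instances see the same message.
  broadcast(event: string, payload: unknown): void {
    this.bus.publish(event, payload);
  }

  inbox(clientId: string): string[] {
    return this.inboxes[clientId] || [];
  }

  private deliverLocally(event: string, payload: unknown): void {
    for (const clientId of Object.keys(this.inboxes)) {
      this.inboxes[clientId].push(event + ":" + JSON.stringify(payload));
    }
  }
}

const bus = new SharedBus();
const instanceA = new ApiInstance("A", bus);
const instanceB = new ApiInstance("B", bus);
instanceA.connect("alice");
instanceB.connect("bob");

// A bid handled by instance A reaches bob, who is connected to instance B.
instanceA.broadcast("bid", { auctionId: "a1", amount: 50 });
```

The cost described above is visible here: even alice, who sits on the instance that handled the bid, only gets the message after it has passed through the shared bus.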
On-chain settlement goes through Bull queue workers rather than inline request handling. Submitting an Algorand transaction inside an HTTP request couples user-facing latency to chain latency and loses the work if the request dies. Pushing settlement into queues gives me retries and backpressure, plus a single place to enforce idempotency: the request returns immediately, and the worker keeps retrying until the payout lands.
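The shape of that worker can be sketched without Bull or the Algorand SDK. The version below is a synchronous, in-memory illustration only: the PayoutJob fields, the idempotency-key format, and the SettlementQueue class are all assumptions made for the example, and the real workers are asynchronous Bull processors.

```typescript
// Hypothetical job shape: one payout, identified by a caller-supplied key.
interface PayoutJob {
  idempotencyKey: string; // assumed format, e.g. "reward:<battleId>:<playerId>"
  receiver: string;
  amount: number;
}

// Stand-in for "sign and submit a transaction": returns a tx id, throws on failure.
type SubmitFn = (job: PayoutJob) => string;

class SettlementQueue {
  private submit: SubmitFn;
  private maxAttempts: number;
  private pending: PayoutJob[] = [];
  private settled: Record<string, string> = {}; // idempotencyKey -> tx id

  constructor(submit: SubmitFn, maxAttempts: number) {
    this.submit = submit;
    this.maxAttempts = maxAttempts;
  }

  // Called from the HTTP handler: record the job and return immediately.
  enqueue(job: PayoutJob): void {
    this.pending.push(job);
  }

  // Called by the worker: process everything queued, retrying transient failures.
  // Returns the jobs that exhausted their attempts so they can be inspected.
  drain(): PayoutJob[] {
    const failed: PayoutJob[] = [];
    while (this.pending.length > 0) {
      const job = this.pending.shift() as PayoutJob;
      // Idempotency check: a duplicate or redelivered job is a no-op.
      if (this.settled[job.idempotencyKey] !== undefined) continue;
      if (!this.tryWithRetries(job)) failed.push(job);
    }
    return failed;
  }

  txIdFor(idempotencyKey: string): string | undefined {
    return this.settled[idempotencyKey];
  }

  private tryWithRetries(job: PayoutJob): boolean {
    for (let attempt = 1; attempt <= this.maxAttempts; attempt++) {
      try {
        this.settled[job.idempotencyKey] = this.submit(job);
        return true;
      } catch (_err) {
        // Transient failure (slow chain, node hiccup): loop and try again.
      }
    }
    return false;
  }
}

// Usage: the first submission fails, the retry succeeds, the duplicate is skipped.
let submitCalls = 0;
const flakySubmit: SubmitFn = (job) => {
  submitCalls++;
  if (submitCalls === 1) throw new Error("node timeout");
  return "tx-" + job.idempotencyKey;
};

const queue = new SettlementQueue(flakySubmit, 3);
const rewardJob: PayoutJob = { idempotencyKey: "reward:b1:p1", receiver: "ADDR", amount: 10 };
queue.enqueue(rewardJob);
queue.enqueue(rewardJob); // simulated duplicate delivery
const failedJobs = queue.drain();
```

This sketch does not cover the harder case where a submission succeeds on-chain but the response is lost; a real worker has to confirm whether the earlier attempt landed before resubmitting.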
What broke (operating it for 2+ years)
The honest proof that this is a production system, not a prototype, is the incident work. Real things I designed, investigated, and fixed in the live system:
- Marketplace-sales security: investigated and hardened the sale settlement path against exploit conditions.
- Whitelist restoration: built a recovery path that rebuilds whitelist state after corruption without downtime.
- Pool auto-reactivation: stalled reward pools are now detected and reactivated automatically instead of needing a manual nudge.
- Orphaned-media cleanup: a Cloudinary cleanup pipeline reclaims storage from NFT images that are no longer referenced, keeping cost under control as the collection grew.
- Log-volume reduction: cut operational log noise and cost after profiling which logs were actually useful for debugging and which were just expensive.
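The core of the orphaned-media cleanup is a set difference between what is stored and what is still referenced. A minimal sketch of that step follows; the StoredMedia shape, the findOrphans helper, and the grace period for recent uploads are assumptions made for illustration, not the production pipeline, and the calls that list and delete Cloudinary assets are left out.

```typescript
// Hypothetical record for one stored image.
interface StoredMedia {
  publicId: string;
  uploadedAt: number; // epoch milliseconds
}

// Returns the ids that are stored but not referenced by any document.
// Uploads younger than graceMs are skipped (an assumed safeguard) so an image
// is not deleted between upload and the write that references it.
function findOrphans(
  stored: StoredMedia[],
  referencedIds: string[],
  now: number,
  graceMs: number
): string[] {
  const referenced: Record<string, boolean> = {};
  for (const id of referencedIds) referenced[id] = true;

  return stored
    .filter((m) => referenced[m.publicId] !== true && now - m.uploadedAt >= graceMs)
    .map((m) => m.publicId);
}

const DAY_MS = 24 * 60 * 60 * 1000;
const checkedAt = 10 * DAY_MS;
const storedMedia: StoredMedia[] = [
  { publicId: "nft/old-unused", uploadedAt: 1 * DAY_MS },
  { publicId: "nft/in-use", uploadedAt: 1 * DAY_MS },
  { publicId: "nft/just-uploaded", uploadedAt: checkedAt - 1000 },
];

// Only the old, unreferenced image is selected for deletion.
const orphans = findOrphans(storedMedia, ["nft/in-use"], checkedAt, DAY_MS);
```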
Results
- 925 commits (1,837 all-branch), sole author, over ~28 months of continuous production — the longest-running and largest backend I've built.
- 63 data models / 26 controllers / ~264 endpoints spanning auctions, marketplace, staking/Vault, pools, and on-chain rewards.
- Multi-instance real-time via Socket.io + Redis adapter (Redlock bid locks) with PM2 zero-downtime deploys.
- ~1,566 enumerable on-chain assets across two Algorand mainnet collections (MGP 666 + DNKY 900) — independently verifiable (see header / proof-index).
- Still live and serving today (verified 2026-06-16).
What this demonstrates
Owning a stateful, real-time, on-chain system end-to-end for years: horizontal scaling of websockets, queue-based on-chain settlement with idempotency, and — most importantly — the operational maturity to investigate incidents, restore corrupted state, and control cost in a system you can't take offline.