Build a real-time streaming AI chatbot with zero streaming infrastructure - async + webhooks + failover
Have you ever tried building a production-ready AI chatbot that streams responses token-by-token, handles failover across providers, enforces structured JSON outputs, and lets you inject custom logic (like metadata tracking or approval gates) — all without managing WebSocket servers, polling, timeouts, or connection state?
Most vanilla setups (OpenAI/Anthropic streaming) force you into complex infra. But what if a lightweight gateway handled all that?
Enter this full-stack example using ModelRiver (an AI gateway I'm building). It demonstrates a clean pattern for true end-to-end streaming with async requests, event-driven webhooks, automatic failover, and easy local dev — no ngrok needed.
In ~30-45 minutes, you can recreate this: React frontend → Node.js backend → ModelRiver → real-time WebSocket back to browser.
(Disclosure: I work on ModelRiver. This is a genuine technical demo for feedback on production LLM patterns.)
Why This Pattern Matters in 2026
Modern AI apps need:
- Instant, human-like streaming UX
- Reliability (failover if a provider flakes)
- Structured, type-safe outputs (e.g., sentiment + action items)
- Business logic gates (validation, enrichment, custom IDs for DB)
- Zero heavy infra (no persistent WebSockets on your side)
This example solves all of that with async requests + webhook callbacks + a lightweight client SDK; the structured output contract is sketched below.
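To make the structured-outputs point concrete, here is a minimal sketch in TypeScript of what a sentiment + action-items contract could look like. The interface, the field names, and the JSON-Schema-style object are assumptions for illustration; the way you actually register a schema with ModelRiver may differ from this.

```typescript
// Hypothetical shape of the structured output described above (sentiment + action items).
// Field names and the registration mechanism are assumptions, not ModelRiver's actual API.
interface ChatAnalysis {
  sentiment: "positive" | "neutral" | "negative";
  actionItems: string[]; // e.g. ["follow up with billing", "send docs link"]
  confidence: number;    // 0..1, how confident the model is in the sentiment label
}

// A JSON-Schema-style description you might hand to the gateway so the model is
// constrained to this exact shape (exact wiring is an assumption).
const chatAnalysisSchema = {
  type: "object",
  properties: {
    sentiment: { type: "string", enum: ["positive", "neutral", "negative"] },
    actionItems: { type: "array", items: { type: "string" } },
    confidence: { type: "number", minimum: 0, maximum: 1 },
  },
  required: ["sentiment", "actionItems", "confidence"],
} as const;
```

Constraining the model to a schema like this is what makes the output safe to drop straight into your DB or UI without regex-parsing free-form text.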
Architecture at a Glance
```
User (React) → Node.js Backend → ModelRiver Async API
        ↓
AI Processing (background, failover)
        ↓
Webhook to Backend (enrich/inject)
        ↓
Callback to ModelRiver
        ↓
WebSocket Stream → Frontend (real-time)
```

Key magic: ModelRiver processes the request async and hits your webhook before final delivery → you enrich → the finished response streams to the frontend over WebSocket.
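Here is a minimal sketch of the async + webhook leg on the Node.js side, assuming an Express backend. The gateway base URL, the `/v1/async/chat` path, the `webhook_url` and `request_id` field names, and the idea that the webhook response is the enriched payload are all assumptions for illustration, not ModelRiver's documented API.

```typescript
// Sketch of the async request + webhook enrichment pattern (Express, Node 18+ fetch).
// Endpoint paths and payload field names below are placeholders, not the real gateway API.
import express from "express";

const app = express();
app.use(express.json());

const MODELRIVER_URL = process.env.MODELRIVER_URL ?? "https://api.modelriver.example";
const API_KEY = process.env.MODELRIVER_API_KEY ?? "";

// 1. Frontend asks the backend to start a chat; the backend fires an async request
//    to the gateway and returns immediately with an id the frontend can subscribe to.
app.post("/chat", async (req, res) => {
  const gatewayRes = await fetch(`${MODELRIVER_URL}/v1/async/chat`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${API_KEY}`,
    },
    body: JSON.stringify({
      messages: req.body.messages,
      webhook_url: "https://my-backend.example/modelriver/webhook", // assumed field name
    }),
  });
  const { request_id } = (await gatewayRes.json()) as { request_id: string }; // assumed shape
  res.json({ requestId: request_id });
});

// 2. The gateway calls this webhook before final delivery. This is where you enrich:
//    metadata tracking, custom DB ids, approval gates, etc.
app.post("/modelriver/webhook", (req, res) => {
  const enriched = {
    ...req.body,
    metadata: { conversationId: "conv_123", approvedBy: "auto" }, // your business logic
  };
  // Returning the enriched body as the webhook response is an assumption about
  // the callback contract; the gateway then streams the result to the browser.
  res.json(enriched);
});

app.listen(3000, () => console.log("backend listening on :3000"));
```

The point of the pattern: your backend never holds a connection open to the provider. It fires the async request, gets called back when the result is ready, injects its own logic, and the gateway handles real-time delivery to the browser.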