Build a real-time streaming AI chatbot with zero streaming infrastructure - async + webhooks + failover

Have you ever tried building a production-ready AI chatbot that streams responses token-by-token, handles failover across providers, enforces structured JSON outputs, and lets you inject custom logic (like metadata tracking or approval gates) — all without managing WebSocket servers, polling, timeouts, or connection state?

Most vanilla setups (streaming straight from the OpenAI or Anthropic APIs) push that infrastructure onto you. But what if a lightweight gateway handled all of it?

Enter this full-stack example using ModelRiver (an AI gateway I'm building). It demonstrates a clean pattern for true end-to-end streaming with async requests, event-driven webhooks, automatic failover, and easy local dev — no ngrok needed.

In ~30-45 minutes, you can recreate this: React frontend → Node.js backend → ModelRiver → real-time WebSocket back to the browser.
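To give a feel for the browser side of that last hop, here's a minimal sketch of a frontend helper that starts a chat request and listens for tokens. The /api/chat route, the WebSocket URL, and the message shape are assumptions for illustration; in the actual example the lightweight client SDK handles this for you.

```typescript
// Minimal browser-side sketch: start an async chat job, then stream tokens back.
// URLs and payload fields are assumed for illustration, not ModelRiver's real API.
async function streamChat(message: string, onToken: (t: string) => void) {
  // 1. Ask our Node.js backend to kick off the async request.
  const res = await fetch("/api/chat", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ message }),
  });
  const { requestId } = (await res.json()) as { requestId: string };

  // 2. Subscribe to the real-time stream for that request (assumed endpoint).
  const ws = new WebSocket(`wss://stream.modelriver.example/requests/${requestId}`);
  ws.onmessage = (event) => {
    const data = JSON.parse(event.data) as { type: string; token?: string };
    if (data.type === "token" && data.token) onToken(data.token);
    if (data.type === "done") ws.close();
  };
}
```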

(Disclosure: I work on ModelRiver. This is a genuine technical demo for feedback on production LLM patterns.)

Why This Pattern Matters in 2026

Modern AI apps need:

  • Instant, human-like streaming UX
  • Reliability (failover if a provider flakes)
  • Structured, type-safe outputs (e.g., sentiment + action items)
  • Business logic gates (validation, enrichment, custom IDs for DB)
  • Zero heavy infra (no persistent WebSockets on your side)

This example solves all that with async + webhook callbacks + lightweight client SDK.
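For the structured-output piece specifically, the target shape in this demo is something like sentiment plus action items. Here's a minimal sketch of what that could look like; the type and schema below are illustrative, not ModelRiver's actual schema API.

```typescript
// Illustrative target shape for the structured output (sentiment + action items).
// These names are assumptions for this sketch.
interface ChatAnalysis {
  sentiment: "positive" | "neutral" | "negative";
  actionItems: string[];
}

// The same shape as JSON Schema, the kind of thing you would hand to whichever
// structured-output mechanism your gateway or provider exposes.
const chatAnalysisSchema = {
  type: "object",
  properties: {
    sentiment: { type: "string", enum: ["positive", "neutral", "negative"] },
    actionItems: { type: "array", items: { type: "string" } },
  },
  required: ["sentiment", "actionItems"],
  additionalProperties: false,
} as const;
```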

Architecture at a Glance

User (React) → Node.js Backend → ModelRiver Async API
        ↓
AI Processing (background, failover)
        ↓
Webhook to Backend (enrich/inject)
        ↓
Callback to ModelRiver
        ↓
WebSocket Stream → Frontend (real-time)
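The first hop is just a normal HTTP call from your backend. Here's a rough sketch of it; the endpoint path, field names, and env vars are assumptions for illustration (the real contract comes from ModelRiver's docs).

```typescript
import express from "express";

const app = express();
app.use(express.json());

// Assumed endpoint and env var names for this sketch.
const GATEWAY_URL =
  process.env.MODELRIVER_URL ?? "https://api.modelriver.example/v1/async/chat";
const GATEWAY_KEY = process.env.MODELRIVER_API_KEY ?? "";

app.post("/api/chat", async (req, res) => {
  // Kick off the job asynchronously and tell the gateway where to call back
  // before final delivery (the "Webhook to Backend" step in the diagram).
  const gatewayRes = await fetch(GATEWAY_URL, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${GATEWAY_KEY}`,
    },
    body: JSON.stringify({
      messages: [{ role: "user", content: req.body.message }],
      webhook_url: "https://my-backend.example/webhooks/modelriver", // assumed field name
    }),
  });

  const { request_id } = (await gatewayRes.json()) as { request_id: string };

  // The frontend subscribes to the WebSocket stream using this id.
  res.json({ requestId: request_id });
});

app.listen(3000);
```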

Key magic: ModelRiver processes the request asynchronously and hits your webhook before final delivery → you enrich the payload → ModelRiver streams the enriched result to the frontend over WebSocket.
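Here's a sketch of that webhook leg, continuing the Express app from the previous snippet. The payload fields, response shape, and custom-ID scheme are assumptions read off the flow diagram; whether you respond inline or make a separate callback request depends on the gateway's actual contract.

```typescript
// Webhook handler on the same Express app as /api/chat above. Field names are
// assumptions for this sketch, not ModelRiver's exact contract.
app.post("/webhooks/modelriver", (req, res) => {
  const { request_id, output } = req.body as {
    request_id: string;
    output: { sentiment: string; actionItems: string[] };
  };

  // This is where the business-logic gates go: validation, approval checks,
  // metadata tracking, custom IDs for your database, etc.
  const enriched = {
    ...output,
    conversationId: `conv_${request_id}`, // hypothetical ID you generate for your DB
    enrichedAt: new Date().toISOString(),
  };

  // Persist `enriched` here (DB write, audit log, ...) before handing it back.

  // Hand the enriched payload back so the gateway can stream the final result
  // to the frontend over WebSocket, as in the diagram above.
  res.json({ approved: true, data: enriched });
});
```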
