In 1996, Netscape Navigator 2.0 shipped a feature called server push that let a server keep an HTTP connection open and trickle data to the browser. It worked for about thirty seconds at a time before something — the firewall, the proxy, the browser — gave up. The web's first attempt at real-time was wonderful and broken in equal measure.
Fifteen years later, in December 2011, the IETF published RFC 6455 — the WebSocket protocol. Suddenly every browser could open a persistent bidirectional connection to a server. Real-time on the web stopped being a hack and became infrastructure.
This is the story of how the web got real-time, the substrate every collaborative editor and AI agent rides on today, and where Taskade fits in the stack.
TL;DR: The web wasn't designed for real-time — HTTP is request/response and short-lived. WebSockets (RFC 6455, December 2011) fixed that by adding bidirectional persistent connections. SSE (Server-Sent Events) added one-way streams for things like AI token output. HTTP/3 + QUIC + WebTransport are the 2020s next-gen substrate. Every real-time app you use — Slack, Figma, Google Docs, ChatGPT streaming, Linear, Taskade — rides on this layer. Try Taskade's WebSocket-powered real-time workspace →
🗺️ Real-Time Web Protocols Timeline
Thirty-five years from "request a page" to "AI agents streaming tokens into a workspace where humans type alongside them in real time."
🌐 The Pre-WebSocket Era: HTTP Wasn't Built for This
HTTP was designed by Tim Berners-Lee at CERN in 1989 for retrieving documents. The original HTTP/0.9 was almost comically simple:
```
Client: GET /index.html
Server: <html>Hello world</html>
[connection closes]
```
That model worked beautifully for fetching documents. It worked terribly for anything that needed bidirectional or push-based communication. By the mid-1990s, developers were already hacking around HTTP's limitations to get real-time-ish behavior.
Netscape Server Push (1996)
Netscape Navigator 2.0 (1996) included an early attempt at server push using a content-type called multipart/x-mixed-replace. The server kept a single HTTP connection open and sent multiple "parts" — each part replacing the previous one in the browser. It was used for the first webcams and live image refreshes.
The problem: corporate proxies and firewalls hated long-running connections. The connection broke after a minute or two and the user had to refresh. It was real-time-ish, not real-time.
HTTP Polling
The pragmatic alternative: have the client repeatedly ask the server "anything new?" every few seconds.
```
Client (every 5s): GET /api/messages?since=42
Server: [most of the time: nothing new]
Server (rarely): {"message": "hi!"}
```
This worked, technically. It also wasted enormous server resources — every connected client was firing an HTTP request every few seconds whether anything had changed or not. Twitter's pre-2008 architecture famously buckled under the load of polling clients.
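To put rough numbers on that waste, here is a back-of-envelope sketch. The client count, poll interval, and hit rate are illustrative assumptions, not measurements:

```javascript
// Back-of-envelope cost of naive polling. All numbers are illustrative
// assumptions: 10,000 connected clients, each polling every 5 seconds,
// with only 1% of polls returning new data.
const clients = 10_000;
const pollIntervalSec = 5;
const usefulPercent = 1; // assume 1% of polls actually return new data

const requestsPerSec = clients / pollIntervalSec;                 // 2,000 req/s
const wastedPerSec = requestsPerSec * (100 - usefulPercent) / 100; // 1,980 empty responses/s

console.log(requestsPerSec, wastedPerSec);
```

Two thousand requests per second, and roughly 1,980 of them carry no new information. That arithmetic is why polling-heavy architectures buckled as user counts grew.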
HTTP Long-Polling — The Comet Era (2000s)
The third generation of pre-WebSocket real-time was long-polling, collectively called Comet (a term coined by Alex Russell in 2006 as a play on Ajax — "Ajax with persistent connections"). The pattern:
```
Client: GET /api/messages?since=42
Server: [holds the connection open until something to send]
... 30 seconds pass ...
Server: {"message": "hi!"}
[connection closes; client immediately reopens]
```
Long-polling reduced wasted requests but required the server to handle thousands of in-flight HTTP requests sitting idle. Production Comet deployments at Meebo, Gmail Chat, and Facebook Chat in the late 2000s pushed the limits of server architecture; each platform built custom event-driven HTTP servers to scale.
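The client side of that pattern fits in a few lines. This is a minimal sketch, not production code: the `/api/messages` endpoint and the response shape are hypothetical, and the fetch function is injectable so the loop can be exercised without a live server:

```javascript
// Long-polling client loop (sketch). fetchImpl is injectable so the pattern
// can run without a live server; the endpoint and response shape are
// hypothetical. `cycles` bounds the loop for demonstration purposes.
async function longPoll(fetchImpl, url, since, onMessage, cycles = Infinity) {
  while (cycles-- > 0) {
    // The server holds this request open until it has something to send.
    const res = await fetchImpl(`${url}?since=${since}`);
    if (res.status === 200) {
      const msg = await res.json();
      since = msg.id;      // advance the cursor past the delivered message
      onMessage(msg);
    }
    // On a timeout response (e.g. 204) the client simply reopens immediately.
  }
  return since;
}
```

Note the structural cost: every waiting client is a held-open HTTP request the server must keep in memory, which is exactly what pushed Comet-era servers to event-driven architectures.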
The Comet era taught the industry that real-time on top of HTTP was achievable but never elegant. A clean replacement was needed.
📡 December 2011: RFC 6455 — The Real-Time Web
In 2008, Ian Hickson (at Google, editor of HTML5) and Michael Carter proposed the WebSocket protocol as part of the HTML5 specification work — originally called TCPConnection before being renamed in a #whatwg IRC discussion. The pitch was clean: replace HTTP's request/response model with a persistent bidirectional connection that uses HTTP only for the initial handshake. The IETF took over standardization in February 2010, with Ian Fette (Google) as the final editor.
WebSocket handshake (over HTTP):

```
Client → Server:

GET /ws HTTP/1.1
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Sec-WebSocket-Version: 13

Server → Client:

HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=

[After handshake: bidirectional frames flow both directions
until either side closes the connection]
```
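Once the handshake completes, data moves as binary frames, and RFC 6455 requires every client-to-server frame to be masked with a 4-byte key. A minimal Node.js sketch of building a masked client text frame (short payloads only; a fixed mask key is used here purely so the output is reproducible, whereas a real client must pick a fresh random key per frame):

```javascript
// Build a masked client->server text frame per RFC 6455 (payloads < 126 bytes
// only; longer payloads need the extended length fields, not handled here).
// A real client MUST use a fresh random mask key per frame; a fixed key is
// used in this sketch only so the output is reproducible.
function buildClientTextFrame(text, maskKey = Buffer.from([0x12, 0x34, 0x56, 0x78])) {
  const payload = Buffer.from(text, 'utf8');
  if (payload.length >= 126) throw new Error('sketch handles short payloads only');
  const header = Buffer.from([
    0x81,                   // FIN = 1, opcode 0x1 (text frame)
    0x80 | payload.length,  // MASK bit set + 7-bit payload length
  ]);
  const masked = Buffer.alloc(payload.length);
  for (let i = 0; i < payload.length; i++) {
    masked[i] = payload[i] ^ maskKey[i % 4]; // XOR each byte with the mask key
  }
  return Buffer.concat([header, maskKey, masked]);
}

console.log(buildClientTextFrame('hi'));
```

An 8-byte frame for a 2-byte message: that tiny fixed overhead, versus hundreds of bytes of HTTP headers per poll, is where WebSockets' per-message efficiency comes from.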
The Handshake, Header by Header
| Header | Purpose | Example value |
|---|---|---|
| `Upgrade: websocket` | Protocol switch request | — |
| `Connection: Upgrade` | Connection hop semantics | — |
| `Sec-WebSocket-Key` | Random 16-byte base64 nonce | `dGhlIHNhbXBsZSBub25jZQ==` |
| `Sec-WebSocket-Version` | Protocol version (always 13 post-RFC 6455) | `13` |
| `Sec-WebSocket-Accept` (response) | base64(SHA-1(key + GUID)) where GUID = `258EAFA5-E914-47DA-95CA-C5AB0DC85B11` | `s3pPLMBiTxaQ9kYGzzhZRbK+xOo=` |
The magic GUID 258EAFA5-E914-47DA-95CA-C5AB0DC85B11 is hard-coded into RFC 6455; by hashing it together with the client's nonce, the server proves it genuinely speaks WebSocket rather than being a naive HTTP server that merely echoes headers back.
Browser Support Timeline
The protocol was formally standardized as RFC 6455 in December 2011 by the IETF. Browser support shipped in:
- Chrome 4 (December 2009) — first browser with a prototype WebSocket implementation
- Chrome 14 (September 2011) — shipped the final RFC 6455 version
- Firefox 7 (September 2011)
- Safari 6 (July 2012)
- Internet Explorer 10 (October 2012)
By 2013, WebSockets were a mainstream browser feature. The real-time web stopped being a hack.
What WebSockets Enabled
The browser-as-real-time-client pattern lit up entire categories of applications:
| Category | Pre-WebSocket | With WebSockets |
|---|---|---|
| Chat / messaging | Long-polling hacks (Meebo, Gmail Chat) | Slack, Discord, WhatsApp Web |
| Collaborative editing | Polling-based (slow, ugly) | Google Docs concurrent editing, Etherpad, Taskade Projects |
| Live cursors / multiplayer UI | Not really feasible | Figma, Linear, Notion |
| Stock tickers / financial data | Pure Flash plugins | Bloomberg Terminal web, Robinhood |
| Real-time analytics | Polling dashboards | Datadog live tail, Honeycomb traces |
| Multiplayer games | WebSocket emulation via Flash | agar.io, browser-native games |
| Live customer support chat | Periodic refresh | Intercom, Drift, LiveChat |
| AI agent streaming (later) | n/a | ChatGPT, Claude.ai, Taskade EVE responses |
The story of real-time web from 2012 to 2026 is the story of these categories getting better as WebSockets matured.
📥 Server-Sent Events (SSE): The One-Way Cousin
In parallel with WebSockets, the HTML5 spec also standardized Server-Sent Events (SSE) — a simpler protocol for one-way persistent streams from server to client over standard HTTP.
SSE protocol:

```
Client → Server:

GET /api/stream
Accept: text/event-stream

Server → Client (stays open):

HTTP/1.1 200 OK
Content-Type: text/event-stream

data: {"message": "hello"}

data: {"message": "world"}

data: {"message": "..."}
```
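On the wire, events are newline-delimited blocks separated by a blank line. A minimal parser for the data-only subset shown above (the full SSE grammar also defines `event:`, `id:`, `retry:`, and comment lines, which this sketch ignores):

```javascript
// Parse a text/event-stream body into an array of `data` payloads.
// Handles only the data-only subset of SSE; `event:`, `id:`, `retry:` and
// comment lines from the full grammar are ignored in this sketch.
function parseSSE(body) {
  return body
    .split('\n\n')                                  // events end with a blank line
    .map((block) => block
      .split('\n')
      .filter((line) => line.startsWith('data:'))   // keep only data fields
      .map((line) => line.slice('data:'.length).trimStart())
      .join('\n'))                                  // multi-line data joins with \n
    .filter((data) => data.length > 0);
}

console.log(parseSSE('data: {"message": "hello"}\n\ndata: {"message": "world"}\n\n'));
```

In a browser you would not parse this by hand: the built-in `EventSource` API does the parsing and the automatic reconnection for you.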
SSE is simpler than WebSockets:
| Feature | WebSockets | SSE |
|---|---|---|
| Direction | Bidirectional | Server → Client only |
| Protocol | Custom upgrade from HTTP | Pure HTTP |
| Auto-reconnect | Manual | Built in |
| Browser compatibility | Universal since 2013 | Universal except old IE |
| Binary support | Yes (binary frames) | Text only |
| Best for | Chat, collaboration, games | Notifications, AI streaming, dashboards |
SSE found its killer use case in 2023-2024: streaming AI tokens from LLMs. When you watch ChatGPT or Claude generate a response word-by-word, you are watching an SSE stream. OpenAI, Anthropic, and Google all expose their streaming APIs over SSE because it's the perfect fit — one-way, text-only, built-in reconnect, simple to implement.
📹 WebRTC: The P2P Sibling
Parallel to WebSockets, a different real-time protocol was standardized starting around 2011: WebRTC (Web Real-Time Communication). Where WebSockets connect a browser to a server, WebRTC connects two browsers directly to each other.
WebRTC enables:
- Browser-native video and voice calling (Google Meet, Discord video, FaceTime web, Whereby)
- Screen sharing (Zoom in some configurations, Loom)
- Peer-to-peer file transfer (Snapdrop, WebTorrent)
- Real-time multiplayer game data (some indie games)
WebRTC and WebSockets coexist — most real-time apps use WebSockets for control messages (presence, signaling, app state) and WebRTC for the actual media stream (audio, video, screen). Taskade uses WebSockets for collaborative project state; embedded video calling in Taskade workspaces uses WebRTC.
📊 Performance & Latency Across Real-Time Protocols
The four major real-time protocols compare cleanly on five dimensions:
| Protocol | Time-to-first-byte | Per-message latency | Max concurrent (typical) | Direction |
|---|---|---|---|---|
| HTTP polling (5s) | 0-5s | ~2.5s average | ~1K / server | Pull only |
| Long-polling (Comet) | 50-200ms | 50-200ms | ~10K / server | Pull-held |
| SSE | 50-150ms | 5-10ms | ~50K / server | Server → Client only |
| WebSocket | 50-150ms | 1-3ms | 5M+ / server (Discord) | Bidirectional |
| WebTransport (HTTP/3) | 0-RTT possible | <1ms (datagrams) | Emerging | Bidirectional + datagrams |
Discord's WebSocket Scale (the Reference Benchmark)
Discord publishes its real-time engineering metrics in detail. As of recent engineering posts:
- 12M+ concurrent users
- 2.6M concurrent voice users (at 220 Gbps / 120 Mpps)
- 26M WebSocket events/sec delivered to clients
- 40% WebSocket traffic reduction via Zstandard compression + Passive Sessions V2
- Built on Elixir running on the BEAM (the Erlang VM), rather than a custom-built networking stack
Discord is the upper-bound case of what WebSockets can do at scale. Most production applications run comfortably at 1K-100K concurrent connections per server with off-the-shelf infrastructure.
🚀 HTTP/2 and HTTP/3: The Next-Gen Substrate
While WebSockets matured, HTTP itself was getting major upgrades:
HTTP/2 (RFC 7540, May 2015)
- Multiplexed streams — many requests in parallel on one TCP connection
- Binary framing — replaces HTTP/1.1's text parsing
- Header compression (HPACK) — drops the overhead of repeated headers
- Server push — server can preemptively send resources to the client
WebSockets technically work over HTTP/2 (via RFC 8441, "Bootstrapping WebSockets with HTTP/2", September 2018), but support is uneven across browsers and servers. Most production WebSocket deployments in 2026 still negotiate the original HTTP/1.1 upgrade path.
HTTP/3 (RFC 9114, June 2022)
- Runs on QUIC — a new transport protocol over UDP (not TCP)
- Faster connection setup — 0-RTT or 1-RTT vs. TCP's 2-3 round trips
- No head-of-line blocking — each stream is independent
- Connection migration — survives changing IP addresses (WiFi → cellular handoff)
- Built-in TLS 1.3 — always encrypted
WebSocket support over HTTP/3 is standardized as RFC 9220 (June 2022). The practical advantage: much faster initial connection setup, especially on mobile, and smoother behavior under packet loss.
By 2026, HTTP/3 is available at the major CDNs (Cloudflare, Fastly) and across Google services. Adoption is still maturing — Cloudflare Radar shows HTTP/3 at ~21% of web traffic as of April 2026, down from a peak of ~28% in May 2023. The plateau is partly explained by research showing QUIC can lose up to ~45% throughput versus HTTP/2 at very high bandwidths (>500 Mbps on fiber). WebSocket-over-HTTP/3 is rolling out gradually in tandem.
WebTransport: The Next API
WebTransport is a newer API, shipping in browsers since the early 2020s and still progressing through W3C standardization, that runs natively on HTTP/3 and QUIC. It provides bidirectional streams with optional reliability: datagrams can be sent without delivery guarantees (perfect for game state, where stale data is worse than missing data), or as reliable streams for important messages.
WebTransport is the emerging successor for some WebSocket use cases — game engines, low-latency AI agent video streams, high-frequency telemetry — where WebSockets' TCP-based reliability is sometimes too strong a guarantee.
🏭 The Real-Time Infrastructure Economy
WebSockets standardized in 2011 created the demand. A whole infrastructure economy emerged to serve it:
| Year | Product | What it provides |
|---|---|---|
| 2010 | Pusher | Hosted Pub/Sub WebSocket service |
| 2011 | Socket.IO | JavaScript library that auto-falls-back from WebSockets to long-polling |
| 2014 | PubNub | Real-time messaging-as-a-service |
| 2015 | Ably | Real-time messaging-as-a-service for enterprises |
| 2017 | Cloudflare Workers WebSockets | Real-time at the edge |
| 2019 | Soketi | Open-source Pusher-protocol-compatible server |
| 2020 | Liveblocks | Real-time collaboration sync engine |
| 2021 | PartyKit | Sync engine on Cloudflare Workers |
| 2022 | Replicache | Client-side sync framework (server-reconciliation model) |
| 2023+ | AI agent streaming infra | LiteLLM gateways, OpenAI/Anthropic streaming SDKs |
These platforms turned "set up a WebSocket server" from a custom-engineering project into an API call. Many modern real-time apps (Linear early, Notion early, many indie tools) outsource sync to one of these.
🤖 Real-Time and AI Agents
The AI-agent era of 2024-2026 multiplied real-time web demand by a new factor: agents stream output.
When you watch an AI agent compose a response in ChatGPT, Claude, Taskade EVE, or any modern AI tool, you're seeing:
- A user prompt sent over WebSocket or HTTPS POST
- The server invokes the LLM
- The LLM generates tokens
- The server streams those tokens to the client over SSE as they're produced
- The client renders them character-by-character
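Steps 4 and 5 reduce to a small accumulator over the stream's parsed `data` payloads. The `[DONE]` sentinel follows OpenAI's streaming convention; the `token` field name here is illustrative, not any vendor's exact schema:

```javascript
// Accumulate streamed tokens into the final response text.
// Assumes each SSE `data` payload is a JSON object with a `token` field
// (illustrative schema, not any vendor's exact format) and that the stream
// ends with a `[DONE]` sentinel, following OpenAI's streaming convention.
function accumulateTokens(dataPayloads) {
  let text = '';
  for (const data of dataPayloads) {
    if (data === '[DONE]') break;   // end-of-stream marker
    const event = JSON.parse(data);
    text += event.token ?? '';      // render each token as it arrives
  }
  return text;
}

console.log(accumulateTokens(['{"token":"Hel"}', '{"token":"lo"}', '[DONE]']));
// → Hello
```

A real client runs this incrementally, appending to the DOM per chunk rather than waiting for the full array, which is what produces the "typing" effect.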
This pattern is now ubiquitous. Every AI tool uses some flavor of it. The streaming UX — watching the AI "think out loud" — is one of the most consequential UX shifts of the 2020s, and it would be impossible without the real-time web substrate.
For collaborative AI — humans and agents editing the same document together — SSE isn't enough. You need bidirectional, which means WebSockets. Taskade Projects use WebSockets specifically because EVE the meta-agent and Custom AI Agents need to both send edits to the document AND receive edits from humans simultaneously.
🧬 Taskade: The Real-Time Workspace Built on This Substrate

Genesis Capability Map — From the May 2026 Newsletters
| Newsletter chapter | What it ships |
|---|---|
| Workspace Memory · Mind Graph | Workspace-scoped knowledge graph |
| Agent Workflows · Tools Wired | 22+ built-in agent tools + 100+ bidirectional integrations |
| App Payments · Stripe Live | Native Stripe Checkout actions inside Genesis Apps |
| Frontier Models · Auto-Routed | Frontier models from OpenAI, Anthropic, Google + open-weight |
| Embed Apps · Anywhere | Genesis Apps embed as responsive widgets |
| Clone Apps · Instantly | 150,000+ apps in the Community Gallery; clone in 60 seconds |
Plus vibe coding · vibe payments · vibe workflows · vibe marketing · vibe tracking and MCP both sides (Taskade-as-Server + Taskade-as-Client). 198 platform releases in 2026.
Taskade's real-time architecture rides on the entire stack we've walked through:
What rides on this substrate:
- Real-time multi-cursor editing — humans typing into the same Project see each other's keystrokes in milliseconds
- AI agent live edits — Custom Agents v2 and EVE the meta-agent write to the same Project stream as humans
- Automation events — the 100+ bidirectional integrations push events into the Project as they arrive (Slack message lands, Stripe payment completes, GitHub PR merges)
- AI agent streaming — when an agent composes a response, you see tokens stream in via SSE
- Mind Graph live updates — Workspace DNA visualization redraws as the graph changes
- Vibe coding live builds — when Genesis is generating your app, you watch it construct in real time
This is what Taskade ships: a workspace where humans, AI agents, automations, and external integrations all converge on one real-time substrate. WebSockets and SSE under the hood, OT engine on top of that, Workspace DNA loop on top of that, Genesis app builder on top of that.
150,000+ apps built since launch. 3M+ automations executed. All on the WebSocket-based real-time substrate that started with one IETF working group standardizing RFC 6455 fifteen years ago.

🥊 Real-Time Backbones Compared: Taskade Genesis vs Discord vs Slack vs Liveblocks
Different products spend their WebSocket budget on different problems. Five-column read of where each backbone invests:
| Capability | Taskade Genesis | Discord | Slack | Liveblocks | Google Docs |
|---|---|---|---|---|---|
| Bidirectional WebSocket sync | Yes (ot-json0) | Yes (Elixir/BEAM) | Yes (custom) | Yes (sync-engine SDK) | Yes (Drive RPC) |
| AI agent streaming on same channel | Yes — humans + AI Agents v2 + EVE on one OT stream | No (separate bots API) | Limited (slash bots) | Bring-your-own | No |
| 100+ bidirectional integrations | Yes — triggers pull events in, actions push data out | Webhook-only | App directory | n/a | n/a |
| Live app generation over the wire | Yes (Genesis vibe coding) | No | No | No | No |
| Built-in OT/CRDT engine | OT (ot-json0, Wave lineage) | n/a | n/a | OT + CRDT optional | OT |
| Per-document version playback | Yes (project history) | n/a | Yes (channels) | Yes | Yes |
| Free tier | Yes — Free $0, Starter $6/mo | Free | Free | Developer SDK | Free with Google account |
Taskade is the only row that ships humans + AI agents + automations all editing the same document over one real-time substrate — the WebSocket layer carries Operational Transform changesets from a typist, an AI Agent v2 tool-call, and a Stripe-trigger automation alike.
🔮 What's Next
The 2026-2030 real-time web roadmap:
| Trend | Effect |
|---|---|
| HTTP/3 + QUIC default | Faster initial connections, especially mobile; smoother network handoffs |
| WebTransport adoption | New API for low-latency unreliable streams (game state, video frames) |
| WebCodecs + WebGPU streams | Real-time AI video generation streaming into browsers |
| AI agent collaboration protocols | Standards for agent-to-agent real-time coordination on top of MCP |
| Edge real-time | Sync engines running at the CDN edge, not central servers |
| Local-first hybrids | CRDT state on device + WebSocket sync to server for AI agent collaboration |
The endpoint: a web where every user has multiple AI agents running in real-time alongside them, every workspace is multiplayer by default, and the real-time substrate is invisible infrastructure.
🔗 Further Reading
- History of Real-Time Collaboration: From Engelbart to AI Agents — the application-side history
- OT vs CRDT: The Two Algorithms Behind Every Real-Time App — sync engine deep dive
- History of CRDTs — academic side
- Google Wave Lessons — first big production OT engine
- History of Mermaid.js — developer tooling lineage
- What Is Taskade? Complete History — the real-time workspace
- RFC 6455 — The WebSocket Protocol
- RFC 9114 — HTTP/3
- WebTransport API
❓ Frequently Asked Questions
Are WebSockets still the best choice in 2026?
For most bidirectional real-time use cases on the web: yes. WebSockets are universally supported, well-understood, and have a mature ecosystem. For some workloads — particularly low-latency game state and AI agent video streams — WebTransport (over HTTP/3/QUIC) is becoming preferred. For one-way server-to-client streams, SSE is simpler and equally effective. For peer-to-peer audio/video, WebRTC.
How many WebSocket connections can one server handle?
A modern WebSocket server can typically handle 10,000-100,000 concurrent connections on standard hardware. With purpose-built infrastructure (Discord famously uses Elixir/Erlang to carry millions of concurrent connections), the ceiling goes much higher. The bottleneck is usually memory (each connection holds a small amount of buffer and session state) and the OS file-descriptor limit.
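A back-of-envelope memory budget makes the bottleneck concrete. The per-connection figure below is an illustrative assumption; real numbers depend on kernel socket buffers, TLS state, and application session data:

```javascript
// Rough memory budget for a WebSocket server. The 32 KiB per-connection
// figure is an illustrative assumption covering kernel buffers, TLS state,
// and application session state; real values vary widely by stack.
const connections = 100_000;
const perConnBytes = 32 * 1024;
const totalGiB = (connections * perConnBytes) / (1024 ** 3);
console.log(totalGiB.toFixed(2)); // ≈ 3.05 GiB before any application data
```

Remember to raise the file-descriptor limit too (e.g. `ulimit -n` on Linux); the default of 1,024 open files caps a naive deployment long before memory does.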
What's the difference between Socket.IO and raw WebSockets?
Socket.IO is a JavaScript library that auto-falls-back from WebSockets to long-polling and other transports when WebSockets aren't available (some corporate proxies, very old browsers, restrictive networks). It also adds features like automatic reconnection, rooms, namespaces, and acknowledgment-based messages. For new projects in 2026 where you can assume WebSocket support, raw WebSockets are simpler. For maximum compatibility, Socket.IO is still useful.
Do AI agents really need WebSockets?
For streaming output (the user watches the agent compose its response), Server-Sent Events (SSE) is usually sufficient — one-way server-to-client. For collaborative AI (humans and agents editing the same document together, as in Taskade Projects), WebSockets are required for the bidirectional capability. Most modern AI products use both — SSE for chat-style streaming, WebSockets for collaboration.
Where can I see real-time WebSocket-powered AI in action?
Try Taskade Genesis at /create — the free tier lets you build an app from a prompt and watch the build happen in real time over WebSockets, then collaborate with AI agents on it. 100+ bidirectional integrations, native Stripe Checkout, MCP support, 22+ built-in agent tools. 150K+ apps built since launch.