
When AI Agents Join Your Multiplayer Document: The OT Challenge Nobody Talks About (2026)

What happens when AI agents edit the same document as human collaborators? The OT challenges of agent-human multiplayer editing, solved.

April 18, 2026 · 21 min read · Stan Chang · AI · #engineering #multiplayer #operational-transform

I have been building multiplayer editing systems for nearly a decade. Our Operational Transform engine has been in production since 2017, syncing cursors and merging edits across millions of documents. It was designed for humans.

Then we added AI agents.

Everything broke.

TL;DR: Real-time collaboration engines (OT/CRDT) were designed for human editors who type slowly and make small changes. AI agents generate content in bulk — hundreds of nodes in a single operation. This 1,000x velocity mismatch creates novel OT challenges: transform overload, cursor displacement, and UX disruption. Here is how we solved agent-human multiplayer editing at Taskade. Try it free.


🔍 The Problem Nobody Talks About

Most teams that add AI to their document editors treat the AI as a sidecar. The AI generates content in a modal, the user clicks "Insert," and the generated text replaces whatever was there before. It is a paste operation. The AI never touches the collaboration engine.

That is fine if only one person uses the document. But what happens when three people are editing a project at the same time, and an AI agent starts generating content into the same document?

The agent is not pasting. The agent is editing — inserting nodes, restructuring sections, creating subtasks — while humans are typing in the same tree. The agent needs to go through the same collaboration engine that humans use. The same OT engine. The same conflict resolution.

Nobody has written about this problem because almost nobody does it this way. Most AI integrations bypass the collaboration layer entirely. We made a different choice. Our AI agents are first-class participants in the multiplayer session. They use the same edit channel as human collaborators.

That decision created engineering challenges that no textbook covers.

⚡ The 1,000x Velocity Mismatch

Here is the core of the problem, expressed as a single number: 1,000x.

Humans type at roughly 5 characters per second — about 50 words per minute on a good day. AI agents generate content at approximately 5,000 tokens per second. That is a 1,000x difference in edit velocity.

OT was built for the human side of that equation. When two humans edit the same document, conflicts are rare. One person types in paragraph three while another edits paragraph seven. Their edits barely overlap. The OT transform computation is trivial — a handful of operations per second, each transforming against at most a few concurrent operations.

Now introduce an agent that generates 500 operations in a single burst. Every one of those operations must be transformed against any concurrent human operations. The transform matrix computation grows with the product of operation counts. A few human ops against a few human ops is cheap. A few human ops against 500 agent ops is expensive. Multiple agents plus multiple humans is a combinatorial storm.

┌─────────────────────────────────────────────────────────────────┐
│              VELOCITY MISMATCH: HUMAN vs AI AGENT               │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Human Editor          AI Agent                                 │
│  ─────────────         ─────────────                            │
│  t=0:  "H"             t=0: (processing)                       │
│  t=1:  "He"            t=1: (processing)                       │
│  t=2:  "Hel"           t=2: (processing)                       │
│  t=3:  "Hell"          t=3: (processing)                       │
│  t=4:  "Hello"         t=4: ┌──────────────────────────┐       │
│  t=5:  "Hello "        t=4: │ "Hello World, here is    │       │
│  t=6:  "Hello W"       t=4: │  the complete text with  │       │
│  t=7:  "Hello Wo"      t=4: │  all the content that    │       │
│  ...                    t=4: │  was generated in one    │       │
│  t=20: "Hello World"   t=4: │  single burst..."        │       │
│                         t=4: └──────────────────────────┘       │
│                                                                 │
│  20 operations            1 MASSIVE operation                   │
│  (incremental)            (bulk insertion)                      │
│  ~5 chars/sec             ~5,000 tokens/sec                     │
│                                                                 │
│  Velocity Ratio: 1,000x                                        │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

The table below summarizes how human-human editing differs from agent-human editing across every dimension that matters for OT:

Challenge          | Human-Human OT             | Agent-Human OT
Edit velocity      | ~5 chars/sec               | ~5,000 tokens/sec
Operation size     | 1-10 chars                 | 50-500 nodes
Edit pattern       | Incremental, sequential    | Bulk, burst
Cursor behavior    | Smooth, predictable        | Jumps, displaced
Undo expectation   | "Undo my last keystroke"   | "Undo the agent's entire contribution"
Conflict frequency | Rare (different locations) | Common (agent touches many areas)
Transform cost     | O(n) where n is small      | O(n*m) where m is large

Every row in that table represents an assumption that broke when we added agents.
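The transform-cost row is easy to see with a back-of-envelope sketch (the counts below are illustrative, not our engine's actual accounting): each local operation must be transformed against every concurrent remote operation, so the work grows with the product of the two sides' operation counts.

```typescript
// Illustrative only: each local op transforms against every concurrent
// remote op, so the cost grows with the product of the two counts.
function pairwiseTransforms(localOps: number, remoteOps: number): number {
  return localOps * remoteOps;
}

// Two humans exchanging a handful of concurrent ops: cheap.
const humanHuman = pairwiseTransforms(5, 5);

// The same human concurrent with a 500-op agent burst: two orders of
// magnitude more transform work for the same human activity.
const humanAgent = pairwiseTransforms(5, 500);
```

The same five human operations cost 100x more to merge when the other participant is a bursting agent rather than another human — which is exactly why the solutions below focus on shrinking the agent's concurrent window.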

🧬 Why Agents Must Go Through OT

Before diving into solutions, I want to explain why we did not take the easy path. There is a simpler approach: let the agent generate content, then replace the document. No OT required. Several well-known productivity tools do exactly this.

We rejected that approach for four reasons:

  1. Users are editing simultaneously. Replacing the document destroys whatever a human is typing at that moment. In a multiplayer workspace, three people might be working on the same project while an agent generates content. A replace operation would vaporize their in-flight edits.

  2. OT preserves intent. By routing agent edits through OT, the agent's changes and the human's changes merge correctly. The agent inserts a paragraph above while the human edits a paragraph below — both changes survive. This is the whole point of real-time collaboration.

  3. Undo/redo must work for agents too. If agents bypass OT, there is no clean way to undo what an agent did. By using the same system, the same undo/redo primitives apply to agent operations as human operations.

  4. Presence indicators require it. Users can see the agent's cursor in the document. They can see where the agent is editing and which section it is working on, in real time. This transparency is only possible if the agent is a genuine OT participant.

The design principle we settled on: agents are participants, not administrators. They use the same edit channel as humans.

That principle sounds clean. Implementing it was not.

🔧 Solution 1: Operation Chunking

The first problem we tackled was bulk operations. An AI agent generating a section of content might produce 50, 100, or even 500 OT operations in a single burst. If all 500 operations arrive at the server in one batch, several bad things happen:

  • The OT transform computation spikes. Every concurrent human operation must be transformed against all 500 agent operations.
  • Other clients stall while processing the massive changeset.
  • The human user sees 50 new nodes appear instantaneously — jarring and disorienting.

Our solution: operation chunking. Instead of sending all operations at once, the agent's output is broken into batches of 10-20 operations with brief pauses between batches.

┌──────────────────────────────────────────────────────────────────┐
│                    OPERATION CHUNKING FLOW                        │
├──────────────────────────────────────────────────────────────────┤
│                                                                  │
│  WITHOUT CHUNKING:              WITH CHUNKING:                   │
│  ┌──────────┐                   ┌──────────┐                     │
│  │██████████│ 500 ops           │██        │ Batch 1 (15 ops)    │
│  │██████████│ in ONE            │          │ pause               │
│  │██████████│ burst             │██        │ Batch 2 (15 ops)    │
│  │██████████│                   │          │ pause               │
│  │██████████│                   │██        │ Batch 3 (15 ops)    │
│  └──────────┘                   │          │ pause               │
│       ↓                         │██        │ Batch 4 (15 ops)    │
│  OT transform                   │  ...     │ ...                 │
│  OVERLOAD                       └──────────┘                     │
│  Cursor jumps                        ↓                           │
│  UI freezes                     OT transforms manageable         │
│                                 Humans see incremental updates   │
│                                 Cursor shifts are small           │
│                                                                  │
└──────────────────────────────────────────────────────────────────┘

The effect is dramatic. Without chunking, the document feels "possessed" — content appears out of nowhere, cursors teleport, and the user loses their place. With chunking, the agent feels like a fast collaborator. You can see it typing, section by section, just like watching a colleague work in the same document.

We also apply operation coalescing where possible. If the agent inserts three consecutive nodes, we can sometimes combine those into a single compound operation. Fewer operations means fewer transforms, which means less computational overhead and less cursor disruption.

The chunking strategy we settled on is adaptive:

Scenario                                  | Batch Size      | Delay Between Batches
Agent editing alone (no humans present)   | Larger batches  | Minimal delay
Agent + 1 human editing                   | Medium batches  | Short delay
Agent + multiple humans editing           | Smaller batches | Longer delay
Human actively typing in the same section | Pause agent ops | Resume when quiet

The batch size adapts to the level of human activity. When nobody else is in the document, the agent can move fast. When three people are editing, the agent slows down and yields.
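Putting the table together, the adaptive policy can be sketched roughly like this (TypeScript; the batch sizes, delays, and function names are illustrative, not Taskade's actual values):

```typescript
// A sketch of adaptive chunking. Agent output is split into batches whose
// size and pacing depend on how many humans are currently editing.
interface ChunkPolicy {
  batchSize: number; // ops per batch; 0 means the agent is paused
  delayMs: number;   // pause between batches
}

function policyFor(humansEditing: number, humanTypingNearby: boolean): ChunkPolicy {
  if (humanTypingNearby) return { batchSize: 0, delayMs: 0 };   // yield entirely
  if (humansEditing === 0) return { batchSize: 50, delayMs: 20 };
  if (humansEditing === 1) return { batchSize: 15, delayMs: 100 };
  return { batchSize: 8, delayMs: 250 };
}

// Split a burst of agent ops into batches under the current policy.
function chunk<T>(ops: T[], policy: ChunkPolicy): T[][] {
  if (policy.batchSize === 0) return []; // paused; ops stay queued upstream
  const batches: T[][] = [];
  for (let i = 0; i < ops.length; i += policy.batchSize) {
    batches.push(ops.slice(i, i + policy.batchSize));
  }
  return batches;
}
```

Under this sketch, a 500-op burst with two humans present becomes 63 batches of at most 8 ops, each followed by a pause, instead of one 500-op changeset.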

🎯 Solution 2: Cursor Anchoring

The second problem was cursor displacement. When an AI agent inserts 50 nodes above the human's cursor, the cursor's absolute position shifts by 50. From the human's perspective, their cursor jumps to a completely different part of the document. In a normal human-human editing session, this barely happens — another person inserts one line, your cursor shifts by one line, you barely notice. An agent inserting 50 nodes is 50x that shift, and it is deeply disorienting.

Our solution: cursor anchoring.

Instead of tracking cursor position as an absolute index ("cursor is at node 47"), we track it relative to nearby content ("cursor is inside the node that contains the text 'quarterly revenue'"). When agent operations shift the document structure, we recompute the absolute position to preserve the relative position. The human's view stays stable.

┌──────────────────────────────────────────────────────────────────┐
│                      CURSOR ANCHORING                            │
├──────────────────────────────────────────────────────────────────┤
│                                                                  │
│  BEFORE agent inserts 50 nodes:                                  │
│  ┌────────────────────────────┐                                  │
│  │ Node 1: Introduction       │                                  │
│  │ Node 2: Background         │                                  │
│  │ Node 3: "quarterly rev..." │ ← Human cursor HERE (abs: 3)    │
│  │ Node 4: Conclusion         │                                  │
│  └────────────────────────────┘                                  │
│                                                                  │
│  Agent inserts 50 nodes between Node 1 and Node 2.              │
│                                                                  │
│  WITHOUT anchoring:             WITH anchoring:                  │
│  ┌──────────────────────┐       ┌──────────────────────┐        │
│  │ Node 1: Introduction │       │ ...                  │        │
│  │ Node 2-51: (agent)   │       │ ...agent content...  │        │
│  │ Node 52: Background  │       │ ...                  │        │
│  │ Node 53: "quarterly" │       │ "quarterly rev..."   │        │
│  │ Node 54: Conclusion  │       │   ↑ cursor STABLE    │        │
│  └──────────────────────┘       └──────────────────────┘        │
│  Cursor now at abs: 3           Cursor recalculated to          │
│  which is AGENT content!        abs: 53 → same content          │
│  User is LOST.                  User sees no disruption.        │
│                                                                  │
└──────────────────────────────────────────────────────────────────┘

The key insight is that absolute position tracking breaks under bulk insertions. Content-relative anchoring survives them. This is not a concept from any OT textbook — we had to invent it because the problem did not exist before agents.

There is a subtlety worth noting. Anchoring is not free. Every time agent operations arrive, the system must walk the document tree to find the anchor node and recompute positions. For a handful of agent ops, this is trivial. For hundreds of agent ops arriving in rapid succession, the recomputation cost adds up. This is another reason chunking matters — it limits how many recomputations happen per interval.
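In sketch form, anchoring looks roughly like this (TypeScript, simplified to a flat node list rather than our document tree; the names and structure are illustrative):

```typescript
// A sketch of content-relative cursor anchoring: the cursor is stored as
// a node ID plus text offset, not an absolute index, and is re-resolved
// after remote (agent) ops shift the document.
interface DocNode { id: string; text: string }
interface Anchor { nodeId: string; offset: number }

// Capture the cursor as an anchor before remote ops apply.
function anchorCursor(doc: DocNode[], absIndex: number, offset: number): Anchor {
  return { nodeId: doc[absIndex].id, offset };
}

// Resolve the anchor back to an absolute index after the shift.
// Falls back to the end of the document if the anchor node was deleted.
function resolveAnchor(doc: DocNode[], anchor: Anchor): number {
  const idx = doc.findIndex(n => n.id === anchor.nodeId);
  return idx >= 0 ? idx : doc.length - 1;
}

// The scenario from the diagram: cursor in node 3, agent inserts 50 nodes
// between nodes 1 and 2.
const doc: DocNode[] = [
  { id: "n1", text: "Introduction" },
  { id: "n2", text: "Background" },
  { id: "n3", text: "quarterly revenue..." },
  { id: "n4", text: "Conclusion" },
];
const anchor = anchorCursor(doc, 2, 10);
const inserted = Array.from({ length: 50 }, (_, i) => ({ id: `a${i}`, text: "agent" }));
const after = [doc[0], ...inserted, ...doc.slice(1)];
```

Resolving the anchor against `after` lands on the same "quarterly revenue" node at its new absolute index, so the user's view never moves.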

🔄 Solution 3: Priority Queuing

When a human and an agent submit operations at the same time, whose operations get processed first?

In standard OT, operations are processed in the order they arrive at the server. First come, first served. This is fair — but fairness is the wrong goal when one participant generates 1,000x more operations than the other.

We implemented priority queuing: human operations always process before agent operations. If a human is actively typing, the agent's operations queue behind the human's operations, no matter when they arrived.

The trade-off is explicit: agents feel slightly slower, but humans never experience lag. We chose human UX over agent throughput, and we would make the same choice again.

Here is why. A 200-millisecond delay in agent output is invisible to the user — the agent is still generating content faster than anyone can read. But a 200-millisecond stutter in the human's typing experience feels broken. The cursor hitches. Characters appear late. The user thinks the editor is lagging. That is unacceptable in a real-time collaboration tool.

Priority queuing also has a second benefit: it reduces transform complexity. When human operations are processed first, agent operations transform against a known base state rather than a mixed state of interleaved human and agent ops. The transform computation is cleaner and cheaper.
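A minimal sketch of the scheduling rule (the class and field names are illustrative; a real scheduler also handles transforms and acknowledgements):

```typescript
// Human-first scheduling: two FIFO queues, and the scheduler drains every
// pending human op before it takes a single agent op.
type Source = "human" | "agent";
interface Op { source: Source; payload: string }

class PriorityScheduler {
  private humanQ: Op[] = [];
  private agentQ: Op[] = [];

  submit(op: Op): void {
    (op.source === "human" ? this.humanQ : this.agentQ).push(op);
  }

  // Humans always win, regardless of arrival order. Agent ops then
  // transform against a cleaner base state, since all concurrent human
  // ops have already been applied.
  next(): Op | undefined {
    return this.humanQ.shift() ?? this.agentQ.shift();
  }
}
```

Even if two agent operations arrived first, a human keystroke submitted afterward is processed ahead of both.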

"Multiplayer was built for humans typing at 5 characters per second. AI agents generate at 5,000 tokens per second. That's a 1,000x difference in edit velocity. We chose to protect the human side of that equation." — Taskade Engineering

🔀 Solution 4: The Undo Challenge

Undo is a solved problem for human-human editing. Each user has their own undo stack. "Undo" means "undo MY last change." The semantics are clear.

Agent-human undo is different. When a user says "undo what the agent did," they mean something fundamentally new:

  • The agent's contribution might be 200 operations.
  • Those operations might span 15 different sections of the document.
  • Standard OT undo stacks work per-operation. Undoing 200 operations one at a time is not what the user wants.

We needed per-intent undo, not per-operation undo. The user wants to undo the agent's intent — "generate subtasks for this section" — as a single action, regardless of how many OT operations it took.

Our solution has three layers:

  1. Operation provenance tagging. Every operation is tagged with its source: a human user ID or an agent ID. This is metadata that standard OT does not carry.

  2. Logical unit grouping. Agent operations that belong to the same intent are grouped into a logical unit. "Create five subtasks" is one logical unit even though it generates 15 OT operations (five node insertions, five content sets, five formatting operations).

  3. Batch undo. "Undo agent changes" reverses the entire logical unit. The system computes the inverse of each operation in the unit, applies them in reverse order, and the document returns to its pre-agent state.

This is conceptually simple but operationally tricky. The inverse of an insert is a delete — but what if a human edited the agent's inserted content between the insert and the undo? The inverse must account for intermediate transforms. We track a full operation lineage to handle this correctly.

No open-source OT library provides per-intent undo out of the box. This is custom infrastructure that only becomes necessary when non-human participants join the editing session.
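The three layers can be sketched as follows (TypeScript; the tags, IDs, and helper are illustrative, and the intermediate-transform lineage described above is elided):

```typescript
// Per-intent undo sketch: every op carries provenance (layer 1) and an
// intent ID grouping it into a logical unit (layer 2); batch undo emits
// inverses of the whole unit in reverse order (layer 3).
interface TaggedOp {
  source: string;     // e.g. "user:42" or "agent:writer"
  intentId: string;   // groups all ops generated for one agent intent
  payload: string;
  invert(): TaggedOp; // e.g. the inverse of an insert is a delete
}

function undoIntent(log: TaggedOp[], intentId: string): TaggedOp[] {
  // Collect the logical unit, then emit inverses in reverse order.
  // (A real engine must also transform each inverse against any human
  // ops that landed after the unit; that lineage tracking is elided.)
  return log
    .filter(op => op.intentId === intentId)
    .reverse()
    .map(op => op.invert());
}

// Illustrative op constructor for the example below.
function insertOp(source: string, intentId: string, node: string): TaggedOp {
  return {
    source,
    intentId,
    payload: `insert:${node}`,
    invert() {
      return { ...this, payload: `delete:${node}` };
    },
  };
}
```

Undoing intent "intent-1" reverses only the agent's logical unit, leaving the interleaved human operation untouched.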

🤖 Agent Presence: Showing the Agent in the Document

When an AI agent is editing a Taskade project, it appears in the presence bar alongside human collaborators. It has a cursor. It has a name. Users can see exactly where in the document the agent is working.

This might seem like a polish feature. It is not. It is critical.

In our early prototypes, agents edited documents without visible presence. The user would be typing in section two while, somewhere off-screen, section five was changing. Nodes appeared. Text shifted. Users described the experience as "the document is changing by itself." Several testers used the word "unsettling."

The fix was transparency. When users can see the agent's cursor in section five, they understand what is happening. The document is not haunted — a collaborator is working. The agent cursor uses a distinct visual style (a different color and icon) so users instantly distinguish agent activity from human activity.

Agent presence also enables a powerful interaction: the user can watch the agent work. In Taskade's AI agent system, you can see the agent building out a section in real time, node by node (thanks to operation chunking), with its cursor moving through the content just like a human collaborator. This builds trust. The user feels in control because they can see what is happening and intervene at any time.

The "Pause Agent" button is the second most-used agent control after "Undo Agent Changes." Users need to feel they can stop the agent at any moment. Visible presence makes that control discoverable and intuitive.
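As a rough sketch, the presence model needs little more than one extra field to support all of this (the field names are illustrative):

```typescript
// A presence record that treats agents as regular participants while
// letting the UI render them distinctly and expose a pause control.
interface Presence {
  participantId: string;
  kind: "human" | "agent";
  cursorNodeId: string; // where in the document the cursor currently sits
  color: string;        // agents get a distinct visual style in the UI
  paused?: boolean;     // agents expose a pause control; humans do not
}

function renderLabel(p: Presence, name: string): string {
  return p.kind === "agent"
    ? `[agent] ${name}${p.paused ? " (paused)" : ""}`
    : name;
}
```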

📊 Rate Limiting and Throttling

Even with chunking, priority queuing, and cursor anchoring, there is a hard ceiling on how many operations the OT engine should process per second. We enforce rate limits on agent operations to prevent overload.

The throttling system works at two levels:

Per-agent throttling. Each AI agent has a maximum operations-per-second budget. If the agent tries to exceed its budget, excess operations are queued and drip-fed at the allowed rate.

Burst detection. If an agent attempts to apply a very large number of operations in a short window, the system detects the burst and applies progressive backpressure — increasing delays between batches until the operation rate stabilizes.

The goal is never to prevent agents from working. The goal is to keep the OT engine's transform computation within bounds so that human operations are never delayed. Think of it as traffic shaping: agent traffic is shaped to coexist with human traffic without causing congestion.
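Per-agent throttling is essentially a token bucket. A sketch, with illustrative numbers rather than our production budgets:

```typescript
// Token-bucket throttle: each agent has an ops-per-second budget plus a
// burst capacity. Ops beyond the budget are not rejected; they stay
// queued upstream and are drip-fed as tokens refill.
class AgentThrottle {
  private tokens: number;

  constructor(
    private readonly opsPerSecond: number,
    private readonly burstCapacity: number,
  ) {
    this.tokens = burstCapacity;
  }

  // Refill proportionally to elapsed time, capped at burst capacity.
  refill(elapsedMs: number): void {
    this.tokens = Math.min(
      this.burstCapacity,
      this.tokens + (elapsedMs / 1000) * this.opsPerSecond,
    );
  }

  // How many of `requested` ops may be sent now; the rest stay queued.
  admit(requested: number): number {
    const granted = Math.min(requested, Math.floor(this.tokens));
    this.tokens -= granted;
    return granted;
  }
}
```

A 500-op burst against a bucket with capacity 100 admits only 100 ops immediately; the remainder drains at the sustained rate as the bucket refills, which is exactly the traffic-shaping behavior described above.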

🧪 Testing: The Combinatorial Explosion

Testing agent-human OT interactions is a combinatorial challenge. Every agent operation type must be tested against every human operation type, across every possible document state.

In our system, the core operation types include: insert node, delete node, set content, move node, indent, outdent, set properties, and mark complete. That is 8 operation types. Agent-vs-human means 8 x 8 = 64 pairwise test cases — just for single-operation conflicts.

When you factor in multi-operation sequences, concurrent edits at different positions, and the interaction between chunking, priority queuing, and anchoring, the test space explodes.

Test Dimension    | Cases  | Description
Operation pairs   | 64     | 8 agent op types x 8 human op types
Position variants | 3      | Same node, adjacent, distant
Concurrency       | 4      | Sequential, overlapping, simultaneous, burst
Chunking states   | 3      | Mid-batch, between batches, no chunking
Priority states   | 2      | Human active, human idle
Multi-agent       | 3      | 1 agent, 2 agents, 3+ agents
Total matrix      | ~4,600 | Combinatorial product

We do not test all 4,600 cases exhaustively on every commit. We maintain a priority tier:

  • Tier 1 (every commit): All 64 operation pairs at same-node position with simultaneous concurrency. These are the most likely conflicts.
  • Tier 2 (nightly): Full position and concurrency matrix for the top 10 most common operation pairs.
  • Tier 3 (weekly): Full combinatorial matrix including multi-agent scenarios.

The most important lesson from our test suite: the bugs that matter are not in the OT transform logic itself. The transform functions are well-studied algorithms. The bugs live in the interaction between chunking, priority queuing, and anchoring. A batch boundary that falls in the wrong place. A priority inversion when two agents and a human submit ops in a specific order. A cursor anchor that resolves to a deleted node. These are the edge cases that only exist because agents participate in OT.

🛠️ Production Lessons

After running AI agents as first-class OT participants in production for over a year, we have accumulated hard-won lessons. Here are the five that saved us the most rearchitecture work.

1. Chunking Is Non-Negotiable

We tried shipping without chunking early on. The document felt "possessed." Content materialized in large blocks, cursors teleported, and users lost their place. Within a day of internal testing, we rolled it back. Chunking is not an optimization. It is a correctness requirement for the user experience.

2. Agents Should Edit Bottom-Up

When an agent inserts content into a section, the insertion point matters for cursor displacement. Inserting at the top of a section pushes every cursor in that section downward. Inserting at the bottom affects no existing cursors. We default agents to bottom-up insertion: the agent appends to the end of a section rather than prepending to the beginning. This single heuristic cut cursor displacement events by more than half.
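The arithmetic behind the heuristic is simple: a cursor is displaced only if the insertion lands at or before its position. A sketch with illustrative indices:

```typescript
// A cursor moves only when content is inserted at or before it, so
// appending at a section's end disturbs fewer cursors than prepending.
function displacedCursors(cursors: number[], insertAt: number): number {
  return cursors.filter(c => c >= insertAt).length;
}

// Illustrative document: three cursors, one section spanning nodes 4-12.
const cursors = [5, 9, 14];
const section = { start: 4, end: 12 };

// Prepend (top-down): every cursor in and below the section shifts.
const topDown = displacedCursors(cursors, section.start);

// Append (bottom-up): only cursors below the section shift.
const bottomUp = displacedCursors(cursors, section.end + 1);
```

In this toy example, top-down insertion displaces all three cursors while bottom-up displaces just one, which mirrors the reduction we observed in production.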

3. Feedback Loops Are Real

An AI agent generates content. The content creation triggers an automation. The automation invokes another agent. That agent generates more content, which triggers another automation. We discovered this feedback loop in production when a document grew by 300 nodes in 10 seconds with no human input.

We detect and break these cycles by tracking operation provenance chains: every operation records the chain of agents and automations that produced it. If an agent's output triggers an action that leads back to the same agent within a bounded number of hops, we break the cycle.
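
A minimal sketch of that check, with assumed shapes (the `provenance` field, actor id format, and hop budget are ours for illustration, not Taskade's schema):

```typescript
// Provenance-based cycle breaking (illustrative, assumed data shapes).
// Each op carries the chain of agent/automation ids that produced it.

const MAX_HOPS = 5; // assumed bound on provenance chain length

interface Op {
  provenance: string[]; // e.g. ["agent:writer", "automation:notify"]
}

// Returns the extended chain when `actorId` may act on this op's output,
// or null when doing so would revisit an actor or exceed the hop budget.
function extendProvenance(op: Op, actorId: string): string[] | null {
  if (op.provenance.includes(actorId)) return null; // cycle back to same actor
  if (op.provenance.length >= MAX_HOPS) return null; // runaway chain
  return [...op.provenance, actorId];
}
```

An automation triggered by an agent's output carries that agent in its chain, so any path that loops back to the same agent is rejected before it can fire.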

4. Users Want Control Above All Else

The two most-used AI agent controls in Taskade are "Pause Agent" and "Undo Agent Changes." Users want to feel they can stop and reverse agent actions at any time. Every feature we build for agent-human collaboration must preserve this sense of control. An agent that cannot be paused or undone is an agent users will not trust.

This aligns with what I have observed across the industry. Barry Zhang from Anthropic described the "cost of error and error discovery" as the key factor in agent system design. In a multiplayer document, OT convergence errors are both high-cost and hard to discover — the document looks fine on one client but has diverged on another. Visible agent presence and easy undo are our primary defenses against this failure mode.

5. Testing Is the Bottleneck

Writing agent-OT features takes a week. Testing them takes a month. The combinatorial explosion of operation types, document states, concurrency scenarios, and chunking boundaries means the test matrix is enormous. We invested heavily in property-based testing (generating random operation sequences and verifying convergence) and it has been worth every hour.
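
The shape of such a property test can be shown with a toy insert-only OT system (our simplification, not Taskade's engine): generate random concurrent insert pairs and verify the TP1 convergence property, that applying op A then the transform of B equals applying B then the transform of A.

```typescript
// Property-based convergence check on a toy insert-only OT system.
// Not Taskade's engine: a minimal model to show the testing pattern.

interface Ins { pos: number; text: string; site: number }

function apply(doc: string, op: Ins): string {
  return doc.slice(0, op.pos) + op.text + doc.slice(op.pos);
}

// Transform op `a` against concurrent op `b`; site id breaks position ties.
function transform(a: Ins, b: Ins): Ins {
  if (b.pos < a.pos || (b.pos === a.pos && b.site < a.site)) {
    return { ...a, pos: a.pos + b.text.length };
  }
  return a;
}

// TP1: both application orders must yield the same document.
function checkConvergence(doc: string, a: Ins, b: Ins): boolean {
  const left = apply(apply(doc, a), transform(b, a));
  const right = apply(apply(doc, b), transform(a, b));
  return left === right;
}

// Generate random concurrent insert pairs and verify convergence.
function runProperty(trials: number): boolean {
  for (let i = 0; i < trials; i++) {
    const doc = "x".repeat(1 + Math.floor(Math.random() * 20));
    const rand = (site: number): Ins => ({
      pos: Math.floor(Math.random() * (doc.length + 1)),
      text: String.fromCharCode(97 + Math.floor(Math.random() * 26)),
      site,
    });
    if (!checkConvergence(doc, rand(1), rand(2))) return false;
  }
  return true;
}
```

In practice a library such as fast-check generates the operation sequences and shrinks failing cases, but the core idea is the same: random inputs, a universally quantified convergence property, many trials.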

🔮 What Comes Next

Agent-human OT is a solved problem for us in the single-agent case. The next frontiers are harder.

Multi-agent editing. Multiple AI agents editing the same document simultaneously introduces agent-agent OT conflicts. Two agents inserting into the same section, each unaware of the other's intent, create merge scenarios that neither was designed for. We are actively working on multi-agent collaboration patterns that handle this gracefully.

Agent awareness of human editing. Today, agents are unaware of where humans are editing. An agent might insert content into the exact section a human is actively working on. We are building agent-side awareness so that agents can detect human editing activity and yield — pausing or redirecting their edits to avoid collision zones.

Suggested edits mode. Instead of agents directly modifying the document, they could propose changes that humans accept or reject — similar to the suggestions mode in Google Docs, but for agent-generated content. This shifts from "agent edits, human undoes" to "agent suggests, human approves."

Conflict-free agent zones. Designated sections of a document where agents can edit freely without OT overhead. If the agent is working in a section that no human is editing, the chunking and priority queuing overhead is unnecessary. Dynamic zone allocation based on human cursor positions could significantly improve agent throughput without sacrificing human UX.

🏗️ The Bigger Picture

The reason I wrote this post is that I believe multiplayer AI collaboration is the future of productivity tools, and almost nobody is talking about the engineering challenges underneath.

Most AI integrations today are single-player. You ask an AI to generate text, it produces output, you paste it in. That works for a solo writer. It breaks completely for a team of five people working on the same project with an AI agent assisting.

At Taskade, we have 7 project views — List, Board, Calendar, Table, Mind Map, Gantt, Org Chart — and AI agents can edit content in every single one of them through the same OT engine. The agent does not get a special path. It does not bypass collaboration. It participates.

Building that participation required us to solve problems that do not exist in any OT textbook:

  • Operation chunking to tame bulk agent edits
  • Cursor anchoring to prevent human cursor displacement
  • Priority queuing to protect human UX over agent throughput
  • Per-intent undo with operation provenance tracking
  • Rate limiting to prevent OT transform overload
  • Visible agent presence to maintain user trust

These are not academic exercises. They are production-tested solutions running across millions of documents in Taskade's workspace.

The Workspace DNA principle that guides our architecture — Memory feeds Intelligence, Intelligence triggers Execution, Execution creates Memory — depends on agents and humans coexisting in the same documents. OT is the layer that makes coexistence possible. Getting it right for AI agents was the hardest multiplayer engineering challenge we have faced.

If you are building AI into a collaborative editor and thinking about bypassing the collaboration engine — reconsider. Route your agents through OT. Make them real participants. It is harder. It is worth it.


Stan Chang is the CTO and Co-founder of Taskade. He has been building real-time collaboration systems since 2017. Reach him at @lxcid.

Want to see AI agents collaborate in real-time alongside your team? Try Taskade free — our AI agents work across all 7 project views, with full automation support and 100+ integrations.

Frequently Asked Questions

What happens when an AI agent and a human edit the same document?

In Taskade, AI agents participate in the same Operational Transform system as human editors. Agent edits are chunked into small batches, human operations are prioritized, and cursor anchoring prevents visual disruption. The result is seamless collaborative editing where agents and humans coexist in real-time.

Why do AI agents need Operational Transform instead of just replacing content?

Replacing content would destroy any edits humans are making simultaneously. By routing agent edits through OT, agent changes and human changes merge correctly, undo/redo works for both parties, and presence indicators show where the agent is editing. Agents are participants, not administrators.

How does Taskade prevent AI agents from disrupting human editors?

Taskade uses three techniques: operation chunking (breaking bulk agent edits into small batches), priority queuing (human ops always process first), and cursor anchoring (human cursors stay stable during agent edits). Agents are also rate-limited to prevent OT transform overload.

Can you undo AI agent changes in a collaborative document?

Yes. Taskade tags operations by source and groups agent operations into logical units. Users can undo the agent's entire contribution as a single action, rather than undoing individual operations one at a time. This requires operation provenance tracking beyond standard OT implementations.
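
A sketch of what that grouping looks like, with assumed field names (`source`, `intentId`, and the op shape are ours, not Taskade's schema): undoing an intent means inverting every op it produced, in reverse order, as one unit.

```typescript
// Per-intent undo grouping (illustrative op shape and field names).

interface TaggedOp {
  type: "insert" | "delete";
  pos: number;
  text: string;
  source: "human" | "agent";
  intentId: string; // one logical agent task, e.g. "summarize-section"
}

// Invert a single op: an insert is undone by a delete, and vice versa.
function invert(op: TaggedOp): TaggedOp {
  return { ...op, type: op.type === "insert" ? "delete" : "insert" };
}

// Collect the inverse ops for a whole agent intent, newest first,
// so the entire contribution can be undone as one atomic action.
function undoUnitFor(log: TaggedOp[], intentId: string): TaggedOp[] {
  return log
    .filter((op) => op.intentId === intentId)
    .reverse()
    .map(invert);
}
```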

What is the velocity mismatch problem in AI multiplayer editing?

Humans type at roughly 5 characters per second while AI agents generate at approximately 5,000 tokens per second. That is a 1,000x difference in edit velocity. Operational Transform conflict resolution was never designed for this speed differential, causing transform overload, cursor displacement, and UX disruption without specialized engineering.

What is operation chunking for AI agent edits?

Operation chunking breaks a large agent edit into smaller batches of 10 to 20 operations with brief delays between batches. Instead of receiving 500 operations in a single burst, the OT engine and human collaborators see incremental updates that feel collaborative rather than overwhelming.
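
The mechanics reduce to a simple batching loop. This is a minimal sketch of the pattern (the helper names are ours; the 10 to 20 batch size comes from the text above):

```typescript
// Split a bulk agent edit into batches of at most `size` operations.
function chunk<T>(ops: T[], size: number): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < ops.length; i += size) {
    batches.push(ops.slice(i, i + size));
  }
  return batches;
}

// Submit batches with a brief pause between them, so human collaborators
// see incremental updates instead of one 500-op burst.
async function submitChunked<T>(
  ops: T[],
  size: number,
  delayMs: number,
  send: (batch: T[]) => Promise<void>,
): Promise<void> {
  for (const batch of chunk(ops, size)) {
    await send(batch);
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
}
```

A 500-op edit with a batch size of 20 becomes 25 small transforms spread over time rather than one spike.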

How does cursor anchoring work in agent-human multiplayer editing?

Cursor anchoring tracks a human cursor position relative to nearby content rather than by absolute document position. When an agent inserts nodes above the cursor, the system recalculates the absolute position to preserve the relative position. The user's view stays stable even as content shifts around them.
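
The core idea can be sketched as follows, under assumptions of ours for illustration (node ids and a flat node-array document model are not necessarily Taskade's representation):

```typescript
// Relative cursor anchoring sketch: store the cursor as (node id, offset)
// instead of an absolute index, and recompute the absolute position
// after each remote edit.

interface Node { id: string }

interface Anchor {
  nodeId: string; // the node the cursor sits in
  offset: number; // position within that node
}

// Recompute the cursor's absolute node index after an agent edit.
// As long as the anchor node survives, insertions above it leave the
// cursor stable relative to its surrounding content.
function resolveAnchor(doc: Node[], anchor: Anchor): number | null {
  const idx = doc.findIndex((n) => n.id === anchor.nodeId);
  return idx === -1 ? null : idx; // null: the anchor node was deleted
}
```

If an agent inserts two nodes above the anchor, the resolved index shifts from 1 to 3, but the user's view of the anchored node does not move. The `null` case is the deleted-anchor edge case mentioned earlier, which needs an explicit fallback.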

Can multiple AI agents edit the same document simultaneously?

Yes. Taskade supports multi-agent collaboration where multiple AI agents and multiple humans edit the same document through the OT engine. Each agent is a first-class participant with its own cursor, operation queue, and presence indicator. The same chunking, priority, and anchoring techniques apply to agent-agent conflicts.


AI Agents in Multiplayer Docs: Solving the OT Challenge (2026) | Taskade Blog