HelixML

Giving Every Agent Its Own Desktop

Jun 3, 2026

I recorded my talk at Tessl AI DevCon on my laptop. Here's what I said about why agents need their own computers, why Claude Code made me stupider, and why multi-agent org charts devolve into corporate politics.

I gave a talk at Tessl's AI DevCon recently. I recorded it on my laptop because I was excited about what I was presenting and wanted to have a copy. Here it is, plus the written version below for people who'd rather read.

Thanks to Patrick Debois and the Tessl team for putting on a great event.

The thesis

thesis: all information work will become management of agents

All information work will eventually become about managing agents. Software engineering is where it started, because that's where AI came from. But everyone who moves information around for a living will end up interacting with agents one way or another.

I've been building toward this since 2023 at Helix, where we make it possible to run agents and LLMs on your own infrastructure. Late last year I got obsessed with making the snake eat its own tail: using the thing we were building to build itself. That journey broke a lot of my assumptions.

The rm -rf incident

the eight stages of AI adoption

There's a natural progression when you start using coding agents. You go from a single copilot suggestion, to running a CLI agent like Claude Code, to getting bored of babysitting one agent and thinking: what if I run five of these in parallel?

I tried that while rushing to prepare a customer presentation. I was up early on a Monday morning with five agents running against the same checkout. One of them decided to git stash the work that all the others were doing. The following day, a different agent ran rm -rf . on my git checkout.

This actually happened.

Give each agent its own computer

local vs. centralised agent infrastructure

You wouldn't hire a team of software developers and ask them all to share one laptop. So why would you do that with agents?

We run each agent in its own isolated Docker-in-Docker environment on Kubernetes. Each one gets its own filesystem, its own browser, its own everything. They can't trample each other's work. And because the agents live on centralized infrastructure rather than on individual developer machines, a globally distributed team can hand work off across time zones. When the sun sets in Tokyo and rises in London, a different human picks up exactly where the previous one left off.
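To make the isolation idea concrete, here is a minimal sketch that builds one Kubernetes Pod manifest per agent, each with its own scratch volume. It is not Helix's actual configuration: the image name, labels, mount path, and the `agent_pod_manifest` helper are all assumptions for illustration, and the sketch does not attempt the Docker-in-Docker or desktop streaming parts.

```python
import json


def agent_pod_manifest(agent_id: str,
                       image: str = "example.invalid/agent-desktop:latest") -> dict:
    """Build a Pod manifest that gives one agent a private workspace volume.

    The image name and labels are hypothetical placeholders.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": f"agent-{agent_id}",
            "labels": {"app": "agent-desktop", "agent-id": agent_id},
        },
        "spec": {
            "restartPolicy": "Never",
            "containers": [{
                "name": "desktop",
                "image": image,
                # Each Pod mounts its own volume, so no two agents share a checkout.
                "volumeMounts": [{"name": "workspace", "mountPath": "/workspace"}],
            }],
            # emptyDir is created per Pod and removed when the Pod goes away.
            "volumes": [{"name": "workspace", "emptyDir": {}}],
        },
    }


# Five agents, five separate Pods, five separate filesystems.
manifests = [agent_pod_manifest(str(i)) for i in range(5)]
print(json.dumps(manifests[0], indent=2))
```

The point of the design is that a stray git stash or rm -rf can only damage the workspace of the agent that ran it.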

The IDE is not dead

do we still need an IDE?

Claude Code got us out of the habit of looking at the code. I genuinely feel like it made me stupider.

Did you know that Claude Code is implemented in React? They've ended up building a web browser inside the terminal. Can't we just have a working text box?

You still need a visual IDE. We embed Zed inside each agent's desktop because it's fast and memory-efficient, which matters when you're running hundreds of these on a machine. Watching the agent navigate between files gives you ambient knowledge of the codebase. You absorb context just by watching. And when the agent needs your help, you've already got a proper IDE to search around in.

You also need what I call the "meta IDE": the control plane that gives you a higher-level view over all your running agents. The inner IDE is Zed. The outer IDE is the kanban board and spec review interface that lets you manage the fleet.

Agents devolve into enterprise politics

task scaling vs role shapes

We experimented with org-chart-style agent hierarchies. You hire a CEO agent, they hire a VP of Engineering agent, the VP hires engineer agents, you give them names, you set up communication channels between them. Agent Slack, basically.

What happened was genuinely funny and also expensive. The agents devolved into enterprise politics. They were trained on human data about how humans argue with each other about stupid stuff, and that's exactly what they did. We burned a lot of tokens on corporate infighting between imaginary executives.

The pragmatic middle ground is coarse role categories (marketing agents, engineering agents, sales agents), because each category needs different tools and system access. But within each category, you scale by task, not by org chart: a pool of worker bees dispatched from a kanban board.
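As a rough sketch of "scale by task, not by org chart", the snippet below assigns kanban cards to interchangeable workers inside each coarse role category. The `Task` type, the `dispatch` function, and the round-robin policy are assumptions made for illustration, not a description of how Helix schedules work.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Task:
    title: str
    role: str  # coarse category, e.g. "engineering" or "marketing"


def dispatch(tasks: list, pool_sizes: dict) -> dict:
    """Assign tasks round-robin to anonymous workers within each role pool."""
    assignments = defaultdict(list)
    next_slot = defaultdict(int)
    for task in tasks:
        size = pool_sizes[task.role]  # KeyError if the role has no pool
        worker = f"{task.role}-{next_slot[task.role] % size}"
        next_slot[task.role] += 1
        assignments[worker].append(task.title)
    return dict(assignments)


board = [
    Task("fix login bug", "engineering"),
    Task("add dark mode", "engineering"),
    Task("draft launch post", "marketing"),
    Task("refactor auth", "engineering"),
]
plan = dispatch(board, {"engineering": 2, "marketing": 1})
for worker, titles in plan.items():
    print(worker, titles)
```

Workers have no names, titles, or reporting lines; the only structure is the role category, which decides which tools a worker gets.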

Spec-driven development

spec driven development flow

The most impactful thing a human does in this workflow is review specs, not code.

A short human prompt goes in ("add dark mode to the to-do app"), and the agent enters a planning phase. It reads the codebase, understands the context, and writes a detailed spec. The human reviews that spec before any code gets written.

The terminal is a terrible place for reviewing documents. So we built a Google Docs-style interface where you can leave inline comments on spec lines. The comments push back to the agent as constraints.

Catching a fundamental misunderstanding during planning is cheap. Comment on two lines. Catching it mid-implementation, after the agent has built scaffolding on top of a wrong assumption, is painful and expensive.
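One way to picture inline comments turning into constraints is the sketch below. The `Comment` type, the `comments_to_constraints` function, and the wording of the generated message are hypothetical; the real interface may represent and send comments quite differently.

```python
from dataclasses import dataclass


@dataclass
class Comment:
    line: int  # 1-based line number in the spec
    text: str  # what the reviewer wants changed


def comments_to_constraints(spec: str, comments: list) -> str:
    """Turn inline review comments into one revision message for the agent."""
    lines = spec.splitlines()
    bullets = []
    for c in sorted(comments, key=lambda c: c.line):
        if not 1 <= c.line <= len(lines):
            raise ValueError(f"comment refers to missing spec line {c.line}")
        quoted = lines[c.line - 1].strip()
        bullets.append(f'- On spec line {c.line} ("{quoted}"): {c.text}')
    header = "Revise the spec to satisfy these reviewer constraints:"
    return "\n".join([header, *bullets])


spec = (
    "Add a theme toggle to the settings page.\n"
    "Store the choice in a cookie.\n"
    "Default to light mode."
)
message = comments_to_constraints(
    spec, [Comment(2, "Use localStorage, not a cookie.")]
)
print(message)
```

Quoting the commented line back to the agent keeps each constraint tied to the exact assumption it corrects.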

Most of my day now is reviewing specs and finding the two lines where the agent got it wrong. That's the job.

Own your hardware

token costs, privacy, neutrality

Token costs are exploding. Open-weight models like GLM can handle roughly 80% of the work.

Take your next three months of token budget and invest it in hardware with RTX PRO 6000 GPUs. Run local models for the tedious stuff. Burst out to Claude Opus for the hard problems. Don't get locked into any single provider.
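The "local first, burst out for the hard problems" pattern can be sketched as a small escalation wrapper. Everything here is an assumption for illustration: the `run_with_escalation` function, the convention that a local model returns `None` when it gives up, and the stub providers. The providers are plain callables so that either side can be swapped without touching the caller.

```python
from typing import Callable, Optional, Tuple


def run_with_escalation(
    prompt: str,
    local: Callable[[str], Optional[str]],
    frontier: Callable[[str], str],
    max_local_attempts: int = 2,
) -> Tuple[str, str]:
    """Try the local model first; fall back to the hosted model if it fails.

    `local` is assumed to return None when it cannot complete the task.
    Returns (which provider answered, the answer).
    """
    for _ in range(max_local_attempts):
        result = local(prompt)
        if result is not None:
            return "local", result
    return "frontier", frontier(prompt)


# Stub providers standing in for real model clients.
def easy_local(prompt: str) -> Optional[str]:
    return f"local answer to: {prompt}"


def stuck_local(prompt: str) -> Optional[str]:
    return None


def hosted(prompt: str) -> str:
    return f"hosted answer to: {prompt}"


print(run_with_escalation("rename a variable", easy_local, hosted))
print(run_with_escalation("redesign the scheduler", stuck_local, hosted))
```

In practice the hard part is deciding when the local model has failed; this sketch leaves that judgment to whatever wraps the local call.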

Those same GPUs also support what GPUs were originally designed for: graphics. We use them to render the GPU-accelerated desktops that make each agent's environment pleasant to use, borrowing tech from the cloud gaming industry for low-latency streaming.

The mechanical suit

self-improving businesses

We also use our agent desktops to log into LinkedIn. The agent builds a list of 200 people to reach out to, helps draft messages, and navigates the UI. LinkedIn thinks I'm a human. And I kind of am. I'm a human using an agent like a mechanical suit to do a lot more outreach than I'd normally have the patience for. I got a bunch of good meetings out of it when I visited the Bay Area recently.

Where I think this is heading: a self-improving codebase at the bottom, fed by issues on a kanban board. Then product agents improving the software based on user feedback. Then sales and marketing agents on top of that. Humans in the loop at every level, but moving faster with fewer people than you'd expect.


I'm always up for comparing notes on this stuff. Find me on LinkedIn or check out what we're building at Helix.