ENGINEERING·2026-04-22·8 min

Why we rebuilt our agent runtime around MCP

Six months in, we tore down the framework we'd been using and rebuilt on the Model Context Protocol. What broke, what got better, and why we'd do it again.


Hakan

CTO

We shipped the first version of Spaceflow on a homemade agent framework. It worked. It was fast to iterate on. It was wired tightly into one model provider because that's what we knew best at the time. Six months in, every meaningful piece of friction we had with customers traced back to that choice.

This post is a writeup of what we replaced it with, what we kept, and what the trade-offs of the new design are.

What the old runtime looked like

Each agent definition was a TypeScript module that imported an SDK from one provider, declared its tools as decorated functions, and wired retry logic by hand. Approval flows were custom React forms hooked to a websocket. Token accounting was scattered across the codebase.

The shape worked for a small set of agents owned by a small team. It stopped working the moment a customer asked "can we run this with our own model?" The answer was "technically yes, but" and that "but" was a six-figure engagement.

What changed

We moved every tool definition behind the Model Context Protocol. The tools are still TypeScript, they're still typed, but they now sit behind an MCP server that any compatible client can call. The agent logic itself moved to a slim runtime that doesn't import any provider SDK at all. It speaks MCP to the gateway and HTTP to a model endpoint that's configurable per agent.
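
To make the runtime side concrete, here is a minimal sketch of a tool call that depends on nothing but a transport. MCP tool invocations are JSON-RPC 2.0 requests with the method tools/call. Everything else here is an assumption for illustration, not our production code: the Transport type, the callTool helper, and the in-memory fakeServer are hypothetical names, and the result type is simplified to text content only.

```typescript
// JSON-RPC 2.0 envelope that MCP uses for tool invocation.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Simplified result: real MCP results can carry other content types too.
interface ToolCallResult {
  content: { type: "text"; text: string }[];
  isError?: boolean;
}

// Hypothetical transport: anything that can deliver a request to an MCP server.
type Transport = (req: ToolCallRequest) => Promise<ToolCallResult>;

let nextId = 1;

function buildToolCall(name: string, args: Record<string, unknown>): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id: nextId++,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// The runtime only knows about the transport, never a provider SDK.
async function callTool(
  transport: Transport,
  name: string,
  args: Record<string, unknown>,
): Promise<string> {
  const result = await transport(buildToolCall(name, args));
  const text = result.content.map((c) => c.text).join("\n");
  if (result.isError) {
    throw new Error(`tool ${name} failed: ${text}`);
  }
  return text;
}

// In-memory stand-in for a real MCP server, for illustration only.
const fakeServer: Transport = async (req) => {
  if (req.params.name === "echo") {
    const message = String(req.params.arguments["message"]);
    return { content: [{ type: "text", text: message }] };
  }
  return { content: [{ type: "text", text: "unknown tool" }], isError: true };
};
```

Swapping fakeServer for a transport that talks to a real gateway would leave callTool unchanged, which is the property the rewrite was after.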

The visible result for a customer: config.model = "anthropic/claude-opus-4-7" swaps to config.model = "openai/gpt-5" and the agent keeps working. The invisible result for us: we deleted ~12,000 lines of provider-specific code.
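
As a rough sketch of what that swap involves, the snippet below resolves a "provider/model" string to an endpoint. Only the config.model format comes from the examples above. The endpoints table, its placeholder URLs, and the parseModelRef and resolveEndpoint helpers are assumptions made for illustration.

```typescript
interface ModelRef {
  provider: string;
  model: string;
}

// Split at the first slash only, so model ids that contain slashes survive.
function parseModelRef(ref: string): ModelRef {
  const slash = ref.indexOf("/");
  if (slash <= 0 || slash === ref.length - 1) {
    throw new Error(`expected "provider/model", got "${ref}"`);
  }
  return { provider: ref.slice(0, slash), model: ref.slice(slash + 1) };
}

// Hypothetical endpoint table; the URLs are placeholders, not real services.
const endpoints: Record<string, string> = {
  anthropic: "https://models.example.com/anthropic",
  openai: "https://models.example.com/openai",
};

interface AgentConfig {
  model: string;
}

// Look up the HTTP endpoint for an agent; nothing here imports a provider SDK.
function resolveEndpoint(config: AgentConfig): { url: string; model: string } {
  const { provider, model } = parseModelRef(config.model);
  const url = endpoints[provider];
  if (url === undefined) {
    throw new Error(`no endpoint configured for provider "${provider}"`);
  }
  return { url, model };
}
```

With this shape, adding a provider means adding one row to the table, and the agent definition itself only ever holds a string.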

What broke along the way

A few things didn't survive the rewrite cleanly:

  • Streaming token accounting. MCP's sampling spec doesn't give us per-provider token counts in the shape we used to log. We replaced our internal cost model with a separate accounting pass that runs against the provider's billing API on a delay.
  • Prompt caching strategies. Each provider does caching differently. The new runtime exposes a thin hook for provider-specific cache hints and otherwise stays out of the way.
  • The internal "agent debugger". The old one was tightly coupled to our framework. We threw it out and wrote a new one against the MCP event log. The new one is better because it works for any client.
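
The delayed accounting pass from the first bullet can be sketched as a join between runs we logged and records the provider reports later. This is a simplified illustration under stated assumptions: it assumes billing records can be matched to runs by a request id, and the RunRecord, BillingRecord, and AgentUsage shapes are hypothetical rather than any provider's actual billing schema.

```typescript
// Hypothetical shapes; all field names are assumptions for this sketch.
interface RunRecord {
  requestId: string;
  agent: string;
}

interface BillingRecord {
  requestId: string;
  inputTokens: number;
  outputTokens: number;
}

interface AgentUsage {
  inputTokens: number;
  outputTokens: number;
  unbilledRuns: number;
}

// Join logged runs against billing records fetched some time after the runs.
function reconcile(runs: RunRecord[], billing: BillingRecord[]): Map<string, AgentUsage> {
  const byRequest = new Map<string, BillingRecord>();
  for (const record of billing) {
    byRequest.set(record.requestId, record);
  }

  const usage = new Map<string, AgentUsage>();
  for (const run of runs) {
    const entry = usage.get(run.agent) ?? { inputTokens: 0, outputTokens: 0, unbilledRuns: 0 };
    const bill = byRequest.get(run.requestId);
    if (bill === undefined) {
      // Billing data has not arrived yet; a later pass can pick this run up.
      entry.unbilledRuns += 1;
    } else {
      entry.inputTokens += bill.inputTokens;
      entry.outputTokens += bill.outputTokens;
    }
    usage.set(run.agent, entry);
  }
  return usage;
}
```

Tracking unbilledRuns separately keeps the pass honest about the delay: totals are only as complete as the billing data available when the pass ran.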

What we'd do differently

We waited too long to do this. The signals were there at month three. Two customers asked about model portability, and both got pushed off with "we'll get to it." By the time we did the rewrite, the framework had ten more agents on it.

The other thing: we wrote our own MCP server for the first iteration of the migration. We shouldn't have. The official SDK landed two weeks later and we ended up porting to it anyway. If you're thinking about this kind of migration in 2026, start with the SDK.

What this means for customers

The biggest visible change is that bringing your own model is now a config option. Bring Claude, GPT-5, Gemini, a hosted Llama, or your own internal model. The tools, the policies, the audit log, and the gateway are unchanged.

Less visible but more important: we ship faster. New agents take days. The runtime is a much smaller surface to maintain. And when a new model lands, the integration time is hours.


We're still actively contributing to MCP. If you're building on it, we'd like to hear from you. The full API reference covers our specific extensions.