Build a Browser-Based Sandboxed SWE Agent

2026-05-14 · MDX POC · en

There’s a limit to how far a terminal-based AI coding agent can take you. It runs on your machine, touches your real filesystem, executes commands with your user privileges, and requires a local setup just to try it out. If you’ve built one, you’ve probably thought: what if people could just open a browser and use it?

That’s the idea behind the browser sandbox agent. The user interacts through a web chat interface. The actual code execution happens inside an isolated Docker container on a remote server. No local installation, no file system risk, no “works on my machine.”

System Overview

The core architecture is simple on paper:

Browser → Web Chat UI → WebSocket → Server Process → Docker Sandbox
                                                         (isolated FS + Shell)

The terminal agent reads your local files and runs commands on your machine. The sandbox agent does the same thing — but inside a container that’s created on demand, lives for the duration of a session, and gets destroyed when you’re done.

Concern                   Terminal Agent                 Sandbox Agent
───────                   ──────────────                 ─────────────
File safety               Operates on real user files    Operates on isolated sandbox
Code execution            Direct system commands         Restricted container execution
Remote collaboration      Needs Bridge/SSH setup         Share a browser URL
Environment consistency   Depends on local setup         Standardized container environment
Resource limits           Uses all available resources   CPU/memory caps enforced

Architecture

Here’s the full system layout:

┌──────────────────────────────────────────────────────────┐
│                    Browser (React)                         │
│  <App>                                                    │
│    ├─ <ConversationPanel>                                 │
│    │    ├─ <MessageList>                                  │
│    │    └─ <MessageInput>                                 │
│    ├─ <FileExplorer>     sandbox file tree                │
│    ├─ <Terminal>         live container terminal          │
│    └─ <ToolCallUI>       real-time tool call visualization│
└───────────────────────┬──────────────────────────────────┘
                        │ WebSocket (wss://)

┌──────────────────────────────────────────────────────────┐
│                    Server (Node.js/Bun)                    │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐    │
│  │ HTTP Server   │  │ WebSocket    │  │ Session       │    │
│  │ (Hono/Express)│  │ Server (WS)  │  │ Manager       │    │
│  └──────┬───────┘  └──────┬───────┘  └──────┬───────┘    │
│         │                 │                  │            │
│  ┌──────┴─────────────────┴──────────────────┴───────┐   │
│  │              QueryEngine + Agent Loop              │   │
│  │         (reused from the terminal version)         │   │
│  └──────────────────────┬─────────────────────────────┘   │
│                         │                                  │
│  ┌──────────────────────┴─────────────────────────────┐   │
│  │              Sandbox Manager                        │   │
│  │  ┌────────────┐  ┌────────────┐  ┌─────────────┐   │   │
│  │  │ Docker     │  │ Temp FS    │  │ Resource    │   │   │
│  │  │ Provider   │  │ Ephemeral  │  │ Limiter     │   │   │
│  │  └────────────┘  └────────────┘  └─────────────┘   │   │
│  └──────────────────────┬─────────────────────────────┘   │
│                         ▼                                  │
│              ┌────────────────────┐                        │
│              │ Docker Container   │                        │
│              │ (Ubuntu 22.04)     │                        │
│              │ ├─ /workspace/     │                        │
│              │ ├─ bash            │                        │
│              │ ├─ git, node, etc  │                        │
│              │ └─ isolated FS     │                        │
│              └────────────────────┘                        │
└──────────────────────────────────────────────────────────┘

The key insight is that the core agent loop is shared between the terminal and sandbox versions. The same QueryEngine, the same buildTool factory, the same tool execution pipeline — only the tool implementations change to target the container instead of localhost.

The Sandbox

Choice of Isolation

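For the sandbox itself, Docker containers are the sweet spot. They offer good isolation at low overhead, although containers share the host kernel, so deployments with stricter requirements may want one of the stronger options: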

Approach               Isolation   Overhead         Use Case
────────               ─────────   ────────         ────────
Docker                 Good        Low              Default choice
gVisor                 Stronger    Medium           High-security deployments
Firecracker micro VM   Strongest   High             Multi-tenant SaaS
WebContainer           Limited     None (browser)   Lightweight demos

Security Model

A production sandbox needs multiple layers of defense:

1. Network Isolation
   ├─ Outbound connections blocked by default
   ├─ Whitelist-allowed domains (npm, git, pypi)
   └─ Configurable proxy egress

2. Filesystem Isolation
   ├─ Container has only /workspace
   ├─ No access to host filesystem
   └─ Auto-destroyed when session ends

3. Resource Limits
   ├─ CPU: 1 core default, up to 4
   ├─ Memory: 1GB default, up to 8GB
   ├─ Disk: 5GB default
   ├─ Command timeout: 10s
   └─ Total session duration: 1 hour

4. Command Filtering
   ├─ Blocked: network scanning, crypto mining, privilege escalation
   ├─ Restricted: curl/wget whitelist
   └─ Monitoring: real-time audit log

5. Cleanup
   ├─ Idle session auto-destroyed after 30 min
   ├─ Resources force-reclaimed
   └─ All data permanently deleted
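
The network and command-filtering layers above could be approximated with two small checks. This is a minimal sketch, not a complete policy: the allowlisted hosts and the deny patterns are illustrative assumptions, and `isEgressAllowed` and `screenCommand` are hypothetical names. Pattern matching on command strings is easy to evade, so it complements the container limits rather than replacing them.

```typescript
// Hypothetical allowlist; the article only names npm, git, and pypi as examples.
const EGRESS_ALLOWLIST = ['registry.npmjs.org', 'github.com', 'pypi.org']

// True when the URL's host is an allowlisted domain or a subdomain of one.
function isEgressAllowed(rawUrl: string, allowlist: string[] = EGRESS_ALLOWLIST): boolean {
  let host: string
  try {
    host = new URL(rawUrl).hostname.toLowerCase()
  } catch {
    return false // not a parseable URL: deny by default
  }
  return allowlist.some(domain => host === domain || host.endsWith(`.${domain}`))
}

// Illustrative deny patterns, one per blocked category in the list above.
const DENY_PATTERNS: { reason: string; pattern: RegExp }[] = [
  { reason: 'network scanning', pattern: /\b(nmap|masscan)\b/ },
  { reason: 'crypto mining', pattern: /\b(xmrig|minerd)\b/ },
  { reason: 'privilege escalation', pattern: /\b(sudo|su)\b/ },
]

type Verdict = { allowed: true } | { allowed: false; reason: string }

// Returns the first matching deny reason, or allows the command.
function screenCommand(command: string): Verdict {
  for (const { reason, pattern } of DENY_PATTERNS) {
    if (pattern.test(command)) return { allowed: false, reason }
  }
  return { allowed: true }
}
```

A Bash tool could call `screenCommand` before handing the command to the sandbox and write the verdict to the audit log either way.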

Docker Sandbox Manager

The implementation wraps the Docker SDK into a simple lifecycle:

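import Docker from 'dockerode'
import { mkdtemp, rm } from 'node:fs/promises'
import { tmpdir } from 'node:os'
import { join } from 'node:path'

const docker = new Docker()  // connects to the local Docker socket by default

class DockerSandbox {
  private container: Docker.Container | null = null
  private tempDir: string | null = null

  async create(image = 'swe-agent:base'): Promise<void> {
    // Per-session host directory that backs /workspace
    this.tempDir = await mkdtemp(join(tmpdir(), 'sandbox-'))
    this.container = await docker.createContainer({
      Image: image,
      WorkingDir: '/workspace',
      Cmd: ['/bin/bash', '-c', 'tail -f /dev/null'],  // keep the container alive
      HostConfig: {
        Memory: 1024 * 1024 * 1024,   // 1GB
        NanoCpus: 1_000_000_000,      // 1 CPU
        ReadonlyRootfs: true,         // only /workspace and /tmp stay writable
        Tmpfs: { '/tmp': 'rw,size=64m' },  // many tools need a writable /tmp
        NetworkMode: 'none',          // no network at all; allowlisted egress needs a custom network
        AutoRemove: true,             // Docker removes the container once it stops
        Binds: [`${this.tempDir}:/workspace:rw`],
      },
    })
    await this.container.start()
  }

  async execCommand(command: string, timeout = 10_000) {
    if (!this.container) throw new Error('Sandbox has not been created')
    const exec = await this.container.exec({
      Cmd: ['bash', '-c', command],
      AttachStdout: true,
      AttachStderr: true,
    })
    // collectOutput (not shown) starts the exec, gathers stdout/stderr,
    // and enforces the timeout
    return collectOutput(exec, timeout)
  }

  async destroy(): Promise<void> {
    await this.container?.stop({ t: 0 })
    this.container = null
    // Delete the workspace data along with the container
    if (this.tempDir) await rm(this.tempDir, { recursive: true, force: true })
    this.tempDir = null
  }
}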

The base Docker image is straightforward — Ubuntu 22.04 with common dev tools pre-installed: git, curl, Node.js, Python, bun, ripgrep. A non-root user (swe) runs inside the container.
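
The `collectOutput` helper used by `execCommand` is not shown in the article. One piece it would need: when an exec runs without a TTY, the Docker Engine API multiplexes stdout and stderr into a single stream, prefixing each chunk with an 8-byte header (stream type in byte 0, payload length as a big-endian 32-bit integer in bytes 4 to 7). The sketch below splits a fully buffered response; `demuxDockerStream` is a hypothetical name, and a production version would process chunks incrementally and enforce the timeout.

```typescript
interface ExecOutput {
  stdout: string
  stderr: string
}

// Split a buffered, multiplexed Docker exec stream into stdout and stderr.
function demuxDockerStream(data: Uint8Array): ExecOutput {
  const view = new DataView(data.buffer, data.byteOffset, data.byteLength)
  // Separate streaming decoders so a multi-byte character split across
  // two frames of the same stream still decodes correctly.
  const outDecoder = new TextDecoder()
  const errDecoder = new TextDecoder()
  let stdout = ''
  let stderr = ''
  let offset = 0

  while (offset + 8 <= data.byteLength) {
    const streamType = view.getUint8(offset)        // 1 = stdout, 2 = stderr
    const size = view.getUint32(offset + 4, false)  // big-endian payload length
    const start = offset + 8
    const end = Math.min(start + size, data.byteLength)
    const payload = data.subarray(start, end)
    if (streamType === 1) stdout += outDecoder.decode(payload, { stream: true })
    else if (streamType === 2) stderr += errDecoder.decode(payload, { stream: true })
    offset = start + size
  }

  // Flush any bytes still held by the decoders
  stdout += outDecoder.decode()
  stderr += errDecoder.decode()
  return { stdout, stderr }
}
```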

Backend Architecture

Technology Stack

Component            Choice                         Rationale
─────────            ──────                         ─────────
HTTP framework       Hono or Express                Lightweight, well-known
WebSocket            ws library                     Native, minimal
Sandbox management   dockerode                      Docker API bindings
Database             SQLite or PostgreSQL           Session persistence
Auth                 JWT + Session                  Stateless API tokens
Agent core           Reused from terminal version   Same QueryEngine/tools

API Routes

POST   /api/sessions                Create session (starts sandbox)
GET    /api/sessions/:id            Get session details
POST   /api/sessions/:id/query      Send a message
GET    /api/sessions/:id/status     Get session status
DELETE /api/sessions/:id            Destroy session
WS     /ws/sessions/:id             Streaming communication

Adapting Tools for the Sandbox

The clever part is how little needs to change. The terminal agent tools execute locally, so the sandbox version simply swaps the execution target:

// Sandbox Bash tool — runs commands inside the container
const SandboxBashTool = buildTool({
  name: 'Bash',
  inputSchema: bashInputSchema,
  async call({ command, timeout }, context) {
    const sandbox = context.getSandbox()  // current container instance
    const result = await sandbox.execCommand(command, timeout)
    return { data: result }
  },
})

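// Sandbox file read: runs inside the container
const SandboxFileReadTool = buildTool({
  name: 'Read',
  inputSchema: fileReadInputSchema,
  async call({ file_path, offset, limit }, context) {
    const sandbox = context.getSandbox()
    // Single-quote the path so spaces and shell metacharacters are not interpreted
    const quoted = `'${file_path.replace(/'/g, `'\\''`)}'`
    // Honor offset/limit (assumed here to be a 1-based start line and a line count)
    const start = offset ?? 1
    const range = limit ? `${start},${start + limit - 1}p` : `${start},$p`
    const result = await sandbox.execCommand(`sed -n '${range}' ${quoted}`)
    return { data: { content: result.stdout } }
  },
})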
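
A write tool needs more care than the two tools above, because both the path and the file content pass through a shell. The sketch below shows one possible approach under stated assumptions: `resolveInWorkspace`, `shellQuote`, and `buildWriteCommand` are hypothetical helpers, the base image is assumed to ship the coreutils `base64` binary, and symlinks inside the container are not checked. Very large files would exceed the shell's argument length limit and need a different transport.

```typescript
import { posix } from 'node:path'

const WORKSPACE = '/workspace'

// Resolve a tool-supplied path and reject anything that escapes /workspace.
function resolveInWorkspace(filePath: string): string {
  const resolved = posix.resolve(WORKSPACE, filePath)
  if (resolved !== WORKSPACE && !resolved.startsWith(`${WORKSPACE}/`)) {
    throw new Error(`Path escapes workspace: ${filePath}`)
  }
  return resolved
}

// Single-quote a string for bash; embedded single quotes become '\''.
function shellQuote(value: string): string {
  return `'${value.replace(/'/g, `'\\''`)}'`
}

// Build a command that writes `content` to `filePath` inside the container.
// Base64 keeps newlines and quotes in the content away from the shell parser.
function buildWriteCommand(filePath: string, content: string): string {
  const target = shellQuote(resolveInWorkspace(filePath))
  const encoded = Buffer.from(content, 'utf8').toString('base64')
  return `printf %s ${shellQuote(encoded)} | base64 -d > ${target}`
}
```

A sandbox `Write` tool could then pass the result of `buildWriteCommand(file_path, content)` to `sandbox.execCommand`, the same way the Bash tool does.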

Session Management

Each session ties together a container instance, an agent engine, and user data:

class SessionManager {
  private sessions = new Map<string, Session>()

  async createSession(userId: string, options?: SessionOptions) {
    // 1. Create Docker container
    const sandbox = await createSandbox(options?.sandboxConfig)

    // 2. Create Agent engine pointed at the sandbox
    const engine = new QueryEngine({
      cwd: '/workspace',
      tools: createSandboxToolsFor(sandbox),
    })

    // 3. Store session record
    const session = {
      id: randomUUID(), userId, sandbox, engine,
      status: 'active', createdAt: new Date(),
    }

    this.sessions.set(session.id, session)

    // 4. Auto-destroy after 30 min idle
    startIdleTimer(session.id, 30 * 60 * 1000,
      () => this.destroySession(session.id))

    return session
  }

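  // Lookup used by the WebSocket handler
  get(sessionId: string): Session | undefined {
    return this.sessions.get(sessionId)
  }

  async destroySession(sessionId: string) {
    const session = this.sessions.get(sessionId)
    if (!session) return
    await session.sandbox.destroy()
    this.sessions.delete(sessionId)
  }
}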
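
`startIdleTimer` is not defined in the article, and a one-shot timer set at creation would fire 30 minutes after the session starts rather than after 30 idle minutes unless it is reset on each message. One way to track idleness, sketched here with a hypothetical `IdleTracker` class, is to record the last activity time per session and sweep for expired entries on an interval.

```typescript
class IdleTracker {
  private lastActivity = new Map<string, number>()
  private idleMs: number

  constructor(idleMs: number) {
    this.idleMs = idleMs
  }

  // Record activity: session creation, a query, or a keepalive.
  touch(sessionId: string, now: number = Date.now()): void {
    this.lastActivity.set(sessionId, now)
  }

  // Stop tracking a session that has been destroyed.
  forget(sessionId: string): void {
    this.lastActivity.delete(sessionId)
  }

  // Sessions whose last activity is at least idleMs in the past.
  expired(now: number = Date.now()): string[] {
    const ids: string[] = []
    for (const [id, timestamp] of this.lastActivity) {
      if (now - timestamp >= this.idleMs) ids.push(id)
    }
    return ids
  }
}
```

The session manager could call `touch` from the WebSocket message handler and run a periodic sweep that passes each id from `expired()` to `destroySession`, followed by `forget`.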

Frontend Architecture

Component Structure

The web UI mirrors the terminal REPL but with richer interaction patterns:

<App>
  ├── <Sidebar>
  │    ├── <SessionList>
  │    └── <UserMenu>

  ├── <ChatPanel>
  │    ├── <MessageList>
  │    │    ├── <UserMessage>
  │    │    ├── <AssistantMessage>
  │    │    │    └── <StreamingText>    typewriter effect
  │    │    ├── <ToolCallBlock>
  │    │    │    ├── <ToolCallHeader>
  │    │    │    ├── <ToolCallInput>
  │    │    │    └── <ToolCallResult>
  │    │    └── <SystemMessage>
  │    │
  │    └── <MessageInput>

  ├── <FilePanel>
  │    ├── <FileExplorer>              sandbox file tree
  │    └── <FilePreview>

  └── <TerminalPanel>
       └── <Terminal> (xterm.js)

The layout splits into three panes: a chat panel (left), a file explorer (center), and a real-time terminal (right). The file explorer and terminal both connect to the sandbox container through the server, showing exactly what’s happening inside.

WebSocket Client Integration

The frontend connects to the server over a single WebSocket per session:

import { useCallback, useEffect, useRef, useState } from 'react'

function useSessionWebSocket(sessionId: string) {
  // Holds the live socket so sendMessage can reach it between renders
  const wsRef = useRef<WebSocket | null>(null)
  const [connectionStatus, setConnectionStatus] =
    useState<'connecting' | 'connected' | 'disconnected'>('disconnected')

  const connect = useCallback(() => {
    setConnectionStatus('connecting')
    const ws = new WebSocket(`wss://server.com/ws/sessions/${sessionId}`)
    wsRef.current = ws

    ws.onopen = () => setConnectionStatus('connected')
    ws.onclose = () => setConnectionStatus('disconnected')

    return ws
  }, [sessionId])

  const sendMessage = useCallback((content: string) => {
    const ws = wsRef.current
    if (ws?.readyState === WebSocket.OPEN) {
      ws.send(JSON.stringify({ type: 'query', content }))
    }
  }, [])

  // Auto-reconnect on disconnect (this also performs the initial connect)
  useEffect(() => {
    if (connectionStatus === 'disconnected') {
      const timer = setTimeout(connect, 1000)
      return () => clearTimeout(timer)
    }
  }, [connectionStatus, connect])

  // Close the socket on unmount or when the session changes
  useEffect(() => () => wsRef.current?.close(), [sessionId])

  return { connect, sendMessage, connectionStatus }
}
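
The hook retries at a fixed one-second interval. If the server is down for a while, every open tab will hammer it in lockstep, so a capped exponential backoff with jitter may be a better fit. A minimal sketch follows; `reconnectDelay` and its default values are assumptions, not part of the original hook.

```typescript
// Delay before reconnect attempt `attempt` (0-based).
// The ceiling doubles each attempt up to maxMs; the result is spread over the
// upper half of that ceiling ("equal jitter") so clients do not retry in sync.
function reconnectDelay(
  attempt: number,
  baseMs = 1000,
  maxMs = 30_000,
  random: () => number = Math.random,
): number {
  const ceiling = Math.min(maxMs, baseMs * 2 ** Math.max(0, attempt))
  const half = ceiling / 2
  return Math.floor(half + random() * half)
}
```

In the hook, an attempt counter kept in a ref would be passed to `reconnectDelay` in place of the literal `1000` and reset to zero in `onopen`.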

Streaming Text Rendering

Instead of dumping the full response at once, the frontend renders text with a typewriter effect:

function StreamingText({ content }: { content: string }) {
  const [displayed, setDisplayed] = useState('')
  const indexRef = useRef(0)

  useEffect(() => {
    if (!content) return
    const timer = setInterval(() => {
      if (indexRef.current < content.length) {
        // Advance the index, then slice. A functional updater that reads the
        // mutable ref is not pure and can skip or duplicate characters.
        indexRef.current++
        setDisplayed(content.slice(0, indexRef.current))
      } else {
        clearInterval(timer)
      }
    }, 10)  // ~100 chars/sec
    return () => clearInterval(timer)
  }, [content])

  return (
    <div className="prose whitespace-pre-wrap">
      {displayed}
      <span className="animate-pulse">▊</span>
    </div>
  )
}

WebSocket Protocol

All communication between the browser and server goes through a single WebSocket connection per session. The protocol is simple JSON messages:

// Client -> Server
type ClientMessage =
  | { type: 'query'; content: string }
  | { type: 'cancel' }
  | { type: 'permission_response'; requestId: string; behavior: 'allow' | 'deny' | 'always_allow' }
  | { type: 'keepalive' }

// Server -> Client
type ServerMessage =
  | { type: 'stream_start'; messageId: string }
  | { type: 'text_delta'; delta: string }
  | { type: 'text_done' }
  | { type: 'tool_use'; toolUseId: string; toolName: string; input: object }
  | { type: 'tool_result_start'; toolUseId: string }
  | { type: 'tool_result_delta'; toolUseId: string; delta: string }
  | { type: 'tool_result_done'; toolUseId: string; isError: boolean }
  | { type: 'permission_request'; requestId: string; toolName: string; input: object }
  | { type: 'error'; message: string }
  | { type: 'done'; reason: string }
  | { type: 'keepalive' }
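
On the client, these events can be folded into view state with a plain reducer. The sketch below repeats a subset of the server message types so it is self-contained; `TurnState`, `ToolCallView`, and `reduceTurn` are hypothetical names, and it assumes each `stream_start` begins a fresh assistant message.

```typescript
// Subset of the ServerMessage union, repeated so the sketch stands alone.
type StreamEvent =
  | { type: 'stream_start'; messageId: string }
  | { type: 'text_delta'; delta: string }
  | { type: 'text_done' }
  | { type: 'tool_use'; toolUseId: string; toolName: string; input: object }
  | { type: 'tool_result_delta'; toolUseId: string; delta: string }
  | { type: 'tool_result_done'; toolUseId: string; isError: boolean }
  | { type: 'done'; reason: string }

interface ToolCallView {
  toolName: string
  input: object
  output: string
  status: 'running' | 'ok' | 'error'
}

interface TurnState {
  messageId: string | null
  text: string
  streaming: boolean
  toolCalls: Record<string, ToolCallView>
}

const initialTurn: TurnState = { messageId: null, text: '', streaming: false, toolCalls: {} }

// Pure reducer: returns a new state for each event, never mutates the old one.
function reduceTurn(state: TurnState, event: StreamEvent): TurnState {
  switch (event.type) {
    case 'stream_start':
      return { ...initialTurn, messageId: event.messageId, streaming: true }
    case 'text_delta':
      return { ...state, text: state.text + event.delta }
    case 'text_done':
      return state
    case 'tool_use': {
      const call: ToolCallView = {
        toolName: event.toolName, input: event.input, output: '', status: 'running',
      }
      return { ...state, toolCalls: { ...state.toolCalls, [event.toolUseId]: call } }
    }
    case 'tool_result_delta': {
      const call = state.toolCalls[event.toolUseId]
      if (!call) return state  // delta for an unknown tool call: ignore
      const updated: ToolCallView = { ...call, output: call.output + event.delta }
      return { ...state, toolCalls: { ...state.toolCalls, [event.toolUseId]: updated } }
    }
    case 'tool_result_done': {
      const call = state.toolCalls[event.toolUseId]
      if (!call) return state
      const updated: ToolCallView = { ...call, status: event.isError ? 'error' : 'ok' }
      return { ...state, toolCalls: { ...state.toolCalls, [event.toolUseId]: updated } }
    }
    case 'done':
      return { ...state, streaming: false }
  }
}
```

Because the reducer is pure, it plugs into React's `useReducer` and can be tested without a socket.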

The server-side WebSocket handler manages the full lifecycle — processing queries, handling cancellations, routing permission responses:

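wss.on('connection', (ws, req) => {
  const sessionId = extractSessionId(req.url!)
  const session = sessionManager.get(sessionId)
  if (!session) {
    ws.close(1008, 'Unknown session')  // 1008 = policy violation
    return
  }

  ws.on('message', async (data) => {
    // Reject malformed frames instead of letting JSON.parse throw
    let msg: ClientMessage
    try {
      msg = JSON.parse(data.toString())
    } catch {
      ws.send(JSON.stringify({ type: 'error', message: 'Invalid JSON' }))
      return
    }

    switch (msg.type) {
      case 'query':
        // Stream agent events until the query finishes or the client disconnects
        for await (const event of handleQuery(sessionId, msg.content)) {
          if (ws.readyState !== WebSocket.OPEN) break
          ws.send(JSON.stringify(event))
        }
        break

      case 'cancel':
        session.abortController?.abort()
        break

      case 'permission_response':
        session.permissionResolver?.resolve(msg)
        break
    }
  })
})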
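
`JSON.parse` only guarantees well-formed JSON, not a well-formed message. A small validator can narrow untrusted input to the `ClientMessage` union before the switch runs. This is a hand-rolled sketch with a hypothetical `parseClientMessage` function; a schema library would do the same job.

```typescript
// Same shape as the ClientMessage union in the protocol definition.
type ClientMessage =
  | { type: 'query'; content: string }
  | { type: 'cancel' }
  | { type: 'permission_response'; requestId: string; behavior: 'allow' | 'deny' | 'always_allow' }
  | { type: 'keepalive' }

const BEHAVIORS = ['allow', 'deny', 'always_allow'] as const

// Returns a typed message, or null when the input is not a valid ClientMessage.
function parseClientMessage(raw: string): ClientMessage | null {
  let value: unknown
  try {
    value = JSON.parse(raw)
  } catch {
    return null
  }
  if (typeof value !== 'object' || value === null) return null
  const msg = value as Record<string, unknown>

  switch (msg['type']) {
    case 'query': {
      const content = msg['content']
      return typeof content === 'string' ? { type: 'query', content } : null
    }
    case 'cancel':
      return { type: 'cancel' }
    case 'keepalive':
      return { type: 'keepalive' }
    case 'permission_response': {
      const requestId = msg['requestId']
      const behavior = BEHAVIORS.find(b => b === msg['behavior'])
      if (typeof requestId !== 'string' || !behavior) return null
      return { type: 'permission_response', requestId, behavior }
    }
    default:
      return null
  }
}
```

The handler can then reply with an `error` message whenever `parseClientMessage` returns `null`, and the `switch` on `msg.type` is guaranteed to see only known shapes.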

Authentication and Quota Management

Auth Flow

The auth system uses JWT tokens with a simple flow:

POST /api/auth/login  →  { token, expiresIn }
POST /api/sessions    →  Authorization: Bearer <token>
                       →  { sessionId, wsUrl }
WS /ws/sessions/:id   →  Token validated on connection
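
To make the token check concrete, here is a minimal HS256 sign-and-verify sketch built on Node's `node:crypto` module. `signToken`, `verifyToken`, and the `TokenClaims` shape are illustrative assumptions; a production service would more likely rely on a maintained JWT library. Browsers cannot set an `Authorization` header on a WebSocket handshake, so for the `WS` route the token would typically travel as a query parameter or in the first message.

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto'

interface TokenClaims {
  sub: string  // user id
  exp: number  // expiry, seconds since the Unix epoch
}

const b64url = (text: string): string => Buffer.from(text, 'utf8').toString('base64url')

// Create a compact JWT: base64url(header).base64url(payload).signature
function signToken(claims: TokenClaims, secret: string): string {
  const header = b64url(JSON.stringify({ alg: 'HS256', typ: 'JWT' }))
  const payload = b64url(JSON.stringify(claims))
  const signature = createHmac('sha256', secret).update(`${header}.${payload}`).digest('base64url')
  return `${header}.${payload}.${signature}`
}

// Returns the claims when the signature matches and the token has not expired.
function verifyToken(
  token: string,
  secret: string,
  nowSeconds: number = Math.floor(Date.now() / 1000),
): TokenClaims | null {
  const parts = token.split('.')
  if (parts.length !== 3) return null
  const [header, payload, signature] = parts as [string, string, string]

  // Always recompute with HMAC-SHA256; the header's alg field is never trusted.
  const expected = createHmac('sha256', secret).update(`${header}.${payload}`).digest()
  const actual = Buffer.from(signature, 'base64url')
  if (actual.length !== expected.length || !timingSafeEqual(actual, expected)) return null

  try {
    const claims = JSON.parse(Buffer.from(payload, 'base64url').toString('utf8')) as TokenClaims
    if (typeof claims.sub !== 'string' || typeof claims.exp !== 'number') return null
    return claims.exp > nowSeconds ? claims : null
  } catch {
    return null
  }
}
```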

Resource Quotas

Different plan tiers get different resource limits:

const FREE_LIMITS = {
  maxSessionsPerDay: 5,
  maxDurationPerSession: 30 * 60 * 1000,  // 30 min
  maxCommandsPerSession: 100,
  sandboxMemory: 512,    // MB
  sandboxCpu: 0.5,       // cores
}

const PRO_LIMITS = {
  maxSessionsPerDay: 50,
  maxDurationPerSession: 4 * 60 * 60 * 1000,  // 4 hours
  maxCommandsPerSession: 2000,
  sandboxMemory: 4096,   // MB
  sandboxCpu: 4,
}
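
These limits use human-friendly units (MB, cores), while the Docker `HostConfig` shown earlier takes memory in bytes and CPU as `NanoCpus`. A sketch of the conversion and of the two quota checks a server might run follows; `PlanLimits`, `toHostLimits`, `canStartSession`, and `canRunCommand` are hypothetical names.

```typescript
interface PlanLimits {
  maxSessionsPerDay: number
  maxDurationPerSession: number  // ms
  maxCommandsPerSession: number
  sandboxMemory: number          // MB
  sandboxCpu: number             // cores
}

// Convert plan units to Docker HostConfig units.
function toHostLimits(limits: PlanLimits): { Memory: number; NanoCpus: number } {
  return {
    Memory: limits.sandboxMemory * 1024 * 1024,               // MB -> bytes
    NanoCpus: Math.round(limits.sandboxCpu * 1_000_000_000),  // cores -> 1e-9 CPUs
  }
}

type QuotaCheck = { ok: true } | { ok: false; reason: string }

// Checked in POST /api/sessions before a sandbox is created.
function canStartSession(limits: PlanLimits, sessionsToday: number): QuotaCheck {
  return sessionsToday < limits.maxSessionsPerDay
    ? { ok: true }
    : { ok: false, reason: 'Daily session limit reached' }
}

// Checked before each command is sent to the sandbox.
function canRunCommand(
  limits: PlanLimits,
  commandsSoFar: number,
  sessionStartedAt: number,
  now: number = Date.now(),
): QuotaCheck {
  if (now - sessionStartedAt >= limits.maxDurationPerSession) {
    return { ok: false, reason: 'Session duration limit reached' }
  }
  if (commandsSoFar >= limits.maxCommandsPerSession) {
    return { ok: false, reason: 'Command limit reached' }
  }
  return { ok: true }
}
```

With the free tier above, `toHostLimits` yields 536,870,912 bytes of memory and 500,000,000 `NanoCpus`.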

Building It: Iteration Roadmap

If you want to build this yourself, here’s the phased approach:

Phase 1: Backend Foundation (Days 1-5)

HTTP API server + Docker sandbox manager + session CRUD + basic JWT auth.

Phase 2: Agent Integration (Days 6-10)

Reuse the terminal agent’s QueryEngine. Create sandbox-adapted tools (Bash, FileRead, FileWrite that execute inside the container).

Phase 3: WebSocket Streaming (Days 11-14)

Real-time streaming with the message protocol. Support cancellation and permission request/response channels.

Phase 4: Frontend Basics (Days 15-20)

Vite + React project setup. Chat panel, WebSocket client, typewriter text rendering, tool call UI cards, responsive layout.

Phase 5: Sandbox Visualization (Days 21-25)

File tree component reading from the container. xterm.js real-time terminal. File preview with syntax highlighting. Auto-refresh on file changes.

Phase 6: Security Hardening (Days 26-30)

Command filtering, resource limits, timeout enforcement, network isolation, image signing.

Phase 7: UX (Days 31-35)

Session history, reconnection, token usage display, permission dialog UI, dark/light theme, mobile adaptation.

Phase 8: Production (Days 36-40)

Dockerized deployment, CI/CD, sandbox pre-warming, horizontal scaling, monitoring and logging, rate limiting.

Going Further

Sandbox Pool

Pre-warming containers hides cold-start latency as long as the pool is not empty:

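class SandboxPool {
  private pool: DockerSandbox[] = []
  private minSize = 5
  private maxSize = 10  // assumed cap; the article does not specify one

  // create() is an instance method on DockerSandbox, so construct first
  private async spawn(): Promise<DockerSandbox> {
    const sandbox = new DockerSandbox()
    await sandbox.create()
    return sandbox
  }

  async acquire(): Promise<DockerSandbox> {
    // Fall back to a cold start when the pool is empty
    return this.pool.pop() ?? this.spawn()
  }

  async release(sandbox: DockerSandbox): Promise<void> {
    // cleanup() is assumed: it must wipe /workspace and kill leftover
    // processes before the container is handed to another session
    await sandbox.cleanup()
    if (this.pool.length < this.maxSize) this.pool.push(sandbox)
    else await sandbox.destroy()
  }

  async warmUp(): Promise<void> {
    while (this.pool.length < this.minSize) {
      this.pool.push(await this.spawn())
    }
  }
}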

Collaboration

Multiple users can connect to the same sandbox session — useful for pair debugging or demos:

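class CollaborativeSession extends Session {
  participants: Map<string, Participant> = new Map()

  addParticipant(userId: string, ws: WebSocket) {
    // Register the participant so later broadcasts reach them
    // (assumes Participant is shaped like { userId, ws })
    this.participants.set(userId, { userId, ws })
    // Notify the others; 'participant_joined' would need adding to ServerMessage
    this.broadcast({ type: 'participant_joined', userId }, userId)
  }

  broadcast(message: ServerMessage, excludeUserId?: string) {
    for (const [uid, p] of this.participants) {
      if (uid === excludeUserId) continue
      // Skip sockets that have already closed
      if (p.ws.readyState === WebSocket.OPEN) p.ws.send(JSON.stringify(message))
    }
  }
}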

WebContainer Alternative

For lighter use cases that don’t need full Docker isolation, StackBlitz’s WebContainer technology runs Node.js entirely in the browser, using WebAssembly for the runtime and Service Workers for virtualized networking. There are no servers to manage, but you are limited to what a browser runtime can do: no Docker, no arbitrary binaries, constrained memory.

Relationship to the Terminal Version

The two versions share the same core but diverge at the edges:

Terminal Agent                Sandbox Agent
─────────────                 ─────────────
Local filesystem              Isolated sandbox FS
System shell                  Container shell
User's machine resources      Server resources
Local trust model             JWT auth + quotas
Ink terminal UI               React web UI

The real win is code reuse: the QueryEngine, buildTool factory, message types, and agent loop logic are identical between versions. Only the tool implementations and UI layer change. If you’ve already built a terminal agent, you’re about 60% of the way to a browser sandbox version.