
Last Updated: 3/7/2026


Computer Use

Build AI agents that see, understand, and control virtual Linux desktops using E2B Desktop sandboxes.

Overview

E2B Desktop sandboxes provide a complete graphical Linux environment that AI agents can interact with, enabling:

  • Visual understanding of applications
  • Mouse and keyboard control
  • Screen capture and analysis
  • GUI application automation

Key Features

Visual Interface

  • Full Linux desktop environment
  • X11 display server
  • VNC access for remote viewing
  • Screenshot capabilities

Input Control

  • Programmatic mouse movements
  • Keyboard input simulation
  • Click and drag operations
  • Hotkey combinations

Application Support

  • Web browsers (Chrome, Firefox)
  • Office applications
  • Development tools
  • Any Linux GUI application

Basic Example

```javascript
import { Sandbox } from 'e2b'

// Create a desktop sandbox
const sandbox = await Sandbox.create({
  template: 'desktop' // Use desktop template
})

// Take a screenshot
const screenshot = await sandbox.desktop.screenshot()

// Move mouse and click
await sandbox.desktop.mouse.move(100, 200)
await sandbox.desktop.mouse.click()

// Type text
await sandbox.desktop.keyboard.type('Hello from AI agent!')

await sandbox.close()
```

Advanced Use Cases

Web Browsing Agent

```javascript
// Start browser
await sandbox.commands.run('firefox &')

// Wait for browser to load
await new Promise(resolve => setTimeout(resolve, 3000))

// Focus the address bar, then navigate
await sandbox.desktop.keyboard.press('Control+L')
await sandbox.desktop.keyboard.type('https://example.com')
await sandbox.desktop.keyboard.press('Enter')

// Analyze page
const screenshot = await sandbox.desktop.screenshot()
// Send screenshot to vision model for analysis
```

Testing Automation

```javascript
// Launch application
await sandbox.commands.run('/path/to/app &')

// Automated testing workflow
const testSteps = [
  { action: 'click', x: 150, y: 200 },
  { action: 'type', text: 'test input' },
  { action: 'click', x: 300, y: 400 },
  { action: 'screenshot', verify: true }
]

for (const step of testSteps) {
  if (step.action === 'click') {
    await sandbox.desktop.mouse.move(step.x, step.y)
    await sandbox.desktop.mouse.click()
  } else if (step.action === 'type') {
    await sandbox.desktop.keyboard.type(step.text)
  } else if (step.action === 'screenshot') {
    const img = await sandbox.desktop.screenshot()
    // Verify expected UI state
  }
  await new Promise(resolve => setTimeout(resolve, 500))
}
```

Vision-Guided Navigation

```javascript
import { Sandbox } from 'e2b'
import OpenAI from 'openai'

const openai = new OpenAI()
const sandbox = await Sandbox.create({ template: 'desktop' })

async function navigateWithVision(instruction) {
  // Take screenshot
  const screenshot = await sandbox.desktop.screenshot()

  // Ask vision model what to do
  const response = await openai.chat.completions.create({
    model: 'gpt-4-vision-preview',
    messages: [{
      role: 'user',
      content: [
        { type: 'text', text: instruction },
        { type: 'image_url', image_url: { url: screenshot } }
      ]
    }]
  })

  // Parse and execute actions (parseAction and executeAction are
  // application-specific helpers, not shown here)
  const action = parseAction(response.choices[0].message.content)
  await executeAction(sandbox, action)
}

await navigateWithVision('Click on the submit button')
```

Desktop API Reference

Mouse Control

```javascript
// Move mouse
await sandbox.desktop.mouse.move(x, y)

// Click
await sandbox.desktop.mouse.click()         // Left click
await sandbox.desktop.mouse.click('right')  // Right click
await sandbox.desktop.mouse.click('middle') // Middle click

// Double click
await sandbox.desktop.mouse.doubleClick()

// Drag
await sandbox.desktop.mouse.drag(startX, startY, endX, endY)

// Scroll
await sandbox.desktop.mouse.scroll(deltaX, deltaY)
```

Keyboard Control

```javascript
// Type text
await sandbox.desktop.keyboard.type('Hello world')

// Press single key
await sandbox.desktop.keyboard.press('Enter')
await sandbox.desktop.keyboard.press('Escape')

// Key combinations
await sandbox.desktop.keyboard.press('Control+C')
await sandbox.desktop.keyboard.press('Alt+Tab')

// Hold and release
await sandbox.desktop.keyboard.down('Shift')
await sandbox.desktop.keyboard.type('hello') // Types HELLO
await sandbox.desktop.keyboard.up('Shift')
```

Screen Capture

```javascript
// Full screenshot
const fullScreen = await sandbox.desktop.screenshot()

// Region screenshot
const region = await sandbox.desktop.screenshot({
  x: 100,
  y: 100,
  width: 500,
  height: 300
})

// Save screenshot
await sandbox.files.write('/home/user/screenshot.png', fullScreen)
```

Best Practices

1. Timing and Synchronization

  • Add delays between actions for UI to respond
  • Use screenshots to verify state before proceeding
  • Implement retry logic for unreliable UI elements
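The retry-and-verify loop described above can be sketched as a small generic helper. Here `action` and `verify` are hypothetical callbacks standing in for real sandbox calls (e.g. a click, then a screenshot check); they are assumptions for illustration, not part of the E2B API.

```javascript
// A small sketch of retry logic for flaky UI interactions:
// perform the action, give the UI time to respond, then verify.
const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms))

async function withRetry(action, verify, { attempts = 3, waitMs = 500 } = {}) {
  for (let i = 0; i < attempts; i++) {
    await action()
    await delay(waitMs) // let the UI settle before checking
    if (await verify()) return true
  }
  return false // give callers a chance to fall back or report failure
}
```

Returning `false` instead of throwing lets the caller decide whether a missed element is fatal or just means trying a different strategy.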

2. Resource Management

  • Close applications when done
  • Clean up temporary files
  • Monitor memory usage during long-running sessions
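One way to make cleanup reliable is a `try`/`finally` wrapper around the whole session. This is a minimal sketch: `createSession` and `closeSession` are hypothetical stand-ins for `Sandbox.create()` and `sandbox.close()`, passed in as callbacks so the pattern is independent of any particular SDK.

```javascript
// Sketch: guarantee the sandbox is released even if the work throws.
async function withSession(createSession, closeSession, work) {
  const session = await createSession()
  try {
    return await work(session)
  } finally {
    await closeSession(session) // always runs, on success or error
  }
}
```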

3. Vision Model Integration

  • Use screenshots for decision making
  • Implement visual verification
  • Cache common UI patterns
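Caching vision results can be sketched as a memoizing wrapper keyed on the screenshot (e.g. a hash of its bytes) plus the prompt. `analyze` here is a hypothetical vision-model call, not a real API; the point is only the cache shape.

```javascript
// Sketch: memoize vision-model lookups so identical screenshots
// with the same prompt don't trigger duplicate API calls.
function cachedAnalyzer(analyze) {
  const cache = new Map()
  return async (screenshotKey, prompt) => {
    const id = `${screenshotKey}:${prompt}`
    if (cache.has(id)) return cache.get(id) // cache hit: no API call
    const result = await analyze(screenshotKey, prompt)
    cache.set(id, result)
    return result
  }
}
```

This pays off most for static UI regions (toolbars, menus) that appear unchanged across many steps of an automation run.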

4. Error Recovery

  • Take screenshots on errors for debugging
  • Implement fallback strategies
  • Use known UI coordinates as anchors
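The "screenshot on error" practice can be wrapped around any step. In this sketch, `takeScreenshot` and `saveArtifact` are hypothetical callbacks standing in for `sandbox.desktop.screenshot()` and `sandbox.files.write()`; the wrapper captures the failing UI state and then re-throws so normal error handling still runs.

```javascript
// Sketch: capture a debugging screenshot whenever a step fails.
async function runStepWithCapture(step, takeScreenshot, saveArtifact) {
  try {
    return await step()
  } catch (err) {
    const shot = await takeScreenshot()
    await saveArtifact(`error-${Date.now()}.png`, shot)
    throw err // re-throw after preserving the failing state
  }
}
```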

Common Patterns

Wait for Element

```javascript
// checkWithVision is an application-specific helper (not shown) that
// asks a vision model whether the described element is on screen
async function waitForElement(description, maxAttempts = 10) {
  for (let i = 0; i < maxAttempts; i++) {
    const screenshot = await sandbox.desktop.screenshot()
    const found = await checkWithVision(screenshot, description)
    if (found) return true
    await new Promise(resolve => setTimeout(resolve, 1000))
  }
  return false
}
```

Application Launcher

```javascript
async function launchApp(command, waitTime = 3000) {
  await sandbox.commands.run(`${command} &`)
  await new Promise(resolve => setTimeout(resolve, waitTime))

  // Verify app launched
  const screenshot = await sandbox.desktop.screenshot()
  return screenshot
}
```

Form Filling

```javascript
async function fillForm(fields) {
  for (const field of fields) {
    // Click field
    await sandbox.desktop.mouse.move(field.x, field.y)
    await sandbox.desktop.mouse.click()

    // Select existing content so typing replaces it
    await sandbox.desktop.keyboard.press('Control+A')

    // Enter new value
    await sandbox.desktop.keyboard.type(field.value)

    // Move to next field
    await sandbox.desktop.keyboard.press('Tab')
  }
}
```