
Last Updated: 3/7/2026


Computer Use

Build AI agents that see, understand, and control virtual Linux desktops using E2B Desktop sandboxes.

Overview

E2B Desktop sandboxes provide a complete graphical Linux environment that AI agents can interact with, enabling:

  • Visual understanding of applications
  • Mouse and keyboard control
  • Screen capture and analysis
  • GUI application automation

Key Features

Visual Interface

  • Full Linux desktop environment
  • X11 display server
  • VNC access for remote viewing
  • Screenshot capabilities

Input Control

  • Programmatic mouse movements
  • Keyboard input simulation
  • Click and drag operations
  • Hotkey combinations

Application Support

  • Web browsers (Chrome, Firefox)
  • Office applications
  • Development tools
  • Any Linux GUI application

Basic Example

```javascript
import { Sandbox } from 'e2b'

// Create a desktop sandbox
const sandbox = await Sandbox.create({
  template: 'desktop' // Use desktop template
})

// Take a screenshot
const screenshot = await sandbox.desktop.screenshot()

// Move mouse and click
await sandbox.desktop.mouse.move(100, 200)
await sandbox.desktop.mouse.click()

// Type text
await sandbox.desktop.keyboard.type('Hello from AI agent!')

await sandbox.close()
```

Advanced Use Cases

Web Browsing Agent

```javascript
// Start browser
await sandbox.commands.run('firefox &')

// Wait for browser to load
await new Promise(resolve => setTimeout(resolve, 3000))

// Focus the address bar, then navigate
await sandbox.desktop.keyboard.press('Control+L')
await sandbox.desktop.keyboard.type('https://example.com')
await sandbox.desktop.keyboard.press('Enter')

// Analyze page
const screenshot = await sandbox.desktop.screenshot()
// Send screenshot to vision model for analysis
```

Testing Automation

```javascript
// Launch application
await sandbox.commands.run('/path/to/app &')

// Automated testing workflow
const testSteps = [
  { action: 'click', x: 150, y: 200 },
  { action: 'type', text: 'test input' },
  { action: 'click', x: 300, y: 400 },
  { action: 'screenshot', verify: true }
]

for (const step of testSteps) {
  if (step.action === 'click') {
    await sandbox.desktop.mouse.move(step.x, step.y)
    await sandbox.desktop.mouse.click()
  } else if (step.action === 'type') {
    await sandbox.desktop.keyboard.type(step.text)
  } else if (step.action === 'screenshot') {
    const img = await sandbox.desktop.screenshot()
    // Verify expected UI state
  }
  await new Promise(resolve => setTimeout(resolve, 500))
}
```

Vision-Guided Navigation

```javascript
import { Sandbox } from 'e2b'
import OpenAI from 'openai'

const openai = new OpenAI()
const sandbox = await Sandbox.create({ template: 'desktop' })

async function navigateWithVision(instruction) {
  // Take screenshot
  const screenshot = await sandbox.desktop.screenshot()

  // Ask vision model what to do
  const response = await openai.chat.completions.create({
    model: 'gpt-4-vision-preview',
    messages: [{
      role: 'user',
      content: [
        { type: 'text', text: instruction },
        { type: 'image_url', image_url: { url: screenshot } }
      ]
    }]
  })

  // Parse and execute actions (parseAction and executeAction are
  // application-specific helpers, not shown here)
  const action = parseAction(response.choices[0].message.content)
  await executeAction(sandbox, action)
}

await navigateWithVision('Click on the submit button')
```

Desktop API Reference

Mouse Control

```javascript
// Move mouse
await sandbox.desktop.mouse.move(x, y)

// Click
await sandbox.desktop.mouse.click()         // Left click
await sandbox.desktop.mouse.click('right')  // Right click
await sandbox.desktop.mouse.click('middle') // Middle click

// Double click
await sandbox.desktop.mouse.doubleClick()

// Drag
await sandbox.desktop.mouse.drag(startX, startY, endX, endY)

// Scroll
await sandbox.desktop.mouse.scroll(deltaX, deltaY)
```

Keyboard Control

```javascript
// Type text
await sandbox.desktop.keyboard.type('Hello world')

// Press single key
await sandbox.desktop.keyboard.press('Enter')
await sandbox.desktop.keyboard.press('Escape')

// Key combinations
await sandbox.desktop.keyboard.press('Control+C')
await sandbox.desktop.keyboard.press('Alt+Tab')

// Hold and release
await sandbox.desktop.keyboard.down('Shift')
await sandbox.desktop.keyboard.type('hello') // Types HELLO
await sandbox.desktop.keyboard.up('Shift')
```

Screen Capture

```javascript
// Full screenshot
const fullScreen = await sandbox.desktop.screenshot()

// Region screenshot
const region = await sandbox.desktop.screenshot({
  x: 100,
  y: 100,
  width: 500,
  height: 300
})

// Save screenshot
await sandbox.files.write('/home/user/screenshot.png', fullScreen)
```

Best Practices

1. Timing and Synchronization

  • Add delays between actions for UI to respond
  • Use screenshots to verify state before proceeding
  • Implement retry logic for unreliable UI elements
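The retry-and-verify loop described above can be sketched as a small generic helper. Here `action` and `verify` are hypothetical callbacks standing in for real sandbox calls (e.g. a click, then a screenshot check); they are assumptions for illustration, not part of the E2B API.

```javascript
// A small sketch of retry logic for flaky UI interactions:
// perform the action, give the UI time to respond, then verify.
const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms))

async function withRetry(action, verify, { attempts = 3, waitMs = 500 } = {}) {
  for (let i = 0; i < attempts; i++) {
    await action()
    await delay(waitMs) // let the UI settle before checking
    if (await verify()) return true
  }
  return false // give callers a chance to fall back or report failure
}
```

Returning `false` instead of throwing lets the caller decide whether a missed element is fatal or just means trying a different strategy.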

2. Resource Management

  • Close applications when done
  • Clean up temporary files
  • Monitor memory usage during long-running sessions
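One way to make cleanup reliable is a `try`/`finally` wrapper around the whole session. This is a minimal sketch: `createSession` and `closeSession` are hypothetical stand-ins for `Sandbox.create()` and `sandbox.close()`, passed in as callbacks so the pattern is independent of any particular SDK.

```javascript
// Sketch: guarantee the sandbox is released even if the work throws.
async function withSession(createSession, closeSession, work) {
  const session = await createSession()
  try {
    return await work(session)
  } finally {
    await closeSession(session) // always runs, on success or error
  }
}
```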

3. Vision Model Integration

  • Use screenshots for decision making
  • Implement visual verification
  • Cache common UI patterns
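Caching vision results can be sketched as a memoizing wrapper keyed on the screenshot (e.g. a hash of its bytes) plus the prompt. `analyze` here is a hypothetical vision-model call, not a real API; the point is only the cache shape.

```javascript
// Sketch: memoize vision-model lookups so identical screenshots
// with the same prompt don't trigger duplicate API calls.
function cachedAnalyzer(analyze) {
  const cache = new Map()
  return async (screenshotKey, prompt) => {
    const id = `${screenshotKey}:${prompt}`
    if (cache.has(id)) return cache.get(id) // cache hit: no API call
    const result = await analyze(screenshotKey, prompt)
    cache.set(id, result)
    return result
  }
}
```

This pays off most for static UI regions (toolbars, menus) that appear unchanged across many steps of an automation run.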

4. Error Recovery

  • Take screenshots on errors for debugging
  • Implement fallback strategies
  • Use known UI coordinates as anchors
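The "screenshot on error" practice can be wrapped around any step. In this sketch, `takeScreenshot` and `saveArtifact` are hypothetical callbacks standing in for `sandbox.desktop.screenshot()` and `sandbox.files.write()`; the wrapper captures the failing UI state and then re-throws so normal error handling still runs.

```javascript
// Sketch: capture a debugging screenshot whenever a step fails.
async function runStepWithCapture(step, takeScreenshot, saveArtifact) {
  try {
    return await step()
  } catch (err) {
    const shot = await takeScreenshot()
    await saveArtifact(`error-${Date.now()}.png`, shot)
    throw err // re-throw after preserving the failing state
  }
}
```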

Common Patterns

Wait for Element

```javascript
// checkWithVision is an application-specific helper (not shown) that
// asks a vision model whether the described element is on screen
async function waitForElement(description, maxAttempts = 10) {
  for (let i = 0; i < maxAttempts; i++) {
    const screenshot = await sandbox.desktop.screenshot()
    const found = await checkWithVision(screenshot, description)
    if (found) return true
    await new Promise(resolve => setTimeout(resolve, 1000))
  }
  return false
}
```

Application Launcher

```javascript
async function launchApp(command, waitTime = 3000) {
  await sandbox.commands.run(`${command} &`)
  await new Promise(resolve => setTimeout(resolve, waitTime))

  // Verify app launched
  const screenshot = await sandbox.desktop.screenshot()
  return screenshot
}
```

Form Filling

```javascript
async function fillForm(fields) {
  for (const field of fields) {
    // Click field
    await sandbox.desktop.mouse.move(field.x, field.y)
    await sandbox.desktop.mouse.click()

    // Select existing content so typing replaces it
    await sandbox.desktop.keyboard.press('Control+A')

    // Enter new value
    await sandbox.desktop.keyboard.type(field.value)

    // Move to next field
    await sandbox.desktop.keyboard.press('Tab')
  }
}
```