Last Updated: 3/7/2026
Computer Use
Build AI agents that see, understand, and control virtual Linux desktops using E2B Desktop sandboxes.
Overview
E2B Desktop sandboxes provide a complete graphical Linux environment that AI agents can interact with, enabling:
- Visual understanding of applications
- Mouse and keyboard control
- Screen capture and analysis
- GUI application automation
Key Features
Visual Interface
- Full Linux desktop environment
- X11 display server
- VNC access for remote viewing
- Screenshot capabilities
Input Control
- Programmatic mouse movements
- Keyboard input simulation
- Click and drag operations
- Hotkey combinations
Application Support
- Web browsers (Chrome, Firefox)
- Office applications
- Development tools
- Any Linux GUI application
Basic Example
import { Sandbox } from 'e2b'
// Create a desktop sandbox
const sandbox = await Sandbox.create({
  template: 'desktop' // Use desktop template
})
// Take a screenshot
const screenshot = await sandbox.desktop.screenshot()
// Move mouse and click
await sandbox.desktop.mouse.move(100, 200)
await sandbox.desktop.mouse.click()
// Type text
await sandbox.desktop.keyboard.type('Hello from AI agent!')
await sandbox.close()
Advanced Use Cases
Web Browsing Agent
// Start browser
await sandbox.commands.run('firefox &')
// Wait for browser to load
await new Promise(resolve => setTimeout(resolve, 3000))
// Focus the address bar, then navigate
await sandbox.desktop.keyboard.press('Control+L')
await sandbox.desktop.keyboard.type('https://example.com')
await sandbox.desktop.keyboard.press('Enter')
// Analyze page
const screenshot = await sandbox.desktop.screenshot()
// Send screenshot to vision model for analysis
Testing Automation
// Launch application
await sandbox.commands.run('/path/to/app &')
// Automated testing workflow
const testSteps = [
  { action: 'click', x: 150, y: 200 },
  { action: 'type', text: 'test input' },
  { action: 'click', x: 300, y: 400 },
  { action: 'screenshot', verify: true }
]
for (const step of testSteps) {
  if (step.action === 'click') {
    await sandbox.desktop.mouse.move(step.x, step.y)
    await sandbox.desktop.mouse.click()
  } else if (step.action === 'type') {
    await sandbox.desktop.keyboard.type(step.text)
  } else if (step.action === 'screenshot') {
    const img = await sandbox.desktop.screenshot()
    // Verify expected UI state
  }
  await new Promise(resolve => setTimeout(resolve, 500))
}
Vision-Guided Navigation
import { Sandbox } from 'e2b'
import OpenAI from 'openai'
const openai = new OpenAI()
const sandbox = await Sandbox.create({ template: 'desktop' })
async function navigateWithVision(instruction) {
  // Take screenshot
  const screenshot = await sandbox.desktop.screenshot()
  // Ask vision model what to do
  const response = await openai.chat.completions.create({
    model: 'gpt-4o', // any vision-capable chat model
    messages: [{
      role: 'user',
      content: [
        { type: 'text', text: instruction },
        {
          // Screenshot bytes must be sent as a base64 data URL
          type: 'image_url',
          image_url: { url: `data:image/png;base64,${Buffer.from(screenshot).toString('base64')}` }
        }
      ]
    }]
  })
  // Parse and execute actions (parseAction and executeAction are app-specific helpers)
  const action = parseAction(response.choices[0].message.content)
  await executeAction(sandbox, action)
}
await navigateWithVision('Click on the submit button')
Desktop API Reference
Mouse Control
// Move mouse
await sandbox.desktop.mouse.move(x, y)
// Click
await sandbox.desktop.mouse.click() // Left click
await sandbox.desktop.mouse.click('right') // Right click
await sandbox.desktop.mouse.click('middle') // Middle click
// Double click
await sandbox.desktop.mouse.doubleClick()
// Drag
await sandbox.desktop.mouse.drag(startX, startY, endX, endY)
// Scroll
await sandbox.desktop.mouse.scroll(deltaX, deltaY)
Keyboard Control
// Type text
await sandbox.desktop.keyboard.type('Hello world')
// Press single key
await sandbox.desktop.keyboard.press('Enter')
await sandbox.desktop.keyboard.press('Escape')
// Key combinations
await sandbox.desktop.keyboard.press('Control+C')
await sandbox.desktop.keyboard.press('Alt+Tab')
// Hold and release
await sandbox.desktop.keyboard.down('Shift')
await sandbox.desktop.keyboard.type('hello') // Types HELLO
await sandbox.desktop.keyboard.up('Shift')
Screen Capture
// Full screenshot
const fullScreen = await sandbox.desktop.screenshot()
// Region screenshot
const region = await sandbox.desktop.screenshot({
  x: 100,
  y: 100,
  width: 500,
  height: 300
})
// Save screenshot
await sandbox.files.write('/home/user/screenshot.png', fullScreen)
Best Practices
1. Timing and Synchronization
- Add delays between actions for UI to respond
- Use screenshots to verify state before proceeding
- Implement retry logic for unreliable UI elements
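The retry advice above can be sketched as a small helper. This is an illustration, not part of the E2B SDK; `withRetry` is a hypothetical name, and any async UI action can be passed in:

```javascript
// Hypothetical retry helper: run an async UI action, retrying after a
// fixed delay until it succeeds or the attempt budget is exhausted.
async function withRetry(action, { attempts = 3, delayMs = 500 } = {}) {
  let lastError
  for (let i = 0; i < attempts; i++) {
    try {
      return await action()
    } catch (err) {
      lastError = err
      await new Promise(resolve => setTimeout(resolve, delayMs))
    }
  }
  throw lastError
}
```

A flaky click could then be wrapped as `await withRetry(() => clickSubmit(sandbox))`, where `clickSubmit` throws if visual verification fails.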
2. Resource Management
- Close applications when done
- Clean up temporary files
- Monitor memory usage for long-running sessions
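One way to make cleanup reliable is a wrapper that always closes the sandbox, even when the workflow throws. This is a sketch; `withSandbox` is a hypothetical helper, assuming the sandbox exposes `close()` as in the examples above:

```javascript
// Hypothetical cleanup wrapper: create a sandbox, run the workflow,
// and always close the sandbox afterwards, even on error.
async function withSandbox(createSandbox, workflow) {
  const sandbox = await createSandbox()
  try {
    return await workflow(sandbox)
  } finally {
    await sandbox.close()
  }
}
```

Usage would look like `await withSandbox(() => Sandbox.create({ template: 'desktop' }), async (sandbox) => { /* agent actions */ })`.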
3. Vision Model Integration
- Use screenshots for decision making
- Implement visual verification
- Cache common UI patterns
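Caching can be as simple as keying vision-model answers on the screenshot bytes, so an unchanged frame is analyzed only once. A sketch, with hypothetical names (`analyzeCached`, and an injected `analyze` function standing in for the vision call):

```javascript
// Hypothetical cache: identical screenshot bytes map to the same key,
// so the expensive vision call runs once per distinct frame.
// (In practice a SHA-256 hash keeps keys short for large images.)
const visionCache = new Map()

async function analyzeCached(screenshot, analyze) {
  const key = Buffer.from(screenshot).toString('base64')
  if (!visionCache.has(key)) {
    visionCache.set(key, await analyze(screenshot))
  }
  return visionCache.get(key)
}
```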
4. Error Recovery
- Take screenshots on errors for debugging
- Implement fallback strategies
- Use known UI coordinates as anchors
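The screenshot-on-error advice can be wrapped around each step. A sketch, assuming the `desktop.screenshot()` and `files.write()` calls shown above; `runStepWithDebug` is a hypothetical helper:

```javascript
// Hypothetical error-recovery wrapper: run a step, and if it throws,
// save a labeled debug screenshot before re-throwing the error.
async function runStepWithDebug(sandbox, label, step) {
  try {
    return await step()
  } catch (err) {
    const img = await sandbox.desktop.screenshot()
    await sandbox.files.write(`/home/user/debug-${label}.png`, img)
    throw err
  }
}
```

A caller can then inspect `/home/user/debug-<label>.png` to see exactly what the desktop looked like when the step failed.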
Common Patterns
Wait for Element
async function waitForElement(description, maxAttempts = 10) {
  for (let i = 0; i < maxAttempts; i++) {
    const screenshot = await sandbox.desktop.screenshot()
    // checkWithVision is an app-specific helper that asks a vision model
    // whether the described element is visible in the screenshot
    const found = await checkWithVision(screenshot, description)
    if (found) return true
    await new Promise(resolve => setTimeout(resolve, 1000))
  }
  return false
}
Application Launcher
async function launchApp(command, waitTime = 3000) {
  await sandbox.commands.run(`${command} &`)
  await new Promise(resolve => setTimeout(resolve, waitTime))
  // Verify app launched
  const screenshot = await sandbox.desktop.screenshot()
  return screenshot
}
Form Filling
async function fillForm(fields) {
  for (const field of fields) {
    // Click field
    await sandbox.desktop.mouse.move(field.x, field.y)
    await sandbox.desktop.mouse.click()
    // Select existing content so the new value replaces it
    await sandbox.desktop.keyboard.press('Control+A')
    // Enter new value
    await sandbox.desktop.keyboard.type(field.value)
    // Move to next field
    await sandbox.desktop.keyboard.press('Tab')
  }
}