# Bytebot

Bytebot is an open-source AI desktop agent that provides a complete virtual computer environment for automating any task. It runs in Docker containers on your own infrastructure, giving you an AI assistant that can control a full Ubuntu Linux desktop with pre-installed applications including browsers, email clients, office tools, and development environments. The agent understands natural language instructions and executes them by controlling the mouse, keyboard, and screen - just like a human would.

The platform consists of four integrated components: a virtual desktop (Ubuntu 22.04 with XFCE4), an AI agent (NestJS service supporting Claude, GPT, and Gemini), a task interface (Next.js web app), and REST APIs for programmatic control. Bytebot excels at enterprise automation (RPA replacement), document processing, multi-system integrations, and development/QA workflows. It handles authentication automatically via password manager extensions and can process uploaded files including PDFs directly into the LLM context.

## Quick Start with Docker Compose

Deploy Bytebot with Docker Compose for a complete self-hosted AI desktop automation system.

```bash
# Clone and configure
git clone https://github.com/bytebot-ai/bytebot.git
cd bytebot

# Configure your AI provider (choose one)
echo "ANTHROPIC_API_KEY=sk-ant-your-key-here" > docker/.env
# Or: echo "OPENAI_API_KEY=sk-your-key-here" > docker/.env
# Or: echo "GEMINI_API_KEY=your-key-here" > docker/.env

# Start the agent stack
docker-compose -f docker/docker-compose.yml up -d

# Access the UI at http://localhost:9992
# Agent API at http://localhost:9991
# Desktop API at http://localhost:9990
```

## Tasks API - Create Task

Create a new task for the AI agent to process. Tasks can include natural language descriptions and optional file uploads for document processing.

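
As a rough Python sketch of the JSON request body: the helper below checks the priority against the levels this document lists for the Tasks API (LOW, MEDIUM, HIGH, URGENT) and builds the body for `POST /tasks`. The names `build_task_payload` and `create_task` are hypothetical helpers, not part of Bytebot, and the default priority of MEDIUM is an assumption borrowed from the Node.js example later in this document.

```python
import json
import urllib.request

# Priority levels as listed in this document's Tasks API section.
VALID_PRIORITIES = ("LOW", "MEDIUM", "HIGH", "URGENT")


def build_task_payload(description, priority="MEDIUM"):
    """Return the JSON-serializable body for POST /tasks (hypothetical helper)."""
    if not description or not description.strip():
        raise ValueError("description must be a non-empty string")
    priority = priority.upper()
    if priority not in VALID_PRIORITIES:
        raise ValueError(f"priority must be one of {VALID_PRIORITIES}")
    return {"description": description.strip(), "priority": priority}


def create_task(description, priority="MEDIUM", base_url="http://localhost:9991"):
    """Send the task to the Agent API and return the decoded JSON response.

    Assumes a Bytebot stack is reachable at base_url; nothing is sent
    until this function is called.
    """
    body = json.dumps(build_task_payload(description, priority)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/tasks",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Validating on the client side is a design choice of this sketch, made so that a typo in the priority fails before any request is sent. The curl examples that follow show the same request as a raw HTTP call.
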
```bash
# Create a simple task
curl -X POST http://localhost:9991/tasks \
  -H "Content-Type: application/json" \
  -d '{
    "description": "Search for flights from NYC to London next month and create a comparison document",
    "priority": "HIGH"
  }'

# Response:
# {
#   "id": "task-123",
#   "description": "Search for flights from NYC to London...",
#   "status": "PENDING",
#   "priority": "HIGH",
#   "createdAt": "2025-04-14T12:00:00Z",
#   "updatedAt": "2025-04-14T12:00:00Z"
# }

# Create task with file upload (multipart/form-data)
curl -X POST http://localhost:9991/tasks \
  -F "description=Analyze the uploaded contracts and extract all payment terms and deadlines" \
  -F "priority=HIGH" \
  -F "files=@contract1.pdf" \
  -F "files=@contract2.pdf"
```

## Tasks API - Get Tasks

Retrieve all tasks or get a specific task by ID, including its message history.

```bash
# Get all tasks
curl -X GET http://localhost:9991/tasks

# Response:
# [
#   {
#     "id": "task-123",
#     "description": "Download invoices from webmail",
#     "status": "COMPLETED",
#     "priority": "MEDIUM",
#     "createdAt": "2025-04-14T12:00:00Z",
#     "updatedAt": "2025-04-14T12:30:00Z"
#   },
#   ...
# ]

# Get specific task with messages
curl -X GET http://localhost:9991/tasks/task-123

# Get currently in-progress task
curl -X GET http://localhost:9991/tasks/in-progress
```

## Tasks API - Update and Delete Tasks

Update task status/priority or delete tasks from the system.

```bash
# Update task status and priority
curl -X PATCH http://localhost:9991/tasks/task-123 \
  -H "Content-Type: application/json" \
  -d '{
    "status": "COMPLETED",
    "priority": "HIGH"
  }'

# Delete a task (returns 204 No Content)
curl -X DELETE http://localhost:9991/tasks/task-123

# Task statuses: PENDING, IN_PROGRESS, NEEDS_HELP, NEEDS_REVIEW, COMPLETED, CANCELLED, FAILED
# Priority levels: LOW, MEDIUM, HIGH, URGENT
```

## Computer Use API - Screenshot

Capture a screenshot of the virtual desktop. Returns a base64-encoded PNG image.

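
Since the image comes back as base64 text inside JSON, a client has to decode it before saving. The sketch below assumes the response shape shown in this section (`success` plus `data.image`) and additionally checks the standard 8-byte PNG signature; the `iVBORw0KGgo` prefix in the sample response is that signature in base64. `decode_screenshot` and `save_screenshot` are hypothetical helper names, not Bytebot functions.

```python
import base64

# Every PNG file starts with these eight bytes.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def decode_screenshot(response):
    """Return raw PNG bytes from a screenshot response dict, or raise ValueError."""
    if not response.get("success"):
        raise ValueError("screenshot action did not report success")
    image_b64 = (response.get("data") or {}).get("image")
    if not image_b64:
        raise ValueError("response has no data.image field")
    png_bytes = base64.b64decode(image_b64)
    if not png_bytes.startswith(PNG_SIGNATURE):
        raise ValueError("decoded data does not start with the PNG signature")
    return png_bytes


def save_screenshot(response, path):
    """Decode the response and write the PNG to path; returns bytes written."""
    png_bytes = decode_screenshot(response)
    with open(path, "wb") as handle:
        return handle.write(png_bytes)
```

The signature check is optional; it is included here so that a truncated or mislabeled payload fails loudly instead of producing an unreadable file. The curl examples that follow show the raw request and the same decode step in the shell.
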
```bash
# Take a screenshot
curl -X POST http://localhost:9990/computer-use \
  -H "Content-Type: application/json" \
  -d '{"action": "screenshot"}'

# Response:
# {
#   "success": true,
#   "data": {
#     "image": "iVBORw0KGgoAAAANSUhEUgAAB4AAAAQ..."
#   }
# }

# Save screenshot to file (bash)
response=$(curl -s -X POST http://localhost:9990/computer-use \
  -H "Content-Type: application/json" \
  -d '{"action": "screenshot"}')
# Quote the variable so the shell passes the JSON through unchanged
echo "$response" | jq -r '.data.image' | base64 -d > screenshot.png
```

## Computer Use API - Mouse Actions

Control mouse movements, clicks, and drags on the virtual desktop.

```bash
# Move mouse to coordinates
curl -X POST http://localhost:9990/computer-use \
  -H "Content-Type: application/json" \
  -d '{
    "action": "move_mouse",
    "coordinates": {"x": 500, "y": 300}
  }'

# Click at coordinates (left, right, or middle button)
curl -X POST http://localhost:9990/computer-use \
  -H "Content-Type: application/json" \
  -d '{
    "action": "click_mouse",
    "coordinates": {"x": 500, "y": 300},
    "button": "left",
    "clickCount": 1
  }'

# Double-click with modifier keys
curl -X POST http://localhost:9990/computer-use \
  -H "Content-Type: application/json" \
  -d '{
    "action": "click_mouse",
    "coordinates": {"x": 500, "y": 300},
    "button": "left",
    "clickCount": 2,
    "holdKeys": ["ctrl"]
  }'

# Drag from one point to another
curl -X POST http://localhost:9990/computer-use \
  -H "Content-Type: application/json" \
  -d '{
    "action": "drag_mouse",
    "path": [
      {"x": 100, "y": 100},
      {"x": 300, "y": 300}
    ],
    "button": "left"
  }'

# Scroll down 5 steps
curl -X POST http://localhost:9990/computer-use \
  -H "Content-Type: application/json" \
  -d '{
    "action": "scroll",
    "direction": "down",
    "scrollCount": 5
  }'

# Get current cursor position
curl -X POST http://localhost:9990/computer-use \
  -H "Content-Type: application/json" \
  -d '{"action": "cursor_position"}'

# Response: {"success": true, "data": {"x": 500, "y": 300}}
```

## Computer Use API - Keyboard Actions

Type text, press keys, and execute keyboard shortcuts.

```bash
# Type text with optional delay between characters
curl -X POST http://localhost:9990/computer-use \
  -H "Content-Type: application/json" \
  -d '{
    "action": "type_text",
    "text": "Hello, Bytebot!",
    "delay": 50
  }'

# Paste text (useful for special characters and emojis)
curl -X POST http://localhost:9990/computer-use \
  -H "Content-Type: application/json" \
  -d '{
    "action": "paste_text",
    "text": "Special characters: ©®™ and emojis"
  }'

# Type individual keys in sequence
curl -X POST http://localhost:9990/computer-use \
  -H "Content-Type: application/json" \
  -d '{
    "action": "type_keys",
    "keys": ["a", "b", "c", "enter"],
    "delay": 50
  }'

# Press keyboard shortcut (Ctrl+S to save)
curl -X POST http://localhost:9990/computer-use \
  -H "Content-Type: application/json" \
  -d '{
    "action": "press_keys",
    "keys": ["ctrl", "s"],
    "press": "down"
  }'

# Wait for specified duration (milliseconds)
curl -X POST http://localhost:9990/computer-use \
  -H "Content-Type: application/json" \
  -d '{
    "action": "wait",
    "duration": 2000
  }'
```

## Computer Use API - Application Switching

Switch between applications in the virtual desktop environment.

```bash
# Switch to Firefox browser
curl -X POST http://localhost:9990/computer-use \
  -H "Content-Type: application/json" \
  -d '{
    "action": "application",
    "application": "firefox"
  }'

# Available applications:
# - firefox: Mozilla Firefox browser
# - 1password: Password manager
# - thunderbird: Email client
# - vscode: Visual Studio Code
# - terminal: Terminal/console
# - desktop: Switch to desktop
# - directory: File manager
```

## Computer Use API - File Operations

Read and write files in the virtual desktop filesystem.

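
File contents travel as base64 in both directions, so a client typically wraps the encode and decode steps. The following is a minimal sketch assuming the `write_file` and `read_file` request and response shapes shown in this section; the helper names are hypothetical, and treating string content as UTF-8 is an assumption of the sketch.

```python
import base64


def build_write_file_action(path, content):
    """Build a write_file action body; str content is encoded as UTF-8 first."""
    raw = content.encode("utf-8") if isinstance(content, str) else bytes(content)
    return {
        "action": "write_file",
        "path": path,
        "data": base64.b64encode(raw).decode("ascii"),
    }


def build_read_file_action(path):
    """Build a read_file action body for the given path."""
    return {"action": "read_file", "path": path}


def decode_read_file_response(response):
    """Return the file's raw bytes from a read_file response dict."""
    if not response.get("success"):
        raise ValueError("read_file action did not report success")
    return base64.b64decode(response["data"])
```

For example, `build_write_file_action("/home/user/documents/example.txt", "Hello World!")` yields the `SGVsbG8gV29ybGQh` payload used in the curl example that follows, and decoding that string gives back the 12 bytes reported in the sample `size` field.
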
```bash
# Write a file (content must be base64 encoded)
curl -X POST http://localhost:9990/computer-use \
  -H "Content-Type: application/json" \
  -d '{
    "action": "write_file",
    "path": "/home/user/documents/example.txt",
    "data": "SGVsbG8gV29ybGQh"
  }'

# Response:
# {
#   "success": true,
#   "message": "File written successfully to: /home/user/documents/example.txt"
# }

# Read a file (returns base64 encoded content)
curl -X POST http://localhost:9990/computer-use \
  -H "Content-Type: application/json" \
  -d '{
    "action": "read_file",
    "path": "/home/user/documents/example.txt"
  }'

# Response:
# {
#   "success": true,
#   "data": "SGVsbG8gV29ybGQh",
#   "name": "example.txt",
#   "size": 12,
#   "mediaType": "text/plain"
# }
```

## Python SDK Example

Complete Python example for automating browser tasks with the Computer Use API.

```python
import base64

import requests


class BytebotClient:
    def __init__(self, base_url="http://localhost:9990"):
        self.base_url = base_url

    def computer_action(self, action, **params):
        """Execute a computer action on the virtual desktop."""
        url = f"{self.base_url}/computer-use"
        data = {"action": action, **params}
        response = requests.post(url, json=data)
        return response.json()

    def screenshot(self):
        """Take a screenshot and return the image data, or None on failure."""
        result = self.computer_action("screenshot")
        if result["success"]:
            return base64.b64decode(result["data"]["image"])
        return None

    def click(self, x, y, button="left", count=1):
        """Click at specified coordinates."""
        return self.computer_action(
            "click_mouse",
            coordinates={"x": x, "y": y},
            button=button,
            clickCount=count,
        )

    def type_text(self, text, delay=0):
        """Type text into the active window."""
        return self.computer_action("type_text", text=text, delay=delay)

    def press_keys(self, keys):
        """Press keyboard keys."""
        return self.computer_action("press_keys", keys=keys, press="down")

    def wait(self, ms):
        """Wait for specified milliseconds."""
        return self.computer_action("wait", duration=ms)

    def switch_app(self, app):
        """Switch to specified application."""
        return self.computer_action("application", application=app)


# Usage example: Automate web search
client = BytebotClient()

# Open Firefox
client.switch_app("firefox")
client.wait(2000)

# Click on URL bar and type the address
client.click(500, 50)
client.type_text("https://www.google.com", delay=30)
client.press_keys(["enter"])
client.wait(3000)

# Take screenshot of the loaded page; screenshot() returns None on failure
screenshot_data = client.screenshot()
if screenshot_data is not None:
    with open("google_home.png", "wb") as f:
        f.write(screenshot_data)
    print("Screenshot saved to google_home.png")
else:
    print("Screenshot failed")
```

## JavaScript/Node.js SDK Example

Complete Node.js example for task automation with both APIs.

```javascript
const axios = require('axios');
const fs = require('fs');

class BytebotClient {
  constructor(agentUrl = 'http://localhost:9991', desktopUrl = 'http://localhost:9990') {
    this.agentUrl = agentUrl;
    this.desktopUrl = desktopUrl;
  }

  // Task Management API
  async createTask(description, priority = 'MEDIUM') {
    const response = await axios.post(`${this.agentUrl}/tasks`, {
      description,
      priority
    });
    return response.data;
  }

  async getTasks() {
    const response = await axios.get(`${this.agentUrl}/tasks`);
    return response.data;
  }

  async getTask(taskId) {
    const response = await axios.get(`${this.agentUrl}/tasks/${taskId}`);
    return response.data;
  }

  async getInProgressTask() {
    const response = await axios.get(`${this.agentUrl}/tasks/in-progress`);
    return response.data;
  }

  // Computer Use API
  async computerAction(action, params = {}) {
    const response = await axios.post(`${this.desktopUrl}/computer-use`, {
      action,
      ...params
    });
    return response.data;
  }

  async screenshot() {
    return this.computerAction('screenshot');
  }

  async click(x, y, button = 'left', clickCount = 1) {
    return this.computerAction('click_mouse', {
      coordinates: { x, y },
      button,
      clickCount
    });
  }

  async typeText(text, delay = 0) {
    return this.computerAction('type_text', { text, delay });
  }

  async pressKeys(keys) {
    return this.computerAction('press_keys', { keys, press: 'down' });
  }

  async wait(duration) {
    return this.computerAction('wait', { duration });
  }

  async switchApp(application) {
    return this.computerAction('application', { application });
  }
}

// Usage example
async function main() {
  const client = new BytebotClient();

  // Create a task for the AI agent
  const task = await client.createTask(
    'Research the top 5 project management tools and create a comparison document',
    'HIGH'
  );
  console.log('Created task:', task.id);

  // Or use direct desktop control
  await client.switchApp('firefox');
  await client.wait(2000);
  await client.click(500, 50);
  await client.typeText('https://example.com');
  await client.pressKeys(['enter']);
  await client.wait(3000);

  // Take and save screenshot
  const result = await client.screenshot();
  if (result.success) {
    const imageBuffer = Buffer.from(result.data.image, 'base64');
    fs.writeFileSync('screenshot.png', imageBuffer);
    console.log('Screenshot saved');
  }
}

main().catch(console.error);
```

## MCP (Model Context Protocol) Integration

Connect MCP clients to access desktop control tools via Server-Sent Events.

```bash
# MCP endpoint for SSE connections
# http://localhost:9990/mcp

# Example: Configure Claude Desktop to use Bytebot MCP
# In claude_desktop_config.json:
{
  "mcpServers": {
    "bytebot": {
      "url": "http://localhost:9990/mcp"
    }
  }
}

# The MCP endpoint exposes all computer-use actions as tools:
# - screenshot: Capture desktop
# - click_mouse: Click at coordinates
# - type_text: Type text
# - press_keys: Keyboard shortcuts
# - scroll: Scroll page
# - application: Switch apps
# - read_file/write_file: File operations
```

## Helm Deployment for Kubernetes

Deploy Bytebot on Kubernetes using Helm charts for production environments.

```bash
# Clone the repository
git clone https://github.com/bytebot-ai/bytebot.git
cd bytebot

# Install with Helm (basic)
helm install bytebot ./helm \
  --set agent.env.ANTHROPIC_API_KEY=sk-ant-your-key-here

# Install with custom values
# (the hosts[0] path is quoted so shells such as zsh do not treat [0] as a glob)
helm install bytebot ./helm \
  --set agent.env.ANTHROPIC_API_KEY=sk-ant-your-key-here \
  --set agent.env.ANTHROPIC_MODEL=claude-3-5-sonnet-20241022 \
  --set bytebot-ui.ingress.enabled=true \
  --set 'bytebot-ui.ingress.hosts[0].host=bytebot.example.com'

# Using values file
cat > my-values.yaml << EOF
agent:
  env:
    ANTHROPIC_API_KEY: sk-ant-your-key-here
bytebot-ui:
  ingress:
    enabled: true
    hosts:
      - host: bytebot.example.com
        paths:
          - path: /
            pathType: Prefix
EOF

helm install bytebot ./helm -f my-values.yaml
```

## LiteLLM Proxy Integration

Use LiteLLM proxy to access multiple LLM providers including Azure OpenAI, AWS Bedrock, and local models.

```bash
# Start with LiteLLM proxy for multiple providers
docker-compose -f docker/docker-compose.proxy.yml up -d

# Configure LiteLLM (litellm_config.yaml example)
model_list:
  - model_name: gpt-4
    litellm_params:
      model: azure/gpt-4-deployment
      api_base: https://your-resource.openai.azure.com
      api_key: your-azure-key
      api_version: "2024-02-15-preview"
  - model_name: claude-3-sonnet
    litellm_params:
      model: anthropic/claude-3-sonnet-20240229
      api_key: sk-ant-your-key
  - model_name: gemini-pro
    litellm_params:
      model: gemini/gemini-1.5-flash
      api_key: your-gemini-key

# Environment variables for proxy mode
# LITELLM_PROXY_URL=http://litellm:4000
# LITELLM_MODEL=gpt-4
```

Bytebot serves as a powerful platform for enterprise automation, replacing traditional RPA tools with AI-powered adaptability. Primary use cases include financial operations (bank portal automation, invoice processing, reconciliation), compliance workflows (regulatory document downloads, audit trail generation), multi-system integration (bridging legacy systems without APIs), and development/QA integration (automated testing, visual regression). The platform handles authentication automatically through password manager extensions, supporting 2FA workflows without manual intervention.

Integration patterns typically involve either high-level task creation via the Agent API (port 9991) for autonomous AI-driven workflows, or low-level desktop control via the Computer Use API (port 9990) for precise automation scripts. The Agent API is ideal for complex, adaptive tasks described in natural language, while the Computer Use API provides direct programmatic control for integration with existing automation frameworks. Both APIs can be combined: create tasks via the Agent API while monitoring desktop state via the Computer Use API. The MCP endpoint enables integration with AI assistants like Claude Desktop, exposing all desktop control capabilities as tools.
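
One way to follow a task created through the Agent API is to poll it until it stops. The sketch below is a hypothetical helper, not part of Bytebot: it takes any callable that returns the task as a dict (for example a wrapper around `GET /tasks/{id}`), and it assumes that COMPLETED, CANCELLED, and FAILED are final states while NEEDS_HELP and NEEDS_REVIEW mean a human should step in. The status names come from the Tasks API section; that grouping is this sketch's assumption.

```python
import time

# Status names as listed in this document's Tasks API section.
# Which ones count as "done" is an assumption of this sketch.
TERMINAL_STATUSES = {"COMPLETED", "CANCELLED", "FAILED"}
ATTENTION_STATUSES = {"NEEDS_HELP", "NEEDS_REVIEW"}


def wait_for_task(fetch_task, task_id, interval=5.0, timeout=600.0,
                  sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_task(task_id) until the task finishes or needs a human.

    fetch_task is any callable returning the task as a dict with a
    "status" key. sleep and clock are injectable so the loop can be
    exercised without real delays.
    """
    deadline = clock() + timeout
    while True:
        task = fetch_task(task_id)
        status = task.get("status")
        if status in TERMINAL_STATUSES or status in ATTENTION_STATUSES:
            return task
        if clock() >= deadline:
            raise TimeoutError(f"task {task_id} still {status} after {timeout}s")
        sleep(interval)
```

With the `requests` library used in the Python example above, a call might look like `wait_for_task(lambda tid: requests.get(f"http://localhost:9991/tasks/{tid}").json(), "task-123")`. Between polls, a script could also request a screenshot from the Computer Use API to record what the desktop looked like at each step.
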