TuriX
https://github.com/turixai/turix-cua
TuriX is an open-source AI-powered desktop automation agent that enables users to talk to their
...
# TuriX - Desktop Actions Driven by AI

TuriX is a computer-use agent that lets AI models take real, hands-on actions directly on macOS and Windows desktops. It provides a framework for automating desktop interactions through vision-language models, with a reported success rate of over 68% on OSWorld-style benchmarks. The system works with multiple LLM providers, including OpenAI, Anthropic Claude, Google Gemini, and TuriX's proprietary models.

The framework uses a multi-agent architecture: a planner module breaks complex tasks into executable steps, and a controller executes low-level desktop actions such as clicking, typing, scrolling, and application management. TuriX supports Model Context Protocol (MCP) integration, so third-party agents such as Claude Desktop can use its desktop automation capabilities. The platform is 100% open-source and designed for personal use, research, and development without data collection.

## Agent Initialization and Execution

Initialize and run the TuriX agent

The Agent class orchestrates the desktop automation workflow: it manages task execution, LLM interactions, UI state tracking, and action execution through a controller. It keeps a short-term memory of recent actions, evaluates whether the goal has been reached, and adapts its behavior based on feedback from the environment.
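The short-term memory described here can be pictured as a fixed-size window over recent actions. The sketch below is an illustration only, not TuriX's implementation: `ShortMemory` and its methods are hypothetical names, and the only detail taken from this document is the `short_memory_len` parameter.

```python
from collections import deque
from typing import List


class ShortMemory:
    """Hypothetical fixed-size window over the most recent actions."""

    def __init__(self, short_memory_len: int = 5) -> None:
        # A deque with maxlen drops the oldest entry once it is full
        self._items = deque(maxlen=short_memory_len)

    def remember(self, action: str) -> None:
        self._items.append(action)

    def recent(self) -> List[str]:
        # Oldest entry first, newest entry last
        return list(self._items)


memory = ShortMemory(short_memory_len=3)
for step in ["open Chrome", "click address bar", "type github.com", "press enter"]:
    memory.remember(step)

print(memory.recent())
```

With a window of three, the first action falls out of memory once the fourth is recorded, which is the trade-off a small `short_memory_len` implies: less context per LLM call, but older steps are forgotten.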
```python
import asyncio

from langchain_openai import ChatOpenAI

from src import Agent
from src.controller.service import Controller

# Configure the language model
llm = ChatOpenAI(
    model="turix-model",
    openai_api_base="https://llm.turixapi.io/v1",
    openai_api_key="your_api_key_here",
    temperature=0.3,
)

planner_llm = ChatOpenAI(
    model="turix-planner-model",
    openai_api_base="https://llm.turixapi.io/v1",
    openai_api_key="your_api_key_here",
    temperature=0.3,
)

# Initialize controller
controller = Controller()

# Create agent with task
agent = Agent(
    task="open Chrome, go to github, and search for turix-cua",
    llm=llm,
    planner_llm=planner_llm,
    controller=controller,
    use_turix=True,
    short_memory_len=5,
    max_actions_per_step=5,
    save_conversation_path="llm_interactions.log",
    max_failures=5,
)

# Run agent with maximum step limit
history = asyncio.run(agent.run(max_steps=100))

# Check if task completed successfully
if history.is_done():
    print("Task completed successfully!")
    print(f"Final result: {history.final_result()}")
else:
    print("Task failed or reached max steps")
    errors = history.errors()
    if errors:
        print(f"Errors encountered: {errors}")
```

## Desktop Action Execution

Execute low-level desktop interactions

The Controller class manages action registration and execution, providing a unified interface for all desktop interactions including mouse operations, keyboard input, application control, and AppleScript automation. Each action is registered with a description and parameter model for LLM function calling.
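Click actions in this document's examples take coordinates on a 0-1000 scale rather than raw pixels. How TuriX maps them to the screen is not shown here; the sketch below assumes a simple linear mapping over the full screen, and `to_pixels` is a hypothetical helper, not a TuriX function.

```python
from typing import Sequence, Tuple


def to_pixels(position: Sequence[int], width: int, height: int) -> Tuple[int, int]:
    """Map a normalized [x, y] pair (0-1000) to pixel coordinates.

    Hypothetical helper; assumes a linear mapping over the whole screen.
    """
    x, y = position
    if not (0 <= x <= 1000 and 0 <= y <= 1000):
        raise ValueError(f"position out of the 0-1000 range: {position}")
    # Integer arithmetic avoids float rounding; clamp so that 1000 maps to
    # the last valid pixel instead of one past the edge
    px = min(int(x * width // 1000), width - 1)
    py = min(int(y * height // 1000), height - 1)
    return px, py


print(to_pixels([500, 300], width=1920, height=1080))
```

Normalized coordinates keep an action independent of the display resolution, which is presumably why a model-facing API would prefer them over pixels.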
```python
import asyncio

from src.agent.views import ActionResult
from src.controller.service import Controller
from src.controller.views import (
    InputTextAction,
    LeftClickPixel,
    OpenAppAction,
    PressCombinedAction,
)
from src.mac.tree import MacUITreeBuilder


async def main() -> None:
    # Initialize the controller and the UI tree builder passed to every action
    controller = Controller()
    mac_tree_builder = MacUITreeBuilder()

    # Execute a left click at normalized coordinates [x, y] (0-1000)
    result: ActionResult = await controller.act(
        action=LeftClickPixel(position=[500, 300]),
        mac_tree_builder=mac_tree_builder,
    )
    if result.error:
        print(f"Action failed: {result.error}")
    else:
        print(f"Action succeeded: {result.extracted_content}")

    # Open an application
    app_result = await controller.act(
        action=OpenAppAction(app_name="Safari"),
        mac_tree_builder=mac_tree_builder,
    )
    print(f"App PID: {app_result.current_app_pid}")

    # Type text input
    text_result = await controller.act(
        action=InputTextAction(text="Hello, world!"),
        mac_tree_builder=mac_tree_builder,
    )

    # Press keyboard combination (Command+C)
    copy_result = await controller.act(
        action=PressCombinedAction(key1="command", key2="c", key3=None),
        mac_tree_builder=mac_tree_builder,
    )

    # Execute multiple actions sequentially
    actions = [
        OpenAppAction(app_name="TextEdit"),
        InputTextAction(text="This is a test document"),
        PressCombinedAction(key1="command", key2="s", key3=None),  # Save
    ]
    results = await controller.multi_act(
        actions=actions,
        mac_tree_builder=mac_tree_builder,
        action_valid=True,
    )
    for i, result in enumerate(results):
        print(f"Action {i+1}: {result.extracted_content}")


# The controller methods are coroutines, so they need a running event loop
asyncio.run(main())
```

## Custom Action Registration

Register custom actions with the controller

The Registry system lets developers extend TuriX with custom actions beyond the built-in set. Actions are registered with a decorator and are automatically exposed through the LLM's function-calling interface.
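To make the decorator pattern concrete independently of TuriX, here is a minimal self-contained sketch. `MiniRegistry` and everything in it are hypothetical stand-ins; only the method name `get_prompt_description` is borrowed from this document, and the real Registry very likely does more (parameter models, validation).

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict


class MiniRegistry:
    """Hypothetical stand-in that illustrates decorator-based registration."""

    def __init__(self) -> None:
        self.actions: Dict[str, Dict[str, Any]] = {}

    def action(self, description: str) -> Callable:
        def decorator(func: Callable[..., Awaitable[str]]) -> Callable[..., Awaitable[str]]:
            # Store the function under its own name together with its description
            self.actions[func.__name__] = {"description": description, "func": func}
            return func  # the decorated function is returned unchanged

        return decorator

    def get_prompt_description(self) -> str:
        # One line per action, the kind of text that could be shown to an LLM
        return "\n".join(
            f"{name}: {entry['description']}" for name, entry in self.actions.items()
        )

    async def execute(self, name: str, **params: Any) -> str:
        if name not in self.actions:
            raise KeyError(f"unknown action '{name}'")
        return await self.actions[name]["func"](**params)


registry = MiniRegistry()


@registry.action(description="Echo a message back to the caller")
async def echo(message: str) -> str:
    return f"echo: {message}"


print(registry.get_prompt_description())
print(asyncio.run(registry.execute("echo", message="hi")))
```

The point of the pattern is that registration happens as a side effect of defining the function, so the set of available actions and their descriptions stay in one place.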
```python
from urllib.parse import quote_plus

from pydantic import BaseModel

from src.agent.views import ActionResult
from src.controller.registry.views import ActionModel
from src.controller.service import Controller

# Initialize controller
controller = Controller()


# Define parameter model
class CustomSearchAction(BaseModel):
    query: str
    engine: str


# Register custom action
@controller.action(
    description="Search the web using specified search engine",
    param_model=CustomSearchAction,
    requires_mac_builder=False,
)
async def custom_search(query: str, engine: str):
    """Custom search action implementation"""
    try:
        # Implement custom search logic; the query is URL-encoded first
        url = f"https://www.{engine}.com/search?q={quote_plus(query)}"
        # Open browser and navigate to `url` here
        return ActionResult(
            extracted_content=f"Searched for '{query}' on {engine}",
            error=None,
        )
    except Exception as e:
        return ActionResult(
            extracted_content=None,
            error=f"Search failed: {str(e)}",
        )


# The action is now available to the agent
# It will appear in the action registry and can be called by the LLM
prompt_description = controller.registry.get_prompt_description()
print(prompt_description)  # Includes your custom action
```

## Mouse and Keyboard Actions

Low-level mouse and keyboard control

The mac.actions module provides invisible mouse operations and keyboard input functions that work through macOS accessibility APIs without visually moving the cursor or disrupting the user's workflow.

```python
import asyncio

from src.mac.actions import (
    _scroll_invisible_at_position,
    drag_pixel,
    left_click_pixel,
    move_to,
    press,
    press_combination,
    right_click_pixel,
    scroll_down,
    scroll_up,
    type_into,
)


async def main() -> None:
    # Left click at normalized coordinates (0-1000 scale)
    await left_click_pixel([500, 300])

    # Right click for context menu
    await right_click_pixel([600, 400])

    # Drag from one position to another
    await drag_pixel(start=[100, 100], end=[200, 200])

    # Move cursor to position (visible movement)
    await move_to([450, 550])

    # Type text with Unicode support
    await type_into("Hello 世界!")

    # Press single key
    await press("enter")
    await press("escape")
    await press("backspace")

    # Press key combination
    await press_combination("command", "c")  # Copy
    await press_combination("command", "shift", "s")  # Save As
    await press_combination("command", "t")  # New Tab

    # Scroll operations (amount: 0-25 lines)
    await scroll_up(amount=5)
    await scroll_down(amount=10)

    # Scroll at specific position
    await _scroll_invisible_at_position(x=500, y=400, lines=5)


asyncio.run(main())
```

## AppleScript Execution

Run AppleScript for advanced macOS automation

AppleScript integration gives control over macOS applications and system functions that may not be reachable through standard UI automation, such as manipulating Safari tabs, sending messages, or controlling system preferences.

```python
import asyncio

from src.controller.service import Controller
from src.controller.views import AppleScriptAction
from src.mac.tree import MacUITreeBuilder

# Control Safari
safari_script = '''
tell application "Safari"
    activate
    make new document
    set URL of document 1 to "https://github.com"
end tell
'''

# Get system information
system_info_script = '''
tell application "System Events"
    set processNames to name of every process whose background only is false
    return processNames as string
end tell
'''

# Control Messages app
send_message_script = '''
tell application "Messages"
    set targetBuddy to buddy "John Doe"
    send "Hello from TuriX!" to targetBuddy
end tell
'''


async def main() -> None:
    controller = Controller()
    mac_tree_builder = MacUITreeBuilder()

    result = await controller.act(
        action=AppleScriptAction(script=safari_script),
        mac_tree_builder=mac_tree_builder,
    )
    if result.error:
        print(f"Script failed: {result.error}")
    else:
        print(f"Script executed: {result.extracted_content}")

    result = await controller.act(
        action=AppleScriptAction(script=system_info_script),
        mac_tree_builder=mac_tree_builder,
    )
    print(f"Running applications: {result.extracted_content}")

    result = await controller.act(
        action=AppleScriptAction(script=send_message_script),
        mac_tree_builder=mac_tree_builder,
    )


asyncio.run(main())
```

## Task Planning with Planner

Break down complex tasks into executable steps

The Planner module uses a separate LLM to analyze a high-level user task and decompose it into concrete, actionable steps that the main agent can execute. This improves success rates on complex multi-step workflows.

```python
import asyncio

from langchain_openai import ChatOpenAI

from src.agent.planner_service import Planner

# Configure planner LLM
planner_llm = ChatOpenAI(
    model="turix-planner-model",
    openai_api_base="https://llm.turixapi.io/v1",
    openai_api_key="your_api_key_here",
    temperature=0.3,
)

# Initialize planner
planner = Planner(
    planner_llm=planner_llm,
    task="Book a flight from NYC to SF, then reserve a hotel near the airport",
    max_input_tokens=32000,
)


async def main() -> None:
    # Generate step-by-step plan
    plan = await planner.edit_task()
    print("Generated Plan:")
    print(plan)


asyncio.run(main())

# The plan is automatically formatted and can be used by the agent
# Example output:
# "The overall user's task is: Book a flight from NYC to SF, then reserve a hotel near the airport
# The step by step plan is:
# 1. Open web browser
# 2. Navigate to flight booking website
# 3. Enter departure city: NYC
# 4. Enter destination city: SF
# 5. Select dates and search
# 6. Choose flight and complete booking
# 7. Navigate to hotel booking website
# 8. Search for hotels near SF airport
# 9. Select hotel and complete reservation"
```

## Agent History and Results

Access execution history and results

The AgentHistoryList class tracks all actions, results, and state changes throughout the agent's execution, which supports debugging, analysis, and workflow replay.
```python
import asyncio

from src import Agent
from src.agent.views import AgentHistoryList

# `llm` and `controller` are created as in the initialization example

# After running agent
agent = Agent(task="open Safari and go to github.com", llm=llm, controller=controller)
history = asyncio.run(agent.run(max_steps=50))

# Check completion status
if history.is_done():
    print("Task completed successfully!")

# Get all executed actions
actions = history.model_actions()
for i, action in enumerate(actions):
    print(f"Step {i+1}: {action}")

# Get all action results
results = history.action_results()
for result in results:
    if result.error:
        print(f"Error: {result.error}")
    else:
        print(f"Success: {result.extracted_content}")

# Get errors only
errors = history.errors()
if errors:
    print(f"Encountered {len(errors)} errors:")
    for error in errors:
        print(f"  - {error}")

# Get final result
final = history.final_result()
print(f"Final result: {final}")

# Get extracted content from all steps
content = history.extracted_content()
print(f"All extracted content: {content}")

# Get agent's thoughts/reasoning at each step
thoughts = history.model_thoughts()
for i, thought in enumerate(thoughts):
    print(f"Step {i+1} - Goal: {thought.next_goal}")
    print(f"  Evaluation: {thought.evaluation_previous_goal}")
    print(f"  Info Stored: {thought.information_stored}")

# Save history to file
history.save_to_file("agent_execution_history.json")

# Load history from file
loaded_history = AgentHistoryList.load_from_file(
    "agent_execution_history.json",
    output_model=agent.AgentOutput,
)
```

## Configuration and Model Selection

Configure LLM providers and models

TuriX supports multiple LLM providers through a single configuration interface, which makes it easy to switch between OpenAI, Anthropic, Google, and custom models.
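Before a configuration is handed to a provider-specific constructor, it can help to check its shape. The sketch below is an illustration under stated assumptions: `validate_config` is a hypothetical helper, the section and provider names are the ones used in this document's configuration example, and treating all three sections as required is an assumption based on that example reading each of them.

```python
from typing import Any, Dict, List

# Provider names used by the configuration example in this document
KNOWN_PROVIDERS = {"turix", "google_flash", "gpt", "anthropic"}
# Assumed to be required because the example code reads all three
REQUIRED_SECTIONS = ("llm", "planner_llm", "agent")


def validate_config(config: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the config looks usable."""
    problems: List[str] = []
    for section in REQUIRED_SECTIONS:
        if section not in config:
            problems.append(f"missing section '{section}'")
    for section in ("llm", "planner_llm"):
        if section not in config:
            continue  # already reported as missing
        provider = str(config[section].get("provider", "")).lower()
        if provider not in KNOWN_PROVIDERS:
            problems.append(f"{section}: unknown provider '{provider}'")
    if "agent" in config and not config["agent"].get("task"):
        problems.append("agent: 'task' is empty or missing")
    return problems


good = {
    "llm": {"provider": "turix"},
    "planner_llm": {"provider": "google_flash"},
    "agent": {"task": "open Chrome and search for AI news"},
}
bad = {"llm": {"provider": "llama"}, "agent": {}}

print(validate_config(good))
print(validate_config(bad))
```

Collecting every problem in a list, rather than raising on the first one, lets a user fix a broken config file in a single pass.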
```python
import json
from pathlib import Path

from langchain_anthropic import ChatAnthropic
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_openai import ChatOpenAI

from src import Agent
from src.controller.service import Controller


# Load configuration from JSON
def load_config(path: Path) -> dict:
    with path.open("r", encoding="utf-8") as fp:
        return json.load(fp)


config = load_config(Path("examples/config.json"))


# Build LLM based on provider
def build_llm(cfg: dict):
    provider = cfg["provider"].lower()
    api_key = cfg.get("api_key")
    model_name = cfg.get("model_name")
    base_url = cfg.get("base_url")

    if provider == "turix":
        return ChatOpenAI(
            model=model_name,
            openai_api_base=base_url,
            openai_api_key=api_key,
            temperature=0.3,
        )
    elif provider == "google_flash":
        return ChatGoogleGenerativeAI(
            model="gemini-2.5-flash", api_key=api_key, temperature=0.3
        )
    elif provider == "gpt":
        return ChatOpenAI(model="gpt-4.1-mini", api_key=api_key, temperature=0.3)
    elif provider == "anthropic":
        return ChatAnthropic(model="claude-4-opus", api_key=api_key, temperature=0.3)
    else:
        raise ValueError(f"Unknown llm provider '{provider}'")


# Example config.json structure:
config_example = {
    "logging_level": "DEBUG",
    "llm": {
        "provider": "turix",
        "model_name": "turix-model",
        "api_key": "your_api_key_here",
        "base_url": "https://llm.turixapi.io/v1",
    },
    "planner_llm": {
        "provider": "google_flash",
        "api_key": "google_api_key_here",
    },
    "agent": {
        "task": "open Chrome and search for AI news",
        "short_memory_len": 5,
        "use_ui": False,
        "max_actions_per_step": 5,
        "max_steps": 100,
        "use_turix": True,
        "save_conversation_path": "llm_interactions.log",
    },
}

# Initialize agent with configuration
llm = build_llm(config["llm"])
planner_llm = build_llm(config["planner_llm"])

agent = Agent(
    task=config["agent"]["task"],
    llm=llm,
    planner_llm=planner_llm,
    controller=Controller(),
    short_memory_len=config["agent"]["short_memory_len"],
    max_actions_per_step=config["agent"]["max_actions_per_step"],
)
```

## Main Execution Entry Point

Run TuriX from command line

The main.py script is the primary entry point for running TuriX agents. It handles configuration loading, permission checks, logging setup, and agent initialization.

```python
import asyncio
import json
from pathlib import Path

from examples.main import main, load_config, build_llm

# Run with default config
main("examples/config.json")

# Run with custom configuration
custom_config = {
    "logging_level": "INFO",
    "llm": {
        "provider": "turix",
        "model_name": "turix-model",
        "api_key": "your_api_key",
        "base_url": "https://llm.turixapi.io/v1",
    },
    "planner_llm": {
        "provider": "turix",
        "model_name": "turix-planner-model",
        "api_key": "your_api_key",
        "base_url": "https://llm.turixapi.io/v1",
    },
    "agent": {
        "task": "open system settings and enable dark mode",
        "short_memory_len": 5,
        "use_ui": False,
        "max_actions_per_step": 5,
        "max_steps": 100,
        "use_turix": True,
    },
}

# Save and run
config_path = Path("custom_config.json")
with open(config_path, "w") as f:
    json.dump(custom_config, f, indent=2)

# Execute from command line
# python examples/main.py -c custom_config.json

# Or programmatically
if __name__ == "__main__":
    main(str(config_path))
```

# Summary

TuriX is a framework for AI-driven desktop automation that combines vision-language models with low-level system control to execute complex multi-step tasks. It relies on a multi-agent architecture in which a planner breaks high-level objectives into executable steps and a controller performs the desktop interactions: mouse operations, keyboard input, and application control. The framework supports multiple LLM providers (OpenAI, Anthropic, Google, TuriX API) and keeps an execution history for debugging and analysis. Primary use cases include automated workflow execution (booking flights and hotels, data entry), application testing and QA, research automation (web scraping, data collection), and accessibility assistance.

Integration patterns include standalone Python scripting for custom automation tasks, Model Context Protocol (MCP) integration for Claude Desktop and other MCP-compatible agents, custom action registration for domain-specific operations, and API-based integration for embedding TuriX capabilities into larger systems. The platform is designed for personal use, research projects, and development environments requiring desktop automation without app-specific APIs.