TuriX (turixai/turix-cua)

https://github.com/turixai/turix-cua

# TuriX - Desktop Actions Driven by AI

TuriX is a computer-use agent that lets AI models take real, hands-on actions directly on macOS and Windows desktops. It provides a framework for automating desktop interactions through vision-language models and reports a success rate above 68% on OSWorld-style benchmarks. It works with multiple LLM providers, including OpenAI, Anthropic Claude, Google Gemini, and TuriX's proprietary models.

The framework operates through a multi-agent architecture with a planner module that breaks down complex tasks into executable steps, and a controller system that executes low-level desktop actions such as clicking, typing, scrolling, and application management. TuriX supports Model Context Protocol (MCP) integration, enabling third-party agents like Claude Desktop to leverage desktop automation capabilities. The platform is 100% open-source and designed for personal use, research, and development without data collection.

## Agent Initialization and Execution

Initialize and run the TuriX agent

The Agent class orchestrates the entire desktop automation workflow by managing task execution, LLM interactions, UI state tracking, and action execution through a controller. It maintains short-term memory of recent actions, evaluates goal completion, and adapts its behavior based on feedback from the environment.

```python
from langchain_openai import ChatOpenAI
from src import Agent
from src.controller.service import Controller

# Configure the language model
llm = ChatOpenAI(
    model="turix-model",
    openai_api_base="https://llm.turixapi.io/v1",
    openai_api_key="your_api_key_here",
    temperature=0.3,
)

planner_llm = ChatOpenAI(
    model="turix-planner-model",
    openai_api_base="https://llm.turixapi.io/v1",
    openai_api_key="your_api_key_here",
    temperature=0.3,
)

# Initialize controller
controller = Controller()

# Create agent with task
agent = Agent(
    task="open Chrome, go to github, and search for turix-cua",
    llm=llm,
    planner_llm=planner_llm,
    controller=controller,
    use_turix=True,
    short_memory_len=5,
    max_actions_per_step=5,
    save_conversation_path="llm_interactions.log",
    max_failures=5,
)

# Run agent with maximum step limit
import asyncio
history = asyncio.run(agent.run(max_steps=100))

# Check if task completed successfully
if history.is_done():
    print("Task completed successfully!")
    print(f"Final result: {history.final_result()}")
else:
    print("Task failed or reached max steps")
    errors = history.errors()
    if errors:
        print(f"Errors encountered: {errors}")
```
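
The `short_memory_len=5` argument above bounds how many recent actions the agent keeps in context. How TuriX stores that memory internally is not shown here, so the following is only a minimal sketch of the general idea, assuming a fixed-size buffer that evicts its oldest entry. The `ShortMemory` class and its methods are hypothetical names used for illustration.

```python
from collections import deque


class ShortMemory:
    """Hypothetical fixed-size buffer of recent action summaries (not TuriX's class)."""

    def __init__(self, max_len: int = 5) -> None:
        # A deque with maxlen drops the oldest item automatically once it is full
        self._items = deque(maxlen=max_len)

    def remember(self, action_summary: str) -> None:
        self._items.append(action_summary)

    def as_prompt(self) -> str:
        # Render the retained actions, oldest first, for inclusion in an LLM prompt
        return "\n".join(f"{i}. {item}" for i, item in enumerate(self._items, start=1))


memory = ShortMemory(max_len=3)
for step in ["open Chrome", "click address bar", "type github.com", "press enter"]:
    memory.remember(step)

# Only the three most recent actions remain
print(memory.as_prompt())
```

A bounded buffer keeps the prompt size predictable over long runs, at the cost of forgetting early steps.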

## Desktop Action Execution

Execute low-level desktop interactions

The Controller class manages action registration and execution, providing a unified interface for all desktop interactions including mouse operations, keyboard input, application control, and AppleScript automation. Each action is registered with a description and parameter model for LLM function calling.

```python
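from src.agent.views import ActionResult  # result type used by the actions below
from src.controller.service import Controller
from src.controller.views import (
    LeftClickPixel,
    InputTextAction,
    OpenAppAction,
    PressCombinedAction,
)
from src.mac.tree import MacUITreeBuilder

# NOTE: the calls below use `await`, so run them inside an async function
# (for example, one started with asyncio.run).

# Initialize the controller and the macOS UI tree builder passed to every action
controller = Controller()
mac_tree_builder = MacUITreeBuilder()

# Execute a left click at normalized coordinates [x, y] (0-1000)
result = await controller.act(
    action=LeftClickPixel(position=[500, 300]),
    mac_tree_builder=mac_tree_builder
)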

if result.error:
    print(f"Action failed: {result.error}")
else:
    print(f"Action succeeded: {result.extracted_content}")

# Open an application
app_result = await controller.act(
    action=OpenAppAction(app_name="Safari"),
    mac_tree_builder=mac_tree_builder
)
print(f"App PID: {app_result.current_app_pid}")

# Type text input
text_result = await controller.act(
    action=InputTextAction(text="Hello, world!"),
    mac_tree_builder=mac_tree_builder
)

# Press keyboard combination (Command+C)
copy_result = await controller.act(
    action=PressCombinedAction(key1="command", key2="c", key3=None),
    mac_tree_builder=mac_tree_builder
)

# Execute multiple actions sequentially
actions = [
    OpenAppAction(app_name="TextEdit"),
    InputTextAction(text="This is a test document"),
    PressCombinedAction(key1="command", key2="s", key3=None),  # Save
]

results = await controller.multi_act(
    actions=actions,
    mac_tree_builder=mac_tree_builder,
    action_valid=True
)

for i, result in enumerate(results):
    print(f"Action {i+1}: {result.extracted_content}")
```
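
The click examples above take positions on a 0-1000 normalized grid rather than in pixels. The source does not show how TuriX converts these values, so the sketch below simply assumes a linear mapping onto the screen size with clamping at the edges. The function name `normalized_to_pixels` is hypothetical.

```python
def normalized_to_pixels(position, screen_width, screen_height, scale=1000):
    """Map an [x, y] pair on a 0..scale grid to integer pixel coordinates.

    Assumes a plain linear mapping; out-of-range inputs are clamped to the grid.
    """
    x_norm, y_norm = position
    x_norm = min(max(x_norm, 0), scale)
    y_norm = min(max(y_norm, 0), scale)
    # Integer arithmetic avoids floating-point rounding surprises
    x_px = (x_norm * screen_width) // scale
    y_px = (y_norm * screen_height) // scale
    # The last valid pixel index is size - 1
    return min(x_px, screen_width - 1), min(y_px, screen_height - 1)


print(normalized_to_pixels([500, 300], 1920, 1080))
```

Normalized coordinates let the same model output be reused across displays with different resolutions.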

## Custom Action Registration

Register custom actions with the controller

The Registry system allows developers to extend TuriX with custom actions beyond the built-in set. Actions are registered using a decorator pattern and automatically integrated into the LLM's function calling interface.

```python
from urllib.parse import quote_plus

from src.controller.service import Controller
from src.controller.registry.views import ActionModel
from src.agent.views import ActionResult
from pydantic import BaseModel

# Initialize controller
controller = Controller()

# Define parameter model
class CustomSearchAction(BaseModel):
    query: str
    engine: str

# Register custom action
@controller.action(
    description="Search the web using specified search engine",
    param_model=CustomSearchAction,
    requires_mac_builder=False
)
async def custom_search(query: str, engine: str):
    """Custom search action implementation"""
    try:
        # Build the search URL; quote_plus escapes spaces and special characters
        url = f"https://www.{engine}.com/search?q={quote_plus(query)}"
        # Open browser and navigate to `url` here
        return ActionResult(
            extracted_content=f"Searched for '{query}' on {engine}",
            error=None
        )
    except Exception as e:
        return ActionResult(
            extracted_content=None,
            error=f"Search failed: {str(e)}"
        )

# The action is now available to the agent
# It will appear in the action registry and can be called by the LLM
prompt_description = controller.registry.get_prompt_description()
print(prompt_description)  # Includes your custom action
```
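
To make the decorator pattern described above concrete, here is a minimal, standard-library-only sketch of how a registry might collect actions and render them into a prompt description. It is not TuriX's Registry: `MiniRegistry`, `RegisteredAction`, and `SearchParams` are hypothetical names, and dataclasses stand in for the pydantic models used in the real example.

```python
from dataclasses import dataclass, fields
from typing import Callable, Dict


@dataclass
class RegisteredAction:
    name: str
    description: str
    param_model: type
    func: Callable


class MiniRegistry:
    """Hypothetical stand-in for an action registry (not TuriX's implementation)."""

    def __init__(self) -> None:
        self.actions: Dict[str, RegisteredAction] = {}

    def action(self, description: str, param_model: type):
        def decorator(func: Callable) -> Callable:
            # Store the function under its own name, then return it unchanged
            self.actions[func.__name__] = RegisteredAction(
                func.__name__, description, param_model, func
            )
            return func
        return decorator

    def get_prompt_description(self) -> str:
        # One line per action: name, typed parameters, and description
        lines = []
        for act in self.actions.values():
            params = ", ".join(
                f"{f.name}: {getattr(f.type, '__name__', f.type)}"
                for f in fields(act.param_model)
            )
            lines.append(f"{act.name}({params}): {act.description}")
        return "\n".join(lines)


@dataclass
class SearchParams:
    query: str
    engine: str


registry = MiniRegistry()


@registry.action("Search the web using specified search engine", param_model=SearchParams)
def custom_search(params: SearchParams) -> str:
    return f"Searched for '{params.query}' on {params.engine}"


print(registry.get_prompt_description())
```

Keeping the description and parameter model next to the function means the text shown to the LLM is generated from the same registration that defines the action.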

## Mouse and Keyboard Actions

Low-level mouse and keyboard control

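The mac.actions module provides mouse and keyboard functions that work through macOS accessibility APIs. Most mouse operations are invisible: they act without visually moving the cursor or disrupting the user's workflow. The exception in the example below is `move_to`, which moves the cursor visibly.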

```python
from src.mac.actions import (
    left_click_pixel,
    right_click_pixel,
    drag_pixel,
    move_to,
    type_into,
    press,
    press_combination,
    scroll_up,
    scroll_down,
)

# Left click at normalized coordinates (0-1000 scale)
await left_click_pixel([500, 300])

# Right click for context menu
await right_click_pixel([600, 400])

# Drag from one position to another
await drag_pixel(start=[100, 100], end=[200, 200])

# Move cursor to position (visible movement)
await move_to([450, 550])

# Type text with Unicode support
await type_into("Hello 世界! 🎉")

# Press single key
await press("enter")
await press("escape")
await press("backspace")

# Press key combination
await press_combination("command", "c")  # Copy
await press_combination("command", "shift", "s")  # Save As
await press_combination("command", "t")  # New Tab

# Scroll operations (amount: 0-25 lines)
await scroll_up(amount=5)
await scroll_down(amount=10)

# Scroll at specific position
from src.mac.actions import _scroll_invisible_at_position
await _scroll_invisible_at_position(x=500, y=400, lines=5)
```

## AppleScript Execution

Run AppleScript for advanced macOS automation

AppleScript integration enables control of macOS applications and system functions that may not be accessible through standard UI automation, such as manipulating Safari tabs, sending messages, or controlling system preferences.

```python
from src.controller.service import Controller
from src.controller.views import AppleScriptAction
from src.mac.tree import MacUITreeBuilder

controller = Controller()
mac_tree_builder = MacUITreeBuilder()

# Control Safari
safari_script = '''
tell application "Safari"
    activate
    make new document
    set URL of document 1 to "https://github.com"
end tell
'''

result = await controller.act(
    action=AppleScriptAction(script=safari_script),
    mac_tree_builder=mac_tree_builder
)

if result.error:
    print(f"Script failed: {result.error}")
else:
    print(f"Script executed: {result.extracted_content}")

# Get system information
system_info_script = '''
tell application "System Events"
    set processNames to name of every process whose background only is false
    return processNames as string
end tell
'''

result = await controller.act(
    action=AppleScriptAction(script=system_info_script),
    mac_tree_builder=mac_tree_builder
)

print(f"Running applications: {result.extracted_content}")

# Control Messages app
send_message_script = '''
tell application "Messages"
    set targetBuddy to buddy "John Doe"
    send "Hello from TuriX!" to targetBuddy
end tell
'''

result = await controller.act(
    action=AppleScriptAction(script=send_message_script),
    mac_tree_builder=mac_tree_builder
)
```
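
The scripts above embed values such as URLs and names directly in AppleScript source. When those values come from user input, quotes and backslashes need escaping. The sketch below shows one way to do that and to run the result with the standard macOS `osascript -e` command. How TuriX executes `AppleScriptAction` internally is not shown in this document, so the helper names here (`applescript_string`, `build_osascript_command`, `run_applescript`) are hypothetical.

```python
import subprocess


def applescript_string(value: str) -> str:
    """Quote a Python string as an AppleScript string literal."""
    # Escape backslashes first, then double quotes
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def build_osascript_command(script: str) -> list:
    # osascript -e <script> runs AppleScript source passed on the command line
    return ["osascript", "-e", script]


def run_applescript(script: str, timeout: float = 10.0) -> str:
    """Run the script with osascript (macOS only) and return its trimmed stdout."""
    completed = subprocess.run(
        build_osascript_command(script),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # raise CalledProcessError if the script fails
    )
    return completed.stdout.strip()


url = 'https://github.com/search?q="turix-cua"'
script = f'tell application "Safari" to open location {applescript_string(url)}'
# Printing the command keeps this example safe to run on any platform
print(build_osascript_command(script))
```

Call `run_applescript(script)` on a Mac to execute the command. Passing the script as a separate argument list element avoids shell quoting problems.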

## Task Planning with Planner

Break down complex tasks into executable steps

The Planner module uses a separate LLM to analyze high-level user tasks and decompose them into concrete, actionable steps that the main agent can execute. This improves success rates on complex multi-step workflows.

```python
from src.agent.planner_service import Planner
from langchain_openai import ChatOpenAI

# Configure planner LLM
planner_llm = ChatOpenAI(
    model="turix-planner-model",
    openai_api_base="https://llm.turixapi.io/v1",
    openai_api_key="your_api_key_here",
    temperature=0.3,
)

# Initialize planner
planner = Planner(
    planner_llm=planner_llm,
    task="Book a flight from NYC to SF, then reserve a hotel near the airport",
    max_input_tokens=32000,
)

# Generate step-by-step plan
plan = await planner.edit_task()

print("Generated Plan:")
print(plan)

# The plan is automatically formatted and can be used by the agent
# Example output:
# "The overall user's task is: Book a flight from NYC to SF, then reserve a hotel near the airport
#  The step by step plan is:
#  1. Open web browser
#  2. Navigate to flight booking website
#  3. Enter departure city: NYC
#  4. Enter destination city: SF
#  5. Select dates and search
#  6. Choose flight and complete booking
#  7. Navigate to hotel booking website
#  8. Search for hotels near SF airport
#  9. Select hotel and complete reservation"
```
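
If the plan text follows the numbered format shown in the example output above, it can be split into individual steps with a small parser. This is a sketch under that assumption; the actual planner output may vary, and `parse_plan_steps` is a hypothetical helper rather than part of TuriX.

```python
import re


def parse_plan_steps(plan_text: str) -> list:
    """Extract the text of each 'N. step' line from a plan string."""
    steps = []
    for line in plan_text.splitlines():
        # Match an optional indent, a step number, a dot, then the step text
        match = re.match(r"\s*(\d+)\.\s+(.*\S)", line)
        if match:
            steps.append(match.group(2))
    return steps


example_plan = """The overall user's task is: Book a flight from NYC to SF
The step by step plan is:
1. Open web browser
2. Navigate to flight booking website
3. Enter departure city: NYC
"""

for number, step in enumerate(parse_plan_steps(example_plan), start=1):
    print(number, step)
```

Lines without a leading step number, such as the task summary, are ignored.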

## Agent History and Results

Access execution history and results

The AgentHistoryList class provides comprehensive tracking of all actions, results, and state changes throughout the agent's execution, enabling debugging, analysis, and workflow replay.

```python
import asyncio

from src import Agent
from src.agent.views import AgentHistoryList

# Run an agent first; `llm` and `controller` are configured as in
# "Agent Initialization and Execution"
agent = Agent(task="open Safari and go to github.com", llm=llm, controller=controller)
history = asyncio.run(agent.run(max_steps=50))

# Check completion status
if history.is_done():
    print("Task completed successfully!")

# Get all executed actions
actions = history.model_actions()
for i, action in enumerate(actions):
    print(f"Step {i+1}: {action}")

# Get all action results
results = history.action_results()
for result in results:
    if result.error:
        print(f"Error: {result.error}")
    else:
        print(f"Success: {result.extracted_content}")

# Get errors only
errors = history.errors()
if errors:
    print(f"Encountered {len(errors)} errors:")
    for error in errors:
        print(f"  - {error}")

# Get final result
final = history.final_result()
print(f"Final result: {final}")

# Get extracted content from all steps
content = history.extracted_content()
print(f"All extracted content: {content}")

# Get agent's thoughts/reasoning at each step
thoughts = history.model_thoughts()
for i, thought in enumerate(thoughts):
    print(f"Step {i+1} - Goal: {thought.next_goal}")
    print(f"         Evaluation: {thought.evaluation_previous_goal}")
    print(f"         Info Stored: {thought.information_stored}")

# Save history to file
history.save_to_file("agent_execution_history.json")

# Load history from file
loaded_history = AgentHistoryList.load_from_file(
    "agent_execution_history.json",
    output_model=agent.AgentOutput
)
```
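
For a quick post-run report, the per-action results can be reduced to a few counts. The sketch below assumes only the two fields used above (`extracted_content` and `error`). `StepResult` is a hypothetical stand-in so the example runs without TuriX installed, and `summarize` is an illustrative helper, not part of the library.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StepResult:
    """Hypothetical stand-in with the two fields read in the example above."""
    extracted_content: Optional[str] = None
    error: Optional[str] = None


def summarize(results) -> dict:
    """Count successes and failures and keep the most recent error message."""
    failed = [r.error for r in results if r.error]
    return {
        "total": len(results),
        "succeeded": len(results) - len(failed),
        "failed": len(failed),
        "last_error": failed[-1] if failed else None,
    }


results = [
    StepResult(extracted_content="Opened Safari"),
    StepResult(error="Element not found"),
    StepResult(extracted_content="Navigated to github.com"),
]
print(summarize(results))
```

With a real run, the list returned by `history.action_results()` could be passed to the same function, provided its items expose an `error` attribute as the example above suggests.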

## Configuration and Model Selection

Configure LLM providers and models

TuriX supports multiple LLM providers through a unified configuration interface, enabling easy switching between OpenAI, Anthropic, Google, and custom models.

```python
import json
from pathlib import Path

from langchain_anthropic import ChatAnthropic
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_openai import ChatOpenAI

from src import Agent
from src.controller.service import Controller

# Load configuration from JSON
def load_config(path: Path) -> dict:
    with path.open("r", encoding="utf-8") as fp:
        return json.load(fp)

config = load_config(Path("examples/config.json"))

# Build LLM based on provider
def build_llm(cfg: dict):
    provider = cfg["provider"].lower()
    api_key = cfg.get("api_key")
    model_name = cfg.get("model_name")
    base_url = cfg.get("base_url")

    if provider == "turix":
        return ChatOpenAI(
            model=model_name,
            openai_api_base=base_url,
            openai_api_key=api_key,
            temperature=0.3,
        )
    elif provider == "google_flash":
        return ChatGoogleGenerativeAI(
            model="gemini-2.5-flash",
            api_key=api_key,
            temperature=0.3
        )
    elif provider == "gpt":
        return ChatOpenAI(
            model="gpt-4.1-mini",
            api_key=api_key,
            temperature=0.3
        )
    elif provider == "anthropic":
        return ChatAnthropic(
            model="claude-4-opus",
            api_key=api_key,
            temperature=0.3
        )
    else:
        raise ValueError(f"Unknown llm provider '{provider}'")

# Example config.json structure:
config_example = {
    "logging_level": "DEBUG",
    "llm": {
        "provider": "turix",
        "model_name": "turix-model",
        "api_key": "your_api_key_here",
        "base_url": "https://llm.turixapi.io/v1"
    },
    "planner_llm": {
        "provider": "google_flash",
        "api_key": "google_api_key_here"
    },
    "agent": {
        "task": "open Chrome and search for AI news",
        "short_memory_len": 5,
        "use_ui": False,
        "max_actions_per_step": 5,
        "max_steps": 100,
        "use_turix": True,
        "save_conversation_path": "llm_interactions.log"
    }
}

# Initialize agent with configuration
llm = build_llm(config["llm"])
planner_llm = build_llm(config["planner_llm"])
agent = Agent(
    task=config["agent"]["task"],
    llm=llm,
    planner_llm=planner_llm,
    controller=Controller(),
    short_memory_len=config["agent"]["short_memory_len"],
    max_actions_per_step=config["agent"]["max_actions_per_step"],
)
```
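
Because `build_llm` reads different keys depending on the provider, a missing key only surfaces when the model is constructed or first called. A small pre-check can report the problem earlier. The sketch below derives its required keys from the `build_llm` branches above; `REQUIRED_KEYS` and `missing_llm_keys` are hypothetical names, not part of TuriX.

```python
# Keys each provider branch of build_llm reads from its config section
REQUIRED_KEYS = {
    "turix": ("model_name", "api_key", "base_url"),
    "google_flash": ("api_key",),
    "gpt": ("api_key",),
    "anthropic": ("api_key",),
}


def missing_llm_keys(cfg: dict) -> list:
    """Return the required keys that are absent or empty for cfg's provider."""
    provider = str(cfg.get("provider", "")).lower()
    if provider not in REQUIRED_KEYS:
        raise ValueError(f"Unknown llm provider '{provider}'")
    return [key for key in REQUIRED_KEYS[provider] if not cfg.get(key)]


incomplete = {"provider": "turix", "model_name": "turix-model", "api_key": "your_api_key_here"}
print(missing_llm_keys(incomplete))
```

Running the check on both the `llm` and `planner_llm` sections before calling `build_llm` gives a clearer error than a failed API request.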

## Main Execution Entry Point

Run TuriX from command line

The main.py script provides the primary entry point for running TuriX agents, handling configuration loading, permission checks, logging setup, and agent initialization.

```python
import asyncio
from pathlib import Path
from examples.main import main, load_config, build_llm

# Run with default config
main("examples/config.json")

# Run with custom configuration
custom_config = {
    "logging_level": "INFO",
    "llm": {
        "provider": "turix",
        "model_name": "turix-model",
        "api_key": "your_api_key",
        "base_url": "https://llm.turixapi.io/v1"
    },
    "planner_llm": {
        "provider": "turix",
        "model_name": "turix-planner-model",
        "api_key": "your_api_key",
        "base_url": "https://llm.turixapi.io/v1"
    },
    "agent": {
        "task": "open system settings and enable dark mode",
        "short_memory_len": 5,
        "use_ui": False,
        "max_actions_per_step": 5,
        "max_steps": 100,
        "use_turix": True
    }
}

# Save and run
import json
config_path = Path("custom_config.json")
with open(config_path, "w") as f:
    json.dump(custom_config, f, indent=2)

# Execute from command line
# python examples/main.py -c custom_config.json

# Or programmatically
if __name__ == "__main__":
    main(str(config_path))
```
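
The command line shown above passes the config path with `-c`. The argument handling inside `examples/main.py` is not reproduced in this document, so the following is a sketch of how such a flag is commonly parsed with `argparse`. The long form `--config` and the default path are assumptions.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Parse the config path; argv=None makes argparse read sys.argv."""
    parser = argparse.ArgumentParser(
        description="Run a TuriX agent from a JSON config file"
    )
    parser.add_argument(
        "-c",
        "--config",  # assumed long form; only -c appears in the usage above
        type=Path,
        default=Path("examples/config.json"),  # assumed default
        help="path to the JSON configuration file",
    )
    return parser.parse_args(argv)


args = parse_args(["-c", "custom_config.json"])
print(args.config)
```

Accepting `argv` as a parameter keeps the parser testable without touching the real command line.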

# Summary

TuriX is a framework for AI-driven desktop automation that combines vision-language models with low-level system control to execute complex multi-step tasks. It uses a multi-agent architecture: a planner breaks high-level objectives into executable steps, and a controller carries out precise desktop interactions including mouse operations, keyboard input, and application control. The framework supports multiple LLM providers (OpenAI, Anthropic, Google, TuriX API) and maintains execution history for debugging and analysis.

Primary use cases include automated workflow execution (booking flights/hotels, data entry), application testing and QA, research automation (web scraping, data collection), and accessibility assistance. Integration patterns include standalone Python scripting for custom automation tasks, Model Context Protocol (MCP) integration for Claude Desktop and other MCP-compatible agents, custom action registration for domain-specific operations, and API-based integration for embedding TuriX capabilities into larger systems. The platform is designed for personal use, research projects, and development environments requiring desktop automation without app-specific APIs.