### Run Local Script in Dev Mode to Start Client

Source: https://github.com/microsoft/windowsagentarena/blob/main/docs/Development-Tips.md

After preparing the image, use this command to run the entire setup, including starting the client process, in development mode.

```bash
./run-local.sh --mode dev --start-client true
```

--------------------------------

### Start VM and Client Processes Individually

Source: https://github.com/microsoft/windowsagentarena/blob/main/docs/Development-Tips.md

Once the container is running in interactive mode, you can start the VM and client processes separately.

```bash
./start_vm.sh
..
./start_client.sh
```

--------------------------------

### Install python-pptx Library

Source: https://github.com/microsoft/windowsagentarena/blob/main/src/win-arena-container/client/desktop_env/evaluators/README.md

Install the python-pptx library using pip for LibreOffice Impress.

```shell
pip install python-pptx
```

--------------------------------

### Configuration File Example

Source: https://github.com/microsoft/windowsagentarena/blob/main/README.md

Create a config.json file at the project root with your API keys. Set `OPENAI_API_KEY` if you are using the OpenAI endpoint, or `AZURE_API_KEY` and `AZURE_ENDPOINT` if you are using an Azure endpoint.

```json
{
    "OPENAI_API_KEY": "",
    "AZURE_API_KEY": "",
    "AZURE_ENDPOINT": "https://yourendpoint.openai.azure.com/"
}
```

--------------------------------

### Install Dependencies and Playwright

Source: https://github.com/microsoft/windowsagentarena/blob/main/src/win-arena-container/client/mm_agents/navi/screenparsing_oss/webparse/README.md

Install Python requirements and Playwright browser binaries. Run this command in your terminal.

```bash
pip install -r requirements.txt ; playwright install
```

--------------------------------

### Testing a Custom Agent

Source: https://github.com/microsoft/windowsagentarena/blob/main/docs/Develop-Agent.md

Example Python code to instantiate and test a custom agent.
It shows how to get observations, provide instructions, and execute predicted actions. `get_current_observation()` and `execute_actions()` are placeholders for functions you supply.

```python
from mm_agents.my_agent.agent import MyAgent

agent = MyAgent()
obs = get_current_observation()  # Function to retrieve the current observation
instruction = "Your test instruction here"
actions = agent.predict(instruction, obs)
execute_actions(actions)  # Function to execute the predicted actions
```

--------------------------------

### Install python-docx and odfpy Libraries

Source: https://github.com/microsoft/windowsagentarena/blob/main/src/win-arena-container/client/desktop_env/evaluators/README.md

Install the python-docx and odfpy libraries using pip for LibreOffice Writer.

```shell
pip install python-docx
pip install odfpy
```

--------------------------------

### Clone Repository and Install Dependencies

Source: https://github.com/microsoft/windowsagentarena/blob/main/README.md

These commands clone the WindowsAgentArena repository and install the required Python dependencies. Activate the 'winarena' Conda environment before running the pip install command.

```bash
git clone https://github.com/microsoft/WindowsAgentArena.git
cd WindowsAgentArena
# Install the required dependencies in your python environment
# conda activate winarena
pip install -r requirements.txt
```

--------------------------------

### Install Playwright for Python

Source: https://github.com/microsoft/windowsagentarena/blob/main/src/win-arena-container/client/desktop_env/evaluators/README.md

Install the Playwright library for Python using pip. After installation, run `playwright install` to download the necessary browser binaries.
```bash
pip install playwright
```

```bash
playwright install
```

--------------------------------

### Task JSON Configuration Example

Source: https://github.com/microsoft/windowsagentarena/blob/main/docs/Develop-Tasks.md

This JSON defines a task for WAA: its ID, a natural language instruction, initial configuration steps (launching VLC and simulating a click), an evaluator function with its expected rule, and the `result` entry (type `vlc_config`).

```json
{
    "id": "8ba5ae7a-5ae5-4eab-9fcc-5dd4fe3abf89-W0S",
    "instruction": "Help me modify the folder used to store my recordings to the Desktop",
    "config": [
        {
            "type": "launch",
            "parameters": {
                "command": "vlc"
            }
        },
        {
            "type": "execute",
            "parameters": {
                "command": [
                    "python",
                    "-c",
                    "import pyautogui; import time; pyautogui.click(960, 540); time.sleep(0.5);"
                ]
            }
        }
    ],
    "evaluator": {
        "func": "vis_vlc_recordings_folder",
        "expected": {
            "type": "rule",
            "rules": {
                "recording_file_path": "C:\\Users\\Docker\\Desktop"
            }
        }
    },
    "result": {
        "type": "vlc_config",
        "dest": "vlcrc"
    }
}
```

--------------------------------

### Example Navi Agent Implementation

Source: https://github.com/microsoft/windowsagentarena/blob/main/docs/Develop-Agent.md

An example Python implementation of an agent named Navi, demonstrating the structure and the required `predict()` and `reset()` methods. This serves as a template for custom agents.

```python
# agent.py

import logging
from typing import Dict, List
from PIL import Image
from io import BytesIO
import copy

logger = logging.getLogger("desktopenv.agent")

class NaviAgent:
    def __init__(
        self,
        server: str = "azure",
        model: str = "gpt-4o",
        som_config=None,
        som_origin="oss",
        obs_view="screen",
        auto_window_maximize=False,
        use_last_screen=True,
        temperature: float = 0.5,
    ):
        # Initialize agent parameters
        self.action_space = "code_block"
        self.server = server
        self.model = model
        # ... (additional initialization)

    def predict(self, instruction: str, obs: Dict) -> List:
        """
        Predict the next action(s) based on the current observation.
        """
        # Process the observation
        # Generate actions based on the instruction
        # ...
        actions = ["# Your code logic here"]
        return actions

    def reset(self):
        """
        Reset the agent's internal state.
        """
        # Reset logic
        pass
```

--------------------------------

### Install Python Packages

Source: https://github.com/microsoft/windowsagentarena/blob/main/src/win-arena-container/client/desktop_env/evaluators/README.md

Installs the Python packages needed for image processing and hashing. Ensure pip is available in your environment.

```bash
pip install opencv-python-headless Pillow imagehash
```

--------------------------------

### Run Strongest Agent Configuration

Source: https://github.com/microsoft/windowsagentarena/blob/main/README.md

Runs the `run-local.sh` script with GPU enabled, the `mixed-omni` screen-understanding origin, and the `uia` accessibility backend for the strongest agent setup.

```bash
./run-local.sh --gpu-enabled true --som-origin mixed-omni --a11y-backend uia
```

--------------------------------

### Run Local Script with Specific Model Origin and GPU Enabled

Source: https://github.com/microsoft/windowsagentarena/blob/main/README.md

This command runs the local deployment with a specific screen understanding model origin and enables GPU usage. Ensure you have the necessary dependencies installed.

```bash
./run-local.sh --som-origin mixed-omni --gpu-enabled true
```

--------------------------------

### Import Playwright Module in Python

Source: https://github.com/microsoft/windowsagentarena/blob/main/src/win-arena-container/client/desktop_env/evaluators/README.md

Import the synchronous Playwright API at the beginning of your Python script to start browser automation.
```python
from playwright.sync_api import sync_playwright
```

--------------------------------

### Change Difficulty Level in Start Client Script

Source: https://github.com/microsoft/windowsagentarena/blob/main/README.md

To enable the new harder difficulty mode, change the `diff_lvl` parameter in the `start_client.sh` script from `normal` to `hard`. This mode requires agents to initialize tasks themselves.

```bash
diff_lvl="hard"
```

--------------------------------

### Test Python Server Accessibility

Source: https://github.com/microsoft/windowsagentarena/blob/main/docs/Development-Tips.md

Send a GET request to the screenshot endpoint to verify that the Python server is running and accessible within the Docker container.

```bash
curl -v -X GET http://20.20.20.21:5000/screenshot
# you should get an HTTP/1.1 200 OK response
```

--------------------------------

### Run Local Script in Interactive Mode

Source: https://github.com/microsoft/windowsagentarena/blob/main/docs/Development-Tips.md

Launch the Docker container without starting the VM and client processes. This is useful for developing agents and extensions.

```bash
cd scripts
./run-local.sh --interactive true
```

--------------------------------

### Open a Web Page with Playwright

Source: https://github.com/microsoft/windowsagentarena/blob/main/src/win-arena-container/client/desktop_env/evaluators/README.md

Launches a Chromium browser, navigates to a specified URL, and closes the browser. Requires Playwright for Python to be installed.

```python
from playwright.sync_api import sync_playwright

def run(playwright):
    browser = playwright.chromium.launch()
    page = browser.new_page()
    page.goto("http://example.com")
    ## other actions...
    browser.close()

with sync_playwright() as playwright:
    run(playwright)
```

--------------------------------

### Upload Folder to Azure Blob Storage using Azure CLI

Source: https://github.com/microsoft/windowsagentarena/blob/main/README.md

Use this Azure CLI command to upload a local folder to an Azure Blob container. Ensure you have the Azure CLI installed and are logged in. Supply your storage account name, destination container, and local source folder as the values of `--account-name`, `--destination`, and `--source`.

```bash
az login --use-device-code
az storage blob upload-batch --account-name --destination --source
```

--------------------------------

### Start Chrome with Remote Debugging Enabled

Source: https://github.com/microsoft/windowsagentarena/blob/main/src/win-arena-container/client/desktop_env/evaluators/README.md

Modify the Chrome shortcut properties to include the `--remote-debugging-port` flag. This enables remote debugging on port 9222, allowing tools like Playwright to connect.

```bash
"C:\Path\To\Chrome.exe" --remote-debugging-port=9222
```

--------------------------------

### Run Local Script Help

Source: https://github.com/microsoft/windowsagentarena/blob/main/README.md

Displays the help information for the `run-local.sh` script.

```bash
./run-local.sh --help
```

--------------------------------

### Prepare Windows 11 Golden Image

Source: https://github.com/microsoft/windowsagentarena/blob/main/README.md

Run this script once to prepare a Windows 11 VM snapshot with all necessary programs and a Python server. Monitor progress at http://localhost:8006.

```bash
cd ./scripts
./run-local.sh --prepare-image true
```

--------------------------------

### Run Base Benchmark

Source: https://github.com/microsoft/windowsagentarena/blob/main/README.md

Launch the evaluation to run the baseline agent on all benchmark tasks. Navigate to the scripts directory first.
```bash
cd scripts
./run-local.sh
# For client/agent options:
```

--------------------------------

### Baseline Navi Agent Configuration

Source: https://github.com/microsoft/windowsagentarena/blob/main/README.md

Sets up the baseline configuration for the Navi agent using webparse, groundingdino, and OCR (TesseractOCR).

```bash
./run-local.sh --som-origin oss
```

--------------------------------

### Show Results Script

Source: https://github.com/microsoft/windowsagentarena/blob/main/README.md

Navigates to the client directory and runs the `show_results.py` script to display benchmark results. Requires specifying the directory where results are stored.

```bash
cd src/win-arena-container/client
python show_results.py --result_dir
```

--------------------------------

### Run Local Benchmark with Custom Resources

Source: https://github.com/microsoft/windowsagentarena/blob/main/README.md

Execute the benchmark locally, overriding default RAM and CPU allocations. Useful for systems with limited resources.

```bash
./run-local.sh --ram-size 4G --cpu-cores 4
```

--------------------------------

### Run Azure Experiments

Source: https://github.com/microsoft/windowsagentarena/blob/main/README.md

Execute ML training jobs on Azure Compute Instances. Ensure your experiments.json is configured.

```bash
cd scripts
python run_azure.py --experiments_json "experiments.json"
```

--------------------------------

### Run Azure Deployment with Experiment Parameters

Source: https://github.com/microsoft/windowsagentarena/blob/main/README.md

Execute the run_azure.py script with specific experiment parameters using command-line arguments. This is an alternative to manually editing experiments.json.
```bash
cd scripts
python run_azure.py --experiments_json "experiments.json" --update_json --exp_name "experiment_1" --ci_startup_script_path "Users//compute-instance-startup.sh" --agent "navi" --json_name "evaluation_examples_windows/test_all.json" --num_workers 4 --som_origin oss --a11y_backend win32
```

--------------------------------

### Build WinArena Docker Image Locally

Source: https://github.com/microsoft/windowsagentarena/blob/main/README.md

Build the WinArena image locally from the scripts directory. Use `--build-base-image` to rebuild the base image if Dockerfile-WinArena-Base changes.

```bash
cd scripts
./build-container-image.sh
```

```bash
# If there are any changes in 'Dockerfile-WinArena-Base', use the --build-base-image flag to build also the base image locally
# ./build-container-image.sh --build-base-image true

# For other build options:
# ./build-container-image.sh --help
```

--------------------------------

### Recommended Navi Agent Configuration

Source: https://github.com/microsoft/windowsagentarena/blob/main/README.md

Uses the recommended configuration for the Navi agent, combining OmniParser with accessibility tree information for optimal results.

```bash
./run-local.sh --som-origin mixed-omni --a11y-backend uia
```

--------------------------------

### Navigate to mm_agents Directory

Source: https://github.com/microsoft/windowsagentarena/blob/main/docs/Develop-Agent.md

Change the current directory to the `src/win-arena-container/client/mm_agents` folder where agent files are located.

```bash
cd src/win-arena-container/client/mm_agents
```

--------------------------------

### Show Azure Experiment Results

Source: https://github.com/microsoft/windowsagentarena/blob/main/README.md

Display experiment results from downloaded agent outputs. Requires a JSON configuration and the path to the results directory.
```bash
cd scripts
python show_azure.py --json_config "experiments.json" --result_dir
```

--------------------------------

### Create New Agent Folder

Source: https://github.com/microsoft/windowsagentarena/blob/main/docs/Develop-Agent.md

Create a new directory for your custom agent within the `mm_agents` directory. Replace `my_agent` with your desired agent name.

```bash
mkdir my_agent
```

--------------------------------

### Copy Default Agent Template

Source: https://github.com/microsoft/windowsagentarena/blob/main/docs/Develop-Agent.md

Copy the `agent.py` file from the default agent's directory to your new agent's directory to use as a template.

```bash
cp default_agent/agent.py my_agent/
```

--------------------------------

### Log in to Azure CLI

Source: https://github.com/microsoft/windowsagentarena/blob/main/README.md

Command to log in to Azure CLI. It may prompt for device code authentication. Use the commented-out lines if you need to specify a tenant or set a subscription.

```bash
az login --use-device-code
# If multiple tenants or subscriptions, make sure to select the right ones with:
# az login --use-device-code --tenant ""
# az account set --subscription ""
```

--------------------------------

### Run Local Script in Dev Mode for Image Preparation

Source: https://github.com/microsoft/windowsagentarena/blob/main/docs/Development-Tips.md

Use this command to prepare the Windows golden image in development mode, enabling shared folders between the Docker host and the Windows VM.

```bash
cd ./scripts
./run-local.sh --mode dev --prepare-image true
```

--------------------------------

### Connect to Running Docker Container

Source: https://github.com/microsoft/windowsagentarena/blob/main/docs/Development-Tips.md

Connect to the running Docker container to test VM accessibility and Python server status.
```bash
cd scripts
./run-local.sh --connect true
```

--------------------------------

### Mixed OSS and Accessibility Tree Navi Agent Configuration

Source: https://github.com/microsoft/windowsagentarena/blob/main/README.md

Combines OSS detections with accessibility tree information for the Navi agent.

```bash
./run-local.sh --som-origin mixed-oss --a11y-backend uia
```

--------------------------------

### Define Experiment Parameters in experiments.json

Source: https://github.com/microsoft/windowsagentarena/blob/main/README.md

This JSON structure defines parameters for an experiment run, including agent, model, and paths. Use this as a reference for your own experiment configurations.

```json
{
    "experiment_1": {
        "ci_startup_script_path": "Users//compute-instance-startup.sh",
        "agent": "navi",
        "datastore_input_path": "storage",
        "docker_img_name": "windowsarena/winarena:latest",
        "exp_name": "experiment_1",
        "num_workers": 4,
        "use_managed_identity": false,
        "json_name": "evaluation_examples_windows/test_all.json",
        "model_name": "gpt-4-1106-vision-preview",
        "som_origin": "oss",
        "a11y_backend": "win32"
    }
}
```

--------------------------------

### Fast Accessibility Tree Navi Agent Configuration

Source: https://github.com/microsoft/windowsagentarena/blob/main/README.md

Configures the Navi agent to use a faster but less accurate accessibility tree backend.

```bash
./run-local.sh --som-origin a11y --a11y-backend win32
```

--------------------------------

### Run Local Benchmark Disabling KVM Acceleration

Source: https://github.com/microsoft/windowsagentarena/blob/main/README.md

Disable KVM acceleration for local benchmark runs. Not recommended due to performance impact; consider Azure for better performance.
```bash
./run-local.sh --use-kvm false
```

--------------------------------

### Accurate Accessibility Tree Navi Agent Configuration

Source: https://github.com/microsoft/windowsagentarena/blob/main/README.md

Configures the Navi agent to use a slower but more accurate accessibility tree backend.

```bash
./run-local.sh --som-origin a11y --a11y-backend uia
```

--------------------------------

### Add Azure Configuration to config.json

Source: https://github.com/microsoft/windowsagentarena/blob/main/README.md

Append these keys to your project's config.json file to specify Azure subscription, resource group, and workspace details.

```json
{
    ...
    "AZURE_SUBSCRIPTION_ID": "",
    "AZURE_ML_RESOURCE_GROUP": "",
    "AZURE_ML_WORKSPACE_NAME": ""
}
```

--------------------------------

### Pull WinArena-Base Docker Image

Source: https://github.com/microsoft/windowsagentarena/blob/main/README.md

Pull the base image from Docker Hub to include necessary dependencies for running the code.

```bash
docker pull windowsarena/winarena-base:latest
```

--------------------------------

### OmniParser Navi Agent Configuration

Source: https://github.com/microsoft/windowsagentarena/blob/main/README.md

Configures the Navi agent to use OmniParser for screen element understanding.

```bash
./run-local.sh --som-origin omni
```

--------------------------------

### Upload Custom Docker Image to Azure Container Registry

Source: https://github.com/microsoft/windowsagentarena/blob/main/README.md

Steps to log in to Azure and Docker, tag a local Docker image, and push it to your Azure Container Registry. This is for using custom images.
```bash
az login --use-device-code
# potentially needed if commands below don't work: az acr login --name
docker login
docker tag .azurecr.io/:
docker push .azurecr.io/:
```

--------------------------------

### Create Conda Environment for Windows Agent Arena

Source: https://github.com/microsoft/windowsagentarena/blob/main/README.md

This command creates a new Conda environment named 'winarena' with Python version 3.9, which is recommended for running the scripts.

```bash
conda create -n winarena python=3.9
```

--------------------------------

### Forward Ports to Connect Python Server Externally

Source: https://github.com/microsoft/windowsagentarena/blob/main/docs/Development-Tips.md

Create a proxy server within the Docker container to forward specific ports (5000, 9222, 1337) to the Windows server's IP address, enabling external connections.

```bash
# connect to the running docker
cd scripts
./run-local.sh --connect true

# It will forward the requests and responses from the ports to the windows server's IP inside the docker
echo -n 5000 9222 1337 | xargs -d ' ' -I% bash -c 'socat tcp-listen:%,fork tcp:
```

--------------------------------

### Required Libraries for LibreOffice Calc

Source: https://github.com/microsoft/windowsagentarena/blob/main/src/win-arena-container/client/desktop_env/evaluators/README.md

List of libraries required for LibreOffice Calc operations.

```text
openpyxl
pandas
lxml
xmltodict
```

--------------------------------

### Convert Bash Scripts to Unix Format (WSL2)

Source: https://github.com/microsoft/windowsagentarena/blob/main/README.md

If you are running on WSL2 and encounter interpreter errors, convert the bash scripts from DOS/Windows format to Unix format.

```bash
cd ./scripts
find . -maxdepth 1 -type f -exec dos2unix {} +
```

--------------------------------

### Generate CSV from XLSX using LibreOffice

Source: https://github.com/microsoft/windowsagentarena/blob/main/src/win-arena-container/client/desktop_env/evaluators/README.md

Convert an XLSX file to CSV format using the LibreOffice command-line tool. Specify conversion options and output directory. The last parameter indicates the sheet number to export.

```shell
libreoffice --convert-to "csv:Text - txt - csv (StarCalc):44,34,UTF8,,,,false,true,true,false,false,1" --out-dir /home/user /home/user/abc.xlsx
```

--------------------------------

### Disable System Crash Report

Source: https://github.com/microsoft/windowsagentarena/blob/main/src/win-arena-container/client/desktop_env/evaluators/README.md

Disable the system crash report by editing the apport configuration file and setting `enabled` to `0`.

```shell
sudo vim /etc/default/apport
```
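--------------------------------

### Check a Task JSON for Expected Top-Level Keys

The task JSON example earlier in this document has five top-level keys: `id`, `instruction`, `config`, `evaluator`, and `result`. The sketch below shows one way to check a task for those keys before running it. The helper `missing_task_fields` and the constant `EXPECTED_TASK_KEYS` are hypothetical names introduced here for illustration, and treating all five keys as required is an assumption drawn from that single example, not a schema published by the project.

```python
import json

# Top-level keys taken from the task example in this document; treating them
# all as required is an assumption, not a schema published by the project.
EXPECTED_TASK_KEYS = ("id", "instruction", "config", "evaluator", "result")


def missing_task_fields(task: dict) -> list:
    """Return the expected top-level keys that are absent from a task dict."""
    return [key for key in EXPECTED_TASK_KEYS if key not in task]


# A trimmed copy of the VLC task example, with "result" left out on purpose.
sample_task = json.loads("""
{
  "id": "8ba5ae7a-5ae5-4eab-9fcc-5dd4fe3abf89-W0S",
  "instruction": "Help me modify the folder used to store my recordings to the Desktop",
  "config": [{"type": "launch", "parameters": {"command": "vlc"}}],
  "evaluator": {"func": "vis_vlc_recordings_folder"}
}
""")

print(missing_task_fields(sample_task))  # prints ['result']
```

An empty list means every expected key is present. The check only looks at top-level keys; it does not validate the nested `config` steps or the evaluator rule.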