### Install Midscene Python from Source

Source: https://github.com/python51888/midscene-python/blob/master/wiki/安装配置.md

Clone the repository and install the package in editable mode together with its dependencies.

```bash
git clone https://gitee.com/Python51888/midscene-python.git
cd midscene-python
pip install -e .
```

--------------------------------

### Android Automation Example

Source: https://github.com/python51888/midscene-python/blob/master/wiki/快速开始.md

Automate Android applications with Midscene's `Agent` and `AndroidDevice`. This example connects to a device, starts an app, performs the login steps, and asserts that the user is logged in.

```python
import asyncio

from midscene import Agent
from midscene.android import AndroidDevice


async def android_example():
    """Android app automation"""
    # Connect to the Android device
    device = AndroidDevice()
    await device.connect()

    # Create the Agent
    agent = Agent(device)

    # Launch the app
    await device.start_app("com.example.app")

    # Natural-language actions
    await agent.ai_action("Tap the login button")
    await agent.ai_action("Enter the username 'testuser'")
    await agent.ai_action("Enter the password 'password123'")
    await agent.ai_action("Tap confirm login")

    # Verify the login state
    await agent.ai_assert("The user is shown as logged in")

    print("✅ Android automation complete!")


# Run the example
asyncio.run(android_example())
```

--------------------------------

### Install and Configure Playwright

Source: https://github.com/python51888/midscene-python/blob/master/wiki/安装配置.md

Install Playwright and its browser binaries. You can install all browsers, or only specific ones such as Chromium to save disk space.

```bash
pip install playwright
playwright install
```

```bash
playwright install chromium
```

--------------------------------

### Basic Installation Verification Test

Source: https://github.com/python51888/midscene-python/blob/master/wiki/安装配置.md

An asynchronous Python script that verifies the Midscene modules import successfully and that `AIModelService` can be initialized.
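Before running the verification script, a stdlib-only pre-check can confirm that the interpreter meets the Python 3.9+ requirement stated in the README and report which of the packages mentioned on this page can be found, without importing them. This is a minimal sketch: the helper name `check_environment` and the package list are illustrative assumptions, not part of Midscene.

```python
import importlib.util
import sys

# Minimum interpreter version required by midscene-python (per its README)
MIN_PYTHON = (3, 9)

# Top-level module names used on this page; adjust to your setup (assumption)
PACKAGES = ["midscene", "selenium", "webdriver_manager", "playwright"]


def check_environment(packages=PACKAGES, version_info=sys.version_info):
    """Return (python_ok, {module_name: found}) without importing the modules."""
    python_ok = tuple(version_info[:2]) >= MIN_PYTHON
    # find_spec returns None when a top-level module cannot be located
    found = {name: importlib.util.find_spec(name) is not None for name in packages}
    return python_ok, found


if __name__ == "__main__":
    ok, found = check_environment()
    print(f"Python {sys.version.split()[0]}: {'OK' if ok else 'needs 3.9+'}")
    for name, present in found.items():
        print(f"{'✓' if present else '✗'} {name}")
```

The documented verification script follows; it goes further by actually importing Midscene and constructing `AIModelService`.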
```python
# test_installation.py
import asyncio

from midscene import Agent
from midscene.core.ai_model import AIModelService


async def test_installation():
    """Check that the installation succeeded"""
    # Reaching this point means the imports above succeeded
    print("✓ Modules imported successfully")

    # Check the AI service configuration
    try:
        ai_service = AIModelService()
        print("✓ AI service initialized successfully")
    except Exception as e:
        print(f"✗ AI service initialization failed: {e}")

    print("🎉 Installation check complete!")


# Run the test
asyncio.run(test_installation())
```

--------------------------------

### Install Browser Drivers

Source: https://github.com/python51888/midscene-python/blob/master/wiki/快速开始.md

Install the browser drivers needed for web automation with pip. Choose either Selenium WebDriver or Playwright.

```bash
# Selenium WebDriver
pip install selenium webdriver-manager

# or Playwright
pip install playwright
playwright install
```

--------------------------------

### Install ADB on Linux and macOS

Source: https://github.com/python51888/midscene-python/blob/master/wiki/安装配置.md

Commands to install the Android Debug Bridge (ADB) on Ubuntu/Debian and on macOS.

```bash
# Ubuntu/Debian
sudo apt-get install android-tools-adb
```

```bash
# macOS (Homebrew)
brew install android-platform-tools
```

--------------------------------

### Configure Selenium with WebDriver Manager

Source: https://github.com/python51888/midscene-python/blob/master/wiki/安装配置.md

Install Selenium and WebDriver Manager, then let WebDriver Manager download and install a matching ChromeDriver automatically.

```bash
pip install selenium webdriver-manager
```

```python
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

# Download (or reuse a cached) ChromeDriver that matches the installed Chrome
service = Service(ChromeDriverManager().install())
driver = webdriver.Chrome(service=service)
```

--------------------------------

### Developer Installation for Midscene Python

Source: https://github.com/python51888/midscene-python/blob/master/wiki/安装配置.md

Install the package in editable mode together with the development and documentation dependencies, and set up the pre-commit hooks.
```bash
git clone https://gitee.com/Python51888/midscene-python.git
cd midscene-python
pip install -e ".[dev,docs]"
pre-commit install
```

--------------------------------

### Create and Activate a Virtual Environment

Source: https://github.com/python51888/midscene-python/blob/master/wiki/安装配置.md

Create a virtual environment with Python's `venv` module, activate it on Linux/macOS or Windows, and then install the package into it.

```bash
# Create a virtual environment (recommended)
python -m venv midscene-env
source midscene-env/bin/activate  # Linux/macOS
# or
midscene-env\Scripts\activate     # Windows

# Install inside the virtual environment
pip install midscene-python
```

--------------------------------

### AI Element Localization Examples

Source: https://github.com/python51888/midscene-python/blob/master/wiki/核心概念/Agent核心控制器.md

Shows how `ai_locate` identifies elements from different kinds of descriptions: plain names, visual attributes, and relative position.

```python
# Basic localization
login_btn = await agent.ai_locate("login button")
search_box = await agent.ai_locate("search input box")

# Descriptive localization
submit_btn = await agent.ai_locate("blue submit button")
user_avatar = await agent.ai_locate("user avatar in the top-right corner of the page")

# Relative localization
next_btn = await agent.ai_locate("next-page button inside the pagination control")
```

--------------------------------

### Install Midscene Python

Source: https://github.com/python51888/midscene-python/blob/master/README.zh.md

Install the `midscene-python` package with pip. Python 3.9 or later is required.

```bash
pip install midscene-python
```

--------------------------------

### Basic and Complex ai_action Examples

Source: https://github.com/python51888/midscene-python/blob/master/wiki/核心概念/Agent核心控制器.md

Shows various uses of the `ai_action` method, from simple clicks and text input to conditional and multi-step operations.
```python
# Basic interactions
await agent.ai_action("Click the login button")
await agent.ai_action("Type 'admin' into the username field")
await agent.ai_action("Select the second option in the dropdown menu")

# Complex operations
await agent.ai_action("Scroll to the bottom of the page and click the Load more button")
await agent.ai_action("Type 'Python' into the search box and press Enter to search")

# Conditional operations
await agent.ai_action("If the page shows an error message, click the OK button")
```

--------------------------------

### Configure AI Model

Source: https://github.com/python51888/midscene-python/blob/master/wiki/快速开始.md

Create a `.env` file containing your AI model API key and, optionally, a base URL. This example shows an OpenAI configuration.

```bash
# .env
OPENAI_API_KEY=your_openai_api_key_here
# Optional; defaults to the official API
OPENAI_BASE_URL=https://api.openai.com/v1
```

--------------------------------

### Install Midscene Python via Pip

Source: https://github.com/python51888/midscene-python/blob/master/wiki/安装配置.md

Use pip to install either the latest version or a specific version of `midscene-python`.

```bash
pip install midscene-python
```

```bash
pip install midscene-python==0.1.0
```

--------------------------------

### Install Optional Dependencies for Midscene Python

Source: https://github.com/python51888/midscene-python/blob/master/wiki/安装配置.md

Install optional dependency groups: `dev` for development tools, `docs` for documentation tools, or both.

```bash
# Development tools
pip install "midscene-python[dev]"
```

```bash
# Documentation tools
pip install "midscene-python[docs]"
```

```bash
# All optional dependencies
pip install "midscene-python[dev,docs]"
```

--------------------------------

### Perform Login Automation with Agent

Source: https://context7.com/python51888/midscene-python/llms.txt

Uses the `Agent` to perform a login sequence on a web page with natural-language commands. Requires a `SeleniumWebPage` setup.
```python
import asyncio

from midscene import Agent
from midscene.web import SeleniumWebPage


async def login_automation():
    # Create a web page backed by Selenium
    with SeleniumWebPage.create(headless=False, window_size=(1920, 1080)) as page:
        agent = Agent(page)

        # Navigate to the website
        await page.navigate_to("https://example.com/login")

        # Perform the login using natural-language commands
        await agent.ai_action("Click the login button")
        await agent.ai_action("Enter 'user@example.com' in the email field")
        await agent.ai_action("Enter 'password123' in the password field")
        await agent.ai_action("Click the submit button")

        # Wait for the login to complete
        await agent.ai_wait_for("Dashboard is visible", timeout_ms=10000)

        # Verify that the login succeeded
        await agent.ai_assert("User is logged in and welcome message is displayed")


asyncio.run(login_automation())
```

--------------------------------

### Web Automation: Search Example

Source: https://github.com/python51888/midscene-python/blob/master/wiki/快速开始.md

Run a web search with natural-language commands using Midscene's `Agent` and `SeleniumWebPage`. The example navigates to Baidu, enters a search query, clicks the search button, and asserts that results are shown.

```python
import asyncio

from midscene import Agent
from midscene.web import SeleniumWebPage


async def search_example():
    """Search Baidu for a Python tutorial"""
    # Create a web page instance
    with SeleniumWebPage.create() as page:
        # Create the Agent
        agent = Agent(page)

        # Navigate to the website
        await page.goto("https://www.baidu.com")

        # Search using natural language
        await agent.ai_action("Type 'Python tutorial' into the search box")
        await agent.ai_action("Click the search button")

        # Verify the search results
        await agent.ai_assert("The page shows search results for Python tutorials")

        print("✅ Search complete!")


# Run the example
asyncio.run(search_example())
```

--------------------------------

### AI Assertion Examples

Source: https://github.com/python51888/midscene-python/blob/master/wiki/核心概念/Agent核心控制器.md

Examples of `ai_assert` for verifying page state, content, and conditional logic, ensuring the application behaves as expected.
```python
# State verification
await agent.ai_assert("The user has logged in successfully")
await agent.ai_assert("The page shows an error message")
await agent.ai_assert("The form passed validation")

# Content verification
await agent.ai_assert("The search results contain 'Python tutorial'")
await agent.ai_assert("There are 3 items in the shopping cart")
await agent.ai_assert("The order status is Shipped")

# Conditional verification
await agent.ai_assert("If this is a new user, a welcome wizard is shown")
```

--------------------------------

### Web Automation: Data Extraction Example

Source: https://github.com/python51888/midscene-python/blob/master/wiki/快速开始.md

Extract structured data from a news website with Midscene's `Agent`. The example extracts each article's title, publication time, and summary.

```python
import asyncio

from midscene import Agent
from midscene.web import SeleniumWebPage


async def extract_example():
    """Extract news headlines"""
    with SeleniumWebPage.create() as page:
        agent = Agent(page)

        # Open the news website
        await page.goto("https://news.example.com")

        # Extract structured data
        news_data = await agent.ai_extract({
            "articles": [
                {
                    "title": "news headline",
                    "time": "publication time",
                    "summary": "news summary"
                }
            ]
        })

        # Print the results
        for article in news_data["articles"]:
            print(f"📰 {article['title']}")
            print(f"⏰ {article['time']}")
            print(f"📄 {article['summary']}\n")


# Run the example
asyncio.run(extract_example())
```

--------------------------------

### Check and Install Python Version

Source: https://github.com/python51888/midscene-python/blob/master/wiki/安装配置.md

Check the current Python version and, if needed, install Python 3.9 on your operating system.

```bash
# Check the Python version
python --version
```

```bash
# If the version is older than 3.9, install a newer one
# Ubuntu/Debian
sudo apt-get install python3.9

# macOS
brew install python@3.9

# Windows
# Download the installer from python.org
```

--------------------------------

### Web Automation YAML Script Example

Source: https://github.com/python51888/midscene-python/blob/master/docs/quickstart.md

Define a web automation script in YAML: browser settings, tasks, and steps that use AI actions, data extraction, and assertions.
```yaml
# Web automation script
web:
  url: "https://example.com"
  browser: "chrome"
  headless: false

tasks:
  - name: "Log in"
    steps:
      - action: "ai_action"
        prompt: "Click the login button"
      - action: "ai_action"
        prompt: "Enter the username 'demo@example.com'"
      - action: "ai_action"
        prompt: "Enter the password 'password123'"
      - action: "ai_action"
        prompt: "Click the submit button"

  - name: "Extract data"
    steps:
      - action: "ai_extract"
        prompt:
          username: "username"
          email: "email address"
        save_to: "user_info"

  - name: "Verify state"
    steps:
      - action: "ai_assert"
        prompt: "The page shows a welcome message"
```

--------------------------------

### Configure Pip with a Mirror Source

Source: https://github.com/python51888/midscene-python/blob/master/wiki/安装配置.md

Install packages from a specific mirror (for example, the Tsinghua University mirror), or configure pip to use the mirror permanently for faster downloads.

```bash
# Use a mirror located in China
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple midscene-python

# Or configure the mirror permanently
pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple
```

--------------------------------

### Web Automation with Playwright

Source: https://github.com/python51888/midscene-python/blob/master/docs/quickstart.md

Perform web automation with Playwright and Midscene's `Agent`. The example navigates to a page, runs an AI action, and extracts page information.

```python
import asyncio

from midscene import Agent
from midscene.web import PlaywrightWebPage


async def playwright_automation():
    # Create a Playwright page
    async with await PlaywrightWebPage.create() as page:
        agent = Agent(page)

        await page.navigate_to("https://playwright.dev")
        await agent.ai_action("Click the documentation link")

        # Extract page information
        page_info = await agent.ai_extract({
            "title": "page title",
            "sections": ["list of main sections"]
        })

        print(f"Page info: {page_info}")


asyncio.run(playwright_automation())
```

--------------------------------

### Enable Detailed httpx Request Logging

Source: https://github.com/python51888/midscene-python/blob/master/wiki/核心概念/AI模型服务抽象层.md

Enable DEBUG-level logging for the `httpx` library to see detailed information about HTTP requests and responses.
```python
import logging

# Install a root handler; without one, the DEBUG records are never printed
logging.basicConfig()

# Enable detailed httpx logs
logging.getLogger("httpx").setLevel(logging.DEBUG)
```

--------------------------------

### AI Action: Search for a Python Tutorial

Source: https://github.com/python51888/midscene-python/blob/master/README.zh.md

Uses `ai_action` to run a search on a web page with a single natural-language instruction. The agent interprets "Type 'Python tutorial' into the search box and search" to locate the search box, enter the text, and start the search.

```python
await agent.ai_action("Type 'Python tutorial' into the search box and search")
```

--------------------------------

### Data Extraction Examples

Source: https://github.com/python51888/midscene-python/blob/master/wiki/核心概念/Agent核心控制器.md

Shows how `ai_extract` retrieves structured data from a page: a single object, a list of objects, and a nested structure.

```python
# Extract a single object
user_info = await agent.ai_extract({
    "name": "user's name",
    "email": "email address",
    "role": "user role"
})

# Extract list data
products = await agent.ai_extract({
    "products": [
        {
            "name": "product name",
            "price": "price",
            "rating": "rating",
            "in_stock": "whether it is in stock"
        }
    ]
})

# Complex nested structure
order_data = await agent.ai_extract({
    "order_id": "order number",
    "customer": {
        "name": "customer name",
        "address": "shipping address"
    },
    "items": [
        {
            "product": "product name",
            "quantity": "quantity",
            "price": "unit price"
        }
    ],
    "total": "total amount"
})
```

--------------------------------

### AI Extract: Product Information

Source: https://github.com/python51888/midscene-python/blob/master/README.zh.md

Shows how `ai_extract` pulls structured data from a web page. The example defines the desired JSON structure for product information, with `name`, `price`, and `rating` fields, which the agent attempts to fill in.
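Because the template passed to `ai_extract` doubles as a description of the expected shape, it can also be used to sanity-check what comes back before the data is used. The following is a minimal, stdlib-only sketch: `conforms` is a hypothetical helper, and it assumes the result mirrors the template's keys and uses lists of objects where the template does, which Midscene does not necessarily guarantee.

```python
from typing import Any


def conforms(template: Any, data: Any) -> bool:
    """Check that `data` has the same shape as an ai_extract-style template.

    Assumptions: a dict template requires every one of its keys (extra keys
    are allowed), a one-element list template means "a list whose items all
    match that element", and any other template value (a natural-language
    description of the field) accepts any leaf value.
    """
    if isinstance(template, dict):
        return (
            isinstance(data, dict)
            and set(template) <= set(data)
            and all(conforms(sub, data[key]) for key, sub in template.items())
        )
    if isinstance(template, list):
        if not isinstance(data, list):
            return False
        if not template:
            return True
        return all(conforms(template[0], item) for item in data)
    # Leaf: the template value only describes the field
    return True
```

The README's own example follows; its `products` template is exactly the kind of structure this helper walks.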
```python
products = await agent.ai_extract({
    "products": [
        {"name": "product name", "price": "price", "rating": "rating"}
    ]
})
```

--------------------------------

### Initialize Midscene Project

Source: https://github.com/python51888/midscene-python/blob/master/docs/quickstart.md

Initialize a new Midscene project directory with the CLI.

```bash
midscene init my-project
cd my-project
```

--------------------------------

### Enable Detailed Logging

Source: https://github.com/python51888/midscene-python/blob/master/wiki/快速开始.md

Enable debug-level logging for detailed insight into the automation process by calling `setup_logger` with `logging.DEBUG`.

```python
import logging

from midscene.shared.logger import setup_logger

# Enable debug logging
setup_logger(level=logging.DEBUG)
```

--------------------------------

### Initialize Agents for Web and Android

Source: https://github.com/python51888/midscene-python/blob/master/wiki/核心概念/Agent核心控制器.md

Shows that Agent instances for different platforms share the same interface. Pass the appropriate platform object (for example, `selenium_page` or `android_device`) when creating each Agent.

```python
# Web and Android use the same interface
web_agent = Agent(selenium_page)
android_agent = Agent(android_device)

# The same action methods work on both
await web_agent.ai_action("Click the login button")
await android_agent.ai_action("Click the login button")
```

--------------------------------

### Agent Lifecycle Management

Source: https://github.com/python51888/midscene-python/blob/master/wiki/核心概念/Agent核心控制器.md

Shows two ways to manage an Agent's lifecycle: manual creation and cleanup in a `try`/`finally` block, or the recommended asynchronous context manager.
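Both patterns give the same guarantee: cleanup runs even when an action raises. A context manager gets this from `__aexit__`, which Python calls on normal exit and on exceptions alike. The stdlib-only sketch below illustrates the mechanism with a hypothetical `DemoAgent`; it is not Midscene's implementation, whose internals this page does not show.

```python
import asyncio


class DemoAgent:
    """Hypothetical stand-in for an agent that owns resources (not Midscene's class)."""

    def __init__(self):
        self.destroyed = False

    async def destroy(self):
        # Release resources (browser session, device connection, ...)
        self.destroyed = True

    async def __aenter__(self):
        return self

    async def __aexit__(self, exc_type, exc, tb):
        # Runs on normal exit AND when the body raises
        await self.destroy()
        return False  # do not swallow the exception


async def run_failing_task(agent: DemoAgent):
    async with agent:
        raise RuntimeError("action failed")


if __name__ == "__main__":
    agent = DemoAgent()
    try:
        asyncio.run(run_failing_task(agent))
    except RuntimeError:
        pass
    print(agent.destroyed)  # True
```

Returning `False` from `__aexit__` lets the original error propagate after cleanup, which matches the `try`/`finally` variant. The documented Midscene usage follows.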
```python
# Option 1: manual management
agent = Agent(page)
try:
    await agent.ai_action("Perform an action")
finally:
    await agent.destroy()

# Option 2: context manager (recommended)
async with Agent(page) as agent:
    await agent.ai_action("Perform an action")
    # destroy() is called automatically on exit
```

--------------------------------

### AgentOptions Configuration

Source: https://github.com/python51888/midscene-python/blob/master/wiki/核心概念/Agent核心控制器.md

Configure Agent behavior with the `AgentOptions` class, including timeouts, retries, debugging, performance, and AI model parameters.

```python
from midscene import Agent
from midscene.core import AgentOptions

options = AgentOptions(
    # Timeouts
    timeout=30,                 # Operation timeout (seconds)

    # Retries
    retry_count=3,              # Number of retries on failure
    retry_delay=1.0,            # Delay between retries (seconds)

    # Debugging
    screenshot_on_error=True,   # Take a screenshot automatically on error
    save_execution_logs=True,   # Save execution logs

    # Performance
    cache_enabled=True,         # Enable smart caching
    parallel_execution=False,   # Parallel execution (experimental)

    # AI model settings
    model_temperature=0.1,      # Randomness of AI responses
    max_tokens=1000,            # Maximum number of tokens
)

agent = Agent(page, options=options)
```

--------------------------------

### Basic Web Automation with Agent

Source: https://github.com/python51888/midscene-python/blob/master/README.zh.md

Basic web automation with the `Agent` and `SeleniumWebPage`: create a page instance, create an agent, then click, type, and extract data with natural-language commands. The agent's methods are coroutines, so they must run inside an `async` function (for example, one started with `asyncio.run`).
```python
import asyncio

from midscene import Agent
from midscene.web import SeleniumWebPage


async def main():
    # Create a web Agent
    with SeleniumWebPage.create() as page:
        agent = Agent(page)

        # Automate with natural language
        await agent.ai_action("Click the login button")
        await agent.ai_action("Enter the username 'test@example.com'")
        await agent.ai_action("Enter the password 'password123'")
        await agent.ai_action("Click the submit button")

        # Data extraction
        user_info = await agent.ai_extract("Extract the user's personal information")

        # Assertion
        await agent.ai_assert("The page shows a welcome message")


asyncio.run(main())
```

--------------------------------

### Configure Agent with Options

Source: https://github.com/python51888/midscene-python/blob/master/docs/quickstart.md

Create the Agent with custom options, for example enabling caching via a `cache_id` and turning on report generation.

```python
from midscene import Agent
from midscene.core import AgentOptions

options = AgentOptions(
    cache_id="my_automation",
    generate_report=True
)

agent = Agent(page, options)
```

--------------------------------

### Configure AI Model API Keys using Environment Variables

Source: https://github.com/python51888/midscene-python/blob/master/wiki/安装配置.md

Set API keys and optional base URLs for the supported AI providers (OpenAI, Anthropic, DashScope, Gemini), plus the default Midscene AI settings, in a `.env` file.

```dotenv
# OpenAI
OPENAI_API_KEY=sk-your-openai-api-key
# Optional
OPENAI_BASE_URL=https://api.openai.com/v1

# Anthropic
ANTHROPIC_API_KEY=sk-ant-your-anthropic-key

# Qwen (Tongyi Qianwen)
DASHSCOPE_API_KEY=sk-your-dashscope-key

# Gemini
GOOGLE_API_KEY=AIza-your-google-api-key

# Default model
MIDSCENE_AI_PROVIDER=openai
MIDSCENE_AI_MODEL=gpt-4-vision-preview
```

--------------------------------

### Qwen (Tongyi Qianwen) Model Configuration

Source: https://github.com/python51888/midscene-python/blob/master/wiki/核心概念/AI模型服务抽象层.md

Configure the Qwen provider with a model such as `qwen-vl-max`, using a DashScope API key and a low temperature.
```python
from midscene.core.ai_model import AIModelConfig

# Qwen configuration
qwen_config = AIModelConfig(
    provider="qwen",
    model="qwen-vl-max",
    api_key="sk-…",  # DashScope API key
    temperature=0.1
)
```

--------------------------------

### Playwright Web Page Automation

Source: https://context7.com/python51888/midscene-python/llms.txt

Automates browser interactions with Playwright, combining AI-driven actions with direct page manipulation. Requires Playwright to be installed.

```python
import asyncio

from midscene import Agent
from midscene.web import PlaywrightWebPage


async def playwright_automation():
    # Create a Playwright page with an async context manager
    async with await PlaywrightWebPage.create(
        headless=False,
        viewport_size=(1920, 1080)
    ) as page:
        agent = Agent(page)

        # Navigate to the URL
        await page.navigate_to("https://playwright.dev")

        # AI-driven interactions
        await agent.ai_action("Click on the Get Started button")
        await agent.ai_action("Search for 'selectors' in the search box")

        # Extract documentation info
        docs = await agent.ai_extract({
            "current_section": "Current documentation section title",
            "code_examples": ["Code example snippets on the page"]
        })
        print(f"Documentation: {docs}")

        # Direct page interactions
        await page.scroll("down", distance=300)
        await page.evaluate_script("console.log('Test complete')")


asyncio.run(playwright_automation())
```

--------------------------------

### Agent Class Structure

Source: https://github.com/python51888/midscene-python/blob/master/wiki/核心概念/Agent核心控制器.md

Shows the core components of the `Agent` class: the platform interface it is initialized with, its options, the AI service, and the execution modules.

```python
from typing import Optional


class Agent:
    """Core Agent class that orchestrates AI model and device interactions"""

    def __init__(
        self,
        interface: AbstractInterface,
        options: Optional[AgentOptions] = None
    ):
        self.interface = interface                # Platform interface
        self.options = options or AgentOptions()  # Configuration options
        self.ai_service = AIModelService()        # AI service
        self.insight = Insight(...)
        self.task_executor = TaskExecutor(...)
```

--------------------------------

### Negative Assertion

Source: https://github.com/python51888/midscene-python/blob/master/wiki/核心概念/Insight-UI理解引擎.md

Asserts that a condition is *not* met, for example "the page does not show an error message". This is useful for ensuring that unwanted elements or states are absent.

```python
# Negative assertion
result = await insight.assert_condition("The page does not show an error message")
```

--------------------------------

### Basic Web Automation with Midscene Python Agent

Source: https://github.com/python51888/midscene-python/blob/master/README.md

Basic web automation with the Midscene `Agent` and Selenium: log in, extract data, and verify assertions with natural-language commands. The calls are coroutines, so they run inside an `async` function.

```python
import asyncio

from midscene import Agent
from midscene.web import SeleniumWebPage


async def main():
    # Create a Web Agent
    with SeleniumWebPage.create() as page:
        agent = Agent(page)

        # Perform automation operations using natural language
        await agent.ai_action("Click the login button")
        await agent.ai_action("Enter username 'test@example.com'")
        await agent.ai_action("Enter password 'password123'")
        await agent.ai_action("Click the submit button")

        # Data extraction
        user_info = await agent.ai_extract("Extract user personal information")

        # Assertion verification
        await agent.ai_assert("Page displays welcome message")


asyncio.run(main())
```

--------------------------------

### List Android Devices with Midscene CLI

Source: https://github.com/python51888/midscene-python/blob/master/docs/quickstart.md

Use the Midscene CLI to list the Android devices connected to the system.

```bash
midscene devices
```

--------------------------------

### Configure AI Provider via Environment Variables

Source: https://github.com/python51888/midscene-python/blob/master/docs/quickstart.md

Set environment variables to configure the AI provider, model, and API key for Midscene.
```bash
export MIDSCENE_AI_PROVIDER=openai
export MIDSCENE_AI_MODEL=gpt-4-vision-preview
export MIDSCENE_AI_API_KEY=your-api-key-here
```

--------------------------------

### Complex Condition Assertion

Source: https://github.com/python51888/midscene-python/blob/master/wiki/核心概念/Insight-UI理解引擎.md

Asserts conditional logic on the page, such as "if this is a new user, show a welcome guide". This lets you test UI behavior that depends on a condition.

```python
# Complex conditional assertion
result = await insight.assert_condition(
    "If this is a new user, the page should show a welcome guide"
)
```

--------------------------------

### Android Device Low-Level Control

Source: https://context7.com/python51888/midscene-python/llms.txt

Direct ADB-based control of Android devices, including UI hierarchy parsing, input simulation, and app management. Requires ADB and a connected device.

```python
import asyncio

from midscene.android.device import AndroidDevice


async def device_control():
    # List available devices
    devices = await AndroidDevice.list_devices()
    print(f"Available devices: {devices}")
    if not devices:
        raise RuntimeError("No Android device connected")

    # Connect to the first device
    device = await AndroidDevice.create(device_id=devices[0])

    # Get the UI context (screenshot and element tree)
    context = await device.get_context()
    print(f"Screen size: {context.size.width}x{context.size.height}")
    print(f"Found {len(context.content)} UI elements")

    # Direct input operations
    await device.tap(540, 960)
    await device.input_text("Hello Android")
    await device.clear_text()

    # Scrolling and gestures
    await device.scroll("down", distance=500)
    await device.swipe(100, 500, 900, 500, duration=200)  # Swipe right
    await device.long_press(540, 960, duration=2000)

    # Key events
    await device.key_event("KEYCODE_ENTER")
    await device.back()
    await device.home()

    # App management
    await device.install_app("/path/to/app.apk")
    await device.launch_app("com.example.app", activity=".MainActivity")
    await device.stop_app("com.example.app")

    # Disconnect
    await device.disconnect()


asyncio.run(device_control())
```
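When `AndroidDevice.list_devices()` returns nothing, it helps to compare with what ADB itself reports. The sketch below parses the text printed by `adb devices`; `parse_adb_devices` and `list_ready_devices` are hypothetical helpers, not Midscene's implementation, and they assume the standard `<serial><TAB><state>` line format, keeping only devices in the `device` (ready) state.

```python
import subprocess
from typing import List


def parse_adb_devices(output: str) -> List[str]:
    """Extract the serials of ready devices from `adb devices` output."""
    serials = []
    for line in output.splitlines():
        line = line.strip()
        # Skip the header, daemon start-up messages, and blank lines
        if not line or line.startswith("List of devices") or line.startswith("*"):
            continue
        parts = line.split()
        # Only "device" means ready; "offline" and "unauthorized" are skipped
        if len(parts) >= 2 and parts[1] == "device":
            serials.append(parts[0])
    return serials


def list_ready_devices() -> List[str]:
    """Run `adb devices` (requires adb on PATH) and parse its output."""
    result = subprocess.run(
        ["adb", "devices"], capture_output=True, text=True, check=True
    )
    return parse_adb_devices(result.stdout)
```

A device listed as `unauthorized` usually needs the USB-debugging prompt accepted on the phone before any automation can connect to it.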
--------------------------------

### Optimize Context for Different Insight Actions

Source: https://github.com/python51888/midscene-python/blob/master/wiki/核心概念/Insight-UI理解引擎.md

Provide UI context tailored to the Insight action being performed: for example, detailed element information for locate actions and full page content for extract actions.

```python
# Provide context optimized for each kind of operation
async def optimized_context_provider(action: InsightAction) -> UIContext:
    base_context = await page.get_context()

    if action == InsightAction.LOCATE:
        # Locate operations need more detailed element information
        base_context.elements = await page.get_all_elements()
    elif action == InsightAction.EXTRACT:
        # Extract operations need more complete page content
        base_context.page_content = await page.get_page_content()

    return base_context
```

--------------------------------

### AI Model Service Configuration

Source: https://context7.com/python51888/midscene-python/llms.txt

Configure AI providers such as OpenAI, Anthropic, Qwen, and Gemini either through environment variables or with explicit configuration objects. Make sure the required API keys and model names are set.

```python
import asyncio
import os

from midscene import Agent
from midscene.web import SeleniumWebPage
from midscene.core.ai_model import AIModelService, AIModelConfig
from midscene.core.types import AgentOptions

# Method 1: Environment variables (recommended)
os.environ["MIDSCENE_AI_PROVIDER"] = "openai"
os.environ["MIDSCENE_AI_MODEL"] = "gpt-4-vision-preview"
os.environ["MIDSCENE_AI_API_KEY"] = "sk-your-api-key"
```

--------------------------------

### AbstractInterface Definition in Python

Source: https://github.com/python51888/midscene-python/blob/master/wiki/核心概念/UI上下文与数据模型.md

Defines the abstract base class `AbstractInterface` that platform-specific implementations subclass. It declares methods for getting the UI context, listing the available actions, tapping, entering text, and scrolling, and serves as the contract for each platform integration.
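Declaring these methods with `@abstractmethod` means Python refuses to instantiate a platform class that leaves any of them unimplemented. The trimmed-down, stdlib-only sketch below demonstrates that contract with a hypothetical `MiniInterface` (two methods instead of the full set) and an in-memory `RecordingDevice` that only logs calls; neither class is part of Midscene.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import List, Optional, Tuple


class MiniInterface(ABC):
    """Hypothetical, reduced version of a platform interface contract."""

    @abstractmethod
    async def tap(self, x: float, y: float) -> None: ...

    @abstractmethod
    async def scroll(self, direction: str, distance: Optional[int] = None) -> None: ...


class RecordingDevice(MiniInterface):
    """In-memory implementation that records calls instead of driving a device."""

    def __init__(self):
        self.calls: List[Tuple] = []

    async def tap(self, x: float, y: float) -> None:
        self.calls.append(("tap", x, y))

    async def scroll(self, direction: str, distance: Optional[int] = None) -> None:
        self.calls.append(("scroll", direction, distance))


class IncompleteDevice(MiniInterface):
    async def tap(self, x: float, y: float) -> None:
        pass  # scroll() is missing, so this class cannot be instantiated


async def demo() -> List[Tuple]:
    device = RecordingDevice()
    await device.tap(540, 960)
    await device.scroll("down", 300)
    return device.calls


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

A recording implementation like this is also a convenient test double for code that only depends on the interface. The documented interface follows.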
```python
from abc import ABC, abstractmethod
from typing import List, Optional


class AbstractInterface(ABC):
    """Abstract interface implemented by each platform"""

    @property
    @abstractmethod
    def interface_type(self) -> InterfaceType:
        """Return the interface type"""
        pass

    @abstractmethod
    async def get_context(self) -> UIContext:
        """Return the current UI context"""
        pass

    @abstractmethod
    async def action_space(self) -> List[str]:
        """Return the list of available actions"""
        pass

    @abstractmethod
    async def tap(self, x: float, y: float) -> None:
        """Tap at the given coordinates"""
        pass

    @abstractmethod
    async def input_text(self, text: str) -> None:
        """Enter text"""
        pass

    @abstractmethod
    async def scroll(self, direction: str, distance: Optional[int] = None) -> None:
        """Scroll"""
        pass
```

--------------------------------

### AIModelService Class Structure

Source: https://github.com/python51888/midscene-python/blob/master/wiki/核心概念/AI模型服务抽象层.md

The core service class registers the providers on initialization and exposes a unified interface for AI calls.

```python
from typing import Any, Dict, List, Optional, Type


class AIModelService:
    """Unified AI model service interface"""

    def __init__(self):
        self.providers: Dict[str, AIProvider] = {}
        self._register_providers()

    async def call_ai(
        self,
        messages: List[Dict[str, Any]],
        response_schema: Optional[Type[BaseModel]] = None,
        model_config: Optional[AIModelConfig] = None,
        **kwargs
    ) -> Dict[str, Any]:
        """Unified AI call interface"""
```

--------------------------------

### Accessing Context Information in Python

Source: https://github.com/python51888/midscene-python/blob/master/wiki/核心概念/UI上下文与数据模型.md

Shows how to read context information such as the screenshot, the viewport dimensions, and the UI elements.

```python
import base64

# Decode the screenshot into raw image bytes
screenshot_data = base64.b64decode(context.screenshot_base64)

# Viewport width
viewport_width = context.size.width

# All button elements in the context
all_buttons = [elem for elem in context.content if elem.node_type == NodeType.BUTTON]
```

--------------------------------

### Basic Element Localization

Source: https://github.com/python51888/midscene-python/blob/master/wiki/核心概念/Insight-UI理解引擎.md

Locate elements with simple, direct natural-language descriptions. This is the most straightforward way to find UI components.
```python
# Basic localization
login_btn = await insight.locate("login button")
search_box = await insight.locate("search input box")
```

--------------------------------

### Configure Connection Pooling with httpx

Source: https://github.com/python51888/midscene-python/blob/master/wiki/核心概念/AI模型服务抽象层.md

Configure `httpx.AsyncClient` with connection-pool limits and a timeout so that connections are reused efficiently.

```python
import httpx


class OptimizedAIProvider(AIProvider):
    def __init__(self):
        # Configure the connection pool
        self.client = httpx.AsyncClient(
            limits=httpx.Limits(
                max_keepalive_connections=10,
                max_connections=20
            ),
            timeout=httpx.Timeout(60.0)
        )

    async def call(self, messages, config, **kwargs):
        # Reuse pooled connections
        response = await self.client.post(...)
        return response.json()
```

--------------------------------

### Configure AI Model with Explicit Settings

Source: https://context7.com/python51888/midscene-python/llms.txt

Set up a specific AI model with custom parameters such as the provider, model name, API key, and generation settings. Keep the API key secure.

```python
config = AIModelConfig(
    provider="anthropic",  # openai, anthropic, qwen, gemini
    model="claude-3-opus-20240229",
    api_key="your-api-key",
    base_url=None,         # Optional custom endpoint
    max_tokens=4000,
    temperature=0.1,
    timeout=60
)


async def custom_ai_config():
    with SeleniumWebPage.create() as page:
        # Create an agent with a custom model config
        options = AgentOptions(
            model_config=lambda: config,
            generate_report=True,
            group_name="Custom AI Test"
        )
        agent = Agent(page, options=options)

        await page.navigate_to("https://example.com")
        await agent.ai_action("Click the main button")


asyncio.run(custom_ai_config())
```

--------------------------------

### Initialize Midscene Agent

Source: https://github.com/python51888/midscene-python/blob/master/docs/quickstart.md

Create a Midscene `Agent`, the core controller for automation tasks, which coordinates the AI model with device interactions.
```python
from midscene import Agent
from midscene.web import SeleniumWebPage

page = SeleniumWebPage.create()
agent = Agent(page)
```

--------------------------------

### OpenAI Model Configuration

Source: https://github.com/python51888/midscene-python/blob/master/wiki/核心概念/AI模型服务抽象层.md

Configure the OpenAI provider with a model such as `gpt-4-vision-preview` or `gpt-4o`, and optionally a custom base URL.

```python
# GPT-4V configuration
openai_config = AIModelConfig(
    provider="openai",
    model="gpt-4-vision-preview",          # or "gpt-4o"
    api_key="sk-…",
    base_url="https://api.openai.com/v1",  # Optional
    temperature=0.1
)
```

--------------------------------

### Configure AI Model Options

Source: https://github.com/python51888/midscene-python/blob/master/wiki/快速开始.md

Customize the AI model configuration, such as the provider, model name, temperature, and maximum tokens, with `AIModelConfig`, and pass it to the Agent through `AgentOptions`.

```python
from midscene import Agent
from midscene.core import AgentOptions
from midscene.core.ai_model import AIModelConfig

# Custom AI configuration
config = AIModelConfig(
    provider="openai",  # or "anthropic", "qwen", "gemini"
    model="gpt-4-vision-preview",
    temperature=0.1,
    max_tokens=1000
)

# Agent accepts (interface, options); the model config is supplied via AgentOptions
agent = Agent(page, options=AgentOptions(model_config=lambda: config))
```

--------------------------------

### Static UIContext Initialization

Source: https://github.com/python51888/midscene-python/blob/master/wiki/核心概念/Insight-UI理解引擎.md

Initialize a `UIContext` with static information such as a base64-encoded screenshot, the page title, and the URL. This gives the Insight engine a fixed snapshot of the UI.

```python
# Static context
context = UIContext(
    screenshot_base64="...",
    page_title="Login page",
    url="https://example.com/login"
)
insight = Insight(context)
```

--------------------------------

### Insight Class Initialization

Source: https://github.com/python51888/midscene-python/blob/master/wiki/核心概念/Insight-UI理解引擎.md

Initializes the Insight engine with a context provider, an optional AI service, and an optional model configuration. The context provider is required because it supplies the UI information.
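Because `context_provider` may be either a ready-made `UIContext` snapshot or a callable, code that consumes it has to normalize the two cases. A minimal sketch, assuming the callable may be sync or async and may take arguments (such as the `action` passed to `optimized_context_provider` in this section); `resolve_context` is a hypothetical helper, not Midscene's API, and a plain dict stands in for `UIContext`.

```python
import asyncio
import inspect
from typing import Any


async def resolve_context(provider: Any, *args: Any) -> Any:
    """Return a context snapshot, calling the provider if it is callable."""
    if not callable(provider):
        # Static snapshot: use it as-is
        return provider
    result = provider(*args)
    # Async providers return an awaitable; sync providers return the value
    if inspect.isawaitable(result):
        result = await result
    return result


if __name__ == "__main__":
    # A plain dict stands in for a static UIContext here
    print(asyncio.run(resolve_context({"page_title": "Login page"})))
```

Checking `inspect.isawaitable` on the call result, rather than inspecting the function, also covers callables such as `functools.partial` objects wrapping coroutine functions. The documented class definition follows.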
```python
from __future__ import annotations  # defer annotation evaluation

from typing import Callable, Optional, Union

from midscene.core.ai_model import AIModelConfig, AIModelService
# UIContext also comes from Midscene's core types (its import is not shown in the source)


class Insight:
    """AI-powered UI understanding and reasoning engine"""

    def __init__(
        self,
        context_provider: Union[UIContext, Callable],
        ai_service: Optional[AIModelService] = None,
        model_config: Optional[AIModelConfig] = None
    ):
        self.context_provider = context_provider  # context provider
        self.ai_service = ai_service              # AI model service
        self.model_config = model_config          # model configuration
        self._dump_subscribers = []               # debug subscribers
```

--------------------------------

### Multiple AI Configurations for Different Tasks

Source: https://github.com/python51888/midscene-python/blob/master/wiki/核心概念/AI模型服务抽象层.md

Manage separate AI model configurations for distinct tasks, adjusting parameters such as temperature and max tokens per task.

```python
# Configure a different model for each task
# (api_key omitted for brevity; AIModelConfig has no default for it)
configs = {
    "locate": AIModelConfig(
        provider="openai",
        model="gpt-4-vision-preview",
        temperature=0.0,  # locating needs deterministic output
        max_tokens=500
    ),
    "extract": AIModelConfig(
        provider="claude",
        model="claude-3-sonnet-20240229",
        temperature=0.2,  # extraction allows a little more creativity
        max_tokens=2000
    ),
    "assert": AIModelConfig(
        provider="qwen",
        model="qwen-vl-max",
        temperature=0.1,
        max_tokens=1000
    )
}

# Pick the configuration for the task (inside an async function)
result = await ai_service.call_ai(
    messages=messages,
    model_config=configs["locate"]
)
```

--------------------------------

### Configure AI Models in Python Code

Source: https://github.com/python51888/midscene-python/blob/master/wiki/安装配置.md

Define multiple AI provider configurations as `AIModelConfig` objects in a Python dictionary, specifying the provider, model, API key, and temperature.
```python
from midscene.core.ai_model import AIModelConfig

# Configurations for multiple AI providers
configs = {
    "openai": AIModelConfig(
        provider="openai",
        model="gpt-4-vision-preview",
        api_key="your-openai-key",
        temperature=0.1
    ),
    "claude": AIModelConfig(
        provider="anthropic",
        model="claude-3-sonnet-20240229",
        api_key="your-claude-key",
        temperature=0.1
    )
}
```

--------------------------------

### Cross-Platform UIContext Usage

Source: https://github.com/python51888/midscene-python/blob/master/wiki/核心概念/UI上下文与数据模型.md

Use the same `UIContext` data model on different platforms such as Web and Android, so interaction code stays consistent across them.

```python
# Web and Android share the same data model
web_context: UIContext = await web_page.get_context()
android_context: UIContext = await android_device.get_context()

# Both are accessed in exactly the same way
print(web_context.screenshot_base64)
print(android_context.screenshot_base64)
```

--------------------------------

### Basic Interaction Actions

Source: https://github.com/python51888/midscene-python/blob/master/wiki/快速开始.md

Perform basic UI interactions such as clicking, entering text, and scrolling by giving the Agent natural-language commands.

```python
# Click actions
await agent.ai_action("Click the submit button")
await agent.ai_action("Click the user avatar in the top-right corner of the page")

# Input actions
await agent.ai_action("Enter 'admin' in the username field")
await agent.ai_action("Enter the password in the password field")

# Scroll actions
await agent.ai_action("Scroll down to see more content")
await agent.ai_action("Scroll to the bottom of the page")

# Wait actions
await agent.ai_action("Wait for the page to finish loading")
```

--------------------------------

### Code-based AI Configuration

Source: https://github.com/python51888/midscene-python/blob/master/wiki/核心概念/AI模型服务抽象层.md

Configure AI models directly in code with the `AIModelConfig` class, then pass the configuration to `AIModelService`.
```python
from midscene.core.ai_model import AIModelConfig, AIModelService

# Create the configuration
config = AIModelConfig(
    provider="openai",
    model="gpt-4-vision-preview",
    api_key="your_api_key",
    temperature=0.1,
    max_tokens=2000
)

# Create the service instance
ai_service = AIModelService()

# Call the model with this configuration
# (inside an async function; `messages` is the chat payload)
result = await ai_service.call_ai(
    messages=messages,
    model_config=config
)
```

--------------------------------

### Google Gemini Model Configuration

Source: https://github.com/python51888/midscene-python/blob/master/wiki/核心概念/AI模型服务抽象层.md

Configure the Gemini provider with a model such as `gemini-1.5-pro-vision`, providing the API key and temperature.

```python
# Gemini configuration
gemini_config = AIModelConfig(
    provider="gemini",
    model="gemini-1.5-pro-vision",
    api_key="AIza…",
    temperature=0.2
)
```

--------------------------------

### Environment Variable Configuration

Source: https://github.com/python51888/midscene-python/blob/master/wiki/核心概念/AI模型服务抽象层.md

Configure AI service settings such as provider, model, and API key through environment variables.

```bash
# .env file
MIDSCENE_AI_PROVIDER=openai
MIDSCENE_AI_MODEL=gpt-4-vision-preview
MIDSCENE_AI_API_KEY=your_api_key_here
MIDSCENE_AI_BASE_URL=https://api.openai.com/v1
```

--------------------------------

### Utilize Insight Engine for UI Operations

Source: https://context7.com/python51888/midscene-python/llms.txt

Demonstrates how to use the `Insight` class to locate elements, extract data, and assert conditions on a web page. Requires an initialized `AIModelService` and a `SeleniumWebPage` context.
```python
import asyncio
from midscene.core.insight import Insight
from midscene.core.ai_model import AIModelService
from midscene.web import SeleniumWebPage

async def insight_usage():
    with SeleniumWebPage.create() as page:
        await page.navigate_to("https://example.com")

        # Create Insight with context provider
        ai_service = AIModelService()
        insight = Insight(
            context_provider=page.get_context,
            ai_service=ai_service
        )

        # Add debugging subscriber
        def debug_handler(data):
            print(f"Operation: {data['type']}, Success: {'error' not in data}")

        insight.add_dump_subscriber(debug_handler)

        # Locate element
        result = await insight.locate("Submit button")
        if result.element:
            print(f"Found at: {result.rect}")

        # Extract data
        extracted = await insight.extract("All form field labels")
        print(f"Extracted: {extracted['data']}")

        # Assert condition
        assert_result = await insight.assert_condition("Page has loaded completely")
        print(f"Assertion passed: {assert_result.passed}")
        print(f"Reasoning: {assert_result.thought}")

        # Describe element at coordinates
        description = await insight.describe((500, 300))
        print(f"Element description: {description}")

asyncio.run(insight_usage())
```

--------------------------------

### Execute Search Actions with ai_action

Source: https://context7.com/python51888/midscene-python/llms.txt

Shows how to use the `ai_action` method to perform a series of search-related actions on a web page, such as entering text and clicking. The AI locates the target elements automatically.
```python
import asyncio
from midscene import Agent
from midscene.web import SeleniumWebPage

async def search_example():
    with SeleniumWebPage.create() as page:
        agent = Agent(page)
        await page.navigate_to("https://www.google.com")

        # AI understands context and locates elements automatically
        await agent.ai_action("Enter 'Python automation' in the search box")
        await agent.ai_action("Click the search button")
        await agent.ai_action("Scroll down to see more results")
        await agent.ai_action("Click on the first search result")

asyncio.run(search_example())
```

--------------------------------

### Configure Report Generation

Source: https://github.com/python51888/midscene-python/blob/master/docs/quickstart.md

Enable report generation for Midscene automation by setting `generate_report=True` and a report file name in `AgentOptions`.

```python
# Generate an execution report
options = AgentOptions(
    generate_report=True,
    report_file_name="automation_report"
)
```

--------------------------------

### Dynamic UIContext Provider

Source: https://github.com/python51888/midscene-python/blob/master/wiki/核心概念/Insight-UI理解引擎.md

Implement a dynamic context provider: a function that fetches UI context according to the type of Insight action being performed, so the context it gathers adapts to each action.

```python
# Dynamic context (`page`, `InsightAction` and `UIContext` come from the surrounding code)
async def get_context(action: InsightAction) -> UIContext:
    # Fetch different context depending on the action type
    if action == InsightAction.LOCATE:
        return await page.get_locate_context()
    elif action == InsightAction.EXTRACT:
        return await page.get_extract_context()
    else:
        return await page.get_default_context()

insight = Insight(get_context)
```

--------------------------------

### AI Locate: Find Login Button

Source: https://github.com/python51888/midscene-python/blob/master/README.zh.md

Demonstrates using the `ai_locate` method to find a UI element, here the login button, from a natural-language description. The agent applies its locating strategies to identify and return the element.
```python
element = await agent.ai_locate("login button")
```

--------------------------------

### AIModelConfig Configuration Class

Source: https://github.com/python51888/midscene-python/blob/master/wiki/核心概念/AI模型服务抽象层.md

Defines the configuration parameters for AI models, including provider, model name, API key, and other settings.

```python
from typing import Optional

from pydantic import BaseModel


class AIModelConfig(BaseModel):
    """AI model configuration"""
    provider: str                   # provider name
    model: str                      # model name
    api_key: str                    # API key
    base_url: Optional[str] = None  # custom API endpoint
    max_tokens: int = 4000          # maximum number of tokens
    temperature: float = 0.1        # randomness control
    timeout: int = 60               # request timeout
```

--------------------------------

### Troubleshoot AI Model API Key

Source: https://github.com/python51888/midscene-python/blob/master/wiki/快速开始.md

Verify your AI model API key configuration by printing the first characters of the key read from the `OPENAI_API_KEY` environment variable.

```python
# Check the API key configuration
import os

api_key = os.getenv("OPENAI_API_KEY")
if api_key:
    print(f"API Key: {api_key[:10]}...")
else:
    # os.getenv returns None when the variable is unset; slicing None would raise TypeError
    print("OPENAI_API_KEY is not set")
```

--------------------------------

### Run YAML Scripts with Midscene CLI

Source: https://github.com/python51888/midscene-python/blob/master/docs/quickstart.md

Execute Midscene automation scripts from the command line. The CLI supports single scripts, directories, configuration files, concurrent execution, and selecting an Android device.

```bash
# Run a single script
midscene run script.yaml

# Run all scripts in a directory
midscene run scripts/

# Use a configuration file
midscene run script.yaml --config midscene.yml

# Run concurrently
midscene run scripts/ --concurrent 3

# Target a specific Android device
midscene run android_script.yaml --device device_id
```

--------------------------------

### Verify ADB Device Connection

Source: https://github.com/python51888/midscene-python/blob/master/wiki/安装配置.md

Connect an Android device with USB debugging enabled, then run `adb devices` to verify the connection. The output should list the device ID.
```bash
adb devices
```

--------------------------------

### Locate UI Elements with ai_locate

Source: https://context7.com/python51888/midscene-python/llms.txt

Demonstrates the `ai_locate` method for finding UI elements from natural-language descriptions. It returns a `LocateResult` object and supports options such as deep analysis.

```python
import asyncio
from midscene import Agent
from midscene.web import SeleniumWebPage
from midscene.core.types import LocateOption

async def locate_elements():
    with SeleniumWebPage.create() as page:
        agent = Agent(page)
        await page.navigate_to("https://example.com")

        # Basic element location
        login_button = await agent.ai_locate("Login button in the header")
        print(f"Found element at: {login_button.rect}")

        # Location with deep analysis for complex UIs
        options = LocateOption(deep_think=True, cacheable=True)
        submit_btn = await agent.ai_locate("Submit form button", options=options)

        # Interact with located element
        if submit_btn.element:
            await submit_btn.element.tap()

asyncio.run(locate_elements())
```

--------------------------------

### Web Automation with Selenium

Source: https://github.com/python51888/midscene-python/blob/master/docs/quickstart.md

Automate web interactions with Selenium and Midscene's Agent: navigate to a page, enter user input through AI actions, extract data, and verify the result with assertions.
```python
import asyncio
from midscene import Agent
from midscene.web import SeleniumWebPage

async def web_automation():
    # Create a browser instance
    with SeleniumWebPage.create(headless=False) as page:
        agent = Agent(page)

        # Navigate to the website
        await page.navigate_to("https://example.com")

        # Operate with natural language
        await agent.ai_action("Click the login button")
        await agent.ai_action("Enter 'demo@example.com' in the username field")
        await agent.ai_action("Enter 'password123' in the password field")
        await agent.ai_action("Click the submit button")

        # Extract data
        user_info = await agent.ai_extract({
            "username": "username",
            "email": "email address"
        })
        print(f"User info: {user_info}")

        # Verify with an assertion
        await agent.ai_assert("The page shows a welcome message")

# Run the example
asyncio.run(web_automation())
```

--------------------------------

### AI Assert: User Login Verification

Source: https://github.com/python51888/midscene-python/blob/master/README.zh.md

Illustrates using the `ai_assert` method to verify page state. The agent interprets the natural-language assertion "The user has logged in successfully" to check whether the login succeeded.

```python
await agent.ai_assert("The user has logged in successfully")
```
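--------------------------------

The `MIDSCENE_AI_*` environment-variable configuration in this document can be mirrored in plain Python. The code below is a minimal sketch under stated assumptions and is not Midscene's own loader. `ModelSettings` is a hypothetical stand-in for `AIModelConfig` that reuses the field names and defaults from the class definition shown earlier. `settings_from_env` and `masked` are hypothetical helpers. The sketch assumes the variable names from the `.env` example and fails fast when a required one is missing. `masked` shows only a short prefix of the key, which makes it safe to log.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Variable names taken from the .env example; which ones are required is an assumption
REQUIRED = ("MIDSCENE_AI_PROVIDER", "MIDSCENE_AI_MODEL", "MIDSCENE_AI_API_KEY")


@dataclass
class ModelSettings:
    """Hypothetical stand-in for AIModelConfig (same fields and defaults)."""
    provider: str
    model: str
    api_key: str
    base_url: Optional[str] = None
    max_tokens: int = 4000
    temperature: float = 0.1
    timeout: int = 60


def settings_from_env(env: Mapping[str, str] = os.environ) -> ModelSettings:
    """Build settings from MIDSCENE_AI_* variables (hypothetical helper)."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise ValueError(f"missing environment variables: {', '.join(missing)}")
    return ModelSettings(
        provider=env["MIDSCENE_AI_PROVIDER"],
        model=env["MIDSCENE_AI_MODEL"],
        api_key=env["MIDSCENE_AI_API_KEY"],
        # An empty MIDSCENE_AI_BASE_URL is treated as "use the provider default"
        base_url=env.get("MIDSCENE_AI_BASE_URL") or None,
    )


def masked(key: str, visible: int = 4) -> str:
    """Return only the first `visible` characters of a secret, for logging."""
    return key[:visible] + "..." if len(key) > visible else "***"
```

By default the helper reads `os.environ`. You can also pass an explicit mapping, such as `settings_from_env({"MIDSCENE_AI_PROVIDER": "openai", ...})`, which makes it easy to test without touching the real environment.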
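--------------------------------

The per-task pattern from "Multiple AI Configurations for Different Tasks" can be extended with a fallback for tasks that have no dedicated entry. The code below is a minimal sketch that uses plain dataclasses instead of Midscene's classes. `TaskConfig` is a hypothetical frozen stand-in for `AIModelConfig` that keeps only the fields varied per task, and `config_for` is a hypothetical helper. The provider, model, temperature, and token values are copied from that example. `DEFAULT` and its values are illustrative assumptions.

```python
from dataclasses import dataclass, replace
from typing import Dict


@dataclass(frozen=True)
class TaskConfig:
    """Hypothetical stand-in for AIModelConfig with only the per-task fields."""
    provider: str
    model: str
    temperature: float
    max_tokens: int


# Illustrative default used for any task without its own entry (assumed values)
DEFAULT = TaskConfig("openai", "gpt-4-vision-preview", 0.1, 2000)

# Per-task settings copied from the "locate" / "extract" / "assert" example
TASK_CONFIGS: Dict[str, TaskConfig] = {
    # Same model as DEFAULT, but deterministic and shorter output for locating
    "locate": replace(DEFAULT, temperature=0.0, max_tokens=500),
    "extract": TaskConfig("claude", "claude-3-sonnet-20240229", 0.2, 2000),
    "assert": TaskConfig("qwen", "qwen-vl-max", 0.1, 1000),
}


def config_for(task: str) -> TaskConfig:
    """Return the task-specific config, or DEFAULT for unknown tasks."""
    return TASK_CONFIGS.get(task, DEFAULT)
```

Freezing the dataclass prevents one task from accidentally mutating a configuration that other tasks share. `dataclasses.replace` derives a variant without touching the original.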