### Install Midscene Python from Source

Source: https://github.com/python51888/midscene-python/blob/master/wiki/安装配置.md

Clone the repository and install the package in editable mode with its dependencies.

```bash
git clone https://gitee.com/Python51888/midscene-python.git
cd midscene-python
pip install -e .
```

--------------------------------

### Android Automation Example

Source: https://github.com/python51888/midscene-python/blob/master/wiki/快速开始.md

Automate Android applications using Midscene's Agent and AndroidDevice. This example connects to a device, starts an app, performs login actions, and asserts the login state.

```python
import asyncio
from midscene import Agent
from midscene.android import AndroidDevice

async def android_example():
    """Android app automation."""

    # Connect to the Android device
    device = AndroidDevice()
    await device.connect()

    # Create the Agent
    agent = Agent(device)

    # Launch the app
    await device.start_app("com.example.app")

    # Natural-language actions
    await agent.ai_action("Tap the login button")
    await agent.ai_action("Enter the username 'testuser'")
    await agent.ai_action("Enter the password 'password123'")
    await agent.ai_action("Tap confirm to log in")

    # Verify the login state
    await agent.ai_assert("The user is shown as logged in")

    print("✅ Android automation complete!")

# Run the example
asyncio.run(android_example())
```

--------------------------------

### Install and Configure Playwright

Source: https://github.com/python51888/midscene-python/blob/master/wiki/安装配置.md

Install Playwright and its browser binaries. You can install all browsers or specific ones like Chromium to save space.

```bash
pip install playwright
playwright install
```

```bash
playwright install chromium
```

--------------------------------

### Basic Installation Verification Test

Source: https://github.com/python51888/midscene-python/blob/master/wiki/安装配置.md

An asynchronous Python script to verify the successful import of Midscene modules and initialization of the AIModelService.

```python
# test_installation.py
import asyncio
from midscene import Agent
from midscene.core.ai_model import AIModelService

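async def test_installation():
    """Check that the installation succeeded."""

    # Import check: reaching this line means the imports above worked
    print("✓ Modules imported successfully")

    # Check the AI service configuration
    try:
        ai_service = AIModelService()
        print("✓ AI service initialized successfully")
    except Exception as e:
        print(f"✗ AI service initialization failed: {e}")

    print("🎉 Installation check complete!")

# Run the test
asyncio.run(test_installation())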
```

--------------------------------

### Install Browser Drivers

Source: https://github.com/python51888/midscene-python/blob/master/wiki/快速开始.md

Install the browser drivers needed for web automation using pip. Choose either Selenium WebDriver or Playwright.

```bash
# Selenium WebDriver
pip install webdriver-manager

# Or Playwright
pip install playwright
playwright install
```

--------------------------------

### Install ADB on Linux and macOS

Source: https://github.com/python51888/midscene-python/blob/master/wiki/安装配置.md

Commands to install the Android Debug Bridge (ADB) on Ubuntu/Debian and macOS systems.

```bash
sudo apt-get install android-tools-adb
```

```bash
brew install android-platform-tools
```

--------------------------------

### Configure Selenium with WebDriver Manager

Source: https://github.com/python51888/midscene-python/blob/master/wiki/安装配置.md

Install Selenium and WebDriver Manager, then use it to automatically manage and install the correct WebDriver for Chrome.

```bash
pip install selenium webdriver-manager
```

```python
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.chrome.service import Service

service = Service(ChromeDriverManager().install())
driver = webdriver.Chrome(service=service)
```

--------------------------------

### Developer Installation for Midscene Python

Source: https://github.com/python51888/midscene-python/blob/master/wiki/安装配置.md

Install the package in editable mode along with development and documentation dependencies, and set up pre-commit hooks.

```bash
git clone https://gitee.com/Python51888/midscene-python.git
cd midscene-python
pip install -e ".[dev,docs]"
pre-commit install
```

--------------------------------

### Create and Activate a Virtual Environment

Source: https://github.com/python51888/midscene-python/blob/master/wiki/安装配置.md

Steps to create a virtual environment using Python's venv module and activate it on Linux/macOS and Windows, before installing packages.

```bash
# Create a virtual environment (recommended)
python -m venv midscene-env
source midscene-env/bin/activate  # Linux/macOS
# or
midscene-env\Scripts\activate     # Windows

# Install inside the virtual environment
pip install midscene-python
```
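To confirm that the environment is actually active before installing, a short check can compare `sys.prefix` with `sys.base_prefix`. This is a minimal standard-library sketch; the function name `in_virtualenv` is illustrative and not part of Midscene.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-created environment."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    if in_virtualenv():
        print(f"Virtual environment active: {sys.prefix}")
    else:
        print("No virtual environment detected; activate midscene-env first.")
```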

--------------------------------

### AI Element Localization Examples

Source: https://github.com/python51888/midscene-python/blob/master/wiki/核心概念/Agent核心控制器.md

Demonstrates using `ai_locate` for precise element identification based on various descriptions, including basic names, visual attributes, and relative positioning.

```python
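# Basic location
login_btn = await agent.ai_locate("login button")
search_box = await agent.ai_locate("search input box")

# Descriptive location
submit_btn = await agent.ai_locate("the blue submit button")
user_avatar = await agent.ai_locate("the user avatar in the top-right corner of the page")

# Relative location
next_btn = await agent.ai_locate("the next-page button inside the pagination control")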
```

--------------------------------

### Install Midscene Python

Source: https://github.com/python51888/midscene-python/blob/master/README.zh.md

Install the midscene-python package using pip. Ensure you have Python 3.9+.

```bash
pip install midscene-python
```

--------------------------------

### Basic and Complex ai_action Examples

Source: https://github.com/python51888/midscene-python/blob/master/wiki/核心概念/Agent核心控制器.md

Showcases various use cases for the `ai_action` method, from simple clicks and input to complex conditional operations and multi-step actions.

```python
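# Basic interactions
await agent.ai_action("Click the login button")
await agent.ai_action("Type 'admin' in the username field")
await agent.ai_action("Select the second option in the dropdown menu")

# Multi-step operations
await agent.ai_action("Scroll to the bottom of the page and click the load-more button")
await agent.ai_action("Type 'Python' in the search box and press Enter to search")

# Conditional operations
await agent.ai_action("If the page shows an error message, click the OK button")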
```

--------------------------------

### Configure AI Model

Source: https://github.com/python51888/midscene-python/blob/master/wiki/快速开始.md

Create a .env file to configure your AI model API key and optional base URL. This example shows OpenAI configuration.

```bash
# .env
OPENAI_API_KEY=your_openai_api_key_here
OPENAI_BASE_URL=https://api.openai.com/v1  # Optional; defaults to the official API
```
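The source does not show how the `.env` file is read at runtime. As a hedged illustration, the sketch below parses simple `KEY=VALUE` lines with the standard library and copies them into `os.environ`. The helper names `parse_env_file` and `load_env_file` are hypothetical and not part of Midscene; it handles only plain lines with an optional trailing ` # comment`, not the full dotenv syntax.

```python
import os
from pathlib import Path
from typing import Dict


def parse_env_file(path: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and comments."""
    values: Dict[str, str] = {}
    for raw_line in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop a trailing inline comment such as "  # Optional".
        value = value.split(" #", 1)[0].strip()
        # Strip one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def load_env_file(path: str = ".env") -> None:
    """Copy parsed values into os.environ without overriding existing ones."""
    for key, value in parse_env_file(path).items():
        os.environ.setdefault(key, value)
```

Calling `load_env_file()` before creating an Agent would make the keys visible to any code that reads `os.environ`. Variables already set in the shell take precedence because of `setdefault`.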

--------------------------------

### Install Midscene Python via Pip

Source: https://github.com/python51888/midscene-python/blob/master/wiki/安装配置.md

Use pip to install the latest version or a specific version of the midscene-python package.

```bash
pip install midscene-python
```

```bash
pip install midscene-python==0.1.0
```

--------------------------------

### Install Optional Dependencies for Midscene Python

Source: https://github.com/python51888/midscene-python/blob/master/wiki/安装配置.md

Install optional dependency groups for Midscene Python, such as 'dev' for development tools, 'docs' for documentation tools, or both.

```bash
# Development tools
pip install "midscene-python[dev]"
```

```bash
# Documentation tools
pip install "midscene-python[docs]"
```

```bash
# All optional dependencies
pip install "midscene-python[dev,docs]"
```

--------------------------------

### Perform Login Automation with Agent

Source: https://context7.com/python51888/midscene-python/llms.txt

Demonstrates how to use the Agent to perform a login sequence on a web page using natural language commands. Requires SeleniumWebPage setup.

```python
import asyncio
from midscene import Agent
from midscene.web import SeleniumWebPage

async def login_automation():
    # Create web page with Selenium
    with SeleniumWebPage.create(headless=False, window_size=(1920, 1080)) as page:
        agent = Agent(page)

        # Navigate to website
        await page.navigate_to("https://example.com/login")

        # Perform login using natural language commands
        await agent.ai_action("Click the login button")
        await agent.ai_action("Enter 'user@example.com' in the email field")
        await agent.ai_action("Enter 'password123' in the password field")
        await agent.ai_action("Click the submit button")

        # Wait for login to complete
        await agent.ai_wait_for("Dashboard is visible", timeout_ms=10000)

        # Verify login was successful
        await agent.ai_assert("User is logged in and welcome message is displayed")

asyncio.run(login_automation())
```

--------------------------------

### Web Automation: Search Example

Source: https://github.com/python51888/midscene-python/blob/master/wiki/快速开始.md

Perform a web search using natural language commands with Midscene's Agent and SeleniumWebPage. Navigates to Baidu, inputs a search query, clicks search, and asserts results.

```python
import asyncio
from midscene import Agent
from midscene.web import SeleniumWebPage

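async def search_example():
    """Search Baidu for Python tutorials."""

    # Create the web page instance
    with SeleniumWebPage.create() as page:
        # Create the Agent
        agent = Agent(page)

        # Navigate to the site
        await page.goto("https://www.baidu.com")

        # Search using natural-language instructions
        await agent.ai_action("Type 'Python 教程' in the search box")
        await agent.ai_action("Click the search button")

        # Verify the search results
        await agent.ai_assert("The page shows search results for Python tutorials")

        print("✅ Search complete!")

# Run the example
asyncio.run(search_example())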
```

--------------------------------

### AI Assertion Examples

Source: https://github.com/python51888/midscene-python/blob/master/wiki/核心概念/Agent核心控制器.md

Provides examples of using `ai_assert` for verifying page states, content, and conditional logic, ensuring the application behaves as expected.

```python
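# State checks
await agent.ai_assert("The user has logged in successfully")
await agent.ai_assert("The page shows an error message")
await agent.ai_assert("The form passed validation")

# Content checks
await agent.ai_assert("The search results contain 'Python tutorial'")
await agent.ai_assert("The shopping cart contains 3 items")
await agent.ai_assert("The order status is shipped")

# Conditional checks
await agent.ai_assert("If the user is new, a welcome wizard is shown")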
```

--------------------------------

### Web Automation: Data Extraction Example

Source: https://github.com/python51888/midscene-python/blob/master/wiki/快速开始.md

Extract structured data from a news website using Midscene's Agent. This example extracts news titles, publication times, and summaries.

```python
import asyncio
from midscene import Agent
from midscene.web import SeleniumWebPage

async def extract_example():
    """Extract news headlines."""

    with SeleniumWebPage.create() as page:
        agent = Agent(page)

        # Open the news site
        await page.goto("https://news.example.com")

        # Extract structured data
        news_data = await agent.ai_extract({
            "articles": [
                {
                    "title": "News headline",
                    "time": "Publication time",
                    "summary": "News summary"
                }
            ]
        })

        # Print the results
        for article in news_data["articles"]:
            print(f"📰 {article['title']}")
            print(f"⏰ {article['time']}")
            print(f"📄 {article['summary']}\n")

# Run the example
asyncio.run(extract_example())
```

--------------------------------

### Check and Install Python Version

Source: https://github.com/python51888/midscene-python/blob/master/wiki/安装配置.md

Command to check the current Python version and instructions for installing Python 3.9 on different operating systems if needed.

```bash
# Check the Python version
python --version
```

```bash
# If the version is older than 3.9, install a newer one
# Ubuntu/Debian
sudo apt-get install python3.9

# macOS
brew install python@3.9

# Windows
# Download the installer from python.org
```
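The same check can be made from inside Python, which is handy at the top of a setup script. This is a minimal sketch; `MIN_VERSION` and `python_is_supported` are illustrative names, and the 3.9 threshold is taken from the installation notes.

```python
import sys

MIN_VERSION = (3, 9)  # minimum version stated in the installation notes


def python_is_supported(version_info=sys.version_info) -> bool:
    """Return True when the major.minor version is at least MIN_VERSION."""
    return tuple(version_info[:2]) >= MIN_VERSION


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    if python_is_supported():
        print(f"Python {current} meets the 3.9+ requirement")
    else:
        print(f"Python {current} is too old; install 3.9 or newer")
```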

--------------------------------

### Web Automation YAML Script Example

Source: https://github.com/python51888/midscene-python/blob/master/docs/quickstart.md

Define a web automation script in YAML format, specifying browser settings, tasks, steps with AI actions, data extraction, and assertions.

```yaml
# Web automation script
web:
  url: "https://example.com"
  browser: "chrome"
  headless: false

tasks:
  - name: "Login"
    steps:
      - action: "ai_action"
        prompt: "Click the login button"

      - action: "ai_action"
        prompt: "Enter the username 'demo@example.com'"

      - action: "ai_action"
        prompt: "Enter the password 'password123'"

      - action: "ai_action"
        prompt: "Click the submit button"

  - name: "Data extraction"
    steps:
      - action: "ai_extract"
        prompt:
          username: "User name"
          email: "Email address"
        save_to: "user_info"

  - name: "State verification"
    steps:
      - action: "ai_assert"
        prompt: "The page shows a welcome message"
```
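The source does not show how Midscene executes such a script. To make the structure concrete, the sketch below walks an already-parsed script (a plain dict, as a YAML loader would return) and dispatches each step to the agent method named in `action`. `RecordingAgent`, `run_script`, and `ALLOWED_ACTIONS` are hypothetical names used only for illustration; the stand-in agent records calls instead of driving a browser.

```python
import asyncio
from typing import Any, Dict, List, Tuple

# What a YAML loader would return for a script shaped like the one above.
SCRIPT: Dict[str, Any] = {
    "tasks": [
        {"name": "login", "steps": [
            {"action": "ai_action", "prompt": "Click the login button"},
            {"action": "ai_action", "prompt": "Click the submit button"},
        ]},
        {"name": "extract", "steps": [
            {"action": "ai_extract", "prompt": {"username": "User name"},
             "save_to": "user_info"},
        ]},
        {"name": "verify", "steps": [
            {"action": "ai_assert", "prompt": "The page shows a welcome message"},
        ]},
    ]
}

ALLOWED_ACTIONS = ("ai_action", "ai_extract", "ai_assert")


class RecordingAgent:
    """Hypothetical stand-in for an Agent: records calls, drives nothing."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, Any]] = []

    async def ai_action(self, prompt: str) -> None:
        self.calls.append(("ai_action", prompt))

    async def ai_extract(self, prompt: Any) -> Dict[str, str]:
        self.calls.append(("ai_extract", prompt))
        # Echo the requested keys so the caller has something to store.
        return {key: f"<{key}>" for key in prompt}

    async def ai_assert(self, prompt: str) -> None:
        self.calls.append(("ai_assert", prompt))


async def run_script(agent: Any, script: Dict[str, Any]) -> Dict[str, Any]:
    """Run every step in order; return values stored under `save_to` names."""
    saved: Dict[str, Any] = {}
    for task in script["tasks"]:
        for step in task["steps"]:
            action = step["action"]
            if action not in ALLOWED_ACTIONS:
                raise ValueError(f"Unsupported action: {action!r}")
            result = await getattr(agent, action)(step["prompt"])
            if "save_to" in step:
                saved[step["save_to"]] = result
    return saved


if __name__ == "__main__":
    demo_agent = RecordingAgent()
    print(asyncio.run(run_script(demo_agent, SCRIPT)))
    print(demo_agent.calls)
```

Restricting dispatch to an explicit allow-list keeps a malformed script from calling arbitrary attributes on the agent.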

--------------------------------

### Configure Pip with a Mirror Source

Source: https://github.com/python51888/midscene-python/blob/master/wiki/安装配置.md

Install packages using a specific mirror source (e.g., Tsinghua University) or configure pip to use a mirror source permanently for faster downloads.

```bash
# Use a mirror hosted in China
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple midscene-python

# Or configure the mirror permanently
pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple
```

--------------------------------

### Web Automation with Playwright

Source: https://github.com/python51888/midscene-python/blob/master/docs/quickstart.md

Perform web automation using Playwright and Midscene's Agent. This example demonstrates navigation, AI actions, and extracting page information.

```python
import asyncio
from midscene import Agent
from midscene.web import PlaywrightWebPage

async def playwright_automation():
    # Create the Playwright page
    async with await PlaywrightWebPage.create() as page:
        agent = Agent(page)

        await page.navigate_to("https://playwright.dev")
        await agent.ai_action("Click the documentation link")

        # Extract page information
        page_info = await agent.ai_extract({
            "title": "Page title",
            "sections": ["List of main sections"]
        })
        print(f"页面信息: {page_info}")

asyncio.run(playwright_automation())
```

--------------------------------

### Enable Detailed httpx Request Logging

Source: https://github.com/python51888/midscene-python/blob/master/wiki/核心概念/AI模型服务抽象层.md

Enable DEBUG level logging for the 'httpx' library to get detailed insights into HTTP requests and responses.

```python
import logging

# Enable verbose logging
logging.getLogger("httpx").setLevel(logging.DEBUG)
```
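Setting a logger's level does not attach a handler. Without one, Python's fallback handler only emits WARNING and above, so DEBUG records would stay invisible. The sketch below adds a root handler through `logging.basicConfig` and lowers the threshold for the `httpx` logger only. The format string is an arbitrary choice.

```python
import logging

# Attach a handler to the root logger; other libraries stay at WARNING.
logging.basicConfig(
    level=logging.WARNING,
    format="%(asctime)s %(name)s %(levelname)s %(message)s",
)

# Lower the threshold for the "httpx" logger only. Its records propagate
# to the root handler configured above.
httpx_logger = logging.getLogger("httpx")
httpx_logger.setLevel(logging.DEBUG)
```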

--------------------------------

### AI Action: Search with Python Tutorial

Source: https://github.com/python51888/midscene-python/blob/master/README.zh.md

Example of using the `ai_action` method to run a search on a web page with a natural-language instruction. The agent interprets the instruction to locate the search box, enter the text, and start the search.

```python
await agent.ai_action("Type 'Python tutorial' in the search box and search")
```

--------------------------------

### Data Extraction Examples

Source: https://github.com/python51888/midscene-python/blob/master/wiki/核心概念/Agent核心控制器.md

Illustrates how to use `ai_extract` to retrieve structured data from a page, including single objects, lists of objects, and complex nested data structures.

```python
# Extract a single object
user_info = await agent.ai_extract({
    "name": "User's name",
    "email": "Email address",
    "role": "User role"
})

# Extract a list
products = await agent.ai_extract({
    "products": [
        {
            "name": "Product name",
            "price": "Price",
            "rating": "Rating",
            "in_stock": "Whether it is in stock"
        }
    ]
})

# Nested structure
order_data = await agent.ai_extract({
    "order_id": "Order number",
    "customer": {
        "name": "Customer name",
        "address": "Delivery address"
    },
    "items": [
        {
            "product": "Product name",
            "quantity": "Quantity",
            "price": "Unit price"
        }
    ],
    "total": "Total amount"
})
```
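Because the result comes from a model, it can be worth checking that it has the expected nesting before indexing into it. The sketch below assumes that `ai_extract` returns plain dicts and lists that mirror the schema passed in. The helper `matches_shape` is hypothetical and not part of Midscene.

```python
from typing import Any


def matches_shape(schema: Any, result: Any) -> bool:
    """Check that result has the same nesting and keys as schema."""
    if isinstance(schema, dict):
        if not isinstance(result, dict):
            return False
        return all(
            key in result and matches_shape(sub_schema, result[key])
            for key, sub_schema in schema.items()
        )
    if isinstance(schema, list):
        if not isinstance(result, list):
            return False
        if not schema:
            return True
        # A one-element list in the schema describes every item of the result.
        return all(matches_shape(schema[0], item) for item in result)
    # Leaf descriptions are free text, so any value is accepted.
    return True
```

For example, `matches_shape({"items": [{"product": "Product name"}]}, {"items": "none"})` returns `False` because `items` is not a list.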

--------------------------------

### AI Extract: Product Information

Source: https://github.com/python51888/midscene-python/blob/master/README.zh.md

Shows how to use the ai_extract method to pull structured data from a web page. The example defines a desired JSON structure for product information, specifying fields like 'name', 'price', and 'rating', which the agent will attempt to extract.

```python
products = await agent.ai_extract({
    "products": [
        {"name": "Product name", "price": "Price", "rating": "Rating"}
    ]
})
```

--------------------------------

### Initialize Midscene Project

Source: https://github.com/python51888/midscene-python/blob/master/docs/quickstart.md

Initialize a new Midscene project directory using the CLI.

```bash
midscene init my-project
cd my-project
```

--------------------------------

### Enable Detailed Logging

Source: https://github.com/python51888/midscene-python/blob/master/wiki/快速开始.md

Enable debug-level logging for detailed insights into the automation process by calling setup_logger with logging.DEBUG.

```python
import logging
from midscene.shared.logger import setup_logger

# Enable debug logging
setup_logger(level=logging.DEBUG)
```

--------------------------------

### Initialize Agents for Web and Android

Source: https://github.com/python51888/midscene-python/blob/master/wiki/核心概念/Agent核心控制器.md

Demonstrates initializing Agent instances for different platforms using the same interface. Ensure the appropriate platform interface (e.g., selenium_page, android_device) is passed during initialization.

```python
# Web and Android use the same interface
web_agent = Agent(selenium_page)
android_agent = Agent(android_device)

# The same methods work on both
await web_agent.ai_action("Click the login button")
await android_agent.ai_action("Click the login button")
```

--------------------------------

### Agent Lifecycle Management

Source: https://github.com/python51888/midscene-python/blob/master/wiki/核心概念/Agent核心控制器.md

Illustrates two methods for managing the Agent's lifecycle: manual initialization and destruction using a try-finally block, and using the recommended asynchronous context manager.

```python
# Option 1: manual management
agent = Agent(page)
try:
    await agent.ai_action("Perform the operation")
finally:
    await agent.destroy()

# Option 2: context manager (recommended)
async with Agent(page) as agent:
    await agent.ai_action("Perform the operation")
    # destroy() is called automatically
```
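The two forms are equivalent because an asynchronous context manager runs its cleanup in `__aexit__`, even when the body raises. The sketch below shows that mechanism with a stand-in class. `ManagedResource` is hypothetical and only mimics an object with an async `destroy()` method; it is not Midscene's `Agent`.

```python
import asyncio


class ManagedResource:
    """Stand-in for an object with an async destroy() method."""

    def __init__(self) -> None:
        self.destroyed = False

    async def destroy(self) -> None:
        self.destroyed = True

    async def __aenter__(self) -> "ManagedResource":
        return self

    async def __aexit__(self, exc_type, exc, tb) -> bool:
        # Runs on normal exit and when the body raises.
        await self.destroy()
        return False  # do not suppress exceptions


async def demo() -> bool:
    resource = ManagedResource()
    try:
        async with resource:
            raise RuntimeError("simulated failure")
    except RuntimeError:
        pass
    # Cleanup happened even though the body raised.
    return resource.destroyed


if __name__ == "__main__":
    print(asyncio.run(demo()))
```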

--------------------------------

### Configure AI Provider via YAML File

Source: https://github.com/python51888/midscene-python/blob/master/docs/quickstart.md

Create a `midscene.yml` configuration file to set the AI provider, model, and API key.

```yaml
ai:
  provider: "openai"
  model: "gpt-4-vision-preview"
  api_key: "your-api-key-here"
```

--------------------------------

### AgentOptions Configuration

Source: https://github.com/python51888/midscene-python/blob/master/wiki/核心概念/Agent核心控制器.md

Demonstrates how to configure Agent behavior using the `AgentOptions` class, including settings for timeouts, retries, debugging, performance, and AI model parameters.

```python
from midscene.core import AgentOptions

options = AgentOptions(
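    # Timeout
    timeout=30,                    # Operation timeout (seconds)

    # Retry behavior
    retry_count=3,                 # Number of retries on failure
    retry_delay=1.0,               # Delay between retries (seconds)

    # Debugging
    screenshot_on_error=True,      # Take a screenshot automatically on error
    save_execution_logs=True,      # Save execution logs

    # Performance
    cache_enabled=True,            # Enable smart caching
    parallel_execution=False,      # Parallel execution (experimental)

    # AI model settings
    model_temperature=0.1,         # Randomness of AI responses
    max_tokens=1000,               # Maximum number of tokens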
)

agent = Agent(page, options=options)
```
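The source does not describe how `retry_count` and `retry_delay` are applied internally. One plausible reading is "one initial attempt plus up to `retry_count` retries, sleeping `retry_delay` seconds between attempts". The sketch below shows that reading with a hypothetical `retry_async` helper; it is an assumption, not Midscene's implementation.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def retry_async(
    operation: Callable[[], Awaitable[T]],
    retry_count: int = 3,
    retry_delay: float = 1.0,
) -> T:
    """Await operation(); on failure retry up to retry_count more times."""
    if retry_count < 0:
        raise ValueError("retry_count must be non-negative")
    total_attempts = retry_count + 1
    for attempt in range(1, total_attempts + 1):
        try:
            return await operation()
        except Exception:
            if attempt == total_attempts:
                raise  # out of retries: surface the last error
            await asyncio.sleep(retry_delay)
    raise AssertionError("unreachable")
```

A call such as `await retry_async(lambda: agent.ai_action("Click the login button"))` would wrap a single action, where `agent` is an existing Agent instance.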

--------------------------------

### Basic Web Automation with Agent

Source: https://github.com/python51888/midscene-python/blob/master/README.zh.md

Demonstrates basic web automation using the Agent with SeleniumWebPage. This involves creating a page instance, initializing an agent, and performing actions like clicking, inputting text, and extracting data using natural language commands. Requires async execution.

```python
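import asyncio
from midscene import Agent
from midscene.web import SeleniumWebPage

async def main():
    # Create the web page and Agent
    with SeleniumWebPage.create() as page:
        agent = Agent(page)

        # Automate with natural-language instructions
        await agent.ai_action("Click the login button")
        await agent.ai_action("Enter the username 'test@example.com'")
        await agent.ai_action("Enter the password 'password123'")
        await agent.ai_action("Click the submit button")

        # Data extraction
        user_info = await agent.ai_extract("Extract the user's personal information")

        # Assertion
        await agent.ai_assert("The page shows a welcome message")

# await is only valid inside a coroutine, so run it through asyncio
asyncio.run(main())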

--------------------------------

### Configure Agent with Options

Source: https://github.com/python51888/midscene-python/blob/master/docs/quickstart.md

Initialize the Midscene Agent with custom options, such as enabling caching with a `cache_id` and enabling report generation.

```python
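from midscene import Agent
from midscene.core import AgentOptions

options = AgentOptions(
    cache_id="my_automation",
    generate_report=True
)
# `page` is an existing page object, such as one from SeleniumWebPage.create()
agent = Agent(page, options=options)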
```

--------------------------------

### Configure AI Model API Keys using Environment Variables

Source: https://github.com/python51888/midscene-python/blob/master/wiki/安装配置.md

Set up API keys and optional base URLs for various AI providers (OpenAI, Anthropic, DashScope, Gemini) and configure default Midscene AI settings in a .env file.

```dotenv
# OpenAI configuration
OPENAI_API_KEY=sk-your-openai-api-key
OPENAI_BASE_URL=https://api.openai.com/v1  # Optional

# Anthropic configuration
ANTHROPIC_API_KEY=sk-ant-your-anthropic-key

# Qwen (通义千问) configuration
DASHSCOPE_API_KEY=sk-your-dashscope-key

# Gemini configuration
GOOGLE_API_KEY=AIza-your-google-api-key

# Default model configuration
MIDSCENE_AI_PROVIDER=openai
MIDSCENE_AI_MODEL=gpt-4-vision-preview
```

--------------------------------

### Qwen (通义千问) Model Configuration

Source: https://github.com/python51888/midscene-python/blob/master/wiki/核心概念/AI模型服务抽象层.md

Configure the Qwen provider with models like qwen-vl-max, using the DashScope API Key and setting the temperature.

```python
from midscene.core.ai_model import AIModelConfig

# Qwen configuration
qwen_config = AIModelConfig(
    provider="qwen",
    model="qwen-vl-max",
    api_key="sk-…",  # DashScope API Key
    temperature=0.1
)
```

--------------------------------

### Playwright Web Page Automation

Source: https://context7.com/python51888/midscene-python/llms.txt

Automate web browser interactions using Playwright with AI-driven actions and direct page manipulation. Requires Playwright to be installed.

```python
import asyncio
from midscene import Agent
from midscene.web import PlaywrightWebPage

async def playwright_automation():
    # Create Playwright page with async context manager
    async with await PlaywrightWebPage.create(
        headless=False,
        viewport_size=(1920, 1080)
    ) as page:
        agent = Agent(page)

        # Navigate to URL
        await page.navigate_to("https://playwright.dev")

        # AI-driven interactions
        await agent.ai_action("Click on the Get Started button")
        await agent.ai_action("Search for 'selectors' in the search box")

        # Extract documentation info
        docs = await agent.ai_extract({
            "current_section": "Current documentation section title",
            "code_examples": ["Code example snippets on the page"]
        })
        print(f"Documentation: {docs}")

        # Direct page interactions
        await page.scroll("down", distance=300)
        await page.evaluate_script("console.log('Test complete')")

asyncio.run(playwright_automation())
```

--------------------------------

### Agent Class Structure

Source: https://github.com/python51888/midscene-python/blob/master/wiki/核心概念/Agent核心控制器.md

Illustrates the core components of the Agent class, including its initialization with platform interfaces, options, AI services, and execution modules.

```python
class Agent:
    """Core Agent class that orchestrates AI model and device interactions"""
    
    def __init__(
        self,
        interface: AbstractInterface,
        options: Optional[AgentOptions] = None
    ):
        self.interface = interface              # Platform interface
        self.options = options or AgentOptions() # Configuration options
        self.ai_service = AIModelService()      # AI service
        self.insight = Insight(...)
        self.task_executor = TaskExecutor(...)

```

--------------------------------

### Negative Assertion

Source: https://github.com/python51888/midscene-python/blob/master/wiki/核心概念/Insight-UI理解引擎.md

Assert that a specific condition is NOT met, for example, 'the page does not display an error message'. This is useful for ensuring the absence of unwanted elements or states.

```python
# Negative assertion
result = await insight.assert_condition("The page does not show an error message")
```

--------------------------------

### Basic Web Automation with Midscene Python Agent

Source: https://github.com/python51888/midscene-python/blob/master/README.md

Demonstrates basic web automation tasks using the Midscene Agent with Selenium. This includes logging in, extracting data, and verifying assertions via natural language commands.

```python
import asyncio
from midscene import Agent
from midscene.web import SeleniumWebPage

async def main():
    # Create a Web Agent
    with SeleniumWebPage.create() as page:
        agent = Agent(page)

        # Perform automation operations using natural language
        await agent.ai_action("Click the login button")
        await agent.ai_action("Enter username 'test@example.com'")
        await agent.ai_action("Enter password 'password123'")
        await agent.ai_action("Click the submit button")

        # Data extraction
        user_info = await agent.ai_extract("Extract user personal information")

        # Assertion verification
        await agent.ai_assert("Page displays welcome message")

# await is only valid inside a coroutine, so run it through asyncio
asyncio.run(main())

--------------------------------

### List Android Devices with Midscene CLI

Source: https://github.com/python51888/midscene-python/blob/master/docs/quickstart.md

Use the Midscene CLI to list available Android devices connected to the system.

```bash
midscene devices
```

--------------------------------

### Configure AI Provider via Environment Variables

Source: https://github.com/python51888/midscene-python/blob/master/docs/quickstart.md

Set environment variables to configure the AI provider, model, and API key for Midscene.

```bash
export MIDSCENE_AI_PROVIDER=openai
export MIDSCENE_AI_MODEL=gpt-4-vision-preview
export MIDSCENE_AI_API_KEY=your-api-key-here
```
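Before starting a run, it can help to fail fast when one of these variables is missing. This is a minimal sketch; `REQUIRED_VARIABLES` and `missing_ai_variables` are illustrative names, and treating all three variables as required is an assumption based on the example above.

```python
import os
from typing import List, Mapping

# Variable names taken from the export commands above.
REQUIRED_VARIABLES = (
    "MIDSCENE_AI_PROVIDER",
    "MIDSCENE_AI_MODEL",
    "MIDSCENE_AI_API_KEY",
)


def missing_ai_variables(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variable names that are unset or empty."""
    return [name for name in REQUIRED_VARIABLES if not env.get(name)]


if __name__ == "__main__":
    missing = missing_ai_variables()
    if missing:
        raise SystemExit(f"Missing environment variables: {', '.join(missing)}")
    print("AI provider configuration found")
```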

--------------------------------

### Complex Condition Assertion

Source: https://github.com/python51888/midscene-python/blob/master/wiki/核心概念/Insight-UI理解引擎.md

Assert complex, conditional logic on the page, such as 'if it's a new user, display a welcome guide'. This allows for testing conditional UI behaviors.

```python
# Complex conditional assertion
result = await insight.assert_condition(
    "If the user is new, the page should show a welcome guide"
)
```

--------------------------------

### Android Device Low-Level Control

Source: https://context7.com/python51888/midscene-python/llms.txt

Provides direct ADB-based control for Android devices, including UI hierarchy parsing, input simulation, and app management. Requires ADB setup and a connected device.

```python
import asyncio
from midscene.android.device import AndroidDevice

async def device_control():
    # List available devices
    devices = await AndroidDevice.list_devices()
    print(f"Available devices: {devices}")

    # Connect to device
    device = await AndroidDevice.create(device_id=devices[0])

    # Get UI context with screenshot and element tree
    context = await device.get_context()
    print(f"Screen size: {context.size.width}x{context.size.height}")
    print(f"Found {len(context.content)} UI elements")

    # Direct input operations
    await device.tap(540, 960)
    await device.input_text("Hello Android")
    await device.clear_text()

    # Scrolling and gestures
    await device.scroll("down", distance=500)
    await device.swipe(100, 500, 900, 500, duration=200)  # Swipe right
    await device.long_press(540, 960, duration=2000)

    # Key events
    await device.key_event("KEYCODE_ENTER")
    await device.back()
    await device.home()

    # App management
    await device.install_app("/path/to/app.apk")
    await device.launch_app("com.example.app", activity=".MainActivity")
    await device.stop_app("com.example.app")

    # Disconnect
    await device.disconnect()

asyncio.run(device_control())
```

--------------------------------

### Optimize Context for Different Insight Actions

Source: https://github.com/python51888/midscene-python/blob/master/wiki/核心概念/Insight-UI理解引擎.md

Provide optimized UI context based on the Insight action being performed. For example, provide detailed element information for locate actions and full page content for extract actions.

```python
# Provide a context tuned to each kind of operation
async def optimized_context_provider(action: InsightAction) -> UIContext:
    base_context = await page.get_context()

    if action == InsightAction.LOCATE:
        # Locating needs more detailed element information
        base_context.elements = await page.get_all_elements()
    elif action == InsightAction.EXTRACT:
        # Extraction needs more complete page content
        base_context.page_content = await page.get_page_content()
    
    return base_context
```

--------------------------------

### AI Model Service Configuration

Source: https://context7.com/python51888/midscene-python/llms.txt

Configure AI providers like OpenAI, Anthropic, Qwen, and Gemini using environment variables or explicit configuration objects. Ensure necessary API keys and model names are set.

```python
import asyncio
import os
from midscene import Agent
from midscene.web import SeleniumWebPage
from midscene.core.ai_model import AIModelService, AIModelConfig
from midscene.core.types import AgentOptions

# Method 1: Environment variables (recommended)
os.environ["MIDSCENE_AI_PROVIDER"] = "openai"
os.environ["MIDSCENE_AI_MODEL"] = "gpt-4-vision-preview"
os.environ["MIDSCENE_AI_API_KEY"] = "sk-your-api-key"
```

--------------------------------

### AbstractInterface Definition in Python

Source: https://github.com/python51888/midscene-python/blob/master/wiki/核心概念/UI上下文与数据模型.md

Defines the abstract base class `AbstractInterface` for platform-specific implementations, outlining methods for getting UI context, available actions, tapping, inputting text, and scrolling. This serves as a contract for different platform integrations.

```python
from abc import ABC, abstractmethod
from typing import List, Optional

class AbstractInterface(ABC):
    """Abstract interface implemented by each platform."""

    @property
    @abstractmethod
    def interface_type(self) -> InterfaceType:
        """Return the interface type."""
        pass

    @abstractmethod
    async def get_context(self) -> UIContext:
        """Return the current UI context."""
        pass

    @abstractmethod
    async def action_space(self) -> List[str]:
        """Return the list of available actions."""
        pass

    @abstractmethod
    async def tap(self, x: float, y: float) -> None:
        """Tap at the given coordinates."""
        pass

    @abstractmethod
    async def input_text(self, text: str) -> None:
        """Type text."""
        pass

    @abstractmethod
    async def scroll(self, direction: str, distance: Optional[int] = None) -> None:
        """Scroll in the given direction."""
        pass
```
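To illustrate what satisfying such a contract looks like, the sketch below defines a small local ABC that mirrors three of the methods and a recording implementation that could serve as a test double. `MiniInterface` and `RecordingInterface` are hypothetical stand-ins and do not import or subclass the real `AbstractInterface`.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Any, List, Optional, Tuple


class MiniInterface(ABC):
    """Local stand-in mirroring part of the contract (not the real class)."""

    @abstractmethod
    async def tap(self, x: float, y: float) -> None:
        """Tap at the given coordinates."""

    @abstractmethod
    async def input_text(self, text: str) -> None:
        """Type text."""

    @abstractmethod
    async def scroll(self, direction: str, distance: Optional[int] = None) -> None:
        """Scroll in the given direction."""


class RecordingInterface(MiniInterface):
    """Implements every abstract method by recording the call."""

    def __init__(self) -> None:
        self.events: List[Tuple[Any, ...]] = []

    async def tap(self, x: float, y: float) -> None:
        self.events.append(("tap", x, y))

    async def input_text(self, text: str) -> None:
        self.events.append(("input_text", text))

    async def scroll(self, direction: str, distance: Optional[int] = None) -> None:
        self.events.append(("scroll", direction, distance))


async def demo() -> List[Tuple[Any, ...]]:
    ui = RecordingInterface()
    await ui.tap(10, 20)
    await ui.input_text("hello")
    await ui.scroll("down")
    return ui.events


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Instantiating `MiniInterface` directly raises `TypeError`, which is how the ABC enforces that a platform class implements every method.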

--------------------------------

### AIModelService Class Structure

Source: https://github.com/python51888/midscene-python/blob/master/wiki/核心概念/AI模型服务抽象层.md

The core service class initializes providers and offers a unified interface for AI calls.

```python
class AIModelService:
    """Unified AI model service interface"""
    
    def __init__(self):
        self.providers: Dict[str, AIProvider] = {}
        self._register_providers()
    
    async def call_ai(
        self,
        messages: List[Dict[str, Any]], 
        response_schema: Optional[Type[BaseModel]] = None,
        model_config: Optional[AIModelConfig] = None,
        **kwargs
    ) -> Dict[str, Any]:
        """Unified entry point for AI calls."""
```

--------------------------------

### Accessing Context Information in Python

Source: https://github.com/python51888/midscene-python/blob/master/wiki/核心概念/UI上下文与数据模型.md

Demonstrates how to access and process context information such as screenshots, viewport dimensions, and UI elements.

```python
import base64

# `context` is the UI context returned by get_context()
screenshot_data = base64.b64decode(context.screenshot_base64)  # raw image bytes
viewport_width = context.size.width
all_buttons = [elem for elem in context.content if elem.node_type == NodeType.BUTTON]
```

--------------------------------

### Basic Element Localization

Source: https://github.com/python51888/midscene-python/blob/master/wiki/核心概念/Insight-UI理解引擎.md

Locate elements using simple, direct natural language descriptions. This is the most straightforward way to find UI components.

```python
# 基础定位
login_btn = await insight.locate("登录按钮")
search_box = await insight.locate("搜索输入框")
```

--------------------------------

### Configure Connection Pooling with httpx

Source: https://github.com/python51888/midscene-python/blob/master/wiki/核心概念/AI模型服务抽象层.md

Configure httpx.AsyncClient with connection pooling limits and timeouts for efficient reuse of connections.

```python
import httpx


class OptimizedAIProvider(AIProvider):
    def __init__(self):
        # Configure the connection pool
        self.client = httpx.AsyncClient(
            limits=httpx.Limits(
                max_keepalive_connections=10,
                max_connections=20
            ),
            timeout=httpx.Timeout(60.0)
        )
    
    async def call(self, messages, config, **kwargs):
        # Reuse pooled connections across calls
        response = await self.client.post(...)
        return response.json()
```

--------------------------------

### Configure AI Model with Explicit Settings

Source: https://context7.com/python51888/midscene-python/llms.txt

Use this to set up a specific AI model with custom parameters such as provider, model name, API key, and generation settings. Keep the API key out of source control, for example by reading it from an environment variable instead of hard-coding it.

```python
config = AIModelConfig(
    provider="anthropic",  # openai, anthropic, qwen, gemini
    model="claude-3-opus-20240229",
    api_key="your-api-key",
    base_url=None,  # Optional custom endpoint
    max_tokens=4000,
    temperature=0.1,
    timeout=60
)

async def custom_ai_config():
    with SeleniumWebPage.create() as page:
        # Create agent with custom model config
        options = AgentOptions(
            model_config=lambda: config,
            generate_report=True,
            group_name="Custom AI Test"
        )
        agent = Agent(page, options=options)

        await page.navigate_to("https://example.com")
        await agent.ai_action("Click the main button")

asyncio.run(custom_ai_config())
```

--------------------------------

### Initialize Midscene Agent

Source: https://github.com/python51888/midscene-python/blob/master/docs/quickstart.md

Create an instance of the Midscene Agent, which acts as the core controller for automation tasks, coordinating AI models with device interactions.

```python
from midscene import Agent
from midscene.web import SeleniumWebPage

page = SeleniumWebPage.create()
agent = Agent(page)
```

--------------------------------

### OpenAI Model Configuration

Source: https://github.com/python51888/midscene-python/blob/master/wiki/核心概念/AI模型服务抽象层.md

Configure the OpenAI provider with specific model names like gpt-4-vision-preview or gpt-4o, and optionally set a custom base URL.

```python
# GPT-4V configuration
openai_config = AIModelConfig(
    provider="openai",
    model="gpt-4-vision-preview",  # or "gpt-4o"
    api_key="sk-…",
    base_url="https://api.openai.com/v1",  # optional
    temperature=0.1
)
```

--------------------------------

### Configure AI Model Options

Source: https://github.com/python51888/midscene-python/blob/master/wiki/快速开始.md

Customize AI model configurations such as provider, model name, temperature, and max tokens using AIModelConfig.

```python
from midscene.core.ai_model import AIModelConfig

# Custom AI configuration
config = AIModelConfig(
    provider="openai",  # or "claude", "qwen", "gemini"
    model="gpt-4-vision-preview",
    temperature=0.1,
    max_tokens=1000
)

agent = Agent(page, ai_config=config)
```

--------------------------------

### Static UIContext Initialization

Source: https://github.com/python51888/midscene-python/blob/master/wiki/核心概念/Insight-UI理解引擎.md

Initialize the UIContext with static information like a base64 encoded screenshot, page title, and URL. This provides a fixed snapshot of the UI for the Insight engine.

```python
# Static context
context = UIContext(
    screenshot_base64="...",
    page_title="登录页面",
    url="https://example.com/login"
)
insight = Insight(context)
```

--------------------------------

### Insight Class Initialization

Source: https://github.com/python51888/midscene-python/blob/master/wiki/核心概念/Insight-UI理解引擎.md

Initializes the Insight engine with a context provider, an optional AI service, and model configuration. The context provider is essential for fetching UI information.

```python
class Insight:
    """AI-powered UI understanding and reasoning engine"""
    
    def __init__(
        self,
        context_provider: Union[UIContext, Callable],
        ai_service: Optional[AIModelService] = None,
        model_config: Optional[AIModelConfig] = None
    ):
        self.context_provider = context_provider  # context provider
        self.ai_service = ai_service              # AI model service
        self.model_config = model_config          # model configuration
        self._dump_subscribers = []               # debug subscribers
```
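
Because `context_provider` is typed as `Union[UIContext, Callable]`, the engine has to cope with a fixed context as well as a function that produces one. The sketch below shows one way such a value could be resolved at call time. `resolve_context` is a hypothetical helper written for illustration, not part of Midscene, and plain dictionaries stand in for `UIContext`.

```python
import asyncio
import inspect
from typing import Any


async def resolve_context(provider: Any, *args: Any) -> Any:
    """Return a context from a static value, a sync callable, or an async callable."""
    if not callable(provider):
        # A static context is returned unchanged.
        return provider
    result = provider(*args)
    if inspect.isawaitable(result):
        # Async providers return an awaitable that still has to be awaited.
        result = await result
    return result


async def demo() -> list:
    static_ctx = {"page_title": "static"}

    def sync_provider(action: str) -> dict:
        return {"page_title": f"sync:{action}"}

    async def async_provider(action: str) -> dict:
        return {"page_title": f"async:{action}"}

    return [
        await resolve_context(static_ctx),
        await resolve_context(sync_provider, "locate"),
        await resolve_context(async_provider, "extract"),
    ]


results = asyncio.run(demo())
print(results)
```

Checking `inspect.isawaitable` on the return value, rather than inspecting the function itself, also covers callables such as bound methods or `functools.partial` objects that wrap a coroutine function.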

--------------------------------

### Multiple AI Configurations for Different Tasks

Source: https://github.com/python51888/midscene-python/blob/master/wiki/核心概念/AI模型服务抽象层.md

Manage and use different AI model configurations for distinct tasks, adjusting parameters like temperature and max tokens as needed.

```python
# Configure a different model for each task
configs = {
    "locate": AIModelConfig(
        provider="openai",
        model="gpt-4-vision-preview",
        temperature=0.0,  # locating needs deterministic output
        max_tokens=500
    ),
    "extract": AIModelConfig(
        provider="claude", 
        model="claude-3-sonnet-20240229",
        temperature=0.2,  # extraction tolerates some creativity
        max_tokens=2000
    ),
    "assert": AIModelConfig(
        provider="qwen",
        model="qwen-vl-max",
        temperature=0.1,
        max_tokens=1000
    )
}

# Pick the configuration that matches the task
result = await ai_service.call_ai(
    messages=messages,
    model_config=configs["locate"]
)
```
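
When the task name comes from elsewhere in a program, a plain dictionary lookup raises `KeyError` for unknown tasks. The following sketch shows a lookup with a fallback. `TaskModelSettings` is a hypothetical stand-in that carries the same four fields as the configurations above, and the choice of `"locate"` as the default task is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class TaskModelSettings:
    """Hypothetical stand-in with the fields used in the per-task configs."""
    provider: str
    model: str
    temperature: float
    max_tokens: int


DEFAULT_TASK = "locate"  # assumed default, chosen for illustration

TASK_SETTINGS: Dict[str, TaskModelSettings] = {
    "locate": TaskModelSettings("openai", "gpt-4-vision-preview", 0.0, 500),
    "extract": TaskModelSettings("claude", "claude-3-sonnet-20240229", 0.2, 2000),
    "assert": TaskModelSettings("qwen", "qwen-vl-max", 0.1, 1000),
}


def settings_for(task: str) -> TaskModelSettings:
    """Return the settings for a task, falling back to the default task."""
    return TASK_SETTINGS.get(task, TASK_SETTINGS[DEFAULT_TASK])


print(settings_for("extract"))
print(settings_for("describe"))  # unknown task, falls back to "locate"
```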

--------------------------------

### Configure AI Models in Python Code

Source: https://github.com/python51888/midscene-python/blob/master/wiki/安装配置.md

Define multiple AI provider configurations using AIModelConfig objects within a Python dictionary, specifying provider, model, API key, and temperature.

```python
from midscene.core.ai_model import AIModelConfig

# Configurations for multiple AI providers
configs = {
    "openai": AIModelConfig(
        provider="openai",
        model="gpt-4-vision-preview",
        api_key="your-openai-key",
        temperature=0.1
    ),
    "claude": AIModelConfig(
        provider="anthropic", 
        model="claude-3-sonnet-20240229",
        api_key="your-claude-key",
        temperature=0.1
    )
}
```

--------------------------------

### Cross-Platform UIContext Usage

Source: https://github.com/python51888/midscene-python/blob/master/wiki/核心概念/UI上下文与数据模型.md

Demonstrates using the same UIContext data model across different platforms like Web and Android, ensuring consistent interaction patterns.

```python
# Web and Android share the same data model
web_context: UIContext = await web_page.get_context()
android_context: UIContext = await android_device.get_context()

# Both contexts are accessed in exactly the same way
print(web_context.screenshot_base64)
print(android_context.screenshot_base64)
```

--------------------------------

### Basic Interaction Actions

Source: https://github.com/python51888/midscene-python/blob/master/wiki/快速开始.md

Perform basic UI interactions like clicking, inputting text, and scrolling using natural language commands with the Agent.

```python
# Click actions
await agent.ai_action("点击提交按钮")
await agent.ai_action("点击页面右上角的用户头像")

# Text input actions
await agent.ai_action("在用户名框输入 'admin'")
await agent.ai_action("在密码框输入密码")

# Scroll actions
await agent.ai_action("向下滚动查看更多内容")
await agent.ai_action("滚动到页面底部")

# Wait actions
await agent.ai_action("等待页面加载完成")
```

--------------------------------

### Code-based AI Configuration

Source: https://github.com/python51888/midscene-python/blob/master/wiki/核心概念/AI模型服务抽象层.md

Configure AI models directly in code using the AIModelConfig class and then use it with the AIModelService.

```python
from midscene.core.ai_model import AIModelConfig, AIModelService

# Create the configuration
config = AIModelConfig(
    provider="openai",
    model="gpt-4-vision-preview", 
    api_key="your_api_key",
    temperature=0.1,
    max_tokens=2000
)

# Create the service instance
ai_service = AIModelService()

# Call the service with the configuration
result = await ai_service.call_ai(
    messages=messages,
    model_config=config
)
```

--------------------------------

### Google Gemini Model Configuration

Source: https://github.com/python51888/midscene-python/blob/master/wiki/核心概念/AI模型服务抽象层.md

Configure the Gemini provider with models such as gemini-1.5-pro-vision, providing the API key and temperature.

```python
# Gemini configuration
gemini_config = AIModelConfig(
    provider="gemini",
    model="gemini-1.5-pro-vision",
    api_key="AIza…",
    temperature=0.2
)
```

--------------------------------

### Environment Variable Configuration

Source: https://github.com/python51888/midscene-python/blob/master/wiki/核心概念/AI模型服务抽象层.md

Configure AI service settings using environment variables, such as provider, model, and API key.

```bash
# .env file
MIDSCENE_AI_PROVIDER=openai
MIDSCENE_AI_MODEL=gpt-4-vision-preview
MIDSCENE_AI_API_KEY=your_api_key_here
MIDSCENE_AI_BASE_URL=https://api.openai.com/v1
```
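
A short Python sketch of reading these variables and failing early when one is missing. The variable names come from the `.env` example above; `load_ai_settings` is a hypothetical helper rather than a Midscene function, and treating the first three variables as required and `MIDSCENE_AI_BASE_URL` as optional is an assumption.

```python
import os
from typing import Dict, Mapping, Optional

# Assumption: these three must be set, the base URL may be omitted.
REQUIRED = ("MIDSCENE_AI_PROVIDER", "MIDSCENE_AI_MODEL", "MIDSCENE_AI_API_KEY")
OPTIONAL = ("MIDSCENE_AI_BASE_URL",)


def load_ai_settings(
    env: Optional[Mapping[str, str]] = None,
) -> Dict[str, Optional[str]]:
    """Collect the MIDSCENE_AI_* variables, raising if a required one is missing."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    settings: Dict[str, Optional[str]] = {name: env[name] for name in REQUIRED}
    for name in OPTIONAL:
        settings[name] = env.get(name)
    return settings


# A mapping is passed explicitly here so the example does not depend on the shell.
example_env = {
    "MIDSCENE_AI_PROVIDER": "openai",
    "MIDSCENE_AI_MODEL": "gpt-4-vision-preview",
    "MIDSCENE_AI_API_KEY": "your_api_key_here",
}
settings = load_ai_settings(example_env)
print(settings["MIDSCENE_AI_PROVIDER"], settings["MIDSCENE_AI_BASE_URL"])
```

Calling `load_ai_settings()` without arguments reads from the process environment instead.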

--------------------------------

### Utilize Insight Engine for UI Operations

Source: https://context7.com/python51888/midscene-python/llms.txt

Demonstrates how to use the Insight class for locating elements, extracting data, and asserting conditions on a web page. Requires an initialized AIModelService and a SeleniumWebPage context.

```python
import asyncio
from midscene.core.insight import Insight
from midscene.core.ai_model import AIModelService
from midscene.web import SeleniumWebPage

async def insight_usage():
    with SeleniumWebPage.create() as page:
        await page.navigate_to("https://example.com")

        # Create Insight with context provider
        ai_service = AIModelService()
        insight = Insight(
            context_provider=page.get_context,
            ai_service=ai_service
        )

        # Add debugging subscriber
        def debug_handler(data):
            print(f"Operation: {data['type']}, Success: {'error' not in data}")
        insight.add_dump_subscriber(debug_handler)

        # Locate element
        result = await insight.locate("Submit button")
        if result.element:
            print(f"Found at: {result.rect}")

        # Extract data
        extracted = await insight.extract("All form field labels")
        print(f"Extracted: {extracted['data']}")

        # Assert condition
        assert_result = await insight.assert_condition("Page has loaded completely")
        print(f"Assertion passed: {assert_result.passed}")
        print(f"Reasoning: {assert_result.thought}")

        # Describe element at coordinates
        description = await insight.describe((500, 300))
        print(f"Element description: {description}")

asyncio.run(insight_usage())
```

--------------------------------

### Execute Search Actions with ai_action

Source: https://context7.com/python51888/midscene-python/llms.txt

Shows how to use the `ai_action` method to perform a series of search-related actions on a web page, such as entering text and clicking. The AI automatically handles element location.

```python
import asyncio
from midscene import Agent
from midscene.web import SeleniumWebPage

async def search_example():
    with SeleniumWebPage.create() as page:
        agent = Agent(page)
        await page.navigate_to("https://www.google.com")

        # AI understands context and locates elements automatically
        await agent.ai_action("Enter 'Python automation' in the search box")
        await agent.ai_action("Click the search button")
        await agent.ai_action("Scroll down to see more results")
        await agent.ai_action("Click on the first search result")

asyncio.run(search_example())
```

--------------------------------

### Configure Report Generation

Source: https://github.com/python51888/midscene-python/blob/master/docs/quickstart.md

Enable report generation for Midscene automation by setting `generate_report=True` and specifying a report file name in `AgentOptions`.

```python
# Generate an execution report
options = AgentOptions(
    generate_report=True,
    report_file_name="automation_report"
)
```

--------------------------------

### Dynamic UIContext Provider

Source: https://github.com/python51888/midscene-python/blob/master/wiki/核心概念/Insight-UI理解引擎.md

Implement a dynamic context provider function that fetches UI context information based on the type of Insight action being performed. This allows for adaptive context gathering.

```python
# Dynamic context
async def get_context(action: InsightAction) -> UIContext:
    # Fetch different context information depending on the action type
    if action == InsightAction.LOCATE:
        return await page.get_locate_context()
    elif action == InsightAction.EXTRACT:
        return await page.get_extract_context()
    else:
        return await page.get_default_context()

insight = Insight(get_context)
```

--------------------------------

### AI Locate: Find Login Button

Source: https://github.com/python51888/midscene-python/blob/master/README.zh.md

Demonstrates using the ai_locate method to find a UI element, specifically the '登录按钮' (login button), using natural language. The agent will employ intelligent strategies to identify and return the element.

```python
element = await agent.ai_locate("登录按钮")
```

--------------------------------

### AIModelConfig Configuration Class

Source: https://github.com/python51888/midscene-python/blob/master/wiki/核心概念/AI模型服务抽象层.md

Defines the configuration parameters for AI models, including provider, model name, API key, and other settings.

```python
class AIModelConfig(BaseModel):
    """AI model configuration"""
    provider: str                    # provider name
    model: str                      # model name
    api_key: str                    # API key
    base_url: Optional[str] = None  # custom API endpoint
    max_tokens: int = 4000          # maximum number of tokens
    temperature: float = 0.1        # sampling randomness
    timeout: int = 60               # request timeout
```

--------------------------------

### Troubleshoot AI Model API Key

Source: https://github.com/python51888/midscene-python/blob/master/wiki/快速开始.md

Verify your AI model API key configuration by printing a portion of the key retrieved from environment variables.

```python
# Check the API key configuration
import os
api_key = os.getenv("OPENAI_API_KEY")
if api_key:
    print(f"API Key: {api_key[:10]}...")
else:
    print("OPENAI_API_KEY is not set")
```

--------------------------------

### Run YAML Scripts with Midscene CLI

Source: https://github.com/python51888/midscene-python/blob/master/docs/quickstart.md

Execute Midscene automation scripts using the command-line interface. Supports single scripts, directories, configuration files, concurrent execution, and device specification.

```bash
# Run a single script
midscene run script.yaml

# Run every script in a directory
midscene run scripts/

# Use a configuration file
midscene run script.yaml --config midscene.yml

# Run scripts concurrently
midscene run scripts/ --concurrent 3

# Specify an Android device
midscene run android_script.yaml --device device_id
```
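
If the CLI is launched from a Python test harness, the argument list can be assembled programmatically. This is a minimal sketch that uses only the subcommand and flags shown above (`run`, `--config`, `--concurrent`, `--device`); `build_run_command` is a hypothetical helper, and the sketch only builds and prints the command rather than executing it.

```python
import shlex
from typing import List, Optional


def build_run_command(
    target: str,
    config: Optional[str] = None,
    concurrent: Optional[int] = None,
    device: Optional[str] = None,
) -> List[str]:
    """Build an argv list for `midscene run` from the options shown above."""
    argv = ["midscene", "run", target]
    if config is not None:
        argv += ["--config", config]
    if concurrent is not None:
        argv += ["--concurrent", str(concurrent)]
    if device is not None:
        argv += ["--device", device]
    return argv


command = build_run_command("scripts/", config="midscene.yml", concurrent=3)
print(shlex.join(command))
```

The resulting list can be handed to `subprocess.run(command, check=True)` once the `midscene` executable is on `PATH`.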

--------------------------------

### Verify ADB Device Connection

Source: https://github.com/python51888/midscene-python/blob/master/wiki/安装配置.md

Connect an Android device with USB debugging enabled and use 'adb devices' to verify the connection. The output should list the device ID.

```bash
adb devices
```
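
The same check can be scripted. The sketch below parses the usual `adb devices` output (a `List of devices attached` header followed by one `serial<TAB>state` line per device) into a dictionary. Both helper functions and the sample serial numbers are made up for illustration, and `list_connected_devices` assumes `adb` is available on `PATH`.

```python
import subprocess
from typing import Dict


def parse_adb_devices(output: str) -> Dict[str, str]:
    """Map each device serial to its state from `adb devices` output."""
    devices: Dict[str, str] = {}
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines, the header, and adb daemon status messages.
        if not line or line.startswith("List of devices") or line.startswith("*"):
            continue
        parts = line.split()
        if len(parts) >= 2:
            devices[parts[0]] = parts[1]
    return devices


def list_connected_devices() -> Dict[str, str]:
    """Run `adb devices` (requires adb on PATH) and parse the result."""
    completed = subprocess.run(
        ["adb", "devices"], capture_output=True, text=True, check=True
    )
    return parse_adb_devices(completed.stdout)


# Sample output with made-up serial numbers.
sample = "List of devices attached\nemulator-5554\tdevice\nR58M123ABC\tunauthorized\n\n"
print(parse_adb_devices(sample))
```

A state of `device` means the connection is usable; `unauthorized` usually means the USB debugging prompt on the phone has not been accepted yet.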

--------------------------------

### Locate UI Elements with ai_locate

Source: https://context7.com/python51888/midscene-python/llms.txt

Demonstrates the `ai_locate` method for finding UI elements using natural language descriptions. It returns a `LocateResult` object and supports options for deep analysis.

```python
import asyncio
from midscene import Agent
from midscene.web import SeleniumWebPage
from midscene.core.types import LocateOption

async def locate_elements():
    with SeleniumWebPage.create() as page:
        agent = Agent(page)
        await page.navigate_to("https://example.com")

        # Basic element location
        login_button = await agent.ai_locate("Login button in the header")
        print(f"Found element at: {login_button.rect}")

        # Location with deep analysis for complex UIs
        options = LocateOption(deep_think=True, cacheable=True)
        submit_btn = await agent.ai_locate("Submit form button", options=options)

        # Interact with located element
        if submit_btn.element:
            await submit_btn.element.tap()

asyncio.run(locate_elements())
```

--------------------------------

### Web Automation with Selenium

Source: https://github.com/python51888/midscene-python/blob/master/docs/quickstart.md

Automate web interactions using Selenium and Midscene's Agent. The example navigates to a page, performs AI actions for user input, extracts data, and runs an assertion.

```python
import asyncio
from midscene import Agent
from midscene.web import SeleniumWebPage

async def web_automation():
    # Create a browser instance
    with SeleniumWebPage.create(headless=False) as page:
        agent = Agent(page)
        
        # Navigate to the site
        await page.navigate_to("https://example.com")
        
        # Drive the page with natural-language instructions
        await agent.ai_action("点击登录按钮")
        await agent.ai_action("在用户名框输入 'demo@example.com'")
        await agent.ai_action("在密码框输入 'password123'")
        await agent.ai_action("点击提交按钮")
        
        # Extract data
        user_info = await agent.ai_extract({
            "username": "用户名",
            "email": "邮箱地址"
        })
        print(f"用户信息: {user_info}")
        
        # Verify with an assertion
        await agent.ai_assert("页面显示欢迎信息")

# Run the example
asyncio.run(web_automation())
```

--------------------------------

### AI Assert: User Login Verification

Source: https://github.com/python51888/midscene-python/blob/master/README.zh.md

Illustrates the use of the ai_assert method for verifying page state. The agent will interpret the natural language assertion '用户已成功登录' (user has successfully logged in) to check if the login operation was successful.

```python
await agent.ai_assert("用户已成功登录")
```