### Manually Install PyTorch Framework

Source: https://github.com/cann/pyasc/blob/master/docs/quick_start.md

Manually install the PyTorch framework by downloading the appropriate wheel file and using pip. This example is for Python 3.9 on x86_64 architecture with PyTorch 2.7.1.

```bash
# Download package
wget https://download.pytorch.org/whl/cpu/torch-2.7.1%2Bcpu-cp39-cp39-manylinux_2_28_x86_64.whl
# Install command
pip3 install torch-2.7.1+cpu-cp39-cp39-manylinux_2_28_x86_64.whl
```
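
The wheel name above encodes the package version, the CPython tag, and the platform tag. As a rough sketch, the same name and download URL can be assembled in Python. The helpers `wheel_filename` and `download_url` are hypothetical, and the sketch assumes that other versions follow the same naming pattern, which is not guaranteed for every release.

```python
from urllib.parse import quote


def wheel_filename(package, version, py_major, py_minor, arch, local_tag=""):
    """Build a wheel file name following the pattern of the example above."""
    py_tag = f"cp{py_major}{py_minor}"
    local = f"+{local_tag}" if local_tag else ""
    return f"{package}-{version}{local}-{py_tag}-{py_tag}-manylinux_2_28_{arch}.whl"


def download_url(filename, base="https://download.pytorch.org/whl/cpu/"):
    """Percent-encode the file name ('+' becomes '%2B') and append it to the index URL."""
    return base + quote(filename)


name = wheel_filename("torch", "2.7.1", 3, 9, "x86_64", local_tag="cpu")
print(name)
print(download_url(name))
```

For Python 3.9 on x86_64 this reproduces the file name and URL used in the commands above.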

--------------------------------

### Install pyasc from Source

Source: https://context7.com/cann/pyasc/llms.txt

Steps to install pyasc from source, including downloading LLVM and setting up the environment.

```bash
# Download LLVM
wget https://cann-ai.obs.cn-north-4.myhuaweicloud.com/llvm/LLVM-19.1.7-aarch64.tar.xz
tar -xJf LLVM-19.1.7-aarch64.tar.xz
export LLVM_INSTALL_PREFIX=$PWD/LLVM-19.1.7-aarch64

# Clone and install
git clone https://gitcode.com/cann/pyasc.git
cd pyasc
python3 -m pip install -r requirements-build.txt
python3 -m pip install -r requirements-runtime.txt
python3 -m pip install .
```

--------------------------------

### Install PyTorch and torch_npu Plugin

Source: https://github.com/cann/pyasc/blob/master/docs/quick_start.md

Install the PyTorch framework and the torch_npu plugin, which are required for running operator verification with PyTorch input and output tensors. The `torch_npu_install.sh` script performs a one-click installation.

```bash
cd pyasc
bash torch_npu_install.sh
```

--------------------------------

### Run Add Framework Example

Source: https://github.com/cann/pyasc/blob/master/python/tutorials/02_add_framework/README.md

Execute the Add framework example script. Replace '[RUN_MODE]' with 'Model' or 'NPU', and '[SOC_VERSION]' with your Ascend AI processor model (e.g., Ascend910C).

```bash
cd pyasc/python/tutorials/02_add_framework
python3 add_framework.py -r [RUN_MODE] -v [SOC_VERSION]
```

For example, in `Model` mode, with `Ascendxxxyy` replaced by your actual processor model:

```bash
python3 add_framework.py -r Model -v Ascendxxxyy
```

--------------------------------

### Matmul Operation Setup and Execution

Source: https://github.com/cann/pyasc/blob/master/docs/python-api/language/generated/asc.language.adv.get_normal_config.md

Demonstrates how to obtain a `MatmulConfig` with `get_normal_config` and use it to set up and execute a matrix multiplication.

```python
# Assume a_type, b_type, c_type, bias_type, pipe, workspace, tiling, and the
# global tensors gm_a, gm_b, gm_bias, and gm_c are defined elsewhere

# Get the matrix multiplication configuration
mm_cfg = asc.adv.get_normal_config()

# Initialize the Matmul operation with the configuration
mm = asc.adv.Matmul(a_type, b_type, c_type, bias_type, mm_cfg)

# Register the matrix multiplication operation
asc.adv.register_matmul(pipe, workspace, mm, tiling)

# Set the input tensors for matrix multiplication
mm.set_tensor_a(gm_a)
mm.set_tensor_b(gm_b)
mm.set_bias(gm_bias)

# Execute the matrix multiplication operation
mm.iterate_all(gm_c)
```

--------------------------------

### Install lit for ASC-IR Unit Tests

Source: https://github.com/cann/pyasc/blob/master/docs/quick_start.md

Install the lit tool, which is required for running ASC-IR definition module unit tests. Refer to the LLVM documentation for installation details.

```bash
pip install lit
```

--------------------------------

### Manually Install torch_npu Plugin

Source: https://github.com/cann/pyasc/blob/master/docs/quick_start.md

Manually install the torch_npu plugin by downloading the appropriate wheel file and using pip. This example is for Python 3.9 on x86_64, with PyTorch 2.7.1 and torch_npu 7.3.0. Skip this step if you only run in emulator mode.

```bash
# Download plugin package
wget https://gitcode.com/Ascend/pytorch/releases/download/v7.3.0-pytorch2.7.1/torch_npu-2.7.1.post2-cp39-cp39-manylinux_2_28_x86_64.whl
# Install command
pip3 install torch_npu-2.7.1.post2-cp39-cp39-manylinux_2_28_x86_64.whl
```

--------------------------------

### Run Add Operator Example

Source: https://github.com/cann/pyasc/blob/master/docs/quick_start.md

Execute the Add operator example to verify its functionality. Navigate to the pyasc directory and run the specified Python script.

```bash
cd pyasc
python3 ./python/tutorials/01_add/add.py
```

--------------------------------

### Run Add Operator Example with Parameters

Source: https://github.com/cann/pyasc/blob/master/docs/quick_start.md

Run the Add operator example with explicit RUN_MODE and SOC_VERSION parameters. RUN_MODE defaults to emulator mode, in which SOC_VERSION defaults to Ascend910B1. In NPU on-board mode, the SOC version is detected automatically.

```bash
python3 ./python/tutorials/01_add/add.py -r [RUN_MODE] -v [SOC_VERSION]
```
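
The defaults described above can be sketched with `argparse`. This is only an illustration of the documented behavior, not the parser used by `add.py`: the long option names `--run-mode` and `--soc-version` are hypothetical, and `None` stands in for "auto-detect" in NPU mode.

```python
import argparse


def parse_args(argv=None):
    """Parse -r/-v the way the description above suggests (hypothetical sketch)."""
    parser = argparse.ArgumentParser(description="Hypothetical tutorial runner options")
    parser.add_argument("-r", "--run-mode", choices=["Model", "NPU"], default="Model")
    parser.add_argument("-v", "--soc-version", default=None)
    args = parser.parse_args(argv)
    # In Model (emulator) mode the SOC version falls back to Ascend910B1.
    # In NPU mode it stays None, standing in for automatic detection.
    if args.run_mode == "Model" and args.soc_version is None:
        args.soc_version = "Ascend910B1"
    return args


print(parse_args([]))
print(parse_args(["-r", "NPU"]))
```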

--------------------------------

### Install pytest for Unit Testing

Source: https://github.com/cann/pyasc/blob/master/docs/quick_start.md

Install the pytest framework for running unit tests. This is a prerequisite for executing Python module UT tests.

```bash
pip install pytest
```

--------------------------------

### Build and Install pyasc from Source (Normal Mode)

Source: https://github.com/cann/pyasc/blob/master/docs/quick_start.md

Build and install pyasc from source code in normal mode. This installs the project into the Python environment's site-packages directory, suitable for production environments.

```bash
python3 -m pip install .
```

--------------------------------

### Install pyasc via pip

Source: https://context7.com/cann/pyasc/llms.txt

Use pip to quickly install the pyasc library.

```bash
pip install pyasc
```

--------------------------------

### List Syntax Example

Source: https://github.com/cann/pyasc/blob/master/docs/python_syntax_support.md

Illustrates the creation and return of a list.

```python
@asc.jit
def func_visit_list(a, b, c):
    nums = [a, b, c]
    return nums
```

--------------------------------

### Python Call Example for asc.adv.log

Source: https://github.com/cann/pyasc/blob/master/docs/python-api/language/generated/asc.language.adv.log.md

This is a Python example demonstrating how to call the asc.adv.log function. Ensure that 'dst' and 'src' are properly defined LocalTensor objects.

```python
asc.adv.log(dst, src)
```

--------------------------------

### Python Example for Setting AIPP Functions

Source: https://github.com/cann/pyasc/blob/master/docs/python-api/language/generated/asc.language.basic.set_aipp_functions.md

This example demonstrates how to configure AIPP parameters in Python, including swap and channel padding settings, before calling `set_aipp_functions`.

```python
swap_settings = asc.AippSwapParams(is_swap_rb=True)
cpad_settings = asc.AippChannelPaddingParams(c_padding_mode=0, c_padding_value=-1)

aipp_config_int8 = asc.AippParams(
    dtype=asc.int8,
    swap_params=swap_settings,
    c_padding_params=cpad_settings
)

asc.set_aipp_functions(rgb_gm, asc.AippInputFormat.RGB888_U8, aipp_config_int8)
```

--------------------------------

### Tuple Syntax Example

Source: https://github.com/cann/pyasc/blob/master/docs/python_syntax_support.md

Shows how to create and return a tuple.

```python
@asc.jit
def func_visit_tuple(x, y, z):
    return x, y, z
```

--------------------------------

### Verify LLVM Installation

Source: https://github.com/cann/pyasc/blob/master/docs/quick_start.md

Verify the LLVM installation by checking its version using the llvm-config command. Ensure the path to llvm-config is correctly set via LLVM_INSTALL_PREFIX.

```bash
${LLVM_INSTALL_PREFIX}/bin/llvm-config --version
```

--------------------------------

### Python Calling Example for Ascend C Atan

Source: https://github.com/cann/pyasc/blob/master/docs/python-api/language/generated/asc.language.adv.atan.md

Example demonstrating how to call the asc.adv.Atan function from Python, including buffer initialization and tensor allocation. Ensure buffer_size is obtained from Host-side tiling parameters.

```python
pipe = asc.TPipe()
tmp_que = asc.TQue(asc.TPosition.VECCALC, 1)
pipe.init_buffer(que=tmp_que, num=1, len=buffer_size)   # buffer_size is obtained from the host-side tiling parameters
shared_tmp_buffer = tmp_que.alloc_tensor(asc.uint8)
# The input tensor has 1024 elements of type half; 512 elements are actually computed
asc.adv.Atan(dst, src, count=512, temp_buffer=shared_tmp_buffer)
```

--------------------------------

### Python TPipe.init_buf_pool Usage Example

Source: https://github.com/cann/pyasc/blob/master/docs/python-api/language/generated/asc.language.fwk.TPipe.init_buf_pool.md

Example demonstrating how to set global buffers and initialize buffer pools using TPipe.init_buf_pool. The second call shows reusing a previously initialized buffer pool.

```python
src0_global.set_global_buffer(src0_gm)
src1_global.set_global_buffer(src1_gm)
dst_global.set_global_buffer(dst_gm)
pipe.init_buf_pool(tbuf_pool1, 196608)
pipe.init_buf_pool(tbuf_pool2, 196608, tbuf_pool1)

```

--------------------------------

### Execute matmul_leakyrelu Sample

Source: https://github.com/cann/pyasc/blob/master/python/tutorials/05_matmul_leakyrelu/README.md

Run the matmul_leakyrelu sample with the specified run mode and SOC version. Make sure the environment is configured first; see quick_start.md for setup.

```bash
cd pyasc/python/tutorials/05_matmul_leakyrelu
python3 matmul_leakyrelu.py -r [RUN_MODE] -v [SOC_VERSION]
```

--------------------------------

### Constant Syntax Example

Source: https://github.com/cann/pyasc/blob/master/docs/python_syntax_support.md

Illustrates the use of integer and boolean constants.

```python
@asc.jit
def func_visit_constant():
    a = 1
    b = True
    return a, b
```

--------------------------------

### Binary Operator Syntax Example

Source: https://github.com/cann/pyasc/blob/master/docs/python_syntax_support.md

Demonstrates the use of binary operators, including addition and multiplication.

```python
@asc.jit
def func_visit_binop(num1, num2, num3):
    result = num1 + num2 * num3
    return result
```

--------------------------------

### Formatted String and Assert Statement Example

Source: https://github.com/cann/pyasc/blob/master/docs/python_syntax_support.md

Shows an assert statement with a formatted string message, which will always fail in this example.

```python
@asc.jit
def func_visit_joined_and_formatted_and_assert(num):
    assert 1 < 0, f"assert failed {num}"
```

--------------------------------

### Build and Install pyasc from Source (Developer Mode)

Source: https://github.com/cann/pyasc/blob/master/docs/quick_start.md

Build and install pyasc from source code in developer mode. This creates symbolic links, allowing local changes to take effect immediately without reinstallation, ideal for development.

```bash
python3 -m pip install -e .
```

--------------------------------

### For Loop Syntax Example

Source: https://github.com/cann/pyasc/blob/master/docs/python_syntax_support.md

Shows how to use a for loop with the range function for summation.

```python
@asc.jit
def func_visit_for(num, total):
    for i in range(num):
        total += i
    return total
```

--------------------------------

### Python MatmulApiTiling.set_fix_split Example

Source: https://github.com/cann/pyasc/blob/master/docs/python-api/lib/generated/asc.lib.host.MatmulApiTiling.set_fix_split.md

Example demonstrating how to use set_fix_split to configure tiling parameters for matrix multiplication. Ensure that the base_m and base_n values adhere to the specified constraints to avoid tiling failures.

```python
import asc.lib.host as host
ascendc_platform = host.get_ascendc_platform()
tiling = host.MatmulApiTiling(ascendc_platform)
tiling.set_a_type(host.TPosition.GM, host.CubeFormat.ND, host.DataType.DT_FLOAT16)
tiling.set_b_type(host.TPosition.GM, host.CubeFormat.ND, host.DataType.DT_FLOAT16)
tiling.set_c_type(host.TPosition.GM, host.CubeFormat.ND, host.DataType.DT_FLOAT)
tiling.set_bias_type(host.TPosition.GM, host.CubeFormat.ND, host.DataType.DT_FLOAT)
tiling.set_shape(1024, 1024, 1024)
tiling.set_org_shape(1024, 1024, 1024)
tiling.set_bias(True)
tiling.set_fix_split(16, 16, -1)  # Set fixed base_m, base_n
tiling.set_buffer_space(-1, -1, -1)
tiling_data = host.TCubeTiling()
ret = tiling.get_tiling(tiling_data)
```

--------------------------------

### Attribute Access Syntax Example

Source: https://github.com/cann/pyasc/blob/master/docs/python_syntax_support.md

Demonstrates attribute access, specifically calling the __len__ method on a list.

```python
@asc.jit
def func_visit_attribute():
    nums = [1, 2, 3]
    return nums.__len__()
```

--------------------------------

### While Loop Syntax Example

Source: https://github.com/cann/pyasc/blob/master/docs/python_syntax_support.md

Illustrates the usage of a while loop for iterative summation.

```python
@asc.jit
def func_visit_while(i, n, ans):
    while i < n:
        ans += i
        i += 1
    return ans
```

--------------------------------

### Expression Statement and Function Call Example

Source: https://github.com/cann/pyasc/blob/master/docs/python_syntax_support.md

Demonstrates calling a function as an expression statement.

```python
@asc.jit
def func():
    pass

@asc.jit
def func_visit_expr():
    func()  # expression statement / function call
```

--------------------------------

### Subscript Access Syntax Example

Source: https://github.com/cann/pyasc/blob/master/docs/python_syntax_support.md

Illustrates accessing elements of a list using index notation.

```python
@asc.jit
def func_visit_subscript(a, b):
    nums = [a, b]
    return nums[0] + nums[1]
```

--------------------------------

### Slice Expression Syntax Example

Source: https://github.com/cann/pyasc/blob/master/docs/python_syntax_support.md

Shows how to use slicing to extract a sublist from a list.

```python
@asc.jit
def func_visit_slice():
    nums = [1, 2, 3, 4, 5]
    ans = nums[2:]
    return len(ans)
```

--------------------------------

### Pass Statement Syntax Example

Source: https://github.com/cann/pyasc/blob/master/docs/python_syntax_support.md

Demonstrates the 'pass' statement, which acts as a null operation.

```python
@asc.jit
def func_visit_pass():
    pass
```

--------------------------------

### Format a file with clang-format

Source: https://github.com/cann/pyasc/blob/master/docs/codestyle.md

Use clang-format to automatically format a file according to the project's coding style. Ensure clang-format is installed and configured.

```bash
clang-format -i <filename>
```

--------------------------------

### Comparison Expression Syntax Example

Source: https://github.com/cann/pyasc/blob/master/docs/python_syntax_support.md

Shows various comparison operators (>, <, >=, <=, ==) used in conditional logic.

```python
@asc.jit
def func_visit_compare(a, b, c, ans):
    if a > b:
        ans += 2
    if a + b < c:
        ans += 1
    if ans >= 10:
        ans += 3
    if ans <= 5:
        ans += 1
    if ans == c:
        ans += 5
    return ans
```

--------------------------------

### Python Example for Ascend C mrg_sort and get_mrg_sort_result

Source: https://github.com/cann/pyasc/blob/master/docs/python-api/language/generated/asc.language.basic.get_mrg_sort_result.md

This Python code demonstrates the setup of local tensors and the usage of asc.mrg_sort with different configurations, including the 'is_exhausted_suspension' and 'MrgSort4Info' parameters. It concludes by calling asc.get_mrg_sort_result to retrieve the processed counts.

```python
src1 = asc.LocalTensor(dtype=asc.float16, pos=asc.TPosition.VECIN, addr=0, tile_size=512)
src2 = asc.LocalTensor(dtype=asc.float16, pos=asc.TPosition.VECIN, addr=512, tile_size=512)
src3 = asc.LocalTensor(dtype=asc.float16, pos=asc.TPosition.VECIN, addr=1024, tile_size=512)
src4 = asc.LocalTensor(dtype=asc.float16, pos=asc.TPosition.VECIN, addr=1536, tile_size=512)
dst = asc.LocalTensor(dtype=asc.float16, pos=asc.TPosition.VECOUT, addr=0, tile_size=2048)
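# Group the four sorted source tensors into the list passed to mrg_sort
sort_list = asc.MrgSortSrcList(src1, src2, src3, src4)
element_count_list = [128, 128, 128, 128]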
sorted_num = [0, 0, 0, 0]
asc.mrg_sort(dst, sort_list, element_count_list, sorted_num, valid_bit=15, repeat_time=1)
asc.mrg_sort(dst, sort_list, element_count_list, sorted_num, valid_bit=15,
            repeat_time=1, is_exhausted_suspension=True)
mrg_sort4_info = asc.MrgSort4Info(element_count_list, if_exhausted_suspension=False,
                                  valid_bit=7, repeat_times=1)
asc.mrg_sort(dst, sort_list, mrg_sort4_info)

mrg1, mrg2, mrg3, mrg4 = asc.get_mrg_sort_result()
```
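
To make the roles of `is_exhausted_suspension` and the counts returned by `get_mrg_sort_result` more concrete, here is a plain-Python reference model of a four-way merge. It is a sketch under assumptions rather than the hardware behavior: it assumes queues sorted in descending order, assumes that exhausted suspension stops the merge as soon as any queue runs empty, and uses a hypothetical helper name.

```python
def merge_sorted_desc(queues, stop_when_exhausted=False):
    """Merge descending-sorted queues; return (merged, consumed count per queue)."""
    positions = [0] * len(queues)
    merged = []
    while True:
        live = [i for i, q in enumerate(queues) if positions[i] < len(q)]
        if not live:
            break
        # Assumed "exhausted suspension" behavior: stop once any queue is empty.
        if stop_when_exhausted and len(live) < len(queues):
            break
        best = max(live, key=lambda i: queues[i][positions[i]])
        merged.append(queues[best][positions[best]])
        positions[best] += 1
    return merged, positions


queues = [[9, 5, 1], [8, 4], [7, 3], [6, 2]]
print(merge_sorted_desc(queues))
print(merge_sorted_desc(queues, stop_when_exhausted=True))
```

In this model the second tuple element plays the role of the four per-queue counts unpacked from `asc.get_mrg_sort_result()`.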

--------------------------------

### Main Execution and Configuration

Source: https://context7.com/cann/pyasc/llms.txt

Sets up the Ascend platform configuration, creates input tensors, launches the vector addition, and verifies the result. This function serves as the entry point for the example.

```python
def main():
    # Configure platform (use Model for simulator, NPU for hardware)
    config.set_platform(config.Backend.Model, config.Platform.Ascend910B1)
    device = "cpu"  # "npu" for actual hardware

    size = 8 * 2048
    x = torch.rand(size, dtype=torch.float32, device=device)
    y = torch.rand(size, dtype=torch.float32, device=device)

    z = vadd_launch(x, y)

    assert torch.allclose(z, x + y)
    logging.info("Vector add completed successfully!")


if __name__ == "__main__":
    main()
```
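
The final check relies on `torch.allclose`, which accepts a result when `|actual - expected| <= atol + rtol * |expected|` holds element-wise, with defaults `rtol=1e-05` and `atol=1e-08`. A minimal pure-Python sketch of that criterion for flat sequences follows; unlike the PyTorch function it does no broadcasting and simply rejects inputs of different lengths.

```python
def allclose(actual, expected, rtol=1e-5, atol=1e-8):
    """Element-wise closeness test mirroring the torch.allclose tolerance formula."""
    if len(actual) != len(expected):
        return False
    return all(abs(a - e) <= atol + rtol * abs(e) for a, e in zip(actual, expected))


print(allclose([1.0, 2.0], [1.0, 2.0 + 1e-9]))
print(allclose([1.0], [1.1]))
```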

--------------------------------

### Python Function Docstring Example

Source: https://github.com/cann/pyasc/blob/master/docs/API_docstring_generation_tool_guide.md

Example of a Python function docstring following Google or NumPy style, including C++ API prototypes, parameter descriptions, and usage examples with different mask modes and count parameters.

```python
def add(dst: LocalTensor, src0: LocalTensor, src1: LocalTensor, *args, **kwargs) -> None:
    """
    Element-wise addition.

    **Corresponding Ascend C function prototypes**

    .. code-block:: c++

        template <typename T>
        __aicore__ inline void Add(const LocalTensor<T>& dst, const LocalTensor<T>& src0,
                                    const LocalTensor<T>& src1, const int32_t& count);

    .. code-block:: c++

        template <typename T, bool isSetMask = true>
        __aicore__ inline void Add(const LocalTensor<T>& dst, const LocalTensor<T>& src0,
                                    const LocalTensor<T>& src1, uint64_t mask[], const uint8_t repeatTimes,
                                    const BinaryRepeatParams& repeatParams);
                                
    .. code-block:: c++

        template <typename T, bool isSetMask = true>
        __aicore__ inline void Add(const LocalTensor<T>& dst, const LocalTensor<T>& src0,
                                    const LocalTensor<T>& src1, uint64_t mask, const uint8_t repeatTimes,
                                    const BinaryRepeatParams& repeatParams);


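    **Parameters**

    - is_set_mask: whether the mask mode and mask value are set inside the interface.
    - dst: destination operand. Type LocalTensor; supported TPosition values are VECIN/VECCALC/VECOUT.
    - src0, src1: source operands. Type LocalTensor; supported TPosition values are VECIN/VECCALC/VECOUT.
    - count: number of elements that take part in the computation.
    - mask: controls which elements take part in the computation within each iteration.
    - repeat_times: number of repeat iterations.
    - params: parameters that control the address strides of the operands.

    **Return value** (omit if there is none)
    ...

    **Usage examples**

    - Tensor high-dimensional split computation example, continuous mask mode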

      .. code-block:: python

          mask = 128
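          # repeat_times = 4: each iteration processes 128 elements, 512 elements in total
          # dst_blk_stride, src0_blk_stride, src1_blk_stride = 1: data is read and written contiguously within one iteration
          # dst_rep_stride, src0_rep_stride, src1_rep_stride = 8: data is read and written contiguously across adjacent iterations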
          params = asc.BinaryRepeatParams(1, 1, 1, 8, 8, 8)
          asc.add(dst, src0, src1, mask=mask, repeat_times=4, repeat_params=params)

    - Tensor high-dimensional split computation example, bitwise mask mode

      .. code-block:: python

          mask = [uint64_max, uint64_max]
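          # repeat_times = 4: each iteration processes 128 elements, 512 elements in total
          # dst_blk_stride, src0_blk_stride, src1_blk_stride = 1: data is read and written contiguously within one iteration
          # dst_rep_stride, src0_rep_stride, src1_rep_stride = 8: data is read and written contiguously across adjacent iterations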
          params = asc.BinaryRepeatParams(1, 1, 1, 8, 8, 8)
          asc.add(dst, src0, src1, mask=mask, repeat_times=4, repeat_params=params)

    - Example computing the first n elements of a tensor

      .. code-block:: python

          asc.add(dst, src0, src1, count=512)

    """
    builder = global_builder.get_ir_builder()
    op_impl("add", dst, src0, src1, args, kwargs, builder.create_asc_AddL0Op, builder.create_asc_AddL1Op,
            builder.create_asc_AddL2Op)

```

--------------------------------

### Execute matmul_cube_only Sample

Source: https://github.com/cann/pyasc/blob/master/python/tutorials/05_matmul_leakyrelu/README.md

Example of executing the matmul_cube_only sample with 'Model' run mode and a placeholder for the SOC version. Replace Ascendxxxyy with your actual AI processor model.

```bash
python3 matmul_cube_only.py -r Model -v Ascendxxxyy
```

--------------------------------

### Define and Launch a JIT Compiled Kernel

Source: https://context7.com/cann/pyasc/llms.txt

Demonstrates defining a JIT compiled kernel for vector addition and launching it with specified core count and stream. Ensure Ascend NPU platform is configured.

```python
import asc
import torch
import asc.runtime.config as config
import asc.lib.runtime as rt

# Basic JIT function definition
@asc.jit
def vector_add_kernel(x: asc.GlobalAddress, y: asc.GlobalAddress, z: asc.GlobalAddress, length: int):
    offset = asc.get_block_idx() * length
    x_gm = asc.GlobalTensor()
    y_gm = asc.GlobalTensor()
    z_gm = asc.GlobalTensor()
    x_gm.set_global_buffer(x + offset, length)
    y_gm.set_global_buffer(y + offset, length)
    z_gm.set_global_buffer(z + offset, length)

    x_local = asc.LocalTensor(x.dtype, asc.TPosition.VECIN, 0, length)
    y_local = asc.LocalTensor(y.dtype, asc.TPosition.VECIN, length * x.dtype.sizeof(), length)
    z_local = asc.LocalTensor(z.dtype, asc.TPosition.VECOUT, 2 * length * x.dtype.sizeof(), length)

    asc.data_copy(x_local, x_gm, length)
    asc.data_copy(y_local, y_gm, length)
    asc.set_flag(asc.HardEvent.MTE2_V, 0)
    asc.wait_flag(asc.HardEvent.MTE2_V, 0)

    asc.add(z_local, x_local, y_local, length)

    asc.set_flag(asc.HardEvent.V_MTE3, 0)
    asc.wait_flag(asc.HardEvent.V_MTE3, 0)
    asc.data_copy(z_gm, z_local, length)

# JIT with options
@asc.jit(always_compile=True)
def my_kernel_with_options(x: asc.GlobalAddress, y: asc.GlobalAddress):
    pass

# Launch kernel with core count and stream
def launch_add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    config.set_platform(config.Backend.NPU)
    z = torch.zeros_like(x)
    core_num = 8
    block_length = z.numel() // core_num
    vector_add_kernel[core_num, rt.current_stream()](x, y, z, block_length)
    return z
```
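
The launch code splits the tensor evenly across cores, and the kernel places three local buffers back to back. The arithmetic can be sketched in plain Python as below. `plan_blocks` is a hypothetical helper, and the sketch assumes global offsets are counted in elements and local addresses in bytes, as the kernel code above suggests.

```python
def plan_blocks(total_elements, core_num, element_size):
    """Return per-core block length, leftover elements, global offsets, and local byte addresses."""
    block_length = total_elements // core_num
    # Elements that no core would process if the size is not divisible by core_num.
    leftover = total_elements - block_length * core_num
    global_offsets = [core * block_length for core in range(core_num)]
    local_addrs = {
        "x_local": 0,
        "y_local": block_length * element_size,
        "z_local": 2 * block_length * element_size,
    }
    return block_length, leftover, global_offsets, local_addrs


block_length, leftover, offsets, addrs = plan_blocks(8 * 2048, 8, 4)
print(block_length, leftover, offsets[:3], addrs)
```

A non-zero `leftover` indicates that integer division dropped a tail; with the kernel as written, that tail would need separate handling such as padding or an extra launch.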

--------------------------------

### Configure ASC Runtime Backend and Platform

Source: https://context7.com/cann/pyasc/llms.txt

Sets the execution backend (simulator or NPU) and specifies the target platform, including SoC version and device ID for multi-NPU systems. Also demonstrates checking runtime availability and getting the current stream and platform.

```python
import asc.runtime.config as config
import asc.lib.runtime as rt

# Set execution backend
# Backend.Model - Use simulator (no NPU hardware required)
# Backend.NPU - Use actual NPU hardware
config.set_platform(config.Backend.Model)          # Simulator mode
config.set_platform(config.Backend.NPU)            # NPU mode

# Specify target platform (SoC version)
config.set_platform(
    config.Backend.Model,
    soc_version=config.Platform.Ascend910B1
)

# Available platforms
platforms = [
    config.Platform.Ascend910B1,
    config.Platform.Ascend910B2,
    config.Platform.Ascend910B3,
    config.Platform.Ascend910B4,
]

# Set device ID for multi-NPU systems
config.set_platform(
    config.Backend.NPU,
    device_id=0
)

# Check runtime availability
is_available = rt.is_available()

# Get current stream for kernel launch
stream = rt.current_stream()

# Get current platform
current = rt.current_platform()
```

--------------------------------

### Install Python Dependencies for pyasc

Source: https://github.com/cann/pyasc/blob/master/docs/quick_start.md

Install Python dependencies required for building and running pyasc. Use requirements-build.txt for build-time dependencies and requirements-runtime.txt for run-time dependencies.

```bash
python3 -m pip install -r requirements-build.txt # build-time dependencies
python3 -m pip install -r requirements-runtime.txt # run-time dependencies
```

--------------------------------

### PairReduceSum Python Example (Bitwise Mask)

Source: https://github.com/cann/pyasc/blob/master/docs/python-api/language/generated/asc.language.basic.pair_reduce_sum.md

Example demonstrating the use of PairReduceSum with a bitwise mask in Python. This allows for fine-grained control over which specific pairs of elements are summed.

```python
x_local = asc.LocalTensor(dtype=asc.float16, pos=asc.TPosition.VECIN, addr=0, tile_size=512)
z_local = asc.LocalTensor(dtype=asc.float16, pos=asc.TPosition.VECOUT, addr=0, tile_size=512)
uint64_max = 2**64 - 1
mask = [uint64_max, uint64_max]
asc.pair_reduce_sum(z_local, x_local, repeat_time=2, mask=mask,
                    dst_rep_stride=1, src_blk_stride=1, src_rep_stride=8)
```

--------------------------------

### PairReduceSum Python Example (Continuous Mask)

Source: https://github.com/cann/pyasc/blob/master/docs/python-api/language/generated/asc.language.basic.pair_reduce_sum.md

Example demonstrating the use of PairReduceSum with a continuous mask in Python. This is suitable for scenarios where a fixed number of initial elements should be summed.

```python
x_local = asc.LocalTensor(dtype=asc.float16, pos=asc.TPosition.VECIN, addr=0, tile_size=512)
z_local = asc.LocalTensor(dtype=asc.float16, pos=asc.TPosition.VECOUT, addr=0, tile_size=512)
asc.pair_reduce_sum(z_local, x_local, repeat_time=2, mask=128,
                    dst_rep_stride=1, src_blk_stride=1, src_rep_stride=8)
```
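
As a mental model for the result, a pure-Python reference can help when checking outputs on the host. The sketch below assumes that PairReduceSum adds each adjacent pair of elements, and it ignores strides and float16 rounding; `pair_reduce_sum_reference` is a hypothetical helper, not part of pyasc.

```python
def pair_reduce_sum_reference(src, count):
    """Sum adjacent pairs among the first `count` elements of `src`."""
    if count % 2 != 0:
        raise ValueError("count must be even so that every element has a partner")
    return [src[i] + src[i + 1] for i in range(0, count, 2)]


print(pair_reduce_sum_reference([1, 2, 3, 4, 5, 6], 6))
# mask=128 with repeat_time=2 covers 256 input elements, giving 128 sums.
print(len(pair_reduce_sum_reference(list(range(512)), 256)))
```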

--------------------------------

### Python mrg_sort4 Function Call Example

Source: https://github.com/cann/pyasc/blob/master/docs/python-api/language/generated/asc.language.basic.mrg_sort4.md

Example demonstrating how to call the asc.mrg_sort4 function. It shows the creation of MrgSortSrcList and MrgSort4Info objects, and the subsequent call to the sorting function.

```python
# vconcat_work_local holds 4 Region Proposal lists that are already created and sorted, each containing 16 Region Proposals
src_list = asc.MrgSortSrcList(vconcat_work_local[0], vconcat_work_local[1], vconcat_work_local[2], vconcat_work_local[3])
element_lengths = [16, 16, 16, 16]
src_info = asc.MrgSort4Info(element_lengths, False, 15, 1)
asc.mrg_sort4(dst_local, src_list, src_info)
```

--------------------------------

### Download and Extract LLVM Precompiled Binaries

Source: https://github.com/cann/pyasc/blob/master/docs/quick_start.md

Download and extract LLVM precompiled binaries. Choose the appropriate command based on your system architecture (e.g., ARM or x86). Set the LLVM_INSTALL_PREFIX environment variable to the extracted directory.

```bash
# Example: Download LLVM precompiled package for ARM architecture
wget https://cann-ai.obs.cn-north-4.myhuaweicloud.com/llvm/LLVM-19.1.7-aarch64.tar.xz
tar -xJf LLVM-19.1.7-aarch64.tar.xz
export LLVM_INSTALL_PREFIX=$PWD/LLVM-19.1.7-aarch64
# Example: Download LLVM precompiled package for X86 architecture
wget https://cann-ai.obs.cn-north-4.myhuaweicloud.com/llvm/llvm-19.1.7-x86_64.tar.xz
tar -xJf llvm-19.1.7-x86_64.tar.xz
export LLVM_INSTALL_PREFIX=$PWD/llvm-19.1.7-x86_64
```
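
Choosing between the two archives can be automated from the machine architecture. The sketch below uses only the URLs and directory names shown above; `llvm_package` and the alias table are hypothetical conveniences, and the function downloads nothing.

```python
import platform

BASE_URL = "https://cann-ai.obs.cn-north-4.myhuaweicloud.com/llvm/"
ARCHIVES = {
    "aarch64": "LLVM-19.1.7-aarch64.tar.xz",
    "x86_64": "llvm-19.1.7-x86_64.tar.xz",
}
# Common alternative spellings reported by platform.machine() on some systems.
ALIASES = {"arm64": "aarch64", "amd64": "x86_64"}


def llvm_package(machine=None):
    """Return (download URL, extracted directory name) for the given architecture."""
    machine = (machine or platform.machine()).lower()
    machine = ALIASES.get(machine, machine)
    if machine not in ARCHIVES:
        raise ValueError(f"no precompiled LLVM package listed for {machine!r}")
    archive = ARCHIVES[machine]
    return BASE_URL + archive, archive[: -len(".tar.xz")]


print(llvm_package("aarch64"))
print(llvm_package("x86_64"))
```

The second element of the tuple is the directory that `LLVM_INSTALL_PREFIX` should point to after extraction.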

--------------------------------

### Python LocalMemAllocator.alloc Usage Example

Source: https://github.com/cann/pyasc/blob/master/docs/python-api/language/generated/asc.language.core.LocalMemAllocator.alloc.md

Example of allocating LocalTensor objects using the Python API. The first allocation specifies a constant tile size, while the second uses a variable.

```python
allocator = asc.LocalMemAllocator()

# User-specified logical position VECIN, float type, 1024 elements in the tensor
tensor1 = allocator.alloc(asc.TPosition.VECIN, float, 1024)

# User-specified logical position VECIN, float type, tile_length elements in the tensor
tile_length = 512
tensor2 = allocator.alloc(asc.TPosition.VECIN, float, tile_length)

```

--------------------------------

### Python Example: Shift Left First N Elements

Source: https://github.com/cann/pyasc/blob/master/docs/python-api/language/generated/asc.language.basic.shift_left.md

Example demonstrating the `asc.shift_left` function in Python to perform a left shift operation on the first N elements of a tensor. This is useful for partial tensor operations.

```python
asc.shift_left(dst, src, scalar, count=512)
```

--------------------------------

### Python Example: Shift Left with Mask (Continuous Mode)

Source: https://github.com/cann/pyasc/blob/master/docs/python-api/language/generated/asc.language.basic.shift_left.md

Example demonstrating the `asc.shift_left` function in Python for tensor high-dimensional splitting with a continuous mask mode. It specifies repeat times and stride parameters for iterative computation.

```python
mask = 128
scalar = 2
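# repeat_times = 4: each iteration processes 128 elements, 512 elements in total
# dst_blk_stride, src_blk_stride = 1: data is read and written contiguously within one iteration
# dst_rep_stride, src_rep_stride = 8: data is read and written contiguously across adjacent iterations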
params = asc.UnaryRepeatParams(1, 1, 8, 8)
asc.shift_left(dst, src, scalar, mask=mask, repeat_times=4, repeat_params=params)
```
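
The relationship between `mask`, `repeat_times`, and the two strides can be checked with a small index calculation. The sketch below rests on assumptions that the snippet does not state: a data block of 32 bytes, 8 blocks per iteration, strides measured in blocks, and a 2-byte element type. `touched_elements` is a hypothetical helper.

```python
BLOCK_BYTES = 32  # assumed size of one data block


def touched_elements(mask, repeat_times, blk_stride, rep_stride, dtype_size):
    """List the element indices touched in continuous mask mode under the assumptions above."""
    per_block = BLOCK_BYTES // dtype_size
    indices = []
    for rep in range(repeat_times):
        for lane in range(mask):
            # lane -> (block within the iteration, position inside that block)
            block, within = divmod(lane, per_block)
            start_block = rep * rep_stride + block * blk_stride
            indices.append(start_block * per_block + within)
    return indices


indices = touched_elements(mask=128, repeat_times=4, blk_stride=1, rep_stride=8, dtype_size=2)
print(len(indices), indices[0], indices[-1])
```

Under these assumptions the parameters from the example cover the 512 consecutive indices 0 to 511, which agrees with the comments in the snippet. A larger `rep_stride` would leave gaps between iterations.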

--------------------------------

### Python Example for MultiCoreMatmulTiling Configuration

Source: https://github.com/cann/pyasc/blob/master/docs/python-api/lib/generated/asc.lib.host.MultiCoreMatmulTiling.set_single_range.md

This Python code demonstrates how to configure and use the MultiCoreMatmulTiling class, including setting the single core ranges. Ensure all necessary imports and platform initializations are done before calling this method.

```python
import asc.lib.host as host
ascendc_platform = host.get_ascendc_platform()
tiling = host.MultiCoreMatmulTiling(ascendc_platform)
tiling.set_dim(use_core_nums)
tiling.set_a_type(host.TPosition.GM, host.CubeFormat.ND, host.DataType.DT_FLOAT16)
tiling.set_b_type(host.TPosition.GM, host.CubeFormat.ND, host.DataType.DT_FLOAT16)
tiling.set_c_type(host.TPosition.GM, host.CubeFormat.ND, host.DataType.DT_FLOAT)
tiling.set_bias_type(host.TPosition.GM, host.CubeFormat.ND, host.DataType.DT_FLOAT)
tiling.set_shape(1024, 1024, 1024)
tiling.set_single_range(1024, 1024, 1024, 1024, 1024, 1024)  # Set the maximum and minimum values of single_core_m/single_core_n/single_core_k
tiling.set_org_shape(1024, 1024, 1024)
tiling.set_bias(True)
tiling.set_buffer_space(-1, -1, -1)
tiling_data = host.TCubeTiling()
ret = tiling.get_tiling(tiling_data)
```

--------------------------------

### ReduceMax API call - mask bitwise mode (Python)

Source: https://github.com/cann/pyasc/blob/master/docs/python-api/language/generated/asc.language.basic.reduce_max.md

Example Python call to the asc.reduce_max function for high-dimension split computation using a bitwise mask. This example demonstrates setting a full mask and requesting the index of the maximum value.

```python
uint64_max = 2**64 - 1
mask = [uint64_max, uint64_max]
asc.reduce_max(dst, src, shared_tmp_buffer=shared_tmp, mask=mask, repeat_time=65, src_rep_stride=8, cal_index=True)
```

--------------------------------

### C++ Include Ordering Example

Source: https://github.com/cann/pyasc/blob/master/docs/codestyle.md

Include statements should be ordered by category: local project, MLIR/LLVM/Clang, and standard library. Within each category, sort alphabetically. Ensure empty lines separate these categories.

```cpp
#include "ascir/Dialect/EmitAsc/IR/EmitAsc.h"
#include "ascir/Target/Asc/Utils.h"

#include "mlir/IR/Builders.h"
#include "mlir/IR/DialectImplementation.h"
#include "llvm/ADT/TypeSwitch.h"

#include <optional>
#include <unordered_map>
```

--------------------------------

### Python Example: Shift Left with Mask (Bit Mode)

Source: https://github.com/cann/pyasc/blob/master/docs/python-api/language/generated/asc.language.basic.shift_left.md

Example demonstrating the `asc.shift_left` function in Python for tensor high-dimensional splitting with a bitwise mask mode. It uses a list of uint64_max values for the mask and specifies repeat times and stride parameters.

```python
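uint64_max = 2**64 - 1
mask = [uint64_max, uint64_max]
scalar = 2
# repeat_times = 4: each iteration processes 128 elements, 512 elements in total
# dst_blk_stride, src_blk_stride = 1: data is read and written contiguously within a single iteration
# dst_rep_stride, src_rep_stride = 8: data is read and written contiguously across adjacent iterations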
params = asc.UnaryRepeatParams(1, 1, 8, 8)
asc.shift_left(dst, src, scalar, mask=mask, repeat_times=4, repeat_params=params)
```
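The element counts in the comments follow from simple arithmetic. The sketch below reproduces it in plain Python, assuming the usual Ascend C vector layout in which one iteration covers 8 data blocks of 32 bytes each and strides are counted in data blocks. The constants and function names are illustrative and not part of pyasc.

```python
BLOCK_BYTES = 32        # assumed size of one data block in bytes
BLOCKS_PER_REPEAT = 8   # assumed number of data blocks handled per iteration


def elements_per_repeat(dtype_bytes):
    """Number of elements one iteration can process for a given element size."""
    return BLOCK_BYTES * BLOCKS_PER_REPEAT // dtype_bytes


def total_elements(repeat_times, dtype_bytes):
    """Number of elements processed over all iterations."""
    return repeat_times * elements_per_repeat(dtype_bytes)


def is_contiguous(blk_stride, rep_stride):
    """True when blocks and iterations follow each other without gaps."""
    return blk_stride == 1 and rep_stride == BLOCKS_PER_REPEAT


per_repeat = elements_per_repeat(2)   # 16-bit elements
total = total_elements(4, 2)          # repeat_times = 4
contiguous = is_contiguous(1, 8)      # strides used in the example
```

Under these assumptions a 16-bit element type gives 128 elements per iteration and 512 elements for four iterations, matching the comments in the example.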

--------------------------------

### Python Example for Matmul.set_workspace Usage

Source: https://github.com/cann/pyasc/blob/master/docs/python-api/language/generated/asc.language.adv.Matmul.set_workspace.md

This Python code demonstrates the typical workflow for setting up and executing a matrix multiplication using Ascend C, including registering the operation, setting the workspace, input tensors, and bias, followed by iteration and retrieving the result tensor.

```python
asc.adv.register_matmul(pipe, workspace, mm, tiling)
mm.set_workspace(workspace_gm)
mm.set_tensor_a(gm_a)
mm.set_tensor_b(gm_b)
mm.set_bias(gm_bias)
mm.iterate(sync=True)
for i in range(single_core_m // base_m * single_core_n // base_n):
    mm.get_tensor_c(tensor=gm_c, sync=False)
```
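The loop bound above is the number of base blocks in the single-core output, so `get_tensor_c` runs once per block. A minimal sketch of that count in plain Python follows. The function name and the sample sizes are hypothetical, and the sketch assumes the single-core shape is an exact multiple of the base block shape.

```python
def count_base_blocks(single_core_m, single_core_n, base_m, base_n):
    """Return how many base_m x base_n blocks tile the single-core output."""
    if single_core_m % base_m or single_core_n % base_n:
        raise ValueError("single-core shape must be a multiple of the base block shape")
    return (single_core_m // base_m) * (single_core_n // base_n)


# Hypothetical sizes: a 256 x 256 single-core output split into 128 x 64 blocks.
blocks = count_base_blocks(256, 256, 128, 64)
```

With exact multiples, the parenthesized form gives the same value as the unparenthesized expression used in the loop.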

--------------------------------

### Python Example for asc.adv.Sign Operator

Source: https://github.com/cann/pyasc/blob/master/docs/python-api/language/generated/asc.language.adv.sign.md

This Python code demonstrates how to call the asc.adv.Sign operator, including buffer initialization and tensor allocation. Ensure buffer_size is obtained from Host-side tiling parameters.

```python
pipe = asc.TPipe()
tmp_que = asc.TQue(asc.TPosition.VECCALC, 1)
pipe.init_buffer(que=tmp_que, num=1, len=buffer_size)   # buffer_size is obtained from the Host-side tiling parameters
shared_tmp_buffer = tmp_que.alloc_tensor(asc.uint8)
# The input tensor holds 1024 elements of type half; 512 elements are actually computed
asc.adv.Sign(dst, src, count=512, temp_buffer=shared_tmp_buffer)
```
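When verifying the operator's output on the host, a plain Python reference can help. The sketch below assumes the usual element-wise definition of sign (1 for positive values, 0 for zero, -1 for negative values) and models only the first `count` elements; what happens to the remaining output elements is not covered here. `sign_reference` is a hypothetical helper, not part of pyasc.

```python
def sign_reference(src, count):
    """Return the sign of the first `count` elements of `src` as floats."""
    if count > len(src):
        raise ValueError("count exceeds the input length")
    # (x > 0) - (x < 0) yields 1, 0, or -1 without branching.
    return [float((x > 0) - (x < 0)) for x in src[:count]]


src = [-2.5, 0.0, 3.0, -0.0, 7.25, -1.0]
expected = sign_reference(src, count=4)
```

Only the first four inputs contribute to `expected`, mirroring how `count` limits the computation in the call above.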

--------------------------------

### MatmulApiTiling Configuration Methods

Source: https://github.com/cann/pyasc/blob/master/docs/python-api/lib/host.md

This section covers methods for configuring matrix multiplication tiling, including setting bias, layout, data types, and batch information.

```APIDOC
## MatmulApiTiling.enable_bias

### Description
Sets whether Bias participates in the computation. The setting must be consistent with the Kernel side.

### Method
`enable_bias(self, is_bias_in)`

### Parameters
- **is_bias_in** (bool) - Whether Bias participates in the computation.
```

```APIDOC
## MatmulApiTiling.get_base_k

### Description
Retrieves the baseK value calculated by Tiling.

### Method
`get_base_k(self)`
```

```APIDOC
## MatmulApiTiling.get_base_m

### Description
Retrieves the baseM value calculated by Tiling.

### Method
`get_base_m(self)`
```

```APIDOC
## MatmulApiTiling.get_base_n

### Description
Retrieves the baseN value calculated by Tiling.

### Method
`get_base_n(self)`
```

```APIDOC
## MatmulApiTiling.get_tiling

### Description
Retrieves the Tiling parameters.

### Method
`get_tiling(self, tiling)`

### Parameters
- **tiling** - The tiling structure (for example a `host.TCubeTiling` instance) that receives the computed parameters.
```

```APIDOC
## MatmulApiTiling.set_a_layout

### Description
Sets the layout axis information for matrix A, including B, S, N, G, D axes. For BSNGD, SBNGD, BNGS1S2 layout formats, this interface must be called in the Host side Tiling implementation before calling the IterateBatch interface to set the layout axis information for matrix A.

### Method
`set_a_layout(self, b, s, n, g, d)`

### Parameters
- **b** - B axis information for the layout of matrix A.
- **s** - S axis information for the layout of matrix A.
- **n** - N axis information for the layout of matrix A.
- **g** - G axis information for the layout of matrix A.
- **d** - D axis information for the layout of matrix A.
```

```APIDOC
## MatmulApiTiling.set_a_type

### Description
Sets the position, data format, data type, and transpose information for matrix A. This information needs to be consistent with the settings on the kernel side.

### Method
`set_a_type(self, pos, type, ...)`

### Parameters
- **pos** - The position of matrix A.
- **type** - The data format of matrix A (for example `host.CubeFormat.ND`).
- **...** - Additional parameters for the data type and transpose setting.
```

```APIDOC
## MatmulApiTiling.set_b_layout

### Description
Sets the layout axis information for matrix B, including B, S, N, G, D axes. For BSNGD, SBNGD, BNGS1S2 layout formats, this interface must be called in the Host side Tiling implementation before calling the IterateBatch interface to set the layout axis information for matrix B.

### Method
`set_b_layout(self, b, s, n, g, d)`

### Parameters
- **b** - B axis information for the layout of matrix B.
- **s** - S axis information for the layout of matrix B.
- **n** - N axis information for the layout of matrix B.
- **g** - G axis information for the layout of matrix B.
- **d** - D axis information for the layout of matrix B.
```

```APIDOC
## MatmulApiTiling.set_b_type

### Description
Sets the position, data format, data type, and transpose information for matrix B. This information needs to be consistent with the settings on the kernel side.

### Method
`set_b_type(self, pos, type, ...)`

### Parameters
- **pos** - The position of matrix B.
- **type** - The data format of matrix B (for example `host.CubeFormat.ND`).
- **...** - Additional parameters for the data type and transpose setting.
```

```APIDOC
## MatmulApiTiling.set_batch_info_for_normal

### Description
Sets the M/N/K axis information for matrices A/B, and the Batch count for matrices A/B. For NORMAL layout types, this interface must be called in the Host side Tiling implementation before calling the IterateBatch or IterateNBatch interfaces to set information such as the M/N/K axes for matrices A/B.

### Method
`set_batch_info_for_normal(...)`

### Parameters
- **...** - Parameters for M/N/K axis information and Batch counts for matrices A/B.
```

```APIDOC
## MatmulApiTiling.set_batch_num

### Description
Sets the maximum batch count for multi-batch computation. The maximum batch count is the maximum of batchA of matrix A and batchB of matrix B. This interface must be called in the Host side Tiling implementation before calling the IterateBatch interface to set the batch count for multi-batch computation.

### Method
`set_batch_num(self, batch)`

### Parameters
- **batch** (int) - The maximum batch count for multi-batch computation.
```

```APIDOC
## MatmulApiTiling.set_bias_type

### Description
Sets the position, data format, and data type information for Bias. This information needs to be consistent with the settings on the kernel side.

### Method
`set_bias_type(self, pos, ...)`

### Parameters
- **pos** - The position of Bias.
- **...** - Additional parameters related to data format and type.
```

```APIDOC
## MatmulApiTiling.set_buffer_space

### Description
Sets the available L1 Buffer/L0C Buffer/Unified Buffer/BiasTable Buffer space for Matmul computation, in bytes.

### Method
`set_buffer_space(self, ...)`

### Parameters
- **...** - Parameters for specifying buffer space sizes.
```

```APIDOC
## MatmulApiTiling.set_c_layout

### Description
Sets the layout axis information for matrix C, including B, S, N, G, D axes. For BSNGD, SBNGD, BNGS1S2 layout formats, this interface must be called in the Host side Tiling implementation before calling the IterateBatch interface to set the layout axis information for matrix C.

### Method
`set_c_layout(self, b, s, n, g, d)`

### Parameters
- **b** - B axis information for the layout of matrix C.
- **s** - S axis information for the layout of matrix C.
- **n** - N axis information for the layout of matrix C.
- **g** - G axis information for the layout of matrix C.
- **d** - D axis information for the layout of matrix C.
```

```APIDOC
## MatmulApiTiling.set_c_type

### Description
Sets the position, data format, data type, and transpose information for matrix C. This information needs to be consistent with the settings on the kernel side.

### Method
`set_c_type(self, pos, type, ...)`

### Parameters
- **pos** - The position of matrix C.
- **type** - The data format of matrix C (for example `host.CubeFormat.ND`).
- **...** - Additional parameters for the data type and transpose setting.
```
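As described for `set_batch_num`, the value to configure is the larger of the batch counts of matrices A and B. A small sketch of that rule in plain Python, using a hypothetical helper name and made-up batch sizes:

```python
def max_batch_count(batch_a, batch_b):
    """Return the batch count to configure: the larger of batchA and batchB."""
    return max(batch_a, batch_b)


# Hypothetical case: matrix A has 4 batches, matrix B has a single batch.
batch = max_batch_count(batch_a=4, batch_b=1)
# The result would then be passed to set_batch_num(batch) in the Host-side tiling code.
```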

--------------------------------

### Unary Operator Syntax Example

Source: https://github.com/cann/pyasc/blob/master/docs/python_syntax_support.md

Illustrates the use of a unary operator (bitwise NOT).

```python
@asc.jit
def func_visit_unary_op(a):
    return a + ~1
```