### Manually Install PyTorch Framework

Source: https://github.com/cann/pyasc/blob/master/docs/quick_start.md

Manually install the PyTorch framework by downloading the appropriate wheel file and using pip. This example is for Python 3.9 on x86_64 architecture with PyTorch 2.7.1.

```bash
# Download package
wget https://download.pytorch.org/whl/cpu/torch-2.7.1%2Bcpu-cp39-cp39-manylinux_2_28_x86_64.whl
# Install command
pip3 install torch-2.7.1+cpu-cp39-cp39-manylinux_2_28_x86_64.whl
```

--------------------------------

### Install pyasc from Source

Source: https://context7.com/cann/pyasc/llms.txt

Steps to install pyasc from source, including downloading LLVM and setting up the environment.

```bash
# Download LLVM
wget https://cann-ai.obs.cn-north-4.myhuaweicloud.com/llvm/LLVM-19.1.7-aarch64.tar.xz
tar -xJf LLVM-19.1.7-aarch64.tar.xz
export LLVM_INSTALL_PREFIX=$PWD/LLVM-19.1.7-aarch64
# Clone and install
git clone https://gitcode.com/cann/pyasc.git
cd pyasc
python3 -m pip install -r requirements-build.txt
python3 -m pip install -r requirements-runtime.txt
python3 -m pip install .
```

--------------------------------

### Install PyTorch and torch_npu Plugin

Source: https://github.com/cann/pyasc/blob/master/docs/quick_start.md

Install PyTorch framework and torch_npu plugin. This is required for running operator verification with PyTorch input/output tensors. The script `torch_npu_install.sh` provides a one-click installation.

```bash
cd pyasc
bash torch_npu_install.sh
```

--------------------------------

### Run Add Framework Example

Source: https://github.com/cann/pyasc/blob/master/python/tutorials/02_add_framework/README.md

Execute the Add framework example script. Replace '[RUN_MODE]' with 'Model' or 'NPU', and '[SOC_VERSION]' with your Ascend AI processor model (e.g., Ascend910C).
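
For illustration, a script that accepts these two flags could parse them roughly as follows. This is a minimal sketch and not the tutorial's actual code: the long option names, the defaults, and the `parse_args` helper are assumptions; only `-r`, `-v`, and the `Model`/`NPU` values come from the description above.

```python
import argparse


def parse_args(argv=None):
    """Parse the -r/-v flags described above (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="Run a pyasc tutorial sample")
    # -r selects the run mode; the two choices are the values named in the text.
    parser.add_argument("-r", "--run-mode", choices=["Model", "NPU"], default="Model")
    # -v names the Ascend AI processor model; no default is assumed here.
    parser.add_argument("-v", "--soc-version", default=None)
    return parser.parse_args(argv)


args = parse_args(["-r", "Model", "-v", "Ascend910B1"])
print(args.run_mode, args.soc_version)
```

Restricting `-r` with `choices` makes a misspelled run mode fail early with a usage message. The documented invocation is:
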
```bash
cd pyasc/python/tutorials/02_add_framework
python3 add_framework.py -r [RUN_MODE] -v [SOC_VERSION]
```

```bash
python3 add_framework.py -r Model -v Ascendxxxyy
```

--------------------------------

### Matmul Operation Setup and Execution

Source: https://github.com/cann/pyasc/blob/master/docs/python-api/language/generated/asc.language.adv.get_normal_config.md

Demonstrates the setup and execution of a matrix multiplication operation using the configured MatmulConfig.

```APIDOC
### Description
This section provides an example of how to use the `get_normal_config` function to obtain a `MatmulConfig` and then utilize it to set up and execute a matrix multiplication.

### Method
N/A (Illustrative Example)

### Endpoint
N/A (Illustrative Example)

### Request Example
```python
# Assume a_type, b_type, c_type, bias_type, pipe, workspace, and tiling are defined elsewhere

# Get the matrix multiplication configuration
mm_cfg = asc.adv.get_normal_config()

# Initialize the Matmul operation with the configuration
mm = asc.adv.Matmul(a_type, b_type, c_type, bias_type, mm_cfg)

# Register the matrix multiplication operation
asc.adv.register_matmul(pipe, workspace, mm, tiling)

# Set the input tensors for matrix multiplication
mm.set_tensor_a(gm_a)
mm.set_tensor_b(gm_b)
mm.set_bias(gm_bias)

# Execute the matrix multiplication operation
mm.iterate_all(gm_c)
```

### Response
N/A (Illustrative Example)
```

--------------------------------

### Install lit for ASC-IR Unit Tests

Source: https://github.com/cann/pyasc/blob/master/docs/quick_start.md

Install the lit tool, which is required for running ASC-IR definition module unit tests. Refer to the LLVM documentation for installation details.

```bash
pip install lit
```

--------------------------------

### Manually Install torch_npu Plugin

Source: https://github.com/cann/pyasc/blob/master/docs/quick_start.md

Manually install the torch_npu plugin by downloading the appropriate wheel file and using pip.
This example is for Python 3.9, x86_64, with PyTorch 2.7.1 and torch_npu 7.3.0. Skip if only running in emulator mode.

```bash
# Download plugin package
wget https://gitcode.com/Ascend/pytorch/releases/download/v7.3.0-pytorch2.7.1/torch_npu-2.7.1.post2-cp39-cp39-manylinux_2_28_x86_64.whl
# Install command
pip3 install torch_npu-2.7.1.post2-cp39-cp39-manylinux_2_28_x86_64.whl
```

--------------------------------

### Run Add Operator Example

Source: https://github.com/cann/pyasc/blob/master/docs/quick_start.md

Execute the Add operator example to verify its functionality. Navigate to the pyasc directory and run the specified Python script.

```bash
cd pyasc
python3 ./python/tutorials/01_add/add.py
```

--------------------------------

### Run Add Operator Example with Parameters

Source: https://github.com/cann/pyasc/blob/master/docs/quick_start.md

Run the Add operator example with specified parameters for RUN_MODE and SOC_VERSION. Default RUN_MODE is emulator mode, and default SOC_VERSION in emulator mode is Ascend910B1. NPU on-board mode auto-detects.

```bash
python3 ./python/tutorials/01_add/add.py -r [RUN_MODE] -v [SOC_VERSION]
```

--------------------------------

### Install pytest for Unit Testing

Source: https://github.com/cann/pyasc/blob/master/docs/quick_start.md

Install the pytest framework for running unit tests. This is a prerequisite for executing Python module UT tests.

```bash
pip install pytest
```

--------------------------------

### Build and Install pyasc from Source (Normal Mode)

Source: https://github.com/cann/pyasc/blob/master/docs/quick_start.md

Build and install pyasc from source code in normal mode. This installs the project into the Python environment's site-packages directory, suitable for production environments.

```bash
python3 -m pip install .
```

--------------------------------

### Install pyasc via pip

Source: https://context7.com/cann/pyasc/llms.txt

Use pip to quickly install the pyasc library.
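
After installing, one quick way to confirm that the package is visible to the interpreter is to look it up without importing it. The sketch below is only an illustration: `is_importable` is a hypothetical helper, and the module name `asc` is taken from the `import asc` statements used by the code snippets in this document.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if a top-level module can be located on the current Python path."""
    return importlib.util.find_spec(module_name) is not None


# The snippets in this document import the package as `asc`.
print("asc importable:", is_importable("asc"))
```

The install command itself is:
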
```bash
pip install pyasc
```

--------------------------------

### List Syntax Example

Source: https://github.com/cann/pyasc/blob/master/docs/python_syntax_support.md

Illustrates the creation and return of a list.

```python
@asc.jit
def func_visit_list(a, b, c):
    nums = [a, b, c]
    return nums
```

--------------------------------

### Python Call Example for asc.adv.log

Source: https://github.com/cann/pyasc/blob/master/docs/python-api/language/generated/asc.language.adv.log.md

This is a Python example demonstrating how to call the asc.adv.log function. Ensure that 'dst' and 'src' are properly defined LocalTensor objects.

```python
asc.adv.log(dst, src)
```

--------------------------------

### Python Example for Setting AIPP Functions

Source: https://github.com/cann/pyasc/blob/master/docs/python-api/language/generated/asc.language.basic.set_aipp_functions.md

This example demonstrates how to configure AIPP parameters in Python, including swap and channel padding settings, before calling `set_aipp_functions`.

```python
swap_settings = asc.AippSwapParams(is_swap_rb=True)
cpad_settings = asc.AippChannelPaddingParams(c_padding_mode=0, c_padding_value=-1)
aipp_config_int8 = asc.AippParams(
    dtype=asc.int8,
    swap_params=swap_settings,
    c_padding_params=cpad_settings
)
asc.set_aipp_functions(rgb_gm, asc.AippInputFormat.RGB888_U8, aipp_config_int8)
```

--------------------------------

### Tuple Syntax Example

Source: https://github.com/cann/pyasc/blob/master/docs/python_syntax_support.md

Shows how to create and return a tuple.

```python
@asc.jit
def func_visit_tuple(x, y, z):
    return x, y, z
```

--------------------------------

### Verify LLVM Installation

Source: https://github.com/cann/pyasc/blob/master/docs/quick_start.md

Verify the LLVM installation by checking its version using the llvm-config command. Ensure the path to llvm-config is correctly set via LLVM_INSTALL_PREFIX.
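
The same check can be scripted, for example in a setup helper. The following is a minimal sketch under stated assumptions: `llvm_config_path` and `llvm_version` are hypothetical helper names, and the only facts taken from the text are the `LLVM_INSTALL_PREFIX` variable, the `bin/llvm-config` location, and the `--version` flag.

```python
import os
import subprocess
from pathlib import Path


def llvm_config_path(env=os.environ) -> Path:
    """Build the llvm-config path from LLVM_INSTALL_PREFIX."""
    prefix = env.get("LLVM_INSTALL_PREFIX")
    if not prefix:
        raise RuntimeError("LLVM_INSTALL_PREFIX is not set")
    return Path(prefix) / "bin" / "llvm-config"


def llvm_version(env=os.environ):
    """Return the version string reported by llvm-config, or None if the binary is missing."""
    exe = llvm_config_path(env)
    if not exe.is_file():
        return None
    result = subprocess.run([str(exe), "--version"], capture_output=True, text=True, check=True)
    return result.stdout.strip()
```

Returning `None` for a missing binary lets a caller distinguish "variable set but path wrong" from "variable not set", which raises. The equivalent shell command is:
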
```bash
${LLVM_INSTALL_PREFIX}/bin/llvm-config --version
```

--------------------------------

### Python Calling Example for Ascend C Atan

Source: https://github.com/cann/pyasc/blob/master/docs/python-api/language/generated/asc.language.adv.atan.md

Example demonstrating how to call the asc.adv.Atan function from Python, including buffer initialization and tensor allocation. Ensure buffer_size is obtained from Host-side tiling parameters.

```python
pipe = asc.TPipe()
tmp_que = asc.TQue(asc.TPosition.VECCALC, 1)
pipe.init_buffer(que=tmp_que, num=1, len=buffer_size)  # buffer_size is obtained from the host-side tiling parameters
shared_tmp_buffer = tmp_que.alloc_tensor(asc.uint8)
# The input tensor holds 1024 elements of type half; 512 elements are actually computed
asc.adv.Atan(dst, src, count=512, temp_buffer=shared_tmp_buffer)
```

--------------------------------

### Python TPipe.init_buf_pool Usage Example

Source: https://github.com/cann/pyasc/blob/master/docs/python-api/language/generated/asc.language.fwk.TPipe.init_buf_pool.md

Example demonstrating how to set global buffers and initialize buffer pools using TPipe.init_buf_pool. The second call shows reusing a previously initialized buffer pool.

```python
src0_global.set_global_buffer(src0_gm)
src1_global.set_global_buffer(src1_gm)
dst_global.set_global_buffer(dst_gm)
pipe.init_buf_pool(tbuf_pool1, 196608)
pipe.init_buf_pool(tbuf_pool2, 196608, tbuf_pool1)
```

--------------------------------

### Execute matmul_leakyrelu Sample

Source: https://github.com/cann/pyasc/blob/master/python/tutorials/05_matmul_leakyrelu/README.md

Run the matmul_leakyrelu sample with specified run mode and SOC version. Ensure environment is configured and refer to quick_start.md for setup.

```bash
cd pyasc/python/tutorials/05_matmul_leakyrelu
python3 matmul_leakyrelu.py -r [RUN_MODE] -v [SOC_VERSION]
```

--------------------------------

### Constant Syntax Example

Source: https://github.com/cann/pyasc/blob/master/docs/python_syntax_support.md

Illustrates the use of integer and boolean constants.
```python
@asc.jit
def func_visit_constant():
    a = 1
    b = True
    return a, b
```

--------------------------------

### Binary Operator Syntax Example

Source: https://github.com/cann/pyasc/blob/master/docs/python_syntax_support.md

Demonstrates the use of binary operators, including addition and multiplication.

```python
@asc.jit
def func_visit_binop(num1, num2, num3):
    result = num1 + num2 * num3
    return result
```

--------------------------------

### Formatted String and Assert Statement Example

Source: https://github.com/cann/pyasc/blob/master/docs/python_syntax_support.md

Shows an assert statement with a formatted string message, which will always fail in this example.

```python
@asc.jit
def func_visit_joined_and_formatted_and_assert(num):
    assert 1 < 0, f"assert failed {num}"
```

--------------------------------

### Build and Install pyasc from Source (Developer Mode)

Source: https://github.com/cann/pyasc/blob/master/docs/quick_start.md

Build and install pyasc from source code in developer mode. This creates symbolic links, allowing local changes to take effect immediately without reinstallation, ideal for development.

```bash
python3 -m pip install -e .
```

--------------------------------

### For Loop Syntax Example

Source: https://github.com/cann/pyasc/blob/master/docs/python_syntax_support.md

Shows how to use a for loop with the range function for summation.

```python
@asc.jit
def func_visit_for(num, total):
    for i in range(num):
        total += i
    return total
```

--------------------------------

### Python MatmulApiTiling.set_fix_split Example

Source: https://github.com/cann/pyasc/blob/master/docs/python-api/lib/generated/asc.lib.host.MatmulApiTiling.set_fix_split.md

Example demonstrating how to use set_fix_split to configure tiling parameters for matrix multiplication. Ensure that the base_m and base_n values adhere to the specified constraints to avoid tiling failures.
```python
import asc.lib.host as host

ascendc_platform = host.get_ascendc_platform()
tiling = host.MatmulApiTiling(ascendc_platform)
tiling.set_a_type(host.TPosition.GM, host.CubeFormat.ND, host.DataType.DT_FLOAT16)
tiling.set_b_type(host.TPosition.GM, host.CubeFormat.ND, host.DataType.DT_FLOAT16)
tiling.set_c_type(host.TPosition.GM, host.CubeFormat.ND, host.DataType.DT_FLOAT)
tiling.set_bias_type(host.TPosition.GM, host.CubeFormat.ND, host.DataType.DT_FLOAT)
tiling.set_shape(1024, 1024, 1024)
tiling.set_org_shape(1024, 1024, 1024)
tiling.set_bias(True)
tiling.set_fix_split(16, 16, -1)  # Set fixed base_m, base_n
tiling.set_buffer_space(-1, -1, -1)
tiling_data = host.TCubeTiling()
ret = tiling.get_tiling(tiling_data)
```

--------------------------------

### Attribute Access Syntax Example

Source: https://github.com/cann/pyasc/blob/master/docs/python_syntax_support.md

Demonstrates attribute access, specifically calling the __len__ method on a list.

```python
@asc.jit
def func_visit_attribute():
    nums = [1, 2, 3]
    return nums.__len__()
```

--------------------------------

### While Loop Syntax Example

Source: https://github.com/cann/pyasc/blob/master/docs/python_syntax_support.md

Illustrates the usage of a while loop for iterative summation.

```python
@asc.jit
def func_visit_while(i, n, ans):
    while i < n:
        ans += i
        i += 1
    return ans
```

--------------------------------

### Expression Statement and Function Call Example

Source: https://github.com/cann/pyasc/blob/master/docs/python_syntax_support.md

Demonstrates calling a function as an expression statement.

```python
@asc.jit
def func():
    pass


@asc.jit
def func_visit_expr():
    func()  # expression / function call
```

--------------------------------

### Subscript Access Syntax Example

Source: https://github.com/cann/pyasc/blob/master/docs/python_syntax_support.md

Illustrates accessing elements of a list using index notation.
```python
@asc.jit
def func_visit_subscript(a, b):
    nums = [a, b]
    return nums[0] + nums[1]
```

--------------------------------

### Slice Expression Syntax Example

Source: https://github.com/cann/pyasc/blob/master/docs/python_syntax_support.md

Shows how to use slicing to extract a sublist from a list.

```python
@asc.jit
def func_visit_slice():
    nums = [1, 2, 3, 4, 5]
    ans = nums[2:]
    return len(ans)
```

--------------------------------

### Pass Statement Syntax Example

Source: https://github.com/cann/pyasc/blob/master/docs/python_syntax_support.md

Demonstrates the 'pass' statement, which acts as a null operation.

```python
@asc.jit
def func_visit_pass():
    pass
```

--------------------------------

### Format a file with clang-format

Source: https://github.com/cann/pyasc/blob/master/docs/codestyle.md

Use clang-format to automatically format a file according to the project's coding style. Ensure clang-format is installed and configured.

```bash
clang-format -i <file>
```

--------------------------------

### Comparison Expression Syntax Example

Source: https://github.com/cann/pyasc/blob/master/docs/python_syntax_support.md

Shows various comparison operators (>, <, >=, <=, ==) used in conditional logic.

```python
@asc.jit
def func_visit_compare(a, b, c, ans):
    if a > b:
        ans += 2
    if a + b < c:
        ans += 1
    if ans >= 10:
        ans += 3
    if ans <= 5:
        ans += 1
    if ans == c:
        ans += 5
    return ans
```

--------------------------------

### Python Example for Ascend C mrg_sort and get_mrg_sort_result

Source: https://github.com/cann/pyasc/blob/master/docs/python-api/language/generated/asc.language.basic.get_mrg_sort_result.md

This Python code demonstrates the setup of local tensors and the usage of asc.mrg_sort with different configurations, including the 'is_exhausted_suspension' and 'MrgSort4Info' parameters. It concludes by calling asc.get_mrg_sort_result to retrieve the processed counts.
```python
src1 = asc.LocalTensor(dtype=asc.float16, pos=asc.TPosition.VECIN, addr=0, tile_size=512)
src2 = asc.LocalTensor(dtype=asc.float16, pos=asc.TPosition.VECIN, addr=512, tile_size=512)
src3 = asc.LocalTensor(dtype=asc.float16, pos=asc.TPosition.VECIN, addr=1024, tile_size=512)
src4 = asc.LocalTensor(dtype=asc.float16, pos=asc.TPosition.VECIN, addr=1536, tile_size=512)
dst = asc.LocalTensor(dtype=asc.float16, pos=asc.TPosition.VECOUT, addr=0, tile_size=2048)
# Assumption: sort_list is not defined in the source snippet; it is taken here to bundle
# the four sources the same way the mrg_sort4 example builds its MrgSortSrcList.
sort_list = asc.MrgSortSrcList(src1, src2, src3, src4)
element_count_list = [128, 128, 128, 128]
sorted_num = [0, 0, 0, 0]
asc.mrg_sort(dst, sort_list, element_count_list, sorted_num, valid_bit=15, repeat_time=1)
asc.mrg_sort(dst, sort_list, element_count_list, sorted_num, valid_bit=15, repeat_time=1, is_exhausted_suspension=True)
mrg_sort4_info = asc.MrgSort4Info(element_count_list, if_exhausted_suspension=False, valid_bit=7, repeat_times=1)
asc.mrg_sort(dst, sort_list, mrg_sort4_info)
mrg1, mrg2, mrg3, mrg4 = asc.get_mrg_sort_result()
```

--------------------------------

### Main Execution and Configuration

Source: https://context7.com/cann/pyasc/llms.txt

Sets up the Ascend platform configuration, creates input tensors, launches the vector addition, and verifies the result. This function serves as the entry point for the example.
```python
import logging

import torch
import asc.runtime.config as config

# vadd_launch is assumed to be defined elsewhere in the same example.


def main():
    # Configure platform (use Model for simulator, NPU for hardware)
    config.set_platform(config.Backend.Model, config.Platform.Ascend910B1)
    device = "cpu"  # "npu" for actual hardware
    size = 8 * 2048
    x = torch.rand(size, dtype=torch.float32, device=device)
    y = torch.rand(size, dtype=torch.float32, device=device)
    z = vadd_launch(x, y)
    assert torch.allclose(z, x + y)
    logging.info("Vector add completed successfully!")


if __name__ == "__main__":
    main()
```

--------------------------------

### Python Function Docstring Example

Source: https://github.com/cann/pyasc/blob/master/docs/API_docstring_generation_tool_guide.md

Example of a Python function docstring following Google or NumPy style, including C++ API prototypes, parameter descriptions, and usage examples with different mask modes and count parameters.

```python
def add(dst: LocalTensor, src0: LocalTensor, src1: LocalTensor, *args, **kwargs) -> None:
    """
    Element-wise sum.

    **Corresponding Ascend C function prototypes**

    .. code-block:: c++

        template __aicore__ inline void Add(const LocalTensor& dst, const LocalTensor& src0,
                                            const LocalTensor& src1, const int32_t& count);

    .. code-block:: c++

        template __aicore__ inline void Add(const LocalTensor& dst, const LocalTensor& src0,
                                            const LocalTensor& src1, uint64_t mask[],
                                            const uint8_t repeatTimes, const BinaryRepeatParams& repeatParams);

    .. code-block:: c++

        template __aicore__ inline void Add(const LocalTensor& dst, const LocalTensor& src0,
                                            const LocalTensor& src1, uint64_t mask,
                                            const uint8_t repeatTimes, const BinaryRepeatParams& repeatParams);

    **Parameters**

    - is_set_mask: whether the mask mode and mask value are set inside the interface.
    - dst: destination operand. Type LocalTensor; supported TPosition values are VECIN/VECCALC/VECOUT.
    - src0, src1: source operands. Type LocalTensor; supported TPosition values are VECIN/VECCALC/VECOUT.
    - count: number of elements that take part in the computation.
    - mask: controls which elements take part in the computation within each iteration.
    - repeat_times: number of repeat iterations.
    - params: parameters that control the address strides of the operands.

    **Returns** (omit this section if there is no return value)

    ...

    **Examples**

    - High-dimensional tensor slicing, contiguous mask mode

      .. code-block:: python

          mask = 128
          # repeat_times = 4: each iteration computes 128 elements, 512 in total
          # dst_blk_stride, src0_blk_stride, src1_blk_stride = 1: data is read and written contiguously within an iteration
          # dst_rep_stride, src0_rep_stride, src1_rep_stride = 8: data is read and written contiguously across adjacent iterations
          params = asc.BinaryRepeatParams(1, 1, 1, 8, 8, 8)
          asc.add(dst, src0, src1, mask=mask, repeat_times=4, repeat_params=params)

    - High-dimensional tensor slicing, bitwise mask mode

      .. code-block:: python

          mask = [uint64_max, uint64_max]
          # repeat_times = 4: each iteration computes 128 elements, 512 in total
          # dst_blk_stride, src0_blk_stride, src1_blk_stride = 1: data is read and written contiguously within an iteration
          # dst_rep_stride, src0_rep_stride, src1_rep_stride = 8: data is read and written contiguously across adjacent iterations
          params = asc.BinaryRepeatParams(1, 1, 1, 8, 8, 8)
          asc.add(dst, src0, src1, mask=mask, repeat_times=4, repeat_params=params)

    - Computing the first n elements of a tensor

      .. code-block:: python

          asc.add(dst, src0, src1, count=512)

    """
    builder = global_builder.get_ir_builder()
    op_impl("add", dst, src0, src1, args, kwargs, builder.create_asc_AddL0Op,
            builder.create_asc_AddL1Op, builder.create_asc_AddL2Op)
```

--------------------------------

### Execute matmul_cube_only Sample

Source: https://github.com/cann/pyasc/blob/master/python/tutorials/05_matmul_leakyrelu/README.md

Example of executing the matmul_cube_only sample with 'Model' run mode and a placeholder for the SOC version. Replace Ascendxxxyy with your actual AI processor model.

```bash
python3 matmul_cube_only.py -r Model -v Ascendxxxyy
```

--------------------------------

### Define and Launch a JIT Compiled Kernel

Source: https://context7.com/cann/pyasc/llms.txt

Demonstrates defining a JIT compiled kernel for vector addition and launching it with specified core count and stream. Ensure Ascend NPU platform is configured.
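
The kernel in this section splits the vector evenly across cores and places three local buffers back to back. The plain-Python sketch below mirrors only that arithmetic so it can be checked without NPU hardware. It is an illustration under assumptions: `partition` is a hypothetical helper, the local addresses are taken to be byte offsets (the kernel multiplies by `dtype.sizeof()`), and the global offset is taken to be in elements as written in the kernel.

```python
def partition(total_elems: int, core_num: int, dtype_size: int):
    """Mirror the per-core offset arithmetic of the vector-add kernel."""
    block_length = total_elems // core_num  # elements handled by each core
    plans = []
    for block_idx in range(core_num):
        plans.append({
            "gm_offset": block_idx * block_length,          # start position in global memory
            "x_local_addr": 0,                              # x_local starts at address 0
            "y_local_addr": block_length * dtype_size,      # y_local follows x_local
            "z_local_addr": 2 * block_length * dtype_size,  # z_local follows y_local
        })
    return block_length, plans


# 8 * 2048 float32 values split across 8 cores
block_length, plans = partition(total_elems=8 * 2048, core_num=8, dtype_size=4)
print(block_length, plans[1])
```

Because the block length uses floor division, any remainder elements are not covered when the total is not a multiple of the core count; the sample sizes in this document divide evenly. The kernel and launcher are:
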
```python
import asc
import torch
import asc.runtime.config as config
import asc.lib.runtime as rt


# Basic JIT function definition
@asc.jit
def vector_add_kernel(x: asc.GlobalAddress, y: asc.GlobalAddress, z: asc.GlobalAddress, length: int):
    offset = asc.get_block_idx() * length
    x_gm = asc.GlobalTensor()
    y_gm = asc.GlobalTensor()
    z_gm = asc.GlobalTensor()
    x_gm.set_global_buffer(x + offset, length)
    y_gm.set_global_buffer(y + offset, length)
    z_gm.set_global_buffer(z + offset, length)
    x_local = asc.LocalTensor(x.dtype, asc.TPosition.VECIN, 0, length)
    y_local = asc.LocalTensor(y.dtype, asc.TPosition.VECIN, length * x.dtype.sizeof(), length)
    z_local = asc.LocalTensor(z.dtype, asc.TPosition.VECOUT, 2 * length * x.dtype.sizeof(), length)
    asc.data_copy(x_local, x_gm, length)
    asc.data_copy(y_local, y_gm, length)
    asc.set_flag(asc.HardEvent.MTE2_V, 0)
    asc.wait_flag(asc.HardEvent.MTE2_V, 0)
    asc.add(z_local, x_local, y_local, length)
    asc.set_flag(asc.HardEvent.V_MTE3, 0)
    asc.wait_flag(asc.HardEvent.V_MTE3, 0)
    asc.data_copy(z_gm, z_local, length)


# JIT with options
@asc.jit(always_compile=True)
def my_kernel_with_options(x: asc.GlobalAddress, y: asc.GlobalAddress):
    pass


# Launch kernel with core count and stream
def launch_add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    config.set_platform(config.Backend.NPU)
    z = torch.zeros_like(x)
    core_num = 8
    block_length = z.numel() // core_num
    vector_add_kernel[core_num, rt.current_stream()](x, y, z, block_length)
    return z
```

--------------------------------

### Configure ASC Runtime Backend and Platform

Source: https://context7.com/cann/pyasc/llms.txt

Sets the execution backend (simulator or NPU) and specifies the target platform, including SoC version and device ID for multi-NPU systems. Also demonstrates checking runtime availability and getting the current stream and platform.
```python
import asc.runtime.config as config
import asc.lib.runtime as rt

# Set execution backend
# Backend.Model - Use simulator (no NPU hardware required)
# Backend.NPU - Use actual NPU hardware
config.set_platform(config.Backend.Model)  # Simulator mode
config.set_platform(config.Backend.NPU)  # NPU mode

# Specify target platform (SoC version)
config.set_platform(
    config.Backend.Model,
    soc_version=config.Platform.Ascend910B1
)

# Available platforms
platforms = [
    config.Platform.Ascend910B1,
    config.Platform.Ascend910B2,
    config.Platform.Ascend910B3,
    config.Platform.Ascend910B4,
]

# Set device ID for multi-NPU systems
config.set_platform(
    config.Backend.NPU,
    device_id=0
)

# Check runtime availability
is_available = rt.is_available()

# Get current stream for kernel launch
stream = rt.current_stream()

# Get current platform
current = rt.current_platform()
```

--------------------------------

### Install Python Dependencies for pyasc

Source: https://github.com/cann/pyasc/blob/master/docs/quick_start.md

Install Python dependencies required for building and running pyasc. Use requirements-build.txt for build-time dependencies and requirements-runtime.txt for run-time dependencies.

```bash
python3 -m pip install -r requirements-build.txt  # build-time dependencies
python3 -m pip install -r requirements-runtime.txt  # run-time dependencies
```

--------------------------------

### PairReduceSum Python Example (Bitwise Mask)

Source: https://github.com/cann/pyasc/blob/master/docs/python-api/language/generated/asc.language.basic.pair_reduce_sum.md

Example demonstrating the use of PairReduceSum with a bitwise mask in Python. This allows for fine-grained control over which specific pairs of elements are summed.
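
To see how a bitwise mask relates to a contiguous one, the sketch below converts "the first `count` elements" into a two-word list of 64-bit values. It is an illustration, not part of the pyasc API: `continuous_to_bitwise` is a hypothetical helper, and it assumes that bit i of the first word selects element i, that the second word covers elements 64 to 127, and that one iteration handles 128 elements of a 16-bit type.

```python
UINT64_MAX = 2**64 - 1


def continuous_to_bitwise(count: int) -> list:
    """Return [low_word, high_word] with the lowest `count` bits set (0 <= count <= 128)."""
    if not 0 <= count <= 128:
        raise ValueError("count must be between 0 and 128")
    bits = (1 << count) - 1  # `count` ones, starting at bit 0
    return [bits & UINT64_MAX, bits >> 64]


print(continuous_to_bitwise(128) == [UINT64_MAX, UINT64_MAX])
```

Under these assumptions a contiguous mask of 128 and the full bitwise mask `[uint64_max, uint64_max]` select the same elements. The PairReduceSum call with a full bitwise mask is:
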
```python
x_local = asc.LocalTensor(dtype=asc.float16, pos=asc.TPosition.VECIN, addr=0, tile_size=512)
z_local = asc.LocalTensor(dtype=asc.float16, pos=asc.TPosition.VECOUT, addr=0, tile_size=512)
uint64_max = 2**64 - 1
mask = [uint64_max, uint64_max]
asc.pair_reduce_sum(z_local, x_local, repeat_time=2, mask=mask, dst_rep_stride=1, src_blk_stride=1, src_rep_stride=8)
```

--------------------------------

### PairReduceSum Python Example (Continuous Mask)

Source: https://github.com/cann/pyasc/blob/master/docs/python-api/language/generated/asc.language.basic.pair_reduce_sum.md

Example demonstrating the use of PairReduceSum with a continuous mask in Python. This is suitable for scenarios where a fixed number of initial elements should be summed.

```python
x_local = asc.LocalTensor(dtype=asc.float16, pos=asc.TPosition.VECIN, addr=0, tile_size=512)
z_local = asc.LocalTensor(dtype=asc.float16, pos=asc.TPosition.VECOUT, addr=0, tile_size=512)
asc.pair_reduce_sum(z_local, x_local, repeat_time=2, mask=128, dst_rep_stride=1, src_blk_stride=1, src_rep_stride=8)
```

--------------------------------

### Python mrg_sort4 Function Call Example

Source: https://github.com/cann/pyasc/blob/master/docs/python-api/language/generated/asc.language.basic.mrg_sort4.md

Example demonstrating how to call the asc.mrg_sort4 function. It shows the creation of MrgSortSrcList and MrgSort4Info objects, and the subsequent call to the sorting function.

```python
# vconcat_work_local holds 4 groups of Region Proposals that have already been created and sorted,
# with 16 Region Proposals in each group
src_list = asc.MrgSortSrcList(vconcat_work_local[0], vconcat_work_local[1], vconcat_work_local[2], vconcat_work_local[3])
element_lengths = [16, 16, 16, 16]
src_info = asc.MrgSort4Info(element_lengths, False, 15, 1)
asc.mrg_sort4(dst_local, src_list, src_info)
```

--------------------------------

### Download and Extract LLVM Precompiled Binaries

Source: https://github.com/cann/pyasc/blob/master/docs/quick_start.md

Download and extract LLVM precompiled binaries.
Choose the appropriate command based on your system architecture (e.g., ARM or x86). Set the LLVM_INSTALL_PREFIX environment variable to the extracted directory.

```bash
# Example: Download LLVM precompiled package for ARM architecture
wget https://cann-ai.obs.cn-north-4.myhuaweicloud.com/llvm/LLVM-19.1.7-aarch64.tar.xz
tar -xJf LLVM-19.1.7-aarch64.tar.xz
export LLVM_INSTALL_PREFIX=$PWD/LLVM-19.1.7-aarch64

# Example: Download LLVM precompiled package for X86 architecture
wget https://cann-ai.obs.cn-north-4.myhuaweicloud.com/llvm/llvm-19.1.7-x86_64.tar.xz
tar -xJf llvm-19.1.7-x86_64.tar.xz
export LLVM_INSTALL_PREFIX=$PWD/llvm-19.1.7-x86_64
```

--------------------------------

### Python LocalMemAllocator.alloc Usage Example

Source: https://github.com/cann/pyasc/blob/master/docs/python-api/language/generated/asc.language.core.LocalMemAllocator.alloc.md

Example of allocating LocalTensor objects using the Python API. The first allocation specifies a constant tile size, while the second uses a variable.

```python
allocator = asc.LocalMemAllocator()
# User-specified logical position VECIN, float type, 1024 elements in the tensor
tensor1 = allocator.alloc(asc.TPosition.VECIN, float, 1024)
# User-specified logical position VECIN, float type, tile_length elements in the tensor
tile_length = 512
tensor2 = allocator.alloc(asc.TPosition.VECIN, float, tile_length)
```

--------------------------------

### Python Example: Shift Left First N Elements

Source: https://github.com/cann/pyasc/blob/master/docs/python-api/language/generated/asc.language.basic.shift_left.md

Example demonstrating the `asc.shift_left` function in Python to perform a left shift operation on the first N elements of a tensor. This is useful for partial tensor operations.
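
As a mental model, the "first N elements" form can be pictured with ordinary Python lists. This is a simplified sketch, not the device behavior: `shift_left_first_n` is a hypothetical helper, it works on unbounded Python integers (so fixed-width overflow is not modeled), and it assumes elements past `count` in the destination are left unchanged.

```python
def shift_left_first_n(dst, src, scalar, count):
    """Return a copy of dst whose first `count` entries are src[i] << scalar."""
    out = list(dst)  # elements beyond `count` keep their previous values in this model
    for i in range(min(count, len(src), len(out))):
        out[i] = src[i] << scalar
    return out


print(shift_left_first_n([0, 0, 0, 0], [1, 2, 3, 4], scalar=2, count=2))
```

The corresponding API call is:
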
```python
asc.shift_left(dst, src, scalar, count=512)
```

--------------------------------

### Python Example: Shift Left with Mask (Continuous Mode)

Source: https://github.com/cann/pyasc/blob/master/docs/python-api/language/generated/asc.language.basic.shift_left.md

Example demonstrating the `asc.shift_left` function in Python for tensor high-dimensional splitting with a continuous mask mode. It specifies repeat times and stride parameters for iterative computation.

```python
mask = 128
scalar = 2
# repeat_times = 4: each iteration computes 128 elements, 512 in total
# dst_blk_stride, src_blk_stride = 1: data is read and written contiguously within an iteration
# dst_rep_stride, src_rep_stride = 8: data is read and written contiguously across adjacent iterations
params = asc.UnaryRepeatParams(1, 1, 8, 8)
asc.shift_left(dst, src, scalar, mask=mask, repeat_times=4, repeat_params=params)
```

--------------------------------

### Python Example for MultiCoreMatmulTiling Configuration

Source: https://github.com/cann/pyasc/blob/master/docs/python-api/lib/generated/asc.lib.host.MultiCoreMatmulTiling.set_single_range.md

This Python code demonstrates how to configure and use the MultiCoreMatmulTiling class, including setting the single core ranges. Ensure all necessary imports and platform initializations are done before calling this method.
```python
import asc.lib.host as host

ascendc_platform = host.get_ascendc_platform()
tiling = host.MultiCoreMatmulTiling(ascendc_platform)
tiling.set_dim(use_core_nums)
tiling.set_a_type(host.TPosition.GM, host.CubeFormat.ND, host.DataType.DT_FLOAT16)
tiling.set_b_type(host.TPosition.GM, host.CubeFormat.ND, host.DataType.DT_FLOAT16)
tiling.set_c_type(host.TPosition.GM, host.CubeFormat.ND, host.DataType.DT_FLOAT)
tiling.set_bias_type(host.TPosition.GM, host.CubeFormat.ND, host.DataType.DT_FLOAT)
tiling.set_shape(1024, 1024, 1024)
# Set the maximum and minimum values of single_core_m/single_core_n/single_core_k
tiling.set_single_range(1024, 1024, 1024, 1024, 1024, 1024)
tiling.set_org_shape(1024, 1024, 1024)
tiling.set_bias(True)
tiling.set_buffer_space(-1, -1, -1)
tiling_data = host.TCubeTiling()
ret = tiling.get_tiling(tiling_data)
```

--------------------------------

### ReduceMax API call - mask bitwise mode (Python)

Source: https://github.com/cann/pyasc/blob/master/docs/python-api/language/generated/asc.language.basic.reduce_max.md

Example Python call to the asc.reduce_max function for high-dimension split computation using a bitwise mask. This example demonstrates setting a full mask and requesting the index of the maximum value.

```Python
uint64_max = 2**64 - 1
mask = [uint64_max, uint64_max]
asc.reduce_max(dst, src, shared_tmp_buffer=shared_tmp, mask=mask, repeat_time=65, src_rep_stride=8, cal_index=True)
```

--------------------------------

### C++ Include Ordering Example

Source: https://github.com/cann/pyasc/blob/master/docs/codestyle.md

Include statements should be ordered by category: local project, MLIR/LLVM/Clang, and standard library. Within each category, sort alphabetically. Ensure empty lines separate these categories.
```cpp
#include "ascir/Dialect/EmitAsc/IR/EmitAsc.h"
#include "ascir/Target/Asc/Utils.h"

#include "mlir/IR/Builders.h"
#include "mlir/IR/DialectImplementation.h"
#include "llvm/ADT/TypeSwitch.h"

#include
#include
```

--------------------------------

### Python Example: Shift Left with Mask (Bit Mode)

Source: https://github.com/cann/pyasc/blob/master/docs/python-api/language/generated/asc.language.basic.shift_left.md

Example demonstrating the `asc.shift_left` function in Python for tensor high-dimensional splitting with a bitwise mask mode. It uses a list of uint64_max values for the mask and specifies repeat times and stride parameters.

```python
uint64_max = 2**64 - 1
mask = [uint64_max, uint64_max]
scalar = 2
# repeat_times = 4: each iteration computes 128 elements, 512 in total
# dst_blk_stride, src_blk_stride = 1: data is read and written contiguously within an iteration
# dst_rep_stride, src_rep_stride = 8: data is read and written contiguously across adjacent iterations
params = asc.UnaryRepeatParams(1, 1, 8, 8)
asc.shift_left(dst, src, scalar, mask=mask, repeat_times=4, repeat_params=params)
```

--------------------------------

### Python Example for Matmul.set_workspace Usage

Source: https://github.com/cann/pyasc/blob/master/docs/python-api/language/generated/asc.language.adv.Matmul.set_workspace.md

This Python code demonstrates the typical workflow for setting up and executing a matrix multiplication using Ascend C, including registering the operation, setting the workspace, input tensors, and bias, followed by iteration and retrieving the result tensor.
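
The retrieval loop in this workflow runs once per result block. The sketch below shows that count in isolation; `num_output_blocks` is a hypothetical helper, the sizes 1024, 128, and 256 are illustrative values rather than anything from the source, and exact divisibility is assumed.

```python
def num_output_blocks(single_core_m, single_core_n, base_m, base_n):
    """Count base_m x base_n blocks in a single_core_m x single_core_n result."""
    if single_core_m % base_m or single_core_n % base_n:
        raise ValueError("this sketch assumes the base sizes divide the single-core sizes")
    return (single_core_m // base_m) * (single_core_n // base_n)


print(num_output_blocks(1024, 1024, 128, 256))  # 8 row blocks * 4 column blocks
```

Python evaluates an unparenthesized `a // b * c // d` from left to right as `((a // b) * c) // d`, which matches the block count only when the sizes divide evenly; for example `4 // 2 * 3 // 2` is 3, while `(4 // 2) * (3 // 2)` is 2. The documented workflow is:
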
```python
asc.adv.register_matmul(pipe, workspace, mm, tiling)
mm.set_workspace(workspace_gm)
mm.set_tensor_a(gm_a)
mm.set_tensor_b(gm_b)
mm.set_bias(gm_bias)
mm.iterate(sync=True)
for i in range(single_core_m // base_m * single_core_n // base_n):
    mm.get_tensor_c(tensor=gm_c, sync=False)
```

--------------------------------

### Python Example for asc.adv.Sign Operator

Source: https://github.com/cann/pyasc/blob/master/docs/python-api/language/generated/asc.language.adv.sign.md

This Python code demonstrates how to call the asc.adv.Sign operator, including buffer initialization and tensor allocation. Ensure buffer_size is obtained from Host-side tiling parameters.

```python
pipe = asc.TPipe()
tmp_que = asc.TQue(asc.TPosition.VECCALC, 1)
pipe.init_buffer(que=tmp_que, num=1, len=buffer_size)  # buffer_size is obtained from the host-side tiling parameters
shared_tmp_buffer = tmp_que.alloc_tensor(asc.uint8)
# The input tensor holds 1024 elements of type half; 512 elements are actually computed
asc.adv.Sign(dst, src, count=512, temp_buffer=shared_tmp_buffer)
```

--------------------------------

### MatmulApiTiling Configuration Methods

Source: https://github.com/cann/pyasc/blob/master/docs/python-api/lib/host.md

This section covers methods for configuring matrix multiplication tiling, including setting bias, layout, data types, and batch information.

```APIDOC
## MatmulApiTiling.enable_bias

### Description
Sets whether Bias participates in the computation. The setting must be consistent with the Kernel side.

### Method
`enable_bias(self, is_bias_in)`

### Parameters
- **is_bias_in** (bool) - Description of the bias participation setting.

### Response Example
(No specific response example provided in the source text)
```

```APIDOC
## MatmulApiTiling.get_base_k

### Description
Retrieves the baseK value calculated by Tiling.

### Method
`get_base_k(self)`

### Response Example
(No specific response example provided in the source text)
```

```APIDOC
## MatmulApiTiling.get_base_m

### Description
Retrieves the baseM value calculated by Tiling.
### Method
`get_base_m(self)`

### Response Example
(No specific response example provided in the source text)
```

```APIDOC
## MatmulApiTiling.get_base_n

### Description
Retrieves the baseN value calculated by Tiling.

### Method
`get_base_n(self)`

### Response Example
(No specific response example provided in the source text)
```

```APIDOC
## MatmulApiTiling.get_tiling

### Description
Retrieves the Tiling parameters.

### Method
`get_tiling(self, tiling)`

### Parameters
- **tiling** - The tiling parameter to retrieve.

### Response Example
(No specific response example provided in the source text)
```

```APIDOC
## MatmulApiTiling.set_a_layout

### Description
Sets the layout axis information (B, S, N, G, and D axes) for matrix A. For the BSNGD, SBNGD, and BNGS1S2 layout formats, call this interface in the Host-side Tiling implementation before calling the IterateBatch interface.

### Method
`set_a_layout(self, b, s, n, g, d)`

### Parameters
- **b** - B axis setting for matrix A.
- **s** - S axis setting for matrix A.
- **n** - N axis setting for matrix A.
- **g** - G axis setting for matrix A.
- **d** - D axis setting for matrix A.

### Response Example
(No specific response example provided in the source text)
```

```APIDOC
## MatmulApiTiling.set_a_type

### Description
Sets the position, data format, data type, and transpose information for matrix A. This information must be consistent with the settings on the kernel side.

### Method
`set_a_type(self, pos, type, ...)`

### Parameters
- **pos** - The position of matrix A.
- **type** - The data type of matrix A.
- **...** - Additional parameters related to data format and transpose.

### Response Example
(No specific response example provided in the source text)
```

```APIDOC
## MatmulApiTiling.set_b_layout

### Description
Sets the layout axis information for matrix B, including the B, S, N, G, and D axes.
For the BSNGD, SBNGD, and BNGS1S2 layout formats, call this interface in the Host-side Tiling implementation before calling the IterateBatch interface.

### Method
`set_b_layout(self, b, s, n, g, d)`

### Parameters
- **b** - B axis setting for matrix B.
- **s** - S axis setting for matrix B.
- **n** - N axis setting for matrix B.
- **g** - G axis setting for matrix B.
- **d** - D axis setting for matrix B.

### Response Example
(No specific response example provided in the source text)
```

```APIDOC
## MatmulApiTiling.set_b_type

### Description
Sets the position, data format, data type, and transpose information for matrix B. This information must be consistent with the settings on the kernel side.

### Method
`set_b_type(self, pos, type, ...)`

### Parameters
- **pos** - The position of matrix B.
- **type** - The data type of matrix B.
- **...** - Additional parameters related to data format and transpose.

### Response Example
(No specific response example provided in the source text)
```

```APIDOC
## MatmulApiTiling.set_batch_info_for_normal

### Description
Sets the M/N/K axis information and the Batch counts for matrices A and B. For the NORMAL layout type, call this interface in the Host-side Tiling implementation before calling the IterateBatch or IterateNBatch interfaces.

### Method
`set_batch_info_for_normal(...)`

### Parameters
- **...** - Parameters for M/N/K axis information and Batch counts for matrices A/B.

### Response Example
(No specific response example provided in the source text)
```

```APIDOC
## MatmulApiTiling.set_batch_num

### Description
Sets the maximum batch count for multi-batch computation. The maximum batch count is the larger of batchA (matrix A) and batchB (matrix B).
This interface must be called in the Host-side Tiling implementation before calling the IterateBatch interface.

### Method
`set_batch_num(self, batch)`

### Parameters
- **batch** (int) - The maximum batch count for multi-batch computation.

### Response Example
(No specific response example provided in the source text)
```

```APIDOC
## MatmulApiTiling.set_bias_type

### Description
Sets the position, data format, and data type information for Bias. This information must be consistent with the settings on the kernel side.

### Method
`set_bias_type(self, pos, ...)`

### Parameters
- **pos** - The position of Bias.
- **...** - Additional parameters related to data format and type.

### Response Example
(No specific response example provided in the source text)
```

```APIDOC
## MatmulApiTiling.set_buffer_space

### Description
Sets the L1 Buffer, L0C Buffer, Unified Buffer, and BiasTable Buffer space available for Matmul computation, in bytes.

### Method
`set_buffer_space(self, ...)`

### Parameters
- **...** - Parameters for specifying buffer space sizes.

### Response Example
(No specific response example provided in the source text)
```

```APIDOC
## MatmulApiTiling.set_c_layout

### Description
Sets the layout axis information (B, S, N, G, and D axes) for matrix C. For the BSNGD, SBNGD, and BNGS1S2 layout formats, call this interface in the Host-side Tiling implementation before calling the IterateBatch interface.

### Method
`set_c_layout(self, b, s, n, g, d)`

### Parameters
- **b** - B axis setting for matrix C.
- **s** - S axis setting for matrix C.
- **n** - N axis setting for matrix C.
- **g** - G axis setting for matrix C.
- **d** - D axis setting for matrix C.
### Response Example
(No specific response example provided in the source text)
```

```APIDOC
## MatmulApiTiling.set_c_type

### Description
Sets the position, data format, data type, and transpose information for matrix C. This information needs to be consistent with the settings on the kernel side.

### Method
`set_c_type(self, pos, type, ...)`

### Parameters
- **pos** - The position of matrix C.
- **type** - The data type of matrix C.
- **...** - Additional parameters related to data format and transpose.

### Response Example
(No specific response example provided in the source text)
```

--------------------------------

### Unary Operator Syntax Example

Source: https://github.com/cann/pyasc/blob/master/docs/python_syntax_support.md

Illustrates the use of a unary operator (bitwise NOT).

```python
@asc.jit
def func_visit_unary_op(a):
    return a + ~1
```
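
To see what the expression in the unary operator example evaluates to, the following plain-Python sketch runs the same function body on ordinary integers, without the `@asc.jit` decorator. In Python, `~x` equals `-x - 1`, so `~1` is `-2` and the function returns `a - 2`. The name `func_plain_unary_op` is hypothetical, and it is an assumption (not confirmed by the source) that the JIT-compiled function follows the same semantics for every supported data type.

```python
def func_plain_unary_op(a):
    # Same body as func_visit_unary_op, minus the @asc.jit decorator (illustration only).
    # Unary ~ binds tighter than binary +, so this parses as a + (~1).
    return a + ~1


# Bitwise NOT on a Python int follows the identity ~x == -x - 1.
assert ~1 == -2

# Adding ~1 is therefore the same as subtracting 2.
print(func_plain_unary_op(5))  # prints 3
```

Because the constant operand is folded into `-2`, the example mainly exercises the parser's handling of a unary operator nested inside a binary expression.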