### Audio Transforms Setup

Source: https://github.com/mindspore-ai/docs/blob/master/tutorials/source_zh_cn/dataset/eager.ipynb

Imports the libraries needed for the audio transformation examples in Eager mode. In Eager mode, the audio transforms accept numpy.array as input.

```python
import numpy as np
import matplotlib.pyplot as plt
import scipy.io.wavfile as wavfile
from download import download

import mindspore.dataset as ds
import mindspore.dataset.audio as audio

ds.config.set_seed(5)
```

--------------------------------

### Ray Deployment - Master Node

Source: https://github.com/mindspore-ai/docs/blob/master/docs/vllm_mindspore/docs/source_zh_cn/user_guide/supported_features/parallel/parallel.md

Example command to start the master node of a distributed deployment with the Ray backend. This is simpler to set up than a multiprocess deployment because Ray manages the worker processes.

```bash
# Master node:
vllm-mindspore serve MindSpore-Lab/DeepSeek-R1-0528-A8W8 --trust-remote-code --max-num-seqs=256 --max-model-len=32768 --max-num-batched-tokens=4096 --block-size=128 --gpu-memory-utilization=0.9 --tensor-parallel-size 4 --data-parallel-size 4 --data-parallel-size-local 2 --enable-expert-parallel --additional-config '{"expert_parallel": 4}' --data-parallel-backend ray --quantization ascend
```

--------------------------------

### vLLM Service Startup Output

Source: https://github.com/mindspore-ai/docs/blob/master/docs/vllm_mindspore/docs/source_en/getting_started/tutorials/qwen2.5_32b_multiNPU/qwen2.5_32b_multiNPU.md

Example log output showing that the vLLM service has started successfully and is ready to accept requests.

```text
INFO: Started server process [6363]
INFO: Waiting for application startup.
INFO: Application startup complete.
```

--------------------------------

### Launch Multi-Node Qwen3 MOE with Expert Parallel using Ray

Source: https://github.com/mindspore-ai/docs/blob/master/docs/vllm_mindspore/docs/source_zh_cn/user_guide/supported_features/parallel/parallel.md

Example command for a two-node, four-GPU Qwen3 MOE service that uses Ray for multi-node expert parallel. A Ray environment must be set up first, and the data parallel backend is set to Ray.

```bash
vllm-mindspore serve /path/to/Qwen3-MOE --trust-remote-code --enable-expert-parallel --additional-config '{"expert_parallel": 8}' --data-parallel-backend ray
```

--------------------------------

### Run Distributed Training Script

Source: https://github.com/mindspore-ai/docs/blob/master/tutorials/source_en/parallel/multiple_mixed.md

Runs a distributed training script with the msrun startup method. This example starts an eight-card distributed training session.

```bash
bash run_sapp_mix_train.sh
```

--------------------------------

### Multiprocess Deployment - Master Node

Source: https://github.com/mindspore-ai/docs/blob/master/docs/vllm_mindspore/docs/source_zh_cn/user_guide/supported_features/parallel/parallel.md

Example command to start the master node of a multiprocess deployment. Set `data-parallel-address` and `data-parallel-rpc-port` to the actual values for your environment.
```bash
# Master node:
vllm-mindspore serve MindSpore-Lab/DeepSeek-R1-0528-A8W8 --trust-remote-code --max-num-seqs=256 --max-model-len=32768 --max-num-batched-tokens=4096 --block-size=128 --gpu-memory-utilization=0.9 --tensor-parallel-size 4 --data-parallel-size 4 --data-parallel-size-local 2 --data-parallel-start-rank 0 --data-parallel-address 127.0.0.1 --data-parallel-rpc-port 29550 --enable-expert-parallel --additional-config '{"expert_parallel": 4}' --quantization ascend
```

--------------------------------

### Launch Single-Node Qwen3 MOE with Expert Parallel

Source: https://github.com/mindspore-ai/docs/blob/master/docs/vllm_mindspore/docs/source_zh_cn/user_guide/supported_features/parallel/parallel.md

Example command that starts a single-node, eight-GPU Qwen3 MOE service with expert parallel enabled. The command sets the model path, enables trusting remote code, and sets the expert parallel size to 8.

```bash
vllm-mindspore serve /path/to/Qwen3-MOE --trust-remote-code --enable-expert-parallel --additional-config '{"expert_parallel": 8}'
```

--------------------------------

### Install CANN Toolkit

Source: https://github.com/mindspore-ai/docs/blob/master/tutorials/source_en/orange_pi/environment_setup.md

Runs the CANN toolkit installer. When prompted, type 'Y' and press Enter to confirm. Installation can take 10-15 minutes.

```bash
(base) root@orangepiaipro:/home/HwHiAiUser/Downloads# ./Ascend-cann-toolkit_8.3.RC1_linux-aarch64.run --install
```

--------------------------------

### Directory Structure for Startup Method Example

Source: https://github.com/mindspore-ai/docs/blob/master/tutorials/source_en/parallel/mpirun.md

Typical directory layout for a project launched with mpirun. It contains the main script, the hostfile, and the execution scripts.

```text
└─ sample_code
    ├─ startup_method
       ├── net.py
       ├── hostfile
       ├── run_mpirun_1.sh
       ├── run_mpirun_2.sh
    ...
```

--------------------------------

### Check scikit-learn Installation

Source: https://github.com/mindspore-ai/docs/blob/master/docs/mindquantum/docs/source_en/case_library/classification_of_iris_by_qnn.ipynb

Use this command to check whether the scikit-learn library is already installed on your system. No setup is required.

```bash
pip show scikit-learn
```

--------------------------------

### Compile and Install from Source

Source: https://github.com/mindspore-ai/docs/blob/master/docs/mindformers/docs/source_en/guide/evaluation.md

Clone the repository, change into its directory, and install from source. This method is useful for debugging and analysis.

```bash
git clone --depth 1 -b v0.4.4 https://github.com/EleutherAI/lm-evaluation-harness
cd lm-evaluation-harness
pip install -e .
```

--------------------------------

### Configure CANN Environment Variables

Source: https://github.com/mindspore-ai/docs/blob/master/docs/vllm_mindspore/docs/source_en/getting_started/installation/installation.md

Sets the default CANN installation path and sources the environment setup script. This step is required after installing CANN.

```bash
LOCAL_ASCEND=/usr/local/Ascend # the root directory of run package
source ${LOCAL_ASCEND}/ascend-toolkit/set_env.sh
export ASCEND_CUSTOM_PATH=${LOCAL_ASCEND}/ascend-toolkit
```

--------------------------------

### Directory Structure for Startup Method Samples

Source: https://github.com/mindspore-ai/docs/blob/master/tutorials/source_en/parallel/rank_table.md

Directory structure of the sample code for the rank table startup method. It contains the Python network definition script, shell scripts for execution, and JSON rank table configurations for different card counts and cluster setups.
```text
└─ sample_code
    ├─ startup_method
       ├── net.py
       ├── rank_table_8pcs.json
       ├── rank_table_16pcs.json
       ├── rank_table_cross_cluster_16pcs.json
       ├── run_rank_table.sh
       ├── run_rank_table_cluster.sh
       ├── run_rank_table_cross_cluster.sh
    ...
```

--------------------------------

### Initialize Ascend Backend

Source: https://github.com/mindspore-ai/docs/blob/master/docs/lite/cloud_docs/source_en/mindir/runtime_java.md

Configures the Ascend backend for execution by initializing an MSContext and adding Ascend device info to it. Use this when running inference on Ascend NPUs.

```java
MSContext context = new MSContext();
context.init();
context.addDeviceInfo(DeviceType.DT_ASCEND, false, 0);
```

--------------------------------

### Tensor Storage Offset

Source: https://github.com/mindspore-ai/docs/blob/master/tutorials/source_en/custom_program/operation/cpp_api_for_custom_ops.md

Gets the offset from the start of the tensor's storage.

````APIDOC
## GET /tensor/storage_offset

### Description
Retrieves the storage offset of the tensor.

### Method
GET

### Endpoint
/tensor/storage_offset

### Parameters
None

### Request Example
None

### Response
#### Success Response (200)
- **storage_offset** (integer) - The offset from the start of storage (in terms of elements).

#### Response Example
```json
{
  "storage_offset": 0
}
```
````

--------------------------------

### Start Online Inference (Single-Card)

Source: https://github.com/mindspore-ai/docs/blob/master/docs/vllm_mindspore/docs/source_en/user_guide/supported_features/benchmark/benchmark.md

Starts single-card online inference for a given model with the vllm-mindspore serve command. The environment variable VLLM_MS_MODEL_BACKEND must be set.
```bash
vllm-mindspore serve Qwen/Qwen2.5-7B-Instruct --device auto --disable-log-requests
```

--------------------------------

### Setup Loss Function and Optimizer

Source: https://github.com/mindspore-ai/docs/blob/master/docs/mindflow/docs/source_en/data_driven/2D_steady.ipynb

Prepares the loss scaler for Ascend, initializes WaveletTransformLoss, and creates the Adam optimizer. The learning rate schedule comes from get_warmup_cosine_annealing_lr.

```python
# prepare loss scaler
if use_ascend:
    from mindspore.amp import DynamicLossScaler, all_finite, init_status
    loss_scaler = DynamicLossScaler(1024, 2, 100)
else:
    loss_scaler = None

steps_per_epoch = train_dataset.get_dataset_size()
wave_loss = WaveletTransformLoss(wave_level=optimizer_params['wave_level'])
problem = SteadyFlowWithLoss(model, loss_fn=wave_loss)

# prepare optimizer
epochs = optimizer_params["epochs"]
lr = get_warmup_cosine_annealing_lr(lr_init=optimizer_params["lr"],
                                    last_epoch=epochs,
                                    steps_per_epoch=steps_per_epoch,
                                    warmup_epochs=1)
optimizer = nn.Adam(model.trainable_params() + wave_loss.trainable_params(),
                    learning_rate=Tensor(lr))
```

--------------------------------

### Successful Installation Output Example

Source: https://github.com/mindspore-ai/docs/blob/master/docs/mindformers/docs/source_en/installation.md

This output means that all checks passed and the environment is set up correctly for MindSpore Transformers.

```text
- INFO - All checks passed, used **** seconds, the environment is correctly set up!
```

--------------------------------

### Execute MindIE Startup Script

Source: https://github.com/mindspore-ai/docs/blob/master/docs/mindformers/docs/source_en/guide/deployment.md

Starts MindIE with the default or a specified model configuration. Run it from the scripts directory.
```shell
cd ./scripts
bash run_mindie.sh --model-name qwen1_5_72b --model-path /path/to/mf_model/qwen1_5_72b
```

--------------------------------

### Complete Custom Operator Usage Example

Source: https://github.com/mindspore-ai/docs/blob/master/tutorials/source_en/custom_program/operation/op_custom_prim.rst

Complete setup for a custom sine operator using the pyfunc pattern: context configuration, function definitions, operator creation, and tensor computation.

```python
import numpy as np
import mindspore as ms
from mindspore import ops

ms.set_context(mode=ms.GRAPH_MODE)
ms.set_device(device_target="CPU")

def sin_by_numpy(x):
    return np.sin(x)

def infer_shape(x):
    return x

def infer_dtype(x):
    return x

sin_by_numpy_op = ops.Custom(func=sin_by_numpy,
                             out_shape=infer_shape,
                             out_dtype=infer_dtype,
                             func_type="pyfunc")
input_tensor = ms.Tensor([0, 1, 0.2, 0.3, 0.4], dtype=ms.float32)
result_cus = sin_by_numpy_op(input_tensor)
print(result_cus)
```

--------------------------------

### Start Scheduler Process on Node 1

Source: https://github.com/mindspore-ai/docs/blob/master/tutorials/source_en/parallel/dynamic_cluster.md

Sets the environment variables for the scheduler and then starts the training script. Make sure the scheduler IP address and port are set correctly.
```bash
export MS_WORKER_NUM=8                 # Set the total number of Worker processes in the cluster to 8 (including other node processes)
export MS_SCHED_HOST=                  # Set the Scheduler IP address to the Node 1 IP address
export MS_SCHED_PORT=8118              # Set the Scheduler port
export MS_ROLE=MS_SCHED                # Set the startup process to the MS_SCHED role
python ./net.py > device/scheduler.log 2>&1 &   # Start training script
```

--------------------------------

### Start Online Inference (Multi-Card)

Source: https://github.com/mindspore-ai/docs/blob/master/docs/vllm_mindspore/docs/source_en/user_guide/supported_features/benchmark/benchmark.md

Starts multi-card online inference with the vllm-mindspore serve command, specifying the tensor parallel size and the maximum model length. The --trust_remote_code flag may be required.

```bash
vllm-mindspore serve Qwen/Qwen2.5-32B-Instruct --trust_remote_code --tensor-parallel-size $TENSOR_PARALLEL_SIZE --max-model-len $MAX_MODEL_LEN
```

--------------------------------

### Start Multi-Machine Training with mpirun --hostfile

Source: https://github.com/mindspore-ai/docs/blob/master/tutorials/source_en/parallel/mpirun.md

Launches distributed training with a hostfile that describes the machine configuration. DATA_PATH must be set, and the hostfile path is passed as the first argument.

```bash
export DATA_PATH=./MNIST_Data/train/
HOSTFILE=$1
mpirun -n 16 --hostfile $HOSTFILE --output-filename log_output --merge-stderr-to-stdout python net.py
```

--------------------------------

### Import Common MindSpore Packages

Source: https://github.com/mindspore-ai/docs/blob/master/tutorials/source_en/beginner/quick_start.md

Imports the packages and interfaces used by the MindSpore examples. When running as a notebook, make sure the `download` package is installed.
```python
import mindspore
from mindspore import nn
from mindspore.dataset import vision, transforms
from mindspore.dataset import MnistDataset
```

--------------------------------

### Initial Image Transformation Setup

Source: https://github.com/mindspore-ai/docs/blob/master/docs/mindspore/source_zh_cn/faq/data_processing.md

Initial setup for image transformations. It adds a ToPIL operation first so that the input is converted to a PIL Image when it is not one already.

```python
transforms = [
    py_vision.ToPIL(),
    py_vision.CenterCrop(375),
    py_vision.ToTensor()
]
transform = mindspore.dataset.transforms.Compose(transforms)
data1 = data1.map(operations=decode_op, input_columns=["image"])
data1 = data1.map(operations=transform, input_columns=["image"])
```

--------------------------------

### MindSpore SequentialSampler Example

Source: https://github.com/mindspore-ai/docs/blob/master/docs/mindspore/source_en/note/api_mapping/pytorch_diff/SequentialSampler.md

Sequential sampling with MindSpore's SequentialSampler, a custom dataset, and GeneratorDataset. Requires MindSpore.

```python
import mindspore as ms
from mindspore.dataset import SequentialSampler

class MyMapDataset():
    def __init__(self):
        super().__init__()
        self.data = [i for i in range(4)]

    def __getitem__(self, index):
        return self.data[index]

    def __len__(self):
        return len(self.data)

ds = MyMapDataset()
sampler = SequentialSampler()
dataloader = ms.dataset.GeneratorDataset(ds, column_names=["data"], sampler=sampler)

for data in dataloader:
    print(data)
# Out:
# [Tensor(shape=[], dtype=Int64, value= 0)]
# [Tensor(shape=[], dtype=Int64, value= 1)]
# [Tensor(shape=[], dtype=Int64, value= 2)]
# [Tensor(shape=[], dtype=Int64, value= 3)]
```

--------------------------------

### SSD Model Training Setup

Source: https://github.com/mindspore-ai/docs/blob/master/tutorials/source_zh_cn/cv/ssd.ipynb

Sets up data loading, the network definition, the learning rate, and the optimizer for SSD model training.
This snippet shows the initial configuration for training an SSD model.

```python
import time
from mindspore.amp import DynamicLossScaler

set_seed(1)

# load data
mindrecord_dir = "./datasets/MindRecord_COCO"
mindrecord_file = "./datasets/MindRecord_COCO/ssd.mindrecord0"

dataset = create_ssd_dataset(mindrecord_file, batch_size=5, rank=0, use_multiprocessing=True)

dataset_size = dataset.get_dataset_size()

image, get_loc, gt_label, num_matched_boxes = next(dataset.create_tuple_iterator())

# Network definition and initialization
network = SSD300Vgg16()
init_net_param(network)

# Define the learning rate
lr = Tensor(get_lr(global_step=0 * dataset_size,
                   lr_init=0.001, lr_end=0.001 * 0.05, lr_max=0.05,
                   warmup_epochs=2, total_epochs=60, steps_per_epoch=dataset_size))

# Define the optimizer
opt = nn.Momentum(filter(lambda x: x.requires_grad, network.get_parameters()), lr,
                  0.9, 0.00015, float(1024))
```

--------------------------------

### Single-Device Fine-tuning Example

Source: https://github.com/mindspore-ai/docs/blob/master/docs/mindformers/docs/source_en/feature/start_tasks.md

Runs single-device fine-tuning. Make sure the configuration and dataset paths are correct.

```shell
python run_mindformer.py \
 --register_path research/qwen2_5 \
 --config research/qwen2_5/finetune_qwen2_5_0_5b_8k.yaml \
 --use_parallel False \
 --run_mode finetune \
 --train_dataset_dir ./path/alpaca-data.mindrecord
```

--------------------------------

### Compile and Install MindSpore Quantum from Source

Source: https://github.com/mindspore-ai/docs/blob/master/docs/mindquantum/docs/source_en/mindquantum_install.md

Compiles and installs MindSpore Quantum locally after you clone the source code.
```bash
cd ~/mindquantum
python setup.py install --user
```

--------------------------------

### PyTorch SequentialSampler Example

Source: https://github.com/mindspore-ai/docs/blob/master/docs/mindspore/source_en/note/api_mapping/pytorch_diff/SequentialSampler.md

Sequential sampling with PyTorch's SequentialSampler, a custom dataset, and DataLoader. Requires PyTorch.

```python
import torch
from torch.utils.data import SequentialSampler

class MyMapDataset(torch.utils.data.Dataset):
    def __init__(self):
        super().__init__()
        self.data = [i for i in range(4)]

    def __getitem__(self, index):
        return self.data[index]

    def __len__(self):
        return len(self.data)

ds = MyMapDataset()
sampler = SequentialSampler(ds)
dataloader = torch.utils.data.DataLoader(ds, sampler=sampler)

for data in dataloader:
    print(data)
# Out:
# tensor([0])
# tensor([1])
# tensor([2])
# tensor([3])
```

--------------------------------

### TensorBoard Service Startup Message

Source: https://github.com/mindspore-ai/docs/blob/master/docs/mindformers/docs/source_en/feature/monitor.md

This message means that TensorBoard has started successfully. The version number, host, and port depend on your installation and configuration.

```shell
TensorBoard 2.18.0 at http://0.0.0.0:6006/ (Press CTRL+C to quit)
```

--------------------------------

### Initialize Environment Variables for MindIE and MindSpore

Source: https://github.com/mindspore-ai/docs/blob/master/docs/mindformers/docs/source_en/guide/deployment.md

Source these scripts to set the environment variables required by Ascend, MindIE, and MindSpore. Adjust the paths if the packages are not installed in the default locations. If other MindIE services are running, watch for port conflicts on MS_SCHED_PORT.
```bash
# Ascend
source /usr/local/Ascend/ascend-toolkit/set_env.sh
# MindIE
source /usr/local/Ascend/mindie/latest/mindie-llm/set_env.sh
source /usr/local/Ascend/mindie/latest/mindie-service/set_env.sh
# MindSpore
export LCAL_IF_PORT=8129
# Networking Configuration
export MS_SCHED_HOST=127.0.0.1  # scheduler node IP address
export MS_SCHED_PORT=8090       # Scheduler node service port
```

--------------------------------

### PyTorch Inference with NPU

Source: https://github.com/mindspore-ai/docs/blob/master/docs/mindformers/docs/source_zh_cn/example/finetune_with_glm4/finetune_with_glm4.md

Loads a model and runs inference with PyTorch on an NPU device. The PyTorch NPU adapter (torch_npu) must be installed.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
import torch_npu  # Import the PyTorch NPU adapter library

# Load the model and tokenizer
model_name = "/path/to/model"
device = torch.device("npu:0")
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True).half().to(device)

# Set the model to evaluation mode
model.eval()

# Input text
input_text = "The future development of artificial intelligence"

# Encode the input
input_ids = tokenizer.encode(input_text, return_tensors="pt").to(model.device)

with torch.no_grad():
    output = model.generate(
        input_ids,
        max_length=100,          # Maximum generation length
        num_return_sequences=1,  # Number of sequences to return
        no_repeat_ngram_size=2,  # Avoid repeating n-grams
        # early_stopping=True    # Stop early
    )

# Decode the output
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
print("Generated text:")
print(generated_text)
```

--------------------------------

### Muon Optimizer Configuration

Source: https://github.com/mindspore-ai/docs/blob/master/docs/mindformers/docs/source_en/feature/training_hyperparameters.md

Example YAML configuration for the Muon optimizer. This configuration suits large-scale deep learning tasks, particularly LLM training.
```yaml
optimizer:
  type: Muon
  adamw_betas: [0.9, 0.95]
  adamw_eps: 1.e-8
  weight_decay: 0.01
  matched_adamw_rms: 0.2
  qk_clip_threshold: 100
```

--------------------------------

### PyTorch RandomRotation Example

Source: https://github.com/mindspore-ai/docs/blob/master/docs/mindspore/source_en/note/api_mapping/pytorch_diff/RandomRotation.md

Applies a random rotation to an image with PyTorch's torchvision.transforms.RandomRotation and prints the output size. Requires PyTorch and torchvision.

```python
from download import download
from PIL import Image

url = "https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/notebook/datasets/flamingos.jpg"
download(url, './flamingos.jpg', replace=True)
orig_img = Image.open('flamingos.jpg')

# PyTorch
import torch
import torchvision.transforms as T

torch.manual_seed(1)
affine_transfomer = T.RandomRotation(degrees=(30, 70), center=(0, 0))
img_torch = affine_transfomer(orig_img)
print(img_torch.size)
# Out: (471, 292)
```

--------------------------------

### Launch Training with SWAP Configuration

Source: https://github.com/mindspore-ai/docs/blob/master/docs/mindformers/docs/source_zh_cn/feature/memory_optimization.md

Runs the training script with msrun_launcher.sh, passing the path to the custom YAML configuration file. Be sure to replace '' with the IP address of your local environment.

```bash
export GLOG_v=1
export MS_MEMORY_STATISTIC=1
YAML_FILE=$1  # User-specified YAML file path
ROOT_PATH=`pwd`

bash ./scripts/msrun_launcher.sh "run_mindformer.py \
 --config ${ROOT_PATH}/${YAML_FILE} \
 --run_mode train \
 --use_parallel True" \
 8 8 8118 0 output/msrun False 300
```

--------------------------------

### Instantiate and Get Dataset

Source: https://github.com/mindspore-ai/docs/blob/master/tutorials/source_zh_cn/cv/fcn8s.ipynb

Defines the dataset creation and model training parameters, then instantiates the SegDataset class and retrieves the processed dataset for a segmentation task.
```python
# Define the parameters for creating the dataset
IMAGE_MEAN = [103.53, 116.28, 123.675]
IMAGE_STD = [57.375, 57.120, 58.395]
DATA_FILE = "dataset/dataset_fcn8s/mindname.mindrecord"

# Define the model training parameters
train_batch_size = 4
crop_size = 512
min_scale = 0.5
max_scale = 2.0
ignore_label = 255
num_classes = 21

# Instantiate the Dataset
dataset = SegDataset(image_mean=IMAGE_MEAN,
                     image_std=IMAGE_STD,
                     data_file=DATA_FILE,
                     batch_size=train_batch_size,
                     crop_size=crop_size,
                     max_scale=max_scale,
                     min_scale=min_scale,
                     ignore_label=ignore_label,
                     num_classes=num_classes,
                     num_readers=2,
                     num_parallel_calls=4)

dataset = dataset.get_dataset()
```

--------------------------------

### Start MindIE with One-Click Script

Source: https://github.com/mindspore-ai/docs/blob/master/docs/mindformers/docs/source_en/guide/deployment.md

Runs the MindIE startup script from the scripts directory. Provide the model name and the model path.

```shell
cd ./scripts
bash run_mindie.sh --model-name xxx --model-path /path/to/model
```

--------------------------------

### Incorrect Core Binding Configuration Example

Source: https://github.com/mindspore-ai/docs/blob/master/tutorials/source_en/parallel/msrun_launcher.md

This example is incorrect because it omits the configurations for the scheduler and device0, which can lead to cores being bound incorrectly.

```bash
# wrong example
--bind_core='{"device1":["20-29", "40-49"]}'
```

--------------------------------

### Run Distributed Inference Example

Source: https://github.com/mindspore-ai/docs/blob/master/docs/lite/cloud_docs/source_en/mindir/runtime_distributed_cpp.md

Starts distributed inference by running the compiled program once per rank. This example launches the executable for the Ascend backend, passing the MindIR model path, rank ID, device ID, and a configuration file, and runs the processes in the background from the shell.
```bash
# for Ascend, run the executable file for each rank using shell commands
./build/ascend_ge_distributed /your/path/to/Matmul0.mindir 0 0 ./config_file.ini &
./build/ascend_ge_distributed /your/path/to/Matmul1.mindir 1 1 ./config_file.ini
```

--------------------------------

### Setup Optimizer and Loss Function

Source: https://github.com/mindspore-ai/docs/blob/master/docs/mindflow/docs/source_en/data_driven/burgers_KNO1D.ipynb

Configures training by defining the learning rate schedule, the optimizer, and the loss function. On Ascend hardware, it also enables mixed precision training.

```python
train_size = train_dataset.get_dataset_size()
eval_size = eval_dataset.get_dataset_size()

lr = get_warmup_cosine_annealing_lr(lr_init=optimizer_params["lr"],
                                    last_epoch=optimizer_params["epochs"],
                                    steps_per_epoch=train_size,
                                    warmup_epochs=1)
optimizer = nn.AdamWeightDecay(model.trainable_params(),
                               learning_rate=Tensor(lr),
                               weight_decay=optimizer_params["weight_decay"])
model.set_train()
loss_fn = MSELoss()

if use_ascend:
    from mindspore.amp import DynamicLossScaler, auto_mixed_precision, all_finite
    loss_scaler = DynamicLossScaler(1024, 2, 100)
    auto_mixed_precision(model, 'O3')
else:
    loss_scaler = None
```

--------------------------------

### Load Configuration and Setup Logging

Source: https://github.com/mindspore-ai/docs/blob/master/docs/mindflow/docs/source_en/data_mechanism_fusion/phympgn.ipynb

Loads the configuration from a YAML file and sets up logging for training or inference. Make sure the configuration file path is correct, and specify whether the run is for training.
```python
from mindflow.utils import log_config, load_yaml_config, print_log
from easydict import EasyDict
import os.path as osp
from pathlib import Path


def load_config(config_file_path, train):
    config = load_yaml_config(config_file_path)
    config['train'] = train
    config = EasyDict(config)

    log_dir = './logs'
    if train:
        log_file = f'phympgn-{config.experiment_name}'
    else:
        log_file = f'phympgn-{config.experiment_name}-te'
    # Make sure the log directory exists before creating the log file
    Path(log_dir).mkdir(parents=True, exist_ok=True)
    if not osp.exists(osp.join(log_dir, f'{log_file}.log')):
        Path(osp.join(log_dir, f'{log_file}.log')).touch()
    log_config(log_dir, log_file)
    print_log(config)
    return config
```

```python
config_file_path = 'yamls/train.yaml'
config = load_config(config_file_path=config_file_path, train=True)
```

--------------------------------

### Create a Noisy Simulator with ChannelAdder

Source: https://github.com/mindspore-ai/docs/blob/master/docs/mindquantum/docs/source_en/middle_level/noise_simulator.ipynb

Shows how to build a Simulator on a NoiseBackend with a sequence of channel adders so that a circuit is sampled under noise. The first block samples the same circuit on a noiseless simulator for comparison.

```python
from IPython.display import display, display_svg
from mindquantum.core.circuit import Circuit
from mindquantum.simulator import Simulator
from mindquantum.simulator.noise import NoiseBackend

noiseless_sim = Simulator('mqvector', 2)
noiseless_circ = Circuit().h(0).rx(1.0, 1).z(1, 0).measure(1)
display_svg(noiseless_circ.svg())
res1 = noiseless_sim.sampling(noiseless_circ, shots=10000)
display(res1.svg())
```

```python
# seq_adder is the sequence of channel adders built earlier in the tutorial
noise_sim = Simulator(NoiseBackend('mqvector', 2, seq_adder))
res2 = noise_sim.sampling(noiseless_circ, shots=10000)
display(res2.svg())
display(noise_sim.backend.transform_circ(noiseless_circ).svg())
```

--------------------------------

### Run vLLM-MindSpore with Ray

Source: https://github.com/mindspore-ai/docs/blob/master/docs/vllm_mindspore/docs/source_zh_cn/getting_started/tutorials/deepseek_parallel/deepseek_r1_671b_w8a8_dp4_tp4_ep4.md

Example command that starts the vLLM-MindSpore inference service with Ray for distributed execution. The command specifies the model, the parallelism settings, and the quantization method.
```bash
vllm-mindspore serve MindSpore-Lab/DeepSeek-R1-0528-A8W8 --trust-remote-code --max-num-seqs=256 --max-model-len=32768 --max-num-batched-tokens=4096 --block-size=128 --gpu-memory-utilization=0.9 --tensor-parallel-size 4 --data-parallel-size 4 --data-parallel-size-local 2 --enable-expert-parallel --additional-config '{"expert_parallel": 4}' --data-parallel-backend ray --quantization ascend
```

--------------------------------

### Configure and Create Datasets

Source: https://github.com/mindspore-ai/docs/blob/master/docs/mindflow/docs/source_zh_cn/features/solve_pinns_by_mindflow.ipynb

Defines the configuration for the geometry and sampling, then creates the training and test datasets. The training dataset is batched for efficient processing.

```python
in_disk = {"name": "in_disk", "center_x": 0.0, "center_y": 0.0, "radius": 1.0}
out_disk = {"name": "out_disk", "center_x": 0.0, "center_y": 0.0, "radius": 2.0}
domain = {"size": 8192, "random_sampling": True, "sampler": "uniform"}
BC = {"size": 8192, "random_sampling": True, "sampler": "uniform", "with_normal": True}
data = {"domain": domain, "BC": BC}
config = {"in_disk": in_disk, "out_disk": out_disk, "data": data}

# create training dataset
dataset = create_training_dataset(config)
train_dataset = dataset.batch(batch_size=8192)

# create test dataset
inputs, label = create_test_dataset(config)
```

--------------------------------

### Instantiate Networks with Duplicate Parameters

Source: https://github.com/mindspore-ai/docs/blob/master/docs/mindspore/source_en/faq/network_compilation.md

Instantiating the same network (InnerNet) more than once, where each instance holds a parameter with the same name ('name_a'), raises an error. This example reproduces the problematic setup.
```python
import mindspore as ms
import mindspore.nn as nn
from mindspore import Tensor, context, ParameterTuple, Parameter

context.set_context(mode=context.GRAPH_MODE)


class InnerNet(nn.Cell):
    def __init__(self):
        super(InnerNet, self).__init__()
        self.param = Parameter(Tensor([1], ms.float32), name="name_a")

    def construct(self, x):
        return x + self.param


class OutNet1(nn.Cell):
    def __init__(self, net1, net2):
        super(OutNet1, self).__init__()
        self.param1 = ParameterTuple(net1.get_parameters())
        self.param2 = ParameterTuple(net2.get_parameters())

    def construct(self, x):
        return x + self.param1[0] + self.param2[0]


net1 = InnerNet()
net2 = InnerNet()
out_net = OutNet1(net1, net2)
res = out_net(Tensor([1], ms.float32))
print("res:", res)
```

--------------------------------

### Initiating Training Tasks with msrun_launcher.sh

Source: https://github.com/mindspore-ai/docs/blob/master/docs/mindformers/docs/source_en/feature/safetensors.md

Launches distributed training with the msrun_launcher.sh script. The example shows the commands for both the master node and the sub-node.

```shell
# The first server (master node)
bash scripts/msrun_launcher.sh "run_mindformer.py \
 --config {CONFIG_PATH} \
 --run_mode train" \
 16 8 ${ip} ${port} 0 output/msrun_log False 300

# The second server (sub-node)
bash scripts/msrun_launcher.sh "run_mindformer.py \
 --config {CONFIG_PATH} \
 --run_mode train" \
 16 8 ${ip} ${port} 1 output/msrun_log False 300
```

--------------------------------

### Compile MindSpore Lite C++ Runtime

Source: https://github.com/mindspore-ai/docs/blob/master/docs/lite/cloud_docs/source_en/mindir/runtime_cpp.md

Compiles the C++ runtime with CMake and Make. Set the environment variables first, as described in the quick start guide.
```bash
mkdir build && cd build
cmake ../
make
```

--------------------------------

### MindSpore RandomRotation Example

Source: https://github.com/mindspore-ai/docs/blob/master/docs/mindspore/source_en/note/api_mapping/pytorch_diff/RandomRotation.md

Applies a random rotation to an image with MindSpore's mindspore.dataset.vision.RandomRotation and prints the output size. Requires MindSpore, and reuses `orig_img` loaded in the PyTorch example.

```python
import mindspore as ms
import mindspore.dataset.vision as vision

ms.dataset.config.set_seed(2)
affine_transfomer = vision.RandomRotation(degrees=(30, 70), center=(0, 0))
img_ms = affine_transfomer(orig_img)
print(img_ms.size)
# Out: (471, 292)
```

--------------------------------

### Directory Setup and Model Training Mode

Source: https://github.com/mindspore-ai/docs/blob/master/tutorials/source_zh_cn/generative/gan.ipynb

Creates the directories for saving checkpoints and images, then sets both the generator and the discriminator to training mode.

```python
# Set the path for saving parameters
os.makedirs(checkpoints_path, exist_ok=True)
# Set the path for saving images generated during training
os.makedirs(image_path, exist_ok=True)

net_g.set_train()
net_d.set_train()
```

--------------------------------

### Multi-Machine 2-Node 8-Card Training (Node 2)

Source: https://github.com/mindspore-ai/docs/blob/master/tutorials/source_en/parallel/msrun_launcher.md

Run this script on the second node (rank 1) of a multi-machine setup. It connects to the specified master address and starts four local worker processes.

```bash
EXEC_PATH=$(pwd)
if [ ! -d "${EXEC_PATH}/MNIST_Data" ]; then
    if [ ! -f "${EXEC_PATH}/MNIST_Data.zip" ]; then
        wget http://mindspore-website.obs.cn-north-4.myhuaweicloud.com/notebook/datasets/MNIST_Data.zip
    fi
    unzip MNIST_Data.zip
fi
export DATA_PATH=${EXEC_PATH}/MNIST_Data/train/

rm -rf msrun_log
mkdir msrun_log
echo "start training"

msrun --worker_num=8 --local_worker_num=4 --master_addr= --master_port=8118 --node_rank=1 --log_dir=msrun_log --join=True --cluster_time_out=300 net.py
```

--------------------------------

### Multiprocess Deployment - Worker Node

Source: https://github.com/mindspore-ai/docs/blob/master/docs/vllm_mindspore/docs/source_zh_cn/user_guide/supported_features/parallel/parallel.md

Example command to start a worker node in a multiprocess deployment. Worker nodes use the `--headless` flag. Make sure `data-parallel-address` and `data-parallel-rpc-port` match the values on the master node.
```bash
# Worker node:
vllm-mindspore serve MindSpore-Lab/DeepSeek-R1-0528-A8W8 \
  --trust-remote-code \
  --max-num-seqs=256 \
  --max-model-len=32768 \
  --max-num-batched-tokens=4096 \
  --block-size=128 \
  --gpu-memory-utilization=0.9 \
  --headless \
  --tensor-parallel-size 4 \
  --data-parallel-size 4 \
  --data-parallel-size-local 2 \
  --data-parallel-start-rank 2 \
  --data-parallel-address 127.0.0.1 \
  --data-parallel-rpc-port 29550 \
  --enable-expert-parallel \
  --additional-config '{"expert_parallel": 4}' \
  --quantization ascend
```

--------------------------------

### Start TensorBoard Web Visualization Service

Source: https://github.com/mindspore-ai/docs/blob/master/docs/mindformers/docs/source_en/feature/monitor.md

Use this command to launch the TensorBoard service. `--logdir` must point to the directory where the TensorBoard event files are saved. Set `--host` and `--port` to suit your network environment.

```bash
tensorboard --logdir=./worker/tensorboard/ --host=0.0.0.0 --port=6006
```

--------------------------------

### Evaluation Log Output

Source: https://github.com/mindspore-ai/docs/blob/master/docs/mindearth/docs/source_en/medium-range/graphcast.ipynb

Example log output covering the start and end of the evaluation process, including the test dataset size and the RMSE and ACC (anomaly correlation coefficient) of Z500, T2m, T850 and U10 at forecast lead times of 6, 72 and 120 hours.
```text
2023-08-18 08:39:45,287 - forecast.py[line:204] - INFO: ================================Start Evaluation================================
2023-08-18 08:40:41,617 - forecast.py[line:222] - INFO: test dataset size: 8
2023-08-18 08:40:41,621 - forecast.py[line:173] - INFO: t = 6 hour:
2023-08-18 08:40:41,622 - forecast.py[line:183] - INFO: RMSE of Z500: 99.63094225503905, T2m: 1.7807282244821678, T850: 1.1389199313716583, U10: 1.3300655484052706
2023-08-18 08:40:41,623 - forecast.py[line:184] - INFO: ACC of Z500: 0.9995030164718628, T2m: 0.9949086904525757, T850: 0.9965617656707764, U10: 0.9709641933441162
2023-08-18 08:40:41,624 - forecast.py[line:173] - INFO: t = 72 hour:
2023-08-18 08:40:41,625 - forecast.py[line:183] - INFO: RMSE of Z500: 846.2669541832905, T2m: 5.095069601461138, T850: 4.291456435667611, U10: 5.033789250954006
2023-08-18 08:40:41,627 - forecast.py[line:184] - INFO: ACC of Z500: 0.9656049013137817, T2m: 0.9600029587745667, T850: 0.9581822752952576, U10: 0.5923701524734497
2023-08-18 08:40:41,628 - forecast.py[line:173] - INFO: t = 120 hour:
2023-08-18 08:40:41,629 - forecast.py[line:183] - INFO: RMSE of Z500: 1289.3497973601945, T2m: 7.078691998772932, T850: 5.762323874978418, U10: 6.205910397891656
2023-08-18 08:40:41,629 - forecast.py[line:184] - INFO: ACC of Z500: 0.9226452112197876, T2m: 0.9238687753677368, T850: 0.9285233020782471, U10: 0.366882860660553
2023-08-18 08:40:41,630 - forecast.py[line:232] - INFO: ================================End Evaluation================================
```

--------------------------------

### Install MindQuantum and Chemistry Libraries

Source: https://github.com/mindspore-ai/docs/blob/master/docs/mindquantum/docs/source_en/advanced/mqchem_tutorial.ipynb

Install the libraries needed for MindQuantum and quantum chemistry calculations.
```bash
pip install mindquantum openfermion openfermionpyscf
```

--------------------------------

### Example CheckPoint File Paths

Source: https://github.com/mindspore-ai/docs/blob/master/docs/mindformers/docs/source_en/feature/high_availability.md

Example output from the `find` command, listing the paths of the generated CheckPoint files.

```text
$ find output/checkpoint/ -name '*.ckpt'
output/checkpoint/rank_2/llama2_13b_rank_2-5_1.ckpt
output/checkpoint/rank_3/llama2_13b_rank_3-5_1.ckpt
output/checkpoint/rank_0/llama2_13b_rank_0-5_1.ckpt
output/checkpoint/rank_5/llama2_13b_rank_5-5_1.ckpt
```

--------------------------------

### Multi-Node Fine-Tuning Configuration Snippet

Source: https://github.com/mindspore-ai/docs/blob/master/docs/mindformers/docs/source_en/guide/supervised_fine_tuning.md

Example of parallel configuration settings in a YAML file for multi-node, multi-NPU fine-tuning. Adjust data_parallel, model_parallel, pipeline_stage, and context_parallel to match your cluster setup.

```yaml
parallel_config:
  data_parallel: ...
  model_parallel: ...
  pipeline_stage: ...
  context_parallel: ...
```
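Before filling in the values above, it can help to check that the parallel dimensions are consistent with the cluster. The sketch below assumes the common MindFormers rule that data_parallel × model_parallel × pipeline_stage × context_parallel should equal the total number of NPUs across all nodes; the helper name `check_parallel_config` and the example values are hypothetical and not taken from the source configuration. Other settings (for example expert parallelism) are not considered.

```python
from math import prod


def check_parallel_config(parallel_config: dict, nodes: int, npus_per_node: int) -> bool:
    """Hypothetical helper: return True if the parallel dimensions
    multiply to the total NPU count (assumed MindFormers rule)."""
    dims = ("data_parallel", "model_parallel", "pipeline_stage", "context_parallel")
    # Treat an unspecified dimension as 1 (no splitting along that axis).
    product = prod(int(parallel_config.get(key, 1)) for key in dims)
    return product == nodes * npus_per_node


# Illustrative values only: 2 nodes x 8 NPUs = 16 devices.
config = {"data_parallel": 2, "model_parallel": 4, "pipeline_stage": 2, "context_parallel": 1}
print(check_parallel_config(config, nodes=2, npus_per_node=8))  # True
```

Treating a missing key as 1 mirrors leaving that dimension unsplit; adjust the check if your MindFormers version documents a different constraint.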