### Install Dependencies and torch-npu

Source: https://github.com/ascend/docs/blob/main/sources/opencompass/install.rst

Install the necessary Python dependencies, then install torch and torch-npu version 2.1.0. Use the provided mirror for installation.

```shell
# install the dependencies
pip3 install attrs numpy==1.26.4 decorator sympy cffi pyyaml pathlib2 psutil protobuf scipy requests absl-py wheel typing_extensions -i https://pypi.tuna.tsinghua.edu.cn/simple

# install torch and torch-npu
pip install torch==2.1.0 torch-npu==2.1.0 -i https://pypi.tuna.tsinghua.edu.cn/simple
```

--------------------------------

### Install Project Dependencies

Source: https://github.com/ascend/docs/blob/main/README.md

Install the Python dependencies required to build the documentation. The command uses the Tsinghua PyPI mirror as the primary index and adds the PyTorch CPU wheel index as an extra index.

```bash
pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple --extra-index-url https://download.pytorch.org/whl/cpu
```

--------------------------------

### Install Necessary Libraries

Source: https://github.com/ascend/docs/blob/main/sources/transformers/fine-tune.rst

Install the Python libraries required for fine-tuning models: transformers, datasets, evaluate, accelerate, and scikit-learn.

```shell
pip install transformers datasets evaluate accelerate scikit-learn
```

--------------------------------

### Install OpenCompass

Source: https://github.com/ascend/docs/blob/main/sources/opencompass/install.rst

Install OpenCompass with pip. The command uses a mirror for faster downloads. Uncomment the other lines for a full installation or for specific framework support.

```shell
pip install -U opencompass -i https://pypi.tuna.tsinghua.edu.cn/simple

## Full installation (with support for more datasets)
# pip install "opencompass[full]"

## Environment with model acceleration frameworks
## Manage different acceleration frameworks using virtual environments
## since they usually have dependency conflicts with each other.
# pip install "opencompass[lmdeploy]"
# pip install "opencompass[vllm]"

## API evaluation (e.g. OpenAI, Qwen)
# pip install "opencompass[api]"
```

--------------------------------

### Configure Agentic Pipeline with DeepSpeed

Source: https://github.com/ascend/docs/blob/main/sources/roll/quick_start.rst

Update the main configuration file to use DeepSpeed for training. This example shows a typical setup for the qwen2.5-0.5B-agentic example, including environment settings, checkpointing, and training parameters.

```yaml
# vim examples/qwen2.5-0.5B-agentic/agentic_val_sokoban.yaml
defaults:
  - ../config/traj_envs@_here_
  - ../config/deepspeed_zero@_here_
  - ../config/deepspeed_zero2@_here_
  - ../config/deepspeed_zero3@_here_
  - ../config/deepspeed_zero3_cpuoffload@_here_

hydra:
  run:
    dir: .
  output_subdir: null

exp_name: "agentic_pipeline"
seed: 42
logging_dir: ./output/logs
output_dir: ./output
render_save_dir: ./output/render
system_envs:
  USE_MODELSCOPE: '1'

#track_with: wandb
#tracker_kwargs:
#  api_key:
#  project: roll-agentic
#  name: ${exp_name}_sokoban
#  notes: "agentic_pipeline"
#  tags:
#    - agentic
#    - roll
#    - baseline

track_with: tensorboard
tracker_kwargs:
  log_dir: ./data/oss_bucket_0/yali/llm/tensorboard/roll_exp/agentic_sokoban

checkpoint_config:
  type: file_system
  output_dir: ./data/cpfs_0/rl_examples/models/${exp_name}

num_gpus_per_node: 4

max_steps: 128
save_steps: 10000
logging_steps: 1
eval_steps: 10
resume_from_checkpoint: false

rollout_batch_size: 16
val_batch_size: 16
sequence_length: 1024

advantage_clip: 0.2
ppo_epochs: 1
adv_estimator: "grpo"
#pg_clip: 0.1
#dual_clip_loss: True
init_kl_coef: 0.0
whiten_advantages: true
entropy_loss_coef: 0
max_grad_norm: 1.0

pretrain: Qwen/Qwen2.5-0.5B-Instruct
reward_pretrain: Qwen/Qwen2.5-0.5B-Instruct

actor_train:
  model_args:
    attn_implementation: fa2
    disable_gradient_checkpointing: false
    dtype: bf16
    model_type: ~
  training_args:
    learning_rate: 1.0e-6
    weight_decay: 0
    per_device_train_batch_size: 2
    gradient_accumulation_steps: 64
    warmup_steps: 10
    lr_scheduler_type: cosine
  data_args:
    template: qwen2_5
  strategy_args:
    strategy_name: deepspeed_train
    strategy_config: ${deepspeed_zero3}
    # strategy_name: megatron_train
    # strategy_config:
    #   tensor_model_parallel_size: 1
    #   pipeline_model_parallel_size: 1
    #   expert_model_parallel_size: 1
    #   use_distributed_optimizer: true
    #   recompute_granularity: full
  device_mapping: list(range(0,2))
  infer_batch_size: 2

actor_infer:
  model_args:
    disable_gradient_checkpointing: true
    dtype: bf16
  generating_args:
    max_new_tokens: 128 # single-turn response length
    top_p: 0.99
    top_k: 100
    num_beams: 1
    temperature: 0.99
    num_return_sequences: 1
  data_args:
    template: qwen2_5
  strategy_args:
    strategy_name: vllm
    strategy_config:
      gpu_memory_utilization: 0.6
      block_size: 16
      load_format: auto
  device_mapping: list(range(2,3))

reference:
  model_args:
    attn_implementation: fa2
    disable_gradient_checkpointing: true
    dtype: bf16
    model_type: ~
  data_args:
    template: qwen2_5
  strategy_args:
    strategy_name: hf_infer
    strategy_config: ~
  device_mapping: list(range(3,4))
  infer_batch_size: 2

reward_normalization:
  grouping: traj_group_id # can be tags(env_type) / traj_group_id(group) / batch(rollout_batch)...: the group_by key used to compute reward/adv
  method: mean_std # asym_clip / identity / mean_std

train_env_manager:
  format_penalty: -0.15 # sokoban env penalty_for_step=-0.1
  max_env_num_per_worker: 4
  num_env_groups: 8
  # under the same group, the env config and env seed are ensured to be equal
  group_size: 1
  tags: [SimpleSokoban]
  num_groups_partition: [8] # If not set, all env names divide nums equally.
  # Under the same group, the env config and env seed (prompt) are equal in each generation

val_env_manager:
  max_env_num_per_worker: 32
  num_env_groups: 64
  group_size: 1 # should be set to 1 because val temperature is set to 0 and same prompt leads to same output
```

--------------------------------

### Start Training with Debug Model

Source: https://github.com/ascend/docs/blob/main/sources/torchtitan/quick_start.rst

Run the run_train.sh script to start pre-training with the configuration specified in debug_model.toml.

```shell
./run_train.sh
```

--------------------------------

### Clone Accelerate Repository

Source: https://github.com/ascend/docs/blob/main/sources/accelerate/quick_start.rst

Clone the official Accelerate repository from GitHub. It contains the example scripts used in later steps.

```bash
git clone https://github.com/huggingface/accelerate.git
```

--------------------------------

### Install timm Package

Source: https://github.com/ascend/docs/blob/main/sources/timm/install.rst

Install the timm library with pip, using a mirror for faster downloads. Make sure your virtual environment is activated.

```shell
pip install timm -i https://pypi.tuna.tsinghua.edu.cn/simple
```

--------------------------------

### Start Ray with Dashboard Access

Source: https://github.com/ascend/docs/blob/main/sources/Ray/quick_start.rst

Start a Ray head node and make the dashboard reachable from external browsers by binding it to `0.0.0.0` and setting the dashboard port explicitly (8265 is Ray's default).

```bash
ray start --head --dashboard-host=0.0.0.0 --dashboard-port=8265
```

--------------------------------

### Install Ray

Source: https://github.com/ascend/docs/blob/main/sources/Ray/quick_start.rst

Install Ray version 2.46.0 or higher using pip. This is the recommended installation method for Ascend NPU.
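A rough way to see whether an existing installation already meets the 2.46.0 floor is sketched below using only the standard library. The helper names are illustrative. The check compares the numeric release parts as integers, so 2.5.0 correctly sorts below 2.46.0. It ignores pre-release suffixes such as `rc1`, so it is not a full PEP 440 comparison; `packaging.version.Version` is the usual tool for that.

```python
from importlib.metadata import PackageNotFoundError, version

MIN_RAY = (2, 46, 0)  # the minimum release named in this section


def release_tuple(v: str) -> tuple:
    """Turn "2.46.0" into (2, 46, 0); trailing suffixes like "rc1" are dropped."""
    parts = []
    for piece in v.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:  # pad short versions such as "2.46"
        parts.append(0)
    return tuple(parts)


def meets_minimum(v: str, minimum: tuple = MIN_RAY) -> bool:
    # Tuple comparison is element-wise on integers, unlike plain string comparison
    return release_tuple(v) >= minimum


try:
    installed = version("ray")
    print(f"ray {installed} installed; meets >=2.46.0: {meets_minimum(installed)}")
except PackageNotFoundError:
    print("ray is not installed")
```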
```bash
pip install "ray>=2.46.0"
```

--------------------------------

### Navigate to Example Directory

Source: https://github.com/ascend/docs/blob/main/sources/wenet/quick_start.rst

Change to the directory that contains the NPU experiment script.

```shell
cd examples/aishell/s0
```

--------------------------------

### Install Hugging Face Hub CLI

Source: https://github.com/ascend/docs/blob/main/sources/transformers/modeldownload.rst

Install the Hugging Face Hub command-line tool, which is required for downloading models from Hugging Face.

```shell
pip install huggingface-hub
```

--------------------------------

### Serve Local Documentation

Source: https://github.com/ascend/docs/blob/main/README.md

Start a local HTTP server to view the built documentation. The documentation is then available at http://localhost:4000.

```bash
python -m http.server -d _build/html 4000
```

--------------------------------

### Verify timm and NPU Installation

Source: https://github.com/ascend/docs/blob/main/sources/timm/install.rst

Run this Python script to print the installed timm version and confirm that an NPU device is available. If it runs without errors, the setup is correct.

```python
import torch
import torch_npu  # registers the NPU backend so that torch.npu is available
import timm

print("timm version:", timm.version.__version__)
print("NPU devices:", torch.npu.current_device())
```

--------------------------------

### Install torch and torch-npu

Source: https://github.com/ascend/docs/blob/main/sources/timm/install.rst

Install specific versions of PyTorch (2.2.0) and torch-npu (2.2.0) for Ascend compatibility. This command also installs the necessary dependencies.
```shell
pip3 install attrs numpy==1.26.4 decorator sympy cffi pyyaml pathlib2 psutil protobuf scipy requests absl-py wheel typing_extensions -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install torch==2.2.0 torch-npu==2.2.0 -i https://pypi.tuna.tsinghua.edu.cn/simple
```

--------------------------------

### Install Required Libraries

Source: https://github.com/ascend/docs/blob/main/sources/accelerate/quick_start.rst

Install the Hugging Face libraries (datasets, evaluate, transformers) and scikit-learn with pip, using the Tsinghua mirror as the index URL.

```bash
pip install datasets evaluate transformers scikit-learn -i https://pypi.tuna.tsinghua.edu.cn/simple
```

--------------------------------

### Expected Output for Installation Verification

Source: https://github.com/ascend/docs/blob/main/sources/opencompass/install.rst

This is the expected output when OpenCompass and the NPU are installed correctly in a single-NPU environment.

```shell
opencompass version: 0.3.3
NPU devices: 0
```

--------------------------------

### Start Ray Head Node

Source: https://github.com/ascend/docs/blob/main/sources/Ray/quick_start.rst

Start a Ray node on a single machine as the head node. The `--head` flag starts a new cluster, so no `--address` is needed.

```bash
ray start --head
```

--------------------------------

### Start Ray with Custom Temporary Directory

Source: https://github.com/ascend/docs/blob/main/sources/Ray/quick_start.rst

Start a Ray head node with a custom directory for temporary files and logs. This is recommended for long-running, heavy-load tasks so that they do not fill up the default `/tmp` directory.

```bash
ray start --head --temp-dir=/data/ray_tmp
```

--------------------------------

### Launch Training Script with Accelerate

Source: https://github.com/ascend/docs/blob/main/sources/pytorch/quick_start.rst

Use the `accelerate launch` command to start a training script with options such as mixed precision and model paths.
Make sure the environment variables `MODEL_NAME` and `DATASET_NAME` are set.

```bash
accelerate launch --mixed_precision="fp16" train_text_to_image.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --dataset_name=$DATASET_NAME \
  --use_ema \
  --resolution=512 --center_crop --random_flip \
  --train_batch_size=1 \
  --gradient_accumulation_steps=4 \
  --gradient_checkpointing \
  --max_train_steps=15000 \
  --learning_rate=1e-05 \
  --max_grad_norm=1 \
  --lr_scheduler="constant" --lr_warmup_steps=0 \
  --output_dir="sd-pokemon-model"
```

--------------------------------

### Verify Kernels Installation

Source: https://github.com/ascend/docs/blob/main/sources/kernels/quick_start.rst

This snippet downloads an optimized kernel from the Hugging Face Hub and uses it for a fast GELU computation on a CUDA device. Make sure the `kernels` library is installed.

```python
import torch
from kernels import get_kernel

# Download optimized kernels from the Hugging Face hub
activation = get_kernel("kernels-community/activation")

# Create a random tensor
x = torch.randn((10, 10), dtype=torch.float16, device="cuda")

# Run the kernel
y = torch.empty_like(x)
activation.gelu_fast(y, x)

print(y)
```

--------------------------------

### Start Training with Llama3-8B Model

Source: https://github.com/ascend/docs/blob/main/sources/torchtitan/quick_start.rst

Run the run_train.sh script, passing the Llama3-8B configuration file through the `CONFIG_FILE` environment variable.

```shell
CONFIG_FILE="./torchtitan/models/llama3/train_configs/llama3_8b.toml" ./run_train.sh
```

--------------------------------

### Start Ray Worker Node

Source: https://github.com/ascend/docs/blob/main/sources/Ray/quick_start.rst

Start a Ray worker node and connect it to the head node. Replace `<head_node_ip>` with the actual IP address of the head node.

```bash
ray start --address='<head_node_ip>:6379'
```

--------------------------------

### Verify OpenCompass and NPU Installation

Source: https://github.com/ascend/docs/blob/main/sources/opencompass/install.rst

Run this Python script to verify the OpenCompass installation and check NPU availability. If it succeeds, it prints the OpenCompass version and the current NPU device.

```python
import torch
import torch_npu  # registers the NPU backend so that torch.npu is available
import opencompass

print("opencompass version: ", opencompass.__version__)
print("NPU devices: ", torch.npu.current_device())
```

--------------------------------

### Start Ray with Explicit Resource Declaration

Source: https://github.com/ascend/docs/blob/main/sources/Ray/quick_start.rst

Start a Ray head node and explicitly declare how many CPUs and GPUs the Ray cluster may use. This is useful when only part of the machine's resources should be allocated to Ray.

```bash
ray start --head --num-cpus=4 --num-gpus=1
```

--------------------------------

### Start Ray with Custom Port

Source: https://github.com/ascend/docs/blob/main/sources/Ray/quick_start.rst

Start a Ray head node on a custom communication port (the default is 6379). This is useful when running multiple Ray clusters on the same host.

```bash
ray start --head --port=6380
```

--------------------------------

### Verify Ascend Operator Installation

Source: https://github.com/ascend/docs/blob/main/sources/opencv/install.rst

Run the Ascend operator unit tests to confirm that OpenCV was compiled and installed with Ascend support. If the tests pass, the integration is working.

```shell
cd path/to/opencv/build/bin
./opencv_test_cannops
```

--------------------------------

### Example JSON Data Format

Source: https://github.com/ascend/docs/blob/main/sources/wenet/quick_start.rst

Each line of the data.list file is a JSON object containing the key, the wav path, and the text transcription.
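Each line of data.list is a standalone JSON object. The sketch below uses hypothetical helper names and only Python's standard `json` module. It shows how a `wav.scp`-style mapping (wav_id to wav path) and a `text`-style mapping (wav_id to transcript) could be joined into such lines. This is an illustration, not WeNet's own data-preparation code. The example line that follows has the same shape.

```python
import json


def make_data_list_line(key: str, wav_path: str, transcript: str) -> str:
    """Serialize one utterance as a single data.list line (hypothetical helper)."""
    # ensure_ascii=False keeps Chinese transcripts readable instead of \uXXXX escapes
    return json.dumps({"key": key, "wav": wav_path, "txt": transcript}, ensure_ascii=False)


def make_data_list(wav_scp: dict, text: dict) -> list:
    """Join wav_id -> wav path and wav_id -> transcript maps (hypothetical helper)."""
    # Keep only utterances that have both audio and a transcript, sorted by key
    return [make_data_list_line(k, wav_scp[k], text[k]) for k in sorted(wav_scp.keys() & text.keys())]


wav_scp_map = {
    "BAC009S0002W0122": "/export/data/asr-data/OpenSLR/33//data_aishell/wav/train/S0002/BAC009S0002W0122.wav",
}
text_map = {"BAC009S0002W0122": "而对楼市成交抑制作用最大的限购"}

lines = make_data_list(wav_scp_map, text_map)
for line in lines:
    print(line)
    record = json.loads(line)  # each line parses back into a dict independently
```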
```json
{"key": "BAC009S0002W0122", "wav": "/export/data/asr-data/OpenSLR/33//data_aishell/wav/train/S0002/BAC009S0002W0122.wav", "txt": "而对楼市成交抑制作用最大的限购"}
```

--------------------------------

### Expected Output for Verification

Source: https://github.com/ascend/docs/blob/main/sources/timm/install.rst

This is the expected output when timm and the NPU are installed correctly in a single-NPU environment.

```shell
timm version: 1.0.8.dev0
NPU devices: 0
```

--------------------------------

### Transformers with Local Kernels

Source: https://github.com/ascend/docs/blob/main/sources/kernels/quick_start.rst

This example loads a causal language model with Transformers using local kernels. To compare performance, comment out the `kernel_config` argument. Note: this requires Transformers to be installed from source.

```python
import time
import logging
from transformers import AutoModelForCausalLM, AutoTokenizer, KernelConfig

# Set the level to `DEBUG` to see which kernels are being called.
logging.basicConfig(level=logging.DEBUG)

model_name = "/root/Qwen3"

kernel_mapping = {
    "RMSNorm": "/kernels-ext-npu/rmsnorm:rmsnorm",
}

kernel_config = KernelConfig(kernel_mapping, use_local_kernel=True)

# Load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
    kernel_config=kernel_config,
)

# Prepare the model input
prompt = "What is the result of 100 + 100?"
messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Warm up
for _ in range(2):
    generated_ids = model.generate(**model_inputs, max_new_tokens=32768)
    output_ids = generated_ids[0][len(model_inputs.input_ids[0]) :].tolist()

# Print runtime
for _ in range(5):
    start_time = time.time()
    generated_ids = model.generate(**model_inputs, max_new_tokens=32768)
    print("runtime: ", time.time() - start_time)
    output_ids = generated_ids[0][len(model_inputs.input_ids[0]) :].tolist()

content = tokenizer.decode(output_ids, skip_special_tokens=True).strip("\n")
print("content:", content)
```

--------------------------------

### Create Small Datasets for Training and Evaluation

Source: https://github.com/ascend/docs/blob/main/sources/transformers/fine-tune.rst

Create smaller subsets of the tokenized dataset for training and evaluation by shuffling and selecting a fixed number of examples. This speeds up iteration during development. The commented-out lines show how to use the full datasets instead.

```python
small_train_dataset = tokenized_datasets["train"].shuffle(seed=42).select(range(1000))
small_eval_dataset = tokenized_datasets["test"].shuffle(seed=42).select(range(1000))

# The following lines load the full training and evaluation sets
# full_train_dataset = tokenized_datasets["train"]
# full_eval_dataset = tokenized_datasets["test"]
```

--------------------------------

### Configure Ray Environment Variable Before Start

Source: https://github.com/ascend/docs/blob/main/sources/Ray/quick_start.rst

Set the `RAY_DEDUP_LOGS` environment variable to 0 before starting the Ray head node to disable log deduplication. This is recommended when debugging NPU issues.
```bash
export RAY_DEDUP_LOGS=0
ray start --head
```

--------------------------------

### llama.cpp Inference Log Output

Source: https://github.com/ascend/docs/blob/main/sources/llama_cpp/quick_start.rst

Example output from a successful llama.cpp inference run, showing model loading, metadata, and vocabulary information.

```text
Log start
main: build = 3520 (8e707118)
main: built with cc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0 for aarch64-linux-gnu
main: seed = 1728907816
llama_model_loader: loaded meta data with 22 key-value pairs and 291 tensors from /home/jiahao/models/llama3-8b-instruct-fp16.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = llama
llama_model_loader: - kv 1: general.name str = Meta-Llama-3-8B-Instruct
llama_model_loader: - kv 2: llama.block_count u32 = 32
llama_model_loader: - kv 3: llama.context_length u32 = 8192
llama_model_loader: - kv 4: llama.embedding_length u32 = 4096
llama_model_loader: - kv 5: llama.feed_forward_length u32 = 14336
llama_model_loader: - kv 6: llama.attention.head_count u32 = 32
llama_model_loader: - kv 7: llama.attention.head_count_kv u32 = 8
llama_model_loader: - kv 8: llama.rope.freq_base f32 = 500000.000000
llama_model_loader: - kv 9: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
llama_model_loader: - kv 10: general.file_type u32 = 1
llama_model_loader: - kv 11: llama.vocab_size u32 = 128256
llama_model_loader: - kv 12: llama.rope.dimension_count u32 = 128
llama_model_loader: - kv 13: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 14: tokenizer.ggml.pre str = llama-bpe
llama_model_loader: - kv 15: tokenizer.ggml.tokens arr[str,128256] = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv 16: tokenizer.ggml.token_type arr[i32,128256] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 17: tokenizer.ggml.merges arr[str,280147] = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "..."
llama_model_loader: - kv 18: tokenizer.ggml.bos_token_id u32 = 128000
llama_model_loader: - kv 19: tokenizer.ggml.eos_token_id u32 = 128009
llama_model_loader: - kv 20: tokenizer.chat_template str = {% set loop_messages = messages %}{% ...
llama_model_loader: - kv 21: general.quantization_version u32 = 2
llama_model_loader: - type f32: 65 tensors
llama_model_loader: - type f16: 226 tensors
llm_load_vocab: special tokens cache size = 256
llm_load_vocab: token to piece cache size = 0.8000 MB
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = llama
llm_load_print_meta: vocab type = BPE
llm_load_print_meta: n_vocab = 128256
llm_load_print_meta: n_merges = 280147
llm_load_print_meta: vocab_only = 0
```

--------------------------------

### Transformers with Remote Kernels

Source: https://github.com/ascend/docs/blob/main/sources/kernels/quick_start.rst

This example loads a causal language model with Transformers and enables remote kernels. Set logging to DEBUG to see which kernels are called. To compare performance, comment out `use_kernels=True`.

```python
import time
import logging
from transformers import AutoModelForCausalLM, AutoTokenizer

# Set the level to `DEBUG` to see which kernels are being called.
logging.basicConfig(level=logging.DEBUG)

model_name = "Qwen/Qwen3-0.6B"

# Load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
    use_kernels=True,
)

# Prepare the model input
prompt = "What is the result of 100 + 100?"
messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Warm up
for _ in range(2):
    generated_ids = model.generate(**model_inputs, max_new_tokens=32768)
    output_ids = generated_ids[0][len(model_inputs.input_ids[0]) :].tolist()

# Print runtime
for _ in range(5):
    start_time = time.time()
    generated_ids = model.generate(**model_inputs, max_new_tokens=32768)
    print("runtime: ", time.time() - start_time)
    output_ids = generated_ids[0][len(model_inputs.input_ids[0]) :].tolist()

content = tokenizer.decode(output_ids, skip_special_tokens=True).strip("\n")
print("content:", content)
```

--------------------------------

### Interactive Chat with Model

Source: https://github.com/ascend/docs/blob/main/sources/torchchat/quick_start.rst

Start an interactive chat session with a downloaded model by passing the model name, for example `llama3.1`.

```shell
# Interactive chat
python torchchat.py chat llama3.1
```

--------------------------------

### Train NLP Model on Ascend NPU

Source: https://github.com/ascend/docs/blob/main/sources/accelerate/quick_start.rst

Set `HF_ENDPOINT` to a Hugging Face mirror for faster downloads in mainland China, change to the Accelerate examples directory, and run the NLP training script. The script's log messages show whether training succeeded.

```bash
# Point Hugging Face downloads at a mirror so that users in mainland China can download data and models
export HF_ENDPOINT=https://hf-mirror.com

# Enter the project directory
cd accelerate/examples

# Train the model
python nlp_example.py
```

--------------------------------

### Model Inference Output and Performance Analysis

Source: https://github.com/ascend/docs/blob/main/sources/torchchat/quick_start.rst

This output shows an example of text generation, including device information, model loading time, the generated text, and detailed performance metrics such as token throughput and memory usage.
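The throughput figures in the sample output that follows are related by simple arithmetic. The sketch below infers these relations from the printed numbers; they are not taken from torchchat's source. It assumes "Generated 199 tokens" counts the tokens produced after the first one, so 200 tokens in total. Under that reading, it reproduces the reported rates to within rounding.

```python
# Numbers copied from the sample torchchat output.
tokens_after_first = 199  # "Generated 199 tokens" (assumed to exclude the first token)
total_time_s = 13.3118    # "Time for inference 1: 13.3118 sec total"
ttft_s = 0.6189           # "Time to first token: 0.6189 sec"

# Relations inferred from the printed values (an assumption, not torchchat's code).
total_tps = (tokens_after_first + 1) / total_time_s      # all tokens over the whole run
first_tps = 1 / ttft_s                                   # one token over the prefill time
next_tps = tokens_after_first / (total_time_s - ttft_s)  # decode-only rate after the first token

print(f"Total throughput: {total_tps:.4f} tokens/sec, {1 / total_tps:.4f} s/token")
print(f"First token throughput: {first_tps:.4f} tokens/sec, {ttft_s:.4f} s/token")
print(f"Next token throughput: {next_tps:.4f} tokens/sec, {1 / next_tps:.4f} s/token")
```

Splitting prefill (time to first token) from decode in this way explains why the first-token rate is an order of magnitude lower than the next-token rate, even though both come from the same run.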
```text
Using device=npu Ascend910B3
Loading model...
Time to load model: 4.42 seconds
-----------------------------------------------------------
write me a story about a boy and his bear friend

Once upon a time, in a dense forest, there lived a young boy named Timmy. Timmy was a curious and adventurous boy who loved exploring the woods behind his village. One day, while wandering deeper into the forest than he had ever gone before, Timmy stumbled upon a magnificent brown bear.

The bear was enormous, with a thick coat of fur and piercing yellow eyes. At first, Timmy was frightened, but to his surprise, the bear didn't seem to be threatening him. Instead, the bear gently approached Timmy and began to sniff him.

As the days passed, Timmy and the bear, whom he named Boris, became inseparable friends. Boris was unlike any bear Timmy had ever seen before. He was incredibly intelligent and could understand human language. Boris would often sit by Timmy's side as he read books or helped with his chores.

The villagers were initially wary of Boris, but as they saw how kind and gentle he was, they grew
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Generated 199 tokens
Time for inference 1: 13.3118 sec total
Time to first token: 0.6189 sec with parallel prefill.

Total throughput: 15.0242 tokens/sec, 0.0666 s/token
First token throughput: 1.6157 tokens/sec, 0.6189 s/token
Next token throughput: 15.6781 tokens/sec, 0.0638 s/token

Bandwidth achieved: 241.30 GB/s
*** This first iteration will include cold start effects for dynamic import, hardware caches. ***

========================================

Warning: Excluding compile in calculations
Average tokens/sec (total): 15.02
Average tokens/sec (first token): 1.62
Average tokens/sec (next tokens): 15.68

Memory used: 17.23 GB
```

--------------------------------

### Train Model on Ascend NPU

Source: https://github.com/ascend/docs/blob/main/sources/wenet/quick_start.rst

Start model training on Ascend NPU.
The script automatically detects the NPU card IDs and sets the required environment variables. To use specific cards, modify `ASCEND_RT_VISIBLE_DEVICES`.

```shell
bash run_npu.sh --stage 4 --stop_stage 4
```

--------------------------------

### SGLang Inference Verification Script

Source: https://github.com/ascend/docs/blob/main/sources/sglang/quick_start.rst

This Python script verifies the SGLang installation by running inference on several prompts on an Ascend NPU. Make sure the model path is correct.

```python
# example.py
import torch
import sglang as sgl


def main():
    prompts = [
        "Hello, my name is",
        "The Independence Day of the United States is",
        "The capital of Germany is",
        "The full form of AI is",
    ] * 1
    llm = sgl.Engine(model_path="/Qwen2.5/Qwen2.5-0.5B-Instruct", device="npu", attention_backend="ascend")
    sampling_params = {"temperature": 0.8, "top_p": 0.95, "max_new_tokens": 100}
    outputs = llm.generate(prompts, sampling_params)
    for prompt, output in zip(prompts, outputs):
        print("===============================")
        print(f"Prompt: {prompt}\nGenerated text: {output['text']}")


if __name__ == '__main__':
    main()
```

--------------------------------

### Build llama.cpp with CANN Support

Source: https://github.com/ascend/docs/blob/main/sources/llama_cpp/install.rst

Compile llama.cpp with CMake, enabling CANN support and setting the build type to release. This step requires the CANN toolkit to be installed and configured.

```shell
cmake -B build -DGGML_CANN=on -DCMAKE_BUILD_TYPE=release
cmake --build build --config release
```

--------------------------------

### Import torch-npu

Source: https://github.com/ascend/docs/blob/main/sources/timm/quick_start.rst

Import torch and then torch-npu in your entry script. Make sure your Ascend environment and timm are set up.
```python
import torch
import torch_npu  # the pip package is torch-npu, but the Python module is torch_npu
```

--------------------------------

### Prepare Training Data

Source: https://github.com/ascend/docs/blob/main/sources/wenet/quick_start.rst

Organize the training data into the `wav.scp` and `text` formats with the `local/aishell_data_prep.sh` script. `wav.scp` maps each `wav_id` to a `wav_path`, and `text` maps each `wav_id` to a `text_label`.

```shell
bash run_npu.sh --stage 0 --stop_stage 0
```

--------------------------------

### Build HTML Documentation

Source: https://github.com/ascend/docs/blob/main/README.md

Build the HTML version of the documentation locally with Sphinx.

```bash
make html
```

--------------------------------

### Launch SGLang Server on NPU

Source: https://github.com/ascend/docs/blob/main/sources/sglang/quick_start.rst

Launch the SGLang server on an Ascend NPU, specifying the model, device, port, and attention backend.

```shell
# Launch the SGLang server on NPU
python -m sglang.launch_server --model Qwen/Qwen2.5-0.5B-Instruct \
    --device npu --port 8000 --attention-backend ascend \
    --host 0.0.0.0 --trust-remote-code
```

--------------------------------

### Image Processing with Ascend NPU in Python

Source: https://github.com/ascend/docs/blob/main/sources/opencv/quick_start.rst

This Python snippet demonstrates image processing on Ascend NPUs with OpenCV. It initializes CANN, selects a device, applies add, rotate, and flip operations, and then finalizes CANN. The input and output image paths are passed as command-line arguments.

```python
# This file is part of OpenCV project.
# It is subject to the license terms in the LICENSE file found in the top-level directory
# of this distribution and at http://opencv.org/license.html.
import numpy as np
import cv2
import argparse

parser = argparse.ArgumentParser(description='This is a sample for image processing with Ascend NPU.')
parser.add_argument('image', help='path to input image')
parser.add_argument('output', help='path to output image')
args = parser.parse_args()

# Read the input image
img = cv2.imread(args.image)
# Generate Gaussian noise
gaussNoise = np.random.normal(0, 25, (img.shape[0], img.shape[1], img.shape[2])).astype(img.dtype)

# Initialize CANN and select the device
cv2.cann.initAcl()
cv2.cann.setDevice(0)

# Add the Gaussian noise to the input image
output = cv2.cann.add(img, gaussNoise)
# Rotate the image (0, 1, 2 rotate by 90°, 180°, and 270° clockwise, respectively)
output = cv2.cann.rotate(output, 0)
# Flip the image (0, a positive value, a negative value flip around the x-axis, the y-axis, and both axes, respectively)
output = cv2.cann.flip(output, 0)
# Write the output image
cv2.imwrite(args.output, output)

# Deinitialize CANN
cv2.cann.finalizeAcl()
```

--------------------------------

### Configure Docker for Ascend Devices

Source: https://github.com/ascend/docs/blob/main/sources/sglang/install.rst

Pass these device and volume options to `drun` to prepare a Docker container for Ascend hardware. They make the Ascend devices, driver, firmware, and configuration files accessible inside the container.

```bash
--device=/dev/davinci12 --device=/dev/davinci13 --device=/dev/davinci14 --device=/dev/davinci15 \
--device=/dev/davinci_manager --device=/dev/hisi_hdc \
--volume /usr/local/sbin:/usr/local/sbin --volume /usr/local/Ascend/driver:/usr/local/Ascend/driver \
--volume /usr/local/Ascend/firmware:/usr/local/Ascend/firmware \
--volume /etc/ascend_install.info:/etc/ascend_install.info \
--volume /var/queue_schedule:/var/queue_schedule --volume ~/.cache/:/root/.cache/
```

--------------------------------

### View TorchChat Help

Source: https://github.com/ascend/docs/blob/main/sources/torchchat/quick_start.rst

Use the `--help` flag to list the available commands and their usage.
```shell
# Show help
python torchchat.py --help
```

--------------------------------

### Prepare WeNet Data Format

Source: https://github.com/ascend/docs/blob/main/sources/wenet/quick_start.rst

Generate the data.list file in the format WeNet requires. Each line is a JSON object containing the key, the wav file path, and the text content.

```shell
bash run_npu.sh --stage 3 --stop_stage 3
```

--------------------------------

### Text Generation Pipeline

Source: https://github.com/ascend/docs/blob/main/sources/transformers/inference.rst

Generate conversational responses or text completions. This example uses a conversational model and is designed for chat-style interactions.

```python
from transformers import pipeline

generator = pipeline(model="HuggingFaceH4/zephyr-7b-beta")
# Zephyr-beta is a conversational model, so let's pass it a chat instead of a single string
generator([{"role": "user", "content": "What is the capital of France? Answer in one word."}], do_sample=False, max_new_tokens=2)
```

--------------------------------

### Text Classification Pipeline

Source: https://github.com/ascend/docs/blob/main/sources/transformers/inference.rst

Classify text against a set of candidate labels. This example uses a large language model for classification.

```python
from transformers import pipeline

classifier = pipeline(model="meta-llama/Meta-Llama-3-8B-Instruct")
classifier(
    "I have a problem with my iphone that needs to be resolved asap!!",
    candidate_labels=["urgent", "not urgent", "phone", "tablet", "computer"],
)
```

--------------------------------

### Evaluate Transformers Model on MMLU with Ascend

Source: https://github.com/ascend/docs/blob/main/sources/lm_evaluation/quick_start.rst

Use this command to evaluate Transformers models such as Qwen2-0.5B-Instruct on the MMLU task on an Ascend NPU. Set `HF_ENDPOINT` first for faster data and model downloads.
```shell
# Use the Hugging Face mirror (for users in mainland China) to download data and models
export HF_ENDPOINT=https://hf-mirror.com
lm_eval --model hf \
    --model_args pretrained=Qwen/Qwen2-0.5B-Instruct \
    --tasks mmlu \
    --device npu:0 \
    --batch_size 8
```

--------------------------------

### Create Conda Environment for OpenCV

Source: https://github.com/ascend/docs/blob/main/sources/opencv/install.rst

Use this command to create a new conda environment named `opencv` with Python 3.10, which the OpenCV installation requires.

```shell
# Create a Python 3.10 virtual environment named opencv
conda create -y -n opencv python=3.10
# Activate the virtual environment
conda activate opencv
```

--------------------------------

### Image-to-Image Upscaling Pipeline

Source: https://github.com/ascend/docs/blob/main/sources/transformers/inference.rst

Enhance image detail and resolution with an image-to-image pipeline. This example uses a super-resolution model to upscale a low-resolution image.

```python
from PIL import Image
import requests
from transformers import pipeline

upscaler = pipeline("image-to-image", model="caidas/swin2SR-classical-sr-x2-64")
img = Image.open(requests.get("http://images.cocodataset.org/val2017/000000039769.jpg", stream=True).raw)
img = img.resize((64, 64))
upscaled_img = upscaler(img)  # Super-resolution processing
print(img.size)
print(upscaled_img.size)
```

--------------------------------

### Configure a Chat Model Evaluation Task

Source: https://github.com/ascend/docs/blob/main/sources/opencompass/quick_start.rst

Configure and launch an evaluation task for chat models from the command line. Make sure `opencompass` and `torch-npu` are installed. In `--debug` mode, tasks run sequentially and their output is printed in real time.

```shell
python run.py \
    --models hf_internlm2_chat_1_8b hf_qwen2_1_5b_instruct \
    --datasets demo_gsm8k_chat_gen demo_math_chat_gen \
    --debug
```

--------------------------------

### Create Python Virtual Environment

Source: https://github.com/ascend/docs/blob/main/sources/timm/install.rst

Use conda to create a new virtual environment named `timm` with Python 3.10. Activate the environment before proceeding with the installations.
```shell
conda create -y -n timm python=3.10
conda activate timm
```

--------------------------------

### Configure a Base Model Evaluation Task

Source: https://github.com/ascend/docs/blob/main/sources/opencompass/quick_start.rst

Configure and launch an evaluation task for base models from the command line. Make sure `opencompass` and `torch-npu` are installed. In `--debug` mode, tasks run sequentially and their output is printed in real time.

```shell
python run.py \
    --models hf_internlm2_1_8b hf_qwen2_1_5b \
    --datasets demo_gsm8k_base_gen demo_math_base_gen \
    --debug
```

--------------------------------

### Create Python 3.10 Virtual Environment

Source: https://github.com/ascend/docs/blob/main/sources/opencompass/install.rst

Use this command to create a conda virtual environment named `opencompass` with Python 3.10. Activate it before proceeding with the installations.

```shell
# Create a Python 3.10 virtual environment
conda create -y -n opencompass python=3.10
# Activate the virtual environment
conda activate opencompass
```

--------------------------------

### Single-Device Inference with llama.cpp

Source: https://github.com/ascend/docs/blob/main/sources/llama_cpp/quick_start.rst

Run LLM inference on a single Ascend NPU. Specify the model path and the prompt, plus parameters such as the number of tokens to generate (`-n`) and the number of layers to offload (`-ngl`). `-sm none -mg 0` keeps the whole model on device 0.

```shell
./build/bin/llama-cli -m path_to_model -p "Building a website can be done in 10 simple steps:" -n 400 -e -ngl 33 -sm none -mg 0
```

--------------------------------

### Visual Question Answering Pipeline

Source: https://github.com/ascend/docs/blob/main/sources/transformers/inference.rst

Answer questions about a document image. Provide the image URL or local path and the question. The invoice example requires a document question answering model, so this example uses `impira/layoutlm-document-qa`. That model relies on OCR, so `pytesseract` and the Tesseract engine must also be installed.
```python
from transformers import pipeline

# The invoice example needs a document-QA model;
# a text-only LLM such as Meta-Llama-3-8B-Instruct cannot read images
vqa = pipeline(model="impira/layoutlm-document-qa")
output = vqa(
    image="https://huggingface.co/spaces/impira/docquery/resolve/2359223c1837a7587402bda0f2643382a6eefeab/invoice.png",
    question="What is the invoice number?",
)
output[0]["score"] = round(output[0]["score"], 3)
print(output)
```

--------------------------------

### Convert Hugging Face Model to GGUF

Source: https://github.com/ascend/docs/blob/main/sources/llama_cpp/quick_start.rst

Use this script to convert a Hugging Face model to the GGUF format that llama.cpp requires. Make sure the required environment is set up first.

```shell
python convert_hf_to_gguf.py path/to/model
```

--------------------------------

### Initialize Tokenizer

Source: https://github.com/ascend/docs/blob/main/sources/transformers/fine-tune.rst

Initialize an `AutoTokenizer` for the specified model (for example, Meta-Llama-3-8B-Instruct). The tokenizer converts text into tokens that the model can understand. The example encodes a sample sentence.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")
# Process text with the tokenizer
encoded_input = tokenizer("Do not meddle in the affairs of wizards, for they are subtle and quick to anger.")
print(encoded_input)
```

--------------------------------

### Download Aishell-1 Dataset

Source: https://github.com/ascend/docs/blob/main/sources/wenet/quick_start.rst

Run the script to download the aishell-1 dataset. If the data has already been downloaded, set the `$data` variable in the script to its location and start from the next stage.

```shell
bash run_npu.sh --stage -1 --stop_stage -1
```

--------------------------------

### Download Entire Model Snapshot with Hugging Face Hub

Source: https://github.com/ascend/docs/blob/main/sources/transformers/modeldownload.rst

Use the `snapshot_download` function from the `huggingface_hub` library to download all files in a model repository.
Specify the repository ID and a local cache directory.

```python
from huggingface_hub import snapshot_download

snapshot_download(repo_id="Qwen/Qwen2-7B-Instruct", cache_dir="./your/path/Qwen")
```

--------------------------------

### Clone Model Repository with Git LFS

Source: https://github.com/ascend/docs/blob/main/sources/transformers/modeldownload.rst

Clone a model repository with Git LFS, which must be installed first. This command downloads all model files, including the large ones.

```shell
git lfs install
git clone https://hf-mirror.com/Qwen/Qwen2-7B-Instruct
```

--------------------------------

### Run llama.cpp Docker Container

Source: https://github.com/ascend/docs/blob/main/sources/llama_cpp/install.rst

Launch a Docker container for llama.cpp that maps the Ascend devices and the required host directories. Mount your models directory to `/app/models`, and specify the model path and the number of layers to offload (`-ngl`).

```shell
docker run --name llamacpp \
  --device /dev/davinci0 \
  --device /dev/davinci_manager \
  --device /dev/devmm_svm \
  --device /dev/hisi_hdc \
  -v /usr/local/dcmi:/usr/local/dcmi \
  -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
  -v /usr/local/Ascend/driver/lib64/:/usr/local/Ascend/driver/lib64/ \
  -v /usr/local/Ascend/driver/version.info:/usr/local/Ascend/driver/version.info \
  -v /PATH_TO_YOUR_MODELS/:/app/models \
  -it llama-cpp-cann -m /app/models/MODEL_PATH -ngl 32 \
  -p "Building a website can be done in 10 simple steps:"
```

--------------------------------

### Download Model from Hugging Face CLI

Source: https://github.com/ascend/docs/blob/main/sources/transformers/modeldownload.rst

Use the Hugging Face CLI to download a specific model, including its original files. You must accept the model license and have access to the repository before downloading.
```shell
huggingface-cli download meta-llama/Meta-Llama-3-8B-Instruct --include "original/*" --local-dir meta-llama/Meta-Llama-3-8B-Instruct
```

--------------------------------

### Launch SGLang Server with Ascend Backend

Source: https://github.com/ascend/docs/blob/main/sources/sglang/install.rst

Run the SGLang server inside a Docker container with `drun`, specifying the image, environment variables, model path, and Ascend attention backend. The server listens on all interfaces on port 30000.

```bash
drun --env "HF_TOKEN=" \
    \
    python3 -m sglang.launch_server --model-path meta-llama/Llama-3.1-8B-Instruct --attention-backend ascend --host 0.0.0.0 --port 30000
```

--------------------------------

### Tokenize Dataset Function

Source: https://github.com/ascend/docs/blob/main/sources/transformers/fine-tune.rst

Define a function that tokenizes the dataset with the initialized tokenizer. It applies padding and truncation so that every input has a consistent length. When the function is applied with `dataset.map(tokenize_function, batched=True)`, multiple examples are processed at once for efficiency.

```python
def tokenize_function(examples):
    # Pad every example to the model's max length and truncate longer ones
    return tokenizer(examples["text"], padding="max_length", truncation=True)
```

--------------------------------

### Single/Distributed Training with timm

Source: https://github.com/ascend/docs/blob/main/sources/timm/quick_start.rst

Launch training for timm-based image classification models on one or more NPUs. Specify the number of NPUs, the model, the dataset path, and the training parameters.

```shell
num_npus=1
./distributed_train.sh $num_npus path/to/dataset/ImageNet-1000 \
    --device npu \
    --model seresnet34 \
    --sched cosine \
    --epochs 150 \
    --warmup-epochs 5 \
    --lr 0.4 \
    --reprob 0.5 \
    --remode pixel \
    --batch-size 256 \
    --amp -j 4
```

--------------------------------

### Check Ascend Device Information

Source: https://github.com/ascend/docs/blob/main/sources/llama_cpp/install.rst

Run this command in the Ascend environment or container to display information about the available NPUs.
```shell
npu-smi info
```

--------------------------------

### FSDP Training Configuration

Source: https://github.com/ascend/docs/blob/main/sources/torchtitan/quick_start.rst

Configure Fully Sharded Data Parallel (FSDP) training. Setting `data_parallel_shard_degree = -1` selects the default FSDP behavior, where the shard degree is inferred from the devices left after the other parallel degrees are applied.

```toml
data_parallel_replicate_degree = 1
data_parallel_shard_degree = -1 # -1 selects the default FSDP setup
tensor_parallel_degree = 1
pipeline_parallel_degree = 1
context_parallel_degree = 1
```
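The following is a minimal Python sketch of how a shard degree of `-1` can be resolved. It assumes that `-1` takes all devices left over after the replicate, tensor, pipeline, and context parallel degrees are applied, and that the product of all degrees must equal the total number of devices. `resolve_shard_degree` is a hypothetical helper for illustration. It is not part of torchtitan's API.

```python
def resolve_shard_degree(world_size, replicate=1, shard=-1, tp=1, pp=1, cp=1):
    """Hypothetical helper: resolve data_parallel_shard_degree, where -1 means leftover devices."""
    other = replicate * tp * pp * cp
    if shard == -1:
        # Assumed rule: -1 uses every device not consumed by the other degrees
        if world_size % other != 0:
            raise ValueError(f"world_size {world_size} is not divisible by {other}")
        shard = world_size // other
    # All parallel degrees together must cover exactly world_size devices
    if replicate * shard * tp * pp * cp != world_size:
        raise ValueError("product of parallel degrees must equal world_size")
    return shard


if __name__ == "__main__":
    # The configuration above on 8 NPUs: every other degree is 1
    print(resolve_shard_degree(8))
    # With tensor parallelism of 2, half as many devices are left for sharding
    print(resolve_shard_degree(8, tp=2))
```

With the configuration above on 8 NPUs, every other degree is 1, so under this rule the shard degree resolves to 8 and the model is sharded across all devices.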