### Clone Trinity-RFT Repository

Source: https://modelscope.github.io/Trinity-RFT/en/main/_sources/tutorial/trinity_installation

Clones the Trinity-RFT GitHub repository and navigates into the project directory. This is the first step for installing from source.

```bash
git clone https://github.com/modelscope/Trinity-RFT
cd Trinity-RFT
```

--------------------------------

### Clone Trinity-RFT Repository

Source: https://modelscope.github.io/Trinity-RFT/en/main/tutorial/trinity_installation

Clones the Trinity-RFT repository from GitHub and navigates into the project directory. This is the first step for installing from source.

```bash
git clone https://github.com/modelscope/Trinity-RFT
cd Trinity-RFT
```

--------------------------------

### Install Trinity-RFT using uv

Source: https://modelscope.github.io/Trinity-RFT/en/main/_sources/tutorial/trinity_installation

Installs Trinity-RFT and its development/flash-attention dependencies using the 'uv' package installer. This is an alternative to pip for faster dependency management.

```bash
uv sync --extra dev --extra flash_attn
```

--------------------------------

### Install Trinity-RFT via PyPI

Source: https://modelscope.github.io/Trinity-RFT/en/main/_sources/tutorial/trinity_installation

Installs a specific version of Trinity-RFT (0.3.1) and flash-attn (2.8.1) directly from PyPI using pip. Suitable for users who only need to use the package.

```bash
pip install trinity-rft==0.3.1
pip install flash-attn==2.8.1
```

--------------------------------

### Install Trinity-RFT and Flash-Attention with uv via PyPI

Source: https://modelscope.github.io/Trinity-RFT/en/main/tutorial/trinity_installation

Installs specific versions of Trinity-RFT and Flash-Attention using the 'uv' package installer. This offers a faster and more efficient way to manage PyPI dependencies.

```bash
uv pip install trinity-rft==0.3.1
uv pip install flash-attn==2.8.1
```

--------------------------------

### Set Up Virtual Environment with venv

Source: https://modelscope.github.io/Trinity-RFT/en/main/_sources/tutorial/trinity_installation

Creates and activates a Python virtual environment using venv with Python 3.10, then installs Trinity-RFT with development and flash-attention dependencies. Includes a workaround for potential flash-attn installation issues.

```bash
python3.10 -m venv .venv
source .venv/bin/activate

pip install -e ".[dev]"
pip install -e ".[flash_attn]"
# if you encounter issues when installing flash-attn, try:
# pip install flash-attn==2.8.1 --no-build-isolation
```

--------------------------------

### Install Trinity-RFT and Flash-Attention via PyPI

Source: https://modelscope.github.io/Trinity-RFT/en/main/tutorial/trinity_installation

Installs the specified version of Trinity-RFT and Flash-Attention directly from PyPI using pip. This method is suitable for users who do not need to modify the source code.

```bash
pip install trinity-rft==0.3.1
pip install flash-attn==2.8.1
```

--------------------------------

### Install Trinity-RFT via PyPI with uv

Source: https://modelscope.github.io/Trinity-RFT/en/main/_sources/tutorial/trinity_installation

Installs a specific version of Trinity-RFT (0.3.1) and flash-attn (2.8.1) using the 'uv' package installer. This method is an alternative to pip for installing from PyPI.

```bash
uv pip install trinity-rft==0.3.1
uv pip install flash-attn==2.8.1
```

--------------------------------

### Prepare Environments and Start Data Processor Servers

Source: https://modelscope.github.io/Trinity-RFT/en/main/_sources/tutorial/example_data_functionalities

Shell commands to set up the necessary split environments for Trinity-RFT and the Data-Juicer-based data processor, followed by starting all the split servers, including the data processor server.

```shell
# prepare split environments, including the one for the data processor
python scripts/install.py

# start all split servers
python scripts/start_servers.py
```

--------------------------------

### ALFWorld Environment Preparation Commands

Source: https://modelscope.github.io/Trinity-RFT/en/main/_sources/tutorial/example_step_wise

Provides commands to install the ALFWorld environment, export the data path, and download the environment files. This setup is required before running ALFWorld-based experiments.

```bash
# quote the requirement so the shell does not treat [full] as a glob pattern
pip install "alfworld[full]"
export ALFWORLD_DATA=/path/to/alfworld/data
alfworld-download
```

--------------------------------

### Set Up venv Virtual Environment for Trinity-RFT

Source: https://modelscope.github.io/Trinity-RFT/en/main/tutorial/trinity_installation

Creates a Python virtual environment using venv with Python 3.10, activates it, and installs Trinity-RFT with development and flash-attn dependencies. Provides an alternative installation command for flash-attn if issues arise.

```bash
python3.10 -m venv .venv
source .venv/bin/activate

pip install -e ".[dev]"
pip install -e ".[flash_attn]"
# if you encounter issues when installing flash-attn, try:
# pip install flash-attn==2.8.1 --no-build-isolation
```

--------------------------------

### Trinity-RFT GRPO Configuration Example

Source: https://modelscope.github.io/Trinity-RFT/en/main/_sources/tutorial/example_reasoning_basic

This YAML configuration defines parameters for running a GRPO (Group Relative Policy Optimization) experiment with Trinity-RFT. It includes settings for the project, model, optimizer, data buffers, and training/evaluation intervals. Ensure paths and names are correctly set for your environment.

```yaml
project:  # set to your project name
name:  # set to your experiment name
checkpoint_root_dir: ${oc.env:TRINITY_CHECKPOINT_ROOT_DIR,./checkpoints}
algorithm:
  algorithm_type: grpo
  repeat_times: 8
  optimizer:
    lr: 1e-5
model:
  model_path: ${oc.env:TRINITY_MODEL_PATH,Qwen/Qwen2.5-1.5B-Instruct}
cluster:
  node_num: 1
  gpu_per_node: 2
buffer:
  total_epochs: 1
  batch_size: 128
  explorer_input:
    taskset:
      name: gsm8k
      storage_type: file
      path: 'openai/gsm8k'
      subset_name: 'main'
      split: 'train'
      format:
        prompt_key: 'question'
        response_key: 'answer'
      rollout_args:
        temperature: 1.0
      default_workflow_type: 'math_workflow'
    eval_tasksets:
      - name: gsm8k-eval
        storage_type: file
        path: 'openai/gsm8k'
        subset_name: 'main'
        split: 'test'
        format:
          prompt_key: 'question'
          response_key: 'answer'
        default_workflow_type: 'math_workflow'
  trainer_input:
    experience_buffer:
      name: gsm8k_buffer
      storage_type: queue
      path: 'sqlite:///gsm8k.db'
explorer:
  eval_interval: 50
  runner_per_model: 16
  rollout_model:
    engine_num: 1
synchronizer:
  sync_method: 'nccl'
  sync_interval: 1
trainer:
  save_interval: 100
```

--------------------------------

### Run Trinity-RFT Experiment

Source: https://modelscope.github.io/Trinity-RFT/en/main/_sources/tutorial/example_reasoning_basic

This command executes a Trinity-RFT experiment using a specified configuration file. Ensure the `trinity` command is available in your PATH and the provided configuration file path is correct.

```bash
trinity run --config examples/grpo_gsm8k/gsm8k.yaml
```

--------------------------------

### Set Up Virtual Environment with Conda

Source: https://modelscope.github.io/Trinity-RFT/en/main/_sources/tutorial/trinity_installation

Creates and activates a Conda virtual environment named 'trinity' with Python 3.10, then installs Trinity-RFT with development and flash-attention dependencies. Includes a workaround for potential flash-attn installation issues.

```bash
conda create -n trinity python=3.10
conda activate trinity

pip install -e ".[dev]"
pip install -e ".[flash_attn]"
# if you encounter issues when installing flash-attn, try:
# pip install flash-attn==2.8.1 --no-build-isolation
```

--------------------------------

### Download Models using Modelscope and Huggingface

Source: https://modelscope.github.io/Trinity-RFT/en/main/_sources/tutorial/example_reasoning_basic

This snippet demonstrates how to download language models locally using either the Modelscope CLI or Huggingface CLI. It specifies the model name and the local directory for storage. Ensure you have the respective CLIs installed and configured.

```bash
# Using Modelscope
modelscope download Qwen/Qwen2.5-1.5B-Instruct --local_dir $MODEL_PATH/Qwen2.5-1.5B-Instruct

# Using Huggingface
huggingface-cli download Qwen/Qwen2.5-1.5B-Instruct --local-dir $MODEL_PATH/Qwen2.5-1.5B-Instruct
```

--------------------------------

### Build and Run Trinity-RFT Docker Image

Source: https://modelscope.github.io/Trinity-RFT/en/main/tutorial/trinity_installation

Builds a Docker image for Trinity-RFT from the provided Dockerfile and then runs the container with GPU access and shared memory. It mounts the current directory and a specified data path into the container.

```bash
git clone https://github.com/modelscope/Trinity-RFT
cd Trinity-RFT

# Build the Docker image
## Tip: You can modify the Dockerfile to add mirrors or set API keys
docker build -f scripts/docker/Dockerfile -t trinity-rft:latest .

# Run the container, replacing /path/to/your/data with your actual path
# (the host path was lost in the source; this placeholder stands in for it)
docker run -it \
  --gpus all \
  --shm-size="64g" \
  --rm \
  -v $PWD:/workspace \
  -v /path/to/your/data:/data \
  trinity-rft:latest
```

--------------------------------

### Download Datasets using Modelscope and Huggingface

Source: https://modelscope.github.io/Trinity-RFT/en/main/_sources/tutorial/example_reasoning_basic

This snippet shows how to download datasets like GSM8K to a local directory using Modelscope or Huggingface CLIs.
It includes commands for specifying the dataset and the target local path. Ensure the CLIs are installed and configured.

```bash
# Using Modelscope
modelscope download --dataset modelscope/gsm8k --local_dir $DATASET_PATH/gsm8k

# Using Huggingface
huggingface-cli download openai/gsm8k --repo-type dataset --local-dir $DATASET_PATH/gsm8k
```

--------------------------------

### Install Trinity-RFT with uv Package Manager

Source: https://modelscope.github.io/Trinity-RFT/en/main/tutorial/trinity_installation

Installs Trinity-RFT and its 'dev' and 'flash_attn' extras using the 'uv' package installer. This provides a modern alternative for managing Python dependencies.

```bash
uv sync --extra dev --extra flash_attn
```

--------------------------------

### Install Megatron-LM Support (Bash)

Source: https://modelscope.github.io/Trinity-RFT/en/main/_sources/tutorial/example_megatron

Installs the project with Megatron-LM support in editable mode using pip. This is the primary method for setting up the Megatron backend. It also shows how to install NVIDIA's Apex library, which Megatron-LM uses for mixed-precision training.

```bash
pip install -e ".[megatron]"

# for uv
# uv sync --extra megatron

# build and install NVIDIA Apex with its C++ and CUDA extensions
pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation \
    --config-settings "--build-option=--cpp_ext" \
    --config-settings "--build-option=--cuda_ext" \
    --resume-retries 10 git+https://github.com/NVIDIA/apex.git
```

--------------------------------

### Build and Run Trinity-RFT Docker Image

Source: https://modelscope.github.io/Trinity-RFT/en/main/_sources/tutorial/trinity_installation

Builds a Docker image for Trinity-RFT from its Dockerfile and runs the container with specified GPU and shared memory configurations. It mounts the current directory and a data/checkpoints directory into the container.

```bash
git clone https://github.com/modelscope/Trinity-RFT
cd Trinity-RFT

# Build the Docker image
## Tip: You can modify the Dockerfile to add mirrors or set API keys
docker build -f scripts/docker/Dockerfile -t trinity-rft:latest .

# Run the container, replacing /path/to/your/data with your actual path
# (the host path was lost in the source; this placeholder stands in for it)
docker run -it \
  --gpus all \
  --shm-size="64g" \
  --rm \
  -v $PWD:/workspace \
  -v /path/to/your/data:/data \
  trinity-rft:latest
```

--------------------------------

### Set Up Conda Virtual Environment for Trinity-RFT

Source: https://modelscope.github.io/Trinity-RFT/en/main/tutorial/trinity_installation

Creates and activates a Conda virtual environment named 'trinity' with Python 3.10. It then installs the Trinity-RFT package with development and flash-attn dependencies. Includes a fallback for flash-attn installation issues.

```bash
conda create -n trinity python=3.10
conda activate trinity

pip install -e ".[dev]"
pip install -e ".[flash_attn]"
# if you encounter issues when installing flash-attn, try:
# pip install flash-attn==2.8.1 --no-build-isolation
```

--------------------------------

### Basic Trinity-RFT YAML Configuration Structure

Source: https://modelscope.github.io/Trinity-RFT/en/main/_sources/tutorial/trinity_configs

Provides an example of a basic Trinity-RFT configuration file, outlining the main sections: project, algorithm, model, cluster, buffer, explorer, trainer, synchronizer, monitor, service, data_processor, log, and stages. This structure is loaded using OmegaConf.

```yaml
project: Trinity-RFT
name: example
mode: both
checkpoint_root_dir: ${oc.env:TRINITY_CHECKPOINT_ROOT_DIR,./checkpoints}
continue_from_checkpoint: true

algorithm:
  # Algorithm-related parameters
  ...
model:
  # Model-specific configurations
  ...
cluster:
  # Cluster node and GPU settings
  ...
buffer:
  # Data buffer configurations
  ...
explorer:
  # Explorer-related settings (rollout models, workflow runners)
  ...
trainer:
  # Trainer-specific parameters
  ...
synchronizer:
  # Model weight synchronization settings
  ...
monitor:
  # Monitoring configurations (e.g., WandB, TensorBoard or MLFlow)
  ...
service:
  # Services to use
  ...
data_processor:
  # Preprocessing data settings
  ...
log:
  # Ray actor logging
  ...
stages:
  # Stages configuration
  ...
```

--------------------------------

### Bash Command to Install AgentScope

Source: https://modelscope.github.io/Trinity-RFT/en/main/_sources/tutorial/example_react

Installs the AgentScope library with a minimum version requirement of 1.0.4 using pip. This is a prerequisite for running the Trinity-RFT example.

```bash
# quote the requirement: an unquoted ">" would be parsed by the shell as output redirection
pip install "agentscope>=1.0.4"
```

--------------------------------

### Basic Trinity-RFT Configuration Structure (YAML)

Source: https://modelscope.github.io/Trinity-RFT/en/main/tutorial/trinity_configs

An example of a basic configuration file for Trinity-RFT, showcasing the main sections for different modules. This structure helps organize parameters for various components of the system.

```yaml
project: Trinity-RFT
name: example
mode: both
checkpoint_root_dir: ${oc.env:TRINITY_CHECKPOINT_ROOT_DIR,./checkpoints}
continue_from_checkpoint: true

algorithm:
  # Algorithm-related parameters
  ...
model:
  # Model-specific configurations
  ...
cluster:
  # Cluster node and GPU settings
  ...
buffer:
  # Data buffer configurations
  ...
explorer:
  # Explorer-related settings (rollout models, workflow runners)
  ...
trainer:
  # Trainer-specific parameters
  ...
synchronizer:
  # Model weight synchronization settings
  ...
monitor:
  # Monitoring configurations (e.g., WandB, TensorBoard or MLFlow)
  ...
service:
  # Services to use
  ...
data_processor:
  # Preprocessing data settings
  ...
log:
  # Ray actor logging
  ...
stages:
  # Stages configuration
  ...
```

--------------------------------

### Download Models and Datasets (ModelScope & Huggingface)

Source: https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_reasoning_basic

This snippet shows how to download models and datasets using both ModelScope and Huggingface CLI.
It specifies the model name or dataset path and the local directory for storage. Ensure ModelScope or Huggingface CLI is installed and configured.

```shell
# Using Modelscope
modelscope download Qwen/Qwen2.5-1.5B-Instruct --local_dir $MODEL_PATH/Qwen2.5-1.5B-Instruct

# Using Huggingface
huggingface-cli download Qwen/Qwen2.5-1.5B-Instruct --local-dir $MODEL_PATH/Qwen2.5-1.5B-Instruct

# Using Modelscope
modelscope download --dataset modelscope/gsm8k --local_dir $DATASET_PATH/gsm8k

# Using Huggingface
huggingface-cli download openai/gsm8k --repo-type dataset --local-dir $DATASET_PATH/gsm8k
```

--------------------------------

### Megatron-LM Configuration Example (YAML)

Source: https://modelscope.github.io/Trinity-RFT/en/main/_sources/tutorial/example_megatron

Provides an example YAML configuration snippet for setting up Megatron-LM parameters within the Trinity-RFT framework. It covers model parallelism settings, offloading options, MBridge usage, distributed checkpointing, and recomputation configurations for actor, reference, and critic models.

```yaml
actor_rollout_ref:
  ...
  actor:
    strategy: megatron  # Kept for backward compatibility
    megatron:
      # Model parallelism settings
      tensor_model_parallel_size: 2
      pipeline_model_parallel_size: 1
      expert_model_parallel_size: 1

      # Offloading (set to false unless you're memory-constrained)
      param_offload: false
      grad_offload: false
      optimizer_offload: false

      # Use mBridge for parameter import/export (optional)
      use_mbridge: false

      # Use Megatron checkpoint
      use_dist_checkpointing: false
      dist_checkpointing_path: null

      # Recomputation settings (helps save memory during training)
      override_transformer_config:
        recompute_granularity: full
        recompute_method: uniform
        recompute_num_layers: 1
    ...
  ref:
    megatron:
      tensor_model_parallel_size: 2
      pipeline_model_parallel_size: 1
      expert_model_parallel_size: 1
      param_offload: false
      grad_offload: false
      optimizer_offload: false
      use_mbridge: false
      use_dist_checkpointing: false
      dist_checkpointing_path: null
      override_transformer_config:
        recompute_granularity: full
        recompute_method: uniform
        recompute_num_layers: 1
    ...

critic:
  strategy: megatron
  megatron:
    tensor_model_parallel_size: 2
    pipeline_model_parallel_size: 1
    expert_model_parallel_size: 1
    param_offload: false
    grad_offload: false
    optimizer_offload: false
    use_mbridge: false
    use_dist_checkpointing: false
    dist_checkpointing_path: null
    override_transformer_config:
      recompute_granularity: full
      recompute_method: uniform
      recompute_num_layers: 1
  ...
```

--------------------------------

### Install and Start Trinity-RFT Servers

Source: https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_data_functionalities

These scripts prepare the Trinity-RFT environment, including the split environment for the data processor. The `start_servers.py` script then launches all necessary split servers, including the data processor server within its Data-Juicer environment.

```bash
# prepare split environments, including the one of data processor
python scripts/install.py

# start all split servers
python scripts/start_servers.py
```

--------------------------------

### Example Trinity-RFT Taskset Configuration (YAML)

Source: https://modelscope.github.io/Trinity-RFT/en/main/_sources/tutorial/develop_workflow

An example YAML configuration snippet for setting up the task dataset in Trinity-RFT. It specifies the default workflow, dataset path, prompt/response keys for formatting, and rollout arguments like temperature.

```yaml
# some config
buffer:
  explorer_input:
    taskset:
      default_workflow: "math_workflow"
      path: ${oc.env:TRINITY_TASKSET_PATH}
      format:
        prompt_key: "question"
        response_key: "answer"
      rollout_args:
        temperature: 1.0
# some other configs
```

--------------------------------

### Trinity-RFT Configuration with SFT Warmup

Source: https://modelscope.github.io/Trinity-RFT/en/main/_sources/tutorial/example_reasoning_basic

This YAML snippet shows how to add a Supervised Fine-Tuning (SFT) warmup stage before the main Reinforcement Fine-Tuning (RFT) process in Trinity-RFT. It defines the SFT stage, including the dataset path and training steps. This configuration should be merged into the main `gsm8k.yaml` file.

```yaml
# Properly add the following configs in gsm8k.yaml
stages:
  - stage_name: sft_warmup
    mode: train
    algorithm:
      algorithm_type: sft
    buffer:
      train_batch_size: 128
      total_steps: 10
      trainer_input:
        experience_buffer:
          name: sft_warmup_dataset
          path: /PATH/TO/YOUR/SFT/DATASET
  - stage_name: rft  # leave empty to use the original configs for RFT
```

--------------------------------

### Start Ray Cluster and Run RFT Process

Source: https://modelscope.github.io/Trinity-RFT/en/main/_sources/tutorial/example_data_functionalities

Commands to start a Ray cluster (master and worker nodes) and then run the Trinity-RFT process with a specified configuration file. This is the initial step for data preparation and exploration.

```shell
# start the ray cluster
# on master node
ray start --head
# on worker nodes (the address value was lost in the source; <master_address> is a placeholder)
ray start --address=<master_address>

# run RFT (the config path was lost in the source; replace the placeholder with your own file)
trinity run --config /path/to/your/config.yaml
```

--------------------------------

### Install Megatron-LM Support with Pip

Source: https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_megatron

Installs the project with Megatron-LM support in editable mode using pip. This is the primary method for enabling Megatron-LM functionality. It may involve installing additional dependencies like Apex.

```bash
pip install -e ".[megatron]"

# for uv
# uv sync --extra megatron
```

```bash
pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation \
    --config-settings "--build-option=--cpp_ext" \
    --config-settings "--build-option=--cuda_ext" \
    --resume-retries 10 git+https://github.com/NVIDIA/apex.git
```

--------------------------------

### Bash Commands to Download Models and Dataset

Source: https://modelscope.github.io/Trinity-RFT/en/main/_sources/tutorial/example_react

Downloads the Qwen3-8B model and the gsm8k dataset from Hugging Face. These are necessary resources for running the provided example.

```bash
huggingface-cli download Qwen/Qwen3-8B

huggingface-cli download openai/gsm8k --repo-type dataset
```

--------------------------------

### Webshop Workflow Example Prompt - Python

Source: https://modelscope.github.io/Trinity-RFT/en/main/_modules/trinity/common/workflows/envs/webshop/webshop_workflow

Demonstrates a step-by-step interaction within a text-based webshop environment. It shows how an agent should interpret observations and take actions like searching and clicking to find and purchase a product based on specific criteria.

```python
# -*- coding: utf-8 -*-
from typing import List, Optional

from trinity.common.experience import Experience
from trinity.common.models.model import ModelWrapper
from trinity.common.workflows.workflow import WORKFLOWS, MultiTurnWorkflow, Task

SPARSE_REWARD = False

EXAMPLE_PROMPT = """
Observation:
Webshop
Instruction:
i would like a 3 ounce bottle of bright citrus deodorant for sensitive skin, and price lower than 50.00 dollars
[button] Search [button_]

Response:
OK, let's search for 3 ounce bright citrus deodorant sensitive skin
search[3 ounce bright citrus deodorant sensitive skin]

Observation:
Instruction:
i would like a 3 ounce bottle of bright citrus deodorant for sensitive skin, and price lower than 50.00 dollars
[button] Back to Search [button_]
Page 1 (Total results: 50)
[button] Next > [button_]
[button] B078GWRC1J [button_]
Bright Citrus Deodorant by Earth Mama | Natural and Safe for Sensitive Skin, Pregnancy and Breastfeeding, Contains Organic Calendula 3-Ounce
$10.99
[button] B078GTKVXY [button_]
Ginger Fresh Deodorant by Earth Mama | Natural and Safe for Sensitive Skin, Pregnancy and Breastfeeding, Contains Organic Calendula 3-Ounce
$10.99
[button] B08KBVJ4XN [button_]
Barrel and Oak - Aluminum-Free Deodorant, Deodorant for Men, Essential Oil-Based Scent, 24-Hour Odor Protection, Cedar & Patchouli Blend, Gentle on Sensitive Skin (Mountain Sage, 2.7 oz, 2-Pack)
$15.95

Response:
button B078GWRC1J and B078GTKVXY are bright citrus deodorant less then 50 dollars. I can check B078GWRC1J first
click[B078GWRC1J]

Observation:
Instruction:
i would like a 3 ounce bottle of bright citrus deodorant for sensitive skin, and price lower than 50.00 dollars
[button] Back to Search [button_]
[button] < Prev [button_]
scent
[button] assorted scents [button_][button] bright citrus [button_][button] calming lavender [button_][button] ginger fresh [button_][button] simply non-scents [button_]
size
[button] travel set (4-pack) [button_][button] 3 ounce (pack of 1) [button_][button] 3-ounce (2-pack) [button_]
Bright Citrus Deodorant by Earth Mama | Natural and Safe for Sensitive Skin, Pregnancy and Breastfeeding, Contains Organic Calendula 3-Ounce
Price: $10.99
Rating: N.A.
[button] Description [button_]
[button] Features [button_]
[button] Reviews [button_]
[button] Buy Now [button_]

Response:
For 3 ounce bottle of bright citrus deodorant for sensitive skin, the item has options 'bright citrus' and '3 ounce (pack of 1)' and seems good to buy.
click[bright citrus]

Observation:
You have clicked bright citrus.
...

Response:
Now I should select the 3 ounce (pack of 1) option
click[3 ounce (pack of 1)]

Observation:
You have clicked 3 ounce (pack of 1).
...

Response:
I can buy the item
click[Buy Now]
"""
```

--------------------------------

### Webshop System Prompt with Example - Python

Source: https://modelscope.github.io/Trinity-RFT/en/main/_modules/trinity/common/workflows/envs/webshop/webshop_workflow

Defines the system prompt for an agent interacting with a text-based webshop environment. It includes instructions on action formats (search, click), an example interaction, and rules for task completion, emphasizing thinking and action steps within a step limit.

````python
WebShop_SYSTEM_PROMPT_WITH_EXAMPLE = f"""
You are an agent interacting with a virtual text-based web shopping environment to test out your ability. Your job is to find follow the Instruction provided and mimic the steps to buy the item that are closest to the Instruct provided.

## Action Format:
You should give both the action_name and action_arg like the format `action_name[action_arg]`.

You can execute two types of actions, search and click.
- When the button `[button] Search [button_]` is available in the current observation, you can execute the action search[xxx] (you should type the query you want to search in the square brackets here).
- You can click buttons `[button] xxx [button_]` that is available in the current observation, by execute the action click[xxx].

Below are some examples of action formats.
- search[white shoes]
- click[Buy Now]

## Example:
Here is an example:
```
{EXAMPLE_PROMPT}
```

## Notes:
At each step, you should first think then perform action to fulfill the instruction. You should ALWAYS wrap your thinking with the tag and wrap your action with the tag.
You should ALWAYS take one action each step.
You should finish the task and buy the item within 15 steps.
DONOT try to interact with the user at anytime. Finish the task and buy the item by yourself.
"""
````

--------------------------------

### Install Data Juicer Dependencies

Source: https://modelscope.github.io/Trinity-RFT/en/main/_sources/tutorial/example_data_functionalities

Installs the dependencies for the data split, enabling Data-Juicer operators within Trinity-RFT. This command is required to activate the data processor.

```shell
pip install -e ".[data]"
```

--------------------------------

### Operator Configuration Example (YAML)

Source: https://modelscope.github.io/Trinity-RFT/en/main/_sources/tutorial/develop_operator

An example YAML configuration snippet demonstrating how to integrate a custom operator, 'reward_filter', into the experience pipeline. It specifies the operator name and its arguments, along with settings for dynamic synchronization.

```yaml
# some other configs
data_processor:
  experience_pipeline:
    operators:
      - name: "reward_filter"
        args:
          threshold: 0.1
synchronizer:
  sync_method: nccl
  sync_style: dynamic_by_explorer
  sync_interval: 2
# some other configs
```

--------------------------------

### Download Qwen2.5 Model using Modelscope and Huggingface CLI

Source: https://modelscope.github.io/Trinity-RFT/en/main/_sources/tutorial/example_dpo

This snippet demonstrates how to download the Qwen2.5-1.5B-Instruct model to a local directory using either the Modelscope CLI or the Huggingface CLI. Ensure the respective CLIs are installed and authenticated.

```shell
# Using Modelscope (note the underscore in --local_dir, as in the other Modelscope commands here)
modelscope download Qwen/Qwen2.5-1.5B-Instruct --local_dir $MODEL_PATH/Qwen2.5-1.5B-Instruct

# Using Huggingface
huggingface-cli download Qwen/Qwen2.5-1.5B-Instruct --local-dir $MODEL_PATH/Qwen2.5-1.5B-Instruct
```

--------------------------------

### Initialize MLflow Monitor

Source: https://modelscope.github.io/Trinity-RFT/en/main/_modules/trinity/utils/monitor

Initializes the MlflowMonitor with project, group, name, and role. It sets the MLflow tracking credentials and URI from `monitor_args`, starts a new run, and logs the flattened config as parameters. Requires MLflow to be installed and configured.

```python
def __init__(
    self,
    project: str,
    group: str,
    name: str,
    role: str,
    config: Config = None,
) -> None:
    assert (
        mlflow is not None
    ), "mlflow is not installed. Please install it to use MlflowMonitor."
    monitor_args = config.monitor.monitor_args or {}
    # optional credentials for a protected tracking server
    if username := monitor_args.get("username"):
        os.environ["MLFLOW_TRACKING_USERNAME"] = username
    if password := monitor_args.get("password"):
        os.environ["MLFLOW_TRACKING_PASSWORD"] = password
    # read the URI from monitor_args so an unset monitor_args falls back to the default
    mlflow.set_tracking_uri(monitor_args.get("uri", "http://localhost:5000"))
    mlflow.set_experiment(project)
    mlflow.start_run(
        run_name=f"{name}_{role}",
        tags={
            "group": group,
            "role": role,
        },
    )
    mlflow.log_params(config.flatten())
    self.console_logger = get_logger(__name__, in_ray_actor=True)
```

--------------------------------

### SFT Configuration Example in YAML

Source: https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_dpo

This YAML snippet configures a Supervised Fine-Tuning (SFT) experiment. It defines the algorithm type as 'sft', specifies the model path, cluster settings, and buffer input format suitable for message-based datasets. The dataset path and format keys are crucial inputs.

```yaml
project:  # set to your project name
name:  # set to your experiment name
mode: train
algorithm:
  algorithm_type: sft
checkpoint_root_dir: ${oc.env:TRINITY_CHECKPOINT_ROOT_DIR,./checkpoints}
model:
  model_path: ${oc.env:TRINITY_MODEL_PATH,Qwen/Qwen2.5-1.5B-Instruct}
cluster:
  node_num: 1
  gpu_per_node: 2
buffer:
  total_epochs: 5
  train_batch_size: 64
  trainer_input:
    experience_buffer:
      name:  # set to your dataset name
      storage_type: file
      path: $DATASET_PATH/Mixture-of-Thoughts
      split: train
      format:
        prompt_type: messages
        messages_key: messages
trainer:
  save_interval: 50
  trainer_config:
    ... # omitted here for simplicity
```

--------------------------------

### Running Asynchronous RFT Example (Bash)

Source: https://modelscope.github.io/Trinity-RFT/en/main/_sources/tutorial/example_async_mode

Shell command to execute the asynchronous RFT example. This script typically orchestrates the launch of the explorer and trainer processes using the previously defined configuration files.

```bash
bash examples/async_gsm8k/run.sh
```

--------------------------------

### Running SFT Experiment with Trinity CLI

Source: https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_dpo

This command starts a Supervised Fine-Tuning (SFT) experiment via the Trinity CLI, using the specified SFT configuration YAML file.

```bash
trinity run --config examples/sft_mot/sft.yaml
```

--------------------------------

### Bash Command to Start Trinity-RFT Training

Source: https://modelscope.github.io/Trinity-RFT/en/main/_sources/tutorial/example_react

Navigates to the Trinity-RFT root directory and starts the training task using a specified configuration file (gsm8k.yaml) for the GSM8k dataset. This command initiates the agent training process.

```bash
# Navigate to the Trinity-RFT root directory
cd /path/to/Trinity-RFT

# Run the training for GSM8k dataset:
trinity run --config examples/agentscope_react/gsm8k.yaml
```

--------------------------------

### Implement Trinity Workflow Run Method

Source: https://modelscope.github.io/Trinity-RFT/en/main/_sources/tutorial/develop_workflow

Provides an example of implementing the 'run' method for a workflow, which is responsible for generating responses and calculating rewards. It shows how to call the model to get a response and then construct a list of Experience objects. It depends on the `Experience` and `Workflow` classes.

```python
from typing import List

# import paths follow the library modules quoted elsewhere in this document
from trinity.common.experience import Experience
from trinity.common.workflows.workflow import Workflow


class ExampleWorkflow(Workflow):

    # the __init__ function (assumed to be defined elsewhere)

    def calculate_reward(self, response: str, truth: str) -> float:
        # exact-match reward: 1.0 if the response equals the ground truth, else 0.0
        if response == truth:
            return 1.0
        else:
            return 0.0

    def run(self) -> List[Experience]:
        # call the model; chat() returns a list of responses
        responses = self.model.chat(
            [
                {
                    "role": "user",
                    "content": f"Question:\n{self.question}",
                }
            ],
            temperature=self.rollout_args.temperature,
        )
        response = responses[0]  # there is only one response
        reward: float = self.calculate_reward(response.response_text, self.answer)
        return [
            Experience(
                tokens=response.tokens,
                prompt_length=response.prompt_length,
                reward=reward,
                logprobs=response.logprobs,
            )
        ]
```

--------------------------------

### Run Trinity-RFT Experiment via CLI

Source: https://modelscope.github.io/Trinity-RFT/en/main/_sources/tutorial/example_dpo

These shell commands demonstrate how to start DPO and SFT experiments using the Trinity-RFT framework. The 'trinity run' command is used, with the '--config' flag pointing to the respective YAML configuration file for each experiment type.

```shell
trinity run --config examples/dpo_humanlike/dpo.yaml
```

```shell
trinity run --config examples/sft_mot/sft.yaml
```

--------------------------------

### Download DPO Dataset (Human-Like-DPO-Dataset)

Source: https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_dpo

Download the Human-Like-DPO-Dataset for DPO training using ModelScope or Huggingface CLI. The dataset is expected to have 'prompt', 'chosen', and 'rejected' keys. Ensure `$DATASET_PATH` is set.

```shell
# Using Modelscope
modelscope download --dataset HumanLLMs/Human-Like-DPO-Dataset --local_dir $DATASET_PATH/human_like_dpo_dataset

# Using Huggingface
huggingface-cli download HumanLLMs/Human-Like-DPO-Dataset --repo-type dataset --local-dir $DATASET_PATH/human_like_dpo_dataset
```

--------------------------------

### Initialize AgentScope V1 React Search Workflow

Source: https://modelscope.github.io/Trinity-RFT/en/main/_modules/trinity/common/workflows/envs/agentscope/agentscopev1_search_workflow

Initializes the AgentScopeV1ReactSearchWorkflow, setting up the AgentScope models and formatters. It requires a Task, a ModelWrapper, and optionally auxiliary OpenAI clients. Includes error handling for missing AgentScope installation.

```python
import os
import re
from typing import List, Optional

import openai

from trinity.common.models.model import ModelWrapper
from trinity.common.workflows.workflow import WORKFLOWS, Task, Workflow


@WORKFLOWS.register_module("agentscope_v1_react_search_workflow")
class AgentScopeV1ReactSearchWorkflow(Workflow):
    """
    This workflow serves as an example of how to use the agentscope framework within the trinity workflow.
    """

    can_reset: bool = True
    is_async: bool = True

    def __init__(
        self,
        *,
        task: Task,
        model: ModelWrapper,
        auxiliary_models: Optional[List[openai.OpenAI]] = None,
    ):
        super().__init__(
            task=task,
            model=model,
            auxiliary_models=auxiliary_models,
        )
        try:
            from agentscope.formatter import OpenAIChatFormatter
            from agentscope.model import OpenAIChatModel
        except ImportError as e:
            error_message = f"AgentScope is not installed. Please install the agentscope framework first before running the workflow. Error: {str(e)}"
            self.logger.error(error_message)
            raise ImportError(error_message)

        self.openai_async_client = model.get_openai_async_client()
        self.model_name = self.openai_async_client.model_path
        self.agent_model = OpenAIChatModel(
            api_key="EMPTY",
            model_name=self.model_name,
            stream=False,
            generate_kwargs={
                "temperature": self.task.rollout_args.temperature,
                "max_tokens": self.task.rollout_args.max_tokens or 4096,
            },
        )
        self.agent_model.client = self.openai_async_client
        self.agent_model_formatter = OpenAIChatFormatter()
        self.reset(task)
```

--------------------------------

### Build Docker Image for Megatron-LM (Bash)

Source: https://modelscope.github.io/Trinity-RFT/en/main/_sources/tutorial/example_megatron

Builds a Docker image specifically for Megatron-LM training within the Trinity-RFT project. This command uses a provided Dockerfile to create a self-contained environment, simplifying dependency management and ensuring consistent training setups.

```bash
docker build -f scripts/docker_for_megatron/Dockerfile -t trinity-rft-megatron:latest .
```

--------------------------------

### Configure Data Processor for Task Pipeline (YAML)

Source: https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_data_functionalities

Example YAML configuration for the Trinity-RFT data processor, focusing on the task pipeline. It specifies the number of processes, operators (like 'llm_difficulty_score_filter'), input files, and target fields. It also shows how to enable the automatic start of the Data-Juicer service.
```yaml
data_processor:
  # task pipeline related
  task_pipeline:
    num_process: 32
    operators:
      - name: "llm_difficulty_score_filter"
        args:
          api_or_hf_model: "qwen2.5-7b-instruct"
          min_score: 0.0
          input_keys: ["question", "answer"]
          field_names: ["Question", "Answer"]
    inputs:  # the output will be set to the explorer input automatically
      - ${oc.env:TRINITY_TASKSET_PATH}
    target_fields: ["question", "answer"]
service:
  data_juicer:
    auto_start: true
```

--------------------------------

### DPO Configuration Example in YAML

Source: https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_dpo

This YAML snippet shows the configuration for a Direct Preference Optimization (DPO) experiment. It specifies the experiment mode, algorithm type, dataset path and format, and model details. Dependencies include the Trinity-RFT environment and the specified dataset paths.

```yaml
project: <project_name>
name: <experiment_name>
mode: train
algorithm:
  algorithm_type: dpo
  kl_loss_fn: k1
  kl_loss_fn_args:
    kl_coef: 0.1  # value of beta in DPO
checkpoint_root_dir: ${oc.env:TRINITY_CHECKPOINT_ROOT_DIR,./checkpoints}
model:
  model_path: ${oc.env:TRINITY_MODEL_PATH,Qwen/Qwen2.5-1.5B-Instruct}
cluster:
  node_num: 1
  gpu_per_node: 8
buffer:
  total_epochs: 2
  train_batch_size: 64
  trainer_input:
    experience_buffer:
      name: human_like_dpo
      storage_type: file
      path: $DATASET_PATH/human_like_dpo_dataset
      format:
        prompt_type: plaintext
        prompt_key: prompt
        chosen_key: chosen
        rejected_key: rejected
trainer:
  save_interval: 30
  trainer_config:
    ... # omitted here for simplicity
```

--------------------------------

### Configure Trinity-RFT Data Processor (YAML)

Source: https://modelscope.github.io/Trinity-RFT/en/main/_sources/tutorial/example_data_functionalities

Example YAML configuration for Trinity-RFT's data processor, focusing on the task pipeline. It defines the number of processes, operators like 'llm_difficulty_score_filter', input sources, and target fields. It also shows how to enable automatic Data-Juicer service startup.
```yaml
data_processor:
  # task pipeline related
  task_pipeline:
    num_process: 32
    operators:
      - name: "llm_difficulty_score_filter"
        args:
          api_or_hf_model: "qwen2.5-7b-instruct"
          min_score: 0.0
          input_keys: ["question", "answer"]
          field_names: ["Question", "Answer"]
    inputs:  # the output will be set to the explorer input automatically
      - ${oc.env:TRINITY_TASKSET_PATH}
    target_fields: ["question", "answer"]
service:
  data_juicer:
    auto_start: true
```

--------------------------------

### WebShop Workflow Class Initialization (Python)

Source: https://modelscope.github.io/Trinity-RFT/en/main/_modules/trinity/common/workflows/envs/webshop/webshop_workflow

Initializes the WebShopWorkflow class: sets the maximum number of environment steps to 15, resets the workflow with the given task, and then imports and creates the WebAgentTextEnv gym environment. If the web_agent_site package cannot be imported, an ImportError with installation instructions is raised.

```python
@WORKFLOWS.register_module("webshop_workflow")
class WebShopWorkflow(MultiTurnWorkflow):
    """A workflow for webshop task."""

    can_reset: bool = True
    is_async: bool = True

    def __init__(
        self,
        model: ModelWrapper,
        task: Task,
        auxiliary_models: Optional[List] = None,
    ):
        super().__init__(
            model=model,
            task=task,
        )
        self.max_env_steps = 15
        self.reset(task)

        # TODO: Make parallel envs
        try:
            import gym
            from web_agent_site.envs import WebAgentTextEnv  # noqa: F401
        except Exception as e:
            print("Please make sure you have installed the web_agent_site package.")
            error_message = f"Error importing WebAgentTextEnv {str(e)}. Please make sure you have installed the web_agent_site package, following the instructions in https://github.com/princeton-nlp/WebShop"
            raise ImportError(error_message)
        print("Making GYM env")
        # NOTE: Hosting the env requires ~15GB CPU memory.
        # If you want an easier env, you can set the num_products to 1000 or 100000.
        self.env = gym.make(
            "WebAgentTextEnv-v0", observation_mode="text_rich", num_products=None, human_goals=True
        )
```

--------------------------------

### Setup Ray Cluster (Python)

Source: https://modelscope.github.io/Trinity-RFT/en/main/_modules/trinity/utils/dlc_utils

Initializes a Ray cluster, either starting a new one or reusing an existing one, based on DLC environment variables. It handles master and worker node startup commands and waits for the cluster to become ready and for worker nodes to join. On the master node it returns the Ray cluster address (or "auto" when an existing cluster is reused); a worker node that starts Ray does not return, but blocks until the cluster status actor reports completion and then exits the process.

```python
import os
import subprocess
import sys
import time

import ray

from trinity.utils.log import get_logger

logger = get_logger(__name__)

CLUSTER_ACTOR_NAME = "cluster_status"


def get_dlc_env_vars() -> dict:
    envs = {
        "RANK": int(os.environ.get("RANK", -1)),  # type: ignore
        "WORLD_SIZE": int(os.environ.get("WORLD_SIZE", -1)),  # type: ignore
        "MASTER_ADDR": os.environ.get("MASTER_ADDR", None),
        "MASTER_PORT": os.environ.get("MASTER_PORT", None),
    }
    for key, value in envs.items():
        if value is None or value == -1:
            logger.error(f"DLC env var `{key}` is not set.")
            raise ValueError(f"DLC env var `{key}` is not set.")
    return envs


def is_running() -> bool:
    """Check if ray cluster is running."""
    ret = subprocess.run("ray status", shell=True, capture_output=True)
    return ret.returncode == 0


def wait_for_ray_setup() -> None:
    while True:
        if is_running():
            break
        else:
            logger.info("Waiting for ray cluster to be ready...")
            time.sleep(1)


def wait_for_ray_worker_nodes(world_size: int) -> None:
    while True:
        alive_nodes = [node for node in ray.nodes() if node["Alive"]]
        if len(alive_nodes) >= world_size:
            break
        else:
            logger.info(
                f"{len(alive_nodes)} nodes have joined so far, waiting for {world_size - len(alive_nodes)} nodes..."
            )
            time.sleep(1)


class ClusterStatus:
    def __init__(self):
        self.finished = False

    def finish(self) -> None:
        self.finished = True

    def running(self) -> bool:
        return not self.finished


def setup_ray_cluster(namespace: str) -> str:
    """Setup a ray cluster in DLC environment.

    This function will start a ray cluster if it is not running,
    otherwise it will reuse the existing ray cluster.

    Returns:
        str: The address of the ray cluster.
    """
    env_vars = get_dlc_env_vars()
    is_master = env_vars["RANK"] == 0

    if is_running():
        # reuse existing ray cluster
        return "auto"
    else:
        if is_master:
            cmd = f"ray start --head --port={env_vars['MASTER_PORT']} --node-ip-address={env_vars['MASTER_ADDR']}"
        else:
            cmd = f"ray start --address={env_vars['MASTER_ADDR']}:{env_vars['MASTER_PORT']}"
        ret = subprocess.run(cmd, shell=True, capture_output=True)
        logger.info(f"Starting ray cluster: {cmd}")
        if ret.returncode != 0:
            logger.error(f"Failed to start ray cluster: {cmd}")
            logger.error(f"ret.stdout: {ret.stdout!r}")
            logger.error(f"ret.stderr: {ret.stderr!r}")
            sys.exit(1)

        wait_for_ray_setup()
        time.sleep(5)
        ray.init(
            address=f"{env_vars['MASTER_ADDR']}:{env_vars['MASTER_PORT']}",
            namespace=namespace,
            ignore_reinit_error=True,
        )
        if is_master:
            # master wait for worker nodes to join
            wait_for_ray_worker_nodes(env_vars["WORLD_SIZE"])
            ray.shutdown()
            return f"{env_vars['MASTER_ADDR']}:{env_vars['MASTER_PORT']}"
        else:
            # worker wait on the cluster status actor
            cluster_status = (
                ray.remote(ClusterStatus)
                .options(
                    name=CLUSTER_ACTOR_NAME,
                    namespace=namespace,
                    get_if_exists=True,
                )
                .remote()
            )
            while True:
                if ray.get(cluster_status.running.remote()):
                    ret = subprocess.run("ray status", shell=True, capture_output=True)
                    print(ret.stdout.decode())
                    time.sleep(5)
                else:
                    logger.info("Ray cluster is not running, exiting.")
                    break
            sys.exit(0)
```

--------------------------------

### Trainer Input Configuration (YAML)

Source: https://modelscope.github.io/Trinity-RFT/en/main/_sources/tutorial/trinity_configs

Defines the main experience
buffer and optional auxiliary datasets for the trainer. It specifies buffer names, storage types (queue, sql, file), paths, and formatting options for SFT/DPO data. Includes settings for read timeouts and replay buffers.

```yaml
buffer:
  ...
  trainer_input:
    experience_buffer:
      name: countdown_buffer
      storage_type: queue
      path: sqlite:///countdown_buffer.db
      max_read_timeout: 1800

    auxiliary_buffers:
      sft_dataset:
        name: sft_dataset
        storage_type: file
        path: ${oc.env:TRINITY_SFT_DATASET_PATH}
        format:
          prompt_key: 'question'
          response_key: 'answer'
      other_buffer:
        ...
```
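
The `${oc.env:NAME}` and `${oc.env:NAME,default}` values used in these configurations are environment-variable interpolations; the `oc.env` syntax matches OmegaConf's built-in environment resolver. As a rough illustration of how such a value resolves, the sketch below emulates the lookup with the Python standard library only. `resolve_env` and its regular expression are hypothetical names written for this example; they are not part of Trinity-RFT or OmegaConf, and the real resolver may treat edge cases (whitespace, nested interpolations) differently.

```python
import os
import re

# Matches ${oc.env:NAME} or ${oc.env:NAME,default}.
# Simplified pattern for illustration; not the real resolver's grammar.
_ENV_PATTERN = re.compile(r"\$\{oc\.env:([A-Za-z_][A-Za-z0-9_]*)(?:,([^}]*))?\}")


def resolve_env(value: str) -> str:
    """Replace each ${oc.env:NAME[,default]} in `value` (hypothetical helper)."""

    def _substitute(match):
        name, default = match.group(1), match.group(2)
        if name in os.environ:
            # A variable that is set always wins over the default.
            return os.environ[name]
        if default is not None:
            return default
        raise KeyError(f"Environment variable {name!r} is not set and has no default")

    return _ENV_PATTERN.sub(_substitute, value)


if __name__ == "__main__":
    # Falls back to ./checkpoints unless TRINITY_CHECKPOINT_ROOT_DIR is set.
    print(resolve_env("${oc.env:TRINITY_CHECKPOINT_ROOT_DIR,./checkpoints}"))
```

Under this sketch, a value without a default, such as `${oc.env:TRINITY_SFT_DATASET_PATH}`, raises an error when the variable is unset, so exporting the variable before running `trinity run` is the safer habit.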