### Clone Trinity-RFT Repository

Source: https://modelscope.github.io/Trinity-RFT/en/main/_sources/tutorial/trinity_installation

Clones the Trinity-RFT GitHub repository and navigates into the project directory. This is the first step for installing from source.

```bash
git clone https://github.com/modelscope/Trinity-RFT
cd Trinity-RFT
```

--------------------------------

### Clone Trinity-RFT Repository

Source: https://modelscope.github.io/Trinity-RFT/en/main/tutorial/trinity_installation

Clones the Trinity-RFT repository from GitHub and navigates into the project directory. This is the first step for installing from source.

```bash
git clone https://github.com/modelscope/Trinity-RFT
cd Trinity-RFT

```

--------------------------------

### Install Trinity-RFT using uv

Source: https://modelscope.github.io/Trinity-RFT/en/main/_sources/tutorial/trinity_installation

Installs Trinity-RFT and its development/flash-attention dependencies using the 'uv' package installer. This is an alternative to pip for faster dependency management.

```bash
uv sync --extra dev --extra flash_attn
```

--------------------------------

### Install Trinity-RFT via PyPI

Source: https://modelscope.github.io/Trinity-RFT/en/main/_sources/tutorial/trinity_installation

Installs a specific version of Trinity-RFT (0.3.1) and flash-attn (2.8.1) directly from PyPI using pip. Suitable for users who only need to use the package.

```bash
pip install trinity-rft==0.3.1
pip install flash-attn==2.8.1
```

--------------------------------

### Install Trinity-RFT and Flash-Attention with uv via PyPI

Source: https://modelscope.github.io/Trinity-RFT/en/main/tutorial/trinity_installation

Installs specific versions of Trinity-RFT and Flash-Attention using the 'uv' package installer. This offers a faster and more efficient way to manage PyPI dependencies.

```bash
uv pip install trinity-rft==0.3.1
uv pip install flash-attn==2.8.1

```

--------------------------------

### Set Up Virtual Environment with venv

Source: https://modelscope.github.io/Trinity-RFT/en/main/_sources/tutorial/trinity_installation

Creates and activates a Python virtual environment using venv with Python 3.10, then installs Trinity-RFT with development and flash-attention dependencies. Includes a workaround for potential flash-attn installation issues.

```bash
python3.10 -m venv .venv
source .venv/bin/activate

pip install -e ".[dev]"
pip install -e ".[flash_attn]"
# if you encounter issues when installing flash-attn, try:
# pip install flash-attn==2.8.1 --no-build-isolation
```

--------------------------------

### Install Trinity-RFT and Flash-Attention via PyPI

Source: https://modelscope.github.io/Trinity-RFT/en/main/tutorial/trinity_installation

Installs the specified version of Trinity-RFT and Flash-Attention directly from PyPI using pip. This method is suitable for users who do not need to modify the source code.

```bash
pip install trinity-rft==0.3.1
pip install flash-attn==2.8.1

```

--------------------------------

### Install Trinity-RFT via PyPI with uv

Source: https://modelscope.github.io/Trinity-RFT/en/main/_sources/tutorial/trinity_installation

Installs a specific version of Trinity-RFT (0.3.1) and flash-attn (2.8.1) using the 'uv' package installer. This method is an alternative to pip for installing from PyPI.

```bash
uv pip install trinity-rft==0.3.1
uv pip install flash-attn==2.8.1
```

--------------------------------

### Prepare Environments and Start Data Processor Servers

Source: https://modelscope.github.io/Trinity-RFT/en/main/_sources/tutorial/example_data_functionalities

Shell commands to set up the necessary split environments for Trinity-RFT and the Data-Juicer-based data processor, followed by starting all the split servers, including the data processor server.

```shell
# prepare split environments, including the one for the data processor
python scripts/install.py

# start all split servers
python scripts/start_servers.py
```

--------------------------------

### ALFWorld Environment Preparation Commands

Source: https://modelscope.github.io/Trinity-RFT/en/main/_sources/tutorial/example_step_wise

Provides commands to install the ALFWorld environment, export necessary data paths, and download the environment files. This setup is crucial for running ALFWorld-based experiments.

```bash
pip install "alfworld[full]"  # quoted so shells such as zsh do not glob the brackets
export ALFWORLD_DATA=/path/to/alfworld/data
alfworld-download
```

--------------------------------

### Set Up venv Virtual Environment for Trinity-RFT

Source: https://modelscope.github.io/Trinity-RFT/en/main/tutorial/trinity_installation

Creates a Python virtual environment using venv with Python 3.10, activates it, and installs Trinity-RFT with development and flash-attn dependencies. Provides an alternative installation command for flash-attn if issues arise.

```bash
python3.10 -m venv .venv
source .venv/bin/activate

pip install -e ".[dev]"
pip install -e ".[flash_attn]"
# if you encounter issues when installing flash-attn, try:
# pip install flash-attn==2.8.1 --no-build-isolation

```

--------------------------------

### Trinity-RFT GRPO Configuration Example

Source: https://modelscope.github.io/Trinity-RFT/en/main/_sources/tutorial/example_reasoning_basic

This YAML configuration defines parameters for running a GRPO (Group Relative Policy Optimization) experiment with Trinity-RFT. It includes settings for the project, model, optimizer, data buffers, and training/evaluation intervals. Ensure paths and names are correctly set for your environment.

```yaml
project: <project_name>
name: <experiment_name>
checkpoint_root_dir: ${oc.env:TRINITY_CHECKPOINT_ROOT_DIR,./checkpoints}
algorithm:
  algorithm_type: grpo
  repeat_times: 8
  optimizer:
    lr: 1e-5
model:
  model_path: ${oc.env:TRINITY_MODEL_PATH,Qwen/Qwen2.5-1.5B-Instruct}
cluster:
  node_num: 1
  gpu_per_node: 2
buffer:
  total_epochs: 1
  batch_size: 128
  explorer_input:
    taskset:
      name: gsm8k
      storage_type: file
      path: 'openai/gsm8k'
      subset_name: 'main'
      split: 'train'
      format:
        prompt_key: 'question'
        response_key: 'answer'
      rollout_args:
        temperature: 1.0
      default_workflow_type: 'math_workflow'
    eval_tasksets:
    - name: gsm8k-eval
      storage_type: file
      path: 'openai/gsm8k'
      subset_name: 'main'
      split: 'test'
      format:
        prompt_key: 'question'
        response_key: 'answer'
      default_workflow_type: 'math_workflow'
  trainer_input:
    experience_buffer:
      name: gsm8k_buffer
      storage_type: queue
      path: 'sqlite:///gsm8k.db'
explorer:
  eval_interval: 50
  runner_per_model: 16
  rollout_model:
    engine_num: 1
synchronizer:
  sync_method: 'nccl'
  sync_interval: 1
trainer:
  save_interval: 100
```
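With `repeat_times: 8`, each task is rolled out eight times, which gives GRPO a group of responses to compare against each other. The sketch below illustrates the commonly described group-relative advantage (reward minus the group mean, divided by the group standard deviation). It is not Trinity-RFT's implementation; `group_advantages` and its `eps` term are hypothetical names used only for illustration.

```python
from statistics import mean, pstdev
from typing import List


def group_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize rewards within one group of rollouts for the same prompt.

    Assumption: population standard deviation is used here; some
    implementations use the sample standard deviation instead.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps avoids division by zero when every rollout gets the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]


# Eight rollouts for one prompt (matching repeat_times: 8); 1.0 marks a correct answer.
rewards = [1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
advantages = group_advantages(rewards)
```

Correct rollouts receive a positive advantage and incorrect ones a negative advantage; a group in which all rewards are equal yields zero advantage for every rollout.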

--------------------------------

### Run Trinity-RFT Experiment

Source: https://modelscope.github.io/Trinity-RFT/en/main/_sources/tutorial/example_reasoning_basic

This command executes a Trinity-RFT experiment using a specified configuration file. Ensure the `trinity` command is available in your PATH and the provided configuration file path is correct.

```bash
trinity run --config examples/grpo_gsm8k/gsm8k.yaml
```

--------------------------------

### Set Up Virtual Environment with Conda

Source: https://modelscope.github.io/Trinity-RFT/en/main/_sources/tutorial/trinity_installation

Creates and activates a Conda virtual environment named 'trinity' with Python 3.10, then installs Trinity-RFT with development and flash-attention dependencies. Includes a workaround for potential flash-attn installation issues.

```bash
conda create -n trinity python=3.10
conda activate trinity

pip install -e ".[dev]"
pip install -e ".[flash_attn]"
# if you encounter issues when installing flash-attn, try:
# pip install flash-attn==2.8.1 --no-build-isolation
```

--------------------------------

### Download Models using Modelscope and Huggingface

Source: https://modelscope.github.io/Trinity-RFT/en/main/_sources/tutorial/example_reasoning_basic

This snippet demonstrates how to download language models locally using either the Modelscope CLI or Huggingface CLI. It specifies the model name and the local directory for storage. Ensure you have the respective CLIs installed and configured.

```bash
# Using Modelscope
modelscope download Qwen/Qwen2.5-1.5B-Instruct --local_dir $MODEL_PATH/Qwen2.5-1.5B-Instruct

# Using Huggingface
huggingface-cli download Qwen/Qwen2.5-1.5B-Instruct --local-dir $MODEL_PATH/Qwen2.5-1.5B-Instruct
```

--------------------------------

### Build and Run Trinity-RFT Docker Image

Source: https://modelscope.github.io/Trinity-RFT/en/main/tutorial/trinity_installation

Builds a Docker image for Trinity-RFT from the provided Dockerfile and then runs the container with GPU access and shared memory. It mounts the current directory and a specified data path into the container.

```bash
git clone https://github.com/modelscope/Trinity-RFT
cd Trinity-RFT

# Build the Docker image
## Tip: You can modify the Dockerfile to add mirrors or set API keys
docker build -f scripts/docker/Dockerfile -t trinity-rft:latest .

# Run the container, replacing <path_to_your_data_and_checkpoints> with your actual path
docker run -it \
  --gpus all \
  --shm-size="64g" \
  --rm \
  -v $PWD:/workspace \
  -v <path_to_your_data_and_checkpoints>:/data \
  trinity-rft:latest

```

--------------------------------

### Download Datasets using Modelscope and Huggingface

Source: https://modelscope.github.io/Trinity-RFT/en/main/_sources/tutorial/example_reasoning_basic

This snippet shows how to download datasets like GSM8K to a local directory using Modelscope or Huggingface CLIs. It includes commands for specifying the dataset and the target local path. Ensure the CLIs are installed and configured.

```bash
# Using Modelscope
modelscope download --dataset modelscope/gsm8k --local_dir $DATASET_PATH/gsm8k

# Using Huggingface
huggingface-cli download openai/gsm8k --repo-type dataset --local-dir $DATASET_PATH/gsm8k
```

--------------------------------

### Install Trinity-RFT with uv Package Manager

Source: https://modelscope.github.io/Trinity-RFT/en/main/tutorial/trinity_installation

Installs Trinity-RFT and its 'dev' and 'flash_attn' extras using the 'uv' package installer. This provides a modern alternative for managing Python dependencies.

```bash
uv sync --extra dev --extra flash_attn

```

--------------------------------

### Install Megatron-LM Support (Bash)

Source: https://modelscope.github.io/Trinity-RFT/en/main/_sources/tutorial/example_megatron

Installs the project with Megatron-LM support in editable mode using pip. This is the primary method for setting up the Megatron backend. It also shows how to install NVIDIA's Apex library, which is crucial for mixed-precision training with Megatron-LM.

```bash
pip install -e ".[megatron]"

# for uv
# uv sync --extra megatron

pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation \
    --config-settings "--build-option=--cpp_ext" \
    --config-settings "--build-option=--cuda_ext" \
    --resume-retries 10 git+https://github.com/NVIDIA/apex.git
```

--------------------------------

### Build and Run Trinity-RFT Docker Image

Source: https://modelscope.github.io/Trinity-RFT/en/main/_sources/tutorial/trinity_installation

Builds a Docker image for Trinity-RFT from its Dockerfile and runs the container with specified GPU and shared memory configurations. It mounts the current directory and a data/checkpoints directory into the container.

```bash
git clone https://github.com/modelscope/Trinity-RFT
cd Trinity-RFT

# Build the Docker image
## Tip: You can modify the Dockerfile to add mirrors or set API keys
docker build -f scripts/docker/Dockerfile -t trinity-rft:latest .

# Run the container, replacing <path_to_your_data_and_checkpoints> with your actual path
docker run -it \
  --gpus all \
  --shm-size="64g" \
  --rm \
  -v $PWD:/workspace \
  -v <path_to_your_data_and_checkpoints>:/data \
  trinity-rft:latest
```

--------------------------------

### Set Up Conda Virtual Environment for Trinity-RFT

Source: https://modelscope.github.io/Trinity-RFT/en/main/tutorial/trinity_installation

Creates and activates a Conda virtual environment named 'trinity' with Python 3.10. It then installs the Trinity-RFT package with development and flash-attn dependencies. Includes a fallback for flash-attn installation issues.

```bash
conda create -n trinity python=3.10
conda activate trinity

pip install -e ".[dev]"
pip install -e ".[flash_attn]"
# if you encounter issues when installing flash-attn, try:
# pip install flash-attn==2.8.1 --no-build-isolation

```

--------------------------------

### Basic Trinity-RFT YAML Configuration Structure

Source: https://modelscope.github.io/Trinity-RFT/en/main/_sources/tutorial/trinity_configs

Provides an example of a basic Trinity-RFT configuration file, outlining the main sections like project, algorithm, model, cluster, buffer, explorer, trainer, synchronizer, monitor, service, data_processor, log, and stages. This structure is loaded using OmegaConf.

```yaml
project: Trinity-RFT
name: example
mode: both
checkpoint_root_dir: ${oc.env:TRINITY_CHECKPOINT_ROOT_DIR,./checkpoints}
continue_from_checkpoint: true

algorithm:
  # Algorithm-related parameters
  ...
model:
  # Model-specific configurations
  ...
cluster:
  # Cluster node and GPU settings
  ...
buffer:
  # Data buffer configurations
  ...
explorer:
  # Explorer-related settings (rollout models, workflow runners)
  ...
trainer:
  # Trainer-specific parameters
  ...
synchronizer:
  # Model weight synchronization settings
  ...
monitor:
  # Monitoring configurations (e.g., WandB, TensorBoard or MLFlow)
  ...
service:
  # Services to use
  ...
data_processor:
  # Preprocessing data settings
  ...
log:
  # Ray actor logging
  ...

stages:
  # Stages configuration
  ...
```
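The `${oc.env:VAR,default}` entries rely on OmegaConf's environment-variable resolver: the value of `VAR` is used when it is set, otherwise the text after the comma. The following stdlib-only sketch imitates that behavior for illustration; `resolve_env_default` is a hypothetical helper, and real configuration files should still be loaded through OmegaConf.

```python
import os
import re

# Matches ${oc.env:NAME} or ${oc.env:NAME,default}
_ENV_PATTERN = re.compile(r"\$\{oc\.env:([A-Za-z_][A-Za-z0-9_]*)(?:,([^}]*))?\}")


def resolve_env_default(value: str) -> str:
    """Replace each ${oc.env:NAME,default} with the env value or its default."""

    def _sub(match: re.Match) -> str:
        name, default = match.group(1), match.group(2)
        if name in os.environ:
            return os.environ[name]
        if default is None:
            raise KeyError(f"environment variable {name} is not set and has no default")
        return default

    return _ENV_PATTERN.sub(_sub, value)


# Without the variable set, the default after the comma is used.
os.environ.pop("TRINITY_CHECKPOINT_ROOT_DIR", None)
fallback = resolve_env_default("${oc.env:TRINITY_CHECKPOINT_ROOT_DIR,./checkpoints}")

# With the variable set, its value takes precedence over the default.
os.environ["TRINITY_CHECKPOINT_ROOT_DIR"] = "/data/ckpt"
overridden = resolve_env_default("${oc.env:TRINITY_CHECKPOINT_ROOT_DIR,./checkpoints}")
```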

--------------------------------

### Bash Command to Install AgentScope

Source: https://modelscope.github.io/Trinity-RFT/en/main/_sources/tutorial/example_react

Installs the AgentScope library with a minimum version requirement of 1.0.4 using pip. This is a prerequisite for running the Trinity-RFT example.

```bash
# Quote the requirement so the shell does not treat ">" as an output redirect
pip install "agentscope>=1.0.4"
```

--------------------------------

### Basic Trinity-RFT Configuration Structure (YAML)

Source: https://modelscope.github.io/Trinity-RFT/en/main/tutorial/trinity_configs

An example of a basic configuration file for Trinity-RFT, showcasing the main sections for different modules. This structure helps organize parameters for various components of the system.

```yaml
project: Trinity-RFT
name: example
mode: both
checkpoint_root_dir: ${oc.env:TRINITY_CHECKPOINT_ROOT_DIR,./checkpoints}
continue_from_checkpoint: true

algorithm:
  # Algorithm-related parameters
  ...
model:
  # Model-specific configurations
  ...
cluster:
  # Cluster node and GPU settings
  ...
buffer:
  # Data buffer configurations
  ...
explorer:
  # Explorer-related settings (rollout models, workflow runners)
  ...
trainer:
  # Trainer-specific parameters
  ...
synchronizer:
  # Model weight synchronization settings
  ...
monitor:
  # Monitoring configurations (e.g., WandB, TensorBoard or MLFlow)
  ...
service:
  # Services to use
  ...
data_processor:
  # Preprocessing data settings
  ...
log:
  # Ray actor logging
  ...

stages:
  # Stages configuration
  ...


```

--------------------------------

### Download Models and Datasets (ModelScope & Huggingface)

Source: https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_reasoning_basic

This snippet shows how to download models and datasets using both ModelScope and Huggingface CLI. It specifies the model name or dataset path and the local directory for storage. Ensure ModelScope or Huggingface CLI is installed and configured.

```shell
# Using Modelscope
modelscope download Qwen/Qwen2.5-1.5B-Instruct --local_dir $MODEL_PATH/Qwen2.5-1.5B-Instruct

# Using Huggingface
huggingface-cli download Qwen/Qwen2.5-1.5B-Instruct --local-dir $MODEL_PATH/Qwen2.5-1.5B-Instruct

# Using Modelscope
modelscope download --dataset modelscope/gsm8k --local_dir $DATASET_PATH/gsm8k

# Using Huggingface
huggingface-cli download openai/gsm8k --repo-type dataset --local-dir $DATASET_PATH/gsm8k
```

--------------------------------

### Megatron-LM Configuration Example (YAML)

Source: https://modelscope.github.io/Trinity-RFT/en/main/_sources/tutorial/example_megatron

Provides an example YAML configuration snippet for setting up Megatron-LM parameters within the Trinity-RFT framework. It covers model parallelism settings, offloading options, MBridge usage, distributed checkpointing, and recomputation configurations for actor, reference, and critic models.

```yaml
actor_rollout_ref:
  ...
  actor:
    strategy: megatron  # Kept for backward compatibility
    megatron:
      # Model parallelism settings
      tensor_model_parallel_size: 2
      pipeline_model_parallel_size: 1
      expert_model_parallel_size: 1

      # Offloading (set to false unless you're memory-constrained)
      param_offload: false
      grad_offload: false
      optimizer_offload: false

      # Use mBridge for parameter import/export (optional)
      use_mbridge: false

      # Use Megatron checkpoint
      use_dist_checkpointing: false
      dist_checkpointing_path: null

      # Recomputation settings (helps save memory during training)
      override_transformer_config:
        recompute_granularity: full
        recompute_method: uniform
        recompute_num_layers: 1
  ...
  ref:
    megatron:
      tensor_model_parallel_size: 2
      pipeline_model_parallel_size: 1
      expert_model_parallel_size: 1
      param_offload: false
      grad_offload: false
      optimizer_offload: false
      use_mbridge: false
      use_dist_checkpointing: false
      dist_checkpointing_path: null
      override_transformer_config:
        recompute_granularity: full
        recompute_method: uniform
        recompute_num_layers: 1
  ...

critic:
  strategy: megatron
  megatron:
    tensor_model_parallel_size: 2
    pipeline_model_parallel_size: 1
    expert_model_parallel_size: 1
    param_offload: false
    grad_offload: false
    optimizer_offload: false
    use_mbridge: false
    use_dist_checkpointing: false
    dist_checkpointing_path: null
    override_transformer_config:
      recompute_granularity: full
      recompute_method: uniform
      recompute_num_layers: 1
  ...

```
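The parallelism sizes determine how many GPUs a single model replica spans. Under the usual Megatron-LM layout (ignoring context parallelism), the number of GPUs equals tensor-parallel size times pipeline-parallel size times data-parallel size. The sketch below works through that arithmetic; `data_parallel_size` is a hypothetical helper, and the 8-GPU figure is an example that does not come from the source.

```python
def data_parallel_size(world_size: int, tensor_parallel: int, pipeline_parallel: int) -> int:
    """Return how many model replicas fit on `world_size` GPUs."""
    model_parallel = tensor_parallel * pipeline_parallel
    if world_size % model_parallel != 0:
        raise ValueError(
            f"world_size={world_size} is not divisible by "
            f"tensor_parallel * pipeline_parallel = {model_parallel}"
        )
    return world_size // model_parallel


# tensor_model_parallel_size: 2 and pipeline_model_parallel_size: 1 on 8 GPUs
dp = data_parallel_size(world_size=8, tensor_parallel=2, pipeline_parallel=1)
```

With these settings each replica occupies two GPUs, so eight GPUs hold four data-parallel replicas.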

--------------------------------

### Install and Start Trinity-RFT Servers

Source: https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_data_functionalities

These Python scripts prepare the Trinity-RFT environment, including the split environment for the data processor. The `start_servers.py` script then launches all necessary split servers, including the data processor server within its Data-Juicer environment.

```bash
# prepare split environments, including the one for the data processor
python scripts/install.py

# start all split servers
python scripts/start_servers.py

```

--------------------------------

### Example Trinity-RFT Taskset Configuration (YAML)

Source: https://modelscope.github.io/Trinity-RFT/en/main/_sources/tutorial/develop_workflow

An example YAML configuration snippet for setting up the task dataset in Trinity-RFT. It specifies the default workflow, dataset path, prompt/response keys for formatting, and rollout arguments like temperature.

```yaml
# some config
buffer:
  explorer_input:
    taskset:
      default_workflow: "math_workflow"
      path: ${oc.env:TRINITY_TASKSET_PATH}
      format:
        prompt_key: "question"
        response_key: "answer"
      rollout_args:
        temperature: 1.0
      # some other configs
```
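The `format` block tells the taskset which fields of each raw record hold the prompt and the reference answer. As a rough illustration of that mapping, the sketch below turns one record into a task dictionary. `to_task` and the output keys `prompt`, `truth`, and `rollout_args` are hypothetical and do not reflect Trinity-RFT's own loader.

```python
from typing import Any, Dict


def to_task(
    record: Dict[str, Any],
    prompt_key: str,
    response_key: str,
    rollout_args: Dict[str, Any],
) -> Dict[str, Any]:
    """Pick the prompt and reference answer out of a raw dataset record."""
    return {
        "prompt": record[prompt_key],
        "truth": record[response_key],
        "rollout_args": dict(rollout_args),  # copy so tasks do not share state
    }


# A GSM8K-style record with the keys named in the config above
record = {"question": "What is 2 + 3?", "answer": "5"}
task = to_task(record, "question", "answer", {"temperature": 1.0})
```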

--------------------------------

### Trinity-RFT Configuration with SFT Warmup

Source: https://modelscope.github.io/Trinity-RFT/en/main/_sources/tutorial/example_reasoning_basic

This YAML snippet shows how to add a Supervised Fine-Tuning (SFT) warmup stage before the main Reinforcement Fine-Tuning (RFT) process in Trinity-RFT. It defines the SFT stage, including the dataset path and training steps. This configuration should be merged into the main `gsm8k.yaml` file.

```yaml
# Properly add the following configs in gsm8k.yaml
stages:
  - stage_name: sft_warmup
    mode: train
    algorithm:
      algorithm_type: sft
    buffer:
      train_batch_size: 128
      total_steps: 10
      trainer_input:
        experience_buffer:
          name: sft_warmup_dataset
          path: /PATH/TO/YOUR/SFT/DATASET
  - stage_name: rft  # leave empty to use the original configs for RFT
```
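Each stage lists only the keys it changes, and an empty stage falls back to the original configuration. One way to picture this is a recursive merge of the stage's overrides onto the base config. The sketch below is a minimal illustration under that assumption; `deep_merge` is a hypothetical helper, and Trinity-RFT's actual merge logic may differ.

```python
import copy
from typing import Any, Dict


def deep_merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `base` with `override` applied recursively."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


base = {"mode": "both", "algorithm": {"algorithm_type": "grpo", "repeat_times": 8}}
sft_stage = {"mode": "train", "algorithm": {"algorithm_type": "sft"}}

sft_config = deep_merge(base, sft_stage)  # stage overrides applied
rft_config = deep_merge(base, {})         # empty stage keeps the base config
```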

--------------------------------

### Start Ray Cluster and Run RFT Process

Source: https://modelscope.github.io/Trinity-RFT/en/main/_sources/tutorial/example_data_functionalities

Commands to initiate a Ray cluster (master and worker nodes) and subsequently execute the Trinity-RFT process using a specified configuration file. This is the initial step for data preparation and exploration.

```shell
# start the ray cluster
# on master node
ray start --head
# on worker nodes
ray start --address=<master_address>

# run RFT
trinity run --config <Trinity-RFT_config_path>
```

--------------------------------

### Install Megatron-LM Support with Pip

Source: https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_megatron

Installs the project with Megatron-LM support in editable mode using pip. This is the primary method for enabling Megatron-LM functionality. It may involve installing additional dependencies like Apex.

```bash
pip install -e ".[megatron]"

# for uv
# uv sync --extra megatron

```

```bash
pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation \
    --config-settings "--build-option=--cpp_ext" \
    --config-settings "--build-option=--cuda_ext" \
    --resume-retries 10 git+https://github.com/NVIDIA/apex.git

```

--------------------------------

### Bash Commands to Download Models and Dataset

Source: https://modelscope.github.io/Trinity-RFT/en/main/_sources/tutorial/example_react

Downloads the Qwen3-8B model from Hugging Face and the gsm8k dataset. These are necessary resources for running the provided example.

```bash
huggingface-cli download Qwen/Qwen3-8B
huggingface-cli download openai/gsm8k --repo-type dataset
```

--------------------------------

### Webshop Workflow Example Prompt - Python

Source: https://modelscope.github.io/Trinity-RFT/en/main/_modules/trinity/common/workflows/envs/webshop/webshop_workflow

Demonstrates a step-by-step interaction within a text-based webshop environment. It shows how an agent should interpret observations and take actions like searching and clicking to find and purchase a product based on specific criteria.

```python
# -*- coding: utf-8 -*-
from typing import List, Optional

from trinity.common.experience import Experience
from trinity.common.models.model import ModelWrapper
from trinity.common.workflows.workflow import WORKFLOWS, MultiTurnWorkflow, Task

SPARSE_REWARD = False

EXAMPLE_PROMPT = """
Observation:
Webshop
Instruction:
i would like a 3 ounce bottle of bright citrus deodorant for sensitive skin, and price lower than 50.00 dollars
[button] Search [button_]

Response:
<think>OK, let's search for 3 ounce bright citrus deodorant sensitive skin</think><action>search[3 ounce bright citrus deodorant sensitive skin]</action>

Observation:
Instruction:
i would like a 3 ounce bottle of bright citrus deodorant for sensitive skin, and price lower than 50.00 dollars
[button] Back to Search [button_]
Page 1 (Total results: 50)
[button] Next > [button_]
[button] B078GWRC1J [button_]
Bright Citrus Deodorant by Earth Mama | Natural and Safe for Sensitive Skin, Pregnancy and Breastfeeding, Contains Organic Calendula 3-Ounce
$10.99
[button] B078GTKVXY [button_]
Ginger Fresh Deodorant by Earth Mama | Natural and Safe for Sensitive Skin, Pregnancy and Breastfeeding, Contains Organic Calendula 3-Ounce
$10.99
[button] B08KBVJ4XN [button_]
Barrel and Oak - Aluminum-Free Deodorant, Deodorant for Men, Essential Oil-Based Scent, 24-Hour Odor Protection, Cedar & Patchouli Blend, Gentle on Sensitive Skin (Mountain Sage, 2.7 oz, 2-Pack)
$15.95

Response:
<think>button B078GWRC1J and B078GTKVXY are bright citrus deodorant less then 50 dollars. I can check B078GWRC1J first</think><action>click[B078GWRC1J]</action>

Observation:
Instruction:
i would like a 3 ounce bottle of bright citrus deodorant for sensitive skin, and price lower than 50.00 dollars
[button] Back to Search [button_]
[button] < Prev [button_]
scent [button] assorted scents [button_][button] bright citrus [button_][button] calming lavender [button_][button] ginger fresh [button_][button] simply non-scents [button_]
size [button] travel set (4-pack) [button_][button] 3 ounce (pack of 1) [button_][button] 3-ounce (2-pack) [button_]
Bright Citrus Deodorant by Earth Mama | Natural and Safe for Sensitive Skin, Pregnancy and Breastfeeding, Contains Organic Calendula 3-Ounce
Price: $10.99
Rating: N.A.
[button] Description [button_]
[button] Features [button_]
[button] Reviews [button_]
[button] Buy Now [button_]

Response:
<think>For 3 ounce bottle of bright citrus deodorant for sensitive skin, the item has options 'bright citrus' and '3 ounce (pack of 1)' and seems good to buy. </think><action>click[bright citrus]</action>

Observation:
You have clicked bright citrus.
...

Response:
<think>Now I should select the 3 ounce (pack of 1) option</think><action>click[3 ounce (pack of 1)]</action>

Observation:
You have clicked 3 ounce (pack of 1).
...

Response:
<think>I can buy the item</think><action>click[Buy Now]</action>
"""

```

--------------------------------

### Webshop System Prompt with Example - Python

Source: https://modelscope.github.io/Trinity-RFT/en/main/_modules/trinity/common/workflows/envs/webshop/webshop_workflow

Defines the system prompt for an agent interacting with a text-based webshop environment. It includes instructions on action formats (search, click), an example interaction, and rules for task completion, emphasizing thinking and action steps within a step limit.

````python
WebShop_SYSTEM_PROMPT_WITH_EXAMPLE = f"""
You are an agent interacting with a virtual text-based web shopping environment to test out your ability. Your job is to find follow the Instruction provided and mimic the steps to buy the item that are closest to the Instruct provided.

## Action Format:
You should give both the action_name and action_arg like the format `action_name[action_arg]`. You can execute two types of actions, search and click.
- When the button `[button] Search [button_]` is available in the current observation, you can execute the action <action>search[xxx]</action> (you should type the query you want to search in the square brackets here).
- You can click buttons `[button] xxx [button_]` that is available in the current observation, by execute the action <action>click[xxx]</action>.

Below are some examples of action formats.
- <action>search[white shoes]</action>
- <action>click[Buy Now]</action>

## Example:
Here is an example:
```
{EXAMPLE_PROMPT}
```

## Notes:
At each step, you should first think then perform action to fulfill the instruction. You should ALWAYS wrap your thinking with the <think> </think> tag and wrap your action with the <action> </action> tag.
You should ALWAYS take one action each step.
You should finish the task and buy the item within 15 steps.
DONOT try to interact with the user at anytime. Finish the task and buy the item by yourself.
"""
````
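The prompt asks the model to wrap its reasoning in `<think>` tags and its action in `<action>name[arg]</action>`. A response in that format can be parsed with a regular expression, as in the sketch below. `parse_action` is a hypothetical helper written for illustration and is not the workflow's own parser.

```python
import re
from typing import Optional, Tuple

# Captures the action name (search or click) and the text inside the brackets
_ACTION_RE = re.compile(r"<action>\s*(search|click)\[(.*?)\]\s*</action>", re.DOTALL)


def parse_action(response: str) -> Optional[Tuple[str, str]]:
    """Return (action_name, action_arg), or None if no valid action is found."""
    match = _ACTION_RE.search(response)
    if match is None:
        return None
    return match.group(1), match.group(2).strip()


parsed = parse_action("<think>I can buy the item</think><action>click[Buy Now]</action>")
```

Returning `None` for malformed output lets the caller decide how to handle a response that ignores the required format.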

--------------------------------

### Install Data Juicer Dependencies

Source: https://modelscope.github.io/Trinity-RFT/en/main/_sources/tutorial/example_data_functionalities

Installs the necessary dependencies for the data split, enabling Data-Juicer operators within Trinity-RFT. This command is crucial for activating the data processor.

```shell
pip install -e ".[data]"
```

--------------------------------

### Operator Configuration Example (YAML)

Source: https://modelscope.github.io/Trinity-RFT/en/main/_sources/tutorial/develop_operator

An example YAML configuration snippet demonstrating how to integrate a custom operator, 'reward_filter', into the experience pipeline. It specifies the operator name and its arguments, along with settings for dynamic synchronization.

```yaml
# some other configs
data_processor:
  experience_pipeline:
    operators:
      - name: "reward_filter"
        args:
          threshold: 0.1
synchronizer:
  sync_method: nccl
  sync_style: dynamic_by_explorer
  sync_interval: 2
# some other configs

```
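To make the `threshold: 0.1` argument concrete, the sketch below shows what a reward-based filter could do: keep only the experiences whose reward reaches the threshold. `Exp` and `reward_filter` are hypothetical stand-ins, not Trinity-RFT classes, and the source does not state whether the real operator compares with `>=` or `>`.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Exp:
    """Hypothetical stand-in for an experience with a scalar reward."""
    response: str
    reward: float


def reward_filter(exps: List[Exp], threshold: float) -> List[Exp]:
    """Keep experiences whose reward is at least `threshold` (assumed inclusive)."""
    return [e for e in exps if e.reward >= threshold]


kept = reward_filter(
    [Exp("a", 0.0), Exp("b", 0.1), Exp("c", 0.9)],
    threshold=0.1,
)
```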

--------------------------------

### Download Qwen2.5 Model using Modelscope and Huggingface CLI

Source: https://modelscope.github.io/Trinity-RFT/en/main/_sources/tutorial/example_dpo

This snippet demonstrates how to download the Qwen2.5-1.5B-Instruct model to a local directory using either the Modelscope CLI or the Huggingface CLI. Ensure the respective CLIs are installed and authenticated.

```shell
# Using Modelscope
modelscope download Qwen/Qwen2.5-1.5B-Instruct --local_dir $MODEL_PATH/Qwen2.5-1.5B-Instruct

# Using Huggingface
huggingface-cli download Qwen/Qwen2.5-1.5B-Instruct --local-dir $MODEL_PATH/Qwen2.5-1.5B-Instruct
```

--------------------------------

### Initialize MLflow Monitor

Source: https://modelscope.github.io/Trinity-RFT/en/main/_modules/trinity/utils/monitor

Initializes the MlflowMonitor with project, group, name, and role. It sets up MLflow tracking, logs parameters, and starts a new run. Requires MLflow to be installed and configured.

```python
def __init__(
        self,
        project: str,
        group: str,
        name: str,
        role: str,
        config: Config = None,
    ) -> None:
        assert (
            mlflow is not None
        ), "mlflow is not installed. Please install it to use MlflowMonitor."
        monitor_args = config.monitor.monitor_args or {}
        if username := monitor_args.get("username"):
            os.environ["MLFLOW_TRACKING_USERNAME"] = username
        if password := monitor_args.get("password"):
            os.environ["MLFLOW_TRACKING_PASSWORD"] = password
        mlflow.set_tracking_uri(monitor_args.get("uri", "http://localhost:5000"))
        mlflow.set_experiment(project)
        mlflow.start_run(
            run_name=f"{name}_{role}",
            tags={
                "group": group,
                "role": role,
            },
        )
        mlflow.log_params(config.flatten())
        self.console_logger = get_logger(__name__, in_ray_actor=True)
```
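`mlflow.log_params` takes a dictionary of parameter names to values, so a nested configuration is typically flattened first. The sketch below shows one way to flatten nested settings into dotted keys. It is a hypothetical re-implementation for illustration; the exact key format produced by `Config.flatten()` is an assumption.

```python
from typing import Any, Dict


def flatten(config: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dictionaries into a single level with dotted keys."""
    flat: Dict[str, Any] = {}
    for key, value in config.items():
        full_key = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, full_key))
        else:
            flat[full_key] = value
    return flat


params = flatten(
    {
        "project": "Trinity-RFT",
        "algorithm": {"algorithm_type": "grpo", "optimizer": {"lr": 1e-5}},
    }
)
```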

--------------------------------

### SFT Configuration Example in YAML

Source: https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_dpo

This YAML snippet configures a Supervised Fine-Tuning (SFT) experiment. It defines the algorithm type as 'sft', specifies the model path, cluster settings, and buffer input format suitable for message-based datasets. The dataset path and format keys are crucial inputs.

```yaml
project: <project_name>
name: <experiment_name>
mode: train
algorithm:
  algorithm_type: sft
checkpoint_root_dir: ${oc.env:TRINITY_CHECKPOINT_ROOT_DIR,./checkpoints}
model:
  model_path: ${oc.env:TRINITY_MODEL_PATH,Qwen/Qwen2.5-1.5B-Instruct}
cluster:
  node_num: 1
  gpu_per_node: 2
buffer:
  total_epochs: 5
  train_batch_size: 64
  trainer_input:
    experience_buffer:
      name: <sft_dataset_name>
      storage_type: file
      path: $DATASET_PATH/Mixture-of-Thoughts
      split: train
      format:
        prompt_type: messages
        messages_key: messages
trainer:
  save_interval: 50
  trainer_config:
    ... # omitted here for simplicity

```

--------------------------------

### Running Asynchronous RFT Example (Bash)

Source: https://modelscope.github.io/Trinity-RFT/en/main/_sources/tutorial/example_async_mode

Shell command to execute the asynchronous RFT example. This script typically orchestrates the launch of the explorer and trainer processes using the previously defined configuration files.

```bash
bash examples/async_gsm8k/run.sh
```

--------------------------------

### Running SFT Experiment with Trinity CLI

Source: https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_dpo

This command starts a Supervised Fine-Tuning (SFT) experiment via the Trinity CLI. It points to the specified SFT configuration YAML file, enabling the fine-tuning process.

```bash
trinity run --config examples/sft_mot/sft.yaml
```

--------------------------------

### Bash Command to Start Trinity-RFT Training

Source: https://modelscope.github.io/Trinity-RFT/en/main/_sources/tutorial/example_react

Navigates to the Trinity-RFT root directory and starts the training task using a specified configuration file (gsm8k.yaml) for the GSM8k dataset. This command initiates the agent training process.

```bash
# Navigate to the Trinity-RFT root directory
cd /path/to/Trinity-RFT

# Run the training for GSM8k dataset:
trinity run --config examples/agentscope_react/gsm8k.yaml
```

--------------------------------

### Implement Trinity Workflow Run Method

Source: https://modelscope.github.io/Trinity-RFT/en/main/_sources/tutorial/develop_workflow

Shows how to implement the `run` method of a workflow, which generates a response and computes its reward. The method calls the model, scores the response against the ground-truth answer, and returns a list of `Experience` objects. It depends on Trinity's `Workflow` base class and `Experience` data class.

```python
from typing import List

from trinity.common.experience import Experience
from trinity.common.workflows.workflow import Workflow

class ExampleWorkflow(Workflow):

    # the __init__ function (assumed to be defined elsewhere)

    def calculate_reward(self, response: str, truth: str) -> float:
        if response == truth:
            return 1.0
        else:
            return 0.0

    def run(self) -> List[Experience]:
        # call the model to generate a response
        responses = self.model.chat(
            [
                {
                    "role": "user",
                    "content": f"Question:\n{self.question}",
                }
            ],
            temperature=self.rollout_args.temperature,
        )
        response = responses[0]  # there is only one response
        reward: float = self.calculate_reward(response.response_text, self.answer)
        return [
            Experience(
                tokens=response.tokens,
                prompt_length=response.prompt_length,
                reward=reward,
                logprobs=response.logprobs,
            )
        ]
```
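The generate, score, and wrap pattern in `run` can be tried without a model by swapping in stand-ins. The sketch below is an illustration under loud assumptions: `StubResponse`, `StubExperience`, and `build_experiences` are hypothetical names that mirror only the field names used in the snippet above, not Trinity-RFT's real classes.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StubResponse:
    """Hypothetical stand-in for one model response."""
    response_text: str
    tokens: List[int]
    prompt_length: int
    logprobs: List[float]


@dataclass
class StubExperience:
    """Hypothetical stand-in for an experience record."""
    tokens: List[int]
    prompt_length: int
    reward: float
    logprobs: List[float]


def calculate_reward(response: str, truth: str) -> float:
    # Exact-match reward, as in the snippet above.
    return 1.0 if response == truth else 0.0


def build_experiences(responses: List[StubResponse], truth: str) -> List[StubExperience]:
    # One experience per response, each scored against the ground truth.
    return [
        StubExperience(
            tokens=r.tokens,
            prompt_length=r.prompt_length,
            reward=calculate_reward(r.response_text, truth),
            logprobs=r.logprobs,
        )
        for r in responses
    ]


responses = [
    StubResponse("4", tokens=[1, 2, 3, 4], prompt_length=3, logprobs=[-0.1]),
    StubResponse("5", tokens=[1, 2, 3, 5], prompt_length=3, logprobs=[-2.3]),
]
experiences = build_experiences(responses, truth="4")
print([e.reward for e in experiences])  # [1.0, 0.0]
```

Keeping the reward function separate from the wrapping step makes it easy to unit-test the scoring logic on its own.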

--------------------------------

### Run Trinity-RFT Experiment via CLI

Source: https://modelscope.github.io/Trinity-RFT/en/main/_sources/tutorial/example_dpo

These shell commands start DPO and SFT experiments with Trinity-RFT. Each uses `trinity run`, with the `--config` flag pointing to the YAML configuration file for that experiment type.

```shell
trinity run --config examples/dpo_humanlike/dpo.yaml
```

```shell
trinity run --config examples/sft_mot/sft.yaml
```

--------------------------------

### Download DPO Dataset (Human-Like-DPO-Dataset)

Source: https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_dpo

Downloads the Human-Like-DPO-Dataset for DPO training using the ModelScope or Hugging Face CLI. The dataset is expected to have `prompt`, `chosen`, and `rejected` keys. Make sure `$DATASET_PATH` is set before running either command.

```shell
# Using ModelScope
modelscope download --dataset HumanLLMs/Human-Like-DPO-Dataset --local_dir $DATASET_PATH/human_like_dpo_dataset

# Using Hugging Face
huggingface-cli download HumanLLMs/Human-Like-DPO-Dataset --repo-type dataset --local-dir $DATASET_PATH/human_like_dpo_dataset
```

--------------------------------

### Initialize AgentScope V1 React Search Workflow

Source: https://modelscope.github.io/Trinity-RFT/en/main/_modules/trinity/common/workflows/envs/agentscope/agentscopev1_search_workflow

Initializes the AgentScopeV1ReactSearchWorkflow, setting up the AgentScope models and formatters. It requires a Task, a ModelWrapper, and optionally auxiliary OpenAI clients. Includes error handling for missing AgentScope installation.

```python
import os
import re
from typing import List, Optional

import openai

from trinity.common.models.model import ModelWrapper
from trinity.common.workflows.workflow import WORKFLOWS, Task, Workflow


@WORKFLOWS.register_module("agentscope_v1_react_search_workflow")
class AgentScopeV1ReactSearchWorkflow(Workflow):
    """
    This workflow serves as an example of how to use the agentscope framework within the trinity workflow.
    """

    can_reset: bool = True
    is_async: bool = True


    def __init__(
        self,
        *,
        task: Task,
        model: ModelWrapper,
        auxiliary_models: Optional[List[openai.OpenAI]] = None,
    ):
        super().__init__(
            task=task,
            model=model,
            auxiliary_models=auxiliary_models,
        )
        try:
            from agentscope.formatter import OpenAIChatFormatter
            from agentscope.model import OpenAIChatModel
        except ImportError as e:
            error_message = f"AgentScope is not installed. Please install the agentscope framework first before running the workflow. Error: {str(e)}"
            self.logger.error(error_message)
            raise ImportError(error_message) from e

        self.openai_async_client = model.get_openai_async_client()
        self.model_name = self.openai_async_client.model_path

        self.agent_model = OpenAIChatModel(
            api_key="EMPTY",
            model_name=self.model_name,
            stream=False,
            generate_kwargs={
                "temperature": self.task.rollout_args.temperature,
                "max_tokens": self.task.rollout_args.max_tokens or 4096,
            },
        )
        self.agent_model.client = self.openai_async_client
        self.agent_model_formatter = OpenAIChatFormatter()

        self.reset(task)
```
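The guarded import in `__init__` is an instance of a general optional-dependency pattern. A minimal, standalone sketch of that pattern follows; `require_module` is a hypothetical helper written for illustration and is not part of Trinity-RFT or AgentScope.

```python
import importlib
from types import ModuleType


def require_module(module_name: str, install_hint: str) -> ModuleType:
    """Import module_name, or raise an ImportError that includes install_hint."""
    try:
        return importlib.import_module(module_name)
    except ImportError as e:
        # Chain the original exception so the root cause stays in the traceback.
        raise ImportError(f"{module_name} is not installed. {install_hint}") from e


# A module that exists in the standard library.
json_module = require_module("json", "It ships with Python.")
print(json_module.dumps({"ok": True}))

# A module name assumed not to exist, to show the failure path.
try:
    require_module("surely_not_a_real_package_12345", "Install it before running the workflow.")
except ImportError as err:
    print(f"Caught: {err}")
```

Deferring the import to construction time keeps the optional package out of the import path for users who never instantiate the workflow.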

--------------------------------

### Build Docker Image for Megatron-LM (Bash)

Source: https://modelscope.github.io/Trinity-RFT/en/main/_sources/tutorial/example_megatron

Builds a Docker image specifically for Megatron-LM training within the Trinity-RFT project. This command uses a provided Dockerfile to create a self-contained environment, simplifying dependency management and ensuring consistent training setups.

```bash
docker build -f scripts/docker_for_megatron/Dockerfile -t trinity-rft-megatron:latest .
```

--------------------------------

### Configure Data Processor for Task Pipeline (YAML)

Source: https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_data_functionalities

Example YAML configuration for the Trinity-RFT data processor, focusing on the task pipeline. It specifies the number of processes, operators (like 'llm_difficulty_score_filter'), input files, and target fields. It also shows how to enable the automatic start of the Data-Juicer service.

```yaml
data_processor:
  # task pipeline related
  task_pipeline:
    num_process: 32
    operators:
      - name: "llm_difficulty_score_filter"
        args:
          api_or_hf_model: "qwen2.5-7b-instruct"
          min_score: 0.0
          input_keys: ["question", "answer"]
          field_names: ["Question", "Answer"]
    inputs:  # the output will be set to the explorer input automatically
      - ${oc.env:TRINITY_TASKSET_PATH}
    target_fields: ["question", "answer"]
service:
  data_juicer:
    auto_start: true
```

--------------------------------

### DPO Configuration Example in YAML

Source: https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_dpo

This YAML snippet configures a Direct Preference Optimization (DPO) experiment. It sets the mode to `train` and the algorithm type to `dpo`, with `kl_coef` acting as the DPO beta, and points the experience buffer at a file-based preference dataset. The `format` keys map the dataset's `prompt`, `chosen`, and `rejected` columns.

```yaml
project: <project_name>
name: <experiment_name>
mode: train
algorithm:
  algorithm_type: dpo
  kl_loss_fn: k1
  kl_loss_fn_args:
    kl_coef: 0.1  # value of beta in DPO
checkpoint_root_dir: ${oc.env:TRINITY_CHECKPOINT_ROOT_DIR,./checkpoints}
model:
  model_path: ${oc.env:TRINITY_MODEL_PATH,Qwen/Qwen2.5-1.5B-Instruct}
cluster:
  node_num: 1
  gpu_per_node: 8
buffer:
  total_epochs: 2
  train_batch_size: 64
  trainer_input:
    experience_buffer:
      name: human_like_dpo
      storage_type: file
      path: $DATASET_PATH/human_like_dpo_dataset
      format:
        prompt_type: plaintext
        prompt_key: prompt
        chosen_key: chosen
        rejected_key: rejected
trainer:
  save_interval: 30
  trainer_config:
    ... # omitted here for simplicity
```
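The comment on `kl_coef` notes that it is the value of beta in DPO. As a reminder of where beta enters, here is a minimal sketch of the standard DPO loss for a single preference pair in plain Python. It illustrates the published formula only and is not Trinity-RFT's implementation; the function name and the log-probability arguments are hypothetical.

```python
import math


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,
) -> float:
    """Standard DPO loss for one (chosen, rejected) pair.

    loss = -log(sigmoid(beta * ((pi_c - ref_c) - (pi_r - ref_r))))
    where each argument is a sequence log-probability.
    """
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(x)) == log(1 + exp(-x)); branch on the sign for numerical stability.
    if logits >= 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))


# When the policy equals the reference, the loss is log(2) regardless of beta.
print(dpo_loss(-5.0, -7.0, -5.0, -7.0))
# Raising the chosen response relative to the rejected one lowers the loss.
print(dpo_loss(-4.0, -8.0, -5.0, -7.0))
```

A larger beta scales the margin more aggressively, so the same preference gap moves the loss further from log(2).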

--------------------------------

### WebShop Workflow Class Initialization (Python)

Source: https://modelscope.github.io/Trinity-RFT/en/main/_modules/trinity/common/workflows/envs/webshop/webshop_workflow

Initializes the `WebShopWorkflow` class: sets the maximum number of environment steps, resets the workflow with the given task, and imports and creates the `WebAgentTextEnv` gym environment. If the `web_agent_site` package is missing, it raises an `ImportError` with installation guidance.

```python
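from typing import List, Optional

from trinity.common.models.model import ModelWrapper
from trinity.common.workflows.workflow import WORKFLOWS, MultiTurnWorkflow, Task


@WORKFLOWS.register_module("webshop_workflow")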
class WebShopWorkflow(MultiTurnWorkflow):
    """A workflow for webshop task."""

    can_reset: bool = True
    is_async: bool = True


    def __init__(
        self,
        model: ModelWrapper,
        task: Task,
        auxiliary_models: Optional[List] = None,
    ):
        super().__init__(
            model=model,
            task=task,
        )
        self.max_env_steps = 15
        self.reset(task)

        # TODO: Make parallel envs
        try:
            import gym
            from web_agent_site.envs import WebAgentTextEnv  # noqa: F401
        except Exception as e:
            print("Please make sure you have installed the web_agent_site package.")
            error_message = f"Error importing WebAgentTextEnv {str(e)}. Please make sure you have installed the web_agent_site package, following the instructions in https://github.com/princeton-nlp/WebShop"
            raise ImportError(error_message)
        print("Making GYM env")
        # NOTE: Hosting the env requires ~15GB of CPU memory.
        # For a lighter env, set num_products to 1000 or 100000.
        self.env = gym.make(
            "WebAgentTextEnv-v0", observation_mode="text_rich", num_products=None, human_goals=True
        )
```

--------------------------------

### Setup Ray Cluster (Python)

Source: https://modelscope.github.io/Trinity-RFT/en/main/_modules/trinity/utils/dlc_utils

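Sets up a Ray cluster in a DLC environment, starting a new cluster or reusing a running one. The node role comes from the DLC environment variables (`RANK`, `WORLD_SIZE`, `MASTER_ADDR`, `MASTER_PORT`). Rank 0 starts the head node, waits for all worker nodes to join, and returns the cluster address. Other ranks join as workers, wait on a cluster-status actor until the cluster is marked finished, and then exit. If a cluster is already running, the function returns `"auto"`.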

```python
import os
import subprocess
import sys
import time
import ray
from trinity.utils.log import get_logger

logger = get_logger(__name__)

CLUSTER_ACTOR_NAME = "cluster_status"

def get_dlc_env_vars() -> dict:
    envs = {
        "RANK": int(os.environ.get("RANK", -1)),  # type: ignore
        "WORLD_SIZE": int(os.environ.get("WORLD_SIZE", -1)),  # type: ignore
        "MASTER_ADDR": os.environ.get("MASTER_ADDR", None),
        "MASTER_PORT": os.environ.get("MASTER_PORT", None),
    }
    for key, value in envs.items():
        if value is None or value == -1:
            logger.error(f"DLC env var `{key}` is not set.")
            raise ValueError(f"DLC env var `{key}` is not set.")
    return envs

def is_running() -> bool:
    """Check if ray cluster is running."""
    ret = subprocess.run("ray status", shell=True, capture_output=True)
    return ret.returncode == 0

def wait_for_ray_setup() -> None:
    while True:
        if is_running():
            break
        else:
            logger.info("Waiting for ray cluster to be ready...")
            time.sleep(1)

def wait_for_ray_worker_nodes(world_size: int) -> None:
    while True:
        alive_nodes = [node for node in ray.nodes() if node["Alive"]]
        if len(alive_nodes) >= world_size:
            break
        else:
            logger.info(
                f"{len(alive_nodes)} nodes have joined so far, waiting for {world_size - len(alive_nodes)} nodes..."
            )
            time.sleep(1)

class ClusterStatus:
    def __init__(self):
        self.finished = False

    def finish(self) -> None:
        self.finished = True

    def running(self) -> bool:
        return not self.finished

def setup_ray_cluster(namespace: str) -> str:
    """Setup a ray cluster in DLC environment.

    This function will start a ray cluster if it is not running, otherwise it will reuse the existing ray cluster.

    Returns:
        str: The address of the ray cluster.
    """
    env_vars = get_dlc_env_vars()
    is_master = env_vars["RANK"] == 0

    if is_running():
        # reuse existing ray cluster
        return "auto"
    else:
        if is_master:
            cmd = f"ray start --head --port={env_vars['MASTER_PORT']} --node-ip-address={env_vars['MASTER_ADDR']}"
        else:
            cmd = f"ray start --address={env_vars['MASTER_ADDR']}:{env_vars['MASTER_PORT']}"
        logger.info(f"Starting ray cluster: {cmd}")
        ret = subprocess.run(cmd, shell=True, capture_output=True)
        if ret.returncode != 0:
            logger.error(f"Failed to start ray cluster: {cmd}")
            logger.error(f"ret.stdout: {ret.stdout!r}")
            logger.error(f"ret.stderr: {ret.stderr!r}")
            sys.exit(1)

        wait_for_ray_setup()
        time.sleep(5)
        ray.init(
            address=f"{env_vars['MASTER_ADDR']}:{env_vars['MASTER_PORT']}",
            namespace=namespace,
            ignore_reinit_error=True,
        )
        if is_master:
            # master wait for worker nodes to join
            wait_for_ray_worker_nodes(env_vars["WORLD_SIZE"])
            ray.shutdown()
            return f"{env_vars['MASTER_ADDR']}:{env_vars['MASTER_PORT']}"
        else:
            # worker wait on the cluster status actor
            cluster_status = (
                ray.remote(ClusterStatus)
                .options(
                    name=CLUSTER_ACTOR_NAME,
                    namespace=namespace,
                    get_if_exists=True,
                )
                .remote()
            )
            while True:
                if ray.get(cluster_status.running.remote()):
                    ret = subprocess.run("ray status", shell=True, capture_output=True)
                    print(ret.stdout.decode())
                    time.sleep(5)
                else:
                    logger.info("Ray cluster is not running, exiting.")
                    break
            sys.exit(0)
```
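The wait helpers above poll indefinitely. Where a bounded wait is preferable, the same polling loop can take a deadline. The sketch below is a hypothetical variant for illustration, not part of `trinity.utils.dlc_utils`; `wait_until`, `timeout_s`, and `interval_s` are assumed names.

```python
import time
from typing import Callable


def wait_until(
    predicate: Callable[[], bool],
    timeout_s: float,
    interval_s: float = 1.0,
) -> bool:
    """Poll predicate until it returns True or timeout_s elapses.

    Returns True on success and False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


# Example: a condition that becomes true on the third check.
calls = {"count": 0}


def ready_after_three_checks() -> bool:
    calls["count"] += 1
    return calls["count"] >= 3


print(wait_until(ready_after_three_checks, timeout_s=5.0, interval_s=0.01))
print(wait_until(lambda: False, timeout_s=0.05, interval_s=0.01))
```

In the cluster setup, a predicate such as `is_running` could be passed in, with the caller deciding how to handle a `False` result.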

--------------------------------

### Trainer Input Configuration (YAML)

Source: https://modelscope.github.io/Trinity-RFT/en/main/_sources/tutorial/trinity_configs

Defines the main experience buffer and optional auxiliary buffers for the trainer. Each buffer specifies a name, a storage type (`queue`, `sql`, or `file`), a path, and, for SFT/DPO data, formatting options. The example also sets a read timeout (`max_read_timeout`) on the main buffer.

```yaml
buffer:
  ...
  trainer_input:
    experience_buffer:
      name: countdown_buffer
      storage_type: queue
      path: sqlite:///countdown_buffer.db
      max_read_timeout: 1800

    auxiliary_buffers:
      sft_dataset:
        name: sft_dataset
        storage_type: file
        path: ${oc.env:TRINITY_SFT_DATASET_PATH}
        format:
          prompt_key: 'question'
          response_key: 'answer'
      other_buffer:
        ...
```
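The `format` block tells the trainer which dataset columns hold the prompt and the response. As a rough illustration of that idea, the sketch below maps a raw record to a two-turn chat using configurable keys. `to_messages` is a hypothetical helper and the output layout is an assumption; Trinity-RFT's own formatter may work differently.

```python
from typing import Dict, List


def to_messages(
    record: Dict[str, str],
    prompt_key: str = "question",
    response_key: str = "answer",
) -> List[Dict[str, str]]:
    """Build a user/assistant message pair from the configured columns."""
    missing = [key for key in (prompt_key, response_key) if key not in record]
    if missing:
        raise KeyError(f"Record is missing required keys: {missing}")
    return [
        {"role": "user", "content": record[prompt_key]},
        {"role": "assistant", "content": record[response_key]},
    ]


raw = {"question": "What is 3 * 7?", "answer": "21"}
print(to_messages(raw))
```

Failing fast on a missing key surfaces a misconfigured `prompt_key` or `response_key` before any training step runs.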