### Install PyTorch

Source: https://github.com/ai4finance-foundation/elegantrl/blob/master/helloworld/README.md

Install PyTorch for deep learning. This is a prerequisite for running the ElegantRL examples.

```bash
pip3 install torch
```

--------------------------------

### Install Gym for Reinforcement Learning (Windows)

Source: https://github.com/ai4finance-foundation/elegantrl/blob/master/helloworld/README.md

Install the gym library for reinforcement learning on Windows systems. This includes installing swig and specific versions of gym and its Box2D dependencies.

```bash
# Windows
python -m pip install --upgrade pip
pip3 install -i http://pypi.douban.com/simple/ --trusted-host pypi.douban.com gym==0.23.1
pip3 install -i http://pypi.douban.com/simple/ --trusted-host pypi.douban.com swig gym[Box2D] 

```

--------------------------------

### Install ElegantRL using pip

Source: https://github.com/ai4finance-foundation/elegantrl/blob/master/docs/source/index.md

Install ElegantRL from PyPI for a quick setup. Ensure you have Python 3.6+ and PyTorch 1.0.2+.

```bash
pip3 install erl --upgrade
```

--------------------------------

### Install StarCraft II Environment

Source: https://github.com/ai4finance-foundation/elegantrl/blob/master/README.md

Install the StarCraft II environment by first executing the provided shell script and then installing its specific requirements.

```bash
bash ./elegantrl/envs/installsc2.sh
pip install -r sc2_requirements.txt
```

--------------------------------

### Install ElegantRL Library

Source: https://github.com/ai4finance-foundation/elegantrl/blob/master/tutorial_BipedalWalker_v3.ipynb

Installs the ElegantRL library from its GitHub repository. This is the first step before using any of its functionalities.

```python
# install elegantrl library
!pip install git+https://github.com/AI4Finance-LLC/ElegantRL.git
```

--------------------------------

### Install ElegantRL

Source: https://context7.com/ai4finance-foundation/elegantrl/llms.txt

Install ElegantRL from PyPI or from source. Includes common dependencies.

```bash
pip3 install erl --upgrade
```

```bash
git clone https://github.com/AI4Finance-Foundation/ElegantRL.git
cd ElegantRL
pip3 install .
```

```bash
pip3 install gym==0.17.0 pybullet Box2D matplotlib torch
```

--------------------------------

### Install Gym for Reinforcement Learning (Linux)

Source: https://github.com/ai4finance-foundation/elegantrl/blob/master/helloworld/README.md

Install the gym library for reinforcement learning on Linux systems. This includes installing swig and specific versions of gym and its Box2D dependencies.

```bash
# LinuxOS (Ubuntu) 
sudo apt install swig
python3 -m pip install --upgrade pip --no-warn-script-location
pip3 install -i http://pypi.douban.com/simple/ --trusted-host pypi.douban.com --user gym==0.23.1
pip3 install -i http://pypi.douban.com/simple/ --trusted-host pypi.douban.com --user gym[Box2D] 

```

--------------------------------

### Run PPO Single File Example

Source: https://github.com/ai4finance-foundation/elegantrl/blob/master/helloworld/README.md

Execute the single-file PPO example. PPO is an on-policy DRL algorithm suitable for continuous action spaces.

```bash
python helloworld/helloworld_PPO_single_file.py
```

--------------------------------

### Training Output Example

Source: https://github.com/ai4finance-foundation/elegantrl/blob/master/tutorial_LunarLanderContinuous_v2.ipynb

This is an example of the output generated during training, showing the directory where the agent's checkpoints and logs are saved.

```text
Output:
| Arguments Remove cwd: ./LunarLanderContinuous-v2_ModSAC_2022
```

--------------------------------

### Install ElegantRL from GitHub

Source: https://github.com/ai4finance-foundation/elegantrl/blob/master/docs/source/index.md

Install the latest version of ElegantRL directly from its GitHub repository. This method is useful for accessing the newest features or contributing to the project.

```bash
git clone https://github.com/AI4Finance-Foundation/ElegantRL.git
cd ElegantRL
pip3 install .
```

--------------------------------

### Run DQN Single File Example

Source: https://github.com/ai4finance-foundation/elegantrl/blob/master/helloworld/README.md

Execute the single-file DQN example. DQN is an off-policy DRL algorithm suitable for discrete action spaces.

```bash
python helloworld/helloworld_DQN_single_file.py
```

--------------------------------

### DQN Agent Initialization and Training Setup

Source: https://github.com/ai4finance-foundation/elegantrl/blob/master/tutorial_helloworld_DQN_DDPG_PPO.ipynb

Initializes the DQN agent, environment, and arguments for CartPole. Hyperparameters like reward scaling, network dimensions, and exploration rate are set.

```python
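from elegantrl_helloworld.agent import AgentDQN
from elegantrl_helloworld.config import Arguments
from elegantrl_helloworld.env import get_gym_env_args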
agent_class = AgentDQN
env_name = "CartPole-v0"

import gym
gym.logger.set_level(40)  # Block warning
env = gym.make(env_name)
env_func = gym.make
env_args = get_gym_env_args(env, if_print=True)

args = Arguments(agent_class, env_func, env_args)

'''reward shaping'''
args.reward_scale = 2 ** 0  # scale rewards so that the target return is roughly 256
args.gamma = 0.97  # discount factor of future rewards

'''network update'''
args.target_step = args.max_step * 2  # collect target_step, then update network
args.net_dim = 2 ** 7  # the middle layer dimension of Fully Connected Network
args.num_layer = 3  # the layer number of MultiLayer Perceptron, `assert num_layer >= 2`
args.batch_size = 2 ** 7  # num of transitions sampled from replay buffer.
args.repeat_times = 2 ** 0  # repeatedly update network using ReplayBuffer to keep critic's loss small
args.explore_rate = 0.25  # epsilon-greedy for exploration.

'''evaluate'''
args.eval_gap = 2 ** 5  # evaluate the agent every `eval_gap` seconds
args.eval_times = 2 ** 3  # number of episodes used to compute the episode return at each evaluation
args.break_step = int(8e4)  # break training if 'total_step > break_step'
```
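
The `explore_rate` setting above is described as epsilon-greedy exploration. As a minimal sketch of that rule (the textbook form, not necessarily how `AgentDQN` implements it), the hypothetical helper below picks a random action with probability `explore_rate` and the highest-valued action otherwise:

```python
import random


def epsilon_greedy(q_values, explore_rate, rng=random):
    """Return an action index: random with probability explore_rate, else greedy.

    `q_values` is a list of estimated action values, one per discrete action.
    This helper is illustrative and is not part of ElegantRL.
    """
    if rng.random() < explore_rate:
        # Explore: sample uniformly among all actions.
        return rng.randrange(len(q_values))
    # Exploit: take the action with the largest estimated value.
    return max(range(len(q_values)), key=lambda a: q_values[a])


# With explore_rate=0.25, roughly one action in four is random.
action = epsilon_greedy([0.1, 0.9, 0.3], explore_rate=0.25)
```

A larger `explore_rate` gathers more diverse transitions at the cost of lower returns during data collection.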

--------------------------------

### Install Core Dependencies

Source: https://github.com/ai4finance-foundation/elegantrl/blob/master/README.md

Install necessary Python packages for core functionality, including gym, pybullet, Box2D, and matplotlib. This command can be used as an alternative to installing from a requirements file.

```bash
pip3 install gym==0.17.0 pybullet Box2D matplotlib
```
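
After installation, a quick import check can catch a missing package before a training run starts. The sketch below uses only the standard library; the import names passed to it (`gym`, `pybullet`, `Box2D`, `matplotlib`) are assumptions about how each package is imported, and `find_missing` is a hypothetical helper.

```python
import importlib.util


def find_missing(modules):
    """Return the top-level module names that cannot be located for import."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


# Assumed import names for the packages installed above.
missing = find_missing(["gym", "pybullet", "Box2D", "matplotlib"])
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All core dependencies were found.")
```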

--------------------------------

### Import ElegantRL Helloworld Modules

Source: https://github.com/ai4finance-foundation/elegantrl/blob/master/tutorial_helloworld_DQN_DDPG_PPO.ipynb

Import necessary functions and classes from the ElegantRL Helloworld modules for training, evaluation, environment setup, and argument parsing.

```python
from elegantrl_helloworld.run import train_agent, evaluate_agent
from elegantrl_helloworld.env import get_gym_env_args
from elegantrl_helloworld.config import Arguments
```

--------------------------------

### PendulumEnv

Source: https://context7.com/ai4finance-foundation/elegantrl/llms.txt

Custom environment example demonstrating how to wrap an existing gym environment with custom reward shaping and normalized action space.

````APIDOC
## PendulumEnv

### Description
Custom Environment Example. Demonstrates how to wrap an existing gym environment with custom reward shaping and normalized action space. Follows the required interface: must expose `env_name`, `state_dim`, `action_dim`, `if_discrete`, and implement `reset()` / `step()`.

### Usage
```python
import gymnasium as gym
from elegantrl.envs.CustomGymEnv import PendulumEnv
from elegantrl import Config, train_agent
from elegantrl.agents.AgentSAC import AgentSAC
from elegantrl.train.config import get_gym_env_args

# PendulumEnv wraps 'Pendulum-v1', rescales action from (-2,2) to (-1,1)
# and scales reward by 0.5 for stable learning
env = PendulumEnv()
print(f"state_dim={env.state_dim}, action_dim={env.action_dim}, discrete={env.if_discrete}")

state, info = env.reset()
action = env.action_space.sample()   # shape (1,)
next_state, reward, terminated, truncated, info = env.step(action)
print(f"next_state.shape={next_state.shape}, reward={reward:.3f}")

# Train on the custom env
env_args = get_gym_env_args(env, if_print=False)
args = Config(agent_class=AgentSAC, env_class=PendulumEnv, env_args=env_args)
args.break_step  = int(6e4)
args.break_score = -150.0
train_agent(args, if_single_process=True)
env.close()
```
````

--------------------------------

### Run DDPG Single File Example

Source: https://github.com/ai4finance-foundation/elegantrl/blob/master/helloworld/README.md

Execute the single-file DDPG example. DDPG is an off-policy DRL algorithm designed for continuous action spaces.

```bash
python helloworld/helloworld_DDPG_single_file.py
```

--------------------------------

### Custom Environment Example: PendulumEnv

Source: https://context7.com/ai4finance-foundation/elegantrl/llms.txt

Demonstrates wrapping a gym environment with custom reward shaping and action space normalization. The custom environment must implement reset() and step() and expose specific attributes.

```python
import gymnasium as gym
from elegantrl.envs.CustomGymEnv import PendulumEnv
from elegantrl import Config, train_agent
from elegantrl.agents.AgentSAC import AgentSAC

# PendulumEnv wraps 'Pendulum-v1', rescales action from (-2,2) to (-1,1)
# and scales reward by 0.5 for stable learning
env = PendulumEnv()
print(f"state_dim={env.state_dim}, action_dim={env.action_dim}, discrete={env.if_discrete}")

state, info = env.reset()
action = env.action_space.sample()   # shape (1,)
next_state, reward, terminated, truncated, info = env.step(action)
print(f"next_state.shape={next_state.shape}, reward={reward:.3f}")

# Train on the custom env
from elegantrl.train.config import get_gym_env_args
env_args = get_gym_env_args(env, if_print=False)
args = Config(agent_class=AgentSAC, env_class=PendulumEnv, env_args=env_args)
args.break_step  = int(6e4)
args.break_score = -150.0
train_agent(args, if_single_process=True)
env.close()
```
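
To make the wrapping idea concrete without depending on gym, here is a minimal sketch of the same pattern: the agent sends actions in (-1, 1), the wrapper stretches them to the inner environment's range, and rewards are scaled by 0.5. `ToyEnv` and `ScaledEnv` are hypothetical stand-ins, not ElegantRL classes, and the real `PendulumEnv` may be implemented differently.

```python
class ToyEnv:
    """Hypothetical inner env whose native action range is (-2, 2)."""

    def reset(self):
        return [0.0], {}

    def step(self, action):
        # Penalize large torques; this toy env never terminates.
        reward = -abs(action[0])
        return [action[0]], reward, False, False, {}


class ScaledEnv:
    """Wrap an env so the agent acts in (-1, 1) and sees scaled rewards."""

    def __init__(self, env, action_max=2.0, reward_scale=0.5):
        self.env = env
        self.action_max = action_max
        self.reward_scale = reward_scale
        # Attributes the custom-environment interface is expected to expose.
        self.env_name = "ToyEnv-scaled"
        self.state_dim = 1
        self.action_dim = 1
        self.if_discrete = False

    def reset(self):
        return self.env.reset()

    def step(self, action):
        # Map the normalized action back to the inner env's range.
        raw_action = [a * self.action_max for a in action]
        state, reward, terminated, truncated, info = self.env.step(raw_action)
        return state, reward * self.reward_scale, terminated, truncated, info


env = ScaledEnv(ToyEnv())
state, info = env.reset()
next_state, reward, terminated, truncated, info = env.step([1.0])
```

Keeping the normalization inside the environment means the agent's output layer can always assume a (-1, 1) range, whatever the underlying task uses.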

--------------------------------

### Check ChasingVecEnv Functionality

Source: https://github.com/ai4finance-foundation/elegantrl/blob/master/tutorial_Creating_ChasingVecEnv.ipynb

Tests the ChasingVecEnv by running a simulation for a fixed number of steps. It initializes the environment, gets actions, steps through the environment, and prints intermediate step counts.

```python
def check_chasing_vec_env():
    env = ChasingVecEnv(dim=2, env_num=4096, device_id=0)
    print("Env num:", env.env_num)

    reward_sums = [
        0.0,
    ] * env.env_num  # episode returns
    reward_sums_list = [
        [] for _ in range(env.env_num)
    ]  # one independent list per env (`[[]] * n` would alias a single list)

    states = env.reset()
    for _ in range(env.max_step * 4):
        actions = env.get_action(states)
        states, rewards, dones, _ = env.step(actions)
        print("Steps:", env.cur_steps)
        for env_i in range(env.env_num):
            reward_sums[env_i] += rewards[env_i].item()

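            if dones[env_i]:
                # NOTE: the source snippet is cut off at this print call; the
                # line below is an assumed completion that reports the return.
                print(f"env {env_i}: episode return {reward_sums[env_i]:.3f}")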
```

--------------------------------

### Configure and Initialize Training - Python

Source: https://github.com/ai4finance-foundation/elegantrl/blob/master/helloworld/singlfile.rst

Sets up the agent, environment, and hyperparameters for training. This includes defining the agent and environment classes, their arguments, and key training parameters like break step, network dimensions, and discount factor.

```python
agent_class = AgentPPO  # DRL algorithm name
env_class = StockTradingVmapEnv # run a finance env with massive parallel simulation.
env_args = {
    'env_name': 'StockTradingVmapEnv',  # Store the environment name in the hyperparameters.
    'state_dim': ...,   # placeholder: number of shares + price + technique factors + amount dimension
    'action_dim': ...,  # placeholder: number of shares
    'if_discrete': False,  # continuous action space
}
get_gym_env_args(env=StockTradingVmapEnv(), if_print=True)  # prints and returns env_args; use it to fill in the placeholders

args = Config(agent_class, env_class, env_args)  # see `config.py Arguments()` for hyperparameter explanation
args.break_step = int(2e5)  # break training if 'total_step > break_step'
args.net_dims = (64, 32)  # the middle layer dimension of MultiLayer Perceptron
args.gamma = 0.97  # discount factor of future rewards
args.repeat_times = 16  # repeatedly update network using ReplayBuffer to keep critic's loss small.

train_agent(args) # Pass the hyperparameters and start the training flow.

```
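
The discount factor `gamma` controls how far into the future rewards still matter. The following sketch uses the standard definition of a discounted return together with the common rule of thumb that the effective horizon is about `1 / (1 - gamma)` steps; both helpers are illustrative and not part of ElegantRL.

```python
def discounted_return(rewards, gamma):
    """Sum rewards with weights gamma**t, accumulating from the last step back."""
    ret = 0.0
    for reward in reversed(rewards):
        ret = reward + gamma * ret
    return ret


def effective_horizon(gamma):
    """Rule-of-thumb number of steps over which rewards remain significant."""
    return 1.0 / (1.0 - gamma)


# gamma = 0.97, as in the configuration above, gives a horizon of about 33 steps.
print(f"horizon: {effective_horizon(0.97):.1f} steps")
print(f"return:  {discounted_return([1.0] * 100, 0.97):.2f}")
```

A shorter horizon makes learning less sensitive to distant, noisy rewards, which is one reason a value below 0.99 can suit short-episode or noisy tasks.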

--------------------------------

### Configure Agent and Environment

Source: https://github.com/ai4finance-foundation/elegantrl/blob/master/tutorial_BipedalWalker_v3.ipynb

Sets up the agent (PPO) and environment (BipedalWalker-v3) for training. It defines environment arguments like dimensions, maximum steps, and target return, then creates a configuration object for the agent.

```python
env_func = gym.make
env_args = {
    "env_num": 1,
    "env_name": "BipedalWalker-v3",
    "max_step": 1600,
    "state_dim": 24,
    "action_dim": 4,
    "if_discrete": False,
    "target_return": 300,
    "id": "BipedalWalker-v3",
}
# env = build_env(env_class=env_func, env_args=env_args)
args = Config(AgentPPO, env_class=env_func, env_args=env_args)
```

--------------------------------

### Download ElegantRL Helloworld Repository

Source: https://github.com/ai4finance-foundation/elegantrl/blob/master/tutorial_helloworld_DQN_DDPG_PPO.ipynb

Use these commands to remove any existing directory and download the ElegantRL Helloworld repository from GitHub.

```bash
!rm -r -f /content/elegantrl_helloworld  # remove if the directory exists
!wget https://github.com/AI4Finance-Foundation/ElegantRL/raw/master/elegantrl_helloworld -P /content/
```

--------------------------------

### Train PPO for Pendulum

Source: https://github.com/ai4finance-foundation/elegantrl/blob/master/helloworld/README.md

Initiates training of the PPO algorithm for the Pendulum environment. This function is part of the tutorial examples.

```python
train_ppo_for_pendulum()
```

--------------------------------

### Train DDPG for Pendulum

Source: https://github.com/ai4finance-foundation/elegantrl/blob/master/helloworld/README.md

Initiates training of the DDPG algorithm for the Pendulum environment. This function is part of the tutorial examples.

```python
train_ddpg_for_pendulum()
```

--------------------------------

### Train DQN for CartPole

Source: https://github.com/ai4finance-foundation/elegantrl/blob/master/helloworld/README.md

Initiates training of the DQN algorithm for the CartPole environment. This function is part of the tutorial examples.

```python
train_dqn_for_cartpole()
```

--------------------------------

### Configure Agent and Environment

Source: https://github.com/ai4finance-foundation/elegantrl/blob/master/tutorial_Pendulum_v1.ipynb

Sets up the agent (SAC) and environment configuration, including environment ID, dimensions, and reward scaling. Ensure 'gpu_id' is set correctly if a GPU is available.

```python
import gym
from elegantrl.train.config import Config
from elegantrl.agents.AgentSAC import AgentSAC

env = gym.make

env_args = {
    "id": "Pendulum-v1",
    "env_name": "Pendulum-v1",
    "num_envs": 1,
    "max_step": 1000,
    "state_dim": 3,
    "action_dim": 1,
    "if_discrete": False,
    "reward_scale": 2**-1,
    "gpu_id": 0, # if you have GPU
}
args = Config(AgentSAC, env_class=env, env_args=env_args)
```

--------------------------------

### Train PPO for Lunar Lander

Source: https://github.com/ai4finance-foundation/elegantrl/blob/master/helloworld/README.md

Initiates training of the PPO algorithm for the Lunar Lander environment. This function is part of the tutorial examples.

```python
train_ppo_for_lunar_lander()
```

--------------------------------

### Initialize Agent and Environment

Source: https://github.com/ai4finance-foundation/elegantrl/blob/master/docs/source/tutorial/LunarLanderContinuous-v2.md

Initialize the training arguments, specifying the agent (e.g., AgentModSAC), the environment function (gym.make), and the environment arguments obtained previously. This sets up the core components for training.

```python
env_func = gym.make
env_args = {
    'env_num': 1,
    'env_name': 'LunarLanderContinuous-v2',
    'max_step': 1000,
    'state_dim': 8,
    'action_dim': 2,
    'if_discrete': False,
    'target_return': 200,
    'id': 'LunarLanderContinuous-v2'
}

args = Arguments(AgentModSAC, env_func=env_func, env_args=env_args)
```

--------------------------------

### Initialize Agent and Environment Arguments

Source: https://github.com/ai4finance-foundation/elegantrl/blob/master/docs/source/tutorial/BipedalWalker-v3.md

Set up the environment function and arguments, including environment name, dimensions, and maximum steps. An Arguments object is created to configure the agent and environment.

```python
env_func = gym.make
env_args = {
    'env_num': 1,
    'env_name': 'BipedalWalker-v3',
    'max_step': 1600,
    'state_dim': 24,
    'action_dim': 4,
    'if_discrete': False,
    'target_return': 300,
    'id': 'BipedalWalker-v3',
}

args = Arguments(AgentPPO, env_func=env_func, env_args=env_args)
```
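
Because `env_args` is a hand-written dictionary, a typo or a wrong type is easy to miss. As a hedged sketch, the hypothetical `check_env_args` helper below verifies the four keys that appear in every `env_args` example in this document; ElegantRL itself may require additional keys or validate them differently.

```python
# Keys and types assumed from the env_args examples; not an official schema.
REQUIRED_KEYS = {
    "env_name": str,
    "state_dim": int,
    "action_dim": int,
    "if_discrete": bool,
}


def check_env_args(env_args):
    """Return a list of problems found in an env_args dict (empty if none)."""
    problems = []
    for key, expected_type in REQUIRED_KEYS.items():
        if key not in env_args:
            problems.append(f"missing key: {key}")
        elif not isinstance(env_args[key], expected_type):
            problems.append(f"{key} should be {expected_type.__name__}")
    return problems


example_args = {
    "env_name": "BipedalWalker-v3",
    "state_dim": 24,
    "action_dim": 4,
    "if_discrete": False,
}
print(check_env_args(example_args) or "env_args looks consistent")
```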

--------------------------------

### Train DQN for Lunar Lander

Source: https://github.com/ai4finance-foundation/elegantrl/blob/master/helloworld/README.md

Initiates training of the DQN algorithm for the Lunar Lander environment. This function is part of the tutorial examples.

```python
train_dqn_for_lunar_lander()
```

--------------------------------

### Run All Unit Tests

Source: https://github.com/ai4finance-foundation/elegantrl/blob/master/README.md

Discover and run all available unit tests sequentially. This command should be executed from the project's root directory.

```bash
python -m unittest discover
```

--------------------------------

### Configure Agent and Training for Isaac Gym

Source: https://github.com/ai4finance-foundation/elegantrl/blob/master/docs/source/tutorial/isaacgym.md

Set up the agent, environment arguments for evaluation, and key hyper-parameters for training with Isaac Gym environments. This includes network dimensions, batch size, and evaluation intervals.

```python
from elegantrl.agents.AgentPPO import AgentPPO
from elegantrl.run import train_and_evaluate_mp

args = Arguments(agent=AgentPPO, env_func=env_func, env_args=env_args)

'''set one env for evaluator'''
args.eval_env_func = IsaacOneEnv
args.eval_env_args = args.env_args.copy()
args.eval_env_args['env_num'] = 1

'''set other hyper-parameters'''
args.net_dim = 2 ** 9
args.batch_size = args.net_dim * 4
args.target_step = args.max_step
args.repeat_times = 2 ** 4

args.save_gap = 2 ** 9
args.eval_gap = 2 ** 8
args.eval_times1 = 2 ** 0
args.eval_times2 = 2 ** 2

args.worker_num = 1
args.learner_gpus = 0
train_and_evaluate_mp(args)
```

--------------------------------

### Train and Evaluate REDQ Agent

Source: https://github.com/ai4finance-foundation/elegantrl/blob/master/docs/source/algorithms/redq.md

This snippet demonstrates how to train and evaluate a REDQ agent for a given environment. It initializes the agent, sets up training arguments, and then proceeds with training and evaluation. It also includes a section for testing the trained agent by loading its weights and performing inference.

```python
import torch
from elegantrl.run import train_and_evaluate
from elegantrl.config import Arguments
from elegantrl.train.config import build_env
from elegantrl.agents.AgentREDQ import AgentREDQ

# train and save
args = Arguments(env=build_env('Hopper-v2'), agent=AgentREDQ())
args.cwd = 'demo_Hopper_REDQ'
train_and_evaluate(args)

# test
agent = AgentREDQ()
agent.init(args.net_dim, args.state_dim, args.action_dim)
agent.save_or_load_agent(cwd=args.cwd, if_save=False)

env = build_env('Hopper-v2')  # evaluate on the same environment the agent was trained on
state = env.reset()
episode_reward = 0
for i in range(125000):
    action = agent.select_action(state)
    next_state, reward, done, _ = env.step(action)

    episode_reward += reward
    if done:
        print(f'Step {i:>6}, Episode return {episode_reward:8.3f}')
        break
    else:
        state = next_state
    env.render()
```

--------------------------------

### Get Environment Information

Source: https://github.com/ai4finance-foundation/elegantrl/blob/master/docs/source/tutorial/BipedalWalker-v3.md

Retrieve the environment arguments for BipedalWalker-v3, such as the state and action dimensions and the target return. With `if_print=False` the arguments are returned without being printed; pass `if_print=True` to print them as well.

```python
get_gym_env_args(gym.make('BipedalWalker-v3'), if_print=False)
```

--------------------------------

### Initialize Isaac Gym Environment Wrapper

Source: https://github.com/ai4finance-foundation/elegantrl/blob/master/docs/source/tutorial/isaacgym.md

Import and configure the IsaacVecEnv wrapper for parallel Isaac Gym environments. Ensure torch is imported after IsaacGym modules.

```python
from elegantrl.envs.IsaacGym import IsaacVecEnv, IsaacOneEnv
import isaacgym
import torch  # import torch after import IsaacGym modules

env_func = IsaacVecEnv
env_args = {
    'env_num': 4096,
    'env_name': 'Ant',
    'max_step': 1000,
    'state_dim': 60,
    'action_dim': 8,
    'if_discrete': False,
    'target_return': 14000.0,

    'device_id': None,  # set by worker
    'if_print': False,  # if_print=False in default
}
```

--------------------------------

### Get Environment Information

Source: https://github.com/ai4finance-foundation/elegantrl/blob/master/docs/source/tutorial/LunarLanderContinuous-v2.md

Retrieve and print the arguments for the LunarLanderContinuous-v2 environment. This helps in understanding the environment's state and action dimensions, maximum steps, and target return.

```python
get_gym_env_args(gym.make('LunarLanderContinuous-v2'), if_print=True)
```

--------------------------------

### Multi-process Training with run.py

Source: https://github.com/ai4finance-foundation/elegantrl/blob/master/docs/source/api/run.md

Demonstrates the two-step procedure for training an agent using multi-process execution. Ensure `Arguments` are properly configured before calling `train_and_evaluate_mp`.

```python
from elegantrl.train.config import Arguments
from elegantrl.train.run import train_and_evaluate, train_and_evaluate_mp
from elegantrl.envs.Chasing import ChasingEnv
from elegantrl.agents.AgentPPO import AgentPPO

# Step 1
args = Arguments(agent=AgentPPO(), env_func=ChasingEnv)

# Step 2
train_and_evaluate_mp(args)
```

--------------------------------

### Specify Agent and Environment

Source: https://github.com/ai4finance-foundation/elegantrl/blob/master/docs/source/helloworld/quickstart.md

Instantiate the environment and set up the agent with initial arguments. The environment is 'Pendulum-v0' with a target return of -500.

```python
env = PendulumEnv('Pendulum-v0', target_return=-500)
args = Arguments(AgentSAC, env)
```

--------------------------------

### Set LD_LIBRARY_PATH for Isaac Gym

Source: https://github.com/ai4finance-foundation/elegantrl/blob/master/docs/source/other/faq.md

Use this command in bash to add the path of your Isaac Gym conda environment if you encounter an ImportError related to libpython3.7m.so.1.0. Replace the example path with your actual environment path.

```bash
export LD_LIBRARY_PATH=/xfs/home/podracer_steven/anaconda3/envs/rlgpu/lib
```
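
If you are unsure which directory to export, the interpreter can report where its shared libraries live. This is a minimal sketch using the standard-library `sysconfig` module; run it with the Python of your Isaac Gym conda environment. `python_lib_dir` is a hypothetical helper name, and `LIBDIR` may be undefined on some platforms.

```python
import sysconfig


def python_lib_dir():
    """Return the directory holding this interpreter's libraries, or None."""
    return sysconfig.get_config_var("LIBDIR")


lib_dir = python_lib_dir()
if lib_dir:
    # Print a ready-to-paste export line that keeps any existing entries.
    print("export LD_LIBRARY_PATH=" + lib_dir + "${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}")
else:
    print("LIBDIR is not defined for this interpreter.")
```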

--------------------------------

### Unified Training Entry Point

Source: https://context7.com/ai4finance-foundation/elegantrl/llms.txt

The `train_agent` function serves as the unified entry point for training. It dispatches to single-process, multi-GPU, or multi-process single-GPU training based on `args.learner_gpu_ids`. Configure `args.learner_gpu_ids` to control the training mode.

```python
import gym
from elegantrl import Config, train_agent
from elegantrl.agents.AgentSAC import AgentSAC
from elegantrl.train.config import get_gym_env_args

env_args = get_gym_env_args(gym.make('Pendulum-v1'), if_print=False)
args = Config(agent_class=AgentSAC, env_class=gym.make, env_args=env_args)
args.net_dims     = [128, 128]
args.break_step   = int(1e5)
args.break_score  = -200.0
args.gpu_id       = 0
args.num_workers  = 2

# Single-process training (simplest, good for debugging)
train_agent(args, if_single_process=True)

# Multiprocessing training on one GPU (default when learner_gpu_ids is empty)
args.learner_gpu_ids = ()
train_agent(args)

# Multi-GPU training across GPUs 0 and 1
args.learner_gpu_ids = (0, 1)
train_agent(args)
```
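
The dispatch described above can be pictured as a small decision function. The sketch below is an assumption drawn only from the comments in this example, not the library's actual code: `choose_training_mode` is a hypothetical name, and treating any non-empty `learner_gpu_ids` as multi-GPU is a guess.

```python
def choose_training_mode(if_single_process, learner_gpu_ids):
    """Return a label for the training mode implied by the two settings."""
    if if_single_process:
        return "single-process"
    if len(learner_gpu_ids) == 0:
        # Empty tuple: workers and learner share one GPU (args.gpu_id).
        return "multiprocessing on one GPU"
    # Assumption: any explicit list of GPU ids selects multi-GPU training.
    return "multi-GPU"


print(choose_training_mode(True, ()))
print(choose_training_mode(False, ()))
print(choose_training_mode(False, (0, 1)))
```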

--------------------------------

### Get Environment Information

Source: https://github.com/ai4finance-foundation/elegantrl/blob/master/tutorial_BipedalWalker_v3.ipynb

Retrieves key information about the BipedalWalker-v3 environment, such as state and action dimensions, maximum steps, and whether the action space is discrete. With `if_print=False` the result is returned without being printed; pass `if_print=True` to print it. The returned values are used to configure the agent and training process.

```python
get_gym_env_args(gym.make("BipedalWalker-v3"), if_print=False)
```

--------------------------------

### PPO Training Log for LunarLanderContinuous-v2

Source: https://github.com/ai4finance-foundation/elegantrl/blob/master/helloworld/README.md

A sample training log for the PPO algorithm on the LunarLanderContinuous-v2 environment. It shows the environment configuration followed by training metrics at successive step counts.

```text
env_args = {'env_name': 'LunarLanderContinuous-v2',
            'state_dim': 8,
            'action_dim': 2,
            'if_discrete': False}
|     step      time  |     avgR    stdR    avgS  |     objC      objA
| 2.00e+04        53  |  -232.54   75.45     197  |    11.75      0.13
| 1.00e+05       689  |   143.02   66.60     828  |     1.91      0.14
| 2.00e+05      1401  |    61.57  133.74     534  |     3.92      0.15
| 3.00e+05      2088  |   108.64  103.73     668  |     2.44      0.18
| 4.00e+05      2724  |   159.55   96.49     522  |     2.37      0.19
```
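
To plot or compare runs, the rows of such a log can be turned into numbers. The sketch below parses one row into a dictionary keyed by the header names. `parse_log_row` is a hypothetical helper, and reading `avgR`/`stdR` as the mean and standard deviation of the episode return, `avgS` as mean steps per episode, and `objC`/`objA` as the critic and actor objectives is an assumption based on the column names.

```python
LOG_COLUMNS = ("step", "time", "avgR", "stdR", "avgS", "objC", "objA")


def parse_log_row(line):
    """Parse one '|'-separated log row into a dict of floats."""
    fields = line.replace("|", " ").split()
    if len(fields) != len(LOG_COLUMNS):
        raise ValueError(f"expected {len(LOG_COLUMNS)} fields, got {len(fields)}")
    return {name: float(value) for name, value in zip(LOG_COLUMNS, fields)}


row = parse_log_row("| 4.00e+05      2724  |   159.55   96.49     522  |     2.37      0.19")
print(f"step {row['step']:.0f}: average return {row['avgR']:.2f}")
```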

--------------------------------

### Get Environment Information

Source: https://github.com/ai4finance-foundation/elegantrl/blob/master/tutorial_LunarLanderContinuous_v2.ipynb

Retrieves key information about the specified Gym environment: state and action dimensions, maximum steps, and whether the action space is discrete or continuous. With `if_print=False` the result is returned without being printed; pass `if_print=True` to print it.

```python
get_gym_env_args(gym.make("LunarLanderContinuous-v2"), if_print=False)
```

--------------------------------

### Specify Agent and Environment Configuration

Source: https://github.com/ai4finance-foundation/elegantrl/blob/master/tutorial_LunarLanderContinuous_v2.ipynb

Configure the environment function, arguments, and agent using the Config class. Ensure 'env_name' matches the desired gym environment.

```python
env_func = gym.make
env_args = {
    "env_num": 1,
    "env_name": "LunarLanderContinuous-v2",
    "max_step": 1000,
    "state_dim": 8,
    "action_dim": 2,
    "if_discrete": False,
    "target_return": 200,
    "id": "LunarLanderContinuous-v2",
}
args = Config(AgentModSAC, env_class=env_func, env_args=env_args)
```

--------------------------------

### Train and Evaluate PPO Agent

Source: https://github.com/ai4finance-foundation/elegantrl/blob/master/docs/source/algorithms/ppo.md

This snippet demonstrates how to train and evaluate a PPO agent for the BipedalWalker-v3 environment. It configures arguments, sets a target return, and scales rewards before training. It also includes a section to test the trained agent by loading its weights and running it in the environment.

```python
import torch
from elegantrl.run import train_and_evaluate
from elegantrl.config import Arguments
from elegantrl.train.config import build_env
from elegantrl.agents.AgentPPO import AgentPPO

# train and save
args = Arguments(env=build_env('BipedalWalker-v3'), agent=AgentPPO())
args.cwd = 'demo_BipedalWalker_PPO'
args.env.target_return = 300
args.reward_scale = 2 ** -2
train_and_evaluate(args)

# test
agent = AgentPPO()
agent.init(args.net_dim, args.state_dim, args.action_dim)
agent.save_or_load_agent(cwd=args.cwd, if_save=False)

env = build_env('BipedalWalker-v3')
state = env.reset()
episode_reward = 0
for i in range(2 ** 10):
    action = agent.select_action(state)
    next_state, reward, done, _ = env.step(action)

    episode_reward += reward
    if done:
        print(f'Step {i:>6}, Episode return {episode_reward:8.3f}')
        break
    else:
        state = next_state
    env.render()
```
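
The test loop above follows a common pattern: step the environment with the policy's action until the episode ends or a step limit is reached. The sketch below isolates that pattern in a reusable function, using the same four-value `step()` return as the example. `CountdownEnv` and `run_episode` are hypothetical and exist only to make the sketch runnable without gym.

```python
class CountdownEnv:
    """Toy env with the classic gym API: episode ends after `length` steps."""

    def __init__(self, length=3):
        self.length = length
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        done = self.t >= self.length
        return self.t, 1.0, done, {}


def run_episode(env, policy, max_step):
    """Run one episode; return (episode_return, number_of_steps_taken)."""
    state = env.reset()
    episode_return = 0.0
    for step in range(max_step):
        state, reward, done, _ = env.step(policy(state))
        episode_return += reward
        if done:
            return episode_return, step + 1
    # Step limit reached before the env signalled termination.
    return episode_return, max_step


episode_return, steps = run_episode(CountdownEnv(3), policy=lambda s: 0, max_step=10)
print(f"Episode return {episode_return:8.3f} after {steps} steps")
```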

--------------------------------

### Import Necessary Packages

Source: https://github.com/ai4finance-foundation/elegantrl/blob/master/tutorial_LunarLanderContinuous_v2.ipynb

Imports the required libraries for reinforcement learning, including gym for environment interaction and specific agents and utilities from ElegantRL.

```python
import gym
from elegantrl.agents import AgentModSAC
from elegantrl.train.config import get_gym_env_args, Config
from elegantrl.train.run import *

gym.logger.set_level(40)  # Block warning
```

--------------------------------

### Run Tournament-Based Ensemble Training in ElegantRL

Source: https://github.com/ai4finance-foundation/elegantrl/blob/master/docs/source/tutorial/elegantrl-podracer.md

This code demonstrates how to set up and run tournament-based ensemble training for the Isaac Gym Ant environment using ElegantRL. It configures arguments for vectorized environments, evaluation environments, and hyper-parameters for the PPO agent.

```python
import isaacgym
import torch  # import torch after import IsaacGym modules
from elegantrl.train.config import Arguments
from elegantrl.train.run import train_and_evaluate_mp
from elegantrl.envs.IsaacGym import IsaacVecEnv, IsaacOneEnv
from elegantrl.agents.AgentPPO import AgentPPO

'''set vec env for worker'''
env_func = IsaacVecEnv
env_args = {
     'env_num': 2 ** 10,
     'env_name': 'Ant',
     'max_step': 1000,
     'state_dim': 60,
     'action_dim': 8,
     'if_discrete': False,
     'target_return': 14000.0,

     'device_id': None,  # set by worker
     'if_print': False,  # if_print=False in default
}

args = Arguments(agent=AgentPPO(), env_func=env_func, env_args=env_args)
args.agent.if_use_old_traj = False  # todo

'''set one env for evaluator'''
args.eval_env_func = IsaacOneEnv
args.eval_env_args = args.env_args.copy()
args.eval_env_args['env_num'] = 1

'''set other hyper-parameters'''
args.net_dim = 2 ** 9
args.batch_size = args.net_dim * 4
args.target_step = args.max_step
args.repeat_times = 2 ** 4

args.save_gap = 2 ** 9
args.eval_gap = 2 ** 8
args.eval_times1 = 2 ** 0
args.eval_times2 = 2 ** 2

args.worker_num = 1  # VecEnv, worker number = 1
args.learner_gpus = [(i,) for i in range(0, 8)]  # 8 agents (1 GPU per agent) performing tournament-based ensemble training

train_and_evaluate_mp(args, python_path='.../bin/python3')
```

--------------------------------

### Import Packages for ElegantRL

Source: https://github.com/ai4finance-foundation/elegantrl/blob/master/docs/source/helloworld/quickstart.md

Import necessary modules from ElegantRL and configure gym logger. This is the first step before setting up the agent and environment.

```python
from elegantrl_helloworld.demo import *

gym.logger.set_level(40) # Block warning
```

--------------------------------

### Import Necessary Packages

Source: https://github.com/ai4finance-foundation/elegantrl/blob/master/tutorial_BipedalWalker_v3.ipynb

Imports the required libraries for reinforcement learning, including OpenAI Gym for the environment and specific components from ElegantRL for agents and training.

```python
import gym
from elegantrl.agents import AgentPPO
from elegantrl.train.config import get_gym_env_args, Config
from elegantrl.train.run import *

gym.logger.set_level(40) # Block warning
```

--------------------------------

### AgentPPO: Proximal Policy Optimization for Continuous and Discrete Spaces

Source: https://context7.com/ai4finance-foundation/elegantrl/llms.txt

Implements PPO with GAE. Use `AgentPPO` for continuous action spaces and `AgentDiscretePPO` for discrete action spaces. Configure parameters like `ratio_clip`, `lambda_gae_adv`, and `lambda_entropy` for tuning. `if_use_v_trace` can be enabled for sparse rewards.

```python
import gym
from elegantrl import Config, train_agent
from elegantrl.agents.AgentPPO import AgentPPO, AgentDiscretePPO
from elegantrl.train.config import get_gym_env_args

# Continuous action space (BipedalWalker)
env_args = get_gym_env_args(gym.make('BipedalWalker-v3'), if_print=False)
args = Config(agent_class=AgentPPO, env_class=gym.make, env_args=env_args)
args.net_dims       = [256, 256]
args.horizon_len    = 4096
args.batch_size     = 512
args.repeat_times   = 8.0
args.gamma          = 0.98
args.break_step     = int(4e6)
args.break_score    = 300.0
args.ratio_clip     = 0.25    # PPO clip epsilon
args.lambda_gae_adv = 0.95    # GAE lambda
args.lambda_entropy = 0.001   # entropy bonus weight
args.if_use_v_trace = True    # V-trace advantage estimation
train_agent(args, if_single_process=True)

# Discrete action space (CartPole)
import gymnasium
env_args2 = {
    'env_name': 'CartPole-v1', 'num_envs': 1,
    'max_step': 500, 'state_dim': 4,
    'action_dim': 2, 'if_discrete': True
}
args2 = Config(agent_class=AgentDiscretePPO, env_class=gymnasium.make, env_args=env_args2)
args2.break_step  = int(5e4)
args2.break_score = 450.0
train_agent(args2, if_single_process=True)
```
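
To show what `lambda_gae_adv` and `ratio_clip` control, here is a minimal pure-Python sketch of generalized advantage estimation and the PPO clipped surrogate term in their textbook forms. It assumes a single trajectory segment with no episode termination inside it, and it is not taken from ElegantRL's implementation; the function names are hypothetical.

```python
def gae_advantages(rewards, values, last_value, gamma, lam):
    """Compute GAE advantages for one trajectory segment without terminations."""
    advantages = [0.0] * len(rewards)
    next_value = last_value
    running = 0.0
    for t in reversed(range(len(rewards))):
        # One-step temporal-difference error at step t.
        delta = rewards[t] + gamma * next_value - values[t]
        # Exponentially weighted sum of future TD errors.
        running = delta + gamma * lam * running
        advantages[t] = running
        next_value = values[t]
    return advantages


def clipped_surrogate(ratio, advantage, ratio_clip):
    """PPO objective term: the pessimistic choice of raw and clipped ratios."""
    clipped = min(max(ratio, 1.0 - ratio_clip), 1.0 + ratio_clip)
    return min(ratio * advantage, clipped * advantage)


adv = gae_advantages([1.0, 1.0], [0.0, 0.0], last_value=0.0, gamma=1.0, lam=1.0)
term = clipped_surrogate(ratio=1.5, advantage=1.0, ratio_clip=0.25)
```

A `lam` close to 1 gives lower-bias but higher-variance advantages, while a smaller `ratio_clip` keeps each policy update closer to the policy that collected the data.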