### Install PyTorch

Source: https://github.com/ai4finance-foundation/elegantrl/blob/master/helloworld/README.md

Install PyTorch for deep learning. This is a prerequisite for running the ElegantRL examples.

```bash
pip3 install torch
```

--------------------------------

### Install Gym for Reinforcement Learning (Windows)

Source: https://github.com/ai4finance-foundation/elegantrl/blob/master/helloworld/README.md

Install the gym library for reinforcement learning on Windows systems. This includes installing swig and specific versions of gym and its Box2D dependencies.

```bash
# Windows
python -m pip install --upgrade pip
pip3 install -i http://pypi.douban.com/simple/ --trusted-host pypi.douban.com gym==0.23.1
pip3 install -i http://pypi.douban.com/simple/ --trusted-host pypi.douban.com swig gym[Box2D]
```

--------------------------------

### Install ElegantRL using pip

Source: https://github.com/ai4finance-foundation/elegantrl/blob/master/docs/source/index.md

Install ElegantRL from PyPI for a quick setup. Ensure you have Python 3.6+ and PyTorch 1.0.2+.

```bash
pip3 install erl --upgrade
```

--------------------------------

### Install StarCraft II Environment

Source: https://github.com/ai4finance-foundation/elegantrl/blob/master/README.md

Install the StarCraft II environment by first executing the provided shell script and then installing its specific requirements.

```bash
bash ./elegantrl/envs/installsc2.sh
pip install -r sc2_requirements.txt
```

--------------------------------

### Install ElegantRL Library

Source: https://github.com/ai4finance-foundation/elegantrl/blob/master/tutorial_BipedalWalker_v3.ipynb

Installs the ElegantRL library from its GitHub repository. This is the first step before using any of its functionalities.

```python
# install elegantrl library
!pip install git+https://github.com/AI4Finance-LLC/ElegantRL.git
```

--------------------------------

### Install ElegantRL

Source: https://context7.com/ai4finance-foundation/elegantrl/llms.txt

Install ElegantRL from PyPI or from source. Includes common dependencies.

```bash
pip3 install erl --upgrade
```

```bash
git clone https://github.com/AI4Finance-Foundation/ElegantRL.git
cd ElegantRL
pip3 install .
```

```bash
pip3 install gym==0.17.0 pybullet Box2D matplotlib torch
```

--------------------------------

### Install Gym for Reinforcement Learning (Linux)

Source: https://github.com/ai4finance-foundation/elegantrl/blob/master/helloworld/README.md

Install the gym library for reinforcement learning on Linux systems. This includes installing swig and specific versions of gym and its Box2D dependencies.

```bash
# Linux (Ubuntu)
sudo apt install swig
python3 -m pip install --upgrade pip --no-warn-script-location
pip3 install -i http://pypi.douban.com/simple/ --trusted-host pypi.douban.com --user gym==0.23.1
pip3 install -i http://pypi.douban.com/simple/ --trusted-host pypi.douban.com --user gym[Box2D]
```

--------------------------------

### Run PPO Single File Example

Source: https://github.com/ai4finance-foundation/elegantrl/blob/master/helloworld/README.md

Execute the single-file PPO example. PPO is an on-policy DRL algorithm suitable for continuous action spaces.

```bash
python helloworld/helloworld_PPO_single_file.py
```

--------------------------------

### Training Output Example

Source: https://github.com/ai4finance-foundation/elegantrl/blob/master/tutorial_LunarLanderContinuous_v2.ipynb

This is an example of the output generated during training, showing the directory where the agent's checkpoints and logs are saved.

```text
Output:
| Arguments Remove cwd: ./LunarLanderContinuous-v2_ModSAC_2022
```

--------------------------------

### Install ElegantRL from GitHub

Source: https://github.com/ai4finance-foundation/elegantrl/blob/master/docs/source/index.md

Install the latest version of ElegantRL directly from its GitHub repository. This method is useful for accessing the newest features or contributing to the project.

```bash
git clone https://github.com/AI4Finance-Foundation/ElegantRL.git
cd ElegantRL
pip3 install .
```

--------------------------------

### Run DQN Single File Example

Source: https://github.com/ai4finance-foundation/elegantrl/blob/master/helloworld/README.md

Execute the single-file DQN example. DQN is an off-policy DRL algorithm suitable for discrete action spaces.

```bash
python helloworld/helloworld_DQN_single_file.py
```

--------------------------------

### DQN Agent Initialization and Training Setup

Source: https://github.com/ai4finance-foundation/elegantrl/blob/master/tutorial_helloworld_DQN_DDPG_PPO.ipynb

Initializes the DQN agent, environment, and arguments for CartPole. Hyperparameters like reward scaling, network dimensions, and exploration rate are set.

```python
from elegantrl_helloworld.agent import AgentDQN
agent_class = AgentDQN
env_name = "CartPole-v0"

import gym
gym.logger.set_level(40)  # Block warning
env = gym.make(env_name)
env_func = gym.make
env_args = get_gym_env_args(env, if_print=True)

args = Arguments(agent_class, env_func, env_args)

'''reward shaping'''
args.reward_scale = 2 ** 0  # after scaling, the target return should usually be close to 256
args.gamma = 0.97  # discount factor of future rewards

'''network update'''
args.target_step = args.max_step * 2  # collect target_step, then update network
args.net_dim = 2 ** 7  # the middle layer dimension of Fully Connected Network
args.num_layer = 3  # the layer number of MultiLayer Perceptron, `assert num_layer >= 2`
args.batch_size = 2 ** 7  # num of transitions sampled from replay buffer.
args.repeat_times = 2 ** 0  # repeatedly update network using ReplayBuffer to keep critic's loss small
args.explore_rate = 0.25  # epsilon-greedy for exploration.

'''evaluate'''
args.eval_gap = 2 ** 5  # evaluate the agent every eval_gap seconds
args.eval_times = 2 ** 3  # number of episodes run per evaluation to get the episode return
args.break_step = int(8e4)  # break training if 'total_step > break_step'
```

--------------------------------

### Install Core Dependencies

Source: https://github.com/ai4finance-foundation/elegantrl/blob/master/README.md

Install necessary Python packages for core functionality, including gym, pybullet, Box2D, and matplotlib. This command can be used as an alternative to installing from a requirements file.

```bash
pip3 install gym==0.17.0 pybullet Box2D matplotlib
```

--------------------------------

### Import ElegantRL Helloworld Modules

Source: https://github.com/ai4finance-foundation/elegantrl/blob/master/tutorial_helloworld_DQN_DDPG_PPO.ipynb

Import necessary functions and classes from the ElegantRL Helloworld modules for training, evaluation, environment setup, and argument parsing.

```python
from elegantrl_helloworld.run import train_agent, evaluate_agent
from elegantrl_helloworld.env import get_gym_env_args
from elegantrl_helloworld.config import Arguments
```

--------------------------------

### PendulumEnv

Source: https://context7.com/ai4finance-foundation/elegantrl/llms.txt

Custom environment example demonstrating how to wrap an existing gym environment with custom reward shaping and normalized action space.

```APIDOC
## PendulumEnv

### Description
Custom Environment Example. Demonstrates how to wrap an existing gym environment with custom reward shaping and normalized action space. Follows the required interface: must expose `env_name`, `state_dim`, `action_dim`, `if_discrete`, and implement `reset()` / `step()`.

### Usage
```python
import gymnasium as gym
from elegantrl.envs.CustomGymEnv import PendulumEnv
from elegantrl import Config, train_agent
from elegantrl.agents.AgentSAC import AgentSAC
from elegantrl.train.config import get_gym_env_args

# PendulumEnv wraps 'Pendulum-v1', rescales action from (-2,2) to (-1,1)
# and scales reward by 0.5 for stable learning
env = PendulumEnv()
print(f"state_dim={env.state_dim}, action_dim={env.action_dim}, discrete={env.if_discrete}")

state, info = env.reset()
action = env.action_space.sample()  # shape (1,)
next_state, reward, terminated, truncated, info = env.step(action)
print(f"next_state.shape={next_state.shape}, reward={reward:.3f}")

# Train on the custom env
env_args = get_gym_env_args(env, if_print=False)
args = Config(agent_class=AgentSAC, env_class=PendulumEnv, env_args=env_args)
args.break_step = int(6e4)
args.break_score = -150.0
train_agent(args, if_single_process=True)
env.close()
```
```

--------------------------------

### Run DDPG Single File Example

Source: https://github.com/ai4finance-foundation/elegantrl/blob/master/helloworld/README.md

Execute the single-file DDPG example. DDPG is an off-policy DRL algorithm designed for continuous action spaces.

```bash
python helloworld/helloworld_DDPG_single_file.py
```

--------------------------------

### Custom Environment Example: PendulumEnv

Source: https://context7.com/ai4finance-foundation/elegantrl/llms.txt

Demonstrates wrapping a gym environment with custom reward shaping and action space normalization. The custom environment must implement reset() and step() and expose specific attributes.

```python
import gymnasium as gym
from elegantrl.envs.CustomGymEnv import PendulumEnv
from elegantrl import Config, train_agent
from elegantrl.agents.AgentSAC import AgentSAC

# PendulumEnv wraps 'Pendulum-v1', rescales action from (-2,2) to (-1,1)
# and scales reward by 0.5 for stable learning
env = PendulumEnv()
print(f"state_dim={env.state_dim}, action_dim={env.action_dim}, discrete={env.if_discrete}")

state, info = env.reset()
action = env.action_space.sample()  # shape (1,)
next_state, reward, terminated, truncated, info = env.step(action)
print(f"next_state.shape={next_state.shape}, reward={reward:.3f}")

# Train on the custom env
from elegantrl.train.config import get_gym_env_args
env_args = get_gym_env_args(env, if_print=False)
args = Config(agent_class=AgentSAC, env_class=PendulumEnv, env_args=env_args)
args.break_step = int(6e4)
args.break_score = -150.0
train_agent(args, if_single_process=True)
env.close()
```

--------------------------------

### Check ChasingVecEnv Functionality

Source: https://github.com/ai4finance-foundation/elegantrl/blob/master/tutorial_Creating_ChasingVecEnv.ipynb

Tests the ChasingVecEnv by running a simulation for a fixed number of steps. It initializes the environment, gets actions, steps through the environment, and prints intermediate step counts.

```python
def check_chasing_vec_env():
    env = ChasingVecEnv(dim=2, env_num=4096, device_id=0)
    print("Env num:", env.env_num)
    reward_sums = [0.0] * env.env_num  # episode returns
    # one independent list per env; `[[]] * n` would alias a single shared list
    reward_sums_list = [[] for _ in range(env.env_num)]

    states = env.reset()
    for _ in range(env.max_step * 4):
        actions = env.get_action(states)
        states, rewards, dones, _ = env.step(actions)
        print("Steps:", env.cur_steps)
        for env_i in range(env.env_num):
            reward_sums[env_i] += rewards[env_i].item()

            if dones[env_i]:
                print(
```

--------------------------------

### Configure and Initialize Training - Python

Source: https://github.com/ai4finance-foundation/elegantrl/blob/master/helloworld/singlfile.rst

Sets up the agent, environment, and hyperparameters for training. This includes defining the agent and environment classes, their arguments, and key training parameters like break step, network dimensions, and discount factor.

```python
agent_class = AgentPPO  # DRL algorithm name
env_class = StockTradingVmapEnv  # run a finance env with massive parallel simulation.
env_args = {
    'env_name': 'StockTradingVmapEnv',  # Store the environment class in the hyperparameters.
    'state_dim':  # number of shares + price + technique factors + amount dimension
    'action_dim':  # number of shares
    'if_discrete': False  # continuous action space
}
get_gym_env_args(env=StockTradingVmapEnv(), if_print=True)  # return env_args

args = Config(agent_class, env_class, env_args)  # see `config.py Arguments()` for hyperparameter explanation
args.break_step = int(2e5)  # break training if 'total_step > break_step'
args.net_dims = (64, 32)  # the middle layer dimension of MultiLayer Perceptron
args.gamma = 0.97  # discount factor of future rewards
args.repeat_times = 16  # repeatedly update network using ReplayBuffer to keep critic's loss small.

train_agent(args)  # Pass the hyperparameters and start the training flow.
```

--------------------------------

### Configure Agent and Environment

Source: https://github.com/ai4finance-foundation/elegantrl/blob/master/tutorial_BipedalWalker_v3.ipynb

Sets up the agent (PPO) and environment (BipedalWalker-v3) for training. It defines environment arguments like dimensions, maximum steps, and target return, then creates a configuration object for the agent.

```python
env_func = gym.make
env_args = {
    "env_num": 1,
    "env_name": "BipedalWalker-v3",
    "max_step": 1600,
    "state_dim": 24,
    "action_dim": 4,
    "if_discrete": False,
    "target_return": 300,
    "id": "BipedalWalker-v3",
}
# env = build_env(env_class=env_func, env_args=env_args)
args = Config(AgentPPO, env_class=env_func, env_args=env_args)
```

--------------------------------

### Download ElegantRL Helloworld Repository

Source: https://github.com/ai4finance-foundation/elegantrl/blob/master/tutorial_helloworld_DQN_DDPG_PPO.ipynb

Use these commands to remove any existing directory and download the ElegantRL Helloworld repository from GitHub.

```bash
!rm -r -f /content/elegantrl_helloworld  # remove if the directory exists
!wget https://github.com/AI4Finance-Foundation/ElegantRL/raw/master/elegantrl_helloworld -P /content/
```

--------------------------------

### Train PPO for Pendulum

Source: https://github.com/ai4finance-foundation/elegantrl/blob/master/helloworld/README.md

Initiates training of the PPO algorithm for the Pendulum environment. This function is part of the tutorial examples.

```python
train_ppo_for_pendulum()
```

--------------------------------

### Train DDPG for Pendulum

Source: https://github.com/ai4finance-foundation/elegantrl/blob/master/helloworld/README.md

Initiates training of the DDPG algorithm for the Pendulum environment. This function is part of the tutorial examples.

```python
train_ddpg_for_pendulum()
```

--------------------------------

### Train DQN for CartPole

Source: https://github.com/ai4finance-foundation/elegantrl/blob/master/helloworld/README.md

Initiates training of the DQN algorithm for the CartPole environment. This function is part of the tutorial examples.

```python
train_dqn_for_cartpole()
```

--------------------------------

### Configure Agent and Environment

Source: https://github.com/ai4finance-foundation/elegantrl/blob/master/tutorial_Pendulum_v1.ipynb

Sets up the agent (SAC) and environment configuration, including environment ID, dimensions, and reward scaling. Ensure 'gpu_id' is set correctly if a GPU is available.

```python
import gym  # needed for gym.make below
from elegantrl.train.config import Config
from elegantrl.agents.AgentSAC import AgentSAC

env = gym.make
env_args = {
    "id": "Pendulum-v1",
    "env_name": "Pendulum-v1",
    "num_envs": 1,
    "max_step": 1000,
    "state_dim": 3,
    "action_dim": 1,
    "if_discrete": False,
    "reward_scale": 2**-1,
    "gpu_id": 0,  # if you have GPU
}
args = Config(AgentSAC, env_class=env, env_args=env_args)
```

--------------------------------

### Train PPO for Lunar Lander

Source: https://github.com/ai4finance-foundation/elegantrl/blob/master/helloworld/README.md

Initiates training of the PPO algorithm for the Lunar Lander environment. This function is part of the tutorial examples.

```python
train_ppo_for_lunar_lander()
```

--------------------------------

### Initialize Agent and Environment

Source: https://github.com/ai4finance-foundation/elegantrl/blob/master/docs/source/tutorial/LunarLanderContinuous-v2.md

Initialize the training arguments, specifying the agent (e.g., AgentModSAC), the environment function (gym.make), and the environment arguments obtained previously. This sets up the core components for training.

```python
env_func = gym.make
env_args = {
    'env_num': 1,
    'env_name': 'LunarLanderContinuous-v2',
    'max_step': 1000,
    'state_dim': 8,
    'action_dim': 2,  # LunarLanderContinuous-v2 has a 2-dimensional continuous action
    'if_discrete': False,
    'target_return': 200,
    'id': 'LunarLanderContinuous-v2'
}
args = Arguments(AgentModSAC, env_func=env_func, env_args=env_args)
```

--------------------------------

### Initialize Agent and Environment Arguments

Source: https://github.com/ai4finance-foundation/elegantrl/blob/master/docs/source/tutorial/BipedalWalker-v3.md

Set up the environment function and arguments, including environment name, dimensions, and maximum steps. An Arguments object is created to configure the agent and environment.

```python
env_func = gym.make
env_args = {
    'env_num': 1,
    'env_name': 'BipedalWalker-v3',
    'max_step': 1600,
    'state_dim': 24,
    'action_dim': 4,
    'if_discrete': False,
    'target_return': 300,
    'id': 'BipedalWalker-v3',
}
args = Arguments(AgentPPO, env_func=env_func, env_args=env_args)
```

--------------------------------

### Train DQN for Lunar Lander

Source: https://github.com/ai4finance-foundation/elegantrl/blob/master/helloworld/README.md

Initiates training of the DQN algorithm for the Lunar Lander environment. This function is part of the tutorial examples.

```python
train_dqn_for_lunar_lander()
```

--------------------------------

### Run All Unit Tests

Source: https://github.com/ai4finance-foundation/elegantrl/blob/master/README.md

Discover and run all available unit tests sequentially. This command should be executed from the project's root directory.

```bash
python -m unittest discover
```

--------------------------------

### Configure Agent and Training for Isaac Gym

Source: https://github.com/ai4finance-foundation/elegantrl/blob/master/docs/source/tutorial/isaacgym.md

Set up the agent, environment arguments for evaluation, and key hyper-parameters for training with Isaac Gym environments. This includes network dimensions, batch size, and evaluation intervals.

```python
from elegantrl.agents.AgentPPO import AgentPPO
from elegantrl.run import train_and_evaluate_mp

args = Arguments(agent=AgentPPO, env_func=env_func, env_args=env_args)

'''set one env for evaluator'''
args.eval_env_func = IsaacOneEnv
args.eval_env_args = args.env_args.copy()
args.eval_env_args['env_num'] = 1

'''set other hyper-parameters'''
args.net_dim = 2 ** 9
args.batch_size = args.net_dim * 4
args.target_step = args.max_step
args.repeat_times = 2 ** 4

args.save_gap = 2 ** 9
args.eval_gap = 2 ** 8
args.eval_times1 = 2 ** 0
args.eval_times2 = 2 ** 2

args.worker_num = 1
args.learner_gpus = 0
train_and_evaluate_mp(args)
```

--------------------------------

### Train and Evaluate REDQ Agent

Source: https://github.com/ai4finance-foundation/elegantrl/blob/master/docs/source/algorithms/redq.md

This snippet demonstrates how to train and evaluate a REDQ agent for a given environment. It initializes the agent, sets up training arguments, and then proceeds with training and evaluation. It also includes a section for testing the trained agent by loading its weights and performing inference.

```python
import torch
from elegantrl.run import train_and_evaluate
from elegantrl.config import Arguments
from elegantrl.train.config import build_env
from elegantrl.agents.AgentREDQ import AgentREDQ

# train and save
args = Arguments(env=build_env('Hopper-v2'), agent=AgentREDQ())
args.cwd = 'demo_Hopper_REDQ'
train_and_evaluate(args)

# test
agent = AgentREDQ()
agent.init(args.net_dim, args.state_dim, args.action_dim)
agent.save_or_load_agent(cwd=args.cwd, if_save=False)

# evaluate on the same environment the agent was trained on, so the
# state and action dimensions match the loaded networks
env = build_env('Hopper-v2')
state = env.reset()
episode_reward = 0
for i in range(125000):
    action = agent.select_action(state)
    next_state, reward, done, _ = env.step(action)

    episode_reward += reward
    if done:
        print(f'Step {i:>6}, Episode return {episode_reward:8.3f}')
        break
    else:
        state = next_state
    env.render()
```

--------------------------------

### Get Environment Information

Source: https://github.com/ai4finance-foundation/elegantrl/blob/master/docs/source/tutorial/BipedalWalker-v3.md

Retrieve the environment arguments for BipedalWalker-v3 (pass `if_print=True` to also print them). This helps in understanding the environment's state and action dimensions, and target return.

```python
get_gym_env_args(gym.make('BipedalWalker-v3'), if_print=False)
```

--------------------------------

### Initialize Isaac Gym Environment Wrapper

Source: https://github.com/ai4finance-foundation/elegantrl/blob/master/docs/source/tutorial/isaacgym.md

Import and configure the IsaacVecEnv wrapper for parallel Isaac Gym environments. Ensure torch is imported after IsaacGym modules.

```python
from elegantrl.envs.IsaacGym import IsaacVecEnv, IsaacOneEnv
import isaacgym
import torch  # import torch after import IsaacGym modules

env_func = IsaacVecEnv
env_args = {
    'env_num': 4096,
    'env_name': 'Ant',
    'max_step': 1000,
    'state_dim': 60,
    'action_dim': 8,
    'if_discrete': False,
    'target_return': 14000.0,

    'device_id': None,  # set by worker
    'if_print': False,  # if_print=False in default
}
```

--------------------------------

### Get Environment Information

Source: https://github.com/ai4finance-foundation/elegantrl/blob/master/docs/source/tutorial/LunarLanderContinuous-v2.md

Retrieve and print the arguments for the LunarLanderContinuous-v2 environment. This helps in understanding the environment's state and action dimensions, maximum steps, and target return.

```python
get_gym_env_args(gym.make('LunarLanderContinuous-v2'), if_print=True)
```

--------------------------------

### Multi-process Training with run.py

Source: https://github.com/ai4finance-foundation/elegantrl/blob/master/docs/source/api/run.md

Demonstrates the two-step procedure for training an agent using multi-process execution. Ensure `Arguments` are properly configured before calling `train_and_evaluate_mp`.

```python
from elegantrl.train.config import Arguments
from elegantrl.train.run import train_and_evaluate, train_and_evaluate_mp
from elegantrl.envs.Chasing import ChasingEnv
from elegantrl.agents.AgentPPO import AgentPPO

# Step 1
args = Arguments(agent=AgentPPO(), env_func=ChasingEnv)

# Step 2
train_and_evaluate_mp(args)
```

--------------------------------

### Specify Agent and Environment

Source: https://github.com/ai4finance-foundation/elegantrl/blob/master/docs/source/helloworld/quickstart.md

Instantiate the environment and set up the agent with initial arguments. The environment is 'Pendulum-v0' with a target return of -500.

```python
env = PendulumEnv('Pendulum-v0', target_return=-500)
args = Arguments(AgentSAC, env)
```

--------------------------------

### Set LD_LIBRARY_PATH for Isaac Gym

Source: https://github.com/ai4finance-foundation/elegantrl/blob/master/docs/source/other/faq.md

Use this command in bash to add the path of your Isaac Gym conda environment if you encounter an ImportError related to libpython3.7m.so.1.0. Replace the example path with your actual environment path.

```bash
export LD_LIBRARY_PATH=/xfs/home/podracer_steven/anaconda3/envs/rlgpu/lib
```

--------------------------------

### Unified Training Entry Point

Source: https://context7.com/ai4finance-foundation/elegantrl/llms.txt

The `train_agent` function serves as the unified entry point for training. It dispatches to single-process, multi-GPU, or multi-process single-GPU training based on `args.learner_gpu_ids`. Configure `args.learner_gpu_ids` to control the training mode.

```python
import gym
from elegantrl import Config, train_agent
from elegantrl.agents.AgentSAC import AgentSAC
from elegantrl.train.config import get_gym_env_args

env_args = get_gym_env_args(gym.make('Pendulum-v1'), if_print=False)
args = Config(agent_class=AgentSAC, env_class=gym.make, env_args=env_args)
args.net_dims = [128, 128]
args.break_step = int(1e5)
args.break_score = -200.0
args.gpu_id = 0
args.num_workers = 2

# Single-process training (simplest, good for debugging)
train_agent(args, if_single_process=True)

# Multiprocessing training on one GPU (default when learner_gpu_ids is empty)
args.learner_gpu_ids = ()
train_agent(args)

# Multi-GPU training across GPUs 0 and 1
args.learner_gpu_ids = (0, 1)
train_agent(args)
```

--------------------------------

### Get Environment Information

Source: https://github.com/ai4finance-foundation/elegantrl/blob/master/tutorial_BipedalWalker_v3.ipynb

Retrieves and prints key information about the BipedalWalker-v3 environment, such as state and action dimensions, maximum steps, and whether it's a
discrete environment. This helps in configuring the agent and training process.

```python
get_gym_env_args(gym.make("BipedalWalker-v3"), if_print=False)
```

--------------------------------

### PPO Training Log for LunarLanderContinuous-v2

Source: https://github.com/ai4finance-foundation/elegantrl/blob/master/helloworld/README.md

Logs the training performance of the PPO algorithm on the LunarLanderContinuous-v2 environment. It provides environment configuration and training metrics over various steps.

```text
env_args = {'env_name': 'LunarLanderContinuous-v2', 'state_dim': 8, 'action_dim': 2, 'if_discrete': False}
|     step      time  |     avgR    stdR    avgS  |     objC    objA
| 2.00e+04        53  |  -232.54   75.45     197  |    11.75    0.13
| 1.00e+05       689  |   143.02   66.60     828  |     1.91    0.14
| 2.00e+05      1401  |    61.57  133.74     534  |     3.92    0.15
| 3.00e+05      2088  |   108.64  103.73     668  |     2.44    0.18
| 4.00e+05      2724  |   159.55   96.49     522  |     2.37    0.19
```

--------------------------------

### Get Environment Information

Source: https://github.com/ai4finance-foundation/elegantrl/blob/master/tutorial_LunarLanderContinuous_v2.ipynb

Retrieves and prints key information about the specified Gym environment. This helps in understanding the environment's state and action dimensions, maximum steps, and whether it's a discrete or continuous environment.

```python
get_gym_env_args(gym.make("LunarLanderContinuous-v2"), if_print=False)
```

--------------------------------

### Specify Agent and Environment Configuration

Source: https://github.com/ai4finance-foundation/elegantrl/blob/master/tutorial_LunarLanderContinuous_v2.ipynb

Configure the environment function, arguments, and agent using the Config class. Ensure 'env_name' matches the desired gym environment.

```python
env_func = gym.make
env_args = {
    "env_num": 1,
    "env_name": "LunarLanderContinuous-v2",
    "max_step": 1000,
    "state_dim": 8,
    "action_dim": 2,
    "if_discrete": False,
    "target_return": 200,
    "id": "LunarLanderContinuous-v2",
}
args = Config(AgentModSAC, env_class=env_func, env_args=env_args)
```

--------------------------------

### Train and Evaluate PPO Agent

Source: https://github.com/ai4finance-foundation/elegantrl/blob/master/docs/source/algorithms/ppo.md

This snippet demonstrates how to train and evaluate a PPO agent for the BipedalWalker-v3 environment. It configures arguments, sets a target return, and scales rewards before training. It also includes a section to test the trained agent by loading its weights and running it in the environment.

```python
import torch
from elegantrl.run import train_and_evaluate
from elegantrl.config import Arguments
from elegantrl.train.config import build_env
from elegantrl.agents.AgentPPO import AgentPPO

# train and save
args = Arguments(env=build_env('BipedalWalker-v3'), agent=AgentPPO())
args.cwd = 'demo_BipedalWalker_PPO'
args.env.target_return = 300
args.reward_scale = 2 ** -2
train_and_evaluate(args)

# test
agent = AgentPPO()
agent.init(args.net_dim, args.state_dim, args.action_dim)
agent.save_or_load_agent(cwd=args.cwd, if_save=False)

env = build_env('BipedalWalker-v3')
state = env.reset()
episode_reward = 0
for i in range(2 ** 10):
    action = agent.select_action(state)
    next_state, reward, done, _ = env.step(action)

    episode_reward += reward
    if done:
        print(f'Step {i:>6}, Episode return {episode_reward:8.3f}')
        break
    else:
        state = next_state
    env.render()
```

--------------------------------

### Import Necessary Packages

Source: https://github.com/ai4finance-foundation/elegantrl/blob/master/tutorial_LunarLanderContinuous_v2.ipynb

Imports the required libraries for reinforcement learning, including gym for environment interaction and specific agents and utilities from ElegantRL.

```python
import gym
from elegantrl.agents import AgentModSAC
from elegantrl.train.config import get_gym_env_args, Config
from elegantrl.train.run import *

gym.logger.set_level(40)  # Block warning
```

--------------------------------

### Run Tournament-Based Ensemble Training in ElegantRL

Source: https://github.com/ai4finance-foundation/elegantrl/blob/master/docs/source/tutorial/elegantrl-podracer.md

This code demonstrates how to set up and run tournament-based ensemble training for the Isaac Gym Ant environment using ElegantRL. It configures arguments for vectorized environments, evaluation environments, and hyper-parameters for the PPO agent.

```python
import isaacgym
import torch  # import torch after import IsaacGym modules
from elegantrl.train.config import Arguments
from elegantrl.train.run import train_and_evaluate_mp
from elegantrl.envs.IsaacGym import IsaacVecEnv, IsaacOneEnv
from elegantrl.agents.AgentPPO import AgentPPO

'''set vec env for worker'''
env_func = IsaacVecEnv
env_args = {
    'env_num': 2 ** 10,
    'env_name': 'Ant',
    'max_step': 1000,
    'state_dim': 60,
    'action_dim': 8,
    'if_discrete': False,
    'target_return': 14000.0,

    'device_id': None,  # set by worker
    'if_print': False,  # if_print=False in default
}

args = Arguments(agent=AgentPPO(), env_func=env_func, env_args=env_args)
args.agent.if_use_old_traj = False  # todo

'''set one env for evaluator'''
args.eval_env_func = IsaacOneEnv
args.eval_env_args = args.env_args.copy()
args.eval_env_args['env_num'] = 1

'''set other hyper-parameters'''
args.net_dim = 2 ** 9
args.batch_size = args.net_dim * 4
args.target_step = args.max_step
args.repeat_times = 2 ** 4

args.save_gap = 2 ** 9
args.eval_gap = 2 ** 8
args.eval_times1 = 2 ** 0
args.eval_times2 = 2 ** 2

args.worker_num = 1  # VecEnv, worker number = 1
args.learner_gpus = [(i,) for i in range(0, 8)]  # 8 agents (1 GPU per agent) performing tournament-based ensemble training

train_and_evaluate_mp(args, python_path='.../bin/python3')
```
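
The hyper-parameters in the tournament snippet are written as powers of two and derived expressions. As a quick sanity check, the sketch below evaluates them in plain Python, with no ElegantRL or Isaac Gym import. The variable names mirror the `args` attributes above, `max_step` is assumed to be taken from `env_args['max_step']`, and the dictionary `resolved` is a hypothetical name used only for this illustration.

```python
# Values copied from the tournament snippet; this block only does arithmetic.
max_step = 1000                 # assumed to come from env_args['max_step']
env_num = 2 ** 10               # parallel sub-environments in the vectorized env

net_dim = 2 ** 9                # hidden layer width
batch_size = net_dim * 4        # derived from net_dim in the snippet
target_step = max_step          # steps collected before each network update
repeat_times = 2 ** 4           # update repetitions per collected batch

# One tuple of GPU ids per agent: 8 agents with 1 GPU each.
learner_gpus = [(i,) for i in range(0, 8)]

resolved = {
    "env_num": env_num,
    "net_dim": net_dim,
    "batch_size": batch_size,
    "target_step": target_step,
    "repeat_times": repeat_times,
    "num_agents": len(learner_gpus),
    "gpus_per_agent": len(learner_gpus[0]),
}
for name, value in resolved.items():
    print(f"{name:>15} = {value}")
```

Changing the inner tuples, for example to `(0, 1)`, would express more than one GPU id per agent in the same list-of-tuples shape; whether a given ElegantRL release accepts that layout should be checked against its `Arguments` documentation.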

--------------------------------

### Import Packages for ElegantRL

Source: https://github.com/ai4finance-foundation/elegantrl/blob/master/docs/source/helloworld/quickstart.md

Import necessary modules from ElegantRL and configure gym logger. This is the first step before setting up the agent and environment.

```python
from elegantrl_helloworld.demo import *

gym.logger.set_level(40)  # Block warning
```

--------------------------------

### Import Necessary Packages

Source: https://github.com/ai4finance-foundation/elegantrl/blob/master/tutorial_BipedalWalker_v3.ipynb

Imports the required libraries for reinforcement learning, including OpenAI Gym for the environment and specific components from ElegantRL for agents and training.

```python
import gym
from elegantrl.agents import AgentPPO
from elegantrl.train.config import get_gym_env_args, Config
from elegantrl.train.run import *

gym.logger.set_level(40)  # Block warning
```

--------------------------------

### AgentPPO: Proximal Policy Optimization for Continuous and Discrete Spaces

Source: https://context7.com/ai4finance-foundation/elegantrl/llms.txt

Implements PPO with GAE. Use `AgentPPO` for continuous action spaces and `AgentDiscretePPO` for discrete action spaces. Configure parameters like `ratio_clip`, `lambda_gae_adv`, and `lambda_entropy` for tuning. `if_use_v_trace` can be enabled for sparse rewards.

```python
import gym
from elegantrl import Config, train_agent
from elegantrl.agents.AgentPPO import AgentPPO, AgentDiscretePPO
from elegantrl.train.config import get_gym_env_args

# Continuous action space (BipedalWalker)
env_args = get_gym_env_args(gym.make('BipedalWalker-v3'), if_print=False)
args = Config(agent_class=AgentPPO, env_class=gym.make, env_args=env_args)
args.net_dims = [256, 256]
args.horizon_len = 4096
args.batch_size = 512
args.repeat_times = 8.0
args.gamma = 0.98
args.break_step = int(4e6)
args.break_score = 300.0
args.ratio_clip = 0.25  # PPO clip epsilon
args.lambda_gae_adv = 0.95  # GAE lambda
args.lambda_entropy = 0.001  # entropy bonus weight
args.if_use_v_trace = True  # V-trace advantage estimation
train_agent(args, if_single_process=True)

# Discrete action space (CartPole)
import gymnasium  # imported under its own name so it does not shadow `gym` above
env_args2 = {
    'env_name': 'CartPole-v1',
    'num_envs': 1,
    'max_step': 500,
    'state_dim': 4,
    'action_dim': 2,
    'if_discrete': True
}
args2 = Config(agent_class=AgentDiscretePPO, env_class=gymnasium.make, env_args=env_args2)
args2.break_step = int(5e4)
args2.break_score = 450.0
train_agent(args2, if_single_process=True)
```
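
The comments in the snippet above label `ratio_clip` as the PPO clip epsilon and `lambda_gae_adv` as the GAE lambda. As a rough illustration of what those two numbers control, the sketch below implements textbook GAE and the textbook clipped surrogate in plain Python. It is not ElegantRL's implementation: `compute_gae` and `clipped_surrogate` are hypothetical helper names, the trajectory is assumed to contain no early termination, and the default values are simply the ones set on `args` above.

```python
def compute_gae(rewards, values, last_value, gamma=0.98, lam=0.95):
    """Textbook Generalized Advantage Estimation for one trajectory.

    rewards[t] and values[t] are the reward and critic estimate at step t;
    last_value is the critic estimate for the state after the final step.
    Assumes no episode ends inside the trajectory.
    """
    advantages = [0.0] * len(rewards)
    next_value = last_value
    running = 0.0
    for t in reversed(range(len(rewards))):
        # one-step TD error, then exponentially weighted sum over later steps
        delta = rewards[t] + gamma * next_value - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
        next_value = values[t]
    return advantages


def clipped_surrogate(ratio, advantage, ratio_clip=0.25):
    """Textbook PPO objective for one sample: min(r * A, clip(r) * A)."""
    clipped_ratio = min(max(ratio, 1.0 - ratio_clip), 1.0 + ratio_clip)
    return min(ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    adv = compute_gae(rewards=[1.0, 1.0], values=[0.0, 0.0], last_value=0.0,
                      gamma=1.0, lam=1.0)
    print("advantages:", adv)
    print("surrogate (ratio=2.0, A=+1):", clipped_surrogate(2.0, 1.0))
    print("surrogate (ratio=0.5, A=-1):", clipped_surrogate(0.5, -1.0))
```

With `ratio_clip = 0.25` the probability ratio only contributes within `[0.75, 1.25]` when the clipped term is the smaller one, which is why a larger value permits bigger policy steps per update. A `lam` closer to 1 weights later TD errors more heavily in each advantage estimate.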