### Install Softlearning with Conda

Source: https://context7.com/rail-berkeley/softlearning/llms.txt

Clone the repository, set up a Conda environment using the provided `environment.yml` file, activate the environment, and install the package in editable mode.

```bash
git clone https://github.com/rail-berkeley/softlearning.git
cd softlearning
conda env create -f environment.yml
conda activate softlearning
pip install -e .
```

--------------------------------

### Get Help for Development Script

Source: https://github.com/rail-berkeley/softlearning/blob/master/README.md

Run this command to view all available arguments and their descriptions for the development script, allowing for detailed customization of training and simulation parameters.

```bash
python ./examples/development/main.py --help
```

--------------------------------

### Set up Softlearning with Docker

Source: https://context7.com/rail-berkeley/softlearning/llms.txt

Build and start the Docker container for development (CPU). Access the running container using `docker exec`. Tear down the container and associated volumes when finished.

```bash
export MJKEY="$(cat ~/.mujoco/mjkey.txt)"
docker-compose -f ./docker/docker-compose.dev.cpu.yml up -d --force-recreate
```

```bash
docker exec -it softlearning bash
```

```bash
docker-compose -f ./docker/docker-compose.dev.cpu.yml down --rmi all --volumes
```

--------------------------------

### `run_example_local`: Programmatic API

Source: https://context7.com/rail-berkeley/softlearning/llms.txt

Programmatically launches an experiment locally using Ray and Ray Tune. Parses the example module's argument spec, builds the variant, and calls `tune.run`.

```APIDOC
## `run_example_local`: Programmatic API

### Description
Programmatically launches an experiment locally using Ray and Ray Tune. Parses the example module's argument spec, builds the variant, and calls `tune.run`.

### Method Signature
```python
from examples.instrument import run_example_local

run_example_local(example_module_name: str, example_argv: list)
```

### Parameters
- **example_module_name** (str) - The name of the example module to run.
- **example_argv** (list) - A list of strings representing the command-line arguments to pass to the example module.

### Request Example
```python
from examples.instrument import run_example_local

# Run SAC on Pendulum-v0 locally in non-parallel mode
run_example_local(
    example_module_name='examples.development',
    example_argv=[
        '--algorithm', 'SAC',
        '--universe', 'gym',
        '--domain', 'Pendulum',
        '--task', 'v0',
        '--exp-name', 'pendulum-test',
        '--checkpoint-frequency', '100',
        '--trial-cpus', '2',
    ]
)
```
```

--------------------------------

### Resume Training from Checkpoint

Source: https://github.com/rail-berkeley/softlearning/blob/master/README.md

To resume training from a saved checkpoint, run the original example main-script with the `--restore` flag, providing the path to the checkpoint.

```bash
softlearning run_example_local examples.development \
    --algorithm SAC \
    --universe gym \
    --domain HalfCheetah \
    --task v3 \
    --exp-name my-sac-experiment-1 \
    --checkpoint-frequency 1000 \
    --restore ${SAC_CHECKPOINT_PATH}
```

--------------------------------

### CLI: softlearning run_example_local

Source: https://context7.com/rail-berkeley/softlearning/llms.txt

The primary entry point for launching training experiments locally. It initializes a Ray cluster, wraps the experiment in Ray Tune's `Trainable` interface, and runs the specified example module with the given hyperparameters.

```APIDOC
## CLI: `softlearning run_example_local`

### Description
The primary entry point for launching training experiments locally. It initializes a Ray cluster, wraps the experiment in Ray Tune's `Trainable` interface, and runs the specified example module with the given hyperparameters.

### Usage Examples

Train SAC on HalfCheetah-v3 with checkpointing every 1000 iterations:
```bash
softlearning run_example_local examples.development \
    --algorithm SAC \
    --universe gym \
    --domain HalfCheetah \
    --task v3 \
    --exp-name my-sac-experiment \
    --checkpoint-frequency 1000 \
    --trial-gpus 1 \
    --num-samples 3
```

Train SAC on a dm_control task:
```bash
softlearning run_example_local examples.development \
    --algorithm SAC \
    --universe dm_control \
    --domain cheetah \
    --task run \
    --exp-name dm-cheetah-run \
    --checkpoint-frequency 500
```

Train SQL on Hopper-v3:
```bash
softlearning run_example_local examples.development \
    --algorithm SQL \
    --universe gym \
    --domain Hopper \
    --task v3 \
    --exp-name sql-hopper
```

Resume from a saved checkpoint (currently broken; see README):
```bash
softlearning run_example_local examples.development \
    --algorithm SAC \
    --universe gym \
    --domain HalfCheetah \
    --task v3 \
    --exp-name resumed-experiment \
    --checkpoint-frequency 1000 \
    --restore ~/ray_results/gym/HalfCheetah/v3/2024-01-01T00-00-00-my-sac-experiment-0/checkpoint_1000/
```
```

--------------------------------

### Create and Activate Conda Environment

Source: https://github.com/rail-berkeley/softlearning/blob/master/README.md

Create a conda environment from the provided environment.yml file, activate it, and install softlearning in editable mode for command-line interface access.

```bash
cd ${SOFTLEARNING_PATH}
conda env create -f environment.yml
conda activate softlearning
pip install -e ${SOFTLEARNING_PATH}
```

--------------------------------

### Instantiate and Train Soft Actor-Critic (SAC)

Source: https://context7.com/rail-berkeley/softlearning/llms.txt

This snippet demonstrates the setup and training loop for the Soft Actor-Critic (SAC) algorithm. It requires defining environment parameters, policy, Q-functions, replay pool, and sampler.

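Two hyperparameters that appear in SAC configurations, `tau` and `target_entropy='auto'`, are easier to read with their usual definitions in hand. The NumPy sketch below assumes the standard SAC conventions (a soft target update with rate `tau`, and the `-|A|` entropy heuristic); the helper names `heuristic_target_entropy` and `polyak_update` are hypothetical and are not part of softlearning.

```python
import numpy as np


def heuristic_target_entropy(action_shape):
    """Common SAC heuristic: target entropy is minus the action dimensionality."""
    return -float(np.prod(action_shape))


def polyak_update(target_weights, source_weights, tau):
    """Move each target array a fraction `tau` of the way toward its source array."""
    return [(1.0 - tau) * t + tau * s
            for t, s in zip(target_weights, source_weights)]


# A 6-dimensional action space gives a target entropy of -6.
target_entropy = heuristic_target_entropy((6,))

# Toy "network weights": targets start at 0, sources are fixed at 1.
source = [np.ones((3, 2)), np.ones(2)]
target = [np.zeros((3, 2)), np.zeros(2)]
target = polyak_update(target, source, tau=5e-3)

print(target_entropy, target[0][0, 0])
```

With `tau=5e-3`, each update moves the target only 0.5% of the way toward the online weights, which is why target networks change slowly.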
```python
from softlearning.algorithms.sac import SAC
from softlearning.environments.utils import get_environment_from_params
from softlearning.policies.gaussian_policy import FeedforwardGaussianPolicy
from softlearning import value_functions, replay_pools, samplers

# Build environment
env_params = {'universe': 'gym', 'domain': 'HalfCheetah', 'task': 'v3', 'kwargs': {}}
training_env = get_environment_from_params(env_params)
eval_env = get_environment_from_params(env_params)

# Build policy
policy = FeedforwardGaussianPolicy(
    hidden_layer_sizes=(256, 256),
    squash=True,
    input_shapes=training_env.observation_shape,
    output_shape=training_env.action_shape,
    action_range=(training_env.action_space.low, training_env.action_space.high),
)

# Build Q-functions (double Q to reduce overestimation)
Qs = value_functions.get({
    'class_name': 'double_feedforward_Q_function',
    'config': {
        'hidden_layer_sizes': (256, 256),
        'input_shapes': (training_env.observation_shape, training_env.action_shape),
    }
})

# Build replay pool and sampler
pool = replay_pools.get({
    'class_name': 'SimpleReplayPool',
    'config': {'max_size': int(1e6), 'environment': training_env},
})
sampler = samplers.get({
    'class_name': 'SimpleSampler',
    'config': {'environment': training_env, 'policy': policy,
               'pool': pool, 'max_path_length': 1000},
})

# Instantiate SAC
sac = SAC(
    training_environment=training_env,
    evaluation_environment=eval_env,
    policy=policy,
    Qs=Qs,
    pool=pool,
    sampler=sampler,
    policy_lr=3e-4,
    Q_lr=3e-4,
    alpha_lr=3e-4,
    target_entropy='auto',  # heuristic: -|A|
    discount=0.99,
    tau=5e-3,
    n_epochs=3000,
    epoch_length=1000,
    batch_size=256,
)

# Run training (generator-based)
for diagnostics in sac.train():
    if diagnostics.get('done'):
        break
    print(f"epoch={diagnostics['epoch']} "
          f"eval_reward={diagnostics['evaluation']['episode-reward-mean']:.2f} "
          f"alpha={diagnostics['alpha']:.4f}")
```

--------------------------------

### Clean Up Docker Setup

Source:
https://github.com/rail-berkeley/softlearning/blob/master/README.md

Stop and remove all services defined in the docker-compose file, including associated images and volumes.

```bash
docker-compose \
    -f ./docker/docker-compose.dev.cpu.yml \
    down \
    --rmi all \
    --volumes
```

--------------------------------

### Get Experiment Configuration (Python)

Source: https://context7.com/rail-berkeley/softlearning/llms.txt

Builds a complete experiment specification dict from universe/domain/task arguments. It automatically resolves environment-specific parameters, epoch lengths, and timestep budgets.

```python
from examples.development.variants import (
    get_variant_spec,
    get_variant_spec_base,
    get_total_timesteps,
    get_epoch_length,
    get_max_path_length,
)
import argparse

# Look up built-in defaults
print(get_total_timesteps('gym', 'HalfCheetah', 'v3'))  # 3000000
print(get_epoch_length('gym', 'HalfCheetah', 'v3'))     # 25000
print(get_max_path_length('gym', 'HalfCheetah', 'v3'))  # 1000

# Build full variant spec from parsed args
args = argparse.Namespace(
    universe='gym',
    domain='Ant',
    task='v3',
    policy='gaussian',
    algorithm='SAC',
    checkpoint_replay_pool=None,
)
variant = get_variant_spec(args)
print(variant['algorithm_params']['config']['n_epochs'])  # 120 (3e6 / 25000)
print(variant['policy_params']['class_name'])             # 'FeedforwardGaussianPolicy'
print(variant['Q_params']['class_name'])                  # 'double_feedforward_Q_function'

# Variant spec for image-based environment (auto-adds ConvNet preprocessors)
args_img = argparse.Namespace(
    universe='dm_control',
    domain='cheetah',
    task='run',
    policy='gaussian',
    algorithm='SAC',
    checkpoint_replay_pool=None,
)
variant_img = get_variant_spec(args_img)
# variant_img['policy_params']['config']['preprocessors'] will contain
# a convnet_preprocessor config when pixel observations are detected
```

--------------------------------

### Programmatically run SAC on Pendulum-v0

Source: https://context7.com/rail-berkeley/softlearning/llms.txt

Launches a local
experiment programmatically using `run_example_local`. This example runs SAC on Pendulum-v0 in non-parallel mode, specifying experiment parameters.

```python
from examples.instrument import run_example_local

# Run SAC on Pendulum-v0 locally in non-parallel mode
run_example_local(
    example_module_name='examples.development',
    example_argv=[
        '--algorithm', 'SAC',
        '--universe', 'gym',
        '--domain', 'Pendulum',
        '--task', 'v0',
        '--exp-name', 'pendulum-test',
        '--checkpoint-frequency', '100',
        '--trial-cpus', '2',
    ]
)
```

--------------------------------

### Get dm_control Environment with Pixel Observations

Source: https://context7.com/rail-berkeley/softlearning/llms.txt

This snippet shows how to obtain a dm_control environment configured for pixel observations. Ensure the 'dm_control' universe and desired domain/task are specified.

```python
from softlearning.environments.utils import get_environment_from_params

pixel_env = get_environment_from_params({
    'universe': 'dm_control',
    'domain': 'cheetah',
    'task': 'run',
    'kwargs': {
        'pixel_wrapper_kwargs': {
            'pixels_only': True,
            'render_kwargs': {'width': 84, 'height': 84, 'camera_id': 0},
        }
    }
})
```

--------------------------------

### Instantiate a Softlearning environment using get_environment_from_params

Source: https://context7.com/rail-berkeley/softlearning/llms.txt

Uses `get_environment_from_params` to create a `SoftlearningEnv` from a parameter dictionary, including custom environment parameters. Samples an action, steps the environment, and prints observation shape and action shape.

```python
from softlearning.environments.utils import get_environment_from_params

# Method 2: from params dict (used internally by ExperimentRunner)
env = get_environment_from_params({
    'universe': 'gym',
    'domain': 'Ant',
    'task': 'v3',
    'kwargs': {
        'healthy_reward': 0.0,
        'exclude_current_positions_from_observation': False,
    }
})
env.reset()  # reset before the first step
action = env.action_space.sample()
obs, reward, done, info = env.step(action)
print(obs.keys())  # dict_keys(['observations'])
print(reward)      # float
print(env.observation_shape, env.action_shape)
```

--------------------------------

### Instantiate a Softlearning environment using get_environment

Source: https://context7.com/rail-berkeley/softlearning/llms.txt

Uses the `get_environment` utility function to create a `SoftlearningEnv` instance for the HalfCheetah-v3 Gym environment. Resets the environment and prints the initial observation.

```python
from softlearning.environments.utils import get_environment, get_environment_from_params

# Method 1: direct call
env = get_environment(
    universe='gym',
    domain='HalfCheetah',
    task='v3',
    environment_params={}
)
obs = env.reset()
print(obs)  # {'observations': array([...], dtype=float32)}
```

--------------------------------

### Simulate Policy Rollouts (Python)

Source: https://context7.com/rail-berkeley/softlearning/llms.txt

Loads a trained policy from a Ray Tune checkpoint and runs evaluation rollouts. Can render to screen or save an MP4 video. Ensure the checkpoint path is correct.

```python
from examples.development.simulate_policy import simulate_policy

# Render to screen (human mode)
paths = simulate_policy(
    checkpoint_path='~/ray_results/gym/HalfCheetah/v3/'
                    '2024-01-01T00-00-00-my-sac-experiment-0/'
                    'checkpoint_1000/',
    num_rollouts=5,
    max_path_length=1000,
    render_kwargs={'mode': 'human'},
)
print(f"Mean return: {sum(p['rewards'].sum() for p in paths) / len(paths):.2f}")

# Save video to disk (rgb_array mode)
paths = simulate_policy(
    checkpoint_path='~/ray_results/gym/HalfCheetah/v3/'
                    '2024-01-01T00-00-00-my-sac-experiment-0/'
                    'checkpoint_1000/',
    num_rollouts=3,
    max_path_length=1000,
    render_kwargs={'mode': 'rgb_array'},
    video_save_path='/tmp/policy_video/',
)
```

--------------------------------

### Run SQL on Hopper-v3 locally

Source: https://context7.com/rail-berkeley/softlearning/llms.txt

Launches a local training experiment for SQL on the Hopper-v3 environment, specifying the algorithm, universe, domain, task, and experiment name.

```bash
softlearning run_example_local examples.development \
    --algorithm SQL \
    --universe gym \
    --domain Hopper \
    --task v3 \
    --exp-name sql-hopper
```

--------------------------------

### Run SAC on HalfCheetah-v3 locally

Source: https://context7.com/rail-berkeley/softlearning/llms.txt

Launches a local training experiment for SAC on the HalfCheetah-v3 environment. Configures checkpointing, GPU usage, and parallel seeds.

```bash
softlearning run_example_local examples.development \
    --algorithm SAC \
    --universe gym \
    --domain HalfCheetah \
    --task v3 \
    --exp-name my-sac-experiment \
    --checkpoint-frequency 1000 \
    --trial-gpus 1 \
    --num-samples 3  # run 3 random seeds in parallel
```

--------------------------------

### Initialize FlexibleReplayPool

Source: https://context7.com/rail-berkeley/softlearning/llms.txt

Sets up a flexible replay pool with specified fields for storing environment data.
Define the `dtype` and `shape` for each field, matching your environment's observation, action, reward, and terminal signals.

```python
from softlearning.replay_pools.flexible_replay_pool import FlexibleReplayPool, Field
import numpy as np

# Define fields matching your environment
fields = {
    'observations': Field(name='observations', dtype='float32', shape=(17,)),
    'actions': Field(name='actions', dtype='float32', shape=(6,)),
    'rewards': Field(name='rewards', dtype='float32', shape=(1,)),
    'next_observations': Field(name='next_observations', dtype='float32', shape=(17,)),
    'terminals': Field(name='terminals', dtype='bool', shape=(1,)),
}
pool = FlexibleReplayPool(max_size=int(1e6), fields=fields)
```

--------------------------------

### simulate_policy

Source: https://context7.com/rail-berkeley/softlearning/llms.txt

Loads a trained policy from a Ray Tune checkpoint directory and runs evaluation rollouts, optionally rendering to screen or saving an MP4 video.

```APIDOC
## simulate_policy: Policy Rollout Visualization

Loads a trained policy from a Ray Tune checkpoint directory and runs evaluation rollouts, optionally rendering to screen or saving an MP4 video.

```python
from examples.development.simulate_policy import simulate_policy

# Render to screen (human mode)
paths = simulate_policy(
    checkpoint_path='~/ray_results/gym/HalfCheetah/v3/'
                    '2024-01-01T00-00-00-my-sac-experiment-0/'
                    'checkpoint_1000/',
    num_rollouts=5,
    max_path_length=1000,
    render_kwargs={'mode': 'human'},
)
print(f"Mean return: {sum(p['rewards'].sum() for p in paths) / len(paths):.2f}")

# Save video to disk (rgb_array mode)
paths = simulate_policy(
    checkpoint_path='~/ray_results/gym/HalfCheetah/v3/'
                    '2024-01-01T00-00-00-my-sac-experiment-0/'
                    'checkpoint_1000/',
    num_rollouts=3,
    max_path_length=1000,
    render_kwargs={'mode': 'rgb_array'},
    video_save_path='/tmp/policy_video/',
)
```

```bash
# Equivalent CLI usage
python -m examples.development.simulate_policy \
    ~/ray_results/gym/HalfCheetah/v3/2024-01-01T00-00-00-my-sac-experiment/checkpoint_1000/ \
    --max-path-length 1000 \
    --num-rollouts 5 \
    --render-kwargs '{"mode": "human"}'
```
```

--------------------------------

### Resume a Softlearning experiment from a checkpoint

Source: https://context7.com/rail-berkeley/softlearning/llms.txt

Launches a local training experiment, resuming from a previously saved checkpoint. Note: This functionality is currently marked as broken in the source documentation.

```bash
softlearning run_example_local examples.development \
    --algorithm SAC \
    --universe gym \
    --domain HalfCheetah \
    --task v3 \
    --exp-name resumed-experiment \
    --checkpoint-frequency 1000 \
    --restore ~/ray_results/gym/HalfCheetah/v3/2024-01-01T00-00-00-my-sac-experiment-0/checkpoint_1000/
```

--------------------------------

### Initialize ExperimentRunner with Ray Tune

Source: https://context7.com/rail-berkeley/softlearning/llms.txt

Initializes Ray and the `ExperimentRunner` trainable for managing the experiment lifecycle. This includes component building, training loops, and checkpointing. Configure Ray with the desired number of CPUs and GPUs.

```python
from examples.development.main import ExperimentRunner
import ray
from ray import tune

ray.init(num_cpus=4, num_gpus=1)
```

--------------------------------

### Clone Softlearning Repository

Source: https://github.com/rail-berkeley/softlearning/blob/master/README.md

Set the `SOFTLEARNING_PATH` environment variable to the desired location, then clone the softlearning repository into it.

```bash
git clone https://github.com/rail-berkeley/softlearning.git ${SOFTLEARNING_PATH}
```

--------------------------------

### Build and Run Docker Container

Source: https://github.com/rail-berkeley/softlearning/blob/master/README.md

Build the Docker image and run the container in detached mode. Ensure your MuJoCo license key is available in ~/.mujoco/mjkey.txt.

```bash
export MJKEY="$(cat ~/.mujoco/mjkey.txt)" \
    && docker-compose \
        -f ./docker/docker-compose.dev.cpu.yml \
        up \
        -d \
        --force-recreate
```

--------------------------------

### Sample Actions with FeedforwardGaussianPolicy

Source: https://context7.com/rail-berkeley/softlearning/llms.txt

Demonstrates sampling actions and their log probabilities from a trained policy. Call `policy.actions()` for stochastic actions during training, and call it inside the `policy.evaluation_mode()` context manager for deterministic actions during evaluation. `policy.get_diagnostics()` returns summary statistics of the policy outputs (shifts, scales, entropy, actions).

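As a library-independent illustration of the stochastic versus deterministic distinction described above, the NumPy sketch below samples from a tanh-squashed diagonal Gaussian and computes log-probabilities with the change-of-variables correction. It is a minimal sketch, not softlearning's implementation: the function name `squashed_gaussian_actions` is hypothetical, and the fixed `mean` and `log_std` arrays are assumptions standing in for the outputs of a policy network.

```python
import numpy as np


def squashed_gaussian_actions(mean, log_std, rng, deterministic=False):
    """Return tanh-squashed actions and their log-probabilities.

    Hypothetical helper for illustration only; `mean` and `log_std` have
    shape (batch, action_dim) and play the role of policy-network outputs.
    """
    std = np.exp(log_std)
    # Deterministic mode uses the Gaussian mean; otherwise add scaled noise.
    noise = np.zeros_like(mean) if deterministic else rng.standard_normal(mean.shape)
    raw = mean + std * noise
    actions = np.tanh(raw)  # squash every component into [-1, 1]

    # Log-density of the unsquashed diagonal Gaussian, summed over action dims.
    gaussian_log_prob = np.sum(
        -0.5 * noise ** 2 - log_std - 0.5 * np.log(2.0 * np.pi),
        axis=-1, keepdims=True)
    # Change of variables: subtract log(1 - tanh(u)^2), written in a
    # numerically stable form using softplus(-2u) = logaddexp(0, -2u).
    log_det = np.sum(
        2.0 * (np.log(2.0) - raw - np.logaddexp(0.0, -2.0 * raw)),
        axis=-1, keepdims=True)
    return actions, gaussian_log_prob - log_det


rng = np.random.default_rng(0)
mean = np.zeros((32, 6), dtype=np.float32)
log_std = np.full((32, 6), -0.5, dtype=np.float32)

actions, log_pis = squashed_gaussian_actions(mean, log_std, rng)
det_actions, _ = squashed_gaussian_actions(mean, log_std, rng, deterministic=True)
print(actions.shape, log_pis.shape)  # (32, 6) (32, 1)
```

In this sketch the deterministic branch simply squashes the Gaussian mean, and the correction term is what makes the log-probability differ from that of a plain Gaussian.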
```python
import numpy as np

# `policy` is the FeedforwardGaussianPolicy built in the
# "Initialize FeedforwardGaussianPolicy" snippet.

# Sample stochastic actions during training
observations = {'observations': np.random.randn(32, 17).astype('float32')}
actions = policy.actions(observations)  # shape: (32, 6)

# Get actions + log probabilities in a single forward pass (numerically stable)
actions, log_pis = policy.actions_and_log_probs(observations)
print(actions.shape, log_pis.shape)  # (32, 6) (32, 1)

# Switch to deterministic evaluation mode (returns mean of distribution)
with policy.evaluation_mode():
    det_actions = policy.actions(observations)  # shape: (32, 6), deterministic

# Policy diagnostics (mean/std of shifts, scales, entropy, actions)
diag = policy.get_diagnostics(observations)
print(dict(diag))
# {'shifts-mean': ..., 'scales-mean': ..., 'entropy-mean': ..., 'actions-mean': ...}
```

--------------------------------

### Simulate Policy Rollouts (CLI)

Source: https://context7.com/rail-berkeley/softlearning/llms.txt

Equivalent command-line usage for simulating policy rollouts. This is useful for running simulations directly from the terminal.

```bash
python -m examples.development.simulate_policy \
    ~/ray_results/gym/HalfCheetah/v3/2024-01-01T00-00-00-my-sac-experiment/checkpoint_1000/ \
    --max-path-length 1000 \
    --num-rollouts 5 \
    --render-kwargs '{"mode": "human"}'
```

--------------------------------

### Add and Sample Data with FlexibleReplayPool

Source: https://context7.com/rail-berkeley/softlearning/llms.txt

Adds a full episode path to the replay pool and samples batches for training. Supports random batches and sequence batches with episode-boundary masking for recurrent policies. Experience can be incrementally saved and restored.

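The add-then-sample behaviour described above can be pictured with a small ring buffer. The sketch below is a hypothetical `TinyReplayPool` written in plain NumPy, not softlearning's `FlexibleReplayPool`; it assumes only that every field of a path shares the same leading (time) dimension, and it samples uniformly with replacement.

```python
import numpy as np


class TinyReplayPool:
    """Fixed-size ring buffer over named fields (illustrative only)."""

    def __init__(self, max_size, field_shapes):
        self.max_size = max_size
        self.data = {
            name: np.zeros((max_size, *shape), dtype=np.float32)
            for name, shape in field_shapes.items()
        }
        self.pointer = 0  # next index to write
        self.size = 0     # number of valid samples stored

    def add_path(self, path):
        """Append all timesteps of a path, overwriting the oldest samples."""
        length = len(next(iter(path.values())))
        indices = (self.pointer + np.arange(length)) % self.max_size
        for name, values in path.items():
            self.data[name][indices] = values
        self.pointer = (self.pointer + length) % self.max_size
        self.size = min(self.size + length, self.max_size)

    def random_batch(self, batch_size, rng):
        """Sample uniformly, with replacement, from the stored samples."""
        indices = rng.integers(0, self.size, size=batch_size)
        return {name: values[indices] for name, values in self.data.items()}


rng = np.random.default_rng(0)
tiny_pool = TinyReplayPool(
    max_size=500, field_shapes={'observations': (17,), 'rewards': (1,)})
for _ in range(3):  # three 200-step paths overflow the 500-slot buffer
    tiny_pool.add_path({
        'observations': rng.standard_normal((200, 17)),
        'rewards': rng.standard_normal((200, 1)),
    })
batch = tiny_pool.random_batch(256, rng)
print(tiny_pool.size, batch['observations'].shape)  # 500 (256, 17)
```

Once the buffer is full, the write pointer wraps around and the oldest transitions are overwritten, which keeps memory use bounded by `max_size`.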
```python
import numpy as np

# `pool` is the FlexibleReplayPool from the "Initialize FlexibleReplayPool" snippet.

# Add a full episode path at once
path = {
    'observations': np.random.randn(200, 17).astype('float32'),
    'actions': np.random.randn(200, 6).astype('float32'),
    'rewards': np.random.randn(200, 1).astype('float32'),
    'next_observations': np.random.randn(200, 17).astype('float32'),
    'terminals': np.zeros((200, 1), dtype=bool),
}
pool.add_path(path)
print(pool.size)  # 200

# Sample a random batch for training
batch = pool.random_batch(batch_size=256)
print(batch['observations'].shape)  # (256, 17)
print(batch['rewards'].shape)       # (256, 1)

# Sample a sequence batch with episode-boundary masking (for recurrent policies)
seq_batch = pool.random_sequence_batch(batch_size=32, sequence_length=10)
print(seq_batch['mask'].shape)  # (32, 10); False beyond episode boundary

# Save and restore experience incrementally
pool.save_latest_experience('/tmp/replay_pool_checkpoint.pkl.gz')
pool.load_experience('/tmp/replay_pool_checkpoint.pkl.gz')
```

--------------------------------

### Instantiate and Train Soft Q-Learning (SQL)

Source: https://context7.com/rail-berkeley/softlearning/llms.txt

This code sets up and runs the Soft Q-Learning (SQL) algorithm, an energy-based off-policy method that uses Stein variational gradient descent (SVGD). It is suitable for multimodal action distributions and requires the same components as SAC, plus kernel-specific parameters.

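To give a feel for the kernel parameters mentioned above, the sketch below builds a Gaussian (RBF) kernel matrix between two sets of action particles, with the bandwidth set by a median heuristic, a common choice in SVGD implementations. It is a NumPy illustration under that assumption; `rbf_kernel_median` is a hypothetical name, and the sketch does not claim to match the details of softlearning's `adaptive_isotropic_gaussian_kernel`.

```python
import numpy as np


def rbf_kernel_median(xs, ys):
    """Gaussian kernel matrix with a median-heuristic bandwidth.

    xs: (n, d) particles, ys: (m, d) particles. Returns an (n, m) matrix.
    Illustrative only; not the library's kernel function.
    """
    diff = xs[:, None, :] - ys[None, :, :]   # pairwise differences, (n, m, d)
    dist_sq = np.sum(diff ** 2, axis=-1)     # squared distances, (n, m)
    # Median heuristic: bandwidth scales with the typical squared distance.
    h = np.median(dist_sq) / np.log(xs.shape[0] + 1.0)
    h = max(h, 1e-8)                         # guard against a zero bandwidth
    return np.exp(-dist_sq / h)


rng = np.random.default_rng(0)
particles = rng.standard_normal((16, 6))     # 16 particles in a 6-dim action space
kappa = rbf_kernel_median(particles, particles)
print(kappa.shape)  # (16, 16)
```

Each entry lies in (0, 1] and equals 1 on the diagonal; nearby particles get larger kernel values, and in SVGD this is what makes them repel one another.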
```python
from softlearning.algorithms.sql import SQL
from softlearning.misc.kernel import adaptive_isotropic_gaussian_kernel

# `training_env`, `eval_env`, `policy`, `Qs`, `pool`, and `sampler`
# are built as in the SAC example.
sql = SQL(
    training_environment=training_env,
    evaluation_environment=eval_env,
    policy=policy,
    Qs=Qs,
    pool=pool,
    sampler=sampler,
    policy_lr=3e-4,
    Q_lr=3e-4,
    reward_scale=30,  # task-dependent; HalfCheetah uses 30
    discount=0.99,
    tau=5e-3,
    kernel_fn=adaptive_isotropic_gaussian_kernel,
    kernel_n_particles=16,
    kernel_update_ratio=0.5,
    value_n_particles=16,
    n_epochs=3000,
    epoch_length=1000,
    batch_size=256,
)

for diagnostics in sql.train():
    if diagnostics.get('done'):
        break
    print(f"epoch={diagnostics['epoch']} "
          f"Q_loss={diagnostics['update']['Q_loss-mean']:.4f}")
```

--------------------------------

### Run SAC on a dm_control task locally

Source: https://context7.com/rail-berkeley/softlearning/llms.txt

Launches a local training experiment for SAC on a dm_control environment, specifying the universe, domain, task, experiment name, and checkpoint frequency.

```bash
softlearning run_example_local examples.development \
    --algorithm SAC \
    --universe dm_control \
    --domain cheetah \
    --task run \
    --exp-name dm-cheetah-run \
    --checkpoint-frequency 500
```

--------------------------------

### `get_environment` / `get_environment_from_params`

Source: https://context7.com/rail-berkeley/softlearning/llms.txt

Factory functions that instantiate a `SoftlearningEnv` from a universe/domain/task triple, routing to the appropriate adapter (Gym, DmControl, or RoboSuite).

```APIDOC
## `get_environment` / `get_environment_from_params`

### Description
Factory functions that instantiate a `SoftlearningEnv` from a universe/domain/task triple, routing to the appropriate adapter (Gym, DmControl, or RoboSuite).

### Method Signatures
```python
from softlearning.environments.utils import get_environment, get_environment_from_params

get_environment(universe: str, domain: str, task: str, environment_params: dict)
get_environment_from_params(params: dict)
```

### Parameters
- **universe** (str) - The name of the environment universe (e.g., 'gym', 'dm_control').
- **domain** (str) - The name of the environment domain (e.g., 'HalfCheetah', 'cheetah').
- **task** (str) - The name of the environment task (e.g., 'v3', 'run').
- **environment_params** (dict) - Additional parameters for the environment.
- **params** (dict) - A dictionary containing environment configuration, including 'universe', 'domain', 'task', and 'kwargs'.

### Request Examples

Method 1: direct call
```python
from softlearning.environments.utils import get_environment

env = get_environment(
    universe='gym',
    domain='HalfCheetah',
    task='v3',
    environment_params={}
)
obs = env.reset()
print(obs)  # {'observations': array([...], dtype=float32)}
```

Method 2: from params dict (used internally by ExperimentRunner)
```python
from softlearning.environments.utils import get_environment_from_params

env = get_environment_from_params({
    'universe': 'gym',
    'domain': 'Ant',
    'task': 'v3',
    'kwargs': {
        'healthy_reward': 0.0,
        'exclude_current_positions_from_observation': False,
    }
})
env.reset()  # reset before the first step
action = env.action_space.sample()
obs, reward, done, info = env.step(action)
print(obs.keys())  # dict_keys(['observations'])
print(reward)      # float
print(env.observation_shape, env.action_shape)
```
```

--------------------------------

### Access Docker Container

Source: https://github.com/rail-berkeley/softlearning/blob/master/README.md

Connect to the running softlearning Docker container using the `docker exec` command.

```bash
docker exec -it softlearning bash
```

--------------------------------

### Run Experiment with Variant Spec

Source: https://context7.com/rail-berkeley/softlearning/llms.txt

This code snippet shows how to run an experiment using the `tune.run` function with a predefined variant specification. It configures the environment, policy, Q-function, and algorithm parameters.

```python
from ray import tune
from examples.development.main import ExperimentRunner

variant = {
    'environment_params': {
        'training': {'universe': 'gym', 'domain': 'Hopper', 'task': 'v3', 'kwargs': {}}
    },
    'policy_params': {
        'class_name': 'FeedforwardGaussianPolicy',
        'config': {'hidden_layer_sizes': (256, 256), 'squash': True,
                   'observation_keys': None, 'preprocessors': None},
    },
    'Q_params': {
        'class_name': 'double_feedforward_Q_function',
        'config': {'hidden_layer_sizes': (256, 256),
                   'observation_keys': None, 'preprocessors': None},
    },
    'algorithm_params': {
        'class_name': 'SAC',
        'config': {'n_epochs': 1000, 'epoch_length': 1000, 'batch_size': 256,
                   'min_pool_size': 1000, 'train_every_n_steps': 1,
                   'n_train_repeat': 1, 'eval_n_episodes': 1,
                   'policy_lr': 3e-4, 'Q_lr': 3e-4, 'alpha_lr': 3e-4,
                   'discount': 0.99, 'tau': 5e-3, 'target_entropy': 'auto'},
    },
    'replay_pool_params': {
        'class_name': 'SimpleReplayPool',
        'config': {'max_size': int(1e6)},
    },
    'sampler_params': {
        'class_name': 'SimpleSampler',
        'config': {'max_path_length': 1000},
    },
    'run_params': {
        'seed': 42,
        'checkpoint_at_end': True,
        'checkpoint_frequency': 100,
        'checkpoint_replay_pool': False,
    },
}

tune.run(
    ExperimentRunner,
    name='hopper-sac',
    config=variant,
    local_dir='~/ray_results',
    checkpoint_freq=100,
    checkpoint_at_end=True,
    num_samples=1,
)
```

--------------------------------

### Simulate Trained Policy

Source: https://github.com/rail-berkeley/softlearning/blob/master/README.md

This command simulates a trained agent's policy. Ensure the SAC_CHECKPOINT_DIR environment variable is set to the absolute path of the saved checkpoint.
Customize simulation parameters like max path length, number of rollouts, and rendering mode.

```bash
python -m examples.development.simulate_policy \
    ${SAC_CHECKPOINT_DIR} \
    --max-path-length 1000 \
    --num-rollouts 1 \
    --render-kwargs '{"mode": "human"}'
```

--------------------------------

### Wrap OpenAI Gym Environment with GymAdapter

Source: https://context7.com/rail-berkeley/softlearning/llms.txt

Use GymAdapter to wrap standard or pixel-based OpenAI Gym environments. It normalizes action ranges, optionally rescales observations, and removes the TimeLimit wrapper by default.

```python
from softlearning.environments.adapters.gym_adapter import GymAdapter

# Standard state-based environment
env = GymAdapter(
    domain='Walker2d',
    task='v3',
    rescale_action_range=(-1.0, 1.0),
    unwrap_time_limit=True
)
obs = env.reset()  # {'observations': np.ndarray}
action = env.action_space.sample()
obs, reward, done, info = env.step(action)

# Pixel-based environment
pixel_env = GymAdapter(
    domain='HalfCheetah',
    task='v3',
    pixel_wrapper_kwargs={
        'pixels_only': True,
        'render_kwargs': {'width': 84, 'height': 84},
    }
)
obs = pixel_env.reset()  # {'pixels': np.ndarray of shape (84, 84, 3)}

# Gym environment with observation rescaling
rescaled_env = GymAdapter(
    domain='Pendulum',
    task='v0',
    rescale_observation_range=(-1.0, 1.0),
)
```

--------------------------------

### Train Agent with SAC Algorithm

Source: https://github.com/rail-berkeley/softlearning/blob/master/README.md

Use this command to train an agent using the Soft Actor-Critic (SAC) algorithm. Specify the environment (universe, domain, task), experiment name, and checkpoint frequency for saving training progress.

```bash
softlearning run_example_local examples.development \
    --algorithm SAC \
    --universe gym \
    --domain HalfCheetah \
    --task v3 \
    --exp-name my-sac-experiment-1 \
    --checkpoint-frequency 1000  # Save the checkpoint to resume training later
```

--------------------------------

### Initialize FeedforwardGaussianPolicy

Source: https://context7.com/rail-berkeley/softlearning/llms.txt

Initializes a squashed Gaussian policy using a feedforward network. Configure hidden layer sizes, activation functions, input/output shapes, and action ranges. The `squash=True` argument applies a tanh bijector to constrain actions to [-1, 1].

```python
from softlearning.policies.gaussian_policy import FeedforwardGaussianPolicy
import numpy as np

policy = FeedforwardGaussianPolicy(
    hidden_layer_sizes=(256, 256),
    squash=True,              # applies tanh bijector to keep actions in [-1, 1]
    activation='relu',
    input_shapes=((17,),),    # HalfCheetah observation dim
    output_shape=(6,),        # HalfCheetah action dim
    action_range=(np.full(6, -1.0), np.full(6, 1.0)),
)
```

--------------------------------

### BibTeX Citation for Soft Actor-Critic Algorithms and Applications

Source: https://github.com/rail-berkeley/softlearning/blob/master/README.md

This is the BibTeX entry for citing the 'Soft Actor-Critic Algorithms and Applications' paper, which is relevant for academic research using Softlearning.

```bibtex
@techreport{haarnoja2018sacapps,
  title={Soft Actor-Critic Algorithms and Applications},
  author={Tuomas Haarnoja and Aurick Zhou and Kristian Hartikainen and George Tucker and Sehoon Ha and Jie Tan and Vikash Kumar and Henry Zhu and Abhishek Gupta and Pieter Abbeel and Sergey Levine},
  journal={arXiv preprint arXiv:1812.05905},
  year={2018}
}
```

--------------------------------

### Build Feedforward Neural Network with feedforward_model

Source: https://context7.com/rail-berkeley/softlearning/llms.txt

Constructs a `tf.keras.Sequential` feedforward network.
Configure hidden layer sizes, activation functions, and output shape/activation. This is used for building Q-functions and policy networks.

```python
from softlearning.models.feedforward import feedforward_model
import tensorflow as tf
import numpy as np

# Build an MLP with two 256-unit hidden layers and a single scalar output
model = feedforward_model(
    hidden_layer_sizes=(256, 256),
    output_shape=(1,),  # single Q-value output
    activation='relu',
    output_activation='linear',
)

# Forward pass: accepts concatenated observations+actions
obs_action = np.random.randn(32, 23).astype('float32')  # 17-dim obs + 6-dim action
q_values = model(obs_action)
print(q_values.shape)  # (32, 1)

# Build a policy network head (outputs mean + log_std concatenated)
policy_net = feedforward_model(
    hidden_layer_sizes=(256, 256),
    output_shape=(12,),  # 6-dim mean + 6-dim log_std
    activation='relu',
    output_activation='linear',
    name='policy_network',
)
```

--------------------------------

### get_variant_spec

Source: https://context7.com/rail-berkeley/softlearning/llms.txt

Builds a complete, hierarchically merged experiment specification dict from universe/domain/task arguments, resolving environment-specific parameters, epoch lengths, and total timestep budgets automatically.

```APIDOC
## get_variant_spec: Experiment Configuration

Builds a complete, hierarchically merged experiment specification dict from universe/domain/task arguments, resolving environment-specific parameters, epoch lengths, and total timestep budgets automatically.

```python
from examples.development.variants import (
    get_variant_spec,
    get_variant_spec_base,
    get_total_timesteps,
    get_epoch_length,
    get_max_path_length,
)
import argparse

# Look up built-in defaults
print(get_total_timesteps('gym', 'HalfCheetah', 'v3'))  # 3000000
print(get_epoch_length('gym', 'HalfCheetah', 'v3'))     # 25000
print(get_max_path_length('gym', 'HalfCheetah', 'v3'))  # 1000

# Build full variant spec from parsed args
args = argparse.Namespace(
    universe='gym',
    domain='Ant',
    task='v3',
    policy='gaussian',
    algorithm='SAC',
    checkpoint_replay_pool=None,
)
variant = get_variant_spec(args)
print(variant['algorithm_params']['config']['n_epochs'])  # 120 (3e6 / 25000)
print(variant['policy_params']['class_name'])             # 'FeedforwardGaussianPolicy'
print(variant['Q_params']['class_name'])                  # 'double_feedforward_Q_function'

# Variant spec for image-based environment (auto-adds ConvNet preprocessors)
args_img = argparse.Namespace(
    universe='dm_control',
    domain='cheetah',
    task='run',
    policy='gaussian',
    algorithm='SAC',
    checkpoint_replay_pool=None,
)
variant_img = get_variant_spec(args_img)
# variant_img['policy_params']['config']['preprocessors'] will contain
# a convnet_preprocessor config when pixel observations are detected
```
```

--------------------------------

### Deactivate and Remove Conda Environment

Source: https://github.com/rail-berkeley/softlearning/blob/master/README.md

Commands to deactivate the current conda environment and remove the 'softlearning' conda environment entirely.

```bash
conda deactivate
conda remove --name softlearning --all
```
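
The `n_epochs` comment in the `get_variant_spec` example (`120 (3e6 / 25000)`) is the total timestep budget divided by the epoch length. A minimal sketch of that arithmetic follows; `epochs_from_budget` is a hypothetical helper rather than a softlearning function, and rounding down with integer division is an assumption.

```python
def epochs_from_budget(total_timesteps: int, epoch_length: int) -> int:
    """Number of whole epochs that fit in a timestep budget (hypothetical helper)."""
    if epoch_length <= 0:
        raise ValueError("epoch_length must be positive")
    return total_timesteps // epoch_length


# Values taken from the comments in the get_variant_spec example.
n_epochs = epochs_from_budget(total_timesteps=3_000_000, epoch_length=25_000)
print(n_epochs)  # 120
```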