### Install Softlearning with Conda

Source: https://context7.com/rail-berkeley/softlearning/llms.txt

Clone the repository, set up a Conda environment using the provided `environment.yml` file, activate the environment, and install the package in editable mode.

```bash
git clone https://github.com/rail-berkeley/softlearning.git
cd softlearning
conda env create -f environment.yml
conda activate softlearning
pip install -e .
```

--------------------------------

### Get Help for Development Script

Source: https://github.com/rail-berkeley/softlearning/blob/master/README.md

Run this command to view all available arguments and their descriptions for the development script, allowing for detailed customization of training and simulation parameters.

```bash
python ./examples/development/main.py --help
```

--------------------------------

### Set up Softlearning with Docker

Source: https://context7.com/rail-berkeley/softlearning/llms.txt

Build and start the Docker container for development (CPU). Access the running container using `docker exec`. Tear down the container and its associated images and volumes when finished.

```bash
export MJKEY=$(cat ~/.mujoco/mjkey.txt)
docker-compose -f ./docker/docker-compose.dev.cpu.yml up -d --force-recreate
```

```bash
docker exec -it softlearning bash
```

```bash
docker-compose -f ./docker/docker-compose.dev.cpu.yml down --rmi all --volumes
```

--------------------------------

### `run_example_local` — Programmatic API

Source: https://context7.com/rail-berkeley/softlearning/llms.txt

Programmatically launches an experiment locally using Ray and Ray Tune. Parses the example module's argument spec, builds the variant, and calls `tune.run`.

```APIDOC
## `run_example_local` — Programmatic API

### Description
Programmatically launches an experiment locally using Ray and Ray Tune. Parses the example module's argument spec, builds the variant, and calls `tune.run`.

### Method Signature
```python
from examples.instrument import run_example_local

run_example_local(example_module_name: str, example_argv: list)
```

### Parameters
- **example_module_name** (str) - The name of the example module to run.
- **example_argv** (list) - A list of strings representing the command-line arguments to pass to the example module.

### Request Example
```python
from examples.instrument import run_example_local

# Run SAC on Pendulum-v0 locally in non-parallel mode
run_example_local(
    example_module_name='examples.development',
    example_argv=[
        '--algorithm', 'SAC',
        '--universe', 'gym',
        '--domain', 'Pendulum',
        '--task', 'v0',
        '--exp-name', 'pendulum-test',
        '--checkpoint-frequency', '100',
        '--trial-cpus', '2',
    ]
)
```
```
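
Because `example_argv` is a flat list of strings, it can be convenient to build it from a dictionary. The following is a minimal sketch in plain Python; `build_example_argv` is a hypothetical helper for illustration and is not part of softlearning.

```python
def build_example_argv(options):
    """Flatten {'exp_name': 'x'} into ['--exp-name', 'x'] (hypothetical helper)."""
    argv = []
    for key, value in options.items():
        argv.append('--' + key.replace('_', '-'))  # snake_case -> --kebab-case flag
        argv.append(str(value))                    # argv entries must be strings
    return argv


argv = build_example_argv({
    'algorithm': 'SAC',
    'universe': 'gym',
    'domain': 'Pendulum',
    'task': 'v0',
    'exp_name': 'pendulum-test',
    'checkpoint_frequency': 100,
    'trial_cpus': 2,
})
print(argv)
```

The resulting list has the same shape as the `example_argv` argument shown above and could be passed to `run_example_local` unchanged.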

--------------------------------

### Resume Training from Checkpoint

Source: https://github.com/rail-berkeley/softlearning/blob/master/README.md

To resume training from a saved checkpoint, rerun the original example main script with the `--restore` flag set to the checkpoint path.

```bash
softlearning run_example_local examples.development \
    --algorithm SAC \
    --universe gym \
    --domain HalfCheetah \
    --task v3 \
    --exp-name my-sac-experiment-1 \
    --checkpoint-frequency 1000 \
    --restore ${SAC_CHECKPOINT_PATH}
```

--------------------------------

### CLI: softlearning run_example_local

Source: https://context7.com/rail-berkeley/softlearning/llms.txt

The primary entry point for launching training experiments locally. It initializes a Ray cluster, wraps the experiment in Ray Tune's `Trainable` interface, and runs the specified example module with the given hyperparameters.

```APIDOC
## CLI: `softlearning run_example_local`

### Description
The primary entry point for launching training experiments locally. It initializes a Ray cluster, wraps the experiment in Ray Tune's `Trainable` interface, and runs the specified example module with the given hyperparameters.

### Usage Examples

Train SAC on HalfCheetah-v3 with checkpointing every 1000 iterations:
```bash
softlearning run_example_local examples.development \
    --algorithm SAC \
    --universe gym \
    --domain HalfCheetah \
    --task v3 \
    --exp-name my-sac-experiment \
    --checkpoint-frequency 1000 \
    --trial-gpus 1 \
    --num-samples 3
```

Train SAC on a dm_control task:
```bash
softlearning run_example_local examples.development \
    --algorithm SAC \
    --universe dm_control \
    --domain cheetah \
    --task run \
    --exp-name dm-cheetah-run \
    --checkpoint-frequency 500
```

Train SQL on Hopper-v3:
```bash
softlearning run_example_local examples.development \
    --algorithm SQL \
    --universe gym \
    --domain Hopper \
    --task v3 \
    --exp-name sql-hopper
```

Resume from a saved checkpoint (currently broken — see README):
```bash
softlearning run_example_local examples.development \
    --algorithm SAC \
    --universe gym \
    --domain HalfCheetah \
    --task v3 \
    --exp-name resumed-experiment \
    --checkpoint-frequency 1000 \
    --restore ~/ray_results/gym/HalfCheetah/v3/2024-01-01T00-00-00-my-sac-experiment-0/checkpoint_1000/
```
```

--------------------------------

### Create and Activate Conda Environment

Source: https://github.com/rail-berkeley/softlearning/blob/master/README.md

Create a conda environment from the provided environment.yml file, activate it, and install softlearning in editable mode for command-line interface access.

```bash
cd ${SOFTLEARNING_PATH}
conda env create -f environment.yml
conda activate softlearning
pip install -e ${SOFTLEARNING_PATH}
```

--------------------------------

### Instantiate and Train Soft Actor-Critic (SAC)

Source: https://context7.com/rail-berkeley/softlearning/llms.txt

This snippet demonstrates the setup and training loop for the Soft Actor-Critic (SAC) algorithm. It requires defining environment parameters, policy, Q-functions, replay pool, and sampler.

```python
from softlearning.algorithms.sac import SAC
from softlearning.environments.utils import get_environment_from_params
from softlearning.policies.gaussian_policy import FeedforwardGaussianPolicy
from softlearning import value_functions, replay_pools, samplers

# Build environment
env_params = {'universe': 'gym', 'domain': 'HalfCheetah', 'task': 'v3', 'kwargs': {}}
training_env = get_environment_from_params(env_params)
eval_env     = get_environment_from_params(env_params)

# Build policy
policy = FeedforwardGaussianPolicy(
    hidden_layer_sizes=(256, 256),
    squash=True,
    input_shapes=training_env.observation_shape,
    output_shape=training_env.action_shape,
    action_range=(training_env.action_space.low, training_env.action_space.high),
)

# Build Q-functions (double Q to reduce overestimation)
Qs = value_functions.get({
    'class_name': 'double_feedforward_Q_function',
    'config': {
        'hidden_layer_sizes': (256, 256),
        'input_shapes': (training_env.observation_shape, training_env.action_shape),
    }
})

# Build replay pool and sampler
pool = replay_pools.get({
    'class_name': 'SimpleReplayPool',
    'config': {'max_size': int(1e6), 'environment': training_env},
})
sampler = samplers.get({
    'class_name': 'SimpleSampler',
    'config': {'environment': training_env, 'policy': policy,
               'pool': pool, 'max_path_length': 1000},
})

# Instantiate SAC
sac = SAC(
    training_environment=training_env,
    evaluation_environment=eval_env,
    policy=policy,
    Qs=Qs,
    pool=pool,
    sampler=sampler,
    policy_lr=3e-4,
    Q_lr=3e-4,
    alpha_lr=3e-4,
    target_entropy='auto',   # heuristic: -|A|
    discount=0.99,
    tau=5e-3,
    n_epochs=3000,
    epoch_length=1000,
    batch_size=256,
)

# Run training (generator-based)
for diagnostics in sac.train():
    if diagnostics.get('done'):
        break
    print(f"epoch={diagnostics['epoch']}  "
          f"eval_reward={diagnostics['evaluation']['episode-reward-mean']:.2f}  "
          f"alpha={diagnostics['alpha']:.4f}")

```
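
Two of the hyperparameters above follow widely used SAC conventions: `target_entropy='auto'` resolves to the heuristic -|A| noted in the comment, and `tau` controls how quickly target networks track the online networks. The sketch below illustrates both in plain Python; the function names are illustrative assumptions and this is not softlearning's own code.

```python
def heuristic_target_entropy(action_shape):
    """Return -|A|: the negative number of action dimensions."""
    n = 1
    for dim in action_shape:
        n *= dim
    return -float(n)


def soft_update(target_weights, source_weights, tau):
    """Polyak averaging: target <- tau * source + (1 - tau) * target."""
    return [tau * s + (1.0 - tau) * t
            for s, t in zip(source_weights, target_weights)]


# HalfCheetah has a 6-dimensional action space
target_entropy = heuristic_target_entropy((6,))
# With tau=5e-3 the target moves 0.5% of the way toward the source per update
new_target = soft_update([0.0, 1.0], [1.0, 1.0], tau=5e-3)
print(target_entropy, new_target)
```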

--------------------------------

### Clean Up Docker Setup

Source: https://github.com/rail-berkeley/softlearning/blob/master/README.md

Stop and remove all services defined in the docker-compose file, including associated images and volumes.

```bash
docker-compose \
    -f ./docker/docker-compose.dev.cpu.yml \
    down \
    --rmi all \
    --volumes
```

--------------------------------

### Get Experiment Configuration (Python)

Source: https://context7.com/rail-berkeley/softlearning/llms.txt

Builds a complete experiment specification dict from universe/domain/task arguments. It automatically resolves environment-specific parameters, epoch lengths, and timestep budgets.

```python
from examples.development.variants import (
    get_variant_spec,
    get_variant_spec_base,
    get_total_timesteps,
    get_epoch_length,
    get_max_path_length,
)
import argparse

# Look up built-in defaults
print(get_total_timesteps('gym', 'HalfCheetah', 'v3'))  # 3000000
print(get_epoch_length('gym', 'HalfCheetah', 'v3'))     # 25000
print(get_max_path_length('gym', 'HalfCheetah', 'v3'))  # 1000

# Build full variant spec from parsed args
args = argparse.Namespace(
    universe='gym', domain='Ant', task='v3',
    policy='gaussian', algorithm='SAC',
    checkpoint_replay_pool=None,
)
variant = get_variant_spec(args)
print(variant['algorithm_params']['config']['n_epochs'])   # 120 (3e6 / 25000)
print(variant['policy_params']['class_name'])              # 'FeedforwardGaussianPolicy'
print(variant['Q_params']['class_name'])                   # 'double_feedforward_Q_function'

# Variant spec for image-based environment (auto-adds ConvNet preprocessors)
args_img = argparse.Namespace(
    universe='dm_control', domain='cheetah', task='run',
    policy='gaussian', algorithm='SAC', checkpoint_replay_pool=None,
)
variant_img = get_variant_spec(args_img)
# variant_img['policy_params']['config']['preprocessors'] will contain
# a convnet_preprocessor config when pixel observations are detected
```
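
The `n_epochs` value in the comments above is the timestep budget divided by the epoch length. A small sketch of that arithmetic, using the numbers quoted in the snippet; `epochs_from_budget` is a hypothetical helper, not a function in `variants.py`.

```python
def epochs_from_budget(total_timesteps, epoch_length):
    """Number of epochs needed to spend the whole timestep budget."""
    if total_timesteps % epoch_length != 0:
        raise ValueError('epoch_length should divide total_timesteps evenly')
    return total_timesteps // epoch_length


# 3,000,000 total timesteps at 25,000 steps per epoch
n_epochs = epochs_from_budget(3_000_000, 25_000)
print(n_epochs)  # 120
```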

--------------------------------

### Programmatically run SAC on Pendulum-v0

Source: https://context7.com/rail-berkeley/softlearning/llms.txt

Launches a local experiment programmatically using `run_example_local`. This example runs SAC on Pendulum-v0 in non-parallel mode, specifying experiment parameters.

```python
from examples.instrument import run_example_local

# Run SAC on Pendulum-v0 locally in non-parallel mode
run_example_local(
    example_module_name='examples.development',
    example_argv=[
        '--algorithm', 'SAC',
        '--universe', 'gym',
        '--domain', 'Pendulum',
        '--task', 'v0',
        '--exp-name', 'pendulum-test',
        '--checkpoint-frequency', '100',
        '--trial-cpus', '2',
    ]
)
```

--------------------------------

### Get dm_control Environment with Pixel Observations

Source: https://context7.com/rail-berkeley/softlearning/llms.txt

This snippet shows how to obtain a dm_control environment configured for pixel observations. Ensure the 'dm_control' universe and desired domain/task are specified.

```python
from softlearning.environments.utils import get_environment_from_params

pixel_env = get_environment_from_params({
    'universe': 'dm_control',
    'domain': 'cheetah',
    'task': 'run',
    'kwargs': {
        'pixel_wrapper_kwargs': {
            'pixels_only': True,
            'render_kwargs': {'width': 84, 'height': 84, 'camera_id': 0},
        }
    }
})

```

--------------------------------

### Instantiate a Softlearning environment using get_environment_from_params

Source: https://context7.com/rail-berkeley/softlearning/llms.txt

Uses `get_environment_from_params` to create a `SoftlearningEnv` from a parameter dictionary, including custom environment parameters. Samples an action, steps the environment, and prints observation shape and action shape.

```python
from softlearning.environments.utils import get_environment_from_params

# Method 2: from params dict (used internally by ExperimentRunner)
env = get_environment_from_params({
    'universe': 'gym',
    'domain': 'Ant',
    'task': 'v3',
    'kwargs': {
        'healthy_reward': 0.0,
        'exclude_current_positions_from_observation': False,
    }
})
action = env.action_space.sample()
obs, reward, done, info = env.step(action)
print(obs.keys())   # dict_keys(['observations'])
print(reward)       # float
print(env.observation_shape, env.action_shape)
```

--------------------------------

### Instantiate a Softlearning environment using get_environment

Source: https://context7.com/rail-berkeley/softlearning/llms.txt

Uses the `get_environment` utility function to create a `SoftlearningEnv` instance for the HalfCheetah-v3 Gym environment. Resets the environment and prints the initial observation.

```python
from softlearning.environments.utils import get_environment, get_environment_from_params

# Method 1: direct call
env = get_environment(
    universe='gym',
    domain='HalfCheetah',
    task='v3',
    environment_params={}
)
obs = env.reset()
print(obs)  # {'observations': array([...], dtype=float32)}
```

--------------------------------

### Simulate Policy Rollouts (Python)

Source: https://context7.com/rail-berkeley/softlearning/llms.txt

Loads a trained policy from a Ray Tune checkpoint and runs evaluation rollouts. Can render to screen or save an MP4 video. Ensure the checkpoint path is correct.

```python
from examples.development.simulate_policy import simulate_policy

# Render to screen (human mode)
paths = simulate_policy(
    checkpoint_path='~/ray_results/gym/HalfCheetah/v3/'
                    '2024-01-01T00-00-00-my-sac-experiment-0/'
                    'checkpoint_1000/',
    num_rollouts=5,
    max_path_length=1000,
    render_kwargs={'mode': 'human'},
)
print(f"Mean return: {sum(p['rewards'].sum() for p in paths) / len(paths):.2f}")

# Save video to disk (rgb_array mode)
paths = simulate_policy(
    checkpoint_path='~/ray_results/gym/HalfCheetah/v3/'
                    '2024-01-01T00-00-00-my-sac-experiment-0/'
                    'checkpoint_1000/',
    num_rollouts=3,
    max_path_length=1000,
    render_kwargs={'mode': 'rgb_array'},
    video_save_path='/tmp/policy_video/',
)
```

--------------------------------

### Run SQL on Hopper-v3 locally

Source: https://context7.com/rail-berkeley/softlearning/llms.txt

Launches a local training experiment for SQL on the Hopper-v3 environment, specifying the algorithm, universe, domain, task, and experiment name.

```bash
softlearning run_example_local examples.development \
    --algorithm SQL \
    --universe gym \
    --domain Hopper \
    --task v3 \
    --exp-name sql-hopper
```

--------------------------------

### Run SAC on HalfCheetah-v3 locally

Source: https://context7.com/rail-berkeley/softlearning/llms.txt

Launches a local training experiment for SAC on the HalfCheetah-v3 environment. Configures checkpointing, GPU usage, and parallel seeds.

```bash
softlearning run_example_local examples.development \
    --algorithm SAC \
    --universe gym \
    --domain HalfCheetah \
    --task v3 \
    --exp-name my-sac-experiment \
    --checkpoint-frequency 1000 \
    --trial-gpus 1 \
    --num-samples 3          # run 3 random seeds in parallel
```

--------------------------------

### Initialize FlexibleReplayPool

Source: https://context7.com/rail-berkeley/softlearning/llms.txt

Sets up a flexible replay pool with specified fields for storing environment data. Define the `dtype` and `shape` for each field, matching your environment's observation, action, reward, and terminal signals.

```python
from softlearning.replay_pools.flexible_replay_pool import FlexibleReplayPool, Field
import numpy as np

# Define fields matching your environment
fields = {
    'observations': Field(name='observations', dtype='float32', shape=(17,)),
    'actions':      Field(name='actions',      dtype='float32', shape=(6,)),
    'rewards':      Field(name='rewards',       dtype='float32', shape=(1,)),
    'next_observations': Field(name='next_observations', dtype='float32', shape=(17,)),
    'terminals':    Field(name='terminals',    dtype='bool',    shape=(1,)),
}

pool = FlexibleReplayPool(max_size=int(1e6), fields=fields)
```

--------------------------------

### simulate_policy

Source: https://context7.com/rail-berkeley/softlearning/llms.txt

Loads a trained policy from a Ray Tune checkpoint directory and runs evaluation rollouts, optionally rendering to screen or saving an MP4 video.

```APIDOC
## simulate_policy — Policy Rollout Visualization

Loads a trained policy from a Ray Tune checkpoint directory and runs evaluation rollouts, optionally rendering to screen or saving an MP4 video.

```python
from examples.development.simulate_policy import simulate_policy

# Render to screen (human mode)
paths = simulate_policy(
    checkpoint_path='~/ray_results/gym/HalfCheetah/v3/'
                    '2024-01-01T00-00-00-my-sac-experiment-0/'
                    'checkpoint_1000/',
    num_rollouts=5,
    max_path_length=1000,
    render_kwargs={'mode': 'human'},
)
print(f"Mean return: {sum(p['rewards'].sum() for p in paths) / len(paths):.2f}")

# Save video to disk (rgb_array mode)
paths = simulate_policy(
    checkpoint_path='~/ray_results/gym/HalfCheetah/v3/'
                    '2024-01-01T00-00-00-my-sac-experiment-0/'
                    'checkpoint_1000/',
    num_rollouts=3,
    max_path_length=1000,
    render_kwargs={'mode': 'rgb_array'},
    video_save_path='/tmp/policy_video/',
)
```

```bash
# Equivalent CLI usage
python -m examples.development.simulate_policy \
    ~/ray_results/gym/HalfCheetah/v3/2024-01-01T00-00-00-my-sac-experiment/checkpoint_1000/ \
    --max-path-length 1000 \
    --num-rollouts 5 \
    --render-kwargs '{"mode": "human"}'
```
```

--------------------------------

### Resume a Softlearning experiment from a checkpoint

Source: https://context7.com/rail-berkeley/softlearning/llms.txt

Launches a local training experiment, resuming from a previously saved checkpoint. Note: This functionality is currently marked as broken in the source documentation.

```bash
softlearning run_example_local examples.development \
    --algorithm SAC \
    --universe gym \
    --domain HalfCheetah \
    --task v3 \
    --exp-name resumed-experiment \
    --checkpoint-frequency 1000 \
    --restore ~/ray_results/gym/HalfCheetah/v3/2024-01-01T00-00-00-my-sac-experiment-0/checkpoint_1000/
```

--------------------------------

### Initialize ExperimentRunner with Ray Tune

Source: https://context7.com/rail-berkeley/softlearning/llms.txt

Initializes Ray and the `ExperimentRunner` trainable for managing the experiment lifecycle. This includes component building, training loops, and checkpointing. Configure Ray with the desired number of CPUs and GPUs.

```python
from examples.development.main import ExperimentRunner
import ray
from ray import tune

ray.init(num_cpus=4, num_gpus=1)

```

--------------------------------

### Clone Softlearning Repository

Source: https://github.com/rail-berkeley/softlearning/blob/master/README.md

Set the `SOFTLEARNING_PATH` environment variable to the desired location, then clone the softlearning repository into it.

```bash
git clone https://github.com/rail-berkeley/softlearning.git ${SOFTLEARNING_PATH}
```

--------------------------------

### Build and Run Docker Container

Source: https://github.com/rail-berkeley/softlearning/blob/master/README.md

Build the Docker image and run the container in detached mode. Ensure your MuJoCo license key is available in ~/.mujoco/mjkey.txt.

```bash
export MJKEY="$(cat ~/.mujoco/mjkey.txt)" \
    && docker-compose \
        -f ./docker/docker-compose.dev.cpu.yml \
        up \
        -d \
        --force-recreate
```

--------------------------------

### Sample Actions with FeedforwardGaussianPolicy

Source: https://context7.com/rail-berkeley/softlearning/llms.txt

Demonstrates sampling actions and their log probabilities from a trained policy. Use `policy.actions()` for stochastic actions during training and the `policy.evaluation_mode()` context manager for deterministic actions during evaluation. `policy.get_diagnostics()` returns summary statistics of the policy's outputs (shifts, scales, entropy, and actions).

```python
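import numpy as np

# `policy` is assumed to be an already-built FeedforwardGaussianPolicy
# with 17-dim observations and 6-dim actions.

# Sample stochastic actions during training
observations = {'observations': np.random.randn(32, 17).astype('float32')}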
actions = policy.actions(observations)          # shape: (32, 6)

# Get actions + log probabilities in a single forward pass (numerically stable)
actions, log_pis = policy.actions_and_log_probs(observations)
print(actions.shape, log_pis.shape)             # (32, 6)  (32, 1)

# Switch to deterministic evaluation mode (returns mean of distribution)
with policy.evaluation_mode():
    det_actions = policy.actions(observations)  # shape: (32, 6), deterministic

# Policy diagnostics (mean/std of shifts, scales, entropy, actions)
diag = policy.get_diagnostics(observations)
print(dict(diag))
# {'shifts-mean': ..., 'scales-mean': ..., 'entropy-mean': ..., 'actions-mean': ...}
```
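
To make the relationship between stochastic actions, log probabilities, and deterministic evaluation concrete, here is a minimal sketch of a tanh-squashed diagonal Gaussian in plain Python. It illustrates the general technique only; `sample_squashed_gaussian` and its arguments are assumptions for illustration and do not mirror the library's internals.

```python
import math
import random


def sample_squashed_gaussian(mean, log_std, rng, deterministic=False):
    """Return (action, log_prob) for a diagonal Gaussian squashed by tanh."""
    action, log_prob = [], 0.0
    for mu, ls in zip(mean, log_std):
        std = math.exp(ls)
        # Deterministic mode uses the distribution mean (zero noise)
        eps = 0.0 if deterministic else rng.gauss(0.0, 1.0)
        u = mu + std * eps           # pre-squash sample
        a = math.tanh(u)             # squashed into [-1, 1]
        # Gaussian log density of u under N(mu, std^2)
        log_prob += -0.5 * eps * eps - ls - 0.5 * math.log(2.0 * math.pi)
        # Change-of-variables correction for tanh: -log(1 - tanh(u)^2)
        log_prob -= math.log(1.0 - a * a + 1e-6)
        action.append(a)
    return action, log_prob


rng = random.Random(0)
stochastic_action, stochastic_log_pi = sample_squashed_gaussian(
    [0.0] * 6, [0.0] * 6, rng)
det_action, det_log_pi = sample_squashed_gaussian(
    [0.5] * 6, [0.0] * 6, rng, deterministic=True)
print(stochastic_action, stochastic_log_pi)
print(det_action)
```

The small constant inside the logarithm guards against `log(0)` when `tanh` saturates.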

--------------------------------

### Simulate Policy Rollouts (CLI)

Source: https://context7.com/rail-berkeley/softlearning/llms.txt

Equivalent command-line usage for simulating policy rollouts. This is useful for running simulations directly from the terminal.

```bash
python -m examples.development.simulate_policy \
    ~/ray_results/gym/HalfCheetah/v3/2024-01-01T00-00-00-my-sac-experiment/checkpoint_1000/ \
    --max-path-length 1000 \
    --num-rollouts 5 \
    --render-kwargs '{"mode": "human"}'
```
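
The `--render-kwargs` value is a JSON string, so it has to be decoded into a dictionary before use. The sketch below shows one way to parse such arguments with `argparse` and `json.loads`; the defaults and the `/tmp/checkpoint_1000/` path are placeholders for illustration, not the script's actual values.

```python
import argparse
import json

parser = argparse.ArgumentParser()
parser.add_argument('checkpoint_path', type=str)
parser.add_argument('--max-path-length', type=int, default=1000)
parser.add_argument('--num-rollouts', type=int, default=1)
# json.loads turns '{"mode": "human"}' into {'mode': 'human'}
parser.add_argument('--render-kwargs', type=json.loads, default='{}')

args = parser.parse_args([
    '/tmp/checkpoint_1000/',          # placeholder checkpoint path
    '--num-rollouts', '5',
    '--render-kwargs', '{"mode": "human"}',
])
print(args.render_kwargs['mode'], args.num_rollouts)
```

Single-quoting the JSON in the shell, as in the command above, keeps the inner double quotes intact.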

--------------------------------

### Add and Sample Data with FlexibleReplayPool

Source: https://context7.com/rail-berkeley/softlearning/llms.txt

Adds a full episode path to the replay pool and samples batches for training. Supports random batches and sequence batches with episode-boundary masking for recurrent policies. Experience can be incrementally saved and restored.

```python
import numpy as np

# `pool` is assumed to be the FlexibleReplayPool created with matching fields.
# Add a full episode path at once
path = {
    'observations':      np.random.randn(200, 17).astype('float32'),
    'actions':           np.random.randn(200, 6).astype('float32'),
    'rewards':           np.random.randn(200, 1).astype('float32'),
    'next_observations': np.random.randn(200, 17).astype('float32'),
    'terminals':         np.zeros((200, 1), dtype=bool),
}
pool.add_path(path)
print(pool.size)  # 200

# Sample a random batch for training
batch = pool.random_batch(batch_size=256)
print(batch['observations'].shape)   # (256, 17)
print(batch['rewards'].shape)        # (256, 1)

# Sample a sequence batch with episode-boundary masking (for recurrent policies)
seq_batch = pool.random_sequence_batch(batch_size=32, sequence_length=10)
print(seq_batch['mask'].shape)       # (32, 10) — False beyond episode boundary

# Save and restore experience incrementally
pool.save_latest_experience('/tmp/replay_pool_checkpoint.pkl.gz')
pool.load_experience('/tmp/replay_pool_checkpoint.pkl.gz')
```
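
The add-then-sample behaviour above can be pictured as a fixed-capacity ring buffer. The following toy sketch uses only the standard library; `RingReplayPool` is a hypothetical class written for illustration and is far simpler than `FlexibleReplayPool`.

```python
import random


class RingReplayPool:
    """Toy fixed-capacity pool; the oldest samples are overwritten first."""

    def __init__(self, max_size, field_names):
        self.max_size = max_size
        self.fields = {name: [None] * max_size for name in field_names}
        self.pointer = 0   # next write index
        self.size = 0      # number of valid samples

    def add_path(self, path):
        length = len(next(iter(path.values())))
        for i in range(length):
            for name, column in self.fields.items():
                column[self.pointer] = path[name][i]
            self.pointer = (self.pointer + 1) % self.max_size
            self.size = min(self.size + 1, self.max_size)

    def random_batch(self, batch_size, rng=random):
        # Sampling with replacement, so batch_size may exceed the pool size
        indices = [rng.randrange(self.size) for _ in range(batch_size)]
        return {name: [column[i] for i in indices]
                for name, column in self.fields.items()}


pool_sketch = RingReplayPool(max_size=5, field_names=['observations', 'rewards'])
pool_sketch.add_path({'observations': [[0.0], [1.0], [2.0]],
                      'rewards': [0, 1, 2]})
pool_sketch.add_path({'observations': [[3.0], [4.0], [5.0], [6.0]],
                      'rewards': [3, 4, 5, 6]})
batch = pool_sketch.random_batch(batch_size=4)
print(pool_sketch.size, sorted(pool_sketch.fields['rewards']))
```

After seven samples are written into a pool of capacity five, only the five most recent remain.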

--------------------------------

### Instantiate and Train Soft Q-Learning (SQL)

Source: https://context7.com/rail-berkeley/softlearning/llms.txt

This code sets up and runs the Soft Q-Learning (SQL) algorithm, an energy-based off-policy method that uses Stein variational gradient descent (SVGD). It is suitable for multimodal action distributions and requires components similar to SAC's, plus kernel-specific parameters.

```python
from softlearning.algorithms.sql import SQL
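from softlearning.misc.kernel import adaptive_isotropic_gaussian_kernel

# training_env, eval_env, policy, Qs, pool, and sampler are assumed to be
# constructed beforehand.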
sql = SQL(
    training_environment=training_env,
    evaluation_environment=eval_env,
    policy=policy,
    Qs=Qs,
    pool=pool,
    sampler=sampler,
    policy_lr=3e-4,
    Q_lr=3e-4,
    reward_scale=30,                  # task-dependent; HalfCheetah uses 30
    discount=0.99,
    tau=5e-3,
    kernel_fn=adaptive_isotropic_gaussian_kernel,
    kernel_n_particles=16,
    kernel_update_ratio=0.5,
    value_n_particles=16,
    n_epochs=3000,
    epoch_length=1000,
    batch_size=256,
)

for diagnostics in sql.train():
    if diagnostics.get('done'):
        break
    print(f"epoch={diagnostics['epoch']}  "
          f"Q_loss={diagnostics['update']['Q_loss-mean']:.4f}")

```
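
For intuition about what an adaptive Gaussian kernel computes between action particles, here is a simplified sketch using the median bandwidth heuristic common in the SVGD literature. It is an illustration under stated assumptions (plain Python lists, bandwidth `median / log(n + 1)`), not a reimplementation of `adaptive_isotropic_gaussian_kernel`, whose exact bandwidth rule may differ.

```python
import math


def median(values):
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2:
        return ordered[mid]
    return 0.5 * (ordered[mid - 1] + ordered[mid])


def gaussian_kernel_median_bandwidth(xs, ys):
    """Return (kappa, h) with kappa[i][j] = exp(-||x_i - y_j||^2 / h)."""
    dist_sq = [[sum((a - b) ** 2 for a, b in zip(x, y)) for y in ys]
               for x in xs]
    flat = [d for row in dist_sq for d in row]
    # Median heuristic; the small floor avoids division by zero
    h = max(median(flat) / math.log(len(xs) + 1), 1e-6)
    kappa = [[math.exp(-d / h) for d in row] for row in dist_sq]
    return kappa, h


particles = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
kappa, h = gaussian_kernel_median_bandwidth(particles, particles)
print(h, kappa[0])
```

Adapting the bandwidth to the particle spread keeps kernel values away from all-zero or all-one regardless of the action scale.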

--------------------------------

### Run SAC on a dm_control task locally

Source: https://context7.com/rail-berkeley/softlearning/llms.txt

Launches a local training experiment for SAC on a dm_control environment, specifying the universe, domain, task, experiment name, and checkpoint frequency.

```bash
softlearning run_example_local examples.development \
    --algorithm SAC \
    --universe dm_control \
    --domain cheetah \
    --task run \
    --exp-name dm-cheetah-run \
    --checkpoint-frequency 500
```

--------------------------------

### `get_environment` / `get_environment_from_params`

Source: https://context7.com/rail-berkeley/softlearning/llms.txt

Factory functions that instantiate a `SoftlearningEnv` from a universe/domain/task triple, routing to the appropriate adapter (Gym, DmControl, or RoboSuite).

```APIDOC
## `get_environment` / `get_environment_from_params`

### Description
Factory functions that instantiate a `SoftlearningEnv` from a universe/domain/task triple, routing to the appropriate adapter (Gym, DmControl, or RoboSuite).

### Method Signatures
```python
from softlearning.environments.utils import get_environment, get_environment_from_params

get_environment(universe: str, domain: str, task: str, environment_params: dict)
get_environment_from_params(params: dict)
```

### Parameters
- **universe** (str) - The name of the environment universe (e.g., 'gym', 'dm_control').
- **domain** (str) - The name of the environment domain (e.g., 'HalfCheetah', 'cheetah').
- **task** (str) - The name of the environment task (e.g., 'v3', 'run').
- **environment_params** (dict) - Additional parameters for the environment.
- **params** (dict) - A dictionary containing environment configuration, including 'universe', 'domain', 'task', and 'kwargs'.

### Request Examples

Method 1: direct call
```python
from softlearning.environments.utils import get_environment

env = get_environment(
    universe='gym',
    domain='HalfCheetah',
    task='v3',
    environment_params={}
)
obs = env.reset()
print(obs)  # {'observations': array([...], dtype=float32)}
```

Method 2: from params dict (used internally by ExperimentRunner)
```python
from softlearning.environments.utils import get_environment_from_params

env = get_environment_from_params({
    'universe': 'gym',
    'domain': 'Ant',
    'task': 'v3',
    'kwargs': {
        'healthy_reward': 0.0,
        'exclude_current_positions_from_observation': False,
    }
})
action = env.action_space.sample()
obs, reward, done, info = env.step(action)
print(obs.keys())   # dict_keys(['observations'])
print(reward)       # float
print(env.observation_shape, env.action_shape)
```
```

--------------------------------

### Access Docker Container

Source: https://github.com/rail-berkeley/softlearning/blob/master/README.md

Connect to the running softlearning Docker container using the 'docker exec' command.

```bash
docker exec -it softlearning bash
```

--------------------------------

### Run Experiment with Variant Spec

Source: https://context7.com/rail-berkeley/softlearning/llms.txt

This code snippet shows how to run an experiment using the `tune.run` function with a predefined variant specification. It configures the environment, policy, Q-function, and algorithm parameters.

```python
from ray import tune
from examples.development.main import ExperimentRunner

variant = {
    'environment_params': {
        'training': {'universe': 'gym', 'domain': 'Hopper', 'task': 'v3', 'kwargs': {}}
    },
    'policy_params': {
        'class_name': 'FeedforwardGaussianPolicy',
        'config': {'hidden_layer_sizes': (256, 256), 'squash': True,
                   'observation_keys': None, 'preprocessors': None},
    },
    'Q_params': {
        'class_name': 'double_feedforward_Q_function',
        'config': {'hidden_layer_sizes': (256, 256),
                   'observation_keys': None, 'preprocessors': None},
    },
    'algorithm_params': {
        'class_name': 'SAC',
        'config': {'n_epochs': 1000, 'epoch_length': 1000, 'batch_size': 256,
                   'min_pool_size': 1000, 'train_every_n_steps': 1,
                   'n_train_repeat': 1, 'eval_n_episodes': 1,
                   'policy_lr': 3e-4, 'Q_lr': 3e-4, 'alpha_lr': 3e-4,
                   'discount': 0.99, 'tau': 5e-3, 'target_entropy': 'auto'},
    },
    'replay_pool_params': {
        'class_name': 'SimpleReplayPool',
        'config': {'max_size': int(1e6)},
    },
    'sampler_params': {
        'class_name': 'SimpleSampler',
        'config': {'max_path_length': 1000},
    },
    'run_params': {
        'seed': 42, 'checkpoint_at_end': True,
        'checkpoint_frequency': 100, 'checkpoint_replay_pool': False,
    },
}

tune.run(
    ExperimentRunner,
    name='hopper-sac',
    config=variant,
    local_dir='~/ray_results',
    checkpoint_freq=100,
    checkpoint_at_end=True,
    num_samples=1,
)
```

--------------------------------

### Simulate Trained Policy

Source: https://github.com/rail-berkeley/softlearning/blob/master/README.md

This command simulates a trained agent's policy. Ensure the SAC_CHECKPOINT_DIR environment variable is set to the absolute path of the saved checkpoint. Customize simulation parameters like max path length, number of rollouts, and rendering mode.

```bash
python -m examples.development.simulate_policy \
    ${SAC_CHECKPOINT_DIR} \
    --max-path-length 1000 \
    --num-rollouts 1 \
    --render-kwargs '{"mode": "human"}'
```

--------------------------------

### Wrap OpenAI Gym Environment with GymAdapter

Source: https://context7.com/rail-berkeley/softlearning/llms.txt

Use GymAdapter to wrap standard or pixel-based OpenAI Gym environments. It normalizes action ranges, optionally rescales observations, and removes the TimeLimit wrapper by default.

```python
from softlearning.environments.adapters.gym_adapter import GymAdapter

# Standard state-based environment
env = GymAdapter(
    domain='Walker2d',
    task='v3',
    rescale_action_range=(-1.0, 1.0),
    unwrap_time_limit=True
)
obs = env.reset()           # {'observations': np.ndarray}
action = env.action_space.sample()
obs, reward, done, info = env.step(action)

# Pixel-based environment
pixel_env = GymAdapter(
    domain='HalfCheetah',
    task='v3',
    pixel_wrapper_kwargs={
        'pixels_only': True,
        'render_kwargs': {'width': 84, 'height': 84},
    }
)
obs = pixel_env.reset()     # {'pixels': np.ndarray of shape (84, 84, 3)}

# Custom Gym env with observation rescaling
rescaled_env = GymAdapter(
    domain='Pendulum',
    task='v0',
    rescale_observation_range=(-1.0, 1.0),
)

```
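
Action range normalization means the agent acts in a fixed interval such as [-1, 1] while the wrapper maps each action back to the environment's native bounds. A minimal sketch of that linear mapping follows; `to_env_range` is a hypothetical helper, not GymAdapter's implementation.

```python
def to_env_range(action, env_low, env_high, agent_low=-1.0, agent_high=1.0):
    """Linearly map each component from the agent's range to the env's range."""
    scaled = []
    for a, lo, hi in zip(action, env_low, env_high):
        fraction = (a - agent_low) / (agent_high - agent_low)  # position in [0, 1]
        scaled.append(lo + fraction * (hi - lo))
    return scaled


# Pendulum-v0 accepts torques in [-2, 2]
env_action = to_env_range([0.5], env_low=[-2.0], env_high=[2.0])
print(env_action)  # [1.0]
```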

--------------------------------

### Train Agent with SAC Algorithm

Source: https://github.com/rail-berkeley/softlearning/blob/master/README.md

Use this command to train an agent using the Soft Actor-Critic (SAC) algorithm. Specify the environment (universe, domain, task), experiment name, and checkpoint frequency for saving training progress.

```bash
softlearning run_example_local examples.development \
    --algorithm SAC \
    --universe gym \
    --domain HalfCheetah \
    --task v3 \
    --exp-name my-sac-experiment-1 \
    --checkpoint-frequency 1000  # Save the checkpoint to resume training later
```

--------------------------------

### Initialize FeedforwardGaussianPolicy

Source: https://context7.com/rail-berkeley/softlearning/llms.txt

Initializes a squashed Gaussian policy using a feedforward network. Configure hidden layer sizes, activation functions, input/output shapes, and action ranges. The `squash=True` argument applies a tanh bijector to constrain actions to [-1, 1].

```python
from softlearning.policies.gaussian_policy import FeedforwardGaussianPolicy
import numpy as np

policy = FeedforwardGaussianPolicy(
    hidden_layer_sizes=(256, 256),
    squash=True,            # applies tanh bijector to keep actions in [-1, 1]
    activation='relu',
    input_shapes=((17,),),  # HalfCheetah observation dim
    output_shape=(6,),      # HalfCheetah action dim
    action_range=(np.full(6, -1.0), np.full(6, 1.0)),
)
```
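To make the effect of `squash=True` concrete, here is a minimal standalone sketch of sampling from a tanh-squashed diagonal Gaussian and computing its log-probability with the change-of-variables correction described in the SAC papers. It does not use Softlearning or TensorFlow. The function name `sample_squashed_gaussian` and the `1e-6` numerical guard are assumptions made for illustration, not the library's implementation.

```python
import math
import random


def sample_squashed_gaussian(mean, log_std, rng):
    """Sample a tanh-squashed Gaussian action and return it with its log-probability."""
    actions = []
    log_prob = 0.0
    for mu, log_sigma in zip(mean, log_std):
        sigma = math.exp(log_sigma)
        u = rng.gauss(mu, sigma)   # unbounded pre-squash sample
        a = math.tanh(u)           # squashed into [-1, 1]
        # Log-density of u under the diagonal Gaussian.
        log_prob += (-0.5 * ((u - mu) / sigma) ** 2
                     - log_sigma
                     - 0.5 * math.log(2.0 * math.pi))
        # Change-of-variables correction for tanh: subtract log(1 - tanh(u)^2).
        log_prob -= math.log(1.0 - a * a + 1e-6)
        actions.append(a)
    return actions, log_prob


rng = random.Random(0)
action, log_prob = sample_squashed_gaussian([0.0] * 6, [0.0] * 6, rng)
print(len(action), all(-1.0 <= a <= 1.0 for a in action))  # 6 True
```

The correction term matters because SAC's entropy objective uses the log-probability of the squashed action, not of the underlying Gaussian sample.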

--------------------------------

### BibTeX Citation for Soft Actor-Critic Algorithms and Applications

Source: https://github.com/rail-berkeley/softlearning/blob/master/README.md

BibTeX entry for the 'Soft Actor-Critic Algorithms and Applications' paper. Cite it in academic work that uses Softlearning.

```bibtex
@techreport{haarnoja2018sacapps,
  title={Soft Actor-Critic Algorithms and Applications},
  author={Tuomas Haarnoja and Aurick Zhou and Kristian Hartikainen and George Tucker and Sehoon Ha and Jie Tan and Vikash Kumar and Henry Zhu and Abhishek Gupta and Pieter Abbeel and Sergey Levine},
  journal={arXiv preprint arXiv:1812.05905},
  year={2018}
}
```

--------------------------------

### Build Feedforward Neural Network with feedforward_model

Source: https://context7.com/rail-berkeley/softlearning/llms.txt

Constructs a `tf.keras.Sequential` feedforward network. Configure hidden layer sizes, activation functions, and output shape/activation. This is used for building Q-functions and policy networks.

```python
from softlearning.models.feedforward import feedforward_model
import tensorflow as tf
import numpy as np

# Build a Q-network: two 256-unit hidden layers and a single scalar output
model = feedforward_model(
    hidden_layer_sizes=(256, 256),
    output_shape=(1,),            # single Q-value output
    activation='relu',
    output_activation='linear',
)

# Forward pass: accepts concatenated observations+actions
obs_action = np.random.randn(32, 23).astype('float32')  # 17-dim obs + 6-dim action
q_values = model(obs_action)
print(q_values.shape)  # (32, 1)

# Build a policy network head (outputs mean + log_std concatenated)
policy_net = feedforward_model(
    hidden_layer_sizes=(256, 256),
    output_shape=(12,),           # 6-dim mean + 6-dim log_std
    activation='relu',
    output_activation='linear',
    name='policy_network',
)
```
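The policy head above emits a single 12-dimensional vector that packs the mean and the log standard deviation together. The sketch below shows one way such an output could be split back into its two halves, assuming the first half is the mean and the second half is the log standard deviation. The `split_policy_head` helper is hypothetical; Softlearning's own policy classes handle this step internally and may differ in detail.

```python
import math


def split_policy_head(head_output):
    """Split a concatenated [mean, log_std] vector into mean and std lists."""
    if len(head_output) % 2 != 0:
        raise ValueError("expected an even-length output: mean followed by log_std")
    half = len(head_output) // 2
    mean = list(head_output[:half])
    std = [math.exp(value) for value in head_output[half:]]  # log_std -> std
    return mean, std


# Illustrative 12-dim output: 6 means followed by 6 log-stds of 0.0 (std = 1.0).
head_output = [0.1 * i for i in range(6)] + [0.0] * 6
mean, std = split_policy_head(head_output)
print(len(mean), std)  # 6 [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
```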

--------------------------------

### get_variant_spec

Source: https://context7.com/rail-berkeley/softlearning/llms.txt

Builds a complete, hierarchically merged experiment specification dict from universe/domain/task arguments, resolving environment-specific parameters, epoch lengths, and total timestep budgets automatically.

```APIDOC
## get_variant_spec — Experiment Configuration

Builds a complete, hierarchically merged experiment specification dict from universe/domain/task arguments, resolving environment-specific parameters, epoch lengths, and total timestep budgets automatically.

```python
from examples.development.variants import (
    get_variant_spec,
    get_variant_spec_base,
    get_total_timesteps,
    get_epoch_length,
    get_max_path_length,
)
import argparse

# Look up built-in defaults
print(get_total_timesteps('gym', 'HalfCheetah', 'v3'))  # 3000000
print(get_epoch_length('gym', 'HalfCheetah', 'v3'))     # 25000
print(get_max_path_length('gym', 'HalfCheetah', 'v3'))  # 1000

# Build full variant spec from parsed args
args = argparse.Namespace(
    universe='gym', domain='Ant', task='v3',
    policy='gaussian', algorithm='SAC',
    checkpoint_replay_pool=None,
)
variant = get_variant_spec(args)
print(variant['algorithm_params']['config']['n_epochs'])   # 120 (3e6 / 25000)
print(variant['policy_params']['class_name'])              # 'FeedforwardGaussianPolicy'
print(variant['Q_params']['class_name'])                   # 'double_feedforward_Q_function'

# Variant spec for image-based environment (auto-adds ConvNet preprocessors)
args_img = argparse.Namespace(
    universe='dm_control', domain='cheetah', task='run',
    policy='gaussian', algorithm='SAC', checkpoint_replay_pool=None,
)
variant_img = get_variant_spec(args_img)
# variant_img['policy_params']['config']['preprocessors'] will contain
# a convnet_preprocessor config when pixel observations are detected
```
```
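The two ideas in the description, hierarchical merging and deriving `n_epochs` from the timestep budget, can be sketched without Softlearning installed. The `deep_update` and `compute_n_epochs` helpers and the base values below are hypothetical and only illustrate the pattern. The figures 3,000,000 and 25,000 are the ones quoted in the comments above.

```python
def deep_update(base, overrides):
    """Recursively merge `overrides` into a copy of `base`, leaving `base` untouched."""
    merged = dict(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_update(merged[key], value)
        else:
            merged[key] = value
    return merged


def compute_n_epochs(total_timesteps, epoch_length):
    """Number of whole epochs that fit in the timestep budget."""
    return total_timesteps // epoch_length


# Hypothetical base spec and environment-specific overrides.
base = {'algorithm_params': {'config': {'discount': 0.99, 'epoch_length': 1000}}}
overrides = {'algorithm_params': {'config': {
    'epoch_length': 25000,
    'n_epochs': compute_n_epochs(3_000_000, 25_000),
}}}

variant = deep_update(base, overrides)
print(variant['algorithm_params']['config'])
# {'discount': 0.99, 'epoch_length': 25000, 'n_epochs': 120}
```

Merging recursively keeps unrelated defaults such as `discount` intact while the environment-specific values replace only the keys they name.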

--------------------------------

### Deactivate and Remove Conda Environment

Source: https://github.com/rail-berkeley/softlearning/blob/master/README.md

Deactivate the current Conda environment, then remove the `softlearning` environment entirely.

```bash
conda deactivate
conda remove --name softlearning --all
```
