### Get Current Date and Time

Source: https://github.com/verily-src/workbench-examples/blob/main/cloud_env_setup.ipynb

This shell command prints the current date and time. It is used for logging and provenance purposes, indicating when specific setup steps or the notebook execution occurred.

```shell
date
```

--------------------------------

### Setup Libraries and Utilities

Source: https://github.com/verily-src/workbench-examples/blob/main/single_cell/getting-started-with-hca.ipynb

Imports essential Python libraries for making HTTP requests, manipulating file paths, and displaying progress bars during downloads. These are foundational for interacting with web APIs and managing data.

```python
# Import the third-party "requests" library for programmatic access to HTTP URLs
import requests

# Import the standard "os" module for URL path manipulation
import os

# Import "tqdm" to display a progress bar during downloads
from tqdm import tqdm
```

--------------------------------

### List JupyterLab Extensions

Source: https://github.com/verily-src/workbench-examples/blob/main/cloud_env_setup.ipynb

This shell command lists all installed JupyterLab extensions. This information is valuable for debugging and understanding the available functionalities within the JupyterLab environment, aiding in environment setup and troubleshooting.

```shell
jupyter labextension list
```

--------------------------------

### Get CPU Core Count

Source: https://github.com/verily-src/workbench-examples/blob/main/cloud_env_setup.ipynb

This shell command counts the number of logical CPU cores available on the system by parsing the `/proc/cpuinfo` file. It provides a quick way to assess the processing power of the cloud environment.

```shell
grep processor /proc/cpuinfo | wc -l
```

--------------------------------

### Get Total System Memory

Source: https://github.com/verily-src/workbench-examples/blob/main/cloud_env_setup.ipynb

This shell command retrieves the total system memory in kilobytes by reading the `MemTotal` line from the `/proc/meminfo` file. This is useful for understanding the memory resources available in the cloud environment.

```shell
grep "^MemTotal:" /proc/meminfo
```
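
Inside a notebook, the same value can be read in Python. The following is a minimal sketch rather than code from the source notebook: the helper name `parse_mem_total_kb` is hypothetical, the sample values are made up, and it assumes the usual `MemTotal: <number> kB` layout of `/proc/meminfo` on Linux.

```python
import re
from typing import Optional


def parse_mem_total_kb(meminfo_text: str) -> Optional[int]:
    """Return the MemTotal value in kB from /proc/meminfo-style text, or None."""
    match = re.search(r"^MemTotal:\s+(\d+)\s+kB", meminfo_text, flags=re.MULTILINE)
    return int(match.group(1)) if match else None


# Sample text in the /proc/meminfo layout (values are made up for illustration).
sample = "MemTotal:       16384256 kB\nMemFree:         1234567 kB\n"
total_kb = parse_mem_total_kb(sample)
print(f"{total_kb / 1024 / 1024:.1f} GiB")
```

On a real machine, pass the contents of `/proc/meminfo` (for example `open('/proc/meminfo').read()`) instead of the sample string.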

--------------------------------

### Create Cromwell Examples Directory

Source: https://github.com/verily-src/workbench-examples/blob/main/cromwell_setup/cromwell_gvs_stats.ipynb

Ensures the existence of a '~/wb-tutorials/cromwell' directory, which is used to store Cromwell-related files, such as server logs. The `!mkdir -p` command creates the directory if it does not already exist, preventing potential errors in subsequent operations.

```python
import os

CROMWELL_EXAMPLES_DIR=os.path.expanduser('~/wb-tutorials/cromwell')
CROMWELL_SERVER_LOG=f'{CROMWELL_EXAMPLES_DIR}/cromwell.server.log'

!mkdir -p {CROMWELL_EXAMPLES_DIR}
```

--------------------------------

### Clone workbench-examples Repository using Git

Source: https://github.com/verily-src/workbench-examples/blob/main/nextflow/workspace_description.md

Clones the 'workbench-examples' Git repository. This command is used if the repository is not automatically cloned into the Verily Workbench cloud environment. It requires Git to be installed in the environment.

```sh
git clone https://github.com/verily-src/workbench-examples.git
```

--------------------------------

### Manage Dataproc Autoscaling Policies with gcloud CLI

Source: https://context7.com/verily-src/workbench-examples/llms.txt

Demonstrates how to manage Dataproc autoscaling policies. It includes `gcloud` commands for importing a policy from a YAML file and describing an existing policy, plus a `wb` command that applies the policy when creating a Dataproc cluster. Requires the Google Cloud SDK and the Workbench CLI to be installed and configured.

```bash
# Import autoscaling policy
gcloud dataproc autoscaling-policies import two_worker_autoscaling_policy \
    --source=two_worker_autoscaling_policy.yaml \
    --region=us-central1

# Describe policy
gcloud dataproc autoscaling-policies describe two_worker_autoscaling_policy \
    --region=us-central1

# Use policy when creating cluster
wb resource create dataproc-cluster \
    --name=my-cluster \
    --autoscaling-policy=two_worker_autoscaling_policy \
    --num-workers=2
```

--------------------------------

### Setup RNASeq Test Datasets - Python

Source: https://github.com/verily-src/workbench-examples/blob/main/nextflow/nextflow_examples.ipynb

Sets up the test datasets for the RNASeq pipeline. The first cell applies when the workspace is a clone of the 'Getting Started with Nextflow' workspace: it checks out the `rnaseq` branch of the existing `test-datasets` clone and prints 'samplesheet_minimal.csv' to confirm the file is accessible. The second cell applies to a new or existing personal workspace: if the samplesheet is not already present, it adds the repository as a referenced resource, clones it, and checks out the branch.

```python
!cd /home/jupyter/test-datasets && git checkout rnaseq
!cd /home/jupyter/test-datasets && cat samplesheet/samplesheet_minimal.csv || echo "Something's not quite right. Please ensure you've added the Git repo as referenced resource and checked out the RNASeq branch."
```

```python
![[ -f test-datasets/samplesheet/samplesheet_minimal.csv ]] && echo "Resource already exists" || (wb resource add-ref git-repo --name=test-datasets --repo-url=git@github.com:nf-core/test-datasets.git && cd /home/jupyter && wb git clone --resource=nf-core-sample-data-repo &&  git checkout rnaseq)
```

--------------------------------

### Start MySQL Database for Cromwell

Source: https://github.com/verily-src/workbench-examples/blob/main/cromwell_setup/cromwell_server_management.ipynb

Starts a MySQL database instance using Docker to store Cromwell job states. It configures the database with necessary credentials and parameters. Dependencies: Docker. Inputs: None. Outputs: Runs a MySQL Docker container.

```bash
!docker run -p 3306:3306 \
    --name MySQLContainer \
    -e MYSQL_ROOT_PASSWORD=cromwell \
    -e MYSQL_DATABASE=cromwell_db \
    -e MYSQL_USER=cromwell \
    -e MYSQL_PASSWORD=cromwell \
    -d mysql/mysql-server:5.5 \
    --max-allowed-packet=16M
```

--------------------------------

### Create JupyterLab Notebook Instances with Workbench CLI

Source: https://context7.com/verily-src/workbench-examples/llms.txt

This section shows how to create JupyterLab notebook instances using the Workbench CLI. Examples include creating an instance with a custom post-startup script and accessing it privately, as well as creating an instance with default settings.

```bash
# Create cloud environment (notebook instance)
wb resource create gcp-notebook \
    --access=PRIVATE_ACCESS \
    --cloning=COPY_NOTHING \
    --name=analysis-environment \
    --description="Personal analysis environment" \
    --instance-id=my-notebook-20231208 \
    --post-startup-script=gs://my-bucket/startup.sh

# Create with default settings
wb resource create gcp-notebook \
    --name=default-notebook \
    --description="Default notebook instance"
```

--------------------------------

### Display HelloWorld WDL Workflow Content (Shell)

Source: https://github.com/verily-src/workbench-examples/blob/main/cromwell_setup/cromwell_examples.ipynb

Displays the content of the 'helloWorld.wdl' file using the 'cat' shell command. This workflow is a simple example that takes a string input parameter named 'name' and has no file inputs.

```shell
!cat workflows/wdl/helloWorld.wdl
```

--------------------------------

### Download Files for First Project

Source: https://github.com/verily-src/workbench-examples/blob/main/single_cell/getting-started-with-hca.ipynb

Downloads all files for the first project in the previously fetched project list. It uses the `download_project_files` function and specifies an output directory `OUTPUT_DIR`. Prints status messages before and after download.

```python
TARGET_PROJECT = PROJECT_LIST[0]

print(f"Downloading files for project '{TARGET_PROJECT['title']}'")
download_project_files(CATALOG, TARGET_PROJECT['uuid'], OUTPUT_DIR)
print("Downloads Complete.")
```

--------------------------------

### Move Resource to Version Folder using Workbench CLI

Source: https://github.com/verily-src/workbench-examples/blob/main/first_hour_on_vwb/creating_a_data_collection.ipynb

This example demonstrates how to move a resource to a specific version folder within a workspace using the Workbench CLI. It requires the target folder's ID, the resource's name, and the workspace ID.

```bash
# Move desired resource to version folder
wb resource move --folder-id=<FOLDER_ID> --name=<RESOURCE_NAME> --workspace=<WORKSPACE_ID>
```

--------------------------------

### Create Cromwell Options JSON (Python)

Source: https://github.com/verily-src/workbench-examples/blob/main/cromwell_setup/cromwell_gvs_setup_inputs.ipynb

Generates a JSON file named `gvs_options.json` that configures Cromwell execution options. This example enables both reading from and writing to the Cromwell cache, which can significantly speed up subsequent runs by reusing previous results.

```python
import json

with open('gvs_options.json', 'w') as outfile:
    json.dump({
        'read_from_cache': True,
        'write_to_cache': True
    }, outfile, indent=4)
```
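
A quick round-trip check can confirm that the options file parses as JSON before it is handed to Cromwell. This is a minimal sketch: the helper `write_cromwell_options` is hypothetical, and it writes to a temporary directory instead of the notebook's working directory.

```python
import json
import os
import tempfile


def write_cromwell_options(path: str, use_cache: bool = True) -> dict:
    """Write call-caching options to `path` and return the parsed file contents."""
    options = {"read_from_cache": use_cache, "write_to_cache": use_cache}
    with open(path, "w") as outfile:
        json.dump(options, outfile, indent=4)
    # Read the file back to confirm it parses as JSON.
    with open(path) as infile:
        return json.load(infile)


with tempfile.TemporaryDirectory() as tmp_dir:
    options_path = os.path.join(tmp_dir, "gvs_options.json")
    loaded = write_cromwell_options(options_path)
    print(loaded)
```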

--------------------------------

### Configure Git User Name and Email

Source: https://github.com/verily-src/workbench-examples/blob/main/cloud_env_setup.ipynb

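This Python snippet configures the global Git user name and email address. It first verifies that the `GOOGLE_CLOUD_PROJECT` environment variable is set, then applies `GIT_NAME` and `GIT_EMAIL` only if you edit the cell to give them values; by default both are `None` and the Git configuration is left unchanged. Setting them ensures that Git commits are properly attributed. Dependencies include the `os` module.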

```python
import os

if not os.getenv('GOOGLE_CLOUD_PROJECT'):
    raise Exception('Expected environment variables are not available. Please let workbench-support@verily.com know.')

# [Optional] EDIT THIS CELL If you wish to set your name and email address for all git repositories, change these
# values to be correct for you. All other cells in this notebook work fine unchanged.

# Uncomment the following line if you want to use your Workbench email address as your Git email address.
#GIT_EMAIL = os.environ['WORKBENCH_USER_EMAIL']
GIT_EMAIL = None

GIT_NAME = None

!git config --global --list

if GIT_NAME is not None:
    !git config --global user.name "{GIT_NAME}"

if GIT_EMAIL is not None:
    !git config --global user.email "{GIT_EMAIL}"

!git config --global --list | grep user

# [Optional] EDIT THIS CELL If you wish to set the text editor when using git
# in the terminal instead of via the JupyterLab git extension.

# !git config --global core.editor emacs
```

--------------------------------

### Install and Import nf-core Tool

Source: https://github.com/verily-src/workbench-examples/blob/main/nextflow/nextflow_examples.ipynb

Installs the 'nf-core' companion tool using pip and imports it. This tool is necessary for interacting with 'nf-core' pipelines. After installation, the kernel must be restarted before the tool can be successfully imported.

```python
try:
    import nf_core
    print("nf-core is already installed")
except ImportError:
    print("Installing nf-core...")
    !pip install nf-core
```

```python
try:
    import nf_core
    print("nf-core is already installed")
except ImportError:
    print("Please restart the kernel before importing...")
```
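
An alternative to the `try`/`except` probe is to ask the import system whether a module can be found without actually importing it. The sketch below uses the standard-library `importlib.util.find_spec`; the helper name `is_installed` is hypothetical and it is meant for top-level module names only.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None


print(is_installed("json"))     # Standard-library module
print(is_installed("nf_core"))  # Result depends on the current environment
```

Note that the package is installed as `nf-core` but imported as `nf_core`, so the check uses the import name.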

--------------------------------

### Start Cromwell Server in Background

Source: https://github.com/verily-src/workbench-examples/blob/main/cromwell_setup/cromwell_server_management.ipynb

Launches the Cromwell server in server mode as a background task. All server messages are redirected to a log file. The server takes a few seconds to initialize and become ready for requests. Dependencies: Bash shell. Inputs: CROMWELL_CONF, CROMWELL_SERVER_LOG. Outputs: Starts Cromwell server process.

```bash
%%bash -s {CROMWELL_CONF} {CROMWELL_SERVER_LOG}

# Start Cromwell in server mode
cromwell --config "$1" --logdir "$(dirname "$2")" server > "$2" 2>&1 &
```

--------------------------------

### Start Cromwell Server in Background (Bash)

Source: https://github.com/verily-src/workbench-examples/blob/main/cromwell_setup/cromwell_server_management.ipynb

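Starts the Cromwell server in the background from a `%%bash` cell. It sets the JVM heap to 10 GB (`-Xms10g -Xmx10g`), points to the Cromwell configuration file, and redirects output to the specified log file. The `%%bash` header that supplies the two positional arguments is not shown here, and `CROMWELL_JAR` must already be set in the environment. This method is preferred over '!' for background processes in IPython.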

```bash
CROMWELL_CONF="$1"
CROMWELL_SERVER_LOG="$2"

java -Xms10g -Xmx10g -Dconfig.file="${CROMWELL_CONF}" -jar "${CROMWELL_JAR}" server &> "${CROMWELL_SERVER_LOG}" &
```

--------------------------------

### Create Python Virtual Environment Directory

Source: https://github.com/verily-src/workbench-examples/blob/main/cloud_env_setup.ipynb

This shell command creates a directory named `venvs` in the user's home directory. This directory is intended to store Python virtual environments used by Verily Workbench tutorials, promoting organized dependency management.

```shell
mkdir -p ~/venvs
```

--------------------------------

### List Installed Python Packages with Bash

Source: https://github.com/verily-src/workbench-examples/blob/main/ml_examples/ml4h/ML4H_ML_intro.ipynb

This Bash command uses `pip3 freeze` to list all installed Python packages and their versions in the current environment. This is useful for environment management and reproducibility.

```bash
pip3 freeze
```

--------------------------------

### List Installed Python Packages (Shell)

Source: https://github.com/verily-src/workbench-examples/blob/main/ml_examples/llama31/vwb_8b_v100_llama31_hf.ipynb

Outputs a list of all installed Python packages and their versions. This is crucial for understanding the environment's configuration and ensuring reproducibility.

```bash
!pip freeze
```

--------------------------------

### Configure Workspace and Cloud Environment Directories

Source: https://github.com/verily-src/workbench-examples/blob/main/cromwell_setup/cromwell_server_management.ipynb

Sets up local directories for tutorial files and Cromwell configuration. It defines paths for tutorial files, Cromwell configuration, and server logs, creating necessary directories. Dependencies: os module. Inputs: None. Outputs: Prints configured paths.

```python
import os

CROMWELL_EXAMPLES_DIR=os.path.expanduser('~/wb-tutorials/cromwell')
CROMWELL_CONF=f'{CROMWELL_EXAMPLES_DIR}/cromwell.conf'
CROMWELL_SERVER_LOG=f'{CROMWELL_EXAMPLES_DIR}/cromwell.server.log'

!mkdir -p {CROMWELL_EXAMPLES_DIR}

print(f'Tutorial files will be written locally to {CROMWELL_EXAMPLES_DIR}')
print()
print(f'Cromwell configuration file will be written to {CROMWELL_CONF}')
print(f'Cromwell server log file will be written to {CROMWELL_SERVER_LOG}')
```
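
The `!mkdir -p` line relies on IPython shell syntax. Where plain Python is preferred, `pathlib` offers the same create-if-missing behavior. This is a minimal sketch, not the notebook's code: the helper `ensure_dir` is hypothetical, and the demonstration runs under a temporary directory so nothing is written to the home directory.

```python
import tempfile
from pathlib import Path


def ensure_dir(path: str) -> Path:
    """Create `path` (and any missing parents) if needed and return it as a Path."""
    directory = Path(path).expanduser()
    directory.mkdir(parents=True, exist_ok=True)
    return directory


with tempfile.TemporaryDirectory() as tmp_dir:
    examples_dir = ensure_dir(f"{tmp_dir}/wb-tutorials/cromwell")
    created = examples_dir.is_dir()
    ensure_dir(str(examples_dir))  # Calling again is a no-op, like `mkdir -p`
    print(created)
```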

--------------------------------

### Configure Cromwell Tutorial File Paths (Python)

Source: https://github.com/verily-src/workbench-examples/blob/main/cromwell_setup/cromwell_examples.ipynb

Configures and prints the local file paths for Cromwell-related tutorial files within the VWB environment. This includes directories for examples, configuration, input JSON files, and log files. It also creates the necessary directories using '!mkdir -p'.

```python
import os

CROMWELL_EXAMPLES_DIR=os.path.expanduser('~/wb-tutorials/cromwell')
CROMWELL_CONF=f'{CROMWELL_EXAMPLES_DIR}/cromwell.runmode.conf'

HELLO_WORLD_INPUTS_JSON=f'{CROMWELL_EXAMPLES_DIR}/hello_world.inputs.json'
SAMPLE_INPUTS_JSON=f'{CROMWELL_EXAMPLES_DIR}/sample.inputs.json'

RUNMODE_LOG=f'{CROMWELL_EXAMPLES_DIR}/cromwell.run.log'

!mkdir -p {CROMWELL_EXAMPLES_DIR}

print(f'Tutorial files will be written locally to {CROMWELL_EXAMPLES_DIR}')
print()
print(f'Cromwell configuration file will be written to {CROMWELL_CONF}')
print(f'Cromwell hello-world input JSON file will be written to {HELLO_WORLD_INPUTS_JSON}')
print(f'Cromwell runmode log file will be written to {RUNMODE_LOG}')
print(f'Cromwell samples input JSON file will be written to {SAMPLE_INPUTS_JSON}')
```

--------------------------------

### Set Notebook Globals

Source: https://github.com/verily-src/workbench-examples/blob/main/single_cell/getting-started-with-hca.ipynb

Defines global variables for notebook operations, including the HCA catalog endpoint URL, directory for saved files, and an example project UUID. It also creates the necessary output directory if it doesn't exist.

```python
CATALOG_PREFIX = 'dcp'
ENDPOINT_URL = 'https://service.azul.data.humancellatlas.org/index'
CATALOGS_URL = f'{ENDPOINT_URL}/catalogs'
PROJECTS_URL = f'{ENDPOINT_URL}/projects'

HCA_EXAMPLES_DIR = os.path.expanduser('~/wb-tutorials/hca')
OUTPUT_DIR = os.path.join(HCA_EXAMPLES_DIR, 'data')

!mkdir -p "{OUTPUT_DIR}"
```

--------------------------------

### Generate Environment Provenance: JupyterLab Extensions

Source: https://github.com/verily-src/workbench-examples/blob/main/dataproc/batch_job_submit.ipynb

Lists installed JupyterLab extensions using `jupyter labextension list`. This helps in documenting the Jupyter environment setup, including any custom extensions that might affect notebook execution.

```bash
!jupyter labextension list
```

--------------------------------

### Setup: Get Current Workspace ID in Python

Source: https://github.com/verily-src/workbench-examples/blob/main/first_hour_on_vwb/creating_a_data_collection.ipynb

This code snippet captures the ID of the current workspace by running 'wb workspace describe' and extracting the `id` field from its JSON output with `jq`. The cell also imports 'subprocess', 'json', 'ipywidgets', 'widget_utils', 'vwb_folder_utils', and 'datetime', although the function shown relies only on the IPython `!` shell syntax and `jq`.

```python
import json
import ipywidgets as widgets
import subprocess
import widget_utils as wu
import vwb_folder_utils as vfu
from datetime import date

def get_current_workspace_id():
    '''Resolves ID of current workspace.'''
    CURRENT_WORKSPACE_ID_CMD_OUTPUT = !wb workspace describe --format=json | jq --raw-output ".id"
    CURRENT_WORKSPACE_ID = CURRENT_WORKSPACE_ID_CMD_OUTPUT[0]
    return CURRENT_WORKSPACE_ID

CURRENT_WORKSPACE_ID = get_current_workspace_id()
print(f"Current workspace ID is {CURRENT_WORKSPACE_ID}")
```

--------------------------------

### Manage BigQuery Datasets with Workbench CLI

Source: https://context7.com/verily-src/workbench-examples/llms.txt

Commands for creating and managing BigQuery datasets in Verily Workbench. Includes options for table lifecycle policies and referencing datasets/tables from other projects.

```bash
# Create BigQuery dataset with auto-delete for tables (14 days)
wb resource create bq-dataset \
    --name=tabular_data_autodelete_after_two_weeks \
    --dataset-id=tabular_data_autodelete_after_two_weeks \
    --cloning=COPY_NOTHING \
    --default-table-lifetime=1209600 \
    --description="BigQuery dataset for temporary tabular data."

# Add referenced BigQuery dataset from another project
wb resource add-ref bq-dataset \
    --cloning=COPY_REFERENCE \
    --description="Public genomics dataset" \
    --name=public-genomes \
    --path=bigquery-public-data.human_genome_variants

# Add referenced BigQuery table
wb resource add-ref bq-table \
    --cloning=COPY_REFERENCE \
    --description="1000 Genomes pedigree data" \
    --name=genomes-pedigree \
    --path=bigquery-public-data.human_genome_variants.1000_genomes_pedigree

# List all workspace resources
wb resource list
```
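
The `--default-table-lifetime` value above is given in seconds. A small sketch of the arithmetic, using a hypothetical helper `lifetime_seconds`, shows where 1209600 comes from:

```python
from datetime import timedelta


def lifetime_seconds(days: int) -> int:
    """Convert a table lifetime in days to a whole number of seconds."""
    return int(timedelta(days=days).total_seconds())


two_weeks = lifetime_seconds(14)
print(two_weeks)  # 14 * 24 * 60 * 60 = 1209600
```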

--------------------------------

### Install Specific 'igraph' Version (R)

Source: https://github.com/verily-src/workbench-examples/blob/main/1kgenomes_examples/R_1k_genomes.ipynb

This R command installs a specific version ('1.6.0') of the 'igraph' package from a specified CRAN repository. This is done to 'pin' the version and avoid potential compatibility issues with the latest version, as mentioned in the context of the example. `install_version` comes from the 'remotes' package, which must be installed; the call is namespace-qualified here so that it works without loading the package first.

```r
remotes::install_version("igraph", version = "1.6.0", repos = "http://cran.us.r-project.org")
```

--------------------------------

### Workbench CLI - BigQuery Dataset Management

Source: https://context7.com/verily-src/workbench-examples/llms.txt

Create and manage BigQuery datasets, including setting table lifecycle policies and adding references to existing datasets or tables.

```APIDOC
## Workbench CLI - BigQuery Dataset Management

### Description
Create and manage BigQuery datasets, including setting table lifecycle policies and adding references to existing datasets or tables.

### Commands

- **`wb resource create bq-dataset --name=<dataset-name> --dataset-id=<bq-dataset-id> --cloning=<cloning-policy> --default-table-lifetime=<seconds> --description=<description>`**: Create a new BigQuery dataset.
- **`wb resource add-ref bq-dataset --cloning=<cloning-policy> --description=<description> --name=<resource-name> --path=<dataset-path>`**: Add a referenced BigQuery dataset from another project or location.
- **`wb resource add-ref bq-table --cloning=<cloning-policy> --description=<description> --name=<resource-name> --path=<table-path>`**: Add a referenced BigQuery table.
- **`wb resource list`**: List all workspace resources.
```

--------------------------------

### Get Latest DCP Catalog

Source: https://github.com/verily-src/workbench-examples/blob/main/single_cell/getting-started-with-hca.ipynb

Retrieves the latest Data Coordination Platform (DCP) catalog identifier. This is a simple utility function to get the catalog name, which is then used in other operations.

```python
CATALOG = get_dcp_catalog()
print(f"The DCP catalog is: {CATALOG}")
```

--------------------------------

### Display Final Workbench Resource List

Source: https://github.com/verily-src/workbench-examples/blob/main/workspace_setup.ipynb

Lists the Workbench resources after their creation or resolution. This command allows users to confirm that the Cloud Storage buckets and BigQuery dataset have been successfully set up in their workspace.

```bash
!wb resource list
```

--------------------------------

### Install 'irlba' R Package

Source: https://github.com/verily-src/workbench-examples/blob/main/1kgenomes_examples/R_1k_genomes.ipynb

This R command installs the 'irlba' package, which provides efficient methods for truncated singular value decomposition and principal components analysis (PCA) on large sparse and dense matrices. This package is a dependency for the PCA computation in this example. It requires an internet connection to download from CRAN.

```r
# Fast and memory efficient methods for truncated singular value decomposition and
# principal components analysis of large sparse and dense matrices.
install.packages("irlba")
```

--------------------------------

### Get Project Request Parameters

Source: https://github.com/verily-src/workbench-examples/blob/main/single_cell/getting-started-with-hca.ipynb

Constructs the necessary parameters for requesting a list of projects from the HCA catalog. This includes specifying the catalog, the maximum number of projects to retrieve, and sorting options.

```python
def get_project_request_params(catalog: str, max_projects: int) -> dict:

    # Set up request parameters
    return {
      'catalog': catalog,
      'size': max_projects,
      'sort': 'projectTitle',
      'order': 'asc'
    }
```
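
To see what these parameters look like on the wire, the sketch below encodes them as a URL query string with the standard-library `urllib.parse.urlencode`. The function body is repeated so the example is self-contained, and the catalog name `dcp32` is only an illustrative value. Passing the same dictionary as `params` to `requests.get` produces an equivalent query string.

```python
from urllib.parse import urlencode


def get_project_request_params(catalog: str, max_projects: int) -> dict:
    # Same parameters as in the snippet above
    return {
        'catalog': catalog,
        'size': max_projects,
        'sort': 'projectTitle',
        'order': 'asc',
    }


params = get_project_request_params('dcp32', 10)
query = urlencode(params)
print(query)  # catalog=dcp32&size=10&sort=projectTitle&order=asc
```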

--------------------------------

### Get Latest DCP Catalog

Source: https://github.com/verily-src/workbench-examples/blob/main/single_cell/getting-started-with-hca.ipynb

Identifies and returns the latest Data Coordination Platform (DCP) catalog from the list of available catalogs. It filters for DCP-prefixed catalogs and selects the one with the highest numerical suffix.

```python
def get_dcp_catalog() -> str:
    # We want the latest dcp catalog.
    catalogs = list_catalogs()
    
    # Extract the 'dcp' catalogs
    dcp_catalogs = [c for c in catalogs if c.startswith(CATALOG_PREFIX)]
    
    # Get the largest numerically
    max_value = 0
    max_catalog = None
    for c in dcp_catalogs:
        if int(c[len(CATALOG_PREFIX):]) > max_value:
            max_value = int(c[len(CATALOG_PREFIX):])
            max_catalog = c
    
    return max_catalog
```
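
The selection logic can be tried without a network call. The following is a self-contained sketch, not the notebook's implementation: `latest_catalog` is a hypothetical variant that takes the catalog list as an argument and, unlike the function above, skips names whose suffix is not numeric instead of raising `ValueError`. The catalog names are sample values.

```python
from typing import List, Optional


def latest_catalog(catalogs: List[str], prefix: str = "dcp") -> Optional[str]:
    """Return the catalog with the largest numeric suffix after `prefix`, or None."""
    numbered = [
        (int(name[len(prefix):]), name)
        for name in catalogs
        if name.startswith(prefix) and name[len(prefix):].isdecimal()
    ]
    if not numbered:
        return None
    # Tuples compare by their first element, so max() picks the largest suffix.
    return max(numbered)[1]


print(latest_catalog(["dcp31", "dcp32", "dcp1", "dcpx"]))
```

Comparing the suffixes as integers matters: a plain string comparison would rank `dcp9` above `dcp32`.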

--------------------------------

### Manage Cloud Storage Buckets with Workbench CLI

Source: https://context7.com/verily-src/workbench-examples/llms.txt

Commands for creating and managing Google Cloud Storage buckets within Verily Workbench. Supports features like automatic deletion policies and referencing existing buckets.

```bash
# Create a durable storage bucket
wb resource create gcs-bucket \
    --name=ws_files \
    --bucket-name=${GOOGLE_CLOUD_PROJECT}-ws-files \
    --cloning=COPY_NOTHING \
    --description="Bucket for reports and provenance records."

# Create auto-deleting bucket (files deleted after 14 days)
wb resource create gcs-bucket \
    --name=ws_files_autodelete_after_two_weeks \
    --bucket-name=${GOOGLE_CLOUD_PROJECT}-autodelete-after-two-weeks \
    --cloning=COPY_NOTHING \
    --auto-delete=14 \
    --description="Bucket for temporary storage with automatic cleanup."

# Resolve bucket URL from workspace reference
wb resolve --name=ws_files

# Add existing bucket as referenced resource
wb resource add-ref gcs-bucket \
    --bucket-name=my-existing-bucket \
    --cloning=COPY_REFERENCE \
    --description="External bucket reference" \
    --name=external-bucket
```

--------------------------------

### Configure and Run ML Training from Command Line (Python)

Source: https://github.com/verily-src/workbench-examples/blob/main/ml_examples/ml4h/mnist_survival_analysis_demo.ipynb

This Python code snippet demonstrates how to set `sys.argv` to mimic command-line arguments for running an ML training process. It configures input/output tensors, batch size, and output directories. Dependencies include `sys` and a `parse_args` function.

```python
import sys

# Assuming HD5_FOLDER and OUTPUT_FOLDER are defined elsewhere
# Assuming parse_args and train_multimodal_multitask are imported

sys.argv = ['train',
            '--tensors', HD5_FOLDER,
            '--input_tensors', 'mnist.mnist_image',
            '--output_tensors', 'mnist.mnist_label',
            '--batch_size', '64',
            '--test_steps', '64',
            '--epochs', '6',
            '--output_folder', OUTPUT_FOLDER,
            '--id', 'learn_mnist'
           ]
args = parse_args()
metrics = train_multimodal_multitask(args)
```
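
The reason assigning `sys.argv` works can be shown with a small stand-in parser. This sketch assumes the real `parse_args` is built on `argparse`; the parser below is hypothetical and defines only a few of the options used above. `argparse` reads `sys.argv[1:]` by default, so the first list element acts as the program name and is skipped.

```python
import argparse
import sys


def parse_args() -> argparse.Namespace:
    """Hypothetical stand-in parser; the real ml4h parser defines many more options."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--batch_size", type=int, default=16)
    parser.add_argument("--epochs", type=int, default=1)
    parser.add_argument("--id", default="no_id")
    return parser.parse_args()  # Reads sys.argv[1:] when called without arguments


original_argv = sys.argv
try:
    # The first element plays the role of the program name and is ignored.
    sys.argv = ["train", "--batch_size", "64", "--epochs", "6", "--id", "learn_mnist"]
    args = parse_args()
finally:
    sys.argv = original_argv  # Restore so later cells are not affected

print(args.batch_size, args.epochs, args.id)
```

Restoring `sys.argv` afterwards is a design choice of this sketch, not something the original cell does.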

--------------------------------

### Query Top 1000 SNPs from BigQuery

Source: https://github.com/verily-src/workbench-examples/blob/main/1kgenomes_examples/GWAS_experiments.ipynb

Retrieves up to 1000 SNP rows from a BigQuery table and returns them ordered by start position. It selects specific columns including reference name, start position, reference bases, alternative bases, and Chi-squared score. Because the inner query applies `LIMIT 1000` without an `ORDER BY`, which 1000 rows are selected is not guaranteed; only the final output is sorted.

```sql
SELECT * FROM (
  SELECT
    reference_name,
    start_position,
    reference_bases,
    alt_bases,
    chi_squared_score
  FROM `stats_results_table`
  LIMIT 1000
)
ORDER BY start_position asc
```

--------------------------------

### Fetch and Print Project List

Source: https://github.com/verily-src/workbench-examples/blob/main/single_cell/getting-started-with-hca.ipynb

Fetches a short list of projects (up to 10) from the latest DCP catalog and prints their details. This snippet demonstrates the usage of the `list_projects` function.

```python
PROJECT_LIST = list_projects(CATALOG, 10)
```

--------------------------------

### Install 'threejs' R Package

Source: https://github.com/verily-src/workbench-examples/blob/main/1kgenomes_examples/R_1k_genomes.ipynb

This R command installs the 'threejs' package, which enables the creation of interactive 3D scatter plots, network plots, and globes using the 'three.js' JavaScript visualization library. This package is used for visualizing the PCA results in 3D. It requires an internet connection to download from CRAN.

```r
# Create interactive 3D scatter plots, network plots, and globes using the 'three.js' visualization library
install.packages("threejs")
```

--------------------------------

### Fetch HCA Project List

Source: https://github.com/verily-src/workbench-examples/blob/main/single_cell/getting-started-with-hca.ipynb

Fetches a list of project titles and UUIDs from the HCA catalog. It handles pagination and limits the number of projects returned. Dependencies include `fetch_json` and `get_project_request_params`.

```python
def list_projects(catalog: str, max_projects: int) -> list:

    # Allocate a list to populate for return
    project_list = []

    print(f"Fetching first {max_projects} projects:")
    
    # Set up the fetch parameters
    url = PROJECTS_URL
    params = get_project_request_params(catalog, max_projects)
    
    while url and len(project_list) < max_projects:
        response_json = fetch_json(url, params)

        # Iterate over results, pulling out key project elements
        for hit in response_json['hits']:
            uuid = hit['entryId']
            shortname = hit['projects'][0]['projectShortname']
            title = hit['projects'][0]['projectTitle']

            print("-----------------------")
            print(f"Title: {title}")
            print(f"Shortname: {shortname}")
            print(f"Id: {uuid}")

            project_list.append({'title': title, 'uuid': uuid})

        # Handle response pagination if we haven't reached max_projects
        url = response_json['pagination']['next']
        if url:
            params = None
        else:
            break

    return project_list
```
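
The pagination pattern can be exercised without contacting the HCA service. The sketch below is an illustration under stated assumptions: `collect_hits` and `fake_fetch` are hypothetical names, the two-page response is made-up data shaped like the `hits` and `pagination` fields used above, and the fetch function is passed in as an argument so a fake can replace the real `fetch_json`.

```python
from typing import Callable, Dict, List, Optional


def collect_hits(fetch: Callable[[str, Optional[dict]], dict],
                 url: str, params: Optional[dict], max_items: int) -> List[dict]:
    """Follow 'next' links until max_items hits are collected or pages run out."""
    hits: List[dict] = []
    next_url: Optional[str] = url
    while next_url and len(hits) < max_items:
        page = fetch(next_url, params)
        hits.extend(page["hits"])
        next_url = page["pagination"]["next"]
        params = None  # As above, the 'next' URL is used without extra parameters
    return hits[:max_items]


# Fake two-page response used in place of a real HTTP call (made-up data).
PAGES: Dict[str, dict] = {
    "page1": {"hits": [{"entryId": "a"}, {"entryId": "b"}], "pagination": {"next": "page2"}},
    "page2": {"hits": [{"entryId": "c"}], "pagination": {"next": None}},
}


def fake_fetch(url: str, params: Optional[dict]) -> dict:
    return PAGES[url]


result = collect_hits(fake_fetch, "page1", {"size": 2}, max_items=3)
ids = [hit["entryId"] for hit in result]
print(ids)
```

Unlike `list_projects`, this variant trims the result to `max_items`, which guards against a final page that returns more hits than requested.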

--------------------------------

### Manage Workspaces with Workbench CLI

Source: https://context7.com/verily-src/workbench-examples/llms.txt

Commands to manage and inspect Verily Workbench workspaces using the `wb` CLI. This includes checking status, listing workspaces, retrieving details, and switching between workspaces.

```bash
# Check current workspace status
wb status

# List all accessible workspaces
wb workspace list

# Get workspace details in JSON format
wb workspace describe --format=json

# Extract workspace ID programmatically
wb workspace describe --format=json | jq --raw-output ".id"

# Switch to a different workspace
wb workspace set --id=<workspace-id>
```

--------------------------------

### ML4H Imports and Setup

Source: https://github.com/verily-src/workbench-examples/blob/main/ml_examples/ml4h/ML4H_Model_Factory_Intro.ipynb

This code block imports essential libraries and modules from the ML4H toolkit and other common data science packages. It sets up the environment for machine learning tasks using ML4H, including data manipulation, model creation, and visualization tools.

```python
# Imports
import os
import sys
import pickle
import random
from typing import List, Dict, Callable
from collections import defaultdict, Counter

import h5py
import numpy as np


from ml4h.defines import StorageType
from ml4h.arguments import parse_args
from ml4h.TensorMap import TensorMap, Interpretation
from ml4h.tensor_generators import test_train_valid_tensor_generators
from ml4h.models.train import train_model_from_generators
from ml4h.models.model_factory import make_multimodal_multitask_model
from ml4h.models.inspect import plot_and_time_model
from ml4h.recipes import compare_multimodal_scalar_task_models, train_multimodal_multitask

%matplotlib inline
import matplotlib.pyplot as plt
from matplotlib import gridspec
```

--------------------------------

### List Available Catalogs

Source: https://github.com/verily-src/workbench-examples/blob/main/single_cell/getting-started-with-hca.ipynb

Retrieves a list of available Human Cell Atlas (HCA) catalogs from the server. It filters out internal catalogs and returns a list of catalog names, such as ['dcp31', 'dcp32', 'dcp1'].

```python
def list_catalogs() -> list:
    response = fetch_json(CATALOGS_URL, None)

    catalogs = []
    for catalog, details in response['catalogs'].items():
        if not details['internal']:
            catalogs.append(catalog)

    return catalogs
```
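
To make the filtering step concrete, the sketch below applies the same logic to a hand-written payload. The payload is a hypothetical stand-in shaped the way `list_catalogs` reads it (a `catalogs` mapping whose values carry an `internal` flag); the real response may contain additional fields, and `public_catalogs` is an illustrative name.

```python
# Hypothetical payload, shaped like the response that list_catalogs reads.
sample_response = {
    "catalogs": {
        "dcp31": {"internal": False},
        "dcp32": {"internal": False},
        "it-test": {"internal": True},  # made-up internal catalog name
    }
}


def public_catalogs(response: dict) -> list:
    """Return the names of catalogs that are not flagged as internal."""
    return [
        name
        for name, details in response["catalogs"].items()
        if not details["internal"]
    ]


print(public_catalogs(sample_response))  # ['dcp31', 'dcp32']
```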

--------------------------------

### Get Workspace ID with `wb` CLI

Source: https://github.com/verily-src/workbench-examples/blob/main/dataproc/batch_job_submit.ipynb

Retrieves the workspace ID using the `wb workspace describe` command and parses the JSON output to extract the ID. This is a prerequisite for deriving other workspace-specific resources.

```python
# --raw-output makes jq print the ID without surrounding JSON quotes
ws_id_list = !wb workspace describe --format=JSON | jq --raw-output '.id'
WORKSPACE_ID = ws_id_list[0]
print(WORKSPACE_ID)
```
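
Outside a notebook, where the `!` shell escape is unavailable, the same lookup can be sketched with `subprocess` and the `json` module. The helper names below are illustrative and not part of the Workbench CLI; the only CLI call assumed is `wb workspace describe --format=json`, and the only field assumed in its output is `id`.

```python
import json
import subprocess


def parse_workspace_id(describe_output: str) -> str:
    """Extract the "id" field from `wb workspace describe` JSON output."""
    return json.loads(describe_output)["id"]


def get_workspace_id() -> str:
    """Run the wb CLI and return the current workspace ID.

    Requires the wb CLI on PATH; check=True raises CalledProcessError
    if the command exits with a non-zero status.
    """
    result = subprocess.run(
        ["wb", "workspace", "describe", "--format=json"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_workspace_id(result.stdout)
```

Keeping the parsing separate from the CLI call lets the parsing be checked against a saved JSON sample without a live workspace.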

--------------------------------

### Workbench CLI - Workspace Management

Source: https://context7.com/verily-src/workbench-examples/llms.txt

Manage Verily Workbench workspaces, including checking status, listing available workspaces, describing workspace details, and setting the active workspace.

```APIDOC
## Workbench CLI - Workspace Management

### Description
Manage Verily Workbench workspaces, including checking status, listing available workspaces, describing workspace details, and setting the active workspace.

### Commands

- **`wb status`**: Check the current workspace status.
- **`wb workspace list`**: List all accessible workspaces.
- **`wb workspace describe --format=json`**: Get detailed workspace information in JSON format.
- **`wb workspace describe --format=json | jq --raw-output ".id"`**: Extract the workspace ID programmatically.
- **`wb workspace set --id=<workspace-id>`**: Switch to a different workspace.
```

--------------------------------

### Create BigQuery Dataset using Workbench CLI

Source: https://github.com/verily-src/workbench-examples/blob/main/1kgenomes_examples/GWAS_experiments.ipynb

Executes a command-line instruction via the Verily Workbench CLI (`wb resource create bq-dataset`) to create a new BigQuery dataset within the current workspace. The dataset is specified to be in the `US` region.

```python
!wb resource create bq-dataset --location=US --id $bq_dataset_name
```

--------------------------------

### Download File with Progress Bar

Source: https://github.com/verily-src/workbench-examples/blob/main/single_cell/getting-started-with-hca.ipynb

Downloads a file from a given URL to a specified local path. It streams the content and displays a progress bar using `tqdm`, providing visual feedback on the download status and speed.

```python
def download_file(url: str, output_path: str) -> None:
    # Start the request stream
    response = requests.get(url, stream=True)
    response.raise_for_status()

    # Get the content length so the progress bar can display accurate progress
    total = int(response.headers.get('content-length', 0))
    print(f'Downloading to: {output_path}', flush=True)
    
    # Fetch the content in chunks, updating the progress bar
    with open(output_path, 'wb') as f:
        with tqdm(total=total, unit='B', unit_scale=True, unit_divisor=1024) as bar:
            for chunk in response.iter_content(chunk_size=1024):
                size = f.write(chunk)
                bar.update(size)
```
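
A caller of `download_file` still has to choose `output_path`. One possible approach, sketched here as an assumption rather than the notebook's own method, is to reuse the last path segment of the URL. `output_path_for` and the example URL are hypothetical.

```python
import os
from urllib.parse import urlparse


def output_path_for(url: str, dest_dir: str = ".") -> str:
    """Build a local path from the last path segment of a URL.

    The query string is ignored; a URL whose path ends in "/" has no
    file name component and is rejected.
    """
    filename = os.path.basename(urlparse(url).path)
    if not filename:
        raise ValueError(f"URL has no file name component: {url}")
    return os.path.join(dest_dir, filename)


# Hypothetical URL, used only to illustrate the call.
print(output_path_for("https://example.org/data/matrix.loom?version=1", "downloads"))
```

The result can then be passed as the second argument to `download_file`.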

--------------------------------

### Workbench CLI - Git Repository Integration

Source: https://context7.com/verily-src/workbench-examples/llms.txt

Integrate Git repositories as workspace references, allowing them to be automatically mounted into cloud environments.

```APIDOC
## Workbench CLI - Git Repository Integration

### Description
Integrate Git repositories as workspace references, allowing them to be automatically mounted into cloud environments.

### Commands

- **`wb resource add-ref git-repo --repo-url=<repository-url> --name=<resource-name> --cloning=<cloning-policy> --description=<description>`**: Add a Git repository as a referenced resource.
  - Supports public and private repositories (private requires SSH key setup).
- **`wb resource resolve --name=<resource-name>`**: Resolve the reference to a Git repository.
```

--------------------------------

### Generate Environment Provenance: Conda Environment

Source: https://github.com/verily-src/workbench-examples/blob/main/dataproc/batch_job_submit.ipynb

Exports the current Conda environment configuration using `conda env export`. This command lists all packages and their versions installed in the active Conda environment, crucial for reproducibility.

```python
!conda env export
```

--------------------------------

### Python Utilities for Group Management

Source: https://context7.com/verily-src/workbench-examples/llms.txt

This Python code assists in retrieving and managing organization-linked group information and permissions. It provides functions to get a group's description and to list role assignments for users and other groups within a specified organization.

```python
import json
import subprocess

# Get org-linked group information
def get_org_linked_group_info(org_id, group_name):
    """
    Return an org-linked group's description as JSON.
    """
    wb_command = ["wb", "group", "describe",
                  f"--org={org_id}",
                  f"--name={group_name}",
                  "--format=JSON"]
    result = subprocess.run(wb_command, capture_output=True, text=True)
    group_info = json.loads(result.stdout)
    return group_info

# Get group role assignments
def get_org_linked_group_roles(org_id, group_name):
    """
    Return a flattened mapping of users to roles for named group.
    """
    roles_dict = {role: set() for role in ["ADMIN", "MEMBER", "READER", "SUPPORT"]}

    wb_command = ["wb", "group", "role", "list",
                  f"--org={org_id}",
                  f"--name={group_name}",
                  "--format=JSON"]
    result = subprocess.run(wb_command, capture_output=True, text=True)
    nested_roles = json.loads(result.stdout)

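    for item in nested_roles:
        principal = item['principal']
        if principal['principalType'] == "GROUP":
            # Recurse into the nested group; its members receive every role
            # granted to that group.
            nested_members = get_org_linked_group_roles(
                principal['groupOrg'],
                principal['groupName']
            )["MEMBER"]
            for role in item['roles']:
                roles_dict[role].update(nested_members)
        else:
            # Direct user assignment; skip entries without an email.
            for role in item['roles']:
                if principal['userEmail'] is not None:
                    roles_dict[role].add(principal['userEmail'])
    return roles_dict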

# Usage example
group_info = get_org_linked_group_info("verily", "research-team")
roles = get_org_linked_group_roles("verily", "research-team")
print(f"Group info: {group_info}")
print(f"Admins: {roles['ADMIN']}")
print(f"Members: {roles['MEMBER']}")
```
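
The flattening step can be exercised without the CLI by separating it from the `wb` call. The sketch below assumes the record shape read by `get_org_linked_group_roles` (`principal` and `roles` keys) and assumes that a nested group's members receive every role granted to that group. `flatten_roles`, the `resolve_group_members` callback, the `"USER"` principal type, and the sample records are all hypothetical.

```python
def flatten_roles(nested_roles, resolve_group_members):
    """Flatten a role listing into {role: set of user emails}.

    resolve_group_members(org, name) is a caller-supplied callback that
    returns the member emails of a nested group.
    """
    roles = {}
    for item in nested_roles:
        principal = item["principal"]
        if principal["principalType"] == "GROUP":
            emails = set(
                resolve_group_members(principal["groupOrg"], principal["groupName"])
            )
        elif principal.get("userEmail") is not None:
            emails = {principal["userEmail"]}
        else:
            emails = set()
        # Grant the resolved emails every role listed on this record.
        for role in item["roles"]:
            roles.setdefault(role, set()).update(emails)
    return roles


# Made-up listing in the shape described above.
sample = [
    {"principal": {"principalType": "USER", "userEmail": "a@example.com"},
     "roles": ["ADMIN", "MEMBER"]},
    {"principal": {"principalType": "GROUP", "groupOrg": "org", "groupName": "sub"},
     "roles": ["MEMBER"]},
]
flat = flatten_roles(sample, lambda org, name: ["b@example.com"])
print(sorted(flat["MEMBER"]))  # ['a@example.com', 'b@example.com']
```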

--------------------------------

### Setup and Configuration: Python Variables

Source: https://github.com/verily-src/workbench-examples/blob/main/dataproc/create_hail_cluster.ipynb

Derives a short user name from the `WORKBENCH_USER_EMAIL` environment variable (falling back to `USER`) and builds a Hail cluster name of the form `hail-<user>-<YYYYMMDD>` from it and the current date. The `datetime` and `os` modules supply date formatting and environment variable access.

```python
from datetime import datetime
import os
```

```python
USER = os.getenv('WORKBENCH_USER_EMAIL')
if USER:
    USER = USER.split('@')[0].replace('.', '-')
else:
    print('WORKBENCH_USER_EMAIL not defined; using USER')
    USER = os.getenv('USER')
print(USER)
```

```python
HAIL_CLUSTER_NAME = '-'.join(['hail', USER, datetime.now().strftime('%Y%m%d')])

print(HAIL_CLUSTER_NAME)
```
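
The same naming rule can be wrapped in a function so it can be checked with a fixed date instead of `datetime.now()`. This is a minimal sketch; `make_cluster_name` and the example email are hypothetical and not part of the notebook.

```python
from datetime import date


def make_cluster_name(user_email: str, day: date, prefix: str = "hail") -> str:
    """Join prefix, sanitized user name, and YYYYMMDD date with hyphens."""
    # Keep the part before "@" and replace dots, as in the cells above.
    user = user_email.split("@")[0].replace(".", "-")
    return "-".join([prefix, user, day.strftime("%Y%m%d")])


print(make_cluster_name("jane.doe@example.com", date(2024, 1, 5)))  # hail-jane-doe-20240105
```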

--------------------------------

### Verify Git Global Configuration

Source: https://context7.com/verily-src/workbench-examples/llms.txt

Lists the global Git configuration and filters it for `user` entries, confirming that the user name and email set earlier with `git config` took effect.

```python
!git config --global --list | grep user
```

--------------------------------

### Fetch JSON Data with Error Handling

Source: https://github.com/verily-src/workbench-examples/blob/main/single_cell/getting-started-with-hca.ipynb

A utility function that fetches JSON from a URL with optional query parameters. It raises `requests.HTTPError` on an unsuccessful HTTP status (via `raise_for_status`) and otherwise returns the parsed JSON response.

```python
def fetch_json(url: str, params: dict) -> dict:
    response = requests.get(url, params=params)
    response.raise_for_status()
    
    return response.json()
```

--------------------------------

### Integrate Git Repositories with Workbench CLI

Source: https://context7.com/verily-src/workbench-examples/llms.txt

Commands to add Git repositories as workspace references, enabling automatic mounting into cloud environments. Supports both public and private repositories.

```bash
# Add public GitHub repository
wb resource add-ref git-repo \
    --repo-url=https://github.com/verily-src/workbench-examples.git \
    --name=workbench-examples \
    --cloning=COPY_REFERENCE \
    --description="Verily Workbench example notebooks"

# Add private GitHub repository (requires SSH key setup)
wb resource add-ref git-repo \
    --repo-url=git@github.com:org/private-repo.git \
    --name=private-repo \
    --cloning=COPY_REFERENCE \
    --description="Private research repository"

# Resolve repository reference
wb resource resolve --name=workbench-examples
```

--------------------------------

### Generate Environment Provenance with Python

Source: https://github.com/verily-src/workbench-examples/blob/main/first_hour_on_vwb/creating_a_data_collection.ipynb

These notebook cells use the `!` shell escape to record provenance for the current environment: the date, the packages in the conda environment (including pip-installed ones), JupyterLab extensions, CPU count, and total memory.

```python
!date
```

```python
!conda env export
```

```python
!jupyter labextension list
```

```python
!grep ^processor /proc/cpuinfo | wc -l
```

```python
!grep "^MemTotal:" /proc/meminfo
```
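
Where the values are needed as Python data rather than printed cell output, part of the same information can be gathered with the standard library. This is a sketch under the assumption of a Linux host with `/proc/meminfo`; `parse_mem_total_kb` and `collect_provenance` are illustrative names, and package and extension listings are not covered.

```python
import os
import re
from datetime import datetime


def parse_mem_total_kb(meminfo_text: str) -> int:
    """Return the MemTotal value (in kB) from /proc/meminfo-style text."""
    match = re.search(r"^MemTotal:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("MemTotal line not found")
    return int(match.group(1))


def collect_provenance(meminfo_path: str = "/proc/meminfo") -> dict:
    """Gather a timestamp, logical CPU count, and total memory (Linux only)."""
    with open(meminfo_path) as f:
        mem_kb = parse_mem_total_kb(f.read())
    return {
        "timestamp": datetime.now().isoformat(timespec="seconds"),
        "cpu_count": os.cpu_count(),
        "mem_total_kb": mem_kb,
    }
```

Calling `collect_provenance()` in a notebook cell returns a dictionary that can be logged or saved alongside results.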