### Install GreenBoost and Verify Setup

Source: https://context7.com/isolatedoctopi/greenboost/llms.txt

Installs GreenBoost with the interactive setup script, then verifies the installation by checking `greenboost status` and confirming that PyTorch sees the full virtual VRAM (T1 + T2).

```bash
git clone https://gitlab.com/IsolatedOctopi/greenboost.git
cd greenboost
sudo ./greenboost_setup.sh          # interactive: Full Install or Light Install

# Full Install performs:
#   - DKMS build + load of greenboost.ko
#   - /etc/ld.so.preload  ← libgreenboost_cuda.so (system-wide CUDA shim)
#   - /etc/ld.so.audit    ← libgreenboost_audit.so (process filter)
#   - Ollama systemd drop-in injecting GREENBOOST_ACTIVE=1
#   - CPU governor + NVMe udev rules + hugepages tuning
#   - idle-reclaim daemon + shader-boost service

# After install, verify:
greenboost status
# Tier 1  GPU VRAM           : 12 GB  (physical VRAM on e.g. RTX 3060)
# Tier 2  System RAM pool    : 51 GB  (pinned DDR, DMA-BUF exported)
# Tier 3  NVMe               : 200 GB (backing file /var/lib/greenboost/t3_store)

# Confirm virtual VRAM is visible to PyTorch (login shell — GREENBOOST_ACTIVE=1 already set):
python -c "import torch; print(round(torch.cuda.get_device_properties(0).total_memory/1e9,1), 'GB')"
# 63.0 GB   (T1 12 GB + T2 51 GB)
```
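PyTorch should report T1 + T2, and the NVMe tier is not counted, so the two checks above can be cross-validated. The following is a minimal Python sketch. It assumes the `Tier N ... : X GB` line format seen in the sample output, which other GreenBoost versions may format differently. `parse_tiers` and `expected_virtual_vram_gb` are hypothetical helpers, not part of GreenBoost.

```python
import re

# Assumed line format, inferred from the sample `greenboost status` output:
#   "Tier 1  GPU VRAM           : 12 GB  (...)"
TIER_RE = re.compile(r"Tier\s+(\d+)\b[^:]*:\s*([\d.]+)\s*GB")


def parse_tiers(status_text: str) -> dict[int, float]:
    """Map tier number -> capacity in GB for every 'Tier N ... : X GB' line."""
    tiers = {}
    for line in status_text.splitlines():
        match = TIER_RE.search(line)
        if match:
            tiers[int(match.group(1))] = float(match.group(2))
    return tiers


def expected_virtual_vram_gb(tiers: dict[int, float]) -> float:
    """Virtual VRAM seen by CUDA apps is T1 + T2; T3 (NVMe) is not counted."""
    return tiers.get(1, 0.0) + tiers.get(2, 0.0)


sample = """\
Tier 1  GPU VRAM           : 12 GB  (physical VRAM on e.g. RTX 3060)
Tier 2  System RAM pool    : 51 GB  (pinned DDR, DMA-BUF exported)
Tier 3  NVMe               : 200 GB (backing file /var/lib/greenboost/t3_store)
"""
tiers = parse_tiers(sample)
print(tiers)                            # {1: 12.0, 2: 51.0, 3: 200.0}
print(expected_virtual_vram_gb(tiers))  # 63.0
```

Compare the result with the figure PyTorch prints. If PyTorch reports only the physical VRAM, the shim is not active in that shell.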

--------------------------------

### Install GreenBoost

Source: https://gitlab.com/isolatedoctopi/greenboost/-/blob/main/README.md

Clone the repository, navigate to the directory, and run the setup script. The installer will prompt for installation mode (Full or Light).

```bash
git clone https://gitlab.com/IsolatedOctopi/greenboost.git
cd greenboost
sudo ./greenboost_setup.sh
```

--------------------------------

### Deploy Test Workload

Source: https://gitlab.com/isolatedoctopi/greenboost/-/blob/main/k8s-deployment/INSTALL_CLUSTER.md

Deploys an example LLM pod to test the GreenBoost installation. This involves applying a pod definition and monitoring its logs.

```bash
# Deploy example LLM pod
kubectl apply -f k8s-examples/greenboost-llm-pod.yaml

# Monitor pod status
kubectl get pods -n greenboost-llm
kubectl logs -f llm-inference -n greenboost-llm -c ollama
```

--------------------------------

### Start GreenBoost Gaming Service

Source: https://gitlab.com/isolatedoctopi/greenboost/-/blob/main/GREENBOOST_PROTON.md

Start the GreenBoost gaming service before launching a game to shift VRAM priority to the game. Stop the service after the game exits to restore inference priority.

```bash
systemctl start greenboost-gaming.service
```

--------------------------------

### Docker/Podman Integration with GreenBoost (Path B)

Source: https://context7.com/isolatedoctopi/greenboost/llms.txt

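Run containers with GreenBoost enabled. The host's `libgreenboost_cuda.so` is bind-mounted into the container and preloaded with `LD_PRELOAD`. Because the GreenBoost kernel module is not used inside containers, the shim activates Path B automatically. The image must provide Python and `run_model.py`.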

```bash
docker run --gpus all \
  -e GREENBOOST_ACTIVE=1 \
  -v /usr/local/lib/libgreenboost_cuda.so:/usr/local/lib/libgreenboost_cuda.so \
  -e LD_PRELOAD=/usr/local/lib/libgreenboost_cuda.so \
  nvcr.io/nvidia/cuda:12.4.0-runtime-ubuntu22.04 \
  python run_model.py
```

--------------------------------

### Launch vLLM API Server with GreenBoost

Source: https://gitlab.com/isolatedoctopi/greenboost/-/blob/main/DOCUMENTATION.md

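Start the vLLM OpenAI-compatible API server with the GreenBoost shim preloaded. The example sets the model path, a 131,072-token context (`--max-model-len`), 95% GPU memory utilization, and `--enforce-eager`, which disables CUDA graph capture.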

```bash
GREENBOOST_ACTIVE=1 LD_PRELOAD=/usr/local/lib/libgreenboost_cuda.so \
    python -m vllm.entrypoints.openai.api_server \
        --model /opt/models/glm-4.7-flash-hf \
        --max-model-len 131072 \
        --gpu-memory-utilization 0.95 \
        --enforce-eager
```

--------------------------------

### Greenboost Maintenance Commands

Source: https://context7.com/isolatedoctopi/greenboost/llms.txt

Commands for system configuration installation and recovery.

```bash
sudo greenboost install-sys-configs
```

```bash
sudo greenboost recover
```

--------------------------------

### GreenBoost Profile Format (YAML Frontmatter)

Source: https://context7.com/isolatedoctopi/greenboost/llms.txt

Example GreenBoost profile. The YAML frontmatter between the `---` lines sets the tier sizes, safety reserve, NVMe swap and pool sizes, hugepages, the `pcores_only` switch, and the Tier 3 backend. Free-form human-readable notes can follow the closing `---`.

```yaml
---
physical_vram_gb: 12
virtual_vram_gb: 51
safety_reserve_gb: 8
nvme_swap_gb: 200
nvme_pool_gb: 180
use_hugepages: 1
pcores_only: 1
tier3_backend: nvme
---
# Human-readable section follows ...
# RTX 5070 12 GB, i9-14900KF, 64 GB DDR5
```
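Scripts that need profile values can read the frontmatter without invoking the CLI. The following is a minimal sketch. It assumes the flat `key: value` layout shown above, with no nesting, lists, or quoting. For anything richer, parse the frontmatter with PyYAML's `yaml.safe_load` instead. `parse_profile` is a hypothetical helper.

```python
PROFILE = """\
---
physical_vram_gb: 12
virtual_vram_gb: 51
safety_reserve_gb: 8
nvme_swap_gb: 200
nvme_pool_gb: 180
use_hugepages: 1
pcores_only: 1
tier3_backend: nvme
---
# Human-readable section follows ...
"""


def parse_profile(text: str) -> dict:
    """Parse flat 'key: value' frontmatter between the first two '---' lines."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("profile must start with a '---' frontmatter line")
    params = {}
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":          # end of frontmatter; ignore the notes after it
            return params
        if not stripped or stripped.startswith("#"):
            continue
        key, sep, value = stripped.partition(":")
        if not sep:
            raise ValueError(f"malformed frontmatter line: {line!r}")
        value = value.strip()
        try:
            params[key.strip()] = int(value)   # numeric settings
        except ValueError:
            params[key.strip()] = value        # e.g. tier3_backend: nvme
    raise ValueError("frontmatter is missing its closing '---' line")


profile = parse_profile(PROFILE)
print(profile["tier3_backend"])                                  # nvme
print(profile["physical_vram_gb"] + profile["virtual_vram_gb"])  # 63
```

In this example, `physical_vram_gb + virtual_vram_gb` is 63. That matches the T1 + T2 total in the install check.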

--------------------------------

### Example GreenBoost Metrics Output

Source: https://gitlab.com/isolatedoctopi/greenboost/-/blob/main/k8s-deployment/monitoring.md

Sample output from the GreenBoost metrics exporter, showing various memory tiers, watchdog pressure, and NVLink status.

```text
# HELP greenboost_t1_used_bytes Physical VRAM in use (T1)
# TYPE greenboost_t1_used_bytes gauge
greenboost_t1_used_bytes{gpu="0"} 10737418240
greenboost_t1_used_bytes{gpu="1"} 10737418240
# TYPE greenboost_t1_total_bytes gauge
greenboost_t1_total_bytes 32984999936

# HELP greenboost_t2_used_bytes System DDR pool used (T2)
# TYPE greenboost_t2_used_bytes gauge
greenboost_t2_used_bytes 329853488128

# HELP greenboost_t2_total_bytes DDR pool total capacity
# TYPE greenboost_t2_total_bytes gauge
greenboost_t2_total_bytes 329853488128

# HELP greenboost_watchdog_pressure Watchdog pressure (0=healthy, 100=impending OOM)
# TYPE greenboost_watchdog_pressure gauge
greenboost_watchdog_pressure 0

# HELP greenboost_nvlink_ready NVLink fabric health (1=ready, 0=not ready)
# TYPE greenboost_nvlink_ready gauge
greenboost_nvlink_ready 1
```
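Dashboards usually compute utilization from these gauges in PromQL, but a quick script can do the same from a scraped text dump. The following is a minimal sketch of a Prometheus text-format parser. It ignores timestamps and does not handle label values that contain spaces. It also assumes that the unlabelled `greenboost_t1_total_bytes` is the total across all GPUs, since that metric carries no `gpu` label.

```python
SAMPLE = """\
# TYPE greenboost_t1_used_bytes gauge
greenboost_t1_used_bytes{gpu="0"} 10737418240
greenboost_t1_used_bytes{gpu="1"} 10737418240
greenboost_t1_total_bytes 32984999936
greenboost_t2_used_bytes 329853488128
greenboost_t2_total_bytes 329853488128
greenboost_watchdog_pressure 0
greenboost_nvlink_ready 1
"""


def parse_metrics(text: str) -> dict[tuple[str, str], float]:
    """Parse samples into {(metric_name, raw_label_string): value}."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):   # skip HELP/TYPE comments
            continue
        series, value = line.split()[:2]        # drop an optional timestamp
        name, _, labels = series.partition("{")
        samples[(name, labels.rstrip("}"))] = float(value)
    return samples


m = parse_metrics(SAMPLE)
# Sum per-GPU T1 usage and compare it with the (assumed node-wide) T1 total
t1_used = sum(v for (name, _), v in m.items() if name == "greenboost_t1_used_bytes")
t1_pct = 100 * t1_used / m[("greenboost_t1_total_bytes", "")]
t2_pct = 100 * m[("greenboost_t2_used_bytes", "")] / m[("greenboost_t2_total_bytes", "")]
print(f"T1 {t1_pct:.1f}%  T2 {t2_pct:.1f}%  NVLink ready: {m[('greenboost_nvlink_ready', '')] == 1}")
# T1 65.1%  T2 100.0%  NVLink ready: True
```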

--------------------------------

### Run ExLlamaV3 Script with GreenBoost

Source: https://gitlab.com/isolatedoctopi/greenboost/-/blob/main/DOCUMENTATION.md

Execute an ExLlamaV3 Python script using GreenBoost. This command assumes ExLlamaV3 is installed and configured.

```bash
greenboost run python your_exllama_script.py
```

--------------------------------

### Configure GreenBoost Memory Tiers

Source: https://gitlab.com/isolatedoctopi/greenboost/-/blob/main/k8s-deployment/INSTALL_CLUSTER.md

Adjust memory tier sizing in `values.yaml` for your cluster configuration. This example targets an 8× V100 32 GB setup (8 × 32 GB = 256 GB of physical VRAM) with Lustre as the Tier 3 backend.

```yaml
greenboost:
  physicalVramGb: 256   # 8× V100 32GB
  virtualVramGb: 307   # Adjust based on available system RAM
  tier3Backend: "lustre"
```

--------------------------------

### Manual Coordination of Gaming and Inference Services

Source: https://context7.com/isolatedoctopi/greenboost/llms.txt

Manually manage GreenBoost services for coordinating gaming and inference workloads. Start the gaming service before playing to reduce KV T1 reservation, and stop it afterward to restore inference priority.

```bash
systemctl start  greenboost-gaming.service   # before game: reduce KV T1 reservation
```

```bash
systemctl stop   greenboost-gaming.service   # after game: restore inference priority
```
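If the `stop` step is forgotten, the KV T1 reservation stays reduced after the game exits. The following is a minimal Python sketch that pairs the two commands in a context manager, so the stop runs even if the game crashes. `gaming_session` is hypothetical. Starting or stopping a system unit needs root or a polkit prompt.

```python
import subprocess
from contextlib import contextmanager

SERVICE = "greenboost-gaming.service"


@contextmanager
def gaming_session(run=subprocess.run):
    """Start the gaming service on entry; always stop it on exit."""
    run(["systemctl", "start", SERVICE], check=True)   # reduce KV T1 reservation
    try:
        yield
    finally:
        run(["systemctl", "stop", SERVICE], check=True)  # restore inference priority


# Usage (the game command is a placeholder; it must block until the game exits):
# with gaming_session():
#     subprocess.run(["/path/to/game"])
```

The `run` parameter exists only so that the start/stop sequencing can be tested without calling `systemctl`.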

--------------------------------

### GreenBoost Proton Installation Script

Source: https://gitlab.com/isolatedoctopi/greenboost/-/blob/main/GREENBOOST_PROTON.md

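This script installs GreenBoost Proton into Steam's compatibility tools directory. Run it from inside the `greenboost_proton_wayland` directory. The `cd` path below is an example location, so adjust it to wherever you cloned the repository.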

```bash
cd ~/Dev/greenboost_main_branch/greenboost_proton_wayland
./install.sh
```

--------------------------------

### Enable TurboQuant KV Compression with Environment Variables

Source: https://context7.com/isolatedoctopi/greenboost/llms.txt

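Applies INT2, INT3, or INT4 quantization to KV cache buffers so that more context fits in the same memory. Set the environment variables before starting inference. `GREENBOOST_TQ_BITS=0` selects the bit width automatically. The Python snippet reads the compression counters from sysfs, and the C snippet applies the same settings through the `GB_IOCTL_SET_TURBOQUANT` ioctl.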

```bash
# Enable via environment variables before starting inference:
export GREENBOOST_TQ_ENABLED=1
export GREENBOOST_TQ_BITS=4          # 2, 3, or 4 bits (0=auto)
export GREENBOOST_TQ_HEAD_DIM=0      # 0=auto-detect attention head dimension
export GREENBOOST_TQ_SEED=42         # rotation matrix seed

greenboost run python -m vllm.entrypoints.openai.api_server \
    --model /opt/models/glm-4.7-flash-hf \
    --max-model-len 131072
```

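```python
# Read TurboQuant counters from the GreenBoost sysfs status file ("key=value" lines)
data = {}
with open("/sys/class/greenboost/greenboost/status") as status:
    for line in status:
        if "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        # Keep non-numeric fields as strings instead of crashing on int()
        data[key] = int(value) if value.lstrip("-").isdigit() else value

print(f"KV compressed savings: {data.get('kv_compressed_mb', 0)} MB")
print(f"Active compression bits: {data.get('kv_compression_bits', 0)}")
print(f"TurboQuant sessions: {data.get('kv_compression_sessions', 0)}")
```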

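```c
#include <stdio.h>
#include <sys/ioctl.h>
#include "greenboost_ioctl.h"  /* assumed to declare gb_turboquant_req and GB_IOCTL_SET_TURBOQUANT */

/* gb_fd: an open file descriptor for the GreenBoost device */
struct gb_turboquant_req tq = {
    .enabled  = 1,
    .bits     = 4,    /* INT4: 3× effective KV capacity */
    .head_dim = 0,    /* 0 = auto-detect                */
    .seed     = 42,
};
if (ioctl(gb_fd, GB_IOCTL_SET_TURBOQUANT, &tq) < 0)
    perror("GB_IOCTL_SET_TURBOQUANT");
```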
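Lower bit widths trade accuracy for capacity. The following minimal sketch shows uniform min-max quantization of one vector to 2, 3, or 4 bits. It is a generic illustration, not GreenBoost's TurboQuant kernel: the rotation step implied by `GREENBOOST_TQ_SEED` is omitted, and the function names are hypothetical. Relative to 16-bit values, b-bit codes give an ideal 16/b size reduction (4× for INT4). The per-vector scale and offset add overhead, so real gains are lower than that ideal ratio.

```python
def quantize(values: list[float], bits: int) -> tuple[list[int], float, float]:
    """Uniform min-max quantization of one vector to unsigned `bits`-bit codes."""
    if bits not in (2, 3, 4):
        raise ValueError("bits must be 2, 3, or 4")
    lo, hi = min(values), max(values)
    levels = (1 << bits) - 1            # e.g. 15 steps for INT4
    scale = (hi - lo) / levels or 1.0   # constant vector: avoid division by zero
    codes = [round((v - lo) / scale) for v in values]
    return codes, scale, lo


def dequantize(codes: list[int], scale: float, lo: float) -> list[float]:
    """Map codes back to approximate values; error is at most scale / 2."""
    return [c * scale + lo for c in codes]


kv_row = [0.12, -0.80, 0.33, 0.05, -0.27, 0.91]   # toy values, not real KV data
codes, scale, lo = quantize(kv_row, bits=4)
restored = dequantize(codes, scale, lo)
print(codes)
print(max(abs(a - b) for a, b in zip(kv_row, restored)))  # bounded by scale / 2
```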

--------------------------------

### Apply ExLlamaV3 GreenBoost KV Cache Patch

Source: https://gitlab.com/isolatedoctopi/greenboost/-/blob/main/patches/README.md

Instructions for applying the GreenBoost KV cache layer patch to ExLlamaV3. This involves cloning the upstream repository, copying patch files, and installing the modified library.

```bash
git clone https://github.com/turboderp-org/exllamav3 libraries/exllamav3
cp patches/exllamav3/exllamav3/cache/greenboost.py libraries/exllamav3/exllamav3/cache/
cp patches/exllamav3/exllamav3/cache/__init__.py    libraries/exllamav3/exllamav3/cache/
STLOADER_USE_URING=1 /opt/greenboost/venv/bin/pip install -e libraries/exllamav3 --no-build-isolation
```

--------------------------------

### Check Kubelet Plugin Logs

Source: https://gitlab.com/isolatedoctopi/greenboost/-/blob/main/k8s-deployment/INSTALL_CLUSTER.md

Retrieve logs from the GreenBoost kubelet plugin container to diagnose startup issues. Replace `<pod-name>` with the plugin pod's name, which you can find with `kubectl get pods -n greenboost-system`.

```bash
# Check plugin logs
kubectl logs -n greenboost-system <pod-name> -c greenboost-plugin
```

--------------------------------

### Activate GreenBoost Shim in Different Contexts

Source: https://context7.com/isolatedoctopi/greenboost/llms.txt

Demonstrates how to activate the GreenBoost shim in login shells, non-login contexts using wrappers or environment variables, systemd units, and Docker containers. Also shows how to enable debug logging and disable the shim for a specific process.

```bash
# Login shell — shim already active, run directly:
python your_script.py
python -m vllm.entrypoints.openai.api_server --model /opt/models/glm-4.7-flash-hf

# Non-login context — use the wrapper:
greenboost run python your_script.py
greenboost run ollama serve

# Or set inline:
GREENBOOST_ACTIVE=1 LD_PRELOAD=/usr/local/lib/libgreenboost_cuda.so python your_script.py

# In a systemd unit:
# [Service]
# Environment="GREENBOOST_ACTIVE=1"
# Environment="LD_PRELOAD=/usr/local/lib/libgreenboost_cuda.so"

# In Docker — Path B (no kernel module) activates automatically:
docker run --gpus all \
  -e GREENBOOST_ACTIVE=1 \
  -v /usr/local/lib/libgreenboost_cuda.so:/usr/local/lib/libgreenboost_cuda.so \
  -e LD_PRELOAD=/usr/local/lib/libgreenboost_cuda.so \
  my-llm-image python run_model.py

# Debug — see per-allocation path decisions:
GREENBOOST_DEBUG=1 ollama run glm-4.7-flash:q8_0 2>&1 | grep -E "KV|Phase|VRAM"
# [GreenBoost] Path A0 (cudaImportExtMem) : enabled — cudaImportExternalMemory (best bandwidth)
# [GreenBoost] Path A  (DMA-BUF+kernel)   : enabled — mmap+GB_IOCTL_PIN_USER_PTR+HostReg
# [GreenBoost] Path B  (HostReg/no-kmod)  : available — mmap+cuMemHostRegister (containers/VMs)
# [GreenBoost] Path C  (UVM/managed)      : available — cuMemAllocManaged+cuMemAdvise (last resort)

# Disable shim for one process without unloading:
GREENBOOST_DISABLE=1 python sensitive_script.py
```
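Launchers written in Python can apply the inline activation to a child process. The following minimal sketch mirrors `GREENBOOST_ACTIVE=1 LD_PRELOAD=... command` by building an environment dict. It is not how the `greenboost run` wrapper is implemented, and clearing `GREENBOOST_DISABLE` is this sketch's own choice. `greenboost_env` is hypothetical.

```python
import os
import re

SHIM = "/usr/local/lib/libgreenboost_cuda.so"


def greenboost_env(base=None, debug=False):
    """Copy of `base` (default: os.environ) with the GreenBoost shim enabled."""
    env = dict(os.environ if base is None else base)
    env.pop("GREENBOOST_DISABLE", None)   # do not inherit a per-process opt-out
    env["GREENBOOST_ACTIVE"] = "1"
    # glibc accepts ':' or whitespace between LD_PRELOAD entries
    preload = [p for p in re.split(r"[:\s]+", env.get("LD_PRELOAD", "")) if p]
    if SHIM not in preload:
        preload.insert(0, SHIM)           # earlier preloads take precedence
    env["LD_PRELOAD"] = ":".join(preload)
    if debug:
        env["GREENBOOST_DEBUG"] = "1"     # log per-allocation path decisions
    return env


# Usage:
# import subprocess
# subprocess.run(["python", "your_script.py"], env=greenboost_env(), check=True)
```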

--------------------------------

### Install NVIDIA GPU Operator via Helm

Source: https://context7.com/isolatedoctopi/greenboost/llms.txt

Installs the NVIDIA GPU Operator using Helm, enabling essential GPU management features like drivers, CUDA toolkit, and DCGM exporter. Ensure Helm repositories are updated before installation.

```bash
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia && helm repo update
helm install --wait --generate-name nvidia/gpu-operator \
  --namespace gpu-operator --create-namespace \
  --set driver.enabled=true --set toolkit.enabled=true \
  --set dcgmExporter.enabled=true
```

--------------------------------

### Verify Kubelet Plugin Directory

Source: https://gitlab.com/isolatedoctopi/greenboost/-/blob/main/k8s-deployment/INSTALL_CLUSTER.md

List the contents of the kubelet plugin directory to confirm that the GreenBoost plugin is in place. Replace `<pod-name>` with the target pod's name. Add `-n <namespace>` if the pod is not in your current namespace.

```bash
# Verify kubelet plugin directory
kubectl exec -it <pod-name> -- ls -la /var/lib/kubelet/plugins/
```

--------------------------------

### Install NVIDIA k8s-dra-driver-gpu

Source: https://gitlab.com/isolatedoctopi/greenboost/-/blob/main/k8s-deployment/INSTALL_CLUSTER.md

Installs the NVIDIA k8s-dra-driver-gpu with Helm. GreenBoost's Kubernetes deployment relies on this driver for GPU resource management through Dynamic Resource Allocation (DRA).

```bash
# Clone k8s-dra-driver-gpu repository (if needed)
git clone https://github.com/NVIDIA/k8s-dra-driver-gpu.git \
  ~/Dev/greenboost_sources/k8s-dra-driver-gpu
cd ~/Dev/greenboost_sources/k8s-dra-driver-gpu

# Install via Helm
helm install --wait --generate-name \
  deployments/helm/nvidia-dra-driver-gpu \
  --namespace nvidia-dra-system \
  --create-namespace \
  --set resources.computeDomains.enabled=true \
  --set resources.gpus.enabled=true
```

--------------------------------

### Helm Install GreenBoost DRA Driver

Source: https://gitlab.com/isolatedoctopi/greenboost/-/blob/main/k8s-deployment/monitoring.md

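Install the GreenBoost DRA driver with Helm. Run the command from the directory that contains `deployments/helm/greenboost-dra-driver`. Without `--namespace`, the release is installed into your current namespace.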

```bash
helm install greenboost-dra deployments/helm/greenboost-dra-driver
```

--------------------------------

### Install NVIDIA GPU Operator

Source: https://gitlab.com/isolatedoctopi/greenboost/-/blob/main/k8s-deployment/INSTALL_CLUSTER.md

Installs the NVIDIA GPU Operator with Helm, enabling the driver, container toolkit, device plugin, and DCGM exporter with the `single` MIG strategy. This is a prerequisite for GreenBoost.

```bash
# NVIDIA GPU Operator installation
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
helm repo update

# Install GPU Operator (driver, toolkit, device plugin, DCGM exporter)
helm install --wait --generate-name \
  nvidia/gpu-operator \
  --namespace gpu-operator \
  --create-namespace \
  --set driver.enabled=true \
  --set toolkit.enabled=true \
  --set devicePlugin.enabled=true \
  --set mig.strategy=single \
  --set dcgmExporter.enabled=true
```

--------------------------------

### Manage GreenBoost Profiles

Source: https://context7.com/isolatedoctopi/greenboost/llms.txt

Commands for creating, listing, showing, diffing, and activating GreenBoost profiles. Use 'create' to auto-detect hardware and save to a default profile. Activate a profile to reload modules with new parameters.

```bash
sudo greenboost profile create
```

```bash
greenboost profile list
```

```bash
greenboost profile show
```

```bash
greenboost profile diff
```

```bash
sudo greenboost profile activate /etc/greenboost/profiles/v100_cluster_node.md
```

--------------------------------

### Install GreenBoost DRA Driver

Source: https://gitlab.com/isolatedoctopi/greenboost/-/blob/main/k8s-deployment/INSTALL_CLUSTER.md

Installs the GreenBoost DRA driver using its Helm chart. This step enables GreenBoost's advanced features like NVLink pooling and device class management.

```bash
# Create greenboost namespace
kubectl create namespace greenboost-system

# Install GreenBoost DRA driver
helm install --wait --generate-name \
  deployments/helm/greenboost-dra-driver \
  --namespace greenboost-system \
  --values k8s-deployment/values-v100-cluster.yaml \
  --set greenboost.enable=true \
  --set greenboost.nvlinkPool=true \
  --set deviceClass.enabled=true \
  --set metricsExporter.enabled=true
```

--------------------------------

### Run text-generation-inference (TGI) with GreenBoost

Source: https://gitlab.com/isolatedoctopi/greenboost/-/blob/main/DOCUMENTATION.md

TGI uses a PyTorch backend. Use 'greenboost run' for command-line execution or configure the systemd service with the necessary environment variables.

```bash
greenboost run text-generation-launcher \
    --model-id /opt/models/glm-4.7-flash-hf \
    --num-shard 1 \
    --max-total-tokens 131072
```

```ini
[Service]
Environment="LD_PRELOAD=/usr/local/lib/libgreenboost_cuda.so"
Environment="GREENBOOST_ACTIVE=1"
```

--------------------------------

### Download and Run Transformers Model with GreenBoost

Source: https://gitlab.com/isolatedoctopi/greenboost/-/blob/main/DOCUMENTATION.md

Download a model from Hugging Face Hub using `snapshot_download` and then load and run it within a single script executed by GreenBoost.

```bash
greenboost run python - <<'EOF'
from huggingface_hub import snapshot_download
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

snapshot_download("THUDM/glm-4.7-flash-hf", local_dir="/opt/models/glm-4.7-flash-hf")

model = AutoModelForCausalLM.from_pretrained(
    "/opt/models/glm-4.7-flash-hf", torch_dtype=torch.bfloat16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("/opt/models/glm-4.7-flash-hf")
ids = tokenizer("Hello!", return_tensors="pt").input_ids.to(model.device)
print(tokenizer.decode(model.generate(ids, max_new_tokens=100)[0]))
EOF
```

--------------------------------

### Activate GreenBoost in Interactive Terminal

Source: https://gitlab.com/isolatedoctopi/greenboost/-/blob/main/DOCUMENTATION.md

Set `GREENBOOST_ACTIVE=1` in an interactive terminal to enable GreenBoost for commands run from it (login shells already set it). Then use Python to report the VRAM that PyTorch sees.

```shell
export GREENBOOST_ACTIVE=1
python -c "
import torch
print('VRAM reported:', round(torch.cuda.get_device_properties(0).total_memory / 1e9, 1), 'GB')
"
```

--------------------------------

### Update KV Reserve via greenboost_setup.sh

Source: https://gitlab.com/isolatedoctopi/greenboost/-/blob/main/DOCUMENTATION.md

Update the KV reserve at runtime to 4096 MB using the greenboost_setup.sh script. This script reads the current configuration and patches it accordingly.

```bash
# Via greenboost_setup.sh (reads current then patches)
sudo ./greenboost_setup.sh tune-kv-reserve 4096
```

--------------------------------

### View Live Dashboard with GreenBoost Vulkan

Source: https://gitlab.com/isolatedoctopi/greenboost/-/blob/main/greenboost_proton/architecture.md

Use this command to open the live dashboard, which shows the device, the running DX12 game, T2 statistics, and any detected issues.

```bash
greenboost vulkan          # live dashboard: device, DX12 game, T2 stats, issues
```

--------------------------------

### GreenBoost File Layout

Source: https://gitlab.com/isolatedoctopi/greenboost/-/blob/main/greenboost_proton/architecture.md

The directory structure of the greenboost-proton-wayland project, showing key components like the patched orchestrator, installer scripts, and bundled Proton Experimental binaries.

```plaintext
greenboost-proton-wayland/
├── proton                   Python 3 orchestrator (patched Proton Experimental)
├── install.sh               Steam compat tool installer
├── compatibilitytool.vdf    Steam registration
├── toolmanifest.vdf         Steam invocation spec
├── version                  Version string
├── files/                   (Wine+VKD3D+DXVK)
├── protonfixes/
├── filelock.py
├── architecture.md
└── documentation.md
```

--------------------------------

### Run Inference with GreenBoost

Source: https://gitlab.com/isolatedoctopi/greenboost/-/blob/main/DOCUMENTATION.md

Activate GreenBoost and preload the shim library for a single inference script. This is the generic form for enabling GreenBoost on one command.

```bash
GREENBOOST_ACTIVE=1 LD_PRELOAD=/usr/local/lib/libgreenboost_cuda.so \
    python inference.py
```

--------------------------------

### Pin KV Cache to T1 with Transformers

Source: https://gitlab.com/isolatedoctopi/greenboost/-/blob/main/DOCUMENTATION.md

Configure environment variables to reserve memory for KV cache and ensure it lands in T1. This setup is for Hugging Face Transformers.

```python
import os
os.environ["GREENBOOST_KV_RESERVE_MB"] = "4096"   # 4 GB reserved for KV
os.environ["GREENBOOST_KV_OVERFLOW"]   = "0"       # use phase detector (default)

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained(
    "/opt/models/glm-4.7-flash-hf",
    torch_dtype=torch.bfloat16,
    device_map="auto",          # sees T1+T2 total — all layers on GPU
    attn_implementation="flash_attention_2",   # needs the flash-attn package (counterpart of OLLAMA_FLASH_ATTENTION=1)
)
tokenizer = AutoTokenizer.from_pretrained("/opt/models/glm-4.7-flash-hf")

# KV cache for the tokens generated below should land in T1 once the phase detector reports INFERENCE
out = model.generate(
    tokenizer("Hello!", return_tensors="pt").input_ids.cuda(),
    max_new_tokens=512,
    use_cache=True,   # ensure KV cache is used
)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

--------------------------------

### Run Ollama with GreenBoost

Source: https://gitlab.com/isolatedoctopi/greenboost/-/blob/main/DOCUMENTATION.md

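When Ollama runs as a systemd service, `install-sys-configs` configures it automatically: a systemd drop-in injects `GREENBOOST_ACTIVE=1`, so `ollama run` needs no changes. If you start Ollama outside systemd, use the `greenboost run` wrapper.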

```bash
ollama run glm-4.7-flash:q8_0   # GreenBoost is transparent

greenboost run ollama serve
```

--------------------------------

### WSL2 Integration with GreenBoost (Path B)

Source: https://context7.com/isolatedoctopi/greenboost/llms.txt

Configure GreenBoost for WSL2 by adding the exports below to a per-user shell startup file (such as `~/.bashrc`) or to a system-wide one. WSL2 exposes the GPU through `/dev/dxg`, and Path B needs no GreenBoost kernel module, so Path B works there natively.

```bash
export GREENBOOST_ACTIVE=1
export LD_PRELOAD=/usr/local/lib/libgreenboost_cuda.so
python your_script.py
```

--------------------------------

### Increase KV Reserve for vLLM

Source: https://gitlab.com/isolatedoctopi/greenboost/-/blob/main/DOCUMENTATION.md

Set a large KV reserve and enable overflow for GreenBoost before launching vLLM. This ensures vLLM's pre-allocated KV cache gets priority in T1.

```bash
# Set 8 GB reserve before starting vLLM (131K context needs ~7–9 GB KV)
GREENBOOST_KV_RESERVE_MB=8192 \
GREENBOOST_KV_OVERFLOW=1 \
GREENBOOST_ACTIVE=1 LD_PRELOAD=/usr/local/lib/libgreenboost_cuda.so \
    python -m vllm.entrypoints.openai.api_server \
        --model /opt/models/glm-4.7-flash-hf \
        --max-model-len 131072
```
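The 7 to 9 GB estimate follows from the usual KV cache formula: 2 tensors (K and V) × layers × KV heads × head dimension × tokens × bytes per element. The following minimal sketch uses a hypothetical model shape, not the real GLM configuration, to show how a 131,072-token context can reach the 8192 MB reserve. Multiply the result by the number of concurrent sequences. Architectures with compressed or latent attention need a different formula.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_elem=2):
    """Bytes for one sequence's K and V caches (bytes_per_elem=2 for FP16/BF16)."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem


# Hypothetical shape for illustration only (not GLM's actual config):
size = kv_cache_bytes(n_layers=32, n_kv_heads=4, head_dim=128, seq_len=131072)
print(size // 2**20, "MiB")   # 8192 MiB, the size of GREENBOOST_KV_RESERVE_MB=8192
```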

--------------------------------

### Deploy Test LLM Workload and Monitor Logs

Source: https://context7.com/isolatedoctopi/greenboost/llms.txt

Applies a sample LLM workload to the Kubernetes cluster and streams its logs. This helps verify the GreenBoost deployment and monitor the inference process.

```bash
kubectl apply -f k8s-examples/greenboost-llm-pod.yaml
kubectl logs -f llm-inference -n greenboost-llm -c ollama
```

--------------------------------

### Run LLM in Docker with GreenBoost Path B

Source: https://gitlab.com/isolatedoctopi/greenboost/-/blob/main/CONTAINER_VM_MODE.md

This command demonstrates how to run a Docker container with GreenBoost enabled. It preloads the GreenBoost CUDA library and sets the necessary environment variable. Ensure the GreenBoost library is mounted into the container.

```bash
docker run --gpus all \
  -e GREENBOOST_ACTIVE=1 \
  -v /usr/local/lib/libgreenboost_cuda.so:/usr/local/lib/libgreenboost_cuda.so \
  -e LD_PRELOAD=/usr/local/lib/libgreenboost_cuda.so \
  my-llm-image python run_model.py
```

--------------------------------

### Run vLLM with GreenBoost

Source: https://gitlab.com/isolatedoctopi/greenboost/-/blob/main/DOCUMENTATION.md

vLLM loads libcuda lazily. Use 'greenboost run' for command-line execution or configure the systemd service with the necessary environment variables.

```bash
greenboost run python -m vllm.entrypoints.openai.api_server \
    --model /opt/models/glm-4.7-flash-hf \
    --dtype float16 \
    --max-model-len 131072 \
    --gpu-memory-utilization 0.95
```

```ini
[Service]
Environment="LD_PRELOAD=/usr/local/lib/libgreenboost_cuda.so"
Environment="GREENBOOST_ACTIVE=1"
```

--------------------------------

### TF with XLA and GreenBoost Configuration

Source: https://gitlab.com/isolatedoctopi/greenboost/-/blob/main/DOCUMENTATION.md

Run XLA training scripts with GreenBoost, disabling the phase detection and KV overflow heuristics. This prevents XLA's scratch buffer pre-allocations from confusing the phase detector.

```bash
GREENBOOST_PHASE_DETECT=0 GREENBOOST_KV_OVERFLOW=0 \
GREENBOOST_ACTIVE=1 LD_PRELOAD=/usr/local/lib/libgreenboost_cuda.so \
    python xla_training.py
```

--------------------------------

### TensorFlow Memory Growth Configuration

Source: https://gitlab.com/isolatedoctopi/greenboost/-/blob/main/DOCUMENTATION.md

Configure TensorFlow to use memory growth instead of pre-allocating all GPU memory on startup. This allows GreenBoost to manage allocations more effectively. Note that LD_PRELOAD must be set before Python starts.

```python
import os
os.environ["GREENBOOST_ACTIVE"] = "1"
# LD_PRELOAD must be set before Python starts, not here

import tensorflow as tf

# Prevent TF from trying to allocate the entire T1+T2 virtual pool at once
gpus = tf.config.list_physical_devices("GPU")
for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)

# From this point forward cudaMalloc calls are GreenBoost-managed:
# weights → T1, KV-like buffers → T1-priority (if phase heuristic fires)
model = tf.keras.models.load_model("/opt/models/my_model.keras")
```

--------------------------------

### GreenBoost Profile Storage Structure

Source: https://gitlab.com/isolatedoctopi/greenboost/-/blob/main/profiles/README.md

Illustrates the directory structure for GreenBoost profiles, including default and resolved profile files.

```plaintext
/etc/greenboost/
├── profiles/
│   ├── default.md                    # auto-generated on install
│   └── resolved_<timestamp>.md       # written on --profile conflict resolution
└── active_profile.md                 # symlink → profiles/<active>.md
```
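Because `active_profile.md` is a symlink, scripts can find the active profile by resolving it. The following is a minimal sketch. `active_profile` is a hypothetical helper, and `greenboost profile show` remains the supported way to print the profile.

```python
from pathlib import Path


def active_profile(root="/etc/greenboost") -> Path:
    """Resolve active_profile.md (a symlink into profiles/) to the real file."""
    link = Path(root) / "active_profile.md"
    if not link.is_symlink():
        raise FileNotFoundError(f"{link} is not a symlink; no profile activated?")
    # strict=True raises if the symlink points at a profile that no longer exists
    return link.resolve(strict=True)


# Usage:
# print(active_profile())   # e.g. /etc/greenboost/profiles/default.md
```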

--------------------------------

### Activate GreenBoost in Non-Login Contexts

Source: https://gitlab.com/isolatedoctopi/greenboost/-/blob/main/DOCUMENTATION.md

For non-login contexts like cron jobs or Docker entrypoints, use the 'greenboost run' wrapper or set GREENBOOST_ACTIVE=1 and LD_PRELOAD explicitly. This is also how you configure it in systemd units or Docker environments.

```bash
# Wrapper — sets GREENBOOST_ACTIVE=1 for one command
greenboost run python your_script.py

# Or inline
GREENBOOST_ACTIVE=1 LD_PRELOAD=/usr/local/lib/libgreenboost_cuda.so python your_script.py

# Or in a systemd unit ([Service] section); in Docker, pass the same pair with -e:
# Environment="GREENBOOST_ACTIVE=1"
# Environment="LD_PRELOAD=/usr/local/lib/libgreenboost_cuda.so"
```

--------------------------------

### Install GreenBoost DRA Driver with Helm

Source: https://context7.com/isolatedoctopi/greenboost/llms.txt

Deploys the GreenBoost DRA driver to a Kubernetes cluster using Helm. This configuration enables GreenBoost features, NVLink pool aggregation, and metrics export. Specify the appropriate values file for cluster sizing.

```bash
helm install --wait --generate-name deployments/helm/greenboost-dra-driver \
  --namespace greenboost-system \
  --values deployments/helm/greenboost-dra-driver/values-v100-cluster.yaml \
  --set greenboost.enable=true \
  --set greenboost.nvlinkPool=true \
  --set metricsExporter.enabled=true
```

--------------------------------

### Greenboost Core Commands

Source: https://context7.com/isolatedoctopi/greenboost/llms.txt

Basic commands for checking status, cleaning memory, loading and unloading the kernel module, and running an application under the shim.

```bash
greenboost status
```

```bash
greenboost clean memory
```

```bash
sudo greenboost load
```

```bash
sudo greenboost unload
```

```bash
greenboost run <app> [args...]
```

--------------------------------

### Text Generation Inference (TGI) with GreenBoost

Source: https://context7.com/isolatedoctopi/greenboost/llms.txt

Launch the text-generation-launcher for TGI with GreenBoost. This command specifies the model and shard count, leveraging GreenBoost for memory management during inference.

```bash
greenboost run text-generation-launcher \
    --model-id /opt/models/glm-4.7-flash-hf \
    --num-shard 1 \
    --max-total-tokens 131072
```

--------------------------------

### Running TensorFlow Scripts with GreenBoost

Source: https://gitlab.com/isolatedoctopi/greenboost/-/blob/main/DOCUMENTATION.md

Launch TensorFlow scripts with GreenBoost enabled by setting environment variables. Ensure LD_PRELOAD is correctly set before script execution.

```bash
GREENBOOST_ACTIVE=1 LD_PRELOAD=/usr/local/lib/libgreenboost_cuda.so \
    python your_tf_script.py
```

--------------------------------

### Check GreenBoost Kernel Module Parameters

Source: https://gitlab.com/isolatedoctopi/greenboost/-/blob/main/k8s-deployment/INSTALL_CLUSTER.md

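Inspect the GreenBoost kernel module's parameters with `modinfo`. Replace `<pod-name>` with the relevant pod name. `modinfo` lists the parameters the module declares. For a loaded module, the current values are normally exposed under `/sys/module/greenboost/parameters/`.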

```bash
# Check module parameters
kubectl exec -it <pod-name> -- modinfo greenboost
```
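On the node itself, a loaded module's current parameter values are usually readable from sysfs. The following minimal sketch collects them into a dict. It assumes the code can see the host's `/sys`; otherwise, run it directly on the node. Some parameters may be unreadable without root. `read_module_params` is hypothetical.

```python
from pathlib import Path


def read_module_params(module="greenboost", sysfs_root="/sys/module"):
    """Return {param: value} from <sysfs_root>/<module>/parameters."""
    param_dir = Path(sysfs_root) / module / "parameters"
    params = {}
    if not param_dir.is_dir():
        return params  # module not loaded, or it exposes no parameters
    for entry in sorted(param_dir.iterdir()):
        try:
            params[entry.name] = entry.read_text().strip()
        except OSError:
            params[entry.name] = None  # e.g. root-only or write-only parameter
    return params


# Usage:
# for name, value in read_module_params().items():
#     print(f"{name} = {value}")
```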

--------------------------------

### ExLlamaV3 Native GB_ALLOC_* Usage Pattern

Source: https://gitlab.com/isolatedoctopi/greenboost/-/blob/main/DOCUMENTATION.md

This is the typical ExLlamaV3 pattern for allocating a KV-cache buffer through the `GB_IOCTL_ALLOC` ioctl. Fill a `gb_alloc_req` with the size, flags, and tier hint, then keep the returned handle if the call succeeds.

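```c
#include <stdio.h>
#include <sys/ioctl.h>
#include "greenboost_ioctl.h"  /* assumed to declare gb_alloc_req, GB_IOCTL_ALLOC, GB_ALLOC_* */

/* gb_fd: open GreenBoost device descriptor; kv_size: bytes needed for the KV cache */
struct gb_alloc_req req = {
    .size_bytes = kv_size,
    .flags      = GB_ALLOC_KV_CACHE | GB_ALLOC_T1_PRIORITY,
    .tier_hint  = 1,   /* prefer T1 */
};
if (ioctl(gb_fd, GB_IOCTL_ALLOC, &req) == 0)
    kv_handle = req.handle;
else
    perror("GB_IOCTL_ALLOC");
```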

--------------------------------

### GreenBoost Profile Management CLI Commands

Source: https://gitlab.com/isolatedoctopi/greenboost/-/blob/main/profiles/README.md

Provides essential bash commands for managing GreenBoost profiles, such as creating, showing, listing, activating, and diffing profiles.

```bash
# Profile management
sudo ./greenboost_setup.sh profile create           # auto-detect hardware → write default.md
sudo ./greenboost_setup.sh profile show             # print active profile
sudo ./greenboost_setup.sh profile show <file>      # print specific file
sudo ./greenboost_setup.sh profile list             # list available profiles
sudo ./greenboost_setup.sh profile activate <file>  # set active_profile.md symlink
sudo ./greenboost_setup.sh profile diff [file]      # compare profile vs live hardware

# Load with a user-supplied profile (any command accepts --profile)
sudo ./greenboost_setup.sh --profile ~/my_profile.md load
sudo ./greenboost_setup.sh --profile ~/my_profile.md full-install
```

--------------------------------

### Build GreenBoost Kernel Module

Source: https://gitlab.com/isolatedoctopi/greenboost/-/blob/main/k8s-deployment/INSTALL_CLUSTER.md

Prepares and builds the GreenBoost kernel module on all cluster nodes using a DaemonSet. This involves creating a ConfigMap with module source files and applying a DaemonSet definition.

```bash
# Store the module sources from the current directory in a ConfigMap
kubectl create configmap greenboost-module \
  --from-file=greenboost.c \
  --from-file=greenboost_ioctl.h

# Create DaemonSet to build module
kubectl apply -f k8s-deployment/greenboost-module-builder.yaml
```

--------------------------------

### Verify GreenBoost Activation

Source: https://gitlab.com/isolatedoctopi/greenboost/-/blob/main/DOCUMENTATION.md

These commands verify that GreenBoost is active. After you load a model larger than physical VRAM (12 GB in this example), `/sys/class/greenboost/greenboost/status` should show the T2 pool in use. The second check confirms that CUDA applications see the virtual VRAM (T1 + T2) rather than only the physical VRAM.

```bash
# Should show T2 pool in use (non-zero) after loading a model larger than 12 GB:
cat /sys/class/greenboost/greenboost/status
```

```bash
# Confirm virtual VRAM is visible (should report T1+T2 total, not just physical VRAM):
```
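Before reading the status file, it can help to confirm that the process is set up for GreenBoost at all. The sketch below checks the activation prerequisites mentioned in these notes: `GREENBOOST_ACTIVE=1`, the `libgreenboost_cuda.so` shim in either `LD_PRELOAD` or `/etc/ld.so.preload`, and the `/dev/greenboost` device node. The function name `greenboost_preflight` is hypothetical, and a passing result only means the pieces are present, not that the T2 pool is actually being used.

```python
import os
from pathlib import Path

# File name of the CUDA shim, as listed in the install notes.
SHIM = "libgreenboost_cuda.so"

def greenboost_preflight(env=None, preload_file="/etc/ld.so.preload",
                         device="/dev/greenboost") -> dict:
    """Report which GreenBoost activation prerequisites are visible to this process."""
    env = os.environ if env is None else env
    try:
        preload_text = Path(preload_file).read_text()
    except OSError:  # file missing or unreadable: treat as empty
        preload_text = ""
    return {
        "GREENBOOST_ACTIVE=1": env.get("GREENBOOST_ACTIVE") == "1",
        "shim in LD_PRELOAD": SHIM in env.get("LD_PRELOAD", ""),
        "shim in ld.so.preload": SHIM in preload_text,
        "device node present": os.path.exists(device),
    }

if __name__ == "__main__":
    for check, ok in greenboost_preflight().items():
        print(f"{'ok  ' if ok else 'MISS'} {check}")
```

Either preload location is sufficient: the Full Install writes `/etc/ld.so.preload`, while the manual invocation sets `LD_PRELOAD` per process.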

--------------------------------

### Vulkan Shader Boost Daemon Actions

Source: https://gitlab.com/isolatedoctopi/greenboost/-/blob/main/ARCHITECTURE.md

Details the actions performed by the `greenboost-shader-boost.service` daemon to optimize `fossilize_replay` workers, including renicing, I/O priority adjustment, and CPU core pinning.

| Action | Effect |
|--------|--------|
| `renice -5` | Elevates above all nice-0 background tasks |
| `ionice -c2 -n0` | Best-effort I/O at highest priority |
| `taskset 0-<pcores_max>` | Pins to P-cores (auto-detected from sysfs `core_type`) |
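The three actions in the table map onto standard util-linux tools. Below is a dry-run sketch of how they could be applied to running `fossilize_replay` workers, assuming the workers are located by scanning `/proc/<pid>/comm` (the kernel truncates that name to 15 characters, hence `fossilize_repla`). This is not the daemon's code: `find_pids`, `boost_commands`, and the hardcoded `pcores_max=7` are illustrative, whereas the real service detects the P-core range from sysfs `core_type`.

```python
import os

# "fossilize_replay" is 16 characters; /proc/<pid>/comm keeps only the first 15.
COMM_NAME = "fossilize_repla"

def find_pids(comm: str = COMM_NAME, proc: str = "/proc") -> list:
    """Return sorted PIDs whose /proc/<pid>/comm equals `comm`."""
    pids = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "comm")) as f:
                if f.read().strip() == comm:
                    pids.append(int(entry))
        except OSError:  # process exited or comm is unreadable
            continue
    return sorted(pids)

def boost_commands(pid: int, pcores_max: int) -> list:
    """util-linux commands mirroring the three table actions for one PID."""
    return [
        ["renice", "-n", "-5", "-p", str(pid)],
        ["ionice", "-c2", "-n0", "-p", str(pid)],
        # -a applies the affinity to every thread of the process, not just the main one.
        ["taskset", "-a", "-p", "-c", f"0-{pcores_max}", str(pid)],
    ]

if __name__ == "__main__":
    # Dry run: print the commands instead of executing them (a negative nice value needs root).
    for pid in find_pids():
        for cmd in boost_commands(pid, pcores_max=7):  # 7 is a placeholder P-core index
            print(" ".join(cmd))
```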

--------------------------------

### ExLlamaV3 Integration with GreenBoost

Source: https://context7.com/isolatedoctopi/greenboost/llms.txt

Run ExLlamaV3 scripts with GreenBoost's KV cache overflow support. Setting the `GREENBOOST_KV_OVERFLOW=1` environment variable enables allocation priority for T1 (physical GPU VRAM).

```bash
GREENBOOST_KV_OVERFLOW=1 greenboost run python your_exllama_script.py
```

--------------------------------

### GreenBoost Diagnostics Commands

Source: https://context7.com/isolatedoctopi/greenboost/llms.txt

Commands for running benchmarks, collecting logs (general, inference, and Proton), and clearing log data. Clearing logs requires root.

```bash
greenboost benchmark

# Logs
greenboost logs
greenboost inference-logs
greenboost proton-logs

# Clear logs (root required)
sudo greenboost clear logs
sudo greenboost clear inference-logs
```

--------------------------------

### Inspect ServiceMonitor Configuration

Source: https://gitlab.com/isolatedoctopi/greenboost/-/blob/main/k8s-deployment/monitoring.md

Examine the YAML configuration of the 'greenboost-metrics' ServiceMonitor in the 'monitoring' namespace.

```bash
kubectl get servicemonitor greenboost-metrics -n monitoring -o yaml
```

--------------------------------

### Deploy GreenBoost Metrics Exporter

Source: https://gitlab.com/isolatedoctopi/greenboost/-/blob/main/k8s-deployment/monitoring.md

Apply the Kubernetes configuration for the GreenBoost metrics exporter.

```bash
kubectl apply -f deployments/helm/greenboost-dra-driver/exporter/
```

--------------------------------

### vLLM Integration with GreenBoost

Source: https://context7.com/isolatedoctopi/greenboost/llms.txt

Launch a vLLM OpenAI-compatible API server under GreenBoost. With GreenBoost active, the `--gpu-memory-utilization` check queries T1 free space through GreenBoost.

```bash
greenboost run python -m vllm.entrypoints.openai.api_server \
    --model /opt/models/glm-4.7-flash-hf \
    --max-model-len 131072 \
    --gpu-memory-utilization 0.95
```
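Once the server is up, clients talk to it as they would to any OpenAI-compatible endpoint. A minimal standard-library sketch, assuming vLLM's default port 8000 and that the served model name is the `--model` path (vLLM's default when `--served-model-name` is not set). `build_chat_request` and `chat` are hypothetical helper names.

```python
import json
import urllib.request

# Assumptions: default vLLM port and served model name equal to the --model path.
BASE_URL = "http://localhost:8000/v1"
MODEL = "/opt/models/glm-4.7-flash-hf"

def build_chat_request(prompt: str, max_tokens: int = 256) -> urllib.request.Request:
    """Build a POST request for the /chat/completions endpoint."""
    body = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def chat(prompt: str) -> str:
    """Send one chat turn and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt), timeout=600) as resp:
        reply = json.load(resp)
    return reply["choices"][0]["message"]["content"]

# Example (requires the server above to be running):
# print(chat("Hello!"))
```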

--------------------------------

### Optimal Loading Order for PyTorch with GreenBoost

Source: https://gitlab.com/isolatedoctopi/greenboost/-/blob/main/DOCUMENTATION.md

Load model weights first, then allocate KV cache. This order ensures GreenBoost's phase detector correctly identifies the phases and maximizes T1 utilization.

```python
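import torch
import torch.nn as nn

# Placeholder model and sizes (hypothetical): substitute your own architecture.
class MyModel(nn.Module):
    def __init__(self, d_model: int, n_layers: int):
        super().__init__()
        self.layers = nn.ModuleList(nn.Linear(d_model, d_model) for _ in range(n_layers))

batch, n_layers, seq_len, d_model = 1, 32, 8192, 4096

# 1. Load model weights (phase = MODEL_LOAD: reserve inactive, weights fill T1)
#    model.pt is assumed to be a state_dict saved from the same architecture.
model = MyModel(d_model, n_layers).cuda()
model.load_state_dict(torch.load("model.pt", map_location="cuda"))

# 2. Allocate KV cache (phase = INFERENCE: reserve activates, KV lands in T1)
kv_cache = torch.zeros(batch, n_layers, seq_len, d_model, device="cuda")
# ↑ GreenBoost auto-classifies this as KV via quiet-gap heuristic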
```

--------------------------------

### Ollama Integration with GreenBoost

Source: https://context7.com/isolatedoctopi/greenboost/llms.txt

Run Ollama models with GreenBoost. In a login shell, where `GREENBOOST_ACTIVE=1` is already set, no wrapper is needed. In non-login shells or Docker containers, start the server with `greenboost run ollama serve`. For Ollama 0.18+, the VMM path is intercepted automatically.

```bash
ollama run glm-4.7-flash:q8_0
```

```bash
greenboost run ollama serve
```
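Client code does not change under GreenBoost, because the shim acts inside the `ollama serve` process. A minimal standard-library sketch against Ollama's REST API, assuming the default address `localhost:11434`; `build_generate_request` and `generate` are hypothetical helper names.

```python
import json
import urllib.request

# Ollama's REST API listens on localhost:11434 by default.
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_generate_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming /api/generate request."""
    body = {"model": model, "prompt": prompt, "stream": False}  # single JSON reply
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def generate(model: str, prompt: str) -> str:
    """Return the generated text from the "response" field."""
    with urllib.request.urlopen(build_generate_request(model, prompt), timeout=600) as resp:
        return json.load(resp)["response"]

# Example (requires a running Ollama server):
# print(generate("glm-4.7-flash:q8_0", "Hello!"))
```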

--------------------------------

### GreenBoost Profile Management Commands

Source: https://context7.com/isolatedoctopi/greenboost/llms.txt

Commands for managing GreenBoost profiles through the `greenboost` CLI: creating, listing, showing, activating, and diffing.

```bash
greenboost profile
sudo greenboost profile create           # auto-detect hardware → write default.md
greenboost profile list                  # list available profiles
greenboost profile show                  # print active profile
greenboost profile show <file>           # print specific file
sudo greenboost profile activate <file>  # set active_profile.md symlink
greenboost profile diff [file]           # compare profile vs live hardware
```

--------------------------------

### Check NVLink Fabric State

Source: https://gitlab.com/isolatedoctopi/greenboost/-/blob/main/k8s-deployment/INSTALL_CLUSTER.md

Read the `nvlink_ready` file to determine if the NVLink fabric is active and ready for use by GreenBoost. Replace `<pod-name>` with the relevant pod name.

```bash
# Check NVLink fabric state
kubectl exec -it <pod-name> -- cat /sys/class/greenboost/greenboost/nvlink_ready
```

--------------------------------

### Run ExLlamaV3 Script with GreenBoost venv

Source: https://gitlab.com/isolatedoctopi/greenboost/-/blob/main/DOCUMENTATION.md

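Run an ExLlamaV3 script with the GreenBoost virtual environment's interpreter (`/opt/greenboost/venv/bin/python`). `LD_PRELOAD` loads GreenBoost's CUDA shim (`libgreenboost_cuda.so`) into the process, and `GREENBOOST_ACTIVE=1` is set alongside it.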

```bash
GREENBOOST_ACTIVE=1 LD_PRELOAD=/usr/local/lib/libgreenboost_cuda.so \
    /opt/greenboost/venv/bin/python your_exllama_script.py
```

--------------------------------

### Create ServiceMonitor for Metrics

Source: https://gitlab.com/isolatedoctopi/greenboost/-/blob/main/k8s-deployment/monitoring.md

Apply the ServiceMonitor configuration to enable Prometheus to discover and scrape GreenBoost metrics.

```bash
kubectl apply -f k8s-deployment/servicemonitor.yaml
```

--------------------------------

### Manage Session Priority with Python fcntl

Source: https://context7.com/isolatedoctopi/greenboost/llms.txt

Use Python's `fcntl` module to issue GreenBoost's session ioctls. Marking a session active moves its T2 buffers to the LRU head (evicted last); marking it idle moves them to the LRU tail (evicted first). This controls eviction priority in multi-model deployments. The helpers silently do nothing if `/dev/greenboost` does not exist.

```python
import fcntl, struct, os

_IOC_SESSION_IDLE   = (1 << 30) | (ord('G') << 8) | 18 | (8 << 16)
_IOC_SESSION_ACTIVE = (1 << 30) | (ord('G') << 8) | 19 | (8 << 16)

def _gb_session_ioctl(cmd: int) -> None:
    dev = "/dev/greenboost"
    if not os.path.exists(dev):
        return
    buf = struct.pack("II", 0, 0)   # pid=0 → caller's PID, reserved=0
    fd  = os.open(dev, os.O_RDWR)
    try:
        fcntl.ioctl(fd, cmd, buf)
    finally:
        os.close(fd)

def gb_session_idle() -> None:   _gb_session_ioctl(_IOC_SESSION_IDLE)
def gb_session_active() -> None: _gb_session_ioctl(_IOC_SESSION_ACTIVE)

# ── Usage in a multi-model server ────────────────────────────────────────────
# model_a and prompt below are placeholders for your own model object and input.
# When a request arrives for model A:
gb_session_active()   # pin model A's T2 weights to LRU head (last evicted)

# Process request ...
response = model_a.generate(prompt)

# When model A goes idle (no pending requests):
gb_session_idle()     # move model A's T2 weights to LRU tail (first evicted)
# Model B (active) will keep its T2 weights; model A's become eviction candidates
```
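The hardcoded ioctl numbers above follow the Linux `_IOW` encoding: direction bits, argument size, type character `'G'`, and command number. Rebuilding them from named parts makes the constants easier to audit. This sketch assumes the asm-generic layout used on x86-64 and arm64; `greenboost_ioctl.h` in the repository remains the authoritative definition.

```python
import struct

# asm-generic ioctl layout (x86-64, arm64): dir<<30 | size<<16 | type<<8 | nr.
# Some architectures (e.g. powerpc, mips) use different bit widths and direction values.
_IOC_WRITE = 1  # userspace writes the argument for the kernel to read

def _IOC(direction: int, type_char: str, nr: int, size: int) -> int:
    return (direction << 30) | (size << 16) | (ord(type_char) << 8) | nr

def _IOW(type_char: str, nr: int, fmt: str) -> int:
    """ioctl number for a write-direction command whose argument has struct layout `fmt`."""
    return _IOC(_IOC_WRITE, type_char, nr, struct.calcsize(fmt))

# Argument is two u32 fields (pid, reserved): 8 bytes, matching struct.pack("II", ...).
SESSION_IDLE = _IOW("G", 18, "II")
SESSION_ACTIVE = _IOW("G", 19, "II")

print(hex(SESSION_IDLE), hex(SESSION_ACTIVE))  # 0x40084712 0x40084713
```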

--------------------------------

### Optimize Kubelet Plugin Resources

Source: https://gitlab.com/isolatedoctopi/greenboost/-/blob/main/k8s-deployment/INSTALL_CLUSTER.md

Configure resource requests and limits for the kubelet plugin in `values.yaml` to optimize its performance and stability.

```yaml
kubeletPlugin:
  resources:
    requests:
      cpu: "100m"
      memory: "256Mi"
    limits:
      cpu: "500m"
      memory: "512Mi"
```
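Kubernetes rejects a container whose resource request exceeds its limit, so it is worth checking edits to these values before deploying. A small sketch that parses the quantity strings used above and flags any request larger than its limit; `parse_quantity` and `check_resources` are hypothetical helpers that handle only the listed suffixes, not the full Kubernetes quantity grammar.

```python
# Subset of Kubernetes quantity suffixes: "m" (milli), decimal k/M/G, binary Ki/Mi/Gi.
_SUFFIXES = {
    "m": 1e-3, "k": 1e3, "M": 1e6, "G": 1e9,
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30,
}

def parse_quantity(q: str) -> float:
    """Convert a quantity such as "100m" or "256Mi" to a plain number."""
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):  # try "Mi" before "M"
        if q.endswith(suffix):
            return float(q[: -len(suffix)]) * _SUFFIXES[suffix]
    return float(q)

def check_resources(resources: dict) -> list:
    """Return the resource names whose request exceeds the corresponding limit."""
    requests = resources.get("requests", {})
    limits = resources.get("limits", {})
    return [name for name in requests
            if name in limits and parse_quantity(requests[name]) > parse_quantity(limits[name])]

# Same values as kubeletPlugin.resources above.
kubelet_plugin = {
    "requests": {"cpu": "100m", "memory": "256Mi"},
    "limits": {"cpu": "500m", "memory": "512Mi"},
}
print(check_resources(kubelet_plugin))  # []
```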

--------------------------------

### Run PyTorch Script with GreenBoost

Source: https://gitlab.com/isolatedoctopi/greenboost/-/blob/main/DOCUMENTATION.md

Execute a generic Python script that uses PyTorch with GreenBoost. This can be done via the `greenboost run` command or by manually setting environment variables.

```bash
greenboost run python your_script.py
```

```bash
GREENBOOST_ACTIVE=1 LD_PRELOAD=/usr/local/lib/libgreenboost_cuda.so python your_script.py
```
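The manual form can also be wrapped in a small Python launcher. This sketch assumes that setting `GREENBOOST_ACTIVE=1` and prepending the shim to `LD_PRELOAD` is all that is required, which matches the manual command but may not cover everything `greenboost run` does. The helpers `greenboost_env` and `launch` are hypothetical.

```python
import os
import subprocess
import sys

# Shim path from the manual invocation above; adjust if installed elsewhere.
SHIM = "/usr/local/lib/libgreenboost_cuda.so"

def greenboost_env(base=None) -> dict:
    """Copy of the environment with GreenBoost enabled, keeping any existing LD_PRELOAD."""
    env = dict(os.environ if base is None else base)
    env["GREENBOOST_ACTIVE"] = "1"
    # ld.so accepts colon- or space-separated lists; this sketch only splits on colons.
    preload = env.get("LD_PRELOAD", "").split(":")
    if SHIM not in preload:
        env["LD_PRELOAD"] = ":".join([SHIM] + [p for p in preload if p])
    return env

def launch(args: list) -> int:
    """Run this interpreter on `args` (e.g. ["your_script.py"]) with GreenBoost enabled."""
    return subprocess.call([sys.executable, *args], env=greenboost_env())

# Example: launch(["your_script.py"])
```

For the ExLlamaV3 case, run the launcher with `/opt/greenboost/venv/bin/python` so that `sys.executable` points at the GreenBoost venv.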

--------------------------------

### Load Transformers Model with device_map='auto'

Source: https://gitlab.com/isolatedoctopi/greenboost/-/blob/main/DOCUMENTATION.md

Load a Hugging Face Transformers model using `device_map="auto"`. GreenBoost reports the total VRAM (T1+T2), allowing the entire model to be loaded onto the GPU.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model = AutoModelForCausalLM.from_pretrained(
    "/opt/models/glm-4.7-flash-hf",
    torch_dtype=torch.bfloat16,
    device_map="auto",          # sees T1+T2 total — loads entire model onto T1+T2
)
tokenizer = AutoTokenizer.from_pretrained("/opt/models/glm-4.7-flash-hf")

messages = [{"role": "user", "content": "Hello!"}]
input_ids = tokenizer.apply_chat_template(
    messages, tokenize=True, return_tensors="pt", add_generation_prompt=True
).to(model.device)

output = model.generate(input_ids, max_new_tokens=300)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```