### Install GreenBoost and Verify Setup

Source: https://context7.com/isolatedoctopi/greenboost/llms.txt

Installs GreenBoost using a setup script and verifies the installation by checking the status and confirming the total virtual VRAM visible to PyTorch.

```bash
git clone https://gitlab.com/IsolatedOctopi/greenboost.git
cd greenboost
sudo ./greenboost_setup.sh   # interactive: Full Install or Light Install

# Full Install performs:
#   - DKMS build + load of greenboost.ko
#   - /etc/ld.so.preload ← libgreenboost_cuda.so   (system-wide CUDA shim)
#   - /etc/ld.so.audit   ← libgreenboost_audit.so  (process filter)
#   - Ollama systemd drop-in injecting GREENBOOST_ACTIVE=1
#   - CPU governor + NVMe udev rules + hugepages tuning
#   - idle-reclaim daemon + shader-boost service

# After install, verify:
greenboost status
# Tier 1 GPU VRAM        : 12 GB   (physical VRAM on e.g. RTX 3060)
# Tier 2 System RAM pool : 51 GB   (pinned DDR, DMA-BUF exported)
# Tier 3 NVMe            : 200 GB  (backing file /var/lib/greenboost/t3_store)

# Confirm virtual VRAM is visible to PyTorch (login shell — GREENBOOST_ACTIVE=1 already set):
python -c "import torch; print(round(torch.cuda.get_device_properties(0).total_memory/1e9,1), 'GB')"
# 63.0 GB   (T1 12 GB + T2 51 GB)
```

--------------------------------

### Install GreenBoost

Source: https://gitlab.com/isolatedoctopi/greenboost/-/blob/main/README.md

Clone the repository, navigate to the directory, and run the setup script. The installer will prompt for the installation mode (Full or Light).

```bash
git clone https://gitlab.com/IsolatedOctopi/greenboost.git
cd greenboost
sudo ./greenboost_setup.sh
```

--------------------------------

### Deploy Test Workload

Source: https://gitlab.com/isolatedoctopi/greenboost/-/blob/main/k8s-deployment/INSTALL_CLUSTER.md

Deploys an example LLM pod to test the GreenBoost installation. This involves applying a pod definition and monitoring its logs.
```bash
# Deploy example LLM pod
kubectl apply -f k8s-examples/greenboost-llm-pod.yaml

# Monitor pod status
kubectl get pods -n greenboost-llm
kubectl logs -f llm-inference -n greenboost-llm -c ollama
```

--------------------------------

### Start GreenBoost Gaming Service

Source: https://gitlab.com/isolatedoctopi/greenboost/-/blob/main/GREENBOOST_PROTON.md

Start the GreenBoost gaming service before launching a game to shift VRAM priority to the game. The service is stopped after the game exits.

```bash
systemctl start greenboost-gaming.service
```

--------------------------------

### Docker/Podman Integration with GreenBoost (Path B)

Source: https://context7.com/isolatedoctopi/greenboost/llms.txt

Run containers with GreenBoost enabled. This example uses environment variables and a volume mount to preload the GreenBoost CUDA library inside the container, which enables Path B automatically.

```bash
docker run --gpus all \
  -e GREENBOOST_ACTIVE=1 \
  -v /usr/local/lib/libgreenboost_cuda.so:/usr/local/lib/libgreenboost_cuda.so \
  -e LD_PRELOAD=/usr/local/lib/libgreenboost_cuda.so \
  nvcr.io/nvidia/cuda:12.4.0-runtime-ubuntu22.04 \
  python run_model.py
```

--------------------------------

### Launch vLLM API Server with GreenBoost

Source: https://gitlab.com/isolatedoctopi/greenboost/-/blob/main/DOCUMENTATION.md

Start the vLLM API server with GreenBoost enabled, specifying the model path and GPU memory utilization.

```bash
GREENBOOST_ACTIVE=1 LD_PRELOAD=/usr/local/lib/libgreenboost_cuda.so \
  python -m vllm.entrypoints.openai.api_server \
  --model /opt/models/glm-4.7-flash-hf \
  --max-model-len 131072 \
  --gpu-memory-utilization 0.95 \
  --enforce-eager
```

--------------------------------

### GreenBoost Maintenance Commands

Source: https://context7.com/isolatedoctopi/greenboost/llms.txt

Commands for installing the system configuration and for recovery.
```bash
sudo greenboost install-sys-configs
```

```bash
sudo greenboost recover
```

--------------------------------

### GreenBoost Profile Format (YAML Frontmatter)

Source: https://context7.com/isolatedoctopi/greenboost/llms.txt

Example structure of a GreenBoost profile file using YAML frontmatter. Defines memory allocation, swap, and other hardware-specific parameters. Human-readable comments can follow the YAML section.

```yaml
---
physical_vram_gb: 12
virtual_vram_gb: 51
safety_reserve_gb: 8
nvme_swap_gb: 200
nvme_pool_gb: 180
use_hugepages: 1
pcores_only: 1
tier3_backend: nvme
---
# Human-readable section follows ...
# RTX 5070 12 GB, i9-14900KF, 64 GB DDR5
```

--------------------------------

### Example GreenBoost Metrics Output

Source: https://gitlab.com/isolatedoctopi/greenboost/-/blob/main/k8s-deployment/monitoring.md

Sample output from the GreenBoost metrics exporter, showing various memory tiers, watchdog pressure, and NVLink status.

```text
# HELP greenboost_t1_used_bytes Physical VRAM in use (T1)
# TYPE greenboost_t1_used_bytes gauge
greenboost_t1_used_bytes{gpu="0"} 10737418240
greenboost_t1_used_bytes{gpu="1"} 10737418240
# TYPE greenboost_t1_total_bytes gauge
greenboost_t1_total_bytes 32984999936
# HELP greenboost_t2_used_bytes System DDR pool used (T2)
# TYPE greenboost_t2_used_bytes gauge
greenboost_t2_used_bytes 329853488128
# HELP greenboost_t2_total_bytes DDR pool total capacity
# TYPE greenboost_t2_total_bytes gauge
greenboost_t2_total_bytes 329853488128
# HELP greenboost_watchdog_pressure Watchdog pressure (0=healthy, 100=impending OOM)
# TYPE greenboost_watchdog_pressure gauge
greenboost_watchdog_pressure 0
# HELP greenboost_nvlink_ready NVLink fabric health (1=ready, 0=not ready)
# TYPE greenboost_nvlink_ready gauge
greenboost_nvlink_ready 1
```

--------------------------------

### Run ExLlamaV3 Script with GreenBoost

Source: https://gitlab.com/isolatedoctopi/greenboost/-/blob/main/DOCUMENTATION.md

Execute an ExLlamaV3 Python script using
GreenBoost. This command assumes ExLlamaV3 is installed and configured.

```bash
greenboost run python your_exllama_script.py
```

--------------------------------

### Configure GreenBoost Memory Tiers

Source: https://gitlab.com/isolatedoctopi/greenboost/-/blob/main/k8s-deployment/INSTALL_CLUSTER.md

Adjust memory tier sizing in the `values.yaml` file for different cluster configurations. This example shows settings for an 8× V100 32GB setup.

```yaml
greenboost:
  physicalVramGb: 256   # 8× V100 32GB
  virtualVramGb: 307    # Adjust based on available system RAM
  tier3Backend: "lustre"
```

--------------------------------

### Manual Coordination of Gaming and Inference Services

Source: https://context7.com/isolatedoctopi/greenboost/llms.txt

Manually manage GreenBoost services to coordinate gaming and inference workloads. Start the gaming service before playing to reduce the KV cache's T1 reservation, and stop it afterward to restore inference priority.

```bash
systemctl start greenboost-gaming.service   # before game: reduce KV T1 reservation
```

```bash
systemctl stop greenboost-gaming.service    # after game: restore inference priority
```

--------------------------------

### GreenBoost Proton Installation Script

Source: https://gitlab.com/isolatedoctopi/greenboost/-/blob/main/GREENBOOST_PROTON.md

This script installs GreenBoost Proton into the Steam compatibility tools directory. Run it from the `greenboost_proton_wayland` directory; the `cd` path below is one checkout location, so adjust it to wherever you cloned the repository.

```bash
cd ~/Dev/greenboost_main_branch/greenboost_proton_wayland
./install.sh
```

--------------------------------

### Enable TurboQuant KV Compression with Environment Variables

Source: https://context7.com/isolatedoctopi/greenboost/llms.txt

Applies INT2/INT3/INT4 quantization to KV cache buffers so that more context history fits in the same memory. Set the environment variables before starting inference. The compression bit width can be auto-detected or set explicitly.
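As a rough illustration of why a lower bit width extends the usable history, the sketch below computes the nominal capacity gain of storing KV entries at 2, 3 or 4 bits instead of a 16-bit (FP16/BF16) baseline. It is plain arithmetic, not GreenBoost code: the 64 KiB-per-token FP16 footprint is a hypothetical figure, and the calculation ignores any scales, rotation data or other metadata a real quantizer keeps, so actual gains will be lower than these nominal ratios.

```python
# Back-of-envelope sketch: how many more KV-cache tokens fit in a fixed memory
# budget when KV entries are stored at `bits` instead of 16 bits.
# Assumption: nominal ratio only; per-block scales and other quantizer
# metadata are ignored, so real savings will be somewhat lower.

FP16_BITS = 16


def nominal_capacity_multiplier(bits: int) -> float:
    """Ratio of tokens that fit at `bits` vs. a 16-bit baseline in the same memory."""
    if bits not in (2, 3, 4):
        raise ValueError("TurboQuant bit widths covered here are 2, 3 or 4")
    return FP16_BITS / bits


def tokens_that_fit(budget_bytes: int, fp16_bytes_per_token: int, bits: int) -> int:
    """Tokens of KV cache that fit in `budget_bytes` after quantizing to `bits`."""
    bytes_per_token = fp16_bytes_per_token * bits / FP16_BITS
    return int(budget_bytes // bytes_per_token)


if __name__ == "__main__":
    budget = 4096 * 2**20        # e.g. a 4096 MiB KV budget
    per_token = 64 * 1024        # hypothetical FP16 KV footprint per token
    for b in (4, 3, 2):
        print(f"INT{b}: x{nominal_capacity_multiplier(b):.2f}, "
              f"{tokens_that_fit(budget, per_token, b)} tokens")
```

GreenBoost's own switches for turning compression on (the `GREENBOOST_TQ_*` variables, the status fields and the `GB_IOCTL_SET_TURBOQUANT` ioctl) follow.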
```bash
# Enable via environment variables before starting inference:
export GREENBOOST_TQ_ENABLED=1
export GREENBOOST_TQ_BITS=4        # 2, 3, or 4 bits (0=auto)
export GREENBOOST_TQ_HEAD_DIM=0    # 0=auto-detect attention head dimension
export GREENBOOST_TQ_SEED=42       # rotation matrix seed

greenboost run python -m vllm.entrypoints.openai.api_server \
  --model /opt/models/glm-4.7-flash-hf \
  --max-model-len 131072
```

```python
# Read integer key=value fields from the GreenBoost status file
data = {}
with open('/sys/class/greenboost/greenboost/status') as f:
    for line in f:
        key, sep, value = line.partition('=')
        if sep and value.strip().lstrip('-').isdigit():  # skip non-integer fields
            data[key.strip()] = int(value)

print(f'KV compressed savings: {data.get("kv_compressed_mb", 0)} MB')
print(f'Active compression bits: {data.get("kv_compression_bits", 0)}')
print(f'TurboQuant sessions: {data.get("kv_compression_sessions", 0)}')
```

```c
/* Fragment: assumes gb_fd is an open GreenBoost device descriptor and that the
   GreenBoost ioctl header (struct gb_turboquant_req, GB_IOCTL_SET_TURBOQUANT) is included */
struct gb_turboquant_req tq = {
    .enabled  = 1,
    .bits     = 4,   /* INT4 — 3× effective KV capacity */
    .head_dim = 0,   /* 0 = auto-detect */
    .seed     = 42,
};
if (ioctl(gb_fd, GB_IOCTL_SET_TURBOQUANT, &tq) != 0)
    perror("GB_IOCTL_SET_TURBOQUANT");   /* perror needs <stdio.h>, ioctl needs <sys/ioctl.h> */
```

--------------------------------

### Apply ExLlamaV3 GreenBoost KV Cache Patch

Source: https://gitlab.com/isolatedoctopi/greenboost/-/blob/main/patches/README.md

Instructions for applying the GreenBoost KV cache layer patch to ExLlamaV3: clone the upstream repository, copy the patch files into it, and install the modified library.

```bash
git clone https://github.com/turboderp-org/exllamav3 libraries/exllamav3
cp patches/exllamav3/exllamav3/cache/greenboost.py libraries/exllamav3/exllamav3/cache/
cp patches/exllamav3/exllamav3/cache/__init__.py libraries/exllamav3/exllamav3/cache/
STLOADER_USE_URING=1 /opt/greenboost/venv/bin/pip install -e libraries/exllamav3 --no-build-isolation
```

--------------------------------

### Check Kubelet Plugin Logs

Source: https://gitlab.com/isolatedoctopi/greenboost/-/blob/main/k8s-deployment/INSTALL_CLUSTER.md

Retrieve logs from the GreenBoost kubelet plugin to diagnose startup issues. Specify the correct namespace and pod name.
```bash
# Check plugin logs (replace <pod-name> with the plugin pod's name)
kubectl logs <pod-name> -n greenboost-system -c greenboost-plugin
```

--------------------------------

### Activate GreenBoost Shim in Different Contexts

Source: https://context7.com/isolatedoctopi/greenboost/llms.txt

Demonstrates how to activate the GreenBoost shim in login shells, in non-login contexts (via the wrapper or environment variables), in systemd units, and in Docker containers. Also shows how to enable debug logging and how to disable the shim for a specific process.

```bash
# Login shell — shim already active, run directly:
python your_script.py
python -m vllm.entrypoints.openai.api_server --model /opt/models/glm-4.7-flash-hf

# Non-login context — use the wrapper:
greenboost run python your_script.py
greenboost run ollama serve

# Or set inline:
GREENBOOST_ACTIVE=1 LD_PRELOAD=/usr/local/lib/libgreenboost_cuda.so python your_script.py

# In a systemd unit:
# [Service]
# Environment="GREENBOOST_ACTIVE=1"
# Environment="LD_PRELOAD=/usr/local/lib/libgreenboost_cuda.so"

# In Docker — Path B (no kernel module) activates automatically:
docker run --gpus all \
  -e GREENBOOST_ACTIVE=1 \
  -v /usr/local/lib/libgreenboost_cuda.so:/usr/local/lib/libgreenboost_cuda.so \
  -e LD_PRELOAD=/usr/local/lib/libgreenboost_cuda.so \
  my-llm-image python run_model.py

# Debug — see per-allocation path decisions:
GREENBOOST_DEBUG=1 ollama run glm-4.7-flash:q8_0 2>&1 | grep -E "KV|Phase|VRAM"
# [GreenBoost] Path A0 (cudaImportExtMem) : enabled — cudaImportExternalMemory (best bandwidth)
# [GreenBoost] Path A (DMA-BUF+kernel) : enabled — mmap+GB_IOCTL_PIN_USER_PTR+HostReg
# [GreenBoost] Path B (HostReg/no-kmod) : available — mmap+cuMemHostRegister (containers/VMs)
# [GreenBoost] Path C (UVM/managed) : available — cuMemAllocManaged+cuMemAdvise (last resort)

# Disable shim for one process without unloading:
GREENBOOST_DISABLE=1 python sensitive_script.py
```

--------------------------------

### Install NVIDIA GPU Operator via Helm

Source:
https://context7.com/isolatedoctopi/greenboost/llms.txt

Installs the NVIDIA GPU Operator using Helm, enabling GPU management components such as the driver, the NVIDIA Container Toolkit, and the DCGM exporter. Ensure the Helm repositories are updated before installation (the first command does this).

```bash
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia && helm repo update
helm install --wait --generate-name nvidia/gpu-operator \
  --namespace gpu-operator --create-namespace \
  --set driver.enabled=true --set toolkit.enabled=true \
  --set dcgmExporter.enabled=true
```

--------------------------------

### Verify Kubelet Plugin Directory

Source: https://gitlab.com/isolatedoctopi/greenboost/-/blob/main/k8s-deployment/INSTALL_CLUSTER.md

List the contents of the kubelet plugin directory to ensure the GreenBoost plugin is correctly placed. Replace `<pod-name>` with the target pod's name.

```bash
# Verify kubelet plugin directory
kubectl exec -it <pod-name> -- ls -la /var/lib/kubelet/plugins/
```

--------------------------------

### Install NVIDIA k8s-dra-driver-gpu

Source: https://gitlab.com/isolatedoctopi/greenboost/-/blob/main/k8s-deployment/INSTALL_CLUSTER.md

Installs the NVIDIA k8s-dra-driver-gpu using Helm. This driver provides Dynamic Resource Allocation (DRA) support for NVIDIA GPUs in Kubernetes.

```bash
# Clone k8s-dra-driver-gpu repository (if needed)
git clone https://github.com/NVIDIA/k8s-dra-driver-gpu.git \
  ~/Dev/greenboost_sources/k8s-dra-driver-gpu
cd ~/Dev/greenboost_sources/k8s-dra-driver-gpu

# Install via Helm
helm install --wait --generate-name \
  deployments/helm/nvidia-dra-driver-gpu \
  --namespace nvidia-dra-system \
  --create-namespace \
  --set resources.computeDomains.enabled=true \
  --set resources.gpus.enabled=true
```

--------------------------------

### Helm Install GreenBoost DRA Driver

Source: https://gitlab.com/isolatedoctopi/greenboost/-/blob/main/k8s-deployment/monitoring.md

Install the GreenBoost DRA driver using Helm. Run the command from the directory that contains `deployments/helm/greenboost-dra-driver`.
```bash
helm install greenboost-dra deployments/helm/greenboost-dra-driver
```

--------------------------------

### Install NVIDIA GPU Operator

Source: https://gitlab.com/isolatedoctopi/greenboost/-/blob/main/k8s-deployment/INSTALL_CLUSTER.md

Installs the NVIDIA GPU Operator using Helm, ensuring DRA is enabled. This is a prerequisite for GreenBoost.

```bash
# NVIDIA GPU Operator installation
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
helm repo update

# Install GPU Operator with DRA enabled
helm install --wait --generate-name \
  nvidia/gpu-operator \
  --namespace gpu-operator \
  --create-namespace \
  --set driver.enabled=true \
  --set toolkit.enabled=true \
  --set devicePlugin.enabled=true \
  --set migStrategy=single \
  --set dcgmExporter.enabled=true
```

--------------------------------

### Manage GreenBoost Profiles

Source: https://context7.com/isolatedoctopi/greenboost/llms.txt

Commands for creating, listing, showing, diffing, and activating GreenBoost profiles. Use `create` to auto-detect hardware and save to a default profile. Activate a profile to reload modules with new parameters.

```bash
sudo greenboost profile create
```

```bash
greenboost profile list
```

```bash
greenboost profile show
```

```bash
greenboost profile diff
```

```bash
sudo greenboost profile activate /etc/greenboost/profiles/v100_cluster_node.md
```

--------------------------------

### Install GreenBoost DRA Driver

Source: https://gitlab.com/isolatedoctopi/greenboost/-/blob/main/k8s-deployment/INSTALL_CLUSTER.md

Installs the GreenBoost DRA driver using its Helm chart. This step enables GreenBoost's advanced features like NVLink pooling and device class management.
```bash
# Create greenboost namespace
kubectl create namespace greenboost-system

# Install GreenBoost DRA driver
helm install --wait --generate-name \
  deployments/helm/greenboost-dra-driver \
  --namespace greenboost-system \
  --values k8s-deployment/values-v100-cluster.yaml \
  --set greenboost.enable=true \
  --set greenboost.nvlinkPool=true \
  --set deviceClass.enabled=true \
  --set metricsExporter.enabled=true
```

--------------------------------

### Run text-generation-inference (TGI) with GreenBoost

Source: https://gitlab.com/isolatedoctopi/greenboost/-/blob/main/DOCUMENTATION.md

TGI uses a PyTorch backend. Use `greenboost run` for command-line execution, or configure the systemd service with the necessary environment variables.

```bash
greenboost run text-generation-launcher \
  --model-id /opt/models/glm-4.7-flash-hf \
  --num-shard 1 \
  --max-total-tokens 131072
```

```ini
[Service]
Environment="LD_PRELOAD=/usr/local/lib/libgreenboost_cuda.so"
Environment="GREENBOOST_ACTIVE=1"
```

--------------------------------

### Download and Run Transformers Model with GreenBoost

Source: https://gitlab.com/isolatedoctopi/greenboost/-/blob/main/DOCUMENTATION.md

Download a model from the Hugging Face Hub using `snapshot_download`, then load and run it within a single script executed by GreenBoost.
```bash
greenboost run python - <<'EOF'
from huggingface_hub import snapshot_download
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

snapshot_download("THUDM/glm-4.7-flash-hf", local_dir="/opt/models/glm-4.7-flash-hf")

model = AutoModelForCausalLM.from_pretrained(
    "/opt/models/glm-4.7-flash-hf", torch_dtype=torch.bfloat16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("/opt/models/glm-4.7-flash-hf")

ids = tokenizer("Hello!", return_tensors="pt").input_ids.to(model.device)
print(tokenizer.decode(model.generate(ids, max_new_tokens=100)[0]))
EOF
```

--------------------------------

### Activate GreenBoost in an Interactive Terminal

Source: https://gitlab.com/isolatedoctopi/greenboost/-/blob/main/DOCUMENTATION.md

With GREENBOOST_ACTIVE=1 set in an interactive terminal, GreenBoost is enabled for the commands run there. This example uses Python to report the VRAM that PyTorch sees.

```shell
python -c "
import torch
print('VRAM reported:', round(torch.cuda.get_device_properties(0).total_memory / 1e9, 1), 'GB')
"
```

--------------------------------

### Update KV Reserve via greenboost_setup.sh

Source: https://gitlab.com/isolatedoctopi/greenboost/-/blob/main/DOCUMENTATION.md

Update the KV reserve at runtime to 4096 MB using the `greenboost_setup.sh` script, which reads the current configuration and patches it accordingly.

```bash
# Via greenboost_setup.sh (reads current then patches)
sudo ./greenboost_setup.sh tune-kv-reserve 4096
```

--------------------------------

### View Live Dashboard with GreenBoost Vulkan

Source: https://gitlab.com/isolatedoctopi/greenboost/-/blob/main/greenboost_proton/architecture.md

Use this command to open the live dashboard, which displays the device, the DX12 game, T2 stats, and any identified issues.
```bash
greenboost vulkan   # live dashboard: device, DX12 game, T2 stats, issues
```

--------------------------------

### GreenBoost File Layout

Source: https://gitlab.com/isolatedoctopi/greenboost/-/blob/main/greenboost_proton/architecture.md

The directory structure of the greenboost-proton-wayland project, showing key components such as the patched orchestrator, the installer script, and the bundled Proton Experimental binaries.

```plaintext
greenboost-proton-wayland/
├── proton                  Python 3 orchestrator (patched Proton Experimental)
├── install.sh              Steam compat tool installer
├── compatibilitytool.vdf   Steam registration
├── toolmanifest.vdf        Steam invocation spec
├── version                 Version string
├── files/                  (Wine+VKD3D+DXVK)
├── protonfixes/
├── filelock.py
├── architecture.md
└── documentation.md
```

--------------------------------

### Run Inference with GreenBoost

Source: https://gitlab.com/isolatedoctopi/greenboost/-/blob/main/DOCUMENTATION.md

Activate GreenBoost and preload the library to run inference scripts. This is a general command for enabling GreenBoost.

```bash
GREENBOOST_ACTIVE=1 LD_PRELOAD=/usr/local/lib/libgreenboost_cuda.so \
  python inference.py
```

--------------------------------

### Pin KV Cache to T1 with Transformers

Source: https://gitlab.com/isolatedoctopi/greenboost/-/blob/main/DOCUMENTATION.md

Configure environment variables to reserve memory for the KV cache and ensure it lands in T1. This setup is for Hugging Face Transformers.
```python
import os
os.environ["GREENBOOST_KV_RESERVE_MB"] = "4096"   # 4 GB reserved for KV
os.environ["GREENBOOST_KV_OVERFLOW"] = "0"        # use phase detector (default)

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained(
    "/opt/models/glm-4.7-flash-hf",
    torch_dtype=torch.bfloat16,
    device_map="auto",                        # sees T1+T2 total — all layers on GPU
    attn_implementation="flash_attention_2",  # requires the flash-attn package to be installed
)
tokenizer = AutoTokenizer.from_pretrained("/opt/models/glm-4.7-flash-hf")

# All tokens below will have KV cache allocated in T1 (phase detector: INFERENCE)
out = model.generate(
    tokenizer("Hello!", return_tensors="pt").input_ids.to(model.device),
    max_new_tokens=512,
    use_cache=True,  # ensure KV cache is used
)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

--------------------------------

### Run Ollama with GreenBoost

Source: https://gitlab.com/isolatedoctopi/greenboost/-/blob/main/DOCUMENTATION.md

When Ollama runs as a systemd service, `install-sys-configs` sets up GreenBoost for it automatically. If you run Ollama outside systemd, use the `greenboost run` wrapper.

```bash
ollama run glm-4.7-flash:q8_0   # GreenBoost is transparent
greenboost run ollama serve
```

--------------------------------

### WSL2 Integration with GreenBoost (Path B)

Source: https://context7.com/isolatedoctopi/greenboost/llms.txt

Configure GreenBoost for WSL2 by setting these environment variables in a user or system-wide configuration file. Path B works natively on WSL2, which exposes the GPU through /dev/dxg.

```bash
export GREENBOOST_ACTIVE=1
export LD_PRELOAD=/usr/local/lib/libgreenboost_cuda.so
python your_script.py
```

--------------------------------

### Increase KV Reserve for vLLM

Source: https://gitlab.com/isolatedoctopi/greenboost/-/blob/main/DOCUMENTATION.md

Set a large KV reserve and enable KV overflow before launching vLLM. This ensures vLLM's pre-allocated KV cache gets priority in T1.
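To pick a value for `GREENBOOST_KV_RESERVE_MB`, the KV cache size can be estimated with the standard formula: every layer stores one key and one value vector per KV head per token. The sketch below assumes 2-byte (FP16/BF16) entries, a single sequence at full context, and that the variable counts mebibytes; the model dimensions are hypothetical placeholders, not those of glm-4.7-flash.

```python
import math

# Standard KV-cache sizing (keys + values, every layer, every KV head):
#   bytes = 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem
# The dimensions used in the example run are hypothetical.


def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_elem: int = 2) -> int:
    """KV cache size in bytes for one sequence (bytes_per_elem=2 for FP16/BF16)."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem


def reserve_mb(n_bytes: int) -> int:
    """Round up to whole MiB (assumed unit of GREENBOOST_KV_RESERVE_MB)."""
    return math.ceil(n_bytes / 2**20)


if __name__ == "__main__":
    size = kv_cache_bytes(n_layers=32, n_kv_heads=4, head_dim=128, seq_len=131072)
    print(f"GREENBOOST_KV_RESERVE_MB={reserve_mb(size)}")
```

With these hypothetical dimensions a 131072-token context needs exactly 8192 MiB, the reserve used in the command that follows. Substitute the layer count, KV-head count and head dimension from your model's config, and scale by the number of full-length sequences you expect to serve concurrently.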
```bash
# Set 8 GB reserve before starting vLLM (131K context needs ~7–9 GB KV)
GREENBOOST_KV_RESERVE_MB=8192 \
GREENBOOST_KV_OVERFLOW=1 \
GREENBOOST_ACTIVE=1 LD_PRELOAD=/usr/local/lib/libgreenboost_cuda.so \
  python -m vllm.entrypoints.openai.api_server \
  --model /opt/models/glm-4.7-flash-hf \
  --max-model-len 131072
```

--------------------------------

### Deploy Test LLM Workload and Monitor Logs

Source: https://context7.com/isolatedoctopi/greenboost/llms.txt

Applies a sample LLM workload to the Kubernetes cluster and streams its logs. This helps verify the GreenBoost deployment and monitor the inference process.

```bash
kubectl apply -f k8s-examples/greenboost-llm-pod.yaml
kubectl logs -f llm-inference -n greenboost-llm -c ollama
```

--------------------------------

### Run LLM in Docker with GreenBoost Path B

Source: https://gitlab.com/isolatedoctopi/greenboost/-/blob/main/CONTAINER_VM_MODE.md

This command demonstrates how to run a Docker container with GreenBoost enabled. It preloads the GreenBoost CUDA library and sets the necessary environment variable. Ensure the GreenBoost library is mounted into the container.

```bash
docker run --gpus all \
  -e GREENBOOST_ACTIVE=1 \
  -v /usr/local/lib/libgreenboost_cuda.so:/usr/local/lib/libgreenboost_cuda.so \
  -e LD_PRELOAD=/usr/local/lib/libgreenboost_cuda.so \
  my-llm-image python run_model.py
```

--------------------------------

### Run vLLM with GreenBoost

Source: https://gitlab.com/isolatedoctopi/greenboost/-/blob/main/DOCUMENTATION.md

vLLM loads libcuda lazily. Use `greenboost run` for command-line execution, or configure the systemd service with the necessary environment variables.
```bash
greenboost run python -m vllm.entrypoints.openai.api_server \
  --model /opt/models/glm-4.7-flash-hf \
  --dtype float16 \
  --max-model-len 131072 \
  --gpu-memory-utilization 0.95
```

```ini
[Service]
Environment="LD_PRELOAD=/usr/local/lib/libgreenboost_cuda.so"
Environment="GREENBOOST_ACTIVE=1"
```

--------------------------------

### TensorFlow with XLA and GreenBoost Configuration

Source: https://gitlab.com/isolatedoctopi/greenboost/-/blob/main/DOCUMENTATION.md

Run XLA training scripts with GreenBoost, disabling the phase-detection and KV-overflow heuristics. This prevents XLA's scratch-buffer pre-allocations from confusing the phase detector.

```bash
GREENBOOST_PHASE_DETECT=0 GREENBOOST_KV_OVERFLOW=0 \
GREENBOOST_ACTIVE=1 LD_PRELOAD=/usr/local/lib/libgreenboost_cuda.so \
  python xla_training.py
```

--------------------------------

### TensorFlow Memory Growth Configuration

Source: https://gitlab.com/isolatedoctopi/greenboost/-/blob/main/DOCUMENTATION.md

Configure TensorFlow to use memory growth instead of pre-allocating all GPU memory at startup, which lets GreenBoost manage allocations more effectively. Note that LD_PRELOAD must be set before Python starts.

```python
import os
os.environ["GREENBOOST_ACTIVE"] = "1"
# LD_PRELOAD must be set before Python starts, not here

import tensorflow as tf

# Prevent TF from trying to allocate the entire T1+T2 virtual pool at once
gpus = tf.config.list_physical_devices("GPU")
for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)

# From this point forward cudaMalloc calls are GreenBoost-managed:
#   weights → T1, KV-like buffers → T1-priority (if phase heuristic fires)
model = tf.keras.models.load_model("/opt/models/my_model.keras")
```

--------------------------------

### GreenBoost Profile Storage Structure

Source: https://gitlab.com/isolatedoctopi/greenboost/-/blob/main/profiles/README.md

Illustrates the directory structure for GreenBoost profiles, including default and resolved profile files.
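Given the layout in the tree that follows, where `active_profile.md` is a symlink into `profiles/`, a script can locate the active profile by resolving that symlink and then read the frontmatter fields shown in the profile-format example earlier. This is a hypothetical helper, not part of GreenBoost: it assumes the frontmatter holds only flat `key: value` pairs and skips a YAML dependency for that reason; switch to a real YAML parser if profiles ever contain nested values.

```python
from pathlib import Path

# Hypothetical helpers (not part of GreenBoost) for a directory laid out like
# /etc/greenboost/. Assumes active_profile.md is a symlink into profiles/ and
# that the frontmatter contains only flat "key: value" pairs.


def active_profile_path(root: Path) -> Path:
    """Follow the active_profile.md symlink to the real profile file."""
    return (root / "active_profile.md").resolve(strict=True)


def read_frontmatter(path: Path) -> dict:
    """Parse the block between the first two '---' lines into a dict."""
    lines = path.read_text().splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    values = {}
    for line in lines[1:]:
        if line.strip() == "---":  # end of frontmatter
            break
        key, sep, raw = line.partition(":")
        if not sep:
            continue
        raw = raw.strip()
        # Integers such as physical_vram_gb become int; everything else stays str
        values[key.strip()] = int(raw) if raw.lstrip("-").isdigit() else raw
    return values


def list_profiles(root: Path) -> list:
    """Names of all profile files under profiles/."""
    return sorted(p.name for p in (root / "profiles").glob("*.md"))
```

On an installed system, `read_frontmatter(active_profile_path(Path("/etc/greenboost")))` would return fields such as `physical_vram_gb` and `tier3_backend` from the active profile.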
```text
/etc/greenboost/
├── profiles/
│   ├── default.md          # auto-generated on install
│   └── resolved_.md        # written on --profile conflict resolution
└── active_profile.md       # symlink → profiles/.md
```

--------------------------------

### Activate GreenBoost in Non-Login Contexts

Source: https://gitlab.com/isolatedoctopi/greenboost/-/blob/main/DOCUMENTATION.md

For non-login contexts such as cron jobs or Docker entrypoints, use the `greenboost run` wrapper or set GREENBOOST_ACTIVE=1 and LD_PRELOAD explicitly. The same two variables are what you set in systemd units or Docker environments.

```bash
# Wrapper — sets GREENBOOST_ACTIVE=1 for one command
greenboost run python your_script.py

# Or inline
GREENBOOST_ACTIVE=1 LD_PRELOAD=/usr/local/lib/libgreenboost_cuda.so python your_script.py

# Or in a systemd unit / Docker environment (unit-file syntax, not shell):
# Environment="GREENBOOST_ACTIVE=1"
# Environment="LD_PRELOAD=/usr/local/lib/libgreenboost_cuda.so"
```

--------------------------------

### Install GreenBoost DRA Driver with Helm

Source: https://context7.com/isolatedoctopi/greenboost/llms.txt

Deploys the GreenBoost DRA driver to a Kubernetes cluster using Helm. This configuration enables GreenBoost features, NVLink pool aggregation, and metrics export. Specify the appropriate values file for cluster sizing.

```bash
helm install --wait --generate-name deployments/helm/greenboost-dra-driver \
  --namespace greenboost-system \
  --values deployments/helm/greenboost-dra-driver/values-v100-cluster.yaml \
  --set greenboost.enable=true \
  --set greenboost.nvlinkPool=true \
  --set metricsExporter.enabled=true
```

--------------------------------

### GreenBoost Core Commands

Source: https://context7.com/isolatedoctopi/greenboost/llms.txt

Basic commands for checking status, cleaning memory, and loading/unloading the kernel module.

```bash
greenboost status
```

```bash
greenboost clean memory
```

```bash
sudo greenboost load
```

```bash
sudo greenboost unload
```

```bash
greenboost run [args...]
```

--------------------------------

### Text Generation Inference (TGI) with GreenBoost

Source: https://context7.com/isolatedoctopi/greenboost/llms.txt

Launch TGI's text-generation-launcher with GreenBoost. The command specifies the model and shard count, and GreenBoost manages memory during inference.

```bash
greenboost run text-generation-launcher \
  --model-id /opt/models/glm-4.7-flash-hf \
  --num-shard 1 \
  --max-total-tokens 131072
```

--------------------------------

### Running TensorFlow Scripts with GreenBoost

Source: https://gitlab.com/isolatedoctopi/greenboost/-/blob/main/DOCUMENTATION.md

Launch TensorFlow scripts with GreenBoost enabled by setting environment variables. Ensure LD_PRELOAD is set before the script starts.

```bash
GREENBOOST_ACTIVE=1 LD_PRELOAD=/usr/local/lib/libgreenboost_cuda.so \
  python your_tf_script.py
```

--------------------------------

### Check GreenBoost Kernel Module Parameters

Source: https://gitlab.com/isolatedoctopi/greenboost/-/blob/main/k8s-deployment/INSTALL_CLUSTER.md

Inspect the parameters of the loaded GreenBoost kernel module using `modinfo`. Replace `<pod-name>` with the relevant pod name.

```bash
# Check module parameters
kubectl exec -it <pod-name> -- modinfo greenboost
```

--------------------------------

### ExLlamaV3 Native GB_ALLOC_* Usage Pattern

Source: https://gitlab.com/isolatedoctopi/greenboost/-/blob/main/DOCUMENTATION.md

This C code shows the typical pattern for allocating memory with specific flags through the `GB_IOCTL_ALLOC` ioctl in ExLlamaV3: set up a `gb_alloc_req` structure with the desired size and flags, then keep the returned handle if the allocation succeeds.
```c
struct gb_alloc_req req = {
    .size_bytes = kv_size,
    .flags      = GB_ALLOC_KV_CACHE | GB_ALLOC_T1_PRIORITY,
    .tier_hint  = 1,   /* prefer T1 */
};
if (ioctl(gb_fd, GB_IOCTL_ALLOC, &req) == 0)
    kv_handle = req.handle;
```

--------------------------------

### GreenBoost Profile Management CLI Commands

Source: https://gitlab.com/isolatedoctopi/greenboost/-/blob/main/profiles/README.md

Provides essential bash commands for managing GreenBoost profiles, such as creating, showing, listing, activating, and diffing profiles.

```bash
# Profile management
sudo ./greenboost_setup.sh profile create          # auto-detect hardware → write default.md
sudo ./greenboost_setup.sh profile show            # print active profile
sudo ./greenboost_setup.sh profile show            # print specific file
sudo ./greenboost_setup.sh profile list            # list available profiles
sudo ./greenboost_setup.sh profile activate        # set active_profile.md symlink
sudo ./greenboost_setup.sh profile diff [file]     # compare profile vs live hardware

# Load with a user-supplied profile (any command accepts --profile)
sudo ./greenboost_setup.sh --profile ~/my_profile.md load
sudo ./greenboost_setup.sh --profile ~/my_profile.md full-install
```

--------------------------------

### Build GreenBoost Kernel Module

Source: https://gitlab.com/isolatedoctopi/greenboost/-/blob/main/k8s-deployment/INSTALL_CLUSTER.md

Prepares and builds the GreenBoost kernel module on all cluster nodes using a DaemonSet. This involves creating a ConfigMap with module source files and applying a DaemonSet definition.
```bash
# Store the module sources in a ConfigMap for the builder pods
kubectl create configmap greenboost-module \
  --from-file=greenboost.c \
  --from-file=greenboost_ioctl.h

# Create DaemonSet to build module
kubectl apply -f k8s-deployment/greenboost-module-builder.yaml
```

--------------------------------

### Verify GreenBoost Activation

Source: https://gitlab.com/isolatedoctopi/greenboost/-/blob/main/DOCUMENTATION.md

These commands help verify that GreenBoost is active and working correctly. Checking the `/sys/class/greenboost/greenboost/status` file after loading a large model confirms whether the T2 pool is in use. The second command is a placeholder for confirming virtual VRAM visibility.

```bash
# Should show T2 pool in use (non-zero) after loading a model larger than 12 GB:
cat /sys/class/greenboost/greenboost/status
```

```bash
# Confirm virtual VRAM is visible (should report T1+T2 total, not just physical VRAM):
```

--------------------------------

### Vulkan Shader Boost Daemon Actions

Source: https://gitlab.com/isolatedoctopi/greenboost/-/blob/main/ARCHITECTURE.md

Details the actions the `greenboost-shader-boost.service` daemon performs to optimize `fossilize_replay` workers: renicing, raising I/O priority, and pinning to CPU cores.

```text
| Action           | Effect                                                 |
|------------------|--------------------------------------------------------|
| `renice -5`      | Elevates above all nice-0 background tasks             |
| `ionice -c2 -n0` | Best-effort I/O at highest priority                    |
| `taskset 0-`     | Pins to P-cores (auto-detected from sysfs `core_type`) |
```

--------------------------------

### ExLlamaV3 Integration with GreenBoost

Source: https://context7.com/isolatedoctopi/greenboost/llms.txt

Run ExLlamaV3 scripts with GreenBoost's KV cache overflow support. Set the `GREENBOOST_KV_OVERFLOW=1` environment variable to enable T1 allocation priority.
```bash
GREENBOOST_KV_OVERFLOW=1 greenboost run python your_exllama_script.py
```

--------------------------------

### GreenBoost Diagnostics Commands

Source: https://context7.com/isolatedoctopi/greenboost/llms.txt

Commands for running benchmarks, collecting logs, and clearing log data.

```bash
greenboost benchmark
```

```bash
greenboost logs
```

```bash
greenboost inference-logs
```

```bash
greenboost proton-logs
```

```bash
sudo greenboost clear logs
```

```bash
sudo greenboost clear inference-logs
```

--------------------------------

### Inspect ServiceMonitor Configuration

Source: https://gitlab.com/isolatedoctopi/greenboost/-/blob/main/k8s-deployment/monitoring.md

Print the YAML configuration of the `greenboost-metrics` ServiceMonitor in the `monitoring` namespace.

```bash
kubectl get servicemonitor greenboost-metrics -n monitoring -o yaml
```

--------------------------------

### Deploy GreenBoost Metrics Exporter

Source: https://gitlab.com/isolatedoctopi/greenboost/-/blob/main/k8s-deployment/monitoring.md

Apply the Kubernetes manifests for the GreenBoost metrics exporter.

```bash
kubectl apply -f deployments/helm/greenboost-dra-driver/exporter/
```

--------------------------------

### vLLM Integration with GreenBoost

Source: https://context7.com/isolatedoctopi/greenboost/llms.txt

Launch a vLLM OpenAI-compatible API server under GreenBoost. When vLLM sizes its memory budget from `--gpu-memory-utilization`, its free-memory query is answered by GreenBoost with T1 free space.

```bash
greenboost run python -m vllm.entrypoints.openai.api_server \
  --model /opt/models/glm-4.7-flash-hf \
  --max-model-len 131072 \
  --gpu-memory-utilization 0.95
```

--------------------------------

### Optimal Loading Order for PyTorch with GreenBoost

Source: https://gitlab.com/isolatedoctopi/greenboost/-/blob/main/DOCUMENTATION.md

Load model weights first, then allocate the KV cache. This order lets GreenBoost's phase detector identify the load and inference phases correctly and maximizes T1 utilization.
```python
import torch

# MyModel, batch, n_layers, seq_len and d_model are placeholders for your own model and sizes.

# 1. Load model weights (phase = MODEL_LOAD: reserve inactive, weights fill T1)
model = MyModel().cuda()
model.load_state_dict(torch.load("model.pt", map_location="cuda"))

# 2. Allocate KV cache (phase = INFERENCE: reserve activates, KV lands in T1)
kv_cache = torch.zeros(batch, n_layers, seq_len, d_model, device="cuda")
# ↑ GreenBoost auto-classifies this as KV via quiet-gap heuristic
```

--------------------------------

### Ollama Integration with GreenBoost

Source: https://context7.com/isolatedoctopi/greenboost/llms.txt

Run Ollama models with GreenBoost. In login shells this is handled automatically; for non-login shells or Docker, use `greenboost run ollama serve`. With Ollama 0.18+, the VMM path is intercepted automatically.

```bash
ollama run glm-4.7-flash:q8_0
```

```bash
greenboost run ollama serve
```

--------------------------------

### GreenBoost Profile Management Commands

Source: https://context7.com/isolatedoctopi/greenboost/llms.txt

Commands for managing GreenBoost profiles: creating, listing, showing, activating, and diffing.

```bash
greenboost profile
```

```bash
sudo greenboost profile create
```

```bash
greenboost profile list
```

```bash
greenboost profile show
```

```bash
greenboost profile show <file>
```

```bash
sudo greenboost profile activate
```

```bash
greenboost profile diff [file]
```

--------------------------------

### Check NVLink Fabric State

Source: https://gitlab.com/isolatedoctopi/greenboost/-/blob/main/k8s-deployment/INSTALL_CLUSTER.md

Read the `nvlink_ready` file to check whether the NVLink fabric is active and ready for use by GreenBoost. Replace `<pod-name>` with the relevant pod name.

```bash
# Check NVLink fabric state
kubectl exec -it <pod-name> -- cat /sys/class/greenboost/greenboost/nvlink_ready
```

--------------------------------

### Run ExLlamaV3 Script with GreenBoost venv

Source: https://gitlab.com/isolatedoctopi/greenboost/-/blob/main/DOCUMENTATION.md

Execute an ExLlamaV3 script with the GreenBoost virtual environment's Python interpreter.
The environment variables preload GreenBoost's CUDA library (`libgreenboost_cuda.so`) for ExLlamaV3.

```bash
GREENBOOST_ACTIVE=1 LD_PRELOAD=/usr/local/lib/libgreenboost_cuda.so \
  /opt/greenboost/venv/bin/python your_exllama_script.py
```

--------------------------------

### Create ServiceMonitor for Metrics

Source: https://gitlab.com/isolatedoctopi/greenboost/-/blob/main/k8s-deployment/monitoring.md

Apply the ServiceMonitor configuration so Prometheus can discover and scrape GreenBoost metrics.

```bash
kubectl apply -f k8s-deployment/servicemonitor.yaml
```

--------------------------------

### Manage Session Priority with Python fcntl

Source: https://context7.com/isolatedoctopi/greenboost/llms.txt

Use Python's `fcntl` module to issue GreenBoost's session-management ioctls. `gb_session_active()` promotes a session's T2 buffers to the LRU head (evicted last), and `gb_session_idle()` demotes them to the LRU tail (evicted first), which steers eviction priority in multi-model deployments. The `/dev/greenboost` device must exist; otherwise the helpers silently do nothing.

```python
import fcntl, struct, os

# Equivalent to _IOW('G', nr, 8): write direction at bit 30, size at 16, type at 8, number at 0
_IOC_SESSION_IDLE   = (1 << 30) | (ord('G') << 8) | 18 | (8 << 16)
_IOC_SESSION_ACTIVE = (1 << 30) | (ord('G') << 8) | 19 | (8 << 16)

def _gb_session_ioctl(cmd: int) -> None:
    dev = "/dev/greenboost"
    if not os.path.exists(dev):
        return
    buf = struct.pack("II", 0, 0)  # pid=0 → caller's PID, reserved=0
    fd = os.open(dev, os.O_RDWR)
    try:
        fcntl.ioctl(fd, cmd, buf)
    finally:
        os.close(fd)

def gb_session_idle() -> None:
    _gb_session_ioctl(_IOC_SESSION_IDLE)

def gb_session_active() -> None:
    _gb_session_ioctl(_IOC_SESSION_ACTIVE)

# ── Usage in a multi-model server ────────────────────────────────────────────
# When a request arrives for model A:
gb_session_active()  # pin model A's T2 weights to LRU head (last evicted)

# Process request ...
response = model_a.generate(prompt)

# When model A goes idle (no pending requests):
gb_session_idle()    # move model A's T2 weights to LRU tail (first evicted)
# Model B (active) will keep its T2 weights; model A's become eviction candidates
```

--------------------------------

### Optimize Kubelet Plugin Resources

Source: https://gitlab.com/isolatedoctopi/greenboost/-/blob/main/k8s-deployment/INSTALL_CLUSTER.md

Set CPU and memory requests and limits for the kubelet plugin in `values.yaml` to keep its performance and stability predictable.

```yaml
kubeletPlugin:
  resources:
    requests:
      cpu: "100m"
      memory: "256Mi"
    limits:
      cpu: "500m"
      memory: "512Mi"
```

--------------------------------

### Run PyTorch Script with GreenBoost

Source: https://gitlab.com/isolatedoctopi/greenboost/-/blob/main/DOCUMENTATION.md

Run a PyTorch script with GreenBoost, either through the `greenboost run` command or by setting the environment variables manually.

```bash
greenboost run python your_script.py
```

```bash
GREENBOOST_ACTIVE=1 LD_PRELOAD=/usr/local/lib/libgreenboost_cuda.so python your_script.py
```

--------------------------------

### Load Transformers Model with device_map='auto'

Source: https://gitlab.com/isolatedoctopi/greenboost/-/blob/main/DOCUMENTATION.md

Load a Hugging Face Transformers model with `device_map="auto"`. Because GreenBoost reports total VRAM as T1+T2, the entire model can be loaded onto the GPU.
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model = AutoModelForCausalLM.from_pretrained(
    "/opt/models/glm-4.7-flash-hf",
    torch_dtype=torch.bfloat16,
    device_map="auto",  # sees T1+T2 total, so the entire model loads onto T1+T2
)
tokenizer = AutoTokenizer.from_pretrained("/opt/models/glm-4.7-flash-hf")

messages = [{"role": "user", "content": "Hello!"}]
input_ids = tokenizer.apply_chat_template(
    messages, tokenize=True, return_tensors="pt", add_generation_prompt=True
).to(model.device)

output = model.generate(input_ids, max_new_tokens=300)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
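To make "sees T1+T2 total" concrete, the sketch below is a rough capacity check. It is hypothetical and not part of GreenBoost: it estimates weight memory as parameter count × bytes per parameter (ignoring KV cache, activations, and framework overhead) and compares the result with the example `greenboost status` tier sizes (T1 = 12 GB, T2 = 51 GB). It does not model how GreenBoost actually places pages across tiers.

```python
# Hypothetical capacity check (not a GreenBoost API): does a model's weight
# footprint fit in physical VRAM (T1) alone, or only in T1 + T2?
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2, "int8": 1}


def weights_gb(n_params: float, dtype: str = "bfloat16") -> float:
    """Approximate weight size in GB (1 GB = 1e9 bytes); ignores KV cache,
    activations, and framework overhead."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


def capacity_check(n_params: float, t1_gb: float, t2_gb: float,
                   dtype: str = "bfloat16") -> str:
    """Classify a model as fitting in T1, needing T1+T2, or exceeding both."""
    need = weights_gb(n_params, dtype)
    if need <= t1_gb:
        return "fits in T1"
    if need <= t1_gb + t2_gb:
        return "needs T1+T2"
    return "exceeds T1+T2"


if __name__ == "__main__":
    # Example tier sizes from the greenboost status output: T1 = 12 GB, T2 = 51 GB.
    for n in (5e9, 30e9, 40e9):
        print(f"{n / 1e9:.0f}B params (bf16): {weights_gb(n):.1f} GB -> "
              f"{capacity_check(n, t1_gb=12, t2_gb=51)}")
```

With those example tiers, a 30B-parameter bf16 model (about 60 GB of weights) only fits once T2 is counted, while a 40B model (about 80 GB) exceeds T1+T2 even before the KV cache is allocated.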