### Install Pre-commit and Set Up Git Hooks

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/docs/source/contributing.mdx

Install pre-commit for managing Git hooks, then install the hooks for the project. These hooks run automatically on commit.

```bash
pip install pre-commit
pre-commit install
```

--------------------------------

### Install Required Libraries

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/docs/source/fsdp_qlora.md

Ensure you have the latest versions of bitsandbytes, accelerate, transformers, peft, and trl installed for FSDP-QLoRA training.

```bash
pip install -U bitsandbytes accelerate transformers peft trl
```

--------------------------------

### Install bitsandbytes Package

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/COMPILE_H100_L40.md

Install the compiled bitsandbytes package. Use the `-e` flag for an editable/development install.

```bash
pip install -e .
```

--------------------------------

### Install Preview Wheel (Windows x86-64)

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/docs/source/installation.mdx

Install the latest preview wheel for Windows x86-64. Use `--no-deps` to avoid reinstalling dependencies if they are already met.

```bash
# Note: if you don't want to reinstall our dependencies, append the `--no-deps` flag!
pip install --force-reinstall https://github.com/bitsandbytes-foundation/bitsandbytes/releases/download/continuous-release_main/bitsandbytes-1.33.7.preview-py3-none-win_amd64.whl
```

--------------------------------

### Install Build Tools on Ubuntu

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/docs/source/installation.mdx

Installs essential build tools like a compiler and CMake on Ubuntu systems.
```bash
apt-get install -y build-essential cmake
```

--------------------------------

### Install Preview Wheel (macOS ARM64)

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/docs/source/installation.mdx

Install the latest preview wheel for macOS ARM64. Use `--no-deps` to avoid reinstalling dependencies if they are already met.

```bash
# Note: if you don't want to reinstall our dependencies, append the `--no-deps` flag!
pip install --force-reinstall https://github.com/bitsandbytes-foundation/bitsandbytes/releases/download/continuous-release_main/bitsandbytes-1.33.7.preview-py3-none-macosx_14_0_arm64.whl
```

--------------------------------

### Install bitsandbytes

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/docs/source/quickstart.mdx

Install the bitsandbytes library using pip. Requires Python 3.10+ and PyTorch 2.4+.

```bash
pip install bitsandbytes
```

--------------------------------

### Install Preview Wheel (Linux x86_64)

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/docs/source/installation.mdx

Install the latest preview wheel for Linux x86_64. Use `--no-deps` to avoid reinstalling dependencies if they are already met.

```bash
# Note: if you don't want to reinstall our dependencies, append the `--no-deps` flag!
# x86_64 (most users)
pip install --force-reinstall https://github.com/bitsandbytes-foundation/bitsandbytes/releases/download/continuous-release_main/bitsandbytes-1.33.7.preview-py3-none-manylinux_2_24_x86_64.whl
```

--------------------------------

### Install Preview Wheel (Linux ARM/aarch64)

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/docs/source/installation.mdx

Install the latest preview wheel for Linux ARM/aarch64. Use `--no-deps` to avoid reinstalling dependencies if they are already met.

```bash
# Note: if you don't want to reinstall our dependencies, append the `--no-deps` flag!
# ARM/aarch64
pip install --force-reinstall https://github.com/bitsandbytes-foundation/bitsandbytes/releases/download/continuous-release_main/bitsandbytes-1.33.7.preview-py3-none-manylinux_2_24_aarch64.whl
```

--------------------------------

### Install Preview Wheel (Windows ARM64)

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/docs/source/installation.mdx

Install the latest preview wheel for Windows ARM64. Requires Python 3.12 or newer. Use `--no-deps` to avoid reinstalling dependencies if they are already met.

```bash
# Note: if you don't want to reinstall our dependencies, append the `--no-deps` flag!
# Requires Python >= 3.12
pip install --force-reinstall https://github.com/bitsandbytes-foundation/bitsandbytes/releases/download/continuous-release_main/bitsandbytes-1.33.7.preview-py3-none-win_arm64.whl
```

--------------------------------

### Install CPU-only bitsandbytes from Source (Linux/macOS)

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/docs/source/installation.mdx

Clone the repository and install the package in editable mode for CPU-only builds. This is the standard method for Linux and macOS.

```bash
git clone https://github.com/bitsandbytes-foundation/bitsandbytes.git && cd bitsandbytes/
pip install -e .
```

--------------------------------

### Install Test Dependencies

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/agents/testing_guide.md

Install the Python packages required for running tests.

```bash
pip install einops lion-pytorch pytest pytest-xdist scipy transformers
```

--------------------------------

### Verify bitsandbytes Installation

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/COMPILE_H100_L40.md

Import the bitsandbytes library in Python to confirm a successful installation and check the version.
```python
import bitsandbytes as bnb
print(f'bitsandbytes version: {bnb.__version__}')
print('Success!')
```

--------------------------------

### Setup Worktree Instructions

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/agents/dispatch_guide.md

Provides essential commands for setting up a worktree for issue fixing. It's crucial to use a worktree and not modify the main repository directly.

```markdown
## Setup

IMPORTANT: You MUST create a worktree. Do NOT work in ~/git/bitsandbytes directly.

cd ~/git/bitsandbytes
git worktree add ~/git/bnb-fix- -b fix/issue-
cd ~/git/bnb-fix-

Read agents/testing_guide.md for build and test instructions. Build the project
before making changes so you can verify your setup works.
```

--------------------------------

### QLoRA Fine-tuning Setup

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/docs/source/quickstart.mdx

Prepare a model for k-bit training using QLoRA by combining 4-bit quantization with LoRA adapters. Requires 'transformers' and 'peft' libraries.

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Load 4-bit model
bnb_config = BitsAndBytesConfig(load_in_4bit=True)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    quantization_config=bnb_config,
)

# Prepare for training
model = prepare_model_for_kbit_training(model)

# Add LoRA adapters
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# Now train with your preferred trainer
```

--------------------------------

### Install ROCm SDK wheels for Windows Compilation

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/docs/source/installation.mdx

Install necessary ROCm SDK wheels and tools for compiling bitsandbytes from source on Windows.
This method uses pip-installable wheels instead of a system-wide ROCm installation.

```bash
# Install ROCm SDK wheels (adjust version as needed)
pip install ninja cmake
pip install \
    https://repo.radeon.com/rocm/windows/rocm-rel-7.2.1/rocm_sdk_core-7.2.1-py3-none-win_amd64.whl \
    https://repo.radeon.com/rocm/windows/rocm-rel-7.2.1/rocm_sdk_devel-7.2.1-py3-none-win_amd64.whl \
    https://repo.radeon.com/rocm/windows/rocm-rel-7.2.1/rocm_sdk_libraries_custom-7.2.1-py3-none-win_amd64.whl \
    https://repo.radeon.com/rocm/windows/rocm-rel-7.2.1/rocm-7.2.1.tar.gz

# Expand the devel tarball
rocm-sdk init
```

--------------------------------

### Start Issue Triage Session

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/agents/issue_triage_workflow.md

Commands to navigate to the project directory and start the Claude agent for issue triage. Use this to initiate a session with specific instructions.

```bash
cd ~/git/bitsandbytes
claude
```

--------------------------------

### Dispatching to Ops using torch.ops

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/agents/code_standards.md

Example of dispatching to the bitsandbytes ops namespace using torch.ops for operations like quantize_4bit.

```python
# GOOD: use torch.ops for dispatch
_out, _absmax = torch.ops.bitsandbytes.quantize_4bit.default(A, blocksize, quant_type, quant_storage)
```

--------------------------------

### Setup Working Environment

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/agents/dispatch_guide.md

Commands to set up a new Git worktree for a specific issue branch. This isolates your changes and allows for a clean development environment.
```bash
cd ~/git/bitsandbytes
git worktree add ~/git/bnb-fix-1810 -b fix/issue-1810
cd ~/git/bnb-fix-1810
```

--------------------------------

### 4-bit Linear Module Example

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/agents/code_standards.md

Demonstrates the usage of the `Linear4bit` module, including how to move it to the CUDA device for quantization. This class is based on the QLoRA paper.

```python
class Linear4bit(nn.Linear):
    """
    This class is the base module for the 4-bit quantization algorithm presented in [QLoRA](https://arxiv.org/abs/2305.14314).

    Example:

    ```python
    import bitsandbytes as bnb

    linear_q = bnb.nn.Linear4bit(64, 64)
    linear_q = linear_q.to("cuda")  # Quantization happens here
    ```
    """
```

--------------------------------

### Build System Dependencies

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/agents/security_guide.md

Example of specifying build-time dependencies in pyproject.toml. These also require scrutiny for existence and legitimacy.

```toml
# setup.py / pyproject.toml install hooks
[build-system]
requires = ["setuptools", "new-build-tool"]  # Build-time dependencies too
```

--------------------------------

### Output Launch Commands for Issues

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/agents/dispatch_guide.md

Provides example commands to launch worker agents for specific issues. Each command includes a reference to a prompt file containing detailed instructions for the agent.

```bash
## Launch Commands

Issue #1810 — LARS missing in str2optimizer32bit:
claude "Please read /tmp/bnb-agents/issue-1810.md and follow the instructions."

Issue #919 — Noisy logs:
claude "Please read /tmp/bnb-agents/issue-919.md and follow the instructions."
```

--------------------------------

### Verify System Dependencies

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/COMPILE_H100_L40.md

Check if CMake, Python, GCC, and CUDA Toolkit are installed and meet the minimum version requirements.

```bash
cmake --version
python3 --version
gcc --version
nvcc --version
```

--------------------------------

### Example of Downstream Breakage Report

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/agents/pr_review_guide.md

This example shows how to report critical breakages in downstream projects due to API changes. It lists affected projects and the specific components that will break.

```text
CRITICAL: Removing Params4bit.quant_state attribute
- Transformers: dequantize_bnb_weight() accesses weight.quant_state -> breaks
- PEFT: all 4-bit merge/unmerge operations access weight.quant_state -> breaks
- Accelerate: set_module_tensor_to_device() checks getattr(weight, "quant_state") -> breaks
- TGI: Linear4bit.forward() accesses self.weight.quant_state -> breaks
```

--------------------------------

### Compile bitsandbytes from Source on Linux

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/docs/source/installation.mdx

Clones the bitsandbytes repository, configures the build with CMake for CUDA, compiles the library, and installs it in editable mode.

```bash
git clone https://github.com/bitsandbytes-foundation/bitsandbytes.git && cd bitsandbytes/
cmake -DCOMPUTE_BACKEND=cuda -S .
make
pip install -e .   # `-e` for "editable" install, when developing BNB (otherwise leave that out)
```

--------------------------------

### QuantState Serialization Keys

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/agents/architecture_guide.md

Example keys used when storing QuantState components in a state dictionary for checkpointing.
```plaintext
weight.quant_state.bitsandbytes__nf4
weight.absmax
weight.quant_map
weight.nested_absmax
weight.nested_quant_map
weight.quant_state.nested_blocksize
weight.quant_state.nested_dtype
weight.quant_state.nested_offset
```

--------------------------------

### Compile bitsandbytes from Source on Windows

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/docs/source/installation.mdx

Clones the bitsandbytes repository, configures the build with CMake for CUDA, compiles the library in Release mode, and installs it in editable mode.

```bash
git clone https://github.com/bitsandbytes-foundation/bitsandbytes.git && cd bitsandbytes/
cmake -DCOMPUTE_BACKEND=cuda -S .
cmake --build . --config Release
pip install -e .   # `-e` for "editable" install, when developing BNB (otherwise leave that out)
```

--------------------------------

### Python Type Annotation Example

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/agents/code_standards.md

Provides a good example of Python type annotations, adhering to project conventions such as using Optional[X] instead of X | None and using typing.Optional and collections.abc.Sequence.

```python
# GOOD: matches the conventions used throughout
def quantize_4bit(
    A: torch.Tensor,
    absmax: Optional[torch.Tensor] = None,
    out: Optional[torch.Tensor] = None,
    blocksize=None,  # no annotation for simple defaults is OK
    compress_statistics=False,
    quant_type="fp4",
    quant_storage=torch.uint8,
) -> tuple[torch.Tensor, QuantState]:
```

--------------------------------

### Quantization Codebook Creation

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/agents/architecture_guide.md

Provides examples of creating pre-computed quantization codebooks for different bit precisions and configurations. These are essential for quantization operations.
```python
# Pre-computed quantization maps:
create_dynamic_map(signed=True, total_bits=8)  # Creates 256-entry dynamic quantization codebook
create_normal_map(offset=0.9677083, symmetric=False)  # NF4 codebook from normal distribution
create_fp4_map()  # FP4 codebook
```

--------------------------------

### Loading External Backends via Entrypoints

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/agents/security_guide.md

Demonstrates how bitsandbytes loads external backends using Python entrypoints. This mechanism executes arbitrary code from installed packages and poses a supply chain risk.

```python
# In __init__.py
extensions = entry_points(group="bitsandbytes.backends")
for ext in extensions:
    entry = ext.load()
    entry()  # Executes arbitrary code from any installed package
```

--------------------------------

### Build bitsandbytes Native Library

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/agents/testing_guide.md

Compile the native library for your GPU. Ensure your CUDA toolkit version is compatible or create a symlink if necessary. Install in editable mode afterwards.

```bash
# Find your GPU's compute capability
nvidia-smi --query-gpu=compute_cap --format=csv,noheader

# Build (replace 89 with your compute capability, e.g. 120 for Blackwell)
cmake -DCOMPUTE_BACKEND=cuda -DCOMPUTE_CAPABILITY="89" -S . -B build
cmake --build build -j$(nproc)

# If your CUDA toolkit version differs from PyTorch's CUDA version, create a symlink:
# e.g., toolkit is 12.4 but PyTorch expects 12.8.
# The link target is resolved relative to the link's own directory, so it is given without the `bitsandbytes/` prefix.
ln -sf libbitsandbytes_cuda124.so bitsandbytes/libbitsandbytes_cuda128.so

# Install in editable mode
pip install -e .
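
# Optional sanity check (an addition to this guide, not one of its original steps):
# confirm that Python can import the freshly built package and print its version.
# Assumes `python3` is the interpreter the package was installed into; the
# fallback message is purely illustrative.
python3 -c "import bitsandbytes as bnb; print(bnb.__version__)" \
    || echo "bitsandbytes is not importable yet; re-check the build and install output"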
```

--------------------------------

### Review pyproject.toml for Build System Changes

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/agents/security_guide.md

Examine changes to build system requirements and backends, as well as custom build scripts, which could introduce arbitrary code execution at install time.

```toml
# REVIEW — build system changes
[build-system]
requires = [...]  # New build dependencies
build-backend = "..."  # Changing the build backend

# BLOCK — custom build scripts that weren't there before
[tool.setuptools.cmdclass]
install = "custom_install.CustomInstall"  # Arbitrary code at install time
```

--------------------------------

### Quantize Tensor in 4-bit Blocks

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/agents/code_standards.md

Example of a docstring for the `quantize_4bit` function, detailing its arguments, supported datatypes, potential exceptions, and return values. Type annotations use backtick format.

```python
def quantize_4bit(
    A: torch.Tensor,
    ...
) -> tuple[torch.Tensor, QuantState]:
    """Quantize tensor A in blocks of 4-bit values.

    Quantizes tensor A by dividing it into blocks which are independently quantized.

    Args:
        A (`torch.Tensor`): The input tensor. Supports `float16`, `bfloat16`, or `float32` datatypes.
        blocksize (`int`, *optional*):
            The size of the blocks. Defaults to 128 on ROCm and 64 otherwise.
            Valid values are 32, 64, 128, 256, 512, 1024, 2048, and 4096.

    Raises:
        ValueError: Raised when the input data type is not supported.

    Returns:
        Tuple[`torch.Tensor`, `QuantState`]: A tuple containing the quantization results.
    """
```

--------------------------------

### Get Tensor Data Pointer

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/agents/api_surface.md

Gets the data pointer of a tensor, useful for ctypes calls. This is an internal function.
```python
F.get_ptr(A: Optional[Tensor]) -> Optional[ct.c_void_p]
```

--------------------------------

### Initialize and Train with SFTTrainer

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/docs/source/fsdp_qlora.md

Pass the configured model, PEFT config, tokenizer, and training arguments to the SFTTrainer for initiating the QLoRA training process.

```python
from trl import SFTTrainer

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=peft_config,
    processing_class=tokenizer,
    args=training_arguments,
)

trainer.train()
```

--------------------------------

### Configure XPU Build

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/CMakeLists.txt

Sets up the build for XPU (e.g., Intel Xe) by appending source files, defining the output name, and setting the C/C++ compilers to Intel's icx/icpx.

```cmake
list(APPEND SRC_FILES ${XPU_FILES})
string(APPEND BNB_OUTPUT_NAME "_xpu")
add_compile_definitions(BUILD_XPU)
set(CMAKE_C_COMPILER icx)
set(CMAKE_CXX_COMPILER icpx)
if(WIN32)
    set(CMAKE_CXX_COMPILER icx)
endif()
```

--------------------------------

### Initialize 32-bit Adam Optimizer

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/docs/source/optimizers.mdx

Demonstrates how to initialize a 32-bit Adam optimizer using `bitsandbytes`, specifying learning rate, betas, and optimizer bits.

```python
import bitsandbytes as bnb

adam = bnb.optim.Adam(model.parameters(), lr=0.001, betas=(0.9, 0.995), optim_bits=32)
```

--------------------------------

### Handling Optional Dependencies with Try-Except

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/agents/code_standards.md

When adding optional dependencies, use a `try-except ImportError` block to handle cases where the dependency is not installed. Raise a clear `ImportError` with instructions on how to install the necessary package, chaining the original exception.
```python
try:
    from scipy.stats import norm
except ImportError as ie:
    raise ImportError(
        "Scipy is required for `create_normal_map`. Install `bitsandbytes` with the `[test]` extra.",
    ) from ie
```

--------------------------------

### HIP Backend Configuration

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/CMakeLists.txt

Sets up the build for the HIP backend, finding hipblas and hiprand, linking against them, and configuring include and library paths. It also handles platform-specific linking and HIPBLASLT version checks.

```cmake
if(DEFINED ENV{ROCM_PATH})
    file(TO_CMAKE_PATH "$ENV{ROCM_PATH}" ROCM_PATH)
else()
    set(ROCM_PATH /opt/rocm)
endif()
list(APPEND CMAKE_PREFIX_PATH ${ROCM_PATH})
macro(find_package_and_print_version PACKAGE_NAME)
    find_package("${PACKAGE_NAME}" ${ARGN})
    message("${PACKAGE_NAME} VERSION: ${${PACKAGE_NAME}_VERSION}")
endmacro()
find_package_and_print_version(hipblas REQUIRED)
find_package_and_print_version(hiprand REQUIRED)

## hacky way of excluding hip::amdhip64 (with it linked many tests unexpectedly fail e.g. adam8bit because of inaccuracies)
## On Windows, we need to link amdhip64 explicitly
if(NOT WIN32)
    set_target_properties(hip::host PROPERTIES INTERFACE_LINK_LIBRARIES "")
    set_target_properties(hip-lang::host PROPERTIES INTERFACE_LINK_LIBRARIES "")
    set(CMAKE_HIP_IMPLICIT_LINK_LIBRARIES "")
endif()

target_include_directories(bitsandbytes PRIVATE ${CMAKE_SOURCE_DIR} ${CMAKE_SOURCE_DIR}/include ${ROCM_PATH}/include /include)
target_link_directories(bitsandbytes PRIVATE ${ROCM_PATH}/lib /lib)
target_link_libraries(bitsandbytes PUBLIC roc::hipblas hip::hiprand)

# On Windows, rocblas is not pulled in transitively by roc::hipblas
# and is needed because ops_hip.cuh uses rocblas_handle directly.
if(WIN32)
    target_link_libraries(bitsandbytes PUBLIC rocblas)
endif()

target_compile_definitions(bitsandbytes PUBLIC BNB_USE_HIP)
set_source_files_properties(${GPU_FILES} PROPERTIES LANGUAGE HIP)
set_target_properties(bitsandbytes PROPERTIES LINKER_LANGUAGE CXX)

if(HIP_VERSION VERSION_LESS "6.1")
    target_compile_definitions(bitsandbytes PUBLIC NO_HIPBLASLT)
else()
    find_package(hipblaslt)
    target_link_libraries(bitsandbytes PUBLIC roc::hipblaslt)
endif()
```

--------------------------------

### Masked Fill Operation

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/agents/architecture_guide.md

Example of using masked_fill to replace outliers with a specified value.

```python
A = A.masked_fill(outlier_mask, 0.0)
```

--------------------------------

### Refresh Issue Data

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/agents/issue_maintenance_guide.md

Run this command to refresh the issue data before starting the triage process.

```bash
python3 agents/fetch_issues.py
```

--------------------------------

### SwitchBackLinear

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/agents/api_surface.md

A Triton-based SwitchBack linear layer. Requires triton to be installed. It has a `prepare_for_eval()` method for pre-quantizing weights.

```APIDOC
## `SwitchBackLinear` — Triton-based SwitchBack

### Description
A Triton-based SwitchBack linear layer. Requires triton to be installed. It has a `prepare_for_eval()` method for pre-quantizing weights.

### Constructor
```python
bitsandbytes.nn.SwitchBackLinear(
    in_features: int,
    out_features: int,
    bias: bool = True,
    device=None,
    dtype=None,
    vector_wise_quantization: bool = False,
    mem_efficient: bool = False,
)
```

### Parent
`torch.nn.Linear`

### Stability
Experimental — requires triton.

### Notes
Has a `prepare_for_eval()` method that pre-quantizes weights.
```

--------------------------------

### Linear4bit Module Initialization

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/agents/architecture_guide.md

Initializes the Linear4bit module, a QLoRA component. Weights are wrapped in Params4bit and quantized lazily on device movement.

```python
class Linear4bit(nn.Linear):
    def __init__(self, input_features, output_features, bias=True,
                 compute_dtype=None, compress_statistics=True,
                 quant_type="fp4", quant_storage=torch.uint8, device=None):
        # Weight is wrapped in Params4bit (quantizes on .to(device))
        self.weight = Params4bit(self.weight.data, ...)
```

--------------------------------

### Check CUDA Toolkit Version

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/COMPILE_H100_L40.md

Verify the installed CUDA Toolkit version using the nvcc command.

```bash
# Check your CUDA Toolkit version
nvcc --version
```

--------------------------------

### Basic CMake Build Commands

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/CMakeLists.txt

Standard commands for building the project with GCC or MSVC. Ensure the CUDA Toolkit is on your PATH.

```bash
# For GCC:
cmake -B build . && cmake --build build
```

```bash
# For MSVC:
cmake -B build . && cmake --build build --config Release
```

--------------------------------

### Get CUBLAS Context Instance

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/agents/api_surface.md

Retrieves the singleton instance of the CUBLAS context manager for the current device. This is an internal API.

```python
F.CUBLAS_Context.get_instance() -> CUBLAS_Context
```

--------------------------------

### Get Global Page Manager Instance

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/agents/api_surface.md

Retrieves the singleton instance of the GlobalPageManager, which manages paged tensors for prefetching. This is an internal function.
```python
F.GlobalPageManager.get_instance() -> GlobalPageManager
```

--------------------------------

### Create Agent Prompt Directory

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/agents/dispatch_guide.md

Ensure the directory for agent prompt files exists before writing prompts.

```bash
mkdir -p /tmp/bnb-agents
```

--------------------------------

### Source File Definitions

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/CMakeLists.txt

Lists the source files for different backends (CPU, GPU, MPS, Metal, XPU).

```cmake
set(CPP_FILES csrc/cpu_ops.cpp csrc/pythonInterface.cpp)
```

```cmake
set(GPU_FILES csrc/ops.cu csrc/kernels.cu)
```

```cmake
set(MPS_FILES csrc/mps_ops.mm)
```

```cmake
set(METAL_FILES csrc/mps_kernels.metal)
```

```cmake
set(XPU_FILES csrc/xpu_ops.cpp csrc/xpu_kernels.cpp)
```

--------------------------------

### 8-bit Inference with Transformers

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/docs/source/quickstart.mdx

Load and run a model using 8-bit quantization for inference. This reduces weight memory by about 50% relative to 16-bit weights. Ensure you have the 'transformers' library installed.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    device_map="auto",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
)

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
inputs = tokenizer("Hello, my name is", return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0]))
```

--------------------------------

### Register Parameters for Optimization Management

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/docs/source/optimizers.mdx

Initialize the `GlobalOptimManager` and register model parameters while they are on the CPU before moving the model to GPU.
```python
import torch
import bitsandbytes as bnb

mng = bnb.optim.GlobalOptimManager.get_instance()

model = MyModel()
mng.register_parameters(model.parameters())
```

--------------------------------

### Run Full Test Suite with Timing Breakdown

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/agents/testing_guide.md

Run the complete test suite and include a timing breakdown for the slowest tests. This is useful for identifying performance bottlenecks within the test suite. The `--durations=20` option displays the 20 slowest tests.

```bash
pytest tests/ -v --tb=short -n 4 --durations=20
```

--------------------------------

### Check PyTorch CUDA and ROCm Versions

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/docs/source/errors.mdx

Print the CUDA or ROCm version PyTorch was compiled against. This helps identify version mismatches with installed libraries.

```python
import torch
print(torch.version.cuda)
print(torch.version.hip)
```

--------------------------------

### GlobalOptimManager Get Instance

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/agents/api_surface.md

Retrieves the singleton instance of GlobalOptimManager for per-parameter optimizer configuration overrides. Used by StableEmbedding and Embedding to force 32-bit states.

```python
bitsandbytes.optim.GlobalOptimManager.get_instance()
```

--------------------------------

### Run Full Pre-commit Linting Suite

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/CLAUDE.md

Mandatory step before pushing a PR branch. This command runs all CI lint hooks, including ruff, ruff format, typos, and clang-format. Do not run individual hooks.

```bash
pre-commit run --all-files
```

--------------------------------

### Enable CUDA Language and Find Toolkit

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/CMakeLists.txt

Enables the CUDA language for the project and finds the necessary CUDA Toolkit.
This will fail if CUDA is not installed or configured correctly.

```cmake
enable_language(CUDA) # This will fail if CUDA is not found
find_package(CUDAToolkit REQUIRED)
```

--------------------------------

### Configure bitsandbytes for FSDP-QLoRA

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/docs/source/fsdp_qlora.md

Set `bnb_4bit_quant_storage` to match the model's `torch_dtype` for FSDP compatibility. `bnb_4bit_compute_dtype` determines the computation data type, with `torch.bfloat16` recommended for stability.

```python
import torch  # needed for the torch.bfloat16 dtype arguments below
from transformers import BitsAndBytesConfig, AutoModelForCausalLM

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_storage=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-70b",
    quantization_config=bnb_config,
    torch_dtype=torch.bfloat16,
)
```

--------------------------------

### Configure LoraConfig for QLoRA

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/docs/source/integrations.mdx

Set up a LoraConfig object for QLoRA fine-tuning, specifying parameters like rank (r), alpha, target modules, dropout, and task type.

```python
from peft import LoraConfig

config = LoraConfig(
    r=16,
    lora_alpha=8,
    target_modules="all-linear",
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
```

--------------------------------

### Get 4-bit Quantization Type

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/agents/api_surface.md

Retrieves a codebook tensor for a specified 4-bit quantization type. Valid types include 'nf4', 'fp4', 'int4', and 'af4'.
```python
F.get_4bit_type(typename: str, device=None, blocksize=64) -> torch.Tensor
```

--------------------------------

### Getting Ctypes Pointers from Tensors

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/agents/code_standards.md

Uses the `get_ptr()` utility to obtain ctypes void pointers from PyTorch tensors, which are necessary for calling native C/C++ functions.

```python
from bitsandbytes.functional import get_ptr

ptrA = get_ptr(A)  # ct.c_void_p or None if A is None
ptrOut = get_ptr(out)
```

--------------------------------

### Pytest Assertion Examples

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/agents/code_standards.md

Write assertions that verify specific values, shapes, dtypes, and numerical accuracy using torch.testing.assert_close. Avoid assertions that only check for non-crashes.

```python
# GOOD: verifies actual correctness
assert out.shape == (10, 30)
assert out.dtype == torch.int32
assert out.device == A.device

# GOOD: numerical accuracy check
torch.testing.assert_close(dequantized, original, rtol=0.1, atol=0.01)

# GOOD: custom tolerance with count
def assert_all_approx_close(a, b, rtol=1e-3, atol=1e-3, count=0):
    idx = torch.isclose(a, b, rtol=rtol, atol=atol)
    sumval = (idx == 0).sum().item()
    if sumval > count:
        torch.testing.assert_close(a, b, rtol=rtol, atol=atol)

# BAD: only checks it doesn't crash
result = my_function(input)
assert result is not None  # This proves nothing about correctness
```

--------------------------------

### Modifying pyproject.toml Dependencies

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/agents/security_guide.md

Shows examples of adding new dependencies to pyproject.toml, including runtime and optional dependencies. Any addition requires verification of existence and legitimacy.

```toml
# Changes to pyproject.toml dependencies
dependencies = [
    "torch>=2.3,<3",
    "numpy>=1.17",
    "packaging>=20.9",
    "new-package>=1.0",  # WHY?
Verify existence and legitimacy. ] ``` ```toml # Optional dependencies [project.optional-dependencies] new_feature = ["suspicious-package"] # Same scrutiny applies ``` -------------------------------- ### bitsandbytes.nn.Linear4bit Layer Initialization Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/agents/api_surface.md Initializes a 4-bit quantized linear layer. Weights are stored as Params4bit and quantized on .to(device). Use this for QLoRA implementations. ```python bitsandbytes.nn.Linear4bit( input_features: int, output_features: int, bias: bool = True, compute_dtype: Optional[torch.dtype] = None, compress_statistics: bool = True, quant_type: str = "fp4", quant_storage: torch.dtype = torch.uint8, device = None, ) ``` -------------------------------- ### GlobalOptimManager Parameter Registration and Override Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/agents/architecture_guide.md Shows how to use the GlobalOptimManager singleton to register model parameters and override their optimizer configurations, such as forcing 32-bit optimizer states for specific parameters. ```python mng = bnb.optim.GlobalOptimManager.get_instance() mng.register_parameters(model.parameters()) mng.override_config(model.fc1.weight, 'optim_bits', 32) # Force 32-bit for this param ``` -------------------------------- ### List Open Issues Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/agents/dispatch_guide.md Fetch a list of all open issues. Use the --sort reactions flag to prioritize issues with more engagement. ```bash python3 agents/query_issues.py list python3 agents/query_issues.py list --sort reactions ``` -------------------------------- ### Getting Current CUDA Stream Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/agents/code_standards.md Retrieves the current CUDA stream associated with a tensor using `_get_tensor_stream`. 
This stream can then be passed to C functions that support asynchronous operations.

```python
stream = _get_tensor_stream(A)
# Pass as last argument to C functions that accept streams
lib.cdequantize_blockwise_fp16(*args, stream)
```

--------------------------------

### Get List of Changed Files in PR

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/agents/pr_review_guide.md

Retrieve a list of all files that have been modified in a specific pull request. This is the first step before reading the full content of each changed file.

```bash
gh pr diff --name-only
```

--------------------------------

### Fetch Basic PR Metadata

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/agents/pr_review_guide.md

Use the GitHub CLI to retrieve essential PR information such as title, author, labels, and state, followed by the diff, CI status, and review comments. This helps in understanding the PR's context before diving into the code.

```bash
# Basic PR info
gh pr view --json title,body,author,labels,state,headRefName,baseRefName,additions,deletions,changedFiles,commits,reviews,comments,mergeStateStatus
```

```bash
# Changed files list
gh pr diff --stat
```

```bash
# Full diff
gh pr diff
```

```bash
# CI check status
gh pr checks
```

```bash
# Comments and review threads
gh pr view --comments
```

--------------------------------

### Backend Module Imports

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/agents/code_standards.md

Shows how backend modules should import from their parent packages using relative import paths.

```python
# In backends/cuda/ops.py:
from ..._ops import register_kernel
from ...cextension import ROCM_WARP_SIZE_64, lib
```

--------------------------------

### CUDA Backend Operation Call

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/agents/architecture_guide.md

Demonstrates how CUDA backend operations call native functions through the global `lib` object.
This specific example shows a `cquantize_blockwise_fp16` call.

```python
from ...cextension import lib

# In backends/cuda/ops.py:
lib.cquantize_blockwise_fp16(code_ptr, A_ptr, absmax_ptr, out_ptr, blocksize, n)
```

--------------------------------

### bitsandbytes.nn.LinearFP4 Layer Initialization

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/agents/api_surface.md

Convenience wrapper for `Linear4bit` with `quant_type="fp4"` hardcoded. Use this for FP4 quantization.

```python
bitsandbytes.nn.LinearFP4(
    input_features,
    output_features,
    bias=True,
    compute_dtype=None,
    compress_statistics=True,
    quant_storage=torch.uint8,
    device=None,
)
```

--------------------------------

### Useless Test Case: Always Passing Assertion

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/agents/security_guide.md

An example of a test that appears to check functionality but contains an assertion with a bound so loose that it will always pass, rendering the test ineffective.

```python
def test_quantization_error():
    x = torch.randn(64, 64)
    qx = quantize_4bit(x)
    dx = dequantize_4bit(qx)
    error = (x - dx).abs().mean()
    # The bound below is so loose that the assertion can never fail
    assert error < 10.0
```

--------------------------------

### Ruff Format Example: Long Assert Statement

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/agents/linting_guide.md

Demonstrates how `ruff format` rewrites a long assert statement with an f-string message by wrapping the message in parentheses. This highlights the difference between linting and formatting.
```python
# Before (fails ruff format):
assert err < threshold, f"Error {err:.6f} exceeds {threshold:.6f} + {N}*{std:.6f}"

# After (ruff format wraps it):
assert err < threshold, (
    f"Error {err:.6f} exceeds {threshold:.6f} + {N}*{std:.6f}"
)
```

--------------------------------

### Compile bitsandbytes from Source on Jetson

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/docs/source/installation.mdx

For NVIDIA Jetson devices (L4T / JetPack), a source build on-device is required due to compatibility issues with standard PyPI wheels. Specify the device's compute capability during the CMake configuration.

```bash
cmake -DCOMPUTE_BACKEND=cuda -DCOMPUTE_CAPABILITY=87 .
make -j4
pip install .
```

--------------------------------

### List GitHub Issues by Label

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/agents/github_tools_guide.md

Filters and lists open issues belonging to a specific category, such as 'Bug', 'Optimizers', or 'CUDA Setup'. This helps in organizing and reviewing issues by type.

```bash
python3 agents/query_issues.py list --label "Bug"
```

```bash
python3 agents/query_issues.py list --label "Optimizers"
```

```bash
python3 agents/query_issues.py list --label "CUDA Setup"
```

--------------------------------

### List Open Issues

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/agents/issue_maintenance_guide.md

Use these commands to list open issues, including filtering by labels like 'Duplicate', 'Proposing to Close', 'Waiting for Info', 'Question', and 'Likely Not a BNB Issue'. Also lists unlabeled issues.
```bash
# All open issues
python3 agents/query_issues.py list

# Low-hanging fruit
python3 agents/query_issues.py list --label "Duplicate"
python3 agents/query_issues.py list --label "Proposing to Close"
python3 agents/query_issues.py list --label "Waiting for Info"
python3 agents/query_issues.py list --label "Question"
python3 agents/query_issues.py list --label "Likely Not a BNB Issue"
python3 agents/query_issues.py list --unlabeled
```
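
The per-label commands above can be wrapped in a small loop when triaging several categories in one pass. The following is only a sketch: the function name `list_triage_labels` and the file-existence guard are assumptions added for illustration and are not part of the repository; the only script invocation used is the `list --label` form shown above.

```bash
# Hypothetical helper (not part of the repository): prints one section per label.
list_triage_labels() {
  script="agents/query_issues.py"  # path as used in the commands above
  for label in "Duplicate" "Proposing to Close" "Waiting for Info" "Question" "Likely Not a BNB Issue"; do
    echo "== ${label} =="
    if [ -f "$script" ]; then
      # Same invocation as the individual commands above
      python3 "$script" list --label "$label"
    else
      # Assumed guard: the script path is relative to the repository root
      echo "skipped: ${script} not found (run from the repository root)" >&2
    fi
  done
}

list_triage_labels
```

Run it from the repository root; outside the repository it prints only the section headers and a skip notice on stderr.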