### Install Pre-commit Hooks

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/docs/source/contributing.mdx

Install pre-commit for managing Git hooks and then install the hooks for the project. These hooks run automatically on commit.

```bash
pip install pre-commit
pre-commit install
```

--------------------------------

### Install Required Libraries

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/docs/source/fsdp_qlora.md

Ensure you have the latest versions of bitsandbytes, accelerate, transformers, peft, and trl installed for FSDP-QLoRA training.

```bash
pip install -U bitsandbytes accelerate transformers peft trl
```

--------------------------------

### Install bitsandbytes Package

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/COMPILE_H100_L40.md

Install the compiled bitsandbytes package. Use the -e flag for an editable/development install.

```bash
pip install -e .
```

--------------------------------

### Install Preview Wheel (Windows x86-64)

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/docs/source/installation.mdx

Install the latest preview wheel for Windows x86-64. Use `--no-deps` to avoid reinstalling dependencies if they are already met.

```bash
# Note: if you don't want to reinstall our dependencies, append the `--no-deps` flag!
pip install --force-reinstall https://github.com/bitsandbytes-foundation/bitsandbytes/releases/download/continuous-release_main/bitsandbytes-1.33.7.preview-py3-none-win_amd64.whl
```

--------------------------------

### Install Build Tools on Ubuntu

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/docs/source/installation.mdx

Installs essential build tools like a compiler and CMake on Ubuntu systems.

```bash
apt-get install -y build-essential cmake
```

--------------------------------

### Install Preview Wheel (macOS ARM64)

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/docs/source/installation.mdx

Install the latest preview wheel for macOS ARM64. Use `--no-deps` to avoid reinstalling dependencies if they are already met.

```bash
# Note: if you don't want to reinstall our dependencies, append the `--no-deps` flag!
pip install --force-reinstall https://github.com/bitsandbytes-foundation/bitsandbytes/releases/download/continuous-release_main/bitsandbytes-1.33.7.preview-py3-none-macosx_14_0_arm64.whl
```

--------------------------------

### Install bitsandbytes

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/docs/source/quickstart.mdx

Install the bitsandbytes library using pip. Requires Python 3.10+ and PyTorch 2.4+.

```bash
pip install bitsandbytes
```

--------------------------------

### Install Preview Wheel (Linux x86_64)

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/docs/source/installation.mdx

Install the latest preview wheel for Linux x86_64. Use `--no-deps` to avoid reinstalling dependencies if they are already met.

```bash
# Note: if you don't want to reinstall our dependencies, append the `--no-deps` flag!

# x86_64 (most users)
pip install --force-reinstall https://github.com/bitsandbytes-foundation/bitsandbytes/releases/download/continuous-release_main/bitsandbytes-1.33.7.preview-py3-none-manylinux_2_24_x86_64.whl
```

--------------------------------

### Install Preview Wheel (Linux ARM/aarch64)

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/docs/source/installation.mdx

Install the latest preview wheel for Linux ARM/aarch64. Use `--no-deps` to avoid reinstalling dependencies if they are already met.

```bash
# Note: if you don't want to reinstall our dependencies, append the `--no-deps` flag!

# ARM/aarch64
pip install --force-reinstall https://github.com/bitsandbytes-foundation/bitsandbytes/releases/download/continuous-release_main/bitsandbytes-1.33.7.preview-py3-none-manylinux_2_24_aarch64.whl
```

--------------------------------

### Install Preview Wheel (Windows ARM64)

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/docs/source/installation.mdx

Install the latest preview wheel for Windows ARM64. Requires Python 3.12 or newer. Use `--no-deps` to avoid reinstalling dependencies if they are already met.

```bash
# Note: if you don't want to reinstall our dependencies, append the `--no-deps` flag!
# Requires Python >= 3.12
pip install --force-reinstall https://github.com/bitsandbytes-foundation/bitsandbytes/releases/download/continuous-release_main/bitsandbytes-1.33.7.preview-py3-none-win_arm64.whl
```

--------------------------------

### Install CPU-only bitsandbytes from Source (Linux/macOS)

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/docs/source/installation.mdx

Clone the repository and install the package in editable mode for CPU-only builds. This is the standard method for Linux and macOS.

```bash
git clone https://github.com/bitsandbytes-foundation/bitsandbytes.git && cd bitsandbytes/
pip install -e .
```

--------------------------------

### Install Test Dependencies

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/agents/testing_guide.md

Install the Python packages required for running tests.

```bash
pip install einops lion-pytorch pytest pytest-xdist scipy transformers
```

--------------------------------

### Verify bitsandbytes Installation

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/COMPILE_H100_L40.md

Import the bitsandbytes library in Python to confirm a successful installation and check the version.

```python
import bitsandbytes as bnb
print(f'bitsandbytes version: {bnb.__version__}')
print('Success!')
```
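
If importing the package fails, it can help to first check whether pip registered it at all. The following is a minimal sketch using only the standard library; `installed_version` is a hypothetical helper name, not part of bitsandbytes. It reads package metadata only, so it does not confirm that the compiled native library loads.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if pip does not know it."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Metadata lookup only: this never imports bitsandbytes or its native library.
found = installed_version("bitsandbytes")
print(f"bitsandbytes: {found or 'not installed'}")
```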

--------------------------------

### Setup Worktree Instructions

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/agents/dispatch_guide.md

Provides essential commands for setting up a worktree for issue fixing. It's crucial to use a worktree and not modify the main repository directly.

```markdown
## Setup

IMPORTANT: You MUST create a worktree. Do NOT work in ~/git/bitsandbytes directly.

    cd ~/git/bitsandbytes
    git worktree add ~/git/bnb-fix-<NUMBER> -b fix/issue-<NUMBER>
    cd ~/git/bnb-fix-<NUMBER>

Read agents/testing_guide.md for build and test instructions. Build the
project before making changes so you can verify your setup works.
```

--------------------------------

### QLoRA Fine-tuning Setup

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/docs/source/quickstart.mdx

Prepare a model for k-bit training using QLoRA by combining 4-bit quantization with LoRA adapters. Requires 'transformers' and 'peft' libraries.

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Load 4-bit model
bnb_config = BitsAndBytesConfig(load_in_4bit=True)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    quantization_config=bnb_config,
)

# Prepare for training
model = prepare_model_for_kbit_training(model)

# Add LoRA adapters
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# Now train with your preferred trainer
```

--------------------------------

### Install ROCm SDK wheels for Windows Compilation

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/docs/source/installation.mdx

Install necessary ROCm SDK wheels and tools for compiling bitsandbytes from source on Windows. This method uses pip-installable wheels instead of a system-wide ROCm installation.

```bash
# Install ROCm SDK wheels (adjust version as needed)
pip install ninja cmake
pip install \
    https://repo.radeon.com/rocm/windows/rocm-rel-7.2.1/rocm_sdk_core-7.2.1-py3-none-win_amd64.whl \
    https://repo.radeon.com/rocm/windows/rocm-rel-7.2.1/rocm_sdk_devel-7.2.1-py3-none-win_amd64.whl \
    https://repo.radeon.com/rocm/windows/rocm-rel-7.2.1/rocm_sdk_libraries_custom-7.2.1-py3-none-win_amd64.whl \
    https://repo.radeon.com/rocm/windows/rocm-rel-7.2.1/rocm-7.2.1.tar.gz

# Expand the devel tarball
rocm-sdk init
```

--------------------------------

### Start Issue Triage Session

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/agents/issue_triage_workflow.md

Commands to navigate to the project directory and start the Claude agent for issue triage. Use this to initiate a session with specific instructions.

```bash
cd ~/git/bitsandbytes
claude
```

--------------------------------

### Dispatching to Ops using torch.ops

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/agents/code_standards.md

Example of dispatching to the bitsandbytes ops namespace using torch.ops for operations like quantize_4bit.

```python
# GOOD: use torch.ops for dispatch
_out, _absmax = torch.ops.bitsandbytes.quantize_4bit.default(A, blocksize, quant_type, quant_storage)
```

--------------------------------

### Setup Working Environment

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/agents/dispatch_guide.md

Commands to set up a new Git worktree for a specific issue branch. This isolates your changes and allows for a clean development environment.

```bash
cd ~/git/bitsandbytes
git worktree add ~/git/bnb-fix-1810 -b fix/issue-1810
cd ~/git/bnb-fix-1810
```
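
When several issues are dispatched at once, the same three commands can be generated per issue number. This is a small sketch under the naming scheme shown above; `worktree_commands` is a hypothetical helper, and the `~/git/...` paths are simply the ones used in this guide.

```python
def worktree_commands(issue: int, repo: str = "~/git/bitsandbytes") -> list[str]:
    """Build the shell commands that create an isolated worktree for one issue."""
    worktree = f"~/git/bnb-fix-{issue}"
    return [
        f"cd {repo}",
        f"git worktree add {worktree} -b fix/issue-{issue}",
        f"cd {worktree}",
    ]


# Print the commands for issue 1810 so they can be pasted into a shell.
for command in worktree_commands(1810):
    print(command)
```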

--------------------------------

### 4-bit Linear Module Example

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/agents/code_standards.md

Demonstrates the usage of the `Linear4bit` module, including how to move it to the CUDA device for quantization. This class is based on the QLoRA paper.

```python
class Linear4bit(nn.Linear):
    """
    This class is the base module for the 4-bit quantization algorithm presented in
    [QLoRA](https://arxiv.org/abs/2305.14314).

    Example:

    ```python
    import bitsandbytes as bnb
    linear_q = bnb.nn.Linear4bit(64, 64)
    linear_q = linear_q.to("cuda")  # Quantization happens here
    ```
    """
```

--------------------------------

### Build System Dependencies

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/agents/security_guide.md

Example of specifying build-time dependencies in pyproject.toml. These also require scrutiny for existence and legitimacy.

```toml
# setup.py / pyproject.toml install hooks
[build-system]
requires = ["setuptools", "new-build-tool"]  # Build-time dependencies too
```

--------------------------------

### Output Launch Commands for Issues

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/agents/dispatch_guide.md

Provides example commands to launch worker agents for specific issues. Each command includes a reference to a prompt file containing detailed instructions for the agent.

```text
## Launch Commands

Issue #1810 — LARS missing in str2optimizer32bit:
    claude "Please read /tmp/bnb-agents/issue-1810.md and follow the instructions."

Issue #919 — Noisy logs:
    claude "Please read /tmp/bnb-agents/issue-919.md and follow the instructions."
```

--------------------------------

### Verify System Dependencies

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/COMPILE_H100_L40.md

Check if CMake, Python, GCC, and CUDA Toolkit are installed and meet the minimum version requirements.

```bash
cmake --version
python3 --version
gcc --version
nvcc --version
```
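
The same check can be scripted so that missing tools are reported in one pass. This is a sketch using only the standard library; `tool_version` is a hypothetical helper, and comparing the reported versions against the minimum requirements is left to the reader.

```python
import shutil
import subprocess
from typing import Optional


def tool_version(tool: str) -> Optional[str]:
    """Return the `--version` output of `tool`, or None if it is not on PATH."""
    path = shutil.which(tool)
    if path is None:
        return None
    try:
        result = subprocess.run(
            [path, "--version"], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    # Some tools print version information to stderr instead of stdout.
    return (result.stdout or result.stderr).strip()


for tool in ("cmake", "python3", "gcc", "nvcc"):
    status = "missing" if tool_version(tool) is None else "found"
    print(f"{tool}: {status}")
```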

--------------------------------

### Example of Downstream Breakage Report

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/agents/pr_review_guide.md

This example shows how to report critical breakages in downstream projects due to API changes. It lists affected projects and the specific components that will break.

```text
CRITICAL: Removing Params4bit.quant_state attribute
- Transformers: dequantize_bnb_weight() accesses weight.quant_state -> breaks
- PEFT: all 4-bit merge/unmerge operations access weight.quant_state -> breaks
- Accelerate: set_module_tensor_to_device() checks getattr(weight, "quant_state") -> breaks
- TGI: Linear4bit.forward() accesses self.weight.quant_state -> breaks
```

--------------------------------

### Compile bitsandbytes from Source on Linux

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/docs/source/installation.mdx

Clones the bitsandbytes repository, configures the build with CMake for CUDA, compiles the library, and installs it in editable mode.

```bash
git clone https://github.com/bitsandbytes-foundation/bitsandbytes.git && cd bitsandbytes/
cmake -DCOMPUTE_BACKEND=cuda -S .
make
pip install -e .   # `-e` for "editable" install, when developing BNB (otherwise leave that out)
```

--------------------------------

### QuantState Serialization Keys

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/agents/architecture_guide.md

Example keys used when storing QuantState components in a state dictionary for checkpointing.

```plaintext
weight.quant_state.bitsandbytes__nf4
weight.absmax
weight.quant_map
weight.nested_absmax
weight.nested_quant_map
weight.quant_state.nested_blocksize
weight.quant_state.nested_dtype
weight.quant_state.nested_offset
```

--------------------------------

### Compile bitsandbytes from Source on Windows

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/docs/source/installation.mdx

Clones the bitsandbytes repository, configures the build with CMake for CUDA, compiles the library in Release mode, and installs it in editable mode.

```bash
git clone https://github.com/bitsandbytes-foundation/bitsandbytes.git && cd bitsandbytes/
cmake -DCOMPUTE_BACKEND=cuda -S .
cmake --build . --config Release
pip install -e .   # `-e` for "editable" install, when developing BNB (otherwise leave that out)
```

--------------------------------

### Python Type Annotation Example

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/agents/code_standards.md

Shows Python type annotations that follow the project conventions: `Optional[X]` (from `typing`) rather than `X | None`, and `Sequence` imported from `collections.abc`.

```python
# GOOD: matches the conventions used throughout
def quantize_4bit(
    A: torch.Tensor,
    absmax: Optional[torch.Tensor] = None,
    out: Optional[torch.Tensor] = None,
    blocksize=None,  # no annotation for simple defaults is OK
    compress_statistics=False,
    quant_type="fp4",
    quant_storage=torch.uint8,
) -> tuple[torch.Tensor, QuantState]:
```

--------------------------------

### Quantization Codebook Creation

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/agents/architecture_guide.md

Provides examples of creating pre-computed quantization codebooks for different bit precisions and configurations. These are essential for quantization operations.

```python
# Pre-computed quantization maps:
create_dynamic_map(signed=True, total_bits=8)  # Creates 256-entry dynamic quantization codebook
create_normal_map(offset=0.9677083, symmetric=False)  # NF4 codebook from normal distribution
create_fp4_map()  # FP4 codebook
```
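
To give a feel for how a codebook can be derived "from a normal distribution", here is a quantile-based sketch in pure Python. It is an illustration under stated assumptions, not the library's implementation: the function name, the helper `linspace`, and the split into 8 positive levels, 7 negative levels, and one zero are assumptions made for this example; only the `offset` default is taken from the call above.

```python
from statistics import NormalDist


def linspace(start: float, stop: float, num: int) -> list[float]:
    """Evenly spaced values from start to stop, inclusive."""
    step = (stop - start) / (num - 1)
    return [start + i * step for i in range(num)]


def normal_quantile_codebook(offset: float = 0.9677083) -> list[float]:
    """Sketch of a 16-level codebook built from standard normal quantiles."""
    inv_cdf = NormalDist().inv_cdf
    # Probabilities above 0.5 map to positive quantiles; drop the 0.5 endpoint.
    positive = [inv_cdf(p) for p in linspace(offset, 0.5, 9)[:-1]]
    negative = [-inv_cdf(p) for p in linspace(offset, 0.5, 8)[:-1]]
    values = sorted(positive + [0.0] + negative)
    # Normalize so the largest level is exactly 1.0.
    scale = max(values)
    return [v / scale for v in values]


codebook = normal_quantile_codebook()
print(len(codebook), codebook[0], codebook[-1])
```

The levels are denser near zero, where normally distributed weights are most likely to fall.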

--------------------------------

### Loading External Backends via Entrypoints

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/agents/security_guide.md

Demonstrates how bitsandbytes loads external backends using Python entrypoints. This mechanism executes arbitrary code from installed packages and poses a supply chain risk.

```python
# In __init__.py
extensions = entry_points(group="bitsandbytes.backends")
for ext in extensions:
    entry = ext.load()
    entry()  # Executes arbitrary code from any installed package
```

--------------------------------

### Build bitsandbytes Native Library

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/agents/testing_guide.md

Compile the native library for your GPU. Ensure your CUDA toolkit version is compatible or create a symlink if necessary. Install in editable mode afterwards.

```bash
# Find your GPU's compute capability
nvidia-smi --query-gpu=compute_cap --format=csv,noheader

# Build (replace 89 with your compute capability, e.g. 120 for Blackwell)
cmake -DCOMPUTE_BACKEND=cuda -DCOMPUTE_CAPABILITY="89" -S . -B build
cmake --build build -j$(nproc)

# If your CUDA toolkit version differs from PyTorch's CUDA version, create a symlink:
# e.g., toolkit is 12.4 but PyTorch expects 12.8:
ln -sf bitsandbytes/libbitsandbytes_cuda124.so bitsandbytes/libbitsandbytes_cuda128.so

# Install in editable mode
pip install -e .
```
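
The symlink step relies on the library file name encoding the CUDA version. The sketch below derives that name from a version string; the naming pattern is inferred from the Linux example above and the helper name is hypothetical, so other platforms or backends may differ.

```python
def cuda_library_name(cuda_version: str) -> str:
    """Map a CUDA version such as '12.4' to the Linux library file name used above."""
    major, minor = cuda_version.split(".")[:2]
    return f"libbitsandbytes_cuda{major}{minor}.so"


# Toolkit built against 12.4, PyTorch expecting 12.8 (the case from the example above).
built = cuda_library_name("12.4")
expected = cuda_library_name("12.8")
print(f"ln -sf bitsandbytes/{built} bitsandbytes/{expected}")
```

In practice the expected version could be read from `torch.version.cuda`.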

--------------------------------

### Review pyproject.toml for Build System Changes

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/agents/security_guide.md

Examine changes to build system requirements and backends, as well as custom build scripts, which could introduce arbitrary code execution at install time.

```toml
# REVIEW — build system changes
[build-system]
requires = [...]  # New build dependencies
build-backend = "..."  # Changing the build backend

# BLOCK — custom build scripts that weren't there before
[tool.setuptools.cmdclass]
install = "custom_install.CustomInstall"  # Arbitrary code at install time
```

--------------------------------

### Quantize Tensor in 4-bit Blocks

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/agents/code_standards.md

Example of a docstring for the `quantize_4bit` function, detailing its arguments, supported datatypes, potential exceptions, and return values. Type annotations use backtick format.

```python
def quantize_4bit(
    A: torch.Tensor,
    ...
) -> tuple[torch.Tensor, QuantState]:
    """Quantize tensor A in blocks of 4-bit values.

    Quantizes tensor A by dividing it into blocks which are independently quantized.

    Args:
        A (`torch.Tensor`): The input tensor. Supports `float16`, `bfloat16`, or `float32` datatypes.
        blocksize (`int`, *optional*):
            The size of the blocks. Defaults to 128 on ROCm and 64 otherwise.
            Valid values are 32, 64, 128, 256, 512, 1024, 2048, and 4096.

    Raises:
        ValueError: Raised when the input data type is not supported.

    Returns:
        Tuple[`torch.Tensor`, `QuantState`]: A tuple containing the quantization results.
    """
```
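
The phrase "blocks which are independently quantized" can be illustrated with the scaling step alone. This is a pure-Python sketch, not the library's kernel: it only splits the input into blocks and scales each block by its own absolute maximum. Mapping the scaled values to 4-bit codes and packing them is omitted, and the function name is hypothetical.

```python
def blockwise_absmax(values: list[float], blocksize: int = 64):
    """Scale each block of `values` by that block's absolute maximum."""
    absmax, scaled = [], []
    for start in range(0, len(values), blocksize):
        block = values[start:start + blocksize]
        block_max = max(abs(v) for v in block)
        absmax.append(block_max)
        # Guard against all-zero blocks to avoid division by zero.
        scaled.extend(v / block_max if block_max else 0.0 for v in block)
    return scaled, absmax


# A tiny block size keeps the example readable.
scaled, absmax = blockwise_absmax([0.5, -2.0, 1.0, 0.25, 8.0, -4.0], blocksize=3)
print(absmax)  # one scale per block
print(scaled)  # every value now lies in [-1, 1]
```

Because each block has its own scale, the outlier `8.0` in the second block does not reduce the resolution available to the first block.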

--------------------------------

### Get Tensor Data Pointer

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/agents/api_surface.md

Gets the data pointer of a tensor, useful for ctypes calls. This is an internal function.

```python
F.get_ptr(A: Optional[Tensor]) -> Optional[ct.c_void_p]
```

--------------------------------

### Initialize and Train with SFTTrainer

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/docs/source/fsdp_qlora.md

Pass the configured model, PEFT config, tokenizer, and training arguments to the SFTTrainer for initiating the QLoRA training process.

```python
from trl import SFTTrainer

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=peft_config,
    processing_class=tokenizer,
    args=training_arguments,
)
trainer.train()
```

--------------------------------

### Configure XPU Build

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/CMakeLists.txt

Sets up the build for XPU (e.g., Intel Xe) by appending source files, defining the output name, and setting the C/C++ compilers to Intel's icx/icpx.

```cmake
list(APPEND SRC_FILES ${XPU_FILES})
string(APPEND BNB_OUTPUT_NAME "_xpu")
add_compile_definitions(BUILD_XPU)
set(CMAKE_C_COMPILER icx)
set(CMAKE_CXX_COMPILER icpx)
if(WIN32)
    set(CMAKE_CXX_COMPILER icx)
endif()
```

--------------------------------

### Initialize 32-bit Adam Optimizer

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/docs/source/optimizers.mdx

Demonstrates how to initialize a 32-bit Adam optimizer using `bitsandbytes`, specifying learning rate, betas, and optimizer bits.

```python
import bitsandbytes as bnb

adam = bnb.optim.Adam(model.parameters(), lr=0.001, betas=(0.9, 0.995), optim_bits=32)
```

--------------------------------

### Handling Optional Dependencies with Try-Except

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/agents/code_standards.md

When adding optional dependencies, use a `try-except ImportError` block to handle cases where the dependency is not installed. Raise a clear `ImportError` with instructions on how to install the necessary package, chaining the original exception.

```python
try:
    from scipy.stats import norm
except ImportError as ie:
    raise ImportError(
        "Scipy is required for `create_normal_map`. Install `bitsandbytes` with the `[test]` extra.",
    ) from ie
```
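
If the same pattern is needed in several places, it can be factored into a small helper. This is a sketch only; `require_module` and its parameters are hypothetical and are not part of the bitsandbytes code base.

```python
import importlib
from types import ModuleType


def require_module(module_name: str, feature: str, hint: str) -> ModuleType:
    """Import an optional dependency or raise an ImportError that says how to get it."""
    try:
        return importlib.import_module(module_name)
    except ImportError as ie:
        # `from ie` keeps the original failure visible in the traceback.
        raise ImportError(f"{module_name} is required for `{feature}`. {hint}") from ie


# Demonstrated with a standard-library module so the example runs anywhere.
json = require_module("json", "demo", "It ships with Python.")
print(json.dumps({"ok": True}))
```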

--------------------------------

### HIP Backend Configuration

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/CMakeLists.txt

Sets up the build for the HIP backend, finding hipblas and hiprand, linking against them, and configuring include and library paths. It also handles platform-specific linking and HIPBLASLT version checks.

```cmake
if(DEFINED ENV{ROCM_PATH})
  file(TO_CMAKE_PATH "$ENV{ROCM_PATH}" ROCM_PATH)
else()
  set(ROCM_PATH /opt/rocm)
endif()
list(APPEND CMAKE_PREFIX_PATH ${ROCM_PATH})
macro(find_package_and_print_version PACKAGE_NAME)
  find_package("${PACKAGE_NAME}" ${ARGN})
  message("${PACKAGE_NAME} VERSION: ${${PACKAGE_NAME}_VERSION}")
endmacro()
find_package_and_print_version(hipblas REQUIRED)
find_package_and_print_version(hiprand REQUIRED)

## hacky way of excluding hip::amdhip64 (with it linked many tests unexpectedly fail e.g. adam8bit because of inaccuracies)
## On Windows, we need to link amdhip64 explicitly
if(NOT WIN32)
    set_target_properties(hip::host PROPERTIES INTERFACE_LINK_LIBRARIES "")
    set_target_properties(hip-lang::host PROPERTIES INTERFACE_LINK_LIBRARIES "")
    set(CMAKE_HIP_IMPLICIT_LINK_LIBRARIES "")
endif()

target_include_directories(bitsandbytes PRIVATE ${CMAKE_SOURCE_DIR} ${CMAKE_SOURCE_DIR}/include ${ROCM_PATH}/include /include)
target_link_directories(bitsandbytes PRIVATE ${ROCM_PATH}/lib /lib)
target_link_libraries(bitsandbytes PUBLIC roc::hipblas hip::hiprand)

# On Windows, rocblas is not pulled in transitively by roc::hipblas
# and is needed because ops_hip.cuh uses rocblas_handle directly.
if(WIN32)
    target_link_libraries(bitsandbytes PUBLIC rocblas)
endif()

target_compile_definitions(bitsandbytes PUBLIC BNB_USE_HIP)
set_source_files_properties(${GPU_FILES} PROPERTIES LANGUAGE HIP)
set_target_properties(bitsandbytes PROPERTIES LINKER_LANGUAGE CXX)

if(HIP_VERSION VERSION_LESS "6.1")
    target_compile_definitions(bitsandbytes PUBLIC NO_HIPBLASLT)
else()
    find_package(hipblaslt)
    target_link_libraries(bitsandbytes PUBLIC roc::hipblaslt)
endif()
```

--------------------------------

### Masked Fill Operation

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/agents/architecture_guide.md

Example of using masked_fill to replace outliers with a specified value.

```python
A = A.masked_fill(outlier_mask, 0.0)
```

--------------------------------

### Refresh Issue Data

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/agents/issue_maintenance_guide.md

Run this command to refresh the issue data before starting the triage process.

```bash
python3 agents/fetch_issues.py
```

--------------------------------

### SwitchBackLinear

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/agents/api_surface.md

A Triton-based SwitchBack linear layer. Requires triton to be installed. It has a `prepare_for_eval()` method for pre-quantizing weights.

````APIDOC
## `SwitchBackLinear` — Triton-based SwitchBack

### Description
A Triton-based SwitchBack linear layer. Requires triton to be installed. It has a `prepare_for_eval()` method for pre-quantizing weights.

### Constructor
```python
bitsandbytes.nn.SwitchBackLinear(
    in_features: int,
    out_features: int,
    bias: bool = True,
    device=None,
    dtype=None,
    vector_wise_quantization: bool = False,
    mem_efficient: bool = False,
)
```

### Parent
`torch.nn.Linear`

### Stability
Experimental — requires triton.

### Notes
Has a `prepare_for_eval()` method that pre-quantizes weights.
````

--------------------------------

### Linear4bit Module Initialization

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/agents/architecture_guide.md

Initializes the Linear4bit module, a QLoRA component. Weights are wrapped in Params4bit and quantized lazily on device movement.

```python
class Linear4bit(nn.Linear):
    def __init__(self, input_features, output_features, bias=True,
                 compute_dtype=None, compress_statistics=True,
                 quant_type="fp4", quant_storage=torch.uint8, device=None):
        # Weight is wrapped in Params4bit (quantizes on .to(device))
        self.weight = Params4bit(self.weight.data, ...)
```

--------------------------------

### Check CUDA Toolkit Version

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/COMPILE_H100_L40.md

Verify the installed CUDA Toolkit version using the nvcc command.

```bash
# Check your CUDA Toolkit version
nvcc --version
```

--------------------------------

### Basic CMake Build Commands

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/CMakeLists.txt

Standard commands for building the project with GCC or MSVC. Ensure the CUDA Toolkit is on your PATH.

```bash
# For GCC:
cmake -B build . && cmake --build build
```

```bash
# For MSVC:
cmake -B build . && cmake --build build --config Release
```

--------------------------------

### Get CUBLAS Context Instance

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/agents/api_surface.md

Retrieves the singleton instance of the CUBLAS context manager for the current device. This is an internal API.

```python
F.CUBLAS_Context.get_instance() -> CUBLAS_Context
```

--------------------------------

### Get Global Page Manager Instance

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/agents/api_surface.md

Retrieves the singleton instance of the GlobalPageManager, which manages paged tensors for prefetching. This is an internal function.

```python
F.GlobalPageManager.get_instance() -> GlobalPageManager
```

--------------------------------

### Create Agent Prompt Directory

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/agents/dispatch_guide.md

Ensure the directory for agent prompt files exists before writing prompts.

```bash
mkdir -p /tmp/bnb-agents
```

--------------------------------

### Source File Definitions

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/CMakeLists.txt

Lists the source files for different backends (CPU, GPU, MPS, Metal, XPU).

```cmake
set(CPP_FILES csrc/cpu_ops.cpp csrc/pythonInterface.cpp)
```

```cmake
set(GPU_FILES csrc/ops.cu csrc/kernels.cu)
```

```cmake
set(MPS_FILES csrc/mps_ops.mm)
```

```cmake
set(METAL_FILES csrc/mps_kernels.metal)
```

```cmake
set(XPU_FILES csrc/xpu_ops.cpp csrc/xpu_kernels.cpp)
```

--------------------------------

### 8-bit Inference with Transformers

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/docs/source/quickstart.mdx

Load and run a model using 8-bit quantization for inference. This reduces the memory needed for the weights by roughly 50% compared with 16-bit precision. Requires the `transformers` library.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    device_map="auto",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
)

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
inputs = tokenizer("Hello, my name is", return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0]))
```

--------------------------------

### Register Parameters for Optimization Management

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/docs/source/optimizers.mdx

Initialize the `GlobalOptimManager` and register model parameters while they are on the CPU before moving the model to GPU.

```python
import torch
import bitsandbytes as bnb

mng = bnb.optim.GlobalOptimManager.get_instance()

model = MyModel()
mng.register_parameters(model.parameters())
```

--------------------------------

### Run Full Test Suite with Timing Breakdown

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/agents/testing_guide.md

Run the complete test suite and include a timing breakdown for the slowest tests. This is useful for identifying performance bottlenecks within the test suite. The `--durations=20` option displays the 20 slowest tests.

```bash
pytest tests/ -v --tb=short -n 4 --durations=20
```

--------------------------------

### Check PyTorch CUDA and ROCm Versions

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/docs/source/errors.mdx

Print the CUDA or ROCm version PyTorch was compiled against. This helps identify version mismatches with installed libraries.

```python
import torch
print(torch.version.cuda)
print(torch.version.hip)
```

--------------------------------

### GlobalOptimManager Get Instance

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/agents/api_surface.md

Retrieves the singleton instance of GlobalOptimManager for per-parameter optimizer configuration overrides. Used by StableEmbedding and Embedding to force 32-bit states.

```python
bitsandbytes.optim.GlobalOptimManager.get_instance()
```
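
The `get_instance()` accessor follows the classic singleton pattern: every caller receives the same object, so overrides registered in one place are visible everywhere. Below is a generic sketch of that pattern; `OverrideRegistry` and its `override` method are hypothetical stand-ins and do not reproduce the real `GlobalOptimManager` API.

```python
class OverrideRegistry:
    """Hypothetical singleton that stores per-parameter config overrides."""

    _instance = None

    def __init__(self):
        raise RuntimeError("Use OverrideRegistry.get_instance() instead.")

    @classmethod
    def get_instance(cls):
        if cls._instance is None:
            # Bypass __init__ so this classmethod is the only way to build the object.
            cls._instance = cls.__new__(cls)
            cls._instance.overrides = {}
        return cls._instance

    def override(self, param_id: str, **config) -> None:
        """Record config values that should replace the defaults for one parameter."""
        self.overrides.setdefault(param_id, {}).update(config)


first = OverrideRegistry.get_instance()
second = OverrideRegistry.get_instance()
first.override("embedding.weight", optim_bits=32)
print(second.overrides)  # the override is visible through the second handle
```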

--------------------------------

### Run Full Pre-commit Linting Suite

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/CLAUDE.md

Mandatory step before pushing a PR branch. This command runs all CI lint hooks, including ruff, ruff format, typos, and clang-format. Do not run individual hooks.

```bash
pre-commit run --all-files
```

--------------------------------

### Enable CUDA Language and Find Toolkit

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/CMakeLists.txt

Enables the CUDA language for the project and finds the necessary CUDA Toolkit. This will fail if CUDA is not installed or configured correctly.

```cmake
enable_language(CUDA) # This will fail if CUDA is not found
find_package(CUDAToolkit REQUIRED)
```

--------------------------------

### Configure bitsandbytes for FSDP-QLoRA

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/docs/source/fsdp_qlora.md

Set `bnb_4bit_quant_storage` to match the model's `torch_dtype` for FSDP compatibility. `bnb_4bit_compute_dtype` determines the computation data type, with `torch.bfloat16` recommended for stability.

```python
import torch
from transformers import BitsAndBytesConfig, AutoModelForCausalLM

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_storage=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-70b",
    quantization_config=bnb_config,
    torch_dtype=torch.bfloat16,
)
```

--------------------------------

### Configure LoraConfig for QLoRA

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/docs/source/integrations.mdx

Set up a LoraConfig object for QLoRA fine-tuning, specifying parameters like rank (r), alpha, target modules, dropout, and task type.

```python
from peft import LoraConfig

config = LoraConfig(
    r=16,
    lora_alpha=8,
    target_modules="all-linear",
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
```

--------------------------------

### Get 4-bit Quantization Type

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/agents/api_surface.md

Retrieves a codebook tensor for a specified 4-bit quantization type. Valid types include 'nf4', 'fp4', 'int4', and 'af4'.

```python
F.get_4bit_type(typename: str, device=None, blocksize=64) -> torch.Tensor
```
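
To make the term "codebook" concrete, here is a minimal pure-Python sketch of how a 16-entry codebook can be used for blockwise quantization: values are normalized by the block's absolute maximum and mapped to the index of the nearest entry. The evenly spaced codebook and the helper names `quantize_block` and `dequantize_block` are illustrative assumptions; they are not the tensor returned by `get_4bit_type` or part of the bitsandbytes API.

```python
# Illustrative 16-entry codebook, evenly spaced in [-1, 1].
# Assumption: real codebooks such as 'nf4' use different, non-uniform values.
CODEBOOK = [-1.0 + 2.0 * i / 15 for i in range(16)]


def quantize_block(values):
    """Map each value to the index of the nearest codebook entry."""
    absmax = max(abs(v) for v in values) or 1.0  # avoid dividing by zero
    indices = [
        min(range(16), key=lambda i: abs(CODEBOOK[i] - v / absmax))
        for v in values
    ]
    return indices, absmax


def dequantize_block(indices, absmax):
    """Look up each index in the codebook and undo the normalization."""
    return [CODEBOOK[i] * absmax for i in indices]


block = [0.5, -2.0, 1.0, 0.0]
idx, absmax = quantize_block(block)
restored = dequantize_block(idx, absmax)
print(idx, absmax)
print(restored)
```

Each index fits in 4 bits, so only the indices and one `absmax` per block need to be stored.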

--------------------------------

### Getting Ctypes Pointers from Tensors

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/agents/code_standards.md

Uses the `get_ptr()` utility to obtain ctypes void pointers from PyTorch tensors, which are necessary for calling native C/C++ functions.

```python
from bitsandbytes.functional import get_ptr

ptrA = get_ptr(A)       # ct.c_void_p or None if A is None
ptrOut = get_ptr(out)
```
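
The same pattern can be tried without PyTorch or the native library. The sketch below is a stand-in that uses a `ctypes` array instead of a tensor; `get_ptr_sketch` is a hypothetical name and is not the bitsandbytes implementation. It only mirrors the behavior described in the comment above: a `ct.c_void_p` for a buffer, or `None` when the input is `None`.

```python
import ctypes as ct


def get_ptr_sketch(buf):
    """Return a ct.c_void_p for a ctypes buffer, or None if buf is None."""
    if buf is None:
        return None
    return ct.c_void_p(ct.addressof(buf))


data = (ct.c_float * 4)(1.0, 2.0, 3.0, 4.0)
ptr = get_ptr_sketch(data)

# Round trip: reinterpret the void pointer as float* and read the values back.
as_floats = ct.cast(ptr, ct.POINTER(ct.c_float))
print([as_floats[i] for i in range(4)])  # [1.0, 2.0, 3.0, 4.0]
```

Returning `None` for a missing input lets optional arguments be passed to a C function as null pointers.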

--------------------------------

### Pytest Assertion Examples

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/agents/code_standards.md

Write assertions that verify specific values, shapes, dtypes, and numerical accuracy using torch.testing.assert_close. Avoid assertions that only check for non-crashes.

```python
# GOOD: verifies actual correctness
assert out.shape == (10, 30)
assert out.dtype == torch.int32
assert out.device == A.device

# GOOD: numerical accuracy check
torch.testing.assert_close(dequantized, original, rtol=0.1, atol=0.01)

# GOOD: custom tolerance with count
def assert_all_approx_close(a, b, rtol=1e-3, atol=1e-3, count=0):
    idx = torch.isclose(a, b, rtol=rtol, atol=atol)
    sumval = (idx == 0).sum().item()
    if sumval > count:
        torch.testing.assert_close(a, b, rtol=rtol, atol=atol)

# BAD: only checks it doesn't crash
result = my_function(input)
assert result is not None  # This proves nothing about correctness
```

--------------------------------

### Modifying pyproject.toml Dependencies

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/agents/security_guide.md

Shows examples of adding new dependencies to pyproject.toml, including runtime and optional dependencies. Any addition requires verification of existence and legitimacy.

```toml
# Changes to pyproject.toml dependencies
dependencies = [
    "torch>=2.3,<3",
    "numpy>=1.17",
    "packaging>=20.9",
    "new-package>=1.0",          # WHY? Verify existence and legitimacy.
]
```

```toml
# Optional dependencies
[project.optional-dependencies]
new_feature = ["suspicious-package"]  # Same scrutiny applies
```

--------------------------------

### bitsandbytes.nn.Linear4bit Layer Initialization

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/agents/api_surface.md

Initializes a 4-bit quantized linear layer. Weights are stored as `Params4bit` and are quantized when the module is moved to a device with `.to(device)`. Use this layer for QLoRA implementations.

```python
bitsandbytes.nn.Linear4bit(
    input_features: int,
    output_features: int,
    bias: bool = True,
    compute_dtype: Optional[torch.dtype] = None,
    compress_statistics: bool = True,
    quant_type: str = "fp4",
    quant_storage: torch.dtype = torch.uint8,
    device = None,
)
```

--------------------------------

### GlobalOptimManager Parameter Registration and Override

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/agents/architecture_guide.md

Shows how to use the GlobalOptimManager singleton to register model parameters and override their optimizer configurations, such as forcing 32-bit optimizer states for specific parameters.

```python
mng = bnb.optim.GlobalOptimManager.get_instance()
mng.register_parameters(model.parameters())
mng.override_config(model.fc1.weight, 'optim_bits', 32)  # Force 32-bit for this param
```
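
The singleton-plus-override pattern can be illustrated in isolation. The class below is a simplified, hypothetical stand-in written for this example; it is not the `GlobalOptimManager` implementation, and `OverrideRegistry` and `resolve` are made-up names. It assumes overrides are keyed by parameter identity and merged over the optimizer's defaults.

```python
class OverrideRegistry:
    """Hypothetical singleton that stores per-parameter config overrides."""

    _instance = None

    def __init__(self):
        self._overrides = {}  # id(param) -> {key: value}

    @classmethod
    def get_instance(cls):
        # Create the shared instance on first use, then always return it.
        if cls._instance is None:
            cls._instance = cls()
        return cls._instance

    def override_config(self, param, key, value):
        self._overrides.setdefault(id(param), {})[key] = value

    def resolve(self, param, defaults):
        """Merge optimizer defaults with any overrides for this parameter."""
        return {**defaults, **self._overrides.get(id(param), {})}


weight_a, weight_b = object(), object()  # placeholders for model parameters
registry = OverrideRegistry.get_instance()
registry.override_config(weight_a, "optim_bits", 32)

defaults = {"optim_bits": 8}
print(registry.resolve(weight_a, defaults))  # {'optim_bits': 32}
print(registry.resolve(weight_b, defaults))  # {'optim_bits': 8}
```

Because every caller gets the same instance, a layer can register an override at construction time and an optimizer created later can still see it.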

--------------------------------

### List Open Issues

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/agents/dispatch_guide.md

Fetch a list of all open issues. Use `--sort reactions` to prioritize issues with more engagement.

```bash
python3 agents/query_issues.py list
python3 agents/query_issues.py list --sort reactions
```

--------------------------------

### Getting Current CUDA Stream

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/agents/code_standards.md

Retrieves the current CUDA stream associated with a tensor using `_get_tensor_stream`. This stream can then be passed to C functions that support asynchronous operations.

```python
stream = _get_tensor_stream(A)
# Pass as last argument to C functions that accept streams
lib.cdequantize_blockwise_fp16(*args, stream)
```

--------------------------------

### Get List of Changed Files in PR

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/agents/pr_review_guide.md

Retrieve a list of all files that have been modified in a specific pull request. This is the first step before reading the full content of each changed file.

```bash
gh pr diff <NUMBER> --name-only
```

--------------------------------

### Fetch Basic PR Metadata

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/agents/pr_review_guide.md

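Use the GitHub CLI to retrieve essential PR information such as title, author, labels, and state. The follow-up commands show the changed-file summary, the full diff, CI check status, and review comments. Together they give the PR's context before you dive into the code.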

```bash
# Basic PR info
gh pr view <NUMBER> --json title,body,author,labels,state,headRefName,baseRefName,additions,deletions,changedFiles,commits,reviews,comments,mergeStateStatus
```

```bash
# Changed files list
gh pr diff <NUMBER> --stat
```

```bash
# Full diff
gh pr diff <NUMBER>
```

```bash
# CI check status
gh pr checks <NUMBER>
```

```bash
# Comments and review threads
gh pr view <NUMBER> --comments
```

--------------------------------

### Backend Module Imports

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/agents/code_standards.md

Backend modules import shared helpers from the package's parent modules using relative imports.

```python
# In backends/cuda/ops.py:
from ..._ops import register_kernel
from ...cextension import ROCM_WARP_SIZE_64, lib
```

--------------------------------

### CUDA Backend Operation Call

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/agents/architecture_guide.md

Demonstrates how CUDA backend operations call native functions through the global `lib` object. This specific example shows a `cquantize_blockwise_fp16` call.

```python
from ...cextension import lib

# In backends/cuda/ops.py:
lib.cquantize_blockwise_fp16(code_ptr, A_ptr, absmax_ptr, out_ptr, blocksize, n)
```

--------------------------------

### bitsandbytes.nn.LinearFP4 Layer Initialization

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/agents/api_surface.md

Convenience wrapper for Linear4bit with quant_type="fp4" hardcoded. Use this for FP4 quantization.

```python
bitsandbytes.nn.LinearFP4(
    input_features, output_features, bias=True,
    compute_dtype=None, compress_statistics=True,
    quant_storage=torch.uint8, device=None,
)
```

--------------------------------

### Useless Test Case: Always Passing Assertion

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/agents/security_guide.md

An example of a test that appears to check functionality but contains an assertion with a bound so loose it will always pass, rendering the test ineffective.

```python
def test_quantization_error():
    x = torch.randn(64, 64)
    qx = quantize_4bit(x)
    dx = dequantize_4bit(qx)
    error = (x - dx).abs().mean()
    assert error < 10.0  # BAD: bound is so loose that this can never fail
```
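
To see why that bound detects nothing, consider a deliberately broken result in which every value is replaced by zero. The standalone sketch below uses only the standard library as a stand-in for the tensors above (an assumption made so it runs without PyTorch or a GPU). For standard normal input, the mean absolute error of an all-zero output is the mean of `|x|`, which is well below 10.

```python
import random

random.seed(0)
x = [random.gauss(0.0, 1.0) for _ in range(64 * 64)]

# A deliberately broken "dequantized" result: every value replaced by zero.
dx = [0.0] * len(x)

error = sum(abs(a - b) for a, b in zip(x, dx)) / len(x)
print(f"mean abs error of an all-zero output: {error:.3f}")

assert error < 10.0  # still passes, so the bound catches no regression
```

A useful bound is one that a broken implementation like this would violate.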

--------------------------------

### Ruff Format Example: Long Assert Statement

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/agents/linting_guide.md

Demonstrates how 'ruff format' reformats a long assert statement with an f-string message by wrapping it. This highlights the difference between linting and formatting.

```python
# Before (fails ruff format):
assert err < threshold, f"Error {err:.6f} exceeds {threshold:.6f} + {N}*{std:.6f}"

# After (ruff format wraps it):
assert err < threshold, (
    f"Error {err:.6f} exceeds {threshold:.6f} + {N}*{std:.6f}"
)
```

--------------------------------

### Compile bitsandbytes from Source on Jetson

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/docs/source/installation.mdx

For NVIDIA Jetson devices (L4T / JetPack), a source build on-device is required due to compatibility issues with standard PyPI wheels. Specify the device's compute capability during the CMake configuration.

```bash
cmake -DCOMPUTE_BACKEND=cuda -DCOMPUTE_CAPABILITY=87 .
make -j4
pip install .
```
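
The `COMPUTE_CAPABILITY` value is the device's compute capability with the dot removed, so `87` stands for 8.7. As a small sketch, the hypothetical helper below formats a `(major, minor)` pair that way; on a device with PyTorch installed, the pair could be read with `torch.cuda.get_device_capability()`, but it is hard-coded here to keep the example self-contained.

```python
def cmake_compute_capability(major: int, minor: int) -> str:
    """Format a (major, minor) compute capability for -DCOMPUTE_CAPABILITY."""
    return f"{major}{minor}"


# Example pair matching the cmake command above (compute capability 8.7).
print(cmake_compute_capability(8, 7))  # 87
```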

--------------------------------

### List GitHub Issues by Label

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/agents/github_tools_guide.md

Filters and lists open issues belonging to a specific category, such as 'Bug', 'Optimizers', or 'CUDA Setup'. This helps in organizing and reviewing issues by type.

```bash
python3 agents/query_issues.py list --label "Bug"
```

```bash
python3 agents/query_issues.py list --label "Optimizers"
```

```bash
python3 agents/query_issues.py list --label "CUDA Setup"
```

--------------------------------

### List Open Issues

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/agents/issue_maintenance_guide.md

Use these commands to list open issues, including filtering by labels like 'Duplicate', 'Proposing to Close', 'Waiting for Info', 'Question', and 'Likely Not a BNB Issue'. Also lists unlabeled issues.

```bash
# All open issues
python3 agents/query_issues.py list

# Low-hanging fruit
python3 agents/query_issues.py list --label "Duplicate"
python3 agents/query_issues.py list --label "Proposing to Close"
python3 agents/query_issues.py list --label "Waiting for Info"
python3 agents/query_issues.py list --label "Question"
python3 agents/query_issues.py list --label "Likely Not a BNB Issue"
python3 agents/query_issues.py list --unlabeled
```