### Install InstantTensor via Pip

Source: https://github.com/scitix/instanttensor/blob/main/README.md

Install the InstantTensor package using pip for quick setup.

```bash
pip install instanttensor
```

--------------------------------

### Install Documentation Requirements

Source: https://github.com/scitix/instanttensor/blob/main/docs/build_doc.md

Install the necessary Python packages for Sphinx and type hint support using pip.

```bash
pip install sphinx sphinx-autodoc-typehints
```

--------------------------------

### Build InstantTensor from Source

Source: https://github.com/scitix/instanttensor/blob/main/README.md

Clone the repository, check out the submodules, and install InstantTensor locally. A debug build can be enabled by setting the DEBUG environment variable.

```bash
git clone https://github.com/scitix/InstantTensor.git
cd InstantTensor
./checkout_submodules.sh
pip install .
# For a debug build, set "DEBUG=1" before "pip"
```

--------------------------------

### Configure InstantTensor via Environment Variables

Source: https://context7.com/scitix/instanttensor/llms.txt

Tune InstantTensor's I/O behavior and performance using environment variables without modifying code. Examples include enabling GDS, changing chunk sizes, and adjusting concurrency.

```bash
# Enable GPUDirect Storage (cuFile) backend; requires GDS-capable hardware
export INSTANTTENSOR_USE_CUFILE=1

# Enable io_uring backend instead of libaio
export INSTANTTENSOR_USE_URING=1

# Disable O_DIRECT (direct I/O); useful for benchmarking or non-aligned reads
export INSTANTTENSOR_DIRECT_IO=0

# Override I/O chunk size (bytes); default: 8MB for disk, 2MB for tmpfs
export INSTANTTENSOR_CHUNK_SIZE=16777216  # 16 MB

# Override number of concurrent I/O threads
export INSTANTTENSOR_CONCURRENCY=8

# Override async I/O queue depth per thread
export INSTANTTENSOR_IO_DEPTH=64

# Override GPU ring buffer size (bytes)
export INSTANTTENSOR_BUFFER_SIZE=1073741824  # 1 GB

# Fraction of free GPU memory to use for I/O staging buffers (default: 0.5)
export INSTANTTENSOR_MAX_FREE_MEM_USAGE=0.7

# Enable verbose timing output
export INSTANTTENSOR_DEBUG=1

# Then run normally
python my_model_loader.py
```

--------------------------------

### Initialize NCCL Process Group for Distributed Training

Source: https://context7.com/scitix/instanttensor/llms.txt

Initialize the NCCL process group and create one sub-group per tensor-parallel (TP) group. The launcher (for example `torchrun`) must set `LOCAL_RANK` so that each process binds to its own GPU.

```python
import os
import torch
import torch.distributed as dist

local_rank = int(os.environ.get("LOCAL_RANK", 0))
torch.cuda.set_device(local_rank)
dist.init_process_group(backend="nccl")

world_size = dist.get_world_size()
global_rank = dist.get_rank()

tp = 4  # Tensor Parallel degree
pp = 2  # Pipeline Parallel degree

# Create TP sub-groups; each group of `tp` ranks loads independently.
# Every rank must call dist.new_group() for every group, in the same order,
# even for groups it does not belong to.
tp_group = None
for group_id in range(world_size // tp):
    ranks = list(range(group_id * tp, (group_id + 1) * tp))
    group = dist.new_group(ranks=ranks)
    if global_rank in ranks:
        tp_group = group
```

--------------------------------

### Get Tensor Metadata with safe_open.get_tensor_metadata()

Source: https://context7.com/scitix/instanttensor/llms.txt

Retrieve dtype and shape for a named tensor without loading its data. Useful for pre-allocating buffers or inspecting model structure before full loading.

```python
from instanttensor import safe_open
import torch

# load_now=False reads only file metadata; no GPU data transfer occurs
f = safe_open("model.safetensors", framework="pt", device=0, load_now=False)
tensor_names = f.keys()

for name in tensor_names[:5]:
    dtype, shape = f.get_tensor_metadata(name)
    num_params = torch.Size(shape).numel()
    size_mb = num_params * torch.empty([], dtype=dtype).element_size() / 1024**2
    print(f"{name}: dtype={dtype}, shape={shape}, size={size_mb:.2f} MB")
    # model.layers.0.self_attn.q_proj.weight: dtype=torch.bfloat16, shape=torch.Size([4096, 4096]), size=32.00 MB

# Pre-allocate all destination tensors (useful for TP/PP sharding)
tensors = {
    name: torch.empty(shape, dtype=dtype, device="cuda")
    for name in tensor_names
    for dtype, shape in [f.get_tensor_metadata(name)]
}
```

--------------------------------

### Generate HTML Documentation

Source: https://github.com/scitix/instanttensor/blob/main/docs/build_doc.md

Navigate to the docs directory and run the make html command to generate the HTML documentation. The output will be located in the build/html/ directory.

```bash
cd docs
make html
```

--------------------------------

### init()

Source: https://context7.com/scitix/instanttensor/llms.txt

Explicitly initializes the C++ backend for InstantTensor. This can be called early to avoid lazy initialization overhead on the first operation.

```APIDOC
## init()

### Description
Initializes the underlying C++ runtime. Called automatically on first `safe_open()` use, but can be invoked explicitly to control initialization timing.

### Method
`init()`

### Request Example
```python
import instanttensor

instanttensor.init()

from instanttensor import safe_open

with safe_open("model.safetensors", framework="pt", device=0) as f:
    tensors = {name: tensor.clone() for name, tensor in f.tensors()}
```
```

--------------------------------

### Load Safetensors with `safe_open` (Single File)

Source: https://context7.com/scitix/instanttensor/llms.txt

Use `safe_open` as a context manager to load tensors from a single Safetensors file directly to a specified GPU device. Remember to clone the tensor immediately as the internal buffer is reused.

```python
from instanttensor import safe_open

# --- Single-file loading ---
tensors = {}
with safe_open("model.safetensors", framework="pt", device=0) as f:
    for name, tensor in f.tensors():
        # IMPORTANT: tensor points to an internal reusable buffer; clone immediately
        tensors[name] = tensor.clone()

print(f"Loaded {len(tensors)} tensors")
# Example output: Loaded 242 tensors
```

--------------------------------

### safe_open - High-performance context manager for loading Safetensors files to GPU

Source: https://context7.com/scitix/instanttensor/llms.txt

The primary API for opening Safetensors files and streaming tensor data directly to GPU memory. It can be used as a context manager for single-file, multi-file, or distributed loading, and supports advanced performance tuning.

```APIDOC
## safe_open

### Description
Opens one or more Safetensors files, reads their metadata, and streams tensor data directly to GPU memory with maximum I/O throughput. It is used as a context manager and exposes tensor iteration, key enumeration, and metadata access methods.

### Usage

#### Single-file loading
```python
from instanttensor import safe_open

tensors = {}
with safe_open("model.safetensors", framework="pt", device=0) as f:
    for name, tensor in f.tensors():
        # IMPORTANT: tensor points to an internal reusable buffer; clone immediately
        tensors[name] = tensor.clone()

print(f"Loaded {len(tensors)} tensors")
```

#### Multi-file loading
```python
from instanttensor import safe_open

files = [
    "model-00001-of-00002.safetensors",
    "model-00002-of-00002.safetensors",
]

tensors = {}
with safe_open(files, framework="pt", device=0) as f:
    for name, tensor in f.tensors():
        tensors[name] = tensor.clone()

print(f"Loaded {len(tensors)} tensors from {len(files)} files")
```

#### Distributed loading with torch.distributed NCCL process group
```python
import torch
import torch.distributed as dist
from instanttensor import safe_open

dist.init_process_group(backend="nccl")
process_group = dist.GroupMember.WORLD

files = [
    "model-00001-of-00002.safetensors",
    "model-00002-of-00002.safetensors",
]

tensors = {}
with safe_open(
    files,
    framework="pt",
    device=torch.cuda.current_device(),
    process_group=process_group,
) as f:
    for name, tensor in f.tensors():
        tensors[name] = tensor.clone()

# All ranks now hold a copy of the weights on their respective GPUs
```

#### Advanced: manual performance tuning
```python
from instanttensor import safe_open

files = [
    "model-00001-of-00002.safetensors",
    "model-00002-of-00002.safetensors",
]

tensors = {}
with safe_open(
    files,
    framework="pt",
    device=0,
    buffer_size=512 * 1024 * 1024,  # 512 MB GPU buffer
    chunk_size=8 * 1024 * 1024,     # 8 MB I/O chunks
    concurrency=4,                  # 4 concurrent I/O threads
    io_depth=32,                    # 32 queued async I/O operations
    max_free_mem_usage=0.6,         # use up to 60% of free GPU memory
) as f:
    for name, tensor in f.tensors():
        tensors[name] = tensor.clone()
```

### Parameters
- **files** (str or list[str]): Path to the Safetensors file(s).
- **framework** (str): The framework to use (e.g., "pt" for PyTorch).
- **device** (int or torch.device): The GPU device to load tensors onto.
- **process_group** (torch.distributed.ProcessGroup, optional): The distributed process group for NCCL-based loading.
- **buffer_size** (int, optional): The size of the GPU buffer in bytes.
- **chunk_size** (int, optional): The size of I/O chunks in bytes.
- **concurrency** (int, optional): The number of concurrent I/O threads.
- **io_depth** (int, optional): The depth of queued asynchronous I/O operations.
- **max_free_mem_usage** (float, optional): The maximum fraction of free GPU memory to use.
- **load_now** (bool, optional): If True, load tensors immediately. Defaults to True.
```

--------------------------------

### Explicit C++ Backend Initialization with init()

Source: https://context7.com/scitix/instanttensor/llms.txt

Initializes the underlying C++ runtime explicitly. Useful for controlling initialization timing to avoid latency on the first `safe_open()` call.

```python
import instanttensor

# Explicit early initialization avoids lazy init overhead on first load
instanttensor.init()

# Subsequent safe_open calls will not pay the initialization cost
from instanttensor import safe_open

with safe_open("model.safetensors", framework="pt", device=0) as f:
    tensors = {name: tensor.clone() for name, tensor in f.tensors()}
```

--------------------------------

### Load Safetensors with `safe_open` (Multi-File)

Source: https://context7.com/scitix/instanttensor/llms.txt

Load tensors from multiple Safetensors files using `safe_open`. This approach allows for higher throughput due to coordinated I/O planning. Be sure to clone each tensor immediately.

```python
from instanttensor import safe_open

files = [
    "model-00001-of-00002.safetensors",
    "model-00002-of-00002.safetensors",
]

tensors = {}
with safe_open(files, framework="pt", device=0) as f:
    for name, tensor in f.tensors():
        tensors[name] = tensor.clone()

print(f"Loaded {len(tensors)} tensors from {len(files)} files")
```

--------------------------------

### Load Multiple Safetensors Files

Source: https://github.com/scitix/instanttensor/blob/main/README.md

Load tensors from multiple Safetensors files by passing a list of filenames to `safe_open`. This approach allows for optimized read planning and higher throughput compared to individual file loads.

```python
from instanttensor import safe_open

files = ["model-00001-of-00002.safetensors", "model-00002-of-00002.safetensors"]
tensors = {}
with safe_open(files, framework="pt", device=0) as f:
    for name, tensor in f.tensors():
        tensors[name] = tensor.clone()
```

--------------------------------

### List Tensor Names with `safe_open.keys()`

Source: https://context7.com/scitix/instanttensor/llms.txt

Retrieve a list of all tensor names from a Safetensors file, sorted by their byte offset, using `safe_open.keys()` (or `offset_keys()`). This order is necessary for correct sequential streaming when using `get_tensor()`.

```python
from instanttensor import safe_open

with safe_open("model.safetensors", framework="pt", device=0, load_now=False) as f:
    keys = f.keys()  # same as offset_keys()
    print(f"Total tensors: {len(keys)}")
    print("First 5 tensor names:", keys[:5])
    # ['model.embed_tokens.weight', 'model.layers.0.input_layernorm.weight', ...]
```

--------------------------------

### Load Single Safetensors File

Source: https://github.com/scitix/instanttensor/blob/main/README.md

Use `safe_open` to load tensors from a single Safetensors file into PyTorch and copy them to the specified device. Ensure tensors are copied immediately to avoid data overwrites due to buffer reuse.

```python
from instanttensor import safe_open

tensors = {}
with safe_open("model.safetensors", framework="pt", device=0) as f:
    for name, tensor in f.tensors():
        tensors[name] = tensor.clone()
```

--------------------------------

### safe_open.keys() / safe_open.offset_keys() - List all tensor names in file order

Source: https://context7.com/scitix/instanttensor/llms.txt

Returns tensor names sorted by their byte offset in the file. This order is required for correct sequential streaming behavior when using `get_tensor()`.

```APIDOC
## safe_open.keys() / safe_open.offset_keys()

### Description
Returns tensor names sorted by their byte offset in the file. Using `get_tensor()` in this order is required for correct sequential streaming behavior.

### Usage
```python
from instanttensor import safe_open

with safe_open("model.safetensors", framework="pt", device=0, load_now=False) as f:
    keys = f.keys()  # same as offset_keys()
    print(f"Total tensors: {len(keys)}")
    print("First 5 tensor names:", keys[:5])
    # ['model.embed_tokens.weight', 'model.layers.0.input_layernorm.weight', ...]
```

### Returns
- list[str]: A list of tensor names sorted by their file offset.
```

--------------------------------

### Advanced Performance Tuning with `safe_open`

Source: https://context7.com/scitix/instanttensor/llms.txt

Manually tune performance parameters like buffer size, chunk size, concurrency, and I/O depth when using `safe_open` for maximum I/O throughput. Be sure to clone each tensor immediately.

```python
from instanttensor import safe_open

files = [
    "model-00001-of-00002.safetensors",
    "model-00002-of-00002.safetensors",
]

tensors = {}
with safe_open(
    files,
    framework="pt",
    device=0,
    buffer_size=512 * 1024 * 1024,  # 512 MB GPU buffer
    chunk_size=8 * 1024 * 1024,     # 8 MB I/O chunks
    concurrency=4,                  # 4 concurrent I/O threads
    io_depth=32,                    # 32 queued async I/O operations
    max_free_mem_usage=0.6,         # use up to 60% of free GPU memory
) as f:
    for name, tensor in f.tensors():
        tensors[name] = tensor.clone()
```

--------------------------------

### Distributed Safetensors Loading with NCCL

Source: https://context7.com/scitix/instanttensor/llms.txt

Load Safetensors files across multiple GPUs using `safe_open` with a PyTorch distributed process group. After loading, every rank holds a copy of the weights on its own GPU.

```python
import torch
import torch.distributed as dist
from instanttensor import safe_open

dist.init_process_group(backend="nccl")
process_group = dist.GroupMember.WORLD

files = [
    "model-00001-of-00002.safetensors",
    "model-00002-of-00002.safetensors",
]

tensors = {}
with safe_open(
    files,
    framework="pt",
    device=torch.cuda.current_device(),
    process_group=process_group,
) as f:
    for name, tensor in f.tensors():
        tensors[name] = tensor.clone()

# All ranks now hold a copy of the weights on their respective GPUs
```

--------------------------------

### Read File-Level Metadata with safe_open.metadata()

Source: https://context7.com/scitix/instanttensor/llms.txt

Access the `__metadata__` dictionary from the Safetensors header, which may contain information like model format or provenance. Returns None if no metadata is present.

```python
from instanttensor import safe_open

with safe_open("model.safetensors", framework="pt", device=0) as f:
    meta = f.metadata()
    if meta:
        print("File metadata:", meta)
        # e.g. {'format': 'pt', 'transformers_version': '4.40.0'}
    else:
        print("No file-level metadata found")
```

--------------------------------

### Distributed Tensor Loading with PyTorch

Source: https://github.com/scitix/instanttensor/blob/main/README.md

Use this snippet to load tensors in a distributed manner using torch.distributed and NCCL. Ensure torch.distributed is initialized before use.

```python
import torch
import torch.distributed as dist
from instanttensor import safe_open

dist.init_process_group(backend="nccl")
process_group = dist.GroupMember.WORLD

files = ["model-00001-of-00002.safetensors", "model-00002-of-00002.safetensors"]
tensors = {}
with safe_open(files, framework="pt", device=torch.cuda.current_device(), process_group=process_group) as f:
    for name, tensor in f.tensors():
        tensors[name] = tensor.clone()
```

--------------------------------

### Load Sharded Tensors with InstantTensor

Source: https://context7.com/scitix/instanttensor/llms.txt

Loads tensors with InstantTensor inside each Tensor Parallel group and keeps only the local shard. Each rank slices its assigned shard from every tensor and clones only that slice, which reduces the memory held after loading. Requires `safe_open` from `instanttensor` and a `torch.distributed` process group.

```python
import torch
from instanttensor import safe_open

# Assumes `tp`, `tp_group`, and `global_rank` from the process-group setup snippet
files = [f"model-{i:05d}-of-00008.safetensors" for i in range(1, 9)]

tensors = {}
with safe_open(
    files,
    framework="pt",
    device=torch.cuda.current_device(),
    process_group=tp_group,  # load within each TP group independently
) as f:
    for name, tensor in f.tensors():
        # Each rank keeps only its TP shard (sliced along dim 0)
        tp_rank = global_rank % tp
        shard_size = tensor.shape[0] // tp
        shard = tensor[tp_rank * shard_size : (tp_rank + 1) * shard_size].clone()
        tensors[name] = shard

print(f"[Rank {global_rank}] Loaded {len(tensors)} tensor shards")
```

--------------------------------

### safe_open.metadata()

Source: https://context7.com/scitix/instanttensor/llms.txt

Reads file-level metadata from the Safetensors header, such as model format or provenance information.

```APIDOC
## safe_open.metadata()

### Description
Returns the `__metadata__` dictionary stored in the Safetensors file header (e.g., model format, provenance info), or `None` if absent.

### Method
`metadata()`

### Request Example
```python
from instanttensor import safe_open

with safe_open("model.safetensors", framework="pt", device=0) as f:
    meta = f.metadata()
    if meta:
        print("File metadata:", meta)
    else:
        print("No file-level metadata found")
```

### Response
#### Success Response
- **metadata** (dict or None) - A dictionary containing file-level metadata, or None if no metadata is present.
```

--------------------------------

### Detect In-Memory Filesystem with file_in_memory()

Source: https://context7.com/scitix/instanttensor/llms.txt

Determines if a given file path resides on an in-memory filesystem like tmpfs or ramfs. InstantTensor uses this to optimize I/O operations.

```python
from instanttensor._impl import file_in_memory

path = "/dev/shm/model.safetensors"  # tmpfs on Linux
if file_in_memory(path):
    print("File is in memory; using optimized in-memory I/O path")
else:
    print("File is on disk; using libaio/cuFile I/O path")
```

--------------------------------

### Low-Level Header Reading with read_safetensors_metadata()

Source: https://context7.com/scitix/instanttensor/llms.txt

Reads the raw Safetensors file header without loading tensor data. Returns file metadata, a dictionary of tensor metadata, and the header size in bytes.

```python
from instanttensor._impl import read_safetensors_metadata

file_meta, tensor_meta, header_size = read_safetensors_metadata("model.safetensors")

print(f"Header size: {header_size} bytes")
print(f"File metadata: {file_meta}")
print(f"Number of tensors: {len(tensor_meta)}")

# Inspect a single tensor's raw metadata entry
name, entry = next(iter(tensor_meta.items()))
print(f"{name}: {entry}")
# model.embed_tokens.weight: {'dtype': 'BF16', 'shape': [32000, 4096], 'data_offsets': [0, 262144000]}
```

--------------------------------

### Iterate and Clone Tensors with `safe_open.tensors()`

Source: https://context7.com/scitix/instanttensor/llms.txt

Use the `tensors()` generator method within the `safe_open` context to iterate over tensor names and their corresponding GPU buffers. It's crucial to clone each tensor immediately as the internal buffer is reused.

```python
from instanttensor import safe_open

model_weights = {}
with safe_open("model.safetensors", framework="pt", device=0) as f:
    for name, tensor in f.tensors():
        print(f"{name}: shape={tensor.shape}, dtype={tensor.dtype}, device={tensor.device}")
        # e.g. model.layers.0.self_attn.q_proj.weight: shape=torch.Size([4096, 4096]), dtype=torch.bfloat16, device=cuda:0

        # Must copy; internal buffer is reused on the next iteration
        model_weights[name] = tensor.clone()

# tensors() can only be called once per safe_open instance
```

--------------------------------

### Distributed Loading with Sub-Groups (TP/PP Parallelism)

Source: https://context7.com/scitix/instanttensor/llms.txt

Load weights independently for each parallelism group using a `torch.distributed` sub-group, suitable for Tensor Parallel (TP) or Pipeline Parallel (PP) inference.

```python
import os
import torch
import torch.distributed as dist
from instanttensor import safe_open
```

--------------------------------

### safe_open.tensors() - Iterate over all tensors in loaded file(s)

Source: https://context7.com/scitix/instanttensor/llms.txt

A generator method on the `safe_open` context manager that yields `(name, tensor)` pairs in file-offset order. The yielded tensor resides in an internal GPU buffer and must be copied immediately.

```APIDOC
## safe_open.tensors()

### Description
Generator method on the `safe_open` context manager object that yields `(name, tensor)` pairs in file-offset order. The tensor lives in an internal GPU buffer that is reused across iterations, so it must be copied immediately via `.clone()` or `.copy_()`.

### Usage
```python
from instanttensor import safe_open

model_weights = {}
with safe_open("model.safetensors", framework="pt", device=0) as f:
    for name, tensor in f.tensors():
        print(f"{name}: shape={tensor.shape}, dtype={tensor.dtype}, device={tensor.device}")
        # e.g. model.layers.0.self_attn.q_proj.weight: shape=torch.Size([4096, 4096]), dtype=torch.bfloat16, device=cuda:0

        # Must copy; internal buffer is reused on the next iteration
        model_weights[name] = tensor.clone()

# tensors() can only be called once per safe_open instance
```

### Returns
- Generator yielding `(name, tensor)` pairs.
```

--------------------------------

### read_safetensors_metadata()

Source: https://context7.com/scitix/instanttensor/llms.txt

A low-level function to read the raw Safetensors file header without loading any tensor data.

```APIDOC
## read_safetensors_metadata()

### Description
Reads the raw Safetensors file header without loading any tensor data. Returns `(file_metadata, tensor_metadata_dict, header_size_bytes)`.

### Method
`read_safetensors_metadata(filename: str)`

### Parameters
#### Path Parameters
- **filename** (str) - Required - The path to the Safetensors file.

### Request Example
```python
from instanttensor._impl import read_safetensors_metadata

file_meta, tensor_meta, header_size = read_safetensors_metadata("model.safetensors")

print(f"Header size: {header_size} bytes")
print(f"File metadata: {file_meta}")
print(f"Number of tensors: {len(tensor_meta)}")

name, entry = next(iter(tensor_meta.items()))
print(f"{name}: {entry}")
```

### Response
#### Success Response
- **file_metadata** (dict) - Dictionary containing file-level metadata.
- **tensor_metadata_dict** (dict) - Dictionary mapping tensor names to their metadata.
- **header_size_bytes** (int) - The size of the header in bytes.
```

--------------------------------

### file_in_memory()

Source: https://context7.com/scitix/instanttensor/llms.txt

Detects if a given file path resides on a temporary filesystem (tmpfs/ramfs).

```APIDOC
## file_in_memory()

### Description
Returns `True` if the given file path is on an in-memory filesystem. InstantTensor uses this internally to select the optimal I/O path.

### Method
`file_in_memory(path: str)`

### Parameters
#### Path Parameters
- **path** (str) - Required - The path to the file to check.

### Request Example
```python
from instanttensor._impl import file_in_memory

path = "/dev/shm/model.safetensors"
if file_in_memory(path):
    print("File is in memory; using optimized in-memory I/O path")
else:
    print("File is on disk; using libaio/cuFile I/O path")
```

### Response
#### Success Response
- **result** (bool) - True if the file is on an in-memory filesystem, False otherwise.
```

--------------------------------

### safe_open.get_tensor_metadata()

Source: https://context7.com/scitix/instanttensor/llms.txt

Retrieves the dtype and shape for a named tensor from the file header without loading the tensor data. This is useful for pre-allocating buffers or inspecting model structure.

```APIDOC
## safe_open.get_tensor_metadata()

### Description
Returns the `(torch.dtype, torch.Size)` tuple for a named tensor from the file header.
Useful for pre-allocating destination buffers or inspecting the model structure before loading.

### Method
`get_tensor_metadata(name: str)`

### Parameters
#### Path Parameters
- **name** (str) - Required - The name of the tensor to retrieve metadata for.

### Request Example
```python
from instanttensor import safe_open
import torch

f = safe_open("model.safetensors", framework="pt", device=0, load_now=False)
tensor_names = f.keys()

for name in tensor_names[:5]:
    dtype, shape = f.get_tensor_metadata(name)
    num_params = torch.Size(shape).numel()
    size_mb = num_params * torch.empty([], dtype=dtype).element_size() / 1024**2
    print(f"{name}: dtype={dtype}, shape={shape}, size={size_mb:.2f} MB")
```

### Response
#### Success Response
- **dtype** (torch.dtype) - The data type of the tensor.
- **shape** (torch.Size) - The shape of the tensor.
```
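
--------------------------------

### Sketch: Reading a Safetensors Header with the Standard Library

The header data that `read_safetensors_metadata()` and `get_tensor_metadata()` expose can be explored without a GPU. The sketch below is not InstantTensor's implementation. It relies only on the public Safetensors layout (an 8-byte little-endian length followed by a JSON header) and the Python standard library. The names `read_header`, `tensor_nbytes`, `write_demo_file`, and `DTYPE_BYTES` are hypothetical helpers introduced for illustration, and whether InstantTensor's reported header size includes the 8-byte prefix is an assumption not confirmed here.

```python
import json
import struct
import tempfile
from pathlib import Path

# Bytes per element for a few safetensors dtype strings (illustrative, not exhaustive).
DTYPE_BYTES = {"F64": 8, "F32": 4, "F16": 2, "BF16": 2, "I64": 8, "I32": 4, "I8": 1, "U8": 1, "BOOL": 1}


def read_header(path):
    """Return (file_metadata, tensor_metadata, header_size) for a safetensors file.

    header_size is the length of the JSON header only, excluding the 8-byte
    length prefix (an assumption; InstantTensor may count it differently).
    """
    with open(path, "rb") as fh:
        (header_size,) = struct.unpack("<Q", fh.read(8))  # little-endian u64
        header = json.loads(fh.read(header_size).decode("utf-8"))
    file_meta = header.pop("__metadata__", None)
    return file_meta, header, header_size


def tensor_nbytes(entry):
    """Compute a tensor's size in bytes from its dtype string and shape."""
    count = 1
    for dim in entry["shape"]:
        count *= dim
    return count * DTYPE_BYTES[entry["dtype"]]


def write_demo_file(path):
    """Write a tiny safetensors file holding one 2x3 BF16 tensor of zeros."""
    header = {
        "__metadata__": {"format": "pt"},
        "demo.weight": {"dtype": "BF16", "shape": [2, 3], "data_offsets": [0, 12]},
    }
    blob = json.dumps(header, separators=(",", ":")).encode("utf-8")
    Path(path).write_bytes(struct.pack("<Q", len(blob)) + blob + bytes(12))


demo_path = Path(tempfile.mkdtemp()) / "demo.safetensors"
write_demo_file(demo_path)
file_meta, tensor_meta, header_size = read_header(demo_path)

for name, entry in tensor_meta.items():
    start, end = entry["data_offsets"]
    # data_offsets are relative to the first byte after the header
    assert end - start == tensor_nbytes(entry)
    print(f"{name}: dtype={entry['dtype']}, shape={entry['shape']}, bytes={tensor_nbytes(entry)}")
```

The same arithmetic applies to the raw entry shown in the header-reading example: a BF16 tensor of shape `[32000, 4096]` occupies `32000 * 4096 * 2` bytes, which matches the end offset in its `data_offsets`.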