### Getting Parameter Bounds in CParameter

Source: https://github.com/rogdham/pyzstd/blob/master/docs/pyzstd.md

Shows how to retrieve the inclusive lower and upper bounds for a given `CParameter` using its `bounds()` method. This is useful for validating parameter values before use.

```python
>>> CParameter.compressionLevel.bounds()
(-131072, 22)
>>> CParameter.windowLog.bounds()
(10, 31)
>>> CParameter.enableLongDistanceMatching.bounds()
(0, 1)
```

--------------------------------

### Get Zstd Frame Information and Size

Source: https://context7.com/rogdham/pyzstd/llms.txt

Demonstrates how to retrieve the decompressed size and dictionary ID from a zstd frame header using `get_frame_info`, and the total compressed size of a frame using `get_frame_size`. The input must be a valid zstd frame.

```python
from pyzstd import compress, get_frame_info, get_frame_size, CParameter

raw = b"Inspect me!" * 500
compressed = compress(raw, {CParameter.checksumFlag: 1, CParameter.dictIDFlag: 0})

# Frame info from first 20 bytes (header is 6–18 bytes)
info = get_frame_info(compressed[:20])
print(info)  # frame_info(decompressed_size=5500, dictionary_id=0)
print("Decompressed size:", info.decompressed_size)  # 5500
print("Dictionary ID:", info.dictionary_id)  # 0

# Total size of the first frame (requires the complete frame)
size = get_frame_size(compressed)
print("Frame size:", size)  # equals len(compressed) for a single frame
assert size == len(compressed)
```

--------------------------------

### Train and Use Zstd Dictionaries

Source: https://context7.com/rogdham/pyzstd/llms.txt

Shows how to train a Zstandard dictionary from sample data and then use it for compressing and decompressing small, similar data items to improve compression ratios. Dictionaries are thread-safe.
```python
import os

from pyzstd import ZstdDict, compress, decompress, train_dict

# --- Training a dictionary from sample files ---
samples = []
for fname in os.listdir("/path/to/samples"):
    with open(os.path.join("/path/to/samples", fname), "rb") as f:
        samples.append(f.read())

zd = train_dict(samples, dict_size=100 * 1024)  # 100 KiB dict

# Save dictionary to file
with open("mydict.zstd", "wb") as f:
    f.write(zd.dict_content)
print(f"Dict ID: {zd.dict_id}, size: {len(zd)} bytes")

# --- Loading and using a saved dictionary ---
with open("mydict.zstd", "rb") as f:
    zd2 = ZstdDict(f.read())

small_data = b'{"user": "alice", "action": "login", "ts": 1700000000}'

# Compress with dictionary (undigested by default, good for single use per compressor)
compressed = compress(small_data, zstd_dict=zd2)

# Compress with digested dict (digested once and cached in the ZstdDict;
# faster when the same dictionary is loaded by many compress() calls)
compressed2 = compress(small_data, zstd_dict=zd2.as_digested_dict)

# Decompress (uses the digested dict by default for speed)
result = decompress(compressed, zstd_dict=zd2)
assert result == small_data
```

--------------------------------

### Get Frame Information with pyzstd

Source: https://github.com/rogdham/pyzstd/blob/master/docs/pyzstd.md

Use `get_frame_info` to retrieve the decompressed size and dictionary ID from a zstd frame header. Ensure the buffer starts at the beginning of a frame and contains at least the full header.

```python
>>> pyzstd.get_frame_info(compressed_dat[:20])
frame_info(decompressed_size=687379, dictionary_id=1040992268)
```

--------------------------------

### Get Zstd Library Version (Tuple)

Source: https://github.com/rogdham/pyzstd/blob/master/docs/pyzstd.md

Access the `pyzstd.zstd_version_info` variable to get the version of the underlying zstd library as a tuple.
```python
>>> pyzstd.zstd_version_info
(1, 4, 5)
```

--------------------------------

### Using CParameter for Compression Options

Source: https://github.com/rogdham/pyzstd/blob/master/docs/pyzstd.md

Demonstrates how to define compression options using the `CParameter` enum and apply them with the `compress()` function or a `ZstdCompressor` object. Ensure parameter values are within their valid bounds.

```python
option = {CParameter.compressionLevel : 10,
          CParameter.checksumFlag : 1}

# used with compress() function
compressed_dat = compress(raw_dat, option)

# used with ZstdCompressor object
c = ZstdCompressor(level_or_option=option)
compressed_dat1 = c.compress(raw_dat)
compressed_dat2 = c.flush()
```

--------------------------------

### open()

Source: https://github.com/rogdham/pyzstd/blob/master/docs/pyzstd.md

Opens a zstd-compressed file in binary or text mode, similar to Python's built-in `bz2.open()`, `gzip.open()`, and `lzma.open()`.

```APIDOC
## open(filename, mode='rb', level_or_option=None, zstd_dict=None, encoding=None, errors=None, newline=None)

Open a zstd-compressed file in binary or text mode, returning a file object.

This function is very similar to the [bz2.open()](https://docs.python.org/3/library/bz2.html#bz2.open) / [gzip.open()](https://docs.python.org/3/library/gzip.html#gzip.open) / [lzma.open()](https://docs.python.org/3/library/lzma.html#lzma.open) functions in the Python standard library.

The *filename* parameter can be an existing [file object](https://docs.python.org/3/glossary.html#term-file-object) to wrap, or the name of the file to open (as a `str`, `bytes` or [path-like](https://docs.python.org/3/glossary.html#term-path-like-object) object). When wrapping an existing file object, the wrapped file will not be closed when the returned file object is closed.

The *mode* parameter can be any of “r”, “rb”, “w”, “wb”, “x”, “xb”, “a” or “ab” for binary mode, or “rt”, “wt”, “xt”, or “at” for text mode. The default is “rb”.
In reading mode (decompression), the *level_or_option* parameter can only be a `dict` object representing decompression options; an `int` compression level is not supported in this case.

In binary mode, a [`ZstdFile`](#ZstdFile) object is returned.

In text mode, a [`ZstdFile`](#ZstdFile) object is created and wrapped in an [io.TextIOWrapper](https://docs.python.org/3/library/io.html#io.TextIOWrapper) object with the specified encoding, error handling behavior, and line ending(s).
```

--------------------------------

### Get Zstd Library Version (String)

Source: https://github.com/rogdham/pyzstd/blob/master/docs/pyzstd.md

Access the `pyzstd.zstd_version` variable to get the version of the underlying zstd library as a string.

```python
>>> pyzstd.zstd_version
'1.4.5'
```

--------------------------------

### Train and Finalize Zstd Dictionaries

Source: https://context7.com/rogdham/pyzstd/llms.txt

Illustrates training a Zstandard dictionary from an iterable of byte samples and finalizing a raw dictionary basis to tune it for a specific compression level. Ensure sample data is representative.
```python
import os
import pyzstd
from pyzstd import train_dict, finalize_dict, ZstdDict

# Collect samples
def iter_samples(directory):
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            with open(path, "rb") as f:
                yield f.read()

# Train from scratch
zd = train_dict(iter_samples("/data/samples"), dict_size=64 * 1024)
print(f"Trained dict: id={zd.dict_id}, size={len(zd)}")

# Finalize a raw-content basis into a proper dict tuned for level 3
basis_content = b"\x00" * 1024  # custom seed content
raw_dict = ZstdDict(basis_content, is_raw=True)
finalized = finalize_dict(
    raw_dict,
    iter_samples("/data/samples"),
    dict_size=64 * 1024,
    level=3,
)
print(f"Finalized dict: id={finalized.dict_id}")
```

--------------------------------

### Get Compression Level Values

Source: https://github.com/rogdham/pyzstd/blob/master/docs/pyzstd.md

Access `pyzstd.compressionLevel_values` to get a namedtuple containing the default, minimum, and maximum compression levels supported by the zstd library.

```python
>>> pyzstd.compressionLevel_values  # 131072 = 128*1024
values(default=3, min=-131072, max=22)
```

--------------------------------

### Handling Compression Level vs. Options

Source: https://github.com/rogdham/pyzstd/blob/master/docs/stdlib.md

Demonstrates the change in how compression levels and options are passed. `pyzstd` used a single `level_or_option` parameter, while the standard library separates these into `level` and `options` parameters.
```python
# before
pyzstd.compress(data, 10)
pyzstd.compress(data, level_or_option=10)

# after
zstd.compress(data, 10)
zstd.compress(data, level=10)
```

```python
# before
pyzstd.compress(data, {pyzstd.CParameter.checksumFlag: True})
pyzstd.compress(data, level_or_option={pyzstd.CParameter.checksumFlag: True})

# after
zstd.compress(data, options={zstd.CompressionParameter.checksum_flag: True})
```

--------------------------------

### Get parameter bounds

Source: https://github.com/rogdham/pyzstd/blob/master/docs/pyzstd.md

The `bounds()` method of a `DParameter` enum member returns the inclusive lower and upper bounds for that parameter's value.

```python
>>> DParameter.windowLogMax.bounds()
(10, 31)
```

--------------------------------

### Load and Use Zstd Dictionary

Source: https://github.com/rogdham/pyzstd/blob/master/docs/pyzstd.md

Demonstrates loading a zstd dictionary from a file and then using it for compression and decompression. Reusing compressor objects can improve performance when using a dictionary multiple times.

```python
import io
from pyzstd import ZstdDict, compress, decompress

# load a zstd dictionary from file
with io.open(dict_path, 'rb') as f:
    file_content = f.read()
zd = ZstdDict(file_content)

# use the dictionary to compress.
# if a dictionary is used for compression multiple times, reusing
# a compressor object is faster, see the .as_undigested_dict doc.
compressed_dat = compress(raw_dat, zstd_dict=zd)

# use the dictionary to decompress
decompressed_dat = decompress(compressed_dat, zstd_dict=zd)
```

--------------------------------

### Decompression with Dictionary

Source: https://github.com/rogdham/pyzstd/blob/master/docs/pyzstd.md

Pyzstd uses a digested dictionary for decompression by default, which is faster when the same dictionary is loaded again. This entry describes decompressing data with a provided zstd dictionary.

```APIDOC
## Decompress with Dictionary

### Description
Decompresses data using a provided zstd dictionary.
Pyzstd defaults to using a digested dictionary for decompression, which offers performance benefits when the dictionary is reused.

### Method
`decompress(dat, zstd_dict=zd)`

### Parameters
- **dat** (bytes): The compressed data to decompress.
- **zstd_dict** (ZstdDict): The zstd dictionary to use for decompression. Defaults to using a digested dictionary.

### Response
- **decompressed_data** (bytes): The decompressed data.
```

--------------------------------

### Inspect Compression and Decompression Parameter Bounds

Source: https://context7.com/rogdham/pyzstd/llms.txt

Shows how to inspect the valid ranges for compression and decompression parameters using the `bounds()` method. These parameters can be used to fine-tune compression and decompression behavior.

```python
from pyzstd import compress, decompress, CParameter, DParameter, Strategy

# Inspect available ranges
print(CParameter.compressionLevel.bounds())  # (-131072, 22)
print(CParameter.windowLog.bounds())  # (10, 31)
print(DParameter.windowLogMax.bounds())  # (10, 31)

raw = b"Repeated data " * 10_000

# Full option dict: strategy, long-distance matching, multi-thread, checksum
option = {
    CParameter.compressionLevel: 19,
    CParameter.strategy: Strategy.btultra2,
    CParameter.enableLongDistanceMatching: 1,
    CParameter.windowLog: 27,  # 128 MiB window
    CParameter.nbWorkers: 4,
    CParameter.checksumFlag: 1,
    CParameter.contentSizeFlag: 1,
}
compressed = compress(raw, option)

# Decompress allowing a large window
decomp_option = {DParameter.windowLogMax: 28}  # up to 256 MiB
result = decompress(compressed, option=decomp_option)
assert result == raw
```

--------------------------------

### Replace compress_stream with pyzstd.open

Source: https://github.com/rogdham/pyzstd/blob/master/docs/deprecated.md

Use `pyzstd.open` for stream compression instead of `compress_stream`. For more control, consider using `ZstdCompressor`.
```python
# before
with io.open(input_file_path, 'rb') as ifh:
    with io.open(output_file_path, 'wb') as ofh:
        compress_stream(ifh, ofh, level_or_option=5)
```

```python
# after
with io.open(input_file_path, 'rb') as ifh:
    with pyzstd.open(output_file_path, 'w', level_or_option=5) as ofh:
        shutil.copyfileobj(ifh, ofh)
```

--------------------------------

### Get Frame Size with pyzstd

Source: https://github.com/rogdham/pyzstd/blob/master/docs/pyzstd.md

Use `get_frame_size` to determine the total size of a zstd frame, including headers and checksums. The buffer must contain at least one complete frame.

```python
>>> pyzstd.get_frame_size(compressed_dat)
252874
```

--------------------------------

### ZstdFile and open()

Source: https://context7.com/rogdham/pyzstd/llms.txt

Provides a file-like interface for reading and writing zstd-compressed files in binary or text mode, mirroring standard library file APIs.

````APIDOC
## `class ZstdFile` and `open()`

File-like interface for reading and writing zstd-compressed files in binary or text mode. API mirrors `bz2.BZ2File` / `lzma.LZMAFile`. Supports `seek()`, `readline()`, iteration, and use as a `tarfile` stream.

```python
import tarfile

import pyzstd
from pyzstd import ZstdFile, CParameter

COMPRESS_OPT = {CParameter.compressionLevel: 6, CParameter.checksumFlag: 1}

# --- Write and read a binary zstd file ---
with ZstdFile("data.zst", "w", level_or_option=COMPRESS_OPT) as f:
    f.write(b"line one\n")
    f.write(b"line two\n")
    f.flush()  # flush block (data recoverable if interrupted)

with ZstdFile("data.zst", "r") as f:
    for line in f:
        print(line)  # b"line one\n", b"line two\n"

# --- Text mode via pyzstd.open() ---
with pyzstd.open("text.zst", "wt", level_or_option=6, encoding="utf-8") as f:
    f.write("Hello from text mode\n")

with pyzstd.open("text.zst", "rt", encoding="utf-8") as f:
    print(f.read())  # "Hello from text mode\n"

# --- Use with tarfile ---
with ZstdFile("archive.tar.zst", "w", level_or_option=COMPRESS_OPT) as fobj:
    with tarfile.open(fileobj=fobj, mode="w") as tar:
        tar.add("/path/to/dir", arcname="dir")

with ZstdFile("archive.tar.zst", "r") as fobj:
    with tarfile.open(fileobj=fobj) as tar:
        tar.extractall("/output/dir")
```
````

--------------------------------

### Streaming Compression with ZstdCompressor

Source: https://github.com/rogdham/pyzstd/blob/master/docs/pyzstd.md

Demonstrates traditional streaming compression using the `ZstdCompressor` object. Data is fed in chunks, and flush methods are used to manage frame and block boundaries.

```python
c = ZstdCompressor()

# traditional streaming compression
dat1 = c.compress(b'123456')
dat2 = c.compress(b'abcdef')
dat3 = c.flush()

# use .compress() method with mode argument
compressed_dat1 = c.compress(raw_dat1, c.FLUSH_BLOCK)
compressed_dat2 = c.compress(raw_dat2, c.FLUSH_FRAME)
```

--------------------------------

### ZstdDecompressor.__init__

Source: https://github.com/rogdham/pyzstd/blob/master/docs/pyzstd.md

Initializes a `ZstdDecompressor` object for streaming decompression. It can optionally use a pre-trained dictionary or advanced decompression parameters.
```APIDOC
## __init__(self, zstd_dict=None, option=None)

### Description
Initialize a ZstdDecompressor object.

### Parameters
- **zstd_dict** ([*ZstdDict*](#ZstdDict)) – Pre-trained dictionary for decompression.
- **option** (*dict*) – A `dict` object that contains [advanced decompression parameters](#dparameter). The default value `None` means to use zstd’s default decompression parameters.
```

--------------------------------

### compress()

Source: https://github.com/rogdham/pyzstd/blob/master/docs/pyzstd.md

Compresses data using the Zstandard algorithm. It supports various compression levels and options, as well as pre-trained dictionaries.

````APIDOC
## compress(data, level_or_option=None, zstd_dict=None)

### Description
Compress *data*, return the compressed data.

Compressing `b''` will get an empty content frame (9 bytes or more).

### Parameters
* **data** (*bytes-like object*) – Data to be compressed.
* **level_or_option** (*int* *or* *dict*) – When it’s an `int` object, it represents [compression level](#id14). When it’s a `dict` object, it contains [advanced compression parameters](#cparameter). The default value `None` means to use zstd’s default compression level/parameters.
* **zstd_dict** ([*ZstdDict*](#ZstdDict)) – Pre-trained dictionary for compression.

### Returns
Compressed data

### Return type
bytes

### Example
```python
# int compression level
compressed_dat = compress(raw_dat, 10)

# dict option, use 6 threads to compress, and append a 4-byte checksum.
option = {CParameter.compressionLevel : 10,
          CParameter.nbWorkers : 6,
          CParameter.checksumFlag : 1}
compressed_dat = compress(raw_dat, option)
```
````

--------------------------------

### Using Dictionary as Prefix

Source: https://github.com/rogdham/pyzstd/blob/master/docs/pyzstd.md

Load a dictionary as a prefix for compression or decompression. This is useful for patching scenarios and is compatible with long-distance matching.
```APIDOC
## Load Dictionary as Prefix

### Description
Loads a dictionary's content to be used as a prefix for compression or decompression. This feature is particularly useful for implementing patching engines and is compatible with Zstandard's long-distance matching.

### Property
`zstd_dict.as_prefix`

### Usage
Pass `zstd_dict.as_prefix` as the `zstd_dict` argument to `compress()` or `decompress()`.

### Notes
- A prefix is compatible with "long distance matching", while a standard dictionary is not.
- A prefix is only effective for the first frame. After that, the compressor/decompressor reverts to a no-prefix state. This differs from a dictionary, which can be used for all subsequent frames. Exercise caution when using a prefix with `ZstdFile` or `SeekableZstdFile`.
- When decompressing, the same prefix that was used for compression must be supplied.
- Loading a prefix into a compressor can be computationally expensive.
- Loading a prefix into a decompressor is not costly.
```

--------------------------------

### Advanced stream compression with ZstdCompressor

Source: https://github.com/rogdham/pyzstd/blob/master/docs/deprecated.md

This alternative shows how to use `ZstdCompressor` for more granular control over stream compression, including optional input size pledging and progress callbacks.

```python
# after: more complex alternative
with io.open(input_file_path, 'rb') as ifh:
    with io.open(output_file_path, 'wb') as ofh:
        compressor = ZstdCompressor(level_or_option=5)
        compressor._set_pledged_input_size(pledged_input_size)  # optional
        while data := ifh.read(read_size):
            ofh.write(compressor.compress(data))
            callback_progress(ifh.tell(), ofh.tell())  # optional
        ofh.write(compressor.flush())
```

--------------------------------

### Compress and Decompress Multi-Frame Streams

Source: https://context7.com/rogdham/pyzstd/llms.txt

Demonstrates how to compress data into multiple frames and then decompress the concatenated stream in small chunks. `EndlessZstdDecompressor` continues across frame boundaries on its own; after the input is exhausted, check `at_frame_edge` to detect a truncated last frame.
```python
from pyzstd import ZstdCompressor, EndlessZstdDecompressor

# Build two concatenated frames
c = ZstdCompressor()
frame1 = c.compress(b"Frame ONE data", ZstdCompressor.FLUSH_FRAME)
frame2 = c.compress(b"Frame TWO data", ZstdCompressor.FLUSH_FRAME)
stream = frame1 + frame2

# Decompress the entire multi-frame stream, feeding small input chunks
d = EndlessZstdDecompressor()
output_parts = []
chunk_size = 32
pos = 0
while True:
    if d.needs_input:
        if pos >= len(stream):
            break  # input exhausted and no buffered output left
        chunk = stream[pos:pos + chunk_size]
        pos += chunk_size
    else:
        chunk = b""  # drain output held back by the max_length limit
    out = d.decompress(chunk, max_length=256)
    if out:
        output_parts.append(out)

assert d.at_frame_edge, "Stream ended in an incomplete frame"
print(b"".join(output_parts))  # b'Frame ONE dataFrame TWO data'
```

--------------------------------

### Strategy

Source: https://github.com/rogdham/pyzstd/blob/master/docs/pyzstd.md

Available compression strategies.

```APIDOC
### Strategy(IntEnum)

Attributes of [`Strategy`](#Strategy) class: [`fast`](#Strategy.fast), [`dfast`](#Strategy.dfast), [`greedy`](#Strategy.greedy), [`lazy`](#Strategy.lazy), [`lazy2`](#Strategy.lazy2), [`btlazy2`](#Strategy.btlazy2), [`btopt`](#Strategy.btopt), [`btultra`](#Strategy.btultra), [`btultra2`](#Strategy.btultra2).
```

--------------------------------

### ZstdFile Initialization

Source: https://github.com/rogdham/pyzstd/blob/master/docs/pyzstd.md

Initializes a `ZstdFile` object for reading, writing, exclusive creation, or appending compressed data. It can wrap an existing file object or open a file by name.

```APIDOC
## ZstdFile.__init__

### Description
Initializes a ZstdFile object for compressed file operations.

### Parameters
- **filename**: (str, bytes, path-like object, file object) The path to the file or an existing file object to wrap.
- **mode**: (str) The mode to open the file in ('r', 'w', 'x', 'a', 'rb', 'wb', 'xb', 'ab'). Defaults to 'r'.
- **level_or_option**: (int, dict, or None) When writing, a compression level (`int`) or a `dict` of advanced compression parameters; when reading, only a `dict` of decompression parameters is accepted.
- **zstd_dict**: (ZstdDict or None) A pre-trained dictionary, used for compression or decompression depending on the mode.
```

--------------------------------

### Create .tar.zst Archive with pyzstd CLI

Source: https://context7.com/rogdham/pyzstd/llms.txt

Creates a .tar.zst archive from a directory using the pyzstd command-line interface, specifying compression level and threads.

```bash
python -m pyzstd --tar-input-dir /my/project -o project.tar.zst -l 6 -t 2
```

--------------------------------

### Train Zstandard Dictionary with pyzstd CLI

Source: https://context7.com/rogdham/pyzstd/llms.txt

Trains a zstd dictionary from files matching a pattern, specifying the maximum dictionary size and output file.

```bash
python -m pyzstd --train "data/**/*.json" --maxdict 102400 -o mydict.zstd
```

--------------------------------

### Generate and Apply Zstd Patch

Source: https://context7.com/rogdham/pyzstd/llms.txt

Demonstrates using Zstd as a binary patch generator. `ZstdDict.as_prefix` is used to create a patch from a previous version of a file, and the patch can be applied using the same prefix.

```python
from pyzstd import compress, decompress, ZstdDict, CParameter, DParameter

VER_1 = b"Version 1 content..." * 1000
VER_2 = b"Version 2 content..." * 1000  # slightly different

# --- Generate patch ---
v1_prefix = ZstdDict(VER_1, is_raw=True)

# Let the window cover the longer version, clamped to the valid range
window_log = max(len(VER_1), len(VER_2)).bit_length()
low, high = CParameter.windowLog.bounds()  # e.g. (10, 31) on 64-bit builds
window_log = max(low, min(window_log, high))

patch_option = {
    CParameter.windowLog: window_log,
    CParameter.enableLongDistanceMatching: 1,
}
PATCH = compress(VER_2, level_or_option=patch_option, zstd_dict=v1_prefix.as_prefix)
print(f"Patch size: {len(PATCH)} bytes (vs {len(VER_2)} raw bytes)")
```

--------------------------------

### compress(data, level_or_option=None, zstd_dict=None)

Source: https://context7.com/rogdham/pyzstd/llms.txt

Performs one-shot compression on a bytes-like object. Supports integer compression levels or a dictionary of CParameter options for fine-grained control.
Returns the compressed bytes.

````APIDOC
## compress(data, level_or_option=None, zstd_dict=None)

### Description
One-shot compression of a bytes-like object. `level_or_option` can be an integer compression level (1–22, negatives for faster/worse, 0 for default) or a `dict` of `CParameter` keys for fine-grained control. Returns compressed `bytes`.

### Example
```python
import pyzstd
from pyzstd import compress, decompress, CParameter

raw = b"Hello, Zstandard!" * 1000

# Simple compression at default level
compressed = compress(raw)
print(len(raw), "->", len(compressed))  # e.g. 17000 -> 54

# Integer level
compressed_l10 = compress(raw, 10)

# Advanced options: level 10, 4 worker threads, append checksum
option = {
    CParameter.compressionLevel: 10,
    CParameter.nbWorkers: 4,
    CParameter.checksumFlag: 1,
}
compressed_mt = compress(raw, option)

# Verify round-trip
assert decompress(compressed_mt) == raw
```

### Returns
- **compressed_data** (bytes) - The compressed bytes.
````

--------------------------------

### CParameter and DParameter

Source: https://context7.com/rogdham/pyzstd/llms.txt

Enumerate advanced compression (`CParameter`) and decompression (`DParameter`) options. Both expose a `bounds()` method to inspect the valid range for each parameter.

````APIDOC
## `class CParameter(IntEnum)` and `class DParameter(IntEnum)`

`CParameter` enumerates all advanced compression parameters (passed as a `dict` to `compress()`, `ZstdCompressor`, `ZstdFile`, etc.). `DParameter` enumerates decompression parameters. Both expose a `bounds()` method.

### Usage Example:
```python
from pyzstd import compress, decompress, CParameter, DParameter, Strategy

# Inspect available ranges
print(CParameter.compressionLevel.bounds())  # (-131072, 22)
print(CParameter.windowLog.bounds())  # (10, 31)
print(DParameter.windowLogMax.bounds())  # (10, 31)

raw = b"Repeated data " * 10_000

# Full option dict: strategy, long-distance matching, multi-thread, checksum
option = {
    CParameter.compressionLevel: 19,
    CParameter.strategy: Strategy.btultra2,
    CParameter.enableLongDistanceMatching: 1,
    CParameter.windowLog: 27,  # 128 MiB window
    CParameter.nbWorkers: 4,
    CParameter.checksumFlag: 1,
    CParameter.contentSizeFlag: 1,
}
compressed = compress(raw, option)

# Decompress allowing a large window
decomp_option = {DParameter.windowLogMax: 28}  # up to 256 MiB
result = decompress(compressed, option=decomp_option)
assert result == raw
```

### Methods:
- `bounds()`: Returns a tuple `(min, max)` representing the inclusive valid range for the parameter.
````

--------------------------------

### Write and Read Seekable Zstd File

Source: https://context7.com/rogdham/pyzstd/llms.txt

Demonstrates creating a seekable Zstd file with a maximum frame content size and performing random-access reads using `SeekableZstdFile`. The input file holds plain (uncompressed) data and must exist.
```python
from pyzstd import SeekableZstdFile, CParameter

IN_FILE = "large_input.bin"
SEEKABLE_FILE = "large.seekable.zst"

# Write a seekable zstd file with 10 MiB frames
with open(IN_FILE, "rb") as src:  # plain (uncompressed) input
    with SeekableZstdFile(
        SEEKABLE_FILE,
        "w",
        level_or_option={CParameter.compressionLevel: 3},
        max_frame_content_size=10 * 1024 * 1024,  # 10 MiB per frame
    ) as dst:
        while chunk := src.read(30 * 1024 * 1024):
            dst.write(chunk)

# Verify format
print(SeekableZstdFile.is_seekable_format_file(SEEKABLE_FILE))  # True

# Random seek: read 100 bytes at offset 50 MiB (no full decompression needed)
with SeekableZstdFile(SEEKABLE_FILE, "r") as f:
    f.seek(50 * 1024 * 1024)
    data = f.read(100)
    frames, c_size, d_size = f.seek_table_info
    print(f"Frames: {frames}, compressed: {c_size}, decompressed: {d_size}")
```

--------------------------------

### ZstdDict Class

Source: https://context7.com/rogdham/pyzstd/llms.txt

Demonstrates how to train, save, load, and use Zstd dictionaries for improved compression of similar data.

````APIDOC
## `class ZstdDict`

Represents a pre-trained zstd dictionary. Dictionaries dramatically improve compression ratio on small, similar data items (e.g., JSON records, HTTP headers). Thread-safe and shareable between multiple compressor/decompressor objects.

```python
import os

from pyzstd import ZstdDict, compress, decompress, train_dict

# --- Training a dictionary from sample files ---
samples = []
for fname in os.listdir("/path/to/samples"):
    with open(os.path.join("/path/to/samples", fname), "rb") as f:
        samples.append(f.read())

zd = train_dict(samples, dict_size=100 * 1024)  # 100 KiB dict

# Save dictionary to file
with open("mydict.zstd", "wb") as f:
    f.write(zd.dict_content)
print(f"Dict ID: {zd.dict_id}, size: {len(zd)} bytes")

# --- Loading and using a saved dictionary ---
with open("mydict.zstd", "rb") as f:
    zd2 = ZstdDict(f.read())

small_data = b'{"user": "alice", "action": "login", "ts": 1700000000}'

# Compress with dictionary (undigested by default, good for single use per compressor)
compressed = compress(small_data, zstd_dict=zd2)

# Compress with digested dict (digested once and cached in the ZstdDict;
# faster when the same dictionary is loaded by many compress() calls)
compressed2 = compress(small_data, zstd_dict=zd2.as_digested_dict)

# Decompress (uses the digested dict by default for speed)
result = decompress(compressed, zstd_dict=zd2)
assert result == small_data
```
````

--------------------------------

### Compress File Using a Dictionary with pyzstd CLI

Source: https://context7.com/rogdham/pyzstd/llms.txt

Compresses a file using a pre-trained zstd dictionary with the pyzstd command-line interface.

```bash
python -m pyzstd -c smallfile.json -D mydict.zstd -o smallfile.json.zst
```

--------------------------------

### ZstdDict.as_prefix

Source: https://context7.com/rogdham/pyzstd/llms.txt

Enables Zstandard's binary patching capability. A `ZstdDict` can be converted to a prefix using `as_prefix` for generating compact patches between file versions. Applying the patch requires the same prefix.

````APIDOC
## Patching Engine: `ZstdDict.as_prefix`

Zstd can act as a binary patch generator. Pass version 1 of a file as a "prefix" (`ZstdDict.as_prefix`) during compression of version 2, producing a compact patch. Applying the patch requires the same prefix.

### Usage Example:
```python
from pyzstd import compress, decompress, ZstdDict, CParameter, DParameter

VER_1 = b"Version 1 content..." * 1000
VER_2 = b"Version 2 content..." * 1000  # slightly different

# --- Generate patch ---
v1_prefix = ZstdDict(VER_1, is_raw=True)

# Let the window cover the longer version, clamped to the valid range
window_log = max(len(VER_1), len(VER_2)).bit_length()
low, high = CParameter.windowLog.bounds()  # e.g. (10, 31) on 64-bit builds
window_log = max(low, min(window_log, high))

patch_option = {
    CParameter.windowLog: window_log,
    CParameter.enableLongDistanceMatching: 1,
}
PATCH = compress(VER_2, level_or_option=patch_option, zstd_dict=v1_prefix.as_prefix)
print(f"Patch size: {len(PATCH)} bytes (vs {len(VER_2)} raw bytes)")
```

### Attributes:
- `as_prefix`: A property (accessed without parentheses). Passing it as the `zstd_dict` argument makes the dictionary content load as a prefix, for generating or applying a patch.
````

--------------------------------

### Read and Write Zstd Files with ZstdFile

Source: https://context7.com/rogdham/pyzstd/llms.txt

Provides a file-like interface for reading and writing zstd-compressed files in binary or text mode, mirroring `bz2.BZ2File` and `lzma.LZMAFile`. Supports seek, readline, and iteration.
```python
import tarfile

import pyzstd
from pyzstd import ZstdFile, CParameter

COMPRESS_OPT = {CParameter.compressionLevel: 6, CParameter.checksumFlag: 1}

# --- Write and read a binary zstd file ---
with ZstdFile("data.zst", "w", level_or_option=COMPRESS_OPT) as f:
    f.write(b"line one\n")
    f.write(b"line two\n")
    f.flush()  # flush block (data recoverable if interrupted)

with ZstdFile("data.zst", "r") as f:
    for line in f:
        print(line)  # b"line one\n", b"line two\n"

# --- Text mode via pyzstd.open() ---
with pyzstd.open("text.zst", "wt", level_or_option=6, encoding="utf-8") as f:
    f.write("Hello from text mode\n")

with pyzstd.open("text.zst", "rt", encoding="utf-8") as f:
    print(f.read())  # "Hello from text mode\n"

# --- Use with tarfile ---
with ZstdFile("archive.tar.zst", "w", level_or_option=COMPRESS_OPT) as fobj:
    with tarfile.open(fileobj=fobj, mode="w") as tar:
        tar.add("/path/to/dir", arcname="dir")

with ZstdFile("archive.tar.zst", "r") as fobj:
    with tarfile.open(fileobj=fobj) as tar:
        tar.extractall("/output/dir")
```

--------------------------------

### Apply Patch with pyzstd

Source: https://github.com/rogdham/pyzstd/blob/master/docs/pyzstd.md

Decompress a patch using VER_1 as the prefix. Explicitly set `DParameter.windowLogMax` to allow larger windows if necessary, as the actual windowLog is determined by the frame header. Failure to use the correct prefix will result in a `ZstdError`.

```python
# use VER_1 as prefix
v1 = ZstdDict(VER_1, is_raw=True)

# allow large window, the actual windowLog is from frame header.
option = {DParameter.windowLogMax: 31}

# get VER_2 from (VER_1 + PATCH)
VER_2 = decompress(PATCH, zstd_dict=v1.as_prefix, option=option)
```

--------------------------------

### Apply Patch with Decompression Options

Source: https://context7.com/rogdham/pyzstd/llms.txt

Applies a patch by decompressing it with the same prefix and a raised `DParameter.windowLogMax` limit. `PATCH`, `VER_2`, and `v1_prefix` are the names defined in the "Generate and Apply Zstd Patch" example.
```python
from pyzstd import decompress, DParameter

# PATCH, v1_prefix and VER_2 are defined in the patch-generation example.
# bounds()[1] is this build's maximum windowLogMax (31 on 64-bit).
decomp_option = {DParameter.windowLogMax: DParameter.windowLogMax.bounds()[1]}
RECOVERED = decompress(PATCH, zstd_dict=v1_prefix.as_prefix, option=decomp_option)
assert RECOVERED == VER_2
print("Patch applied successfully")
```

--------------------------------

### Replace decompress_stream with pyzstd.open

Source: https://github.com/rogdham/pyzstd/blob/master/docs/deprecated.md

Use `pyzstd.open` for stream decompression instead of `decompress_stream`. For more control, consider using `EndlessZstdDecompressor`.

```python
import io
import shutil

import pyzstd

# before
with io.open(input_file_path, 'rb') as ifh:
    with io.open(output_file_path, 'wb') as ofh:
        decompress_stream(ifh, ofh)

# after
with pyzstd.open(input_file_path) as ifh:
    with io.open(output_file_path, 'wb') as ofh:
        shutil.copyfileobj(ifh, ofh)
```

--------------------------------

### Train Dictionary

Source: https://github.com/rogdham/pyzstd/blob/master/docs/pyzstd.md

Trains a Zstandard dictionary from a collection of samples. A trained dictionary improves the compression ratio of many small, similar inputs.

```APIDOC
## Train Dictionary

### Description
Trains a Zstandard dictionary from a provided set of samples. This function is useful for creating custom dictionaries that can improve compression ratios for specific types of data.

### Method
`train_dict(samples, dict_size)`

### Parameters
- **samples** (*iterable*): An iterable where each element is a bytes-like object representing a sample file.
- **dict_size** (*int*): The maximum desired size of the trained dictionary in bytes.

### Returns
- A trained `ZstdDict` object. The dictionary content can be accessed via the `dict_content` attribute for saving to a file.
### Example
```python
import io
import os

import pyzstd

def samples():
    rootdir = r"E:\data"
    # yield the content of every file under rootdir as one sample
    for parent, dirnames, filenames in os.walk(rootdir):
        for filename in filenames:
            path = os.path.join(parent, filename)
            with io.open(path, 'rb') as f:
                dat = f.read()
            yield dat

dic = pyzstd.train_dict(samples(), 100*1024)
```
```

--------------------------------

### Generate Patch with pyzstd

Source: https://github.com/rogdham/pyzstd/blob/master/docs/pyzstd.md

Use VER_1 as a prefix dictionary to compress VER_2, enabling long distance matching and setting an appropriate window log. This generates a small patch. Clamp windowLog to the range returned by `CParameter.windowLog.bounds()`.

```python
# use VER_1 as prefix
v1 = ZstdDict(VER_1, is_raw=True)

# let the window cover the longest version.
# don't forget to clamp windowLog to valid range.
# enable "long distance matching".
windowLog = max(len(VER_1), len(VER_2)).bit_length()
lo, hi = CParameter.windowLog.bounds()
windowLog = max(lo, min(windowLog, hi))
option = {CParameter.windowLog: windowLog,
          CParameter.enableLongDistanceMatching: 1}

# get a small PATCH
PATCH = compress(VER_2, level_or_option=option, zstd_dict=v1.as_prefix)
```

--------------------------------

### Set Compression Strategy and Checksum Flag

Source: https://github.com/rogdham/pyzstd/blob/master/docs/pyzstd.md

Use this snippet to set a specific compression strategy (e.g., `Strategy.lazy2`) and enable the checksum flag when compressing data. This is useful for fine-tuning compression performance and integrity.

```python
from pyzstd import compress, CParameter, Strategy

option = {CParameter.strategy : Strategy.lazy2,
          CParameter.checksumFlag : 1}
compressed_dat = compress(raw_dat, option)
```

--------------------------------

### Advanced stream decompression with EndlessZstdDecompressor

Source: https://github.com/rogdham/pyzstd/blob/master/docs/deprecated.md

This more involved alternative drives an `EndlessZstdDecompressor` by hand: it reads input only when the decompressor needs more, bounds the size of each output chunk, optionally reports progress, and raises an error if the data ends in an incomplete frame.
```python
# after: more complex alternative
import io

from pyzstd import EndlessZstdDecompressor

read_size = 128 * 1024   # example value: bytes read from the input per call
write_size = 128 * 1024  # example value: max bytes returned per decompress() call

with io.open(input_file_path, 'rb') as ifh:
    with io.open(output_file_path, 'wb') as ofh:
        decompressor = EndlessZstdDecompressor()
        while True:
            if decompressor.needs_input:
                data = ifh.read(read_size)
                if not data:
                    break
            else:
                data = b""
            ofh.write(decompressor.decompress(data, write_size))
            callback_progress(ifh.tell(), ofh.tell())  # optional: your own progress hook
        if not decompressor.at_frame_edge:
            raise ValueError("zstd data ends in an incomplete frame")
```

--------------------------------

### ZstdFile Common Methods and Properties

Source: https://github.com/rogdham/pyzstd/blob/master/docs/pyzstd.md

Methods and properties available in both reading and writing modes for ZstdFile objects.

```APIDOC
## ZstdFile Common Operations

### Description
Common methods and properties applicable to ZstdFile in both reading and writing modes.

### Methods and Properties
- **close()**: Closes the ZstdFile, flushing any buffered data.
- **tell()**: Returns the current position in the uncompressed data stream.
- **fileno()**: Returns the file descriptor of the underlying file object, if available.
- **closed**: A property indicating whether the file is closed (True) or open (False).
- **writable()**: Returns True if the file was opened in a writeable mode, False otherwise.
- **readable()**: Returns True if the file was opened in a readable mode, False otherwise.
- **seekable()**: Returns True if the file supports seeking, False otherwise.
```

--------------------------------

### Compress File with pyzstd CLI

Source: https://context7.com/rogdham/pyzstd/llms.txt

Compresses a file using the pyzstd command-line interface, specifying the compression level and the number of threads.

```bash
python -m pyzstd -c input.bin -o output.zst -l 10 -t 4
```

--------------------------------

### One-shot Compression with pyzstd

Source: https://context7.com/rogdham/pyzstd/llms.txt

Perform simple or advanced one-shot compression using the `compress` function.
Supports integer levels, dictionaries, and fine-grained control via `CParameter` options like worker threads and checksums.

```python
from pyzstd import compress, decompress, CParameter

raw = b"Hello, Zstandard!" * 1000

# Simple compression at default level
compressed = compress(raw)
print(len(raw), "->", len(compressed))  # e.g. 17000 -> 54

# Integer level
compressed_l10 = compress(raw, 10)

# Advanced options: level 10, 4 worker threads, append checksum
# (worker threads need a multithreaded zstd build, see zstd_support_multithread)
option = {
    CParameter.compressionLevel: 10,
    CParameter.nbWorkers: 4,
    CParameter.checksumFlag: 1,
}
compressed_mt = compress(raw, option)

# Verify round-trip
assert decompress(compressed_mt) == raw
```

--------------------------------

### Access Zstd Library Version and Compression Level Bounds

Source: https://context7.com/rogdham/pyzstd/llms.txt

Shows how to access module-level variables like `zstd_version`, `zstd_version_info`, `compressionLevel_values`, and `zstd_support_multithread` for introspection. These provide runtime information about the underlying zstd library.

```python
import pyzstd

print(pyzstd.zstd_version)              # e.g. 1.5.7
print(pyzstd.zstd_version_info)         # e.g. (1, 5, 7)
print(pyzstd.compressionLevel_values)   # e.g. values(default=3, min=-131072, max=22)
print(pyzstd.zstd_support_multithread)  # True on most builds

# Use bounds dynamically
lv = pyzstd.compressionLevel_values
print(f"Level range: {lv.min} to {lv.max}, default: {lv.default}")
```

--------------------------------

### get_frame_info and get_frame_size

Source: https://context7.com/rogdham/pyzstd/llms.txt

Functions to inspect Zstandard frame headers without full decompression. `get_frame_info` retrieves the decompressed size and dictionary ID, while `get_frame_size` returns the total compressed byte length of a frame.
```APIDOC
## `get_frame_info(frame_buffer)` and `get_frame_size(frame_buffer)`

Inspect a zstd frame header without decompressing: retrieve the stored decompressed size and dictionary ID (`get_frame_info`), or the total compressed byte length of a complete frame (`get_frame_size`).

### Usage Example:

```python
from pyzstd import compress, get_frame_info, get_frame_size, CParameter

raw = b"Inspect me!" * 500
compressed = compress(raw, {CParameter.checksumFlag: 1, CParameter.dictIDFlag: 0})

# Frame info from first 20 bytes (header is 6–18 bytes)
info = get_frame_info(compressed[:20])
print(info)                                     # frame_info(decompressed_size=5500, dictionary_id=0)
print("Decompressed size:", info.decompressed_size)  # 5500
print("Dictionary ID:", info.dictionary_id)          # 0

# Total size of the first frame (requires the complete frame)
size = get_frame_size(compressed)
print("Frame size:", size)  # equals len(compressed) for a single frame
assert size == len(compressed)
```

### Parameters:

- `frame_buffer` (bytes-like): Must start at the beginning of a zstd frame. For `get_frame_info` it must contain at least the frame header; for `get_frame_size` it must contain the complete frame.
```

--------------------------------

### Compress Data with Advanced Options

Source: https://github.com/rogdham/pyzstd/blob/master/docs/pyzstd.md

Compresses data using a dictionary of advanced compression parameters, including compression level, number of workers, and checksum flag. Ensure `raw_dat` is bytes-like.

```python
option = {
    CParameter.compressionLevel : 10,
    CParameter.nbWorkers : 6,
    CParameter.checksumFlag : 1
}
compressed_dat = compress(raw_dat, option)
```

--------------------------------

### Importing zstd Module Based on Python Version

Source: https://github.com/rogdham/pyzstd/blob/master/docs/stdlib.md

This code snippet shows how to import the zstd module, using the standard library `compression.zstd` for Python 3.14+ and `backports.zstd` for older versions.
```python
import pyzstd
import sys

if sys.version_info >= (3, 14):
    from compression import zstd
else:
    from backports import zstd
```

--------------------------------

### compressionLevel_values

Source: https://github.com/rogdham/pyzstd/blob/master/docs/pyzstd.md

Defines the available compression level values, including default, minimum, and maximum.

```APIDOC
### compressionLevel_values
A 3-item namedtuple whose values are defined by the underlying zstd library; see the compression level section of the pyzstd documentation for details.

`default` is the default compression level; it is used when the compression level is set to `0` or not set. `min`/`max` are the minimum/maximum available compression levels, both inclusive.

```python
>>> pyzstd.compressionLevel_values  # 131072 = 128*1024
values(default=3, min=-131072, max=22)
```
```

--------------------------------

### Compress Data with Integer Level

Source: https://github.com/rogdham/pyzstd/blob/master/docs/pyzstd.md

Compresses data using a specified integer compression level. Ensure `raw_dat` is bytes-like.

```python
compressed_dat = compress(raw_dat, 10)
```

--------------------------------

### zstd_version_info

Source: https://github.com/rogdham/pyzstd/blob/master/docs/pyzstd.md

Provides the version of the underlying zstd library as a tuple.

```APIDOC
### zstd_version_info
Underlying zstd library's version, `tuple` form.

```python
>>> pyzstd.zstd_version_info
(1, 4, 5)
```
```

--------------------------------

### Renamed ZstdCompressor Method

Source: https://github.com/rogdham/pyzstd/blob/master/docs/stdlib.md

The `_set_pledged_input_size` method in `ZstdCompressor` has been renamed to `set_pledged_input_size` in the standard library.

```python
# before
# ZstdCompressor._set_pledged_input_size

# after
# ZstdCompressor.set_pledged_input_size
```