### Install Tools with Mise

Source: https://github.com/darvid/python-hyperscan/blob/main/CLAUDE.md

Installs development tools using mise. Ensure mise is installed and configured before running.

```bash
mise install
```

--------------------------------

### Clone and Install Hyperscan from Source

Source: https://github.com/darvid/python-hyperscan/blob/main/docs/index.md

Standard workflow for cloning the repository, setting up a virtual environment, and installing the package from source. This process compiles the extension and vendors the scanning engine.

```shell
git clone https://github.com/darvid/python-hyperscan.git
cd python-hyperscan
python -m venv .venv
source .venv/bin/activate  # .\.venv\Scripts\activate on Windows
pip install --upgrade pip build[uv]
pip install .
```

--------------------------------

### Verify Hyperscan Installation

Source: https://github.com/darvid/python-hyperscan/blob/main/docs/index.md

Run this command to verify the installation by printing the Hyperscan engine information. This confirms the bundled engine version and features.

```shell
python - <<'PY'
import hyperscan
print(hyperscan.Database().info())
PY
```

--------------------------------

### Install Python Dependencies with UV

Source: https://github.com/darvid/python-hyperscan/blob/main/CLAUDE.md

Synchronizes Python dependencies using uv. Use --no-editable and --no-install-project for a clean installation.

```bash
uv sync --no-editable --no-install-project
```

--------------------------------

### Install Python Hyperscan

Source: https://github.com/darvid/python-hyperscan/blob/main/README.md

Install the python-hyperscan package using pip. No external Hyperscan/Vectorscan library installation is required because the engine is statically linked.

```shell
pip install hyperscan
```

--------------------------------

### Link Directories Setup

Source: https://github.com/darvid/python-hyperscan/blob/main/CMakeLists.txt

Sets up link directories for Hyperscan build artifacts.
```cmake
link_directories(${hyperscan_BINARY_DIR})
link_directories(${hyperscan_BINARY_DIR}/lib)
```

--------------------------------

### Include Directories Setup

Source: https://github.com/darvid/python-hyperscan/blob/main/CMakeLists.txt

Sets up include directories for Hyperscan and its components. Includes directories for Hyperscan source, chimera, and optionally vectorscan.

```cmake
include_directories(${hyperscan_SOURCE_DIR}/src)
include_directories(${hyperscan_SOURCE_DIR}/chimera)
if(USE_VECTORSCAN)
  include_directories(${hyperscan_PREFIX_DIR})
endif()
```

--------------------------------

### Install Development Dependencies with UV

Source: https://github.com/darvid/python-hyperscan/blob/main/CLAUDE.md

Synchronizes only development dependencies using uv. Use --only-dev for installing packages listed in development groups.

```bash
uv sync --only-dev --no-editable --no-install-project
```

--------------------------------

### Install Ragel on Windows via MSYS2

Source: https://github.com/darvid/python-hyperscan/blob/main/CMakeLists.txt

Installs the Ragel package using MSYS2's pacman on Windows if Hyperscan is being built from source and Ragel is not found. It includes updating MSYS2 packages first.
```cmake
if(WIN32)
  # if Windows, expect MSYS2 to be installed at C:/msys64 or abort
  # this should be the case on Windows Server GitHub Actions runners
  # simply use the MSYS2/MinGW package for ragel since we just need
  # the binary
  if(NOT EXISTS C:/msys64)
    message(FATAL_ERROR "MSYS2 not found at C:/msys64")
  endif()
  set(BASH_PATH C:/msys64/usr/bin/bash.exe)
  execute_process(
    COMMAND ${BASH_PATH} -c "/usr/bin/pacman -Syuu --noconfirm"
    RESULT_VARIABLE MSYS2_UPDATE_RESULT
  )
  if(MSYS2_UPDATE_RESULT)
    message(FATAL_ERROR "Failed to update MSYS2 packages")
  endif()
  execute_process(
    COMMAND ${BASH_PATH} -c "/usr/bin/pacman -S --noconfirm mingw-w64-x86_64-ragel"
    RESULT_VARIABLE MSYS2_RAGEL_INSTALL_RESULT
  )
  if(MSYS2_RAGEL_INSTALL_RESULT)
    message(FATAL_ERROR "Failed to install ragel")
  endif()
  set(RAGEL_EXECUTABLE C:/msys64/mingw64/bin/ragel.exe CACHE PATH "Ragel executable" FORCE)
  set(RAGEL ${RAGEL_EXECUTABLE})
  set(RAGEL_FOUND TRUE)
  message(STATUS "Ragel executable: ${RAGEL_EXECUTABLE}")
else()
  # prerequisites (in addition to a build toolchain): autoconf, kelbt
  find_program(AUTORECONF autoreconf REQUIRED)
  find_program(KELBT kelbt REQUIRED)
  ExternalProject_Add(
    ragel
    GIT_REPOSITORY ${RAGEL_REPO}
    GIT_TAG ragel-${RAGEL_VERSION}
    BUILD_IN_SOURCE TRUE
    CONFIGURE_COMMAND ${AUTORECONF} -f -i
    COMMAND ./configure --prefix=${CMAKE_BINARY_DIR} --disable-manual
    BUILD_COMMAND make -j4
    INSTALL_COMMAND ""
  )
  set(RAGEL_EXECUTABLE ${ragel_BINARY_DIR}/bin/ragel CACHE PATH "Ragel executable" FORCE)
  set(RAGEL_VERSION ${RAGEL_VERSION})
  set(RAGEL_FOUND TRUE)
  set(RAGEL_FOUND TRUE)
endif()
```

--------------------------------

### Install Python Hyperscan Extension Target

Source: https://github.com/darvid/python-hyperscan/blob/main/CMakeLists.txt

Defines the installation rules for the Python Hyperscan extension target. It specifies the destination directory and component for installation.
```cmake
install(
  TARGETS ${HS_EXT_NAME}
  LIBRARY DESTINATION hyperscan
  COMPONENT hyperscan
)
```

--------------------------------

### Format C Code

Source: https://github.com/darvid/python-hyperscan/blob/main/CLAUDE.md

Formats the C source file 'src/hyperscan/extension.c' in-place using clang-format. Ensure clang-format is installed and in your PATH.

```bash
clang-format -i src/hyperscan/extension.c
```

--------------------------------

### Lint Python Code with Ruff

Source: https://github.com/darvid/python-hyperscan/blob/main/CLAUDE.md

Lints Python code in the 'src/' directory using ruff and automatically fixes linting issues. Ensure ruff is installed.

```bash
ruff check src/ --fix
```

--------------------------------

### Hyperscan Pattern Flags Reference and Usage

Source: https://context7.com/darvid/python-hyperscan/llms.txt

Illustrates Hyperscan's pattern flags that modify compilation and matching behavior. Flags can be combined using bitwise OR and applied per-pattern or globally. This example demonstrates combining flags for case-insensitive, multiline, and dotall matching.
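Because each flag occupies its own bit, combining and inspecting flags is plain integer arithmetic. The following is a minimal, stdlib-only sketch of that idea: `PatternFlag` is a hypothetical stand-in (not part of python-hyperscan), and its numeric values are assumed to mirror the `HS_FLAG_*` constants in Hyperscan's `hs_compile.h` header.

```python
from enum import IntFlag


class PatternFlag(IntFlag):
    """Hypothetical stand-in for a few HS_FLAG_* integers (values assumed from hs_compile.h)."""
    CASELESS = 1
    DOTALL = 2
    MULTILINE = 4
    SINGLEMATCH = 8
    SOM_LEFTMOST = 256


# Combine several flags for one pattern with bitwise OR.
combined = PatternFlag.CASELESS | PatternFlag.MULTILINE | PatternFlag.DOTALL

# Check whether an individual flag is set with bitwise AND.
has_dotall = bool(combined & PatternFlag.DOTALL)
has_som = bool(combined & PatternFlag.SOM_LEFTMOST)

# Per-pattern usage: one combined value per expression, in the same order.
# A single integer instead of a list corresponds to the "globally" case.
per_pattern_flags = [
    PatternFlag.CASELESS,
    PatternFlag.MULTILINE | PatternFlag.CASELESS,
    PatternFlag.DOTALL | PatternFlag.SOM_LEFTMOST,
]

print(int(combined), has_dotall, has_som)
print([int(f) for f in per_pattern_flags])
```

With the real library, the `hyperscan.HS_FLAG_*` constants would take the place of the stand-in enum members.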
```python
import hyperscan

# Flag constants and their effects
flags_example = {
    'HS_FLAG_CASELESS': hyperscan.HS_FLAG_CASELESS,          # Case-insensitive matching
    'HS_FLAG_DOTALL': hyperscan.HS_FLAG_DOTALL,              # Dot matches newlines
    'HS_FLAG_MULTILINE': hyperscan.HS_FLAG_MULTILINE,        # ^ and $ match line boundaries
    'HS_FLAG_SINGLEMATCH': hyperscan.HS_FLAG_SINGLEMATCH,    # Report only first match per pattern
    'HS_FLAG_ALLOWEMPTY': hyperscan.HS_FLAG_ALLOWEMPTY,      # Allow patterns that match empty strings
    'HS_FLAG_UTF8': hyperscan.HS_FLAG_UTF8,                  # Enable UTF-8 mode
    'HS_FLAG_UCP': hyperscan.HS_FLAG_UCP,                    # Unicode character properties
    'HS_FLAG_PREFILTER': hyperscan.HS_FLAG_PREFILTER,        # Prefilter mode for complex patterns
    'HS_FLAG_SOM_LEFTMOST': hyperscan.HS_FLAG_SOM_LEFTMOST,  # Report start-of-match offset
}

# Example: Combine multiple flags
db = hyperscan.Database()
db.compile(
    expressions=[b'hello', b'^world$', b'foo.bar'],
    ids=[1, 2, 3],
    flags=[
        hyperscan.HS_FLAG_CASELESS,                                 # 'HELLO' matches
        hyperscan.HS_FLAG_MULTILINE | hyperscan.HS_FLAG_CASELESS,   # Match at line start
        hyperscan.HS_FLAG_DOTALL | hyperscan.HS_FLAG_SOM_LEFTMOST,  # Dot matches newline
    ]
)

matches = []

def on_match(pattern_id, from_offset, to_offset, flags, context):
    matches.append((pattern_id, from_offset, to_offset))
    return None

db.scan(b'HELLO\nworld\nfoo\nbar', match_event_handler=on_match)
print(matches)
# Matches 'HELLO' (caseless), 'world' at line start, and 'foo\nbar' (dotall)
```

--------------------------------

### Compile Hyperscan Database and Scan with Threads

Source: https://context7.com/darvid/python-hyperscan/llms.txt

Demonstrates compiling a Hyperscan database with multiple patterns and scanning data concurrently using threads. Each thread requires its own scratch space for independent matching.
```python
import hyperscan
from concurrent.futures import ThreadPoolExecutor

# Create and compile database
db = hyperscan.Database()
db.compile(expressions=[b'pattern1', b'pattern2'], ids=[1, 2])

# Create primary scratch space
primary_scratch = hyperscan.Scratch(db)

# Clone scratch for each thread
def scan_in_thread(data, thread_id):
    # Each thread needs its own scratch space
    thread_scratch = primary_scratch.clone()
    matches = []

    def on_match(pattern_id, from_offset, to_offset, flags, context):
        matches.append((thread_id, pattern_id, from_offset, to_offset))
        return None

    db.scan(data, match_event_handler=on_match, scratch=thread_scratch)
    return matches

# Run scans in parallel
with ThreadPoolExecutor(max_workers=4) as executor:
    futures = [
        executor.submit(scan_in_thread, b'test pattern1 data', i)
        for i in range(4)
    ]
    results = [f.result() for f in futures]

print(results)
# Each thread finds matches independently with its own scratch space
```

--------------------------------

### Build Wheels with Cibuildwheel

Source: https://github.com/darvid/python-hyperscan/blob/main/CLAUDE.md

Builds binary wheels for the project using cibuildwheel. This command is typically run in a CI environment for cross-platform compatibility.

```bash
cibuildwheel --platform linux
```

--------------------------------

### Compile Database with Extended Parameters

Source: https://github.com/darvid/python-hyperscan/blob/main/docs/usage.md

Compiles a Hyperscan database using extended parameters, such as minimum offset for matches. Uses the ExpressionExt helper tuple.

```python
db.compile(
    expressions=[b'foobar'],
    flags=hyperscan.HS_FLAG_SOM_LEFTMOST,
    ext=[
        hyperscan.ExpressionExt(
            flags=hyperscan.HS_EXT_FLAG_MIN_OFFSET,
            min_offset=12
        )
    ],
)
```

--------------------------------

### Build Source Distribution

Source: https://github.com/darvid/python-hyperscan/blob/main/CLAUDE.md

Builds the source distribution (sdist) of the Python package using pyproject-build via uvx.
The --verbose flag provides detailed output.

```bash
uvx --from build pyproject-build --installer=uv --sdist --verbose
```

--------------------------------

### Compile Patterns into Database - Python Hyperscan

Source: https://context7.com/darvid/python-hyperscan/llms.txt

Demonstrates compiling regular expression patterns into a Hyperscan database in block mode. Supports custom IDs and flags like case-insensitivity and start-of-match reporting.

```python
import hyperscan

# Create a database in block mode (default)
db = hyperscan.Database()

# Define patterns with ids and flags
patterns = [
    (b'fo+', 0, 0),  # Match 'fo', 'foo', 'fooo', etc.
    (b'^foobar$', 1, hyperscan.HS_FLAG_CASELESS),  # Case-insensitive anchored match
    (b'BAR', 2, hyperscan.HS_FLAG_CASELESS | hyperscan.HS_FLAG_SOM_LEFTMOST),
]
expressions, ids, flags = zip(*patterns)

# Compile the patterns into the database
db.compile(
    expressions=expressions,
    ids=ids,
    elements=len(patterns),
    flags=flags
)

# Get database information
print(db.info().decode())
# Output: Version: 5.4.12 Features: AVX2 Mode: BLOCK

# Get database size in bytes
print(f"Database size: {db.size()} bytes")
```

--------------------------------

### Preview Git-cliff Changelog Range

Source: https://github.com/darvid/python-hyperscan/blob/main/docs/releases.md

Use this command to preview the changelog range that git-cliff will generate before a release. It requires specifying the previous release tag, configuration file, and the next version tag. The placeholders appear to have been dropped from the command as shown: put the previous release tag before `..HEAD` and the next version number after `--tag v`.

```shell
git cliff ..HEAD --config cliff.toml --tag v --output release-notes.md
```

--------------------------------

### Configure Hyperscan/VectorScan Source

Source: https://github.com/darvid/python-hyperscan/blob/main/CMakeLists.txt

Sets up the Hyperscan or VectorScan source repository and version based on the operating system. For Windows, it defaults to Hyperscan and requires a Visual Studio generator. For other systems, it defaults to VectorScan.
```cmake
if(WIN32)
  # Ensure we're using MSVC on Windows
  if(NOT CMAKE_GENERATOR MATCHES "Visual Studio")
    message(FATAL_ERROR "On Windows, only MSVC/Visual Studio generators are supported for building Python extensions")
  endif()
  set(USE_VECTORSCAN FALSE)
  set(HYPERSCAN_VERSION 5.4.2)
  set(HYPERSCAN_TAG v5.4.2)
  set(HYPERSCAN_REPO https://github.com/intel/hyperscan.git)
  message(STATUS "Using Hyperscan ${HYPERSCAN_VERSION} from ${HYPERSCAN_REPO}")
else()
  set(HYPERSCAN_VERSION 5.4.12)
  set(HYPERSCAN_TAG vectorscan/5.4.12)
  set(HYPERSCAN_REPO https://github.com/VectorCamp/vectorscan.git)
  message(STATUS "Using VectorScan ${HYPERSCAN_VERSION} from ${HYPERSCAN_REPO}")
endif()
```

--------------------------------

### Define Hyperscan CMake Arguments

Source: https://github.com/darvid/python-hyperscan/blob/main/CMakeLists.txt

This sets up CMake arguments for Hyperscan, specifically enabling static libraries for Boost and the main build.

```cmake
set(
  HS_CMAKE_ARGS
  -DBOOST_USE_STATIC_LIBS=ON
  -DBUILD_STATIC_LIBS=ON
)
```

--------------------------------

### Initialize Streaming Mode Database

Source: https://github.com/darvid/python-hyperscan/blob/main/docs/usage.md

Initializes a Hyperscan database specifically for streaming mode. This is required before using the Database.stream method.

```python
db = hyperscan.Database(mode=hyperscan.HS_MODE_STREAM)
```

--------------------------------

### Block Mode Scanning with Database.scan() - Python Hyperscan

Source: https://context7.com/darvid/python-hyperscan/llms.txt

Shows how to use the `scan()` method for matching patterns against a complete text block. It utilizes a callback function to process each match found, with an option to halt scanning by returning a truthy value.
```python
import hyperscan

# Create and compile database
db = hyperscan.Database()
db.compile(
    expressions=[b'foo', b'bar', b'baz'],
    ids=[1, 2, 3],
    flags=hyperscan.HS_FLAG_SOM_LEFTMOST  # Enable start-of-match reporting
)

# Store matches in a list
matches = []

def on_match(pattern_id, from_offset, to_offset, flags, context):
    """
    Callback invoked for each match.

    Args:
        pattern_id: The ID assigned to the matching pattern
        from_offset: Start offset (0 if SOM not enabled)
        to_offset: End offset of the match
        flags: Match flags
        context: User-provided context object

    Returns:
        None to continue scanning, truthy value to halt
    """
    matches.append({
        'id': pattern_id,
        'start': from_offset,
        'end': to_offset,
        'text': context[from_offset:to_offset] if context else None
    })
    return None  # Continue scanning

# Scan text block
text = b'hello foo and bar and baz world'
db.scan(text, match_event_handler=on_match, context=text)

print(matches)
# Output: [{'id': 1, 'start': 6, 'end': 9, 'text': b'foo'},
#          {'id': 2, 'start': 14, 'end': 17, 'text': b'bar'},
#          {'id': 3, 'start': 22, 'end': 25, 'text': b'baz'}]
```

--------------------------------

### Import Hyperscan Libraries

Source: https://github.com/darvid/python-hyperscan/blob/main/CMakeLists.txt

Defines and imports static libraries for Hyperscan, including runtime and utility libraries. It checks for the existence of these libraries if Hyperscan is not being built from source.
```cmake
set(HS_LIBS hs hs_runtime chimera pcre)
set(HS_BUILD_BYPRODUCTS)
foreach(lib ${HS_LIBS})
  add_library(${lib} STATIC IMPORTED)
  if(WIN32)
    set(object_name "${lib}${CMAKE_STATIC_LIBRARY_SUFFIX}")
  else()
    set(object_name "lib${lib}${CMAKE_STATIC_LIBRARY_SUFFIX}")
  endif()
  set(object_path "${HS_BUILD_LIB_ROOT}/${object_name}")
  list(APPEND HS_BUILD_BYPRODUCTS "${object_path}")
  if(NOT HS_BUILD_REQUIRED AND NOT EXISTS "${object_path}")
    message(FATAL_ERROR "${object_name} not found at ${HS_BUILD_LIB_ROOT}")
  endif()
  set_target_properties(${lib} PROPERTIES IMPORTED_LOCATION "${object_path}")
endforeach()
```

--------------------------------

### Full Lint Workflow

Source: https://github.com/darvid/python-hyperscan/blob/main/CLAUDE.md

Performs a full code quality check by linting and formatting Python code. This command combines ruff check and black formatting, matching CI behavior.

```bash
ruff check --fix src/ && black src/
```

--------------------------------

### Extended Pattern Parameters with ExpressionExt

Source: https://context7.com/darvid/python-hyperscan/llms.txt

Control pattern matching with `ExpressionExt`, specifying minimum/maximum offsets, minimum length, and approximate matching using edit or Hamming distance.
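The offset and length constraints can be pictured as a filter over `(start, end)` match spans, where the offset limits apply to the end of a match. The sketch below is an emulation with the standard `re` module, not a call into hyperscan: `emulate_ext` is a hypothetical helper, `re.finditer` yields greedy non-overlapping matches whereas Hyperscan reports every end offset, and approximate (edit or Hamming distance) matching is not emulated.

```python
import re


def emulate_ext(pattern, data, min_offset=None, max_offset=None, min_length=None):
    """Hypothetical helper: keep only the `re` matches that satisfy the
    constraints, treating min/max offset as limits on the match END offset."""
    kept = []
    for m in re.finditer(pattern, data):
        start, end = m.span()
        if min_offset is not None and end < min_offset:
            continue  # match ends too early
        if max_offset is not None and end > max_offset:
            continue  # match ends too late
        if min_length is not None and end - start < min_length:
            continue  # match is too short
        kept.append((start, end))
    return kept


data = b'foobarfoobar'
after_12 = emulate_ext(rb'foobar', data, min_offset=12)    # second 'foobar' only
before_6 = emulate_ext(rb'foobar', data, max_offset=6)     # first 'foobar' only
too_short = emulate_ext(rb'fo+', b'fo', min_length=3)      # nothing qualifies
long_enough = emulate_ext(rb'fo+', b'foo', min_length=3)   # 'foo' qualifies

print(after_12, before_6, too_short, long_enough)
```

The spans this filter keeps for `b'foobarfoobar'` correspond to the start and end offsets quoted in the comments of the hyperscan example.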
```python
import hyperscan

db = hyperscan.Database()
matches = []

def on_match(pattern_id, from_offset, to_offset, flags, context):
    matches.append((pattern_id, from_offset, to_offset))
    return None

# Min offset: only match after position 12
db.compile(
    expressions=[b'foobar'],
    flags=hyperscan.HS_FLAG_SOM_LEFTMOST,
    ext=[hyperscan.ExpressionExt(
        flags=hyperscan.HS_EXT_FLAG_MIN_OFFSET,
        min_offset=12
    )]
)
matches.clear()
db.scan(b'foobarfoobar', match_event_handler=on_match)
print(f"Min offset matches: {matches}")  # [(0, 6, 12)] - second 'foobar' only

# Max offset: only match before position 6
db.compile(
    expressions=[b'foobar'],
    flags=hyperscan.HS_FLAG_SOM_LEFTMOST,
    ext=[hyperscan.ExpressionExt(
        flags=hyperscan.HS_EXT_FLAG_MAX_OFFSET,
        max_offset=6
    )]
)
matches.clear()
db.scan(b'foobarfoobar', match_event_handler=on_match)
print(f"Max offset matches: {matches}")  # [(0, 0, 6)] - first 'foobar' only

# Min length: require at least 3 characters for 'fo+'
db.compile(
    expressions=[b'fo+'],
    flags=hyperscan.HS_FLAG_SOM_LEFTMOST,
    ext=[hyperscan.ExpressionExt(
        flags=hyperscan.HS_EXT_FLAG_MIN_LENGTH,
        min_length=3
    )]
)
matches.clear()
db.scan(b'fo', match_event_handler=on_match)
print(f"'fo' matches: {matches}")  # [] - no match, too short
matches.clear()
db.scan(b'foo', match_event_handler=on_match)
print(f"'foo' matches: {matches}")  # [(0, 0, 3)] - matches

# Edit distance: allow up to 3 character substitutions/insertions/deletions
db.compile(
    expressions=[b'foobar'],
    flags=hyperscan.HS_FLAG_SOM_LEFTMOST,
    ext=[hyperscan.ExpressionExt(
        flags=hyperscan.HS_EXT_FLAG_EDIT_DISTANCE,
        edit_distance=3
    )]
)
matches.clear()
db.scan(b'fxxxar', match_event_handler=on_match)
print(f"Edit distance matches: {matches}")  # [(0, 0, 6)] - 'fxxxar' matches 'foobar'
```

--------------------------------

### Download and Extract Boost Dependency

Source: https://github.com/darvid/python-hyperscan/blob/main/CMakeLists.txt

Downloads a specific version of Boost source code, extracts it into the vendor
directory, and renames the extracted folder. Includes expected SHA256 hash for verification.

```cmake
file(
  DOWNLOAD
  https://archives.boost.io/release/${BOOST_VERSION}/source/boost_${BOOST_FILENAME_VERSION}.tar.gz
  ${hyperscan_VENDOR_DIR}/boost.tar.gz
  EXPECTED_HASH SHA256=f55c340aa49763b1925ccf02b2e83f35fdcf634c9d5164a2acb87540173c741d
)
if(EXISTS ${hyperscan_VENDOR_DIR}/boost)
  file(REMOVE_RECURSE ${hyperscan_VENDOR_DIR}/boost)
endif()
if(EXISTS ${hyperscan_VENDOR_DIR}/boost_${BOOST_FILENAME_VERSION})
  file(REMOVE_RECURSE ${hyperscan_VENDOR_DIR}/boost_${BOOST_FILENAME_VERSION})
endif()
file(
  ARCHIVE_EXTRACT
  INPUT ${hyperscan_VENDOR_DIR}/boost.tar.gz
  DESTINATION ${hyperscan_VENDOR_DIR}
  PATTERNS "boost_${BOOST_FILENAME_VERSION}/boost/*"
)
file(RENAME ${hyperscan_VENDOR_DIR}/boost_${BOOST_FILENAME_VERSION} ${hyperscan_VENDOR_DIR}/boost)
message(STATUS "Boost downloaded to ${hyperscan_VENDOR_DIR}/boost")
```

--------------------------------

### Serialize and Deserialize Hyperscan Database

Source: https://github.com/darvid/python-hyperscan/blob/main/docs/usage.md

Demonstrates how to serialize a Hyperscan database to bytes for storage or transmission and then deserialize it back into a usable database object. This is useful for saving compiled patterns.

```python
# Serializing (dumping to bytes)
serialized = hyperscan.dumpb(db)
with open('hs.db', 'wb') as f:
    f.write(serialized)
```

```python
# Deserializing (loading from bytes):
db = hyperscan.loadb(serialized)
```

--------------------------------

### Scan for Second Foobar Match

Source: https://github.com/darvid/python-hyperscan/blob/main/docs/usage.md

This snippet shows how to scan a byte string for a pattern and specifies a callback function to handle matches. It is useful for finding specific occurrences within text.
```python
db.scan(b'foobarfoobar', match_event_handler=callback)
```

--------------------------------

### Mirror Semantic-Release Decision Locally

Source: https://github.com/darvid/python-hyperscan/blob/main/docs/releases.md

This command mirrors the semantic-release decision locally by performing a no-op version check with debug verbosity. It helps in understanding if a new semantic version will be cut.

```shell
uv run python -m semantic_release version --noop --verbosity DEBUG
```

--------------------------------

### Compile Hyperscan Database in Literal Mode

Source: https://context7.com/darvid/python-hyperscan/llms.txt

Demonstrates compiling a Hyperscan database in literal mode, where patterns are treated as exact strings rather than regular expressions. This mode offers faster compilation and matching for fixed strings, and special regex characters lose their meaning.

```python
import hyperscan

# Create database for literal matching
db = hyperscan.Database()

# These are treated as literal strings, not regex patterns
# Characters like +, *, . have no special meaning
db.compile(
    expressions=[b'foo+bar', b'test.*pattern', b'[literal]'],
    ids=[1, 2, 3],
    # Report start offsets; without this flag from_offset is 0
    # and the slice in on_match would start at the beginning of the text
    flags=hyperscan.HS_FLAG_SOM_LEFTMOST,
    literal=True  # Enable literal mode
)

matches = []

def on_match(pattern_id, from_offset, to_offset, flags, context):
    matches.append((pattern_id, context[from_offset:to_offset]))
    return None

# Must match exactly - '+' and '.*' are literal characters
text = b'foo+bar test.*pattern [literal]'
db.scan(text, match_event_handler=on_match, context=text)
print(matches)
# Output: [(1, b'foo+bar'), (2, b'test.*pattern'), (3, b'[literal]')]
```

--------------------------------

### Windows MSVC Runtime Configuration

Source: https://github.com/darvid/python-hyperscan/blob/main/CMakeLists.txt

Configures consistent MSVC runtime flags for Windows builds. Sets the parallel build level if not already defined.
```cmake
set(CMAKE_MSVC_RUNTIME_LIBRARY "MultiThreadedDLL")
set(HS_CMAKE_COMMON_FLAGS "/arch:SSE2 /FS /GS-")
if(NOT CMAKE_BUILD_PARALLEL_LEVEL)
  set(CMAKE_BUILD_PARALLEL_LEVEL 2)
endif()
set(HS_CMAKE_C_FLAGS "${CMAKE_C_FLAGS} ${HS_CMAKE_COMMON_FLAGS}")
set(HS_CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${HS_CMAKE_COMMON_FLAGS}")
set(HS_CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS}")
```

--------------------------------

### Streaming Mode Scanning with Database.stream() - Python Hyperscan

Source: https://context7.com/darvid/python-hyperscan/llms.txt

Illustrates scanning data in chunks using the `stream()` method, suitable for large datasets or network traffic. Patterns can span across chunk boundaries, and the context can be overridden for specific chunks.

```python
import hyperscan

# Create database in streaming mode
db = hyperscan.Database(mode=hyperscan.HS_MODE_STREAM)
db.compile(
    expressions=[b'foobar', b'hello world'],
    ids=[1, 2],
    flags=hyperscan.HS_FLAG_SOM_LEFTMOST
)

matches = []

def on_match(pattern_id, from_offset, to_offset, flags, context):
    matches.append((pattern_id, from_offset, to_offset, context))
    return None

# Use stream context manager for chunked processing
with db.stream(match_event_handler=on_match, context='default') as stream:
    # Pattern 'foobar' spans chunks
    stream.scan(b'foo')
    stream.scan(b'bar')  # Match detected here

    # Override context for specific chunk
    stream.scan(b' hello', context='chunk2')
    stream.scan(b' world', context='chunk3')  # Match detected here

print(matches)
# Output: [(1, 0, 6, 'default'), (2, 7, 18, 'chunk3')]
# Offsets count bytes from the start of the stream; the space at offset 6
# precedes 'hello world', which therefore spans 7..18

# Get stream state size
with db.stream(match_event_handler=on_match) as stream:
    print(f"Stream state size: {stream.size()} bytes")
```

--------------------------------

### Non-Windows Architecture Flags and SIMDE Backend

Source: https://github.com/darvid/python-hyperscan/blob/main/CMakeLists.txt

Configures compiler flags and SIMDE_BACKEND selection for non-Windows builds.
SIMDE_BACKEND is enabled for ARM to provide SIMD support where native support is lacking. On x86-64, it's disabled to leverage native ISA extensions for better performance.

```cmake
set(CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS}")

# Architecture-specific compiler flags and SIMDE_BACKEND selection.
# SIMDE_BACKEND is only enabled for non-x86 architectures (ARM, etc.)
# where vectorscan has no native SIMD support. On x86-64, the native
# backend provides runtime CPU feature detection (SSE4.2/AVX2/AVX512)
# which is critical for performance. Enabling SIMDE_BACKEND on x86-64
# disables all higher ISA code paths and caps performance at SSE2
# level (~10-15x slower). See: https://github.com/darvid/python-hyperscan/issues/253
#
# For macOS cross-compilation (e.g. building x86_64 on ARM runner),
# CMAKE_OSX_ARCHITECTURES reflects the TARGET arch and takes priority
# over CMAKE_SYSTEM_PROCESSOR (which reflects the HOST).
set(HS_USE_SIMDE_BACKEND OFF)
set(_HS_TARGET_ARCH "${CMAKE_SYSTEM_PROCESSOR}")
if(APPLE AND CMAKE_OSX_ARCHITECTURES)
  set(_HS_TARGET_ARCH "${CMAKE_OSX_ARCHITECTURES}")
endif()

if(_HS_TARGET_ARCH MATCHES "(arm|aarch64|arm64)")
  set(HS_CMAKE_COMMON_FLAGS "-fPIC")
  set(HS_USE_SIMDE_BACKEND ON)
elseif(_HS_TARGET_ARCH MATCHES "(x86|X86|amd64|AMD64|x86_64|i[3-6]86)")
  set(HS_CMAKE_COMMON_FLAGS "-march=x86-64 -fPIC")
else()
  set(HS_CMAKE_COMMON_FLAGS "-fPIC")
  set(HS_USE_SIMDE_BACKEND ON)
endif()

set(HS_CMAKE_C_FLAGS "${CMAKE_C_FLAGS} ${HS_CMAKE_COMMON_FLAGS}")
set(HS_CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${HS_CMAKE_COMMON_FLAGS} -D_GLIBCXX_USE_CXX11_ABI=1")
```

--------------------------------

### Windows Generator and Architecture Settings

Source: https://github.com/darvid/python-hyperscan/blob/main/CMakeLists.txt

Sets the Visual Studio generator and architecture for Windows builds. Defaults to 'Visual Studio 17 2022' and 'x64' if not using an existing Visual Studio generator.
```cmake
if(NOT ${CMAKE_GENERATOR} MATCHES "^Visual Studio")
  set(HS_GENERATOR "Visual Studio 17 2022")
  # Ensure x64 architecture
  set(HS_CMAKE_ARGS ${HS_CMAKE_ARGS} -A x64)
else()
  set(HS_GENERATOR ${CMAKE_GENERATOR})
endif()
```

--------------------------------

### Test with Coverage

Source: https://github.com/darvid/python-hyperscan/blob/main/CLAUDE.md

Runs the package's tests with pytest. The --pyargs flag makes pytest try to interpret its arguments as importable Python packages. The command as shown passes no explicit coverage option, so any coverage report would have to come from the project's pytest configuration.

```bash
pytest --pyargs hyperscan/tests -vvv
```

--------------------------------

### Run All Tests with Pytest

Source: https://github.com/darvid/python-hyperscan/blob/main/CLAUDE.md

Executes all tests in the tests/ directory using pytest. The -vvv flag increases verbosity.

```bash
pytest tests/ -vvv
```

--------------------------------

### Remove Existing Boost and PCRE Directories

Source: https://github.com/darvid/python-hyperscan/blob/main/CMakeLists.txt

Before downloading new versions, this script ensures that any existing Boost or PCRE directories are removed to prevent conflicts.

```cmake
if(EXISTS ${HS_SRC_ROOT}/boost)
  file(REMOVE_RECURSE ${HS_SRC_ROOT}/boost)
endif()
if(EXISTS ${HS_SRC_ROOT}/pcre)
  file(REMOVE_RECURSE ${HS_SRC_ROOT}/pcre)
endif()
```

--------------------------------

### Compile and Scan with Chimera Mode

Source: https://github.com/darvid/python-hyperscan/blob/main/docs/usage.md

Illustrates the use of Chimera mode in Hyperscan, which allows mixing PCRE literals with Hyperscan's multi-pattern engine. It's recommended to reuse a Scratch object per thread when using Chimera mode to prevent reallocations.
```python
chimera_db = hyperscan.Database(chimera=True)
chimera_db.compile(expressions=[br'(foo)+', br'b(ar|az)'])
chimera_db.scan(b'foobaz', match_event_handler=on_match)
```

--------------------------------

### Database.stream() - Streaming Mode Scanning

Source: https://context7.com/darvid/python-hyperscan/llms.txt

Provides a context manager for scanning data that arrives in chunks, ideal for large or streaming data.

```APIDOC
## Database.stream() - Streaming Mode Scanning

### Description
The `stream()` method returns a context manager for scanning data that arrives in chunks. This is ideal for processing network traffic, log files, or any data that cannot be loaded entirely into memory. Patterns can match across chunk boundaries.

### Method
`Database.stream(match_event_handler, context)`

### Parameters
#### Path Parameters
None

#### Query Parameters
None

#### Request Body
- **match_event_handler** (callable) - Required - A function to be called for each match found. It receives `pattern_id`, `from_offset`, `to_offset`, `flags`, and `context`. Returning a truthy value from this handler will stop the scan.
- **context** (any) - Optional - A user-provided object that will be passed to the `match_event_handler` for the initial stream.

### Request Example
```python
import hyperscan

db = hyperscan.Database(mode=hyperscan.HS_MODE_STREAM)
db.compile(
    expressions=[b'foobar', b'hello world'],
    ids=[1, 2],
    flags=hyperscan.HS_FLAG_SOM_LEFTMOST
)

matches = []

def on_match(pattern_id, from_offset, to_offset, flags, context):
    matches.append((pattern_id, from_offset, to_offset, context))
    return None

with db.stream(match_event_handler=on_match, context='default') as stream:
    stream.scan(b'foo')
    stream.scan(b'bar')
    stream.scan(b' hello', context='chunk2')
    stream.scan(b' world', context='chunk3')

print(matches)
```

### Method
`Stream.scan(chunk, context)`

### Description
Scans a chunk of data.
The `context` parameter can override the context provided during `stream()` initialization for this specific chunk.

### Parameters
#### Path Parameters
None

#### Query Parameters
None

#### Request Body
- **chunk** (bytes) - Required - The data chunk to scan.
- **context** (any) - Optional - A user-provided object to be passed to the `match_event_handler` for matches found within this chunk.

### Method
`Stream.size()`

### Description
Returns the size of the stream state in bytes.

### Response
#### Success Response (200)
- **size** (int) - The size of the stream state in bytes.

### Response Example
```
Stream state size: 512 bytes
```
```

--------------------------------

### Scratch Class for Thread-Safe Scanning

Source: https://context7.com/darvid/python-hyperscan/llms.txt

Manage memory for scanning operations using the `Scratch` class. Each thread requires its own scratch space, which can be cloned for efficiency.

```python
import hyperscan
from concurrent.futures import ThreadPoolExecutor
```

--------------------------------

### Download and Extract PCRE Library

Source: https://github.com/darvid/python-hyperscan/blob/main/CMakeLists.txt

This snippet downloads a specific version of the PCRE library, verifies its integrity using a SHA256 hash, and extracts it into the hyperscan vendor directory.
```cmake
message(STATUS "Downloading PCRE ${PCRE_VERSION}")
file(
  DOWNLOAD
  https://sourceforge.net/projects/pcre/files/pcre/${PCRE_VERSION}/pcre-${PCRE_VERSION}.tar.bz2
  ${hyperscan_VENDOR_DIR}/pcre-${PCRE_VERSION}.tar.bz2
  EXPECTED_HASH SHA256=4dae6fdcd2bb0bb6c37b5f97c33c2be954da743985369cddac3546e3218bffb8
)
if(EXISTS ${hyperscan_VENDOR_DIR}/pcre)
  file(REMOVE_RECURSE ${hyperscan_VENDOR_DIR}/pcre)
endif()
file(
  ARCHIVE_EXTRACT
  INPUT ${hyperscan_VENDOR_DIR}/pcre-${PCRE_VERSION}.tar.bz2
  DESTINATION ${hyperscan_VENDOR_DIR}
  PATTERNS "pcre-${PCRE_VERSION}/*"
)
file(RENAME ${hyperscan_VENDOR_DIR}/pcre-${PCRE_VERSION} ${hyperscan_VENDOR_DIR}/pcre)
message(STATUS "PCRE downloaded to ${hyperscan_VENDOR_DIR}/pcre")
```

--------------------------------

### Write Patched PCRE CMakeLists.txt

Source: https://github.com/darvid/python-hyperscan/blob/main/CMakeLists.txt

After applying all necessary patches, this command writes the modified content back to the PCRE CMakeLists.txt file if any changes were made.

```cmake
if(_any_patch_applied)
  file(WRITE ${PCRE_CMAKE_FILE} "${_pcre_cmake_content_current}")
  message(STATUS "Final patched PCRE CMakeLists.txt written to ${PCRE_CMAKE_FILE} as changes were applied.")
else()
  message(STATUS "No changes made to PCRE CMakeLists.txt after attempting all patches, file not rewritten.")
endif()
```

--------------------------------

### Database.scan() - Block Mode Scanning

Source: https://context7.com/darvid/python-hyperscan/llms.txt

Performs pattern matching against a complete block of text using the compiled database.

```APIDOC
## Database.scan() - Block Mode Scanning

### Description
The `scan()` method performs pattern matching against a complete block of text. It invokes the match callback for each match found, passing the pattern ID, start offset (if SOM enabled), end offset, flags, and optional context object. Returning a truthy value from the callback halts scanning.

### Method
`Database.scan(text, match_event_handler, context)`

### Parameters
#### Path Parameters
None

#### Query Parameters
None

#### Request Body
- **text** (bytes) - Required - The block of text to scan.
- **match_event_handler** (callable) - Required - A function to be called for each match found. It receives `pattern_id`, `from_offset`, `to_offset`, `flags`, and `context`. Returning a truthy value from this handler will stop the scan.
- **context** (any) - Optional - A user-provided object that will be passed to the `match_event_handler`.

### Request Example
```python
import hyperscan

db = hyperscan.Database()
db.compile(
    expressions=[b'foo', b'bar', b'baz'],
    ids=[1, 2, 3],
    flags=hyperscan.HS_FLAG_SOM_LEFTMOST
)

matches = []

def on_match(pattern_id, from_offset, to_offset, flags, context):
    matches.append({
        'id': pattern_id,
        'start': from_offset,
        'end': to_offset,
        'text': context[from_offset:to_offset] if context else None
    })
    return None

text = b'hello foo and bar and baz world'
db.scan(text, match_event_handler=on_match, context=text)
print(matches)
```

### Response
#### Success Response (200)
- **matches** (list) - A list containing details of each match found, as populated by the `match_event_handler`.

#### Response Example
```python
[
    {'id': 1, 'start': 6, 'end': 9, 'text': b'foo'},
    {'id': 2, 'start': 14, 'end': 17, 'text': b'bar'},
    {'id': 3, 'start': 22, 'end': 25, 'text': b'baz'}
]
```
```

--------------------------------

### Linux Specific Linking Options for manylinux2014

Source: https://github.com/darvid/python-hyperscan/blob/main/CMakeLists.txt

Configures linker options for the Python Hyperscan extension on manylinux2014 Docker images. Includes flags to control symbol visibility and library handling.
```cmake
target_link_options(${HS_EXT_NAME} PRIVATE
  -Wl,--no-as-needed
  -Wl,--copy-dt-needed-entries
  -Wl,--no-allow-shlib-undefined
  -Wl,--exclude-libs,ALL
)
target_link_libraries(${HS_EXT_NAME} PRIVATE ${HS_LIBS})
target_link_libraries(${HS_EXT_NAME} PRIVATE -Wl,--push-state -Wl,-Bstatic -lstdc++ -Wl,--pop-state)
```

--------------------------------

### Compile Hyperscan Database

Source: https://github.com/darvid/python-hyperscan/blob/main/docs/usage.md

Compiles a database of regular expressions with optional IDs and flags. Ensure the Hyperscan engine version is 5.4 or newer.

```python
import hyperscan

db = hyperscan.Database()
patterns = (
    # expression, id, flags
    (br'fo+', 0, 0),
    (br'^foobar$', 1, hyperscan.HS_FLAG_CASELESS),
    (br'BAR', 2, hyperscan.HS_FLAG_CASELESS | hyperscan.HS_FLAG_SOM_LEFTMOST),
)
expressions, ids, flags = zip(*patterns)
db.compile(
    expressions=expressions, ids=ids, elements=len(patterns), flags=flags
)
print(db.info().decode())
# Version: 5.4.12 Features: AVX2 Mode: BLOCK
```

--------------------------------

### Format Python Code

Source: https://github.com/darvid/python-hyperscan/blob/main/CLAUDE.md

Formats Python code in the 'src/' directory using black. This command modifies files in-place.

```bash
black src/
```

--------------------------------

### Locate Ragel Executable

Source: https://github.com/darvid/python-hyperscan/blob/main/CMakeLists.txt

A CMake function to find the Ragel executable and determine its version. It checks if the found version meets a minimum requirement if specified.
```cmake
function(hs_locate_ragel)
  set(options)
  set(oneValueArgs VERSION)
  set(multiValueArgs)
  cmake_parse_arguments(HS_R "${options}" "${oneValueArgs}" "${multiValueArgs}" ${ARGN})

  find_program(_ragel_executable NAMES ragel)
  if(NOT _ragel_executable)
    set(RAGEL_FOUND FALSE PARENT_SCOPE)
    return()
  endif()

  set(_ragel_version "")
  execute_process(
    COMMAND ${_ragel_executable} --version
    OUTPUT_VARIABLE _ragel_stdout
    ERROR_VARIABLE _ragel_stderr
    OUTPUT_STRIP_TRAILING_WHITESPACE
    RESULT_VARIABLE _ragel_result)
  if(_ragel_result EQUAL 0)
    string(REGEX MATCH "([0-9]+\\.[0-9]+(\\.[0-9]+)?)" _ragel_version "${_ragel_stdout}")
  endif()
  if(NOT _ragel_version)
    set(_ragel_version "0.0")
  endif()

  if(HS_R_VERSION AND _ragel_version VERSION_LESS HS_R_VERSION)
    message(STATUS "Found ragel ${_ragel_version} at ${_ragel_executable} but ${HS_R_VERSION}+ is required")
    set(RAGEL_FOUND FALSE PARENT_SCOPE)
    return()
  endif()

  set(RAGEL_EXECUTABLE "${_ragel_executable}" PARENT_SCOPE)
  set(RAGEL_VERSION "${_ragel_version}" PARENT_SCOPE)
  set(RAGEL_FOUND TRUE PARENT_SCOPE)
endfunction()
```

--------------------------------

### Scan Text in Streaming Mode

Source: https://github.com/darvid/python-hyperscan/blob/main/docs/usage.md

Uses the Stream context manager to scan data in chunks. Match handlers and context objects can be provided at initialization or overridden per scan.

```python
# Assumes `db` was compiled in stream mode (hyperscan.HS_MODE_STREAM) and that
# on_match and on_qux_match are defined elsewhere.
with db.stream(match_event_handler=on_match, context=2345) as stream:
    stream.scan(b'foobar')
    # Override context only for one chunk
    stream.scan(b'barfoofoobarbarfoobar', context=1234)
    # Override match handler only for one chunk
    stream.scan(b'qux', match_event_handler=on_qux_match)
```

--------------------------------

### Linux Specific Linking Options (Non-manylinux2014)

Source: https://github.com/darvid/python-hyperscan/blob/main/CMakeLists.txt

Configures linker options for the Python Hyperscan extension on other Linux platforms, such as local development or CI environments without auditwheel.
It uses flags to control symbol visibility.

```cmake
target_link_options(${HS_EXT_NAME} PRIVATE
  -Wl,--no-as-needed
  -Wl,--exclude-libs,ALL
)
target_link_libraries(${HS_EXT_NAME} PRIVATE ${HS_LIBS})
target_link_libraries(${HS_EXT_NAME} PRIVATE stdc++)
```

--------------------------------

### Add Custom Target for Ragel

Source: https://github.com/darvid/python-hyperscan/blob/main/CMakeLists.txt

Configures a custom target for the Ragel executable. This is used to ensure the Ragel tool is available and its version is checked.

```cmake
message(STATUS "Ragel executable: ${RAGEL_EXECUTABLE}")
add_custom_target(ragel COMMAND ${RAGEL_EXECUTABLE} -V)
```

```cmake
set(RAGEL_EXECUTABLE ${RAGEL_EXECUTABLE})
add_custom_target(ragel COMMAND ${RAGEL_EXECUTABLE} -V)
message(STATUS "Ragel executable: ${RAGEL_EXECUTABLE}")
```

--------------------------------

### Link Libraries for Python Hyperscan Extension

Source: https://github.com/darvid/python-hyperscan/blob/main/CMakeLists.txt

Specifies the libraries to link against for the Python Hyperscan extension. This variant links the C++ standard library as `c++` rather than `stdc++`.

```cmake
target_link_libraries(${HS_EXT_NAME} PRIVATE ${HS_LIBS})
target_link_libraries(${HS_EXT_NAME} PRIVATE c++)
```

--------------------------------

### Set CMake Policy Version

Source: https://github.com/darvid/python-hyperscan/blob/main/CMakeLists.txt

Ensures downstream projects opt into modern policy behavior by setting the CMake policy version. The running CMake's major.minor version is passed as `CMAKE_POLICY_VERSION`, and `CMAKE_POLICY_VERSION_MINIMUM` is set to 3.5; both are collected in `HS_CMAKE_CACHE_ARGS`.

```cmake
set(_cmake_policy_version "${CMAKE_MAJOR_VERSION}.${CMAKE_MINOR_VERSION}")
set(HS_CMAKE_CACHE_ARGS
  -DCMAKE_POLICY_VERSION:STRING=${_cmake_policy_version}
  -DCMAKE_POLICY_VERSION_MINIMUM:STRING=3.5
)
```

--------------------------------

### Windows Specific Linking for Python Hyperscan Extension

Source: https://github.com/darvid/python-hyperscan/blob/main/CMakeLists.txt

Specifies the libraries to link against for the Python Hyperscan extension on Windows.
This is a simplified configuration for the Windows platform.

```cmake
target_link_libraries(${HS_EXT_NAME} PRIVATE ${HS_LIBS})
```

--------------------------------

### Database Class - Core Pattern Database

Source: https://context7.com/darvid/python-hyperscan/llms.txt

The Database class is used to compile and store regular expression patterns. It supports block, stream, and vectored scanning modes.

```APIDOC
## Database Class - Core Pattern Database

### Description
The `Database` class is the primary interface for compiling and storing regular expression patterns. It supports three scanning modes: block mode for complete text blocks, stream mode for processing data in chunks, and vectored mode for scanning multiple non-contiguous buffers. The database automatically manages scratch space for efficient scanning.

### Method
`hyperscan.Database()`

### Parameters
#### Initialization Parameters
- **mode** (int) - Optional - The scanning mode (e.g., `hyperscan.HS_MODE_BLOCK`, `hyperscan.HS_MODE_STREAM`). Defaults to `hyperscan.HS_MODE_BLOCK`.

### Method
`Database.compile()`

### Description
Compiles a list of regular expressions into the database.

### Parameters
#### Path Parameters
None

#### Query Parameters
None

#### Request Body
- **expressions** (list of bytes) - Required - A list of regular expression patterns to compile.
- **ids** (list of int) - Required - A list of unique identifiers for each pattern.
- **elements** (int) - Optional - The total number of patterns. When omitted, the number of expressions is used.
- **flags** (int or list of int) - Optional - Flags to apply to the patterns (e.g., `hyperscan.HS_FLAG_CASELESS`), given either as one value for all patterns or as one value per pattern.

### Request Example
```python
import hyperscan

db = hyperscan.Database()
patterns = [
    (b'fo+', 0, 0),
    (b'^foobar$', 1, hyperscan.HS_FLAG_CASELESS),
]
expressions, ids, flags = zip(*patterns)
db.compile(
    expressions=expressions, ids=ids, elements=len(patterns), flags=flags
)
```

### Method
`Database.info()`

### Description
Retrieves information about the compiled database.

### Response
#### Success Response (200)
- **info** (bytes) - A byte string containing database information (e.g., version, features, mode).

### Response Example
```
Version: 5.4.12 Features: AVX2 Mode: BLOCK
```

### Method
`Database.size()`

### Description
Returns the size of the compiled database in bytes.

### Response
#### Success Response (200)
- **size** (int) - The size of the database in bytes.

### Response Example
```
Database size: 1234 bytes
```
```

--------------------------------

### Run Specific Test with Pytest

Source: https://github.com/darvid/python-hyperscan/blob/main/CLAUDE.md

Runs a specific test case using pytest. Replace 'test_hyperscan.py::test_name' with the actual test file and function name.

```bash
pytest tests/test_hyperscan.py::test_name -vvv
```

--------------------------------

### Chimera Mode Scanning with Capture Groups

Source: https://context7.com/darvid/python-hyperscan/llms.txt

Utilize Chimera mode for PCRE compatibility, supporting features like capture groups. The match handler receives captured group information.

```python
import hyperscan

# Create database with Chimera support and capture groups
db = hyperscan.Database(chimera=True, mode=hyperscan.CH_MODE_GROUPS)

# Compile patterns using Chimera flags
db.compile(
    expressions=[b'(foo)+', b'b(ar|az)'],
    ids=[1, 2],
    flags=[
        hyperscan.CH_FLAG_CASELESS,
        hyperscan.CH_FLAG_CASELESS
    ]
)

matches = []

def on_chimera_match(pattern_id, from_offset, to_offset, flags, captured, context):
    """
    Chimera callback includes captured groups.

    Args:
        captured: List of (group_id, start, end) tuples for capture groups
    """
    matches.append({
        'id': pattern_id,
        'start': from_offset,
        'end': to_offset,
        'groups': captured
    })
    return None

db.scan(b'foofoobar', match_event_handler=on_chimera_match)
print(matches)
# Output: [{'id': 1, 'start': 0, 'end': 6, 'groups': [(1, 3, 6)]},
#          {'id': 2, 'start': 6, 'end': 9, 'groups': [(1, 7, 9)]}]
```
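
The capture tuples above carry only offsets, so turning them into the matched bytes is left to the caller. A minimal sketch of that step follows. It assumes each entry in `captured` is a 3-tuple whose last two items are start and end byte offsets, as the docstring above describes; `slice_captures` is a hypothetical helper and not part of python-hyperscan.

```python
from typing import List, Sequence, Tuple


def slice_captures(data: bytes, captured: Sequence[Tuple[int, int, int]]) -> List[bytes]:
    """Return the bytes covered by each capture tuple.

    Hypothetical helper (not part of python-hyperscan). Assumes every tuple
    ends with (start, end) byte offsets into `data`; the first item is ignored.
    """
    return [data[start:end] for _first, start, end in captured]


# Sample tuples copied from the documented output above
print(slice_captures(b'foofoobar', [(1, 3, 6)]))  # [b'foo']
print(slice_captures(b'foofoobar', [(1, 7, 9)]))  # [b'ar']
```

Ignoring the first tuple item keeps the helper usable regardless of what that field holds; it needs only the two offsets.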