### Install Tools with Mise

Source: https://github.com/darvid/python-hyperscan/blob/main/CLAUDE.md

Installs development tools using mise. Ensure mise is installed and configured before running.

```bash
mise install
```

--------------------------------

### Clone and Install Hyperscan from Source

Source: https://github.com/darvid/python-hyperscan/blob/main/docs/index.md

Standard workflow for cloning the repository, setting up a virtual environment, and installing the package from source. This process compiles the extension and vendors the scanning engine.

```shell
git clone https://github.com/darvid/python-hyperscan.git
cd python-hyperscan
python -m venv .venv
source .venv/bin/activate  # .\.venv\Scripts\activate on Windows
pip install --upgrade pip "build[uv]"  # quoted so shells like zsh don't glob the brackets
pip install .
```

--------------------------------

### Verify Hyperscan Installation

Source: https://github.com/darvid/python-hyperscan/blob/main/docs/index.md

Run this command to verify the installation by printing the Hyperscan engine information. This confirms the bundled engine version and features.

```shell
python - <<'PY'
import hyperscan
print(hyperscan.Database().info())
PY
```

--------------------------------

### Install Python Dependencies with UV

Source: https://github.com/darvid/python-hyperscan/blob/main/CLAUDE.md

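Synchronizes Python dependencies using uv. `--no-install-project` installs only the dependencies, not the project itself. `--no-editable` installs any workspace packages as regular installs rather than editable ones.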

```bash
uv sync --no-editable --no-install-project
```

--------------------------------

### Install Python Hyperscan

Source: https://github.com/darvid/python-hyperscan/blob/main/README.md

Install the python-hyperscan package from PyPI using pip. You do not need a separate Hyperscan/Vectorscan installation, because the engine is statically linked into the extension.

```shell
pip install hyperscan
```

--------------------------------

### Link Directories Setup

Source: https://github.com/darvid/python-hyperscan/blob/main/CMakeLists.txt

Sets up link directories for Hyperscan build artifacts.

```cmake
link_directories(${hyperscan_BINARY_DIR})
link_directories(${hyperscan_BINARY_DIR}/lib)
```

--------------------------------

### Include Directories Setup

Source: https://github.com/darvid/python-hyperscan/blob/main/CMakeLists.txt

Adds the Hyperscan `src` and `chimera` source directories to the include path. When `USE_VECTORSCAN` is enabled, it also adds the Vectorscan prefix directory.

```cmake
include_directories(${hyperscan_SOURCE_DIR}/src)
include_directories(${hyperscan_SOURCE_DIR}/chimera)

if(USE_VECTORSCAN)
  include_directories(${hyperscan_PREFIX_DIR})
endif()
```

--------------------------------

### Install Development Dependencies with UV

Source: https://github.com/darvid/python-hyperscan/blob/main/CLAUDE.md

Synchronizes only the `dev` dependency group using uv. `--only-dev` omits the project and its runtime dependencies.

```bash
uv sync --only-dev --no-editable --no-install-project
```

--------------------------------

### Install Ragel on Windows via MSYS2

Source: https://github.com/darvid/python-hyperscan/blob/main/CMakeLists.txt

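This step runs when Hyperscan is built from source and Ragel is not found. The steps differ by platform:

- **Windows:** upgrades the MSYS2 packages, installs Ragel with MSYS2's `pacman`, and points `RAGEL_EXECUTABLE` at the MinGW binary.
- **Other platforms:** builds Ragel from source with `ExternalProject_Add`. This requires `autoreconf` and `kelbt`.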

```cmake
if(WIN32)
  # if Windows, expect MSYS2 to be installed at C:/msys64 or abort
  # this should be the case on Windows Server GitHub Actions runners
  # simply use the MSYS2/MinGW package for ragel since we just need
  # the binary
  if(NOT EXISTS C:/msys64)
    message(FATAL_ERROR "MSYS2 not found at C:/msys64")
  endif()

  set(BASH_PATH C:/msys64/usr/bin/bash.exe)
  execute_process(
    COMMAND ${BASH_PATH} -c "/usr/bin/pacman -Syuu --noconfirm"
    RESULT_VARIABLE MSYS2_UPDATE_RESULT
  )

  if(MSYS2_UPDATE_RESULT)
    message(FATAL_ERROR "Failed to update MSYS2 packages")
  endif()

  execute_process(
    COMMAND ${BASH_PATH} -c "/usr/bin/pacman -S --noconfirm mingw-w64-x86_64-ragel"
    RESULT_VARIABLE MSYS2_RAGEL_INSTALL_RESULT
  )

  if(MSYS2_RAGEL_INSTALL_RESULT)
    message(FATAL_ERROR "Failed to install ragel")
  endif()

  set(RAGEL_EXECUTABLE C:/msys64/mingw64/bin/ragel.exe CACHE PATH "Ragel executable" FORCE)
  set(RAGEL ${RAGEL_EXECUTABLE})
  set(RAGEL_FOUND TRUE)
  message(STATUS "Ragel executable: ${RAGEL_EXECUTABLE}")
else()
  # prerequisites (in addition to a build toolchain): autoconf, kelbt
  find_program(AUTORECONF autoreconf REQUIRED)
  find_program(KELBT kelbt REQUIRED)
  ExternalProject_Add(
    ragel
    GIT_REPOSITORY ${RAGEL_REPO}
    GIT_TAG ragel-${RAGEL_VERSION}
    BUILD_IN_SOURCE TRUE
    CONFIGURE_COMMAND ${AUTORECONF} -f -i
    COMMAND ./configure --prefix=${CMAKE_BINARY_DIR} --disable-manual
    BUILD_COMMAND make -j4
    INSTALL_COMMAND ""
  )
  set(RAGEL_EXECUTABLE ${ragel_BINARY_DIR}/bin/ragel CACHE PATH "Ragel executable" FORCE)
  set(RAGEL_VERSION ${RAGEL_VERSION})
  set(RAGEL_FOUND TRUE)
endif()
```

--------------------------------

### Install Python Hyperscan Extension Target

Source: https://github.com/darvid/python-hyperscan/blob/main/CMakeLists.txt

Defines the installation rules for the Python Hyperscan extension target. It specifies the destination directory and component for installation.

```cmake
install(
  TARGETS ${HS_EXT_NAME}
  LIBRARY
  DESTINATION hyperscan
  COMPONENT hyperscan
)
```

--------------------------------

### Format C Code

Source: https://github.com/darvid/python-hyperscan/blob/main/CLAUDE.md

Formats the C source file 'src/hyperscan/extension.c' in-place using clang-format. Ensure clang-format is installed and in your PATH.

```bash
clang-format -i src/hyperscan/extension.c
```

--------------------------------

### Lint Python Code with Ruff

Source: https://github.com/darvid/python-hyperscan/blob/main/CLAUDE.md

Lints Python code in the 'src/' directory using ruff and automatically fixes linting issues. Ensure ruff is installed.

```bash
ruff check src/ --fix
```

--------------------------------

### Hyperscan Pattern Flags Reference and Usage

Source: https://context7.com/darvid/python-hyperscan/llms.txt

Illustrates Hyperscan's pattern flags, which modify compilation and matching behavior. Combine flags with bitwise OR. You can pass them either as a single integer that applies to every pattern, or as a list with one value per pattern. The example combines flags for case-insensitive, multiline, and dotall matching.

```python
import hyperscan

# Flag constants and their effects
flags_example = {
    'HS_FLAG_CASELESS': hyperscan.HS_FLAG_CASELESS,      # Case-insensitive matching
    'HS_FLAG_DOTALL': hyperscan.HS_FLAG_DOTALL,          # Dot matches newlines
    'HS_FLAG_MULTILINE': hyperscan.HS_FLAG_MULTILINE,    # ^ and $ match line boundaries
    'HS_FLAG_SINGLEMATCH': hyperscan.HS_FLAG_SINGLEMATCH,# Report only first match per pattern
    'HS_FLAG_ALLOWEMPTY': hyperscan.HS_FLAG_ALLOWEMPTY,  # Allow patterns that match empty strings
    'HS_FLAG_UTF8': hyperscan.HS_FLAG_UTF8,              # Enable UTF-8 mode
    'HS_FLAG_UCP': hyperscan.HS_FLAG_UCP,                # Unicode character properties
    'HS_FLAG_PREFILTER': hyperscan.HS_FLAG_PREFILTER,    # Prefilter mode for complex patterns
    'HS_FLAG_SOM_LEFTMOST': hyperscan.HS_FLAG_SOM_LEFTMOST,  # Report start-of-match offset
}

# Example: Combine multiple flags
db = hyperscan.Database()
db.compile(
    expressions=[b'hello', b'^world$', b'foo.bar'],
    ids=[1, 2, 3],
    flags=[
        hyperscan.HS_FLAG_CASELESS,  # 'HELLO' matches
        hyperscan.HS_FLAG_MULTILINE | hyperscan.HS_FLAG_CASELESS,  # Match at line start
        hyperscan.HS_FLAG_DOTALL | hyperscan.HS_FLAG_SOM_LEFTMOST,  # Dot matches newline
    ]
)

matches = []
def on_match(pattern_id, from_offset, to_offset, flags, context):
    matches.append((pattern_id, from_offset, to_offset))
    return None

db.scan(b'HELLO\nworld\nfoo\nbar', match_event_handler=on_match)
print(matches)
# Matches 'HELLO' (caseless), 'world' at line start, and 'foo\nbar' (dotall)
```
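You can cross-check the expected offsets with Python's standard `re` module. The sketch below runs the same three patterns, mapping `HS_FLAG_CASELESS`, `HS_FLAG_MULTILINE` and `HS_FLAG_DOTALL` to `re.IGNORECASE`, `re.MULTILINE` and `re.DOTALL`.

This is an analogy, not Hyperscan. `re.search` returns the leftmost match with its start offset. Hyperscan, by contrast, reports start offsets only for patterns compiled with `HS_FLAG_SOM_LEFTMOST`.

```python
import re

text = b'HELLO\nworld\nfoo\nbar'

# (pattern id, regex, re flags standing in for the Hyperscan flags above)
checks = [
    (1, rb'hello', re.IGNORECASE),                   # ~ HS_FLAG_CASELESS
    (2, rb'^world$', re.MULTILINE | re.IGNORECASE),  # ~ MULTILINE | CASELESS
    (3, rb'foo.bar', re.DOTALL),                     # ~ HS_FLAG_DOTALL
]

spans = []
for pattern_id, pattern, flags in checks:
    m = re.search(pattern, text, flags)
    spans.append((pattern_id, m.start(), m.end()) if m else (pattern_id, None, None))

print(spans)  # [(1, 0, 5), (2, 6, 11), (3, 12, 19)]
```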

--------------------------------

### Compile Hyperscan Database and Scan with Threads

Source: https://context7.com/darvid/python-hyperscan/llms.txt

Demonstrates compiling a Hyperscan database with multiple patterns and scanning data concurrently from a thread pool. A Hyperscan scratch space must not be used by more than one scan at a time, so each concurrent scan gets its own clone of a primary scratch.

```python
import hyperscan
from concurrent.futures import ThreadPoolExecutor

# Create and compile database
db = hyperscan.Database()
db.compile(expressions=[b'pattern1', b'pattern2'], ids=[1, 2])

# Create primary scratch space
primary_scratch = hyperscan.Scratch(db)

# Scan one buffer with a private scratch space
def scan_in_thread(data, thread_id):
    # Clone per call: concurrent scans must never share a scratch space
    thread_scratch = primary_scratch.clone()

    matches = []
    def on_match(pattern_id, from_offset, to_offset, flags, context):
        matches.append((thread_id, pattern_id, from_offset, to_offset))
        return None

    db.scan(data, match_event_handler=on_match, scratch=thread_scratch)
    return matches

# Run scans concurrently on a thread pool
with ThreadPoolExecutor(max_workers=4) as executor:
    futures = [
        executor.submit(scan_in_thread, b'test pattern1 data', i)
        for i in range(4)
    ]
    results = [f.result() for f in futures]

print(results)
# Each thread finds matches independently with its own scratch space
```
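The example above clones a scratch on every call. If a pool runs many short scans, you could instead clone once per worker thread and reuse it.

`PerThread` below is a hypothetical helper (not part of python-hyperscan) built on `threading.local`. Here it is exercised with a stand-in factory so it runs without Hyperscan. With python-hyperscan, the assumption is that you would pass `primary_scratch.clone` as the factory and use `scratch=per_thread.get()` in `db.scan()`.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class PerThread:
    """Lazily build one object per thread from a zero-argument factory."""

    def __init__(self, factory):
        self._factory = factory
        self._local = threading.local()

    def get(self):
        obj = getattr(self._local, "obj", None)
        if obj is None:
            obj = self._factory()  # first use on this thread
            self._local.obj = obj
        return obj


# Stand-in factory (in place of primary_scratch.clone) that records creations.
created = []
lock = threading.Lock()

def make_object():
    with lock:
        created.append(object())
        return created[-1]

per_thread = PerThread(make_object)

def task(_):
    # Report which thread ran the task and which per-thread object it used.
    return threading.get_ident(), id(per_thread.get())

with ThreadPoolExecutor(max_workers=4) as executor:
    results = list(executor.map(task, range(20)))

print(f"tasks: {len(results)}, objects created: {len(created)}")
```

The trade-off is that each clone lives as long as its worker thread, rather than being discarded after one scan.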

--------------------------------

### Build Wheels with Cibuildwheel

Source: https://github.com/darvid/python-hyperscan/blob/main/CLAUDE.md

Builds binary wheels for the project using cibuildwheel, typically in CI. `--platform linux` builds Linux wheels inside manylinux/musllinux containers, so Docker or Podman must be available.

```bash
cibuildwheel --platform linux
```

--------------------------------

### Compile Database with Extended Parameters

Source: https://github.com/darvid/python-hyperscan/blob/main/docs/usage.md

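Compiles a Hyperscan database with extended parameters, using the `ExpressionExt` helper tuple. Setting `HS_EXT_FLAG_MIN_OFFSET` with `min_offset=12` means a match is reported only if it ends at or after offset 12 in the scanned data.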

```python
import hyperscan

db = hyperscan.Database()
db.compile(
    expressions=[b'foobar'],
    flags=hyperscan.HS_FLAG_SOM_LEFTMOST,
    ext=[
        hyperscan.ExpressionExt(
            flags=hyperscan.HS_EXT_FLAG_MIN_OFFSET, min_offset=12
        )
    ],
)
```
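Hyperscan's `min_offset` and `max_offset` bound the *end* offset of a match. The sketch below mimics that filter with Python's `re` module, so you can check expected results without Hyperscan.

`bounded_matches` is a hypothetical helper. Unlike Hyperscan, `re.finditer` returns only non-overlapping leftmost matches, so the two agree only for simple patterns such as this literal.

```python
import re


def bounded_matches(pattern, data, min_offset=0, max_offset=None):
    """Return (start, end) spans whose END offset lies in [min_offset, max_offset].

    Illustration only: mimics how HS_EXT_FLAG_MIN_OFFSET / HS_EXT_FLAG_MAX_OFFSET
    restrict where a match may end, using Python's re instead of Hyperscan.
    """
    spans = []
    for m in re.finditer(pattern, data):
        if m.end() < min_offset:
            continue  # ends too early
        if max_offset is not None and m.end() > max_offset:
            continue  # ends too late
        spans.append((m.start(), m.end()))
    return spans


data = b'foobarfoobar'
print(bounded_matches(rb'foobar', data, min_offset=12))  # [(6, 12)]
print(bounded_matches(rb'foobar', data, max_offset=6))   # [(0, 6)]
```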

--------------------------------

### Build Source Distribution

Source: https://github.com/darvid/python-hyperscan/blob/main/CLAUDE.md

Builds the source distribution (sdist) of the Python package using pyproject-build via uvx. The --verbose flag provides detailed output.

```bash
uvx --from build pyproject-build --installer=uv --sdist --verbose
```

--------------------------------

### Compile Patterns into Database - Python Hyperscan

Source: https://context7.com/darvid/python-hyperscan/llms.txt

Demonstrates compiling regular expression patterns into a Hyperscan database in block mode. Supports custom IDs and flags like case-insensitivity and start-of-match reporting.

```python
import hyperscan

# Create a database in block mode (default)
db = hyperscan.Database()

# Define patterns with ids and flags
patterns = [
    (b'fo+',      0, 0),  # Match 'fo', 'foo', 'fooo', etc.
    (b'^foobar$', 1, hyperscan.HS_FLAG_CASELESS),  # Case-insensitive anchored match
    (b'BAR',      2, hyperscan.HS_FLAG_CASELESS | hyperscan.HS_FLAG_SOM_LEFTMOST),
]
expressions, ids, flags = zip(*patterns)

# Compile the patterns into the database
db.compile(
    expressions=expressions,
    ids=ids,
    elements=len(patterns),
    flags=flags
)

# Get database information
print(db.info().decode())
# Example output (version and features depend on the build and CPU):
# Version: 5.4.12 Features: AVX2 Mode: BLOCK

# Get database size in bytes
print(f"Database size: {db.size()} bytes")
```

--------------------------------

### Preview Git-cliff Changelog Range

Source: https://github.com/darvid/python-hyperscan/blob/main/docs/releases.md

Use this command to preview the changelog range that git-cliff will generate before a release. It requires specifying the previous release tag, configuration file, and the next version tag.

```shell
git cliff <previous-release>..HEAD --config cliff.toml --tag v<next-version> --output release-notes.md
```

--------------------------------

### Configure Hyperscan/VectorScan Source

Source: https://github.com/darvid/python-hyperscan/blob/main/CMakeLists.txt

Selects the engine source repository and version by platform:

- **Windows:** uses Intel Hyperscan 5.4.2 and requires a Visual Studio (MSVC) generator.
- **All other platforms:** uses VectorCamp's Vectorscan 5.4.12.

```cmake
if(WIN32)
  # Ensure we're using MSVC on Windows
  if(NOT CMAKE_GENERATOR MATCHES "Visual Studio")
    message(FATAL_ERROR "On Windows, only MSVC/Visual Studio generators are supported for building Python extensions")
  endif()

  set(USE_VECTORSCAN FALSE)
  set(HYPERSCAN_VERSION 5.4.2)
  set(HYPERSCAN_TAG v5.4.2)
  set(HYPERSCAN_REPO https://github.com/intel/hyperscan.git)
  message(STATUS "Using Hyperscan ${HYPERSCAN_VERSION} from ${HYPERSCAN_REPO}")
else()
  set(HYPERSCAN_VERSION 5.4.12)
  set(HYPERSCAN_TAG vectorscan/5.4.12)
  set(HYPERSCAN_REPO https://github.com/VectorCamp/vectorscan.git)
  message(STATUS "Using VectorScan ${HYPERSCAN_VERSION} from ${HYPERSCAN_REPO}")
endif()
```

--------------------------------

### Define Hyperscan CMake Arguments

Source: https://github.com/darvid/python-hyperscan/blob/main/CMakeLists.txt

This sets up CMake arguments for Hyperscan, specifically enabling static libraries for Boost and the main build.

```cmake
set(
  HS_CMAKE_ARGS
  -DBOOST_USE_STATIC_LIBS=ON
  -DBUILD_STATIC_LIBS=ON
)
```

--------------------------------

### Initialize Streaming Mode Database

Source: https://github.com/darvid/python-hyperscan/blob/main/docs/usage.md

Initializes a Hyperscan database specifically for streaming mode. This is required before using the Database.stream method.

```python
db = hyperscan.Database(mode=hyperscan.HS_MODE_STREAM)
```

--------------------------------

### Block Mode Scanning with Database.scan() - Python Hyperscan

Source: https://context7.com/darvid/python-hyperscan/llms.txt

Shows how to use the `scan()` method for matching patterns against a complete text block. It utilizes a callback function to process each match found, with an option to halt scanning by returning a truthy value.

```python
import hyperscan

# Create and compile database
db = hyperscan.Database()
db.compile(
    expressions=[b'foo', b'bar', b'baz'],
    ids=[1, 2, 3],
    flags=hyperscan.HS_FLAG_SOM_LEFTMOST  # Enable start-of-match reporting
)

# Store matches in a list
matches = []

def on_match(pattern_id, from_offset, to_offset, flags, context):
    """
    Callback invoked for each match.

    Args:
        pattern_id: The ID assigned to the matching pattern
        from_offset: Start offset (0 if SOM not enabled)
        to_offset: End offset of the match
        flags: Reserved by Hyperscan (currently unused)
        context: User-provided context object

    Returns:
        None to continue scanning, truthy value to halt
    """
    matches.append({
        'id': pattern_id,
        'start': from_offset,
        'end': to_offset,
        'text': context[from_offset:to_offset] if context else None
    })
    return None  # Continue scanning

# Scan text block
text = b'hello foo and bar and baz world'
db.scan(text, match_event_handler=on_match, context=text)

print(matches)
# Output: [{'id': 1, 'start': 6, 'end': 9, 'text': b'foo'},
#          {'id': 2, 'start': 14, 'end': 17, 'text': b'bar'},
#          {'id': 3, 'start': 22, 'end': 25, 'text': b'baz'}]
```
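You can package the halting behavior described above as a small callback factory. `make_collector` is a hypothetical helper. It returns a match list plus a callback with the same five-argument signature as `on_match`, and the callback returns `True` once `limit` matches have been collected.

Below, the callback is driven by hand so the code runs without Hyperscan. With a compiled database you would pass it as `match_event_handler`.

In Hyperscan's C API, a scan halted by the callback returns `HS_SCAN_TERMINATED`. Whether the Python binding reports that silently or by raising may depend on the version, so check before relying on it.

```python
def make_collector(limit=None):
    """Build a (matches, callback) pair for Database.scan().

    The callback returns True (halt) once `limit` matches are collected;
    with limit=None it always returns False (continue scanning).
    """
    matches = []

    def on_match(pattern_id, from_offset, to_offset, flags, context):
        matches.append((pattern_id, from_offset, to_offset))
        return limit is not None and len(matches) >= limit

    return matches, on_match


# Drive the callback by hand with the match events from the example above.
matches, on_match = make_collector(limit=2)
for event in [(1, 6, 9), (2, 14, 17), (3, 22, 25)]:
    if on_match(*event, 0, None):
        break  # a truthy return is where Hyperscan would stop delivering matches

print(matches)  # [(1, 6, 9), (2, 14, 17)]
```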

--------------------------------

### Import Hyperscan Libraries

Source: https://github.com/darvid/python-hyperscan/blob/main/CMakeLists.txt

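Declares the `hs`, `hs_runtime`, `chimera` and `pcre` static libraries as imported CMake targets. For each one, it:

- computes the file path under `HS_BUILD_LIB_ROOT`, with a `lib` prefix on non-Windows platforms;
- records the path as a build byproduct.

When Hyperscan is not being built from source (`HS_BUILD_REQUIRED` is false), a missing library file is a fatal error.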

```cmake
set(HS_LIBS
  hs
  hs_runtime
  chimera
  pcre)

set(HS_BUILD_BYPRODUCTS)

foreach(lib ${HS_LIBS})
  add_library(${lib} STATIC IMPORTED)

  if(WIN32)
    set(object_name "${lib}${CMAKE_STATIC_LIBRARY_SUFFIX}")
  else()
    set(object_name "lib${lib}${CMAKE_STATIC_LIBRARY_SUFFIX}")
  endif()

  set(object_path "${HS_BUILD_LIB_ROOT}/${object_name}")

  list(APPEND HS_BUILD_BYPRODUCTS "${object_path}")

  if(NOT HS_BUILD_REQUIRED AND NOT EXISTS "${object_path}")
    message(FATAL_ERROR "${object_name} not found at ${HS_BUILD_LIB_ROOT}")
  endif()

  set_target_properties(${lib} PROPERTIES
    IMPORTED_LOCATION "${object_path}")
endforeach()
```
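Before CMake stops with a "not found" fatal error, you can check which static libraries are missing from a prebuilt library directory. A small Python script can rebuild the same file names.

`expected_lib_names` and `missing_libs` are hypothetical helpers. They assume `CMAKE_STATIC_LIBRARY_SUFFIX` is `.lib` on Windows (MSVC) and `.a` elsewhere.

```python
from pathlib import Path

HS_LIBS = ["hs", "hs_runtime", "chimera", "pcre"]


def expected_lib_names(is_windows):
    """File names the CMake loop above looks for."""
    # Assumption: static suffix is ".lib" with MSVC and ".a" on other platforms.
    if is_windows:
        return [f"{lib}.lib" for lib in HS_LIBS]
    return [f"lib{lib}.a" for lib in HS_LIBS]


def missing_libs(lib_root, is_windows):
    """Names from expected_lib_names() that do not exist under lib_root."""
    root = Path(lib_root)
    return [name for name in expected_lib_names(is_windows) if not (root / name).exists()]


# Example (directory is hypothetical): missing_libs("/path/to/hs/lib", is_windows=False)
```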

--------------------------------

### Full Lint Workflow

Source: https://github.com/darvid/python-hyperscan/blob/main/CLAUDE.md

Performs a full code quality check by linting and formatting Python code. This command combines ruff check and black formatting, matching CI behavior.

```bash
ruff check --fix src/ && black src/
```

--------------------------------

### Extended Pattern Parameters with ExpressionExt

Source: https://context7.com/darvid/python-hyperscan/llms.txt

Control pattern matching with `ExpressionExt`, specifying minimum/maximum offsets, minimum length, and approximate matching using edit or Hamming distance.

```python
import hyperscan

db = hyperscan.Database()
matches = []

def on_match(pattern_id, from_offset, to_offset, flags, context):
    matches.append((pattern_id, from_offset, to_offset))
    return None

# Min offset: only report matches ending at or after offset 12
db.compile(
    expressions=[b'foobar'],
    flags=hyperscan.HS_FLAG_SOM_LEFTMOST,
    ext=[hyperscan.ExpressionExt(
        flags=hyperscan.HS_EXT_FLAG_MIN_OFFSET,
        min_offset=12
    )]
)
matches.clear()
db.scan(b'foobarfoobar', match_event_handler=on_match)
print(f"Min offset matches: {matches}")  # [(0, 6, 12)] - second 'foobar' only

# Max offset: only report matches ending at or before offset 6
db.compile(
    expressions=[b'foobar'],
    flags=hyperscan.HS_FLAG_SOM_LEFTMOST,
    ext=[hyperscan.ExpressionExt(
        flags=hyperscan.HS_EXT_FLAG_MAX_OFFSET,
        max_offset=6
    )]
)
matches.clear()
db.scan(b'foobarfoobar', match_event_handler=on_match)
print(f"Max offset matches: {matches}")  # [(0, 0, 6)] - first 'foobar' only

# Min length: require at least 3 characters for 'fo+'
db.compile(
    expressions=[b'fo+'],
    flags=hyperscan.HS_FLAG_SOM_LEFTMOST,
    ext=[hyperscan.ExpressionExt(
        flags=hyperscan.HS_EXT_FLAG_MIN_LENGTH,
        min_length=3
    )]
)
matches.clear()
db.scan(b'fo', match_event_handler=on_match)
print(f"'fo' matches: {matches}")  # [] - no match, too short
matches.clear()
db.scan(b'foo', match_event_handler=on_match)
print(f"'foo' matches: {matches}")  # [(0, 0, 3)] - matches

# Edit distance: allow up to 3 character substitutions/insertions/deletions
db.compile(
    expressions=[b'foobar'],
    flags=hyperscan.HS_FLAG_SOM_LEFTMOST,
    ext=[hyperscan.ExpressionExt(
        flags=hyperscan.HS_EXT_FLAG_EDIT_DISTANCE,
        edit_distance=3
    )]
)
matches.clear()
db.scan(b'fxxxar', match_event_handler=on_match)
print(f"Edit distance matches: {matches}")  # [(0, 0, 6)] - 'fxxxar' matches 'foobar'
```
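To see why `b'fxxxar'` falls within an edit distance of 3 of `foobar`, you can compute the distance directly. The sketch below implements the standard dynamic-programming Levenshtein distance. It also implements Hamming distance, which allows substitutions only and so requires equal lengths.

Both functions are illustrations of the metrics. They are not how Hyperscan implements approximate matching internally.

```python
def levenshtein(a: bytes, b: bytes) -> int:
    """Edit distance: minimum substitutions, insertions and deletions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


def hamming(a: bytes, b: bytes) -> int:
    """Hamming distance: substitutions only, so lengths must match."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length inputs")
    return sum(x != y for x, y in zip(a, b))


print(levenshtein(b'foobar', b'fxxxar'))  # 3, within edit_distance=3
print(hamming(b'foobar', b'fxxxar'))      # 3, three substitutions
print(levenshtein(b'foobar', b'fooar'))   # 1, one deletion (lengths differ)
```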

--------------------------------

### Download and Extract Boost Dependency

Source: https://github.com/darvid/python-hyperscan/blob/main/CMakeLists.txt

Downloads a pinned Boost release archive and verifies it against the expected SHA-256 hash. It then removes any previous Boost directories in the vendor directory, extracts only the archive's `boost/` subdirectory, and renames the extracted folder to `boost`.

```cmake
file(
  DOWNLOAD https://archives.boost.io/release/${BOOST_VERSION}/source/boost_${BOOST_FILENAME_VERSION}.tar.gz
  ${hyperscan_VENDOR_DIR}/boost.tar.gz
  EXPECTED_HASH SHA256=f55c340aa49763b1925ccf02b2e83f35fdcf634c9d5164a2acb87540173c741d
)

if(EXISTS ${hyperscan_VENDOR_DIR}/boost)
  file(REMOVE_RECURSE ${hyperscan_VENDOR_DIR}/boost)
endif()

if(EXISTS ${hyperscan_VENDOR_DIR}/boost_${BOOST_FILENAME_VERSION})
  file(REMOVE_RECURSE ${hyperscan_VENDOR_DIR}/boost_${BOOST_FILENAME_VERSION})
endif()

file(
  ARCHIVE_EXTRACT INPUT ${hyperscan_VENDOR_DIR}/boost.tar.gz DESTINATION ${hyperscan_VENDOR_DIR} PATTERNS "boost_${BOOST_FILENAME_VERSION}/boost/*"
)
file(RENAME ${hyperscan_VENDOR_DIR}/boost_${BOOST_FILENAME_VERSION} ${hyperscan_VENDOR_DIR}/boost)
message(STATUS "Boost downloaded to ${hyperscan_VENDOR_DIR}/boost")
```
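To check an archive you downloaded yourself against the same pinned hash, a few lines of `hashlib` are enough. `sha256_of` and `verify_sha256` are hypothetical helpers, and the file path in the usage comment is only an example.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Hex SHA-256 digest of a file, read in 1 MiB chunks to keep memory flat."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def verify_sha256(path, expected_hex):
    """Raise ValueError unless the file's SHA-256 equals expected_hex."""
    actual = sha256_of(path)
    if actual != expected_hex.lower():
        raise ValueError(f"SHA-256 mismatch for {path}: got {actual}")
    return True


# Hash pinned in the CMake snippet above for the Boost archive.
BOOST_SHA256 = "f55c340aa49763b1925ccf02b2e83f35fdcf634c9d5164a2acb87540173c741d"
# Example (path is hypothetical): verify_sha256("vendor/boost.tar.gz", BOOST_SHA256)
```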

--------------------------------

### Serialize and Deserialize Hyperscan Database

Source: https://github.com/darvid/python-hyperscan/blob/main/docs/usage.md

Demonstrates how to serialize a Hyperscan database to bytes for storage or transmission and then deserialize it back into a usable database object. This is useful for saving compiled patterns.

```python
# Serializing (dumping to bytes) and saving to disk
serialized = hyperscan.dumpb(db)
with open('hs.db', 'wb') as f:
    f.write(serialized)
```

```python
# Deserializing (loading the saved bytes back into a Database)
with open('hs.db', 'rb') as f:
    db = hyperscan.loadb(f.read())
```
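Because the serialized database is just bytes, a common way to store it is the write-to-temp-then-rename pattern, so a reader never sees a partially written `hs.db`. This is a minimal sketch with hypothetical helper names (`write_bytes_atomically`, `read_bytes`); neither is part of python-hyperscan. `os.replace` is atomic on POSIX when the source and destination are on the same filesystem.

```python
import os
import tempfile


def write_bytes_atomically(path, data):
    """Write bytes to `path` via a temporary file in the same directory."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # make sure the bytes reach the disk
        os.replace(tmp_path, path)  # swap the finished file into place
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)  # don't leave a stray temp file behind
        raise


def read_bytes(path):
    with open(path, "rb") as f:
        return f.read()


# With python-hyperscan (not run here):
#   write_bytes_atomically('hs.db', hyperscan.dumpb(db))
#   db = hyperscan.loadb(read_bytes('hs.db'))
```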

--------------------------------

### Scan for Second Foobar Match

Source: https://github.com/darvid/python-hyperscan/blob/main/docs/usage.md

Scans `b'foobarfoobar'` and passes each match to `callback`. With the database compiled using `HS_EXT_FLAG_MIN_OFFSET` and `min_offset=12` (see the extended-parameters example), only the second `foobar`, which ends at offset 12, is reported.

```python
db.scan(b'foobarfoobar', match_event_handler=callback)
```

--------------------------------

### Mirror Semantic-Release Decision Locally

Source: https://github.com/darvid/python-hyperscan/blob/main/docs/releases.md

This command runs semantic-release's `version` step locally in no-op mode with debug logging. It shows whether a new semantic version would be cut without tagging or publishing anything.

```shell
uv run python -m semantic_release version --noop --verbosity DEBUG
```

--------------------------------

### Compile Hyperscan Database in Literal Mode

Source: https://context7.com/darvid/python-hyperscan/llms.txt

Demonstrates compiling a Hyperscan database in literal mode (`literal=True`). Each expression is matched as an exact byte string rather than a regular expression, so characters such as `+`, `*`, `.` and `[` have no special meaning. For fixed strings this removes the need to escape regex metacharacters, and it may compile and match faster.

```python
import hyperscan

# Create database for literal matching
db = hyperscan.Database()

# These are treated as literal strings, not regex patterns
# Characters like +, *, . have no special meaning
db.compile(
    expressions=[b'foo+bar', b'test.*pattern', b'[literal]'],
    ids=[1, 2, 3],
    flags=hyperscan.HS_FLAG_SOM_LEFTMOST,  # report start offsets (from_offset is 0 without it)
    literal=True  # Enable literal mode
)

matches = []
def on_match(pattern_id, from_offset, to_offset, flags, context):
    matches.append((pattern_id, context[from_offset:to_offset]))
    return None

# Must match exactly - '+' and '.*' are literal characters
text = b'foo+bar test.*pattern [literal]'
db.scan(text, match_event_handler=on_match, context=text)

print(matches)
# Output: [(1, b'foo+bar'), (2, b'test.*pattern'), (3, b'[literal]')]
```
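Literal matching is plain byte comparison, so you can reproduce the expected spans with `bytes.find`. `find_literal` is a hypothetical helper. After each hit it steps forward one byte, so overlapping occurrences are reported too, much as Hyperscan reports every match.

```python
def find_literal(haystack: bytes, needle: bytes) -> list[tuple[int, int]]:
    """Every (start, end) occurrence of `needle`, including overlapping ones."""
    if not needle:
        raise ValueError("needle must be non-empty")
    spans = []
    start = haystack.find(needle)
    while start != -1:
        spans.append((start, start + len(needle)))
        start = haystack.find(needle, start + 1)  # step by one byte to allow overlaps
    return spans


text = b'foo+bar test.*pattern [literal]'
for literal in (b'foo+bar', b'test.*pattern', b'[literal]'):
    print(literal, find_literal(text, literal))
# b'foo+bar' [(0, 7)]
# b'test.*pattern' [(8, 21)]
# b'[literal]' [(22, 31)]
```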

--------------------------------

### Windows MSVC Runtime Configuration

Source: https://github.com/darvid/python-hyperscan/blob/main/CMakeLists.txt

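Configures the MSVC build. It selects the multithreaded DLL runtime (`MultiThreadedDLL`, i.e. `/MD`) and adds `/arch:SSE2 /FS /GS-` to the C and C++ flags passed to the Hyperscan build. It also sets `CMAKE_BUILD_PARALLEL_LEVEL` to 2 if it is not already defined.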

```cmake
set(CMAKE_MSVC_RUNTIME_LIBRARY "MultiThreadedDLL")

set(HS_CMAKE_COMMON_FLAGS "/arch:SSE2 /FS /GS-")

if(NOT CMAKE_BUILD_PARALLEL_LEVEL)
  set(CMAKE_BUILD_PARALLEL_LEVEL 2)
endif()

set(HS_CMAKE_C_FLAGS "${CMAKE_C_FLAGS} ${HS_CMAKE_COMMON_FLAGS}")
set(HS_CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${HS_CMAKE_COMMON_FLAGS}")
set(HS_CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS}")
```

--------------------------------

### Streaming Mode Scanning with Database.stream() - Python Hyperscan

Source: https://context7.com/darvid/python-hyperscan/llms.txt

Illustrates scanning data in chunks using the `stream()` method, suitable for large datasets or network traffic. Patterns can span across chunk boundaries, and the context can be overridden for specific chunks.

```python
import hyperscan

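# Create database in streaming mode; a SOM horizon mode is required
# because the patterns are compiled with HS_FLAG_SOM_LEFTMOST
db = hyperscan.Database(mode=hyperscan.HS_MODE_STREAM | hyperscan.HS_MODE_SOM_HORIZON_LARGE)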
db.compile(
    expressions=[b'foobar', b'hello world'],
    ids=[1, 2],
    flags=hyperscan.HS_FLAG_SOM_LEFTMOST
)

matches = []

def on_match(pattern_id, from_offset, to_offset, flags, context):
    matches.append((pattern_id, from_offset, to_offset, context))
    return None

# Use stream context manager for chunked processing
with db.stream(match_event_handler=on_match, context='default') as stream:
    # Pattern 'foobar' spans chunks
    stream.scan(b'foo')
    stream.scan(b'bar')  # Match detected here

    # Override context for specific chunk
    stream.scan(b' hello', context='chunk2')
    stream.scan(b' world', context='chunk3')  # Match detected here

print(matches)
# Output: [(1, 0, 6, 'default'), (2, 7, 18, 'chunk3')]

# Get stream state size
with db.stream(match_event_handler=on_match) as stream:
    print(f"Stream state size: {stream.size()} bytes")
```
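Offsets reported in streaming mode count bytes from the start of the stream, not from the start of the current chunk. The sketch below takes the chunks from the example and concatenates them to find the expected spans with `re`. It then maps each end offset back to the chunk that completed the match. `chunk_for_end` is a hypothetical helper built on `bisect`.

```python
import bisect
import re
from itertools import accumulate

chunks = [b'foo', b'bar', b' hello', b' world']

# Cumulative end offset of each chunk within the stream.
boundaries = list(accumulate(len(c) for c in chunks))  # [3, 6, 12, 18]


def chunk_for_end(end_offset):
    """Index of the chunk holding the last byte of a match ending at end_offset."""
    return bisect.bisect_left(boundaries, end_offset)


data = b''.join(chunks)
for pattern in (rb'foobar', rb'hello world'):
    start, end = re.search(pattern, data).span()
    print(pattern, (start, end), 'completes in chunk', chunk_for_end(end))
# b'foobar' (0, 6) completes in chunk 1
# b'hello world' (7, 18) completes in chunk 3
```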

--------------------------------

### Non-Windows Architecture Flags and SIMDE Backend

Source: https://github.com/darvid/python-hyperscan/blob/main/CMakeLists.txt

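Configures compiler flags and the SIMDE_BACKEND selection for non-Windows builds. The target architecture comes from `CMAKE_SYSTEM_PROCESSOR`, or from `CMAKE_OSX_ARCHITECTURES` when cross-compiling on macOS.

- **ARM and any other non-x86 target:** `HS_USE_SIMDE_BACKEND` is enabled.
- **x86 and x86-64:** it stays off, so Vectorscan's native backend can use runtime CPU feature detection (SSE4.2/AVX2/AVX512). The comment in the code notes that enabling SIMDe on x86-64 caps performance at SSE2 level.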

```cmake
set(CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS}")

# Architecture-specific compiler flags and SIMDE_BACKEND selection.
# SIMDE_BACKEND is only enabled for non-x86 architectures (ARM, etc.)
# where vectorscan has no native SIMD support. On x86-64, the native
# backend provides runtime CPU feature detection (SSE4.2/AVX2/AVX512)
# which is critical for performance. Enabling SIMDE_BACKEND on x86-64
# disables all higher ISA code paths and caps performance at SSE2
# level (~10-15x slower). See: https://github.com/darvid/python-hyperscan/issues/253
#
# For macOS cross-compilation (e.g. building x86_64 on ARM runner),
# CMAKE_OSX_ARCHITECTURES reflects the TARGET arch and takes priority
# over CMAKE_SYSTEM_PROCESSOR (which reflects the HOST).
set(HS_USE_SIMDE_BACKEND OFF)
set(_HS_TARGET_ARCH "${CMAKE_SYSTEM_PROCESSOR}")
if(APPLE AND CMAKE_OSX_ARCHITECTURES)
  set(_HS_TARGET_ARCH "${CMAKE_OSX_ARCHITECTURES}")
endif()
if(_HS_TARGET_ARCH MATCHES "(arm|aarch64|arm64)")
  set(HS_CMAKE_COMMON_FLAGS "-fPIC")
  set(HS_USE_SIMDE_BACKEND ON)
elseif(_HS_TARGET_ARCH MATCHES "(x86|X86|amd64|AMD64|x86_64|i[3-6]86)")
  set(HS_CMAKE_COMMON_FLAGS "-march=x86-64 -fPIC")
else()
  set(HS_CMAKE_COMMON_FLAGS "-fPIC")
  set(HS_USE_SIMDE_BACKEND ON)
endif()


set(HS_CMAKE_C_FLAGS "${CMAKE_C_FLAGS} ${HS_CMAKE_COMMON_FLAGS}")
set(HS_CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${HS_CMAKE_COMMON_FLAGS} -D_GLIBCXX_USE_CXX11_ABI=1")
```
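You can mirror the decision above in Python to predict, for a given architecture string, whether the build turns `HS_USE_SIMDE_BACKEND` on. `uses_simde_backend` is a hypothetical helper that only re-implements the regexes in the snippet. `platform.machine()` is just an approximation of CMake's `CMAKE_SYSTEM_PROCESSOR`.

```python
import platform
import re


def uses_simde_backend(system_processor, osx_architectures=None, is_apple=False):
    """Mirror of the CMake selection above (illustration, not the build itself).

    Like CMake's MATCHES, re.search looks for the pattern anywhere in the string.
    """
    target = system_processor
    if is_apple and osx_architectures:
        target = osx_architectures  # on macOS the target arch wins over the host arch
    if re.search(r"(arm|aarch64|arm64)", target):
        return True
    if re.search(r"(x86|X86|amd64|AMD64|x86_64|i[3-6]86)", target):
        return False
    return True  # any other architecture falls back to SIMDe


print(platform.machine(), uses_simde_backend(platform.machine()))
```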

--------------------------------

### Windows Generator and Architecture Settings

Source: https://github.com/darvid/python-hyperscan/blob/main/CMakeLists.txt

Chooses the generator for the Hyperscan sub-build on Windows. If the outer build is not using a Visual Studio generator, it uses 'Visual Studio 17 2022' and passes `-A x64`. Otherwise, it reuses the outer generator.

```cmake
if(NOT ${CMAKE_GENERATOR} MATCHES "^Visual Studio")
  set(HS_GENERATOR "Visual Studio 17 2022")

  # Ensure x64 architecture
  set(HS_CMAKE_ARGS ${HS_CMAKE_ARGS} -A x64)
else()
  set(HS_GENERATOR ${CMAKE_GENERATOR})
endif()
```

--------------------------------

### Test with Coverage

Source: https://github.com/darvid/python-hyperscan/blob/main/CLAUDE.md

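Runs the test suite with maximum verbosity. `--pyargs` tells pytest to interpret the argument as an importable package, so the tests run against the installed `hyperscan` package rather than the source checkout. The command itself passes no coverage options, so any coverage report depends on the project's pytest configuration (for example, pytest-cov settings).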

```bash
pytest --pyargs hyperscan/tests -vvv
```

--------------------------------

### Run All Tests with Pytest

Source: https://github.com/darvid/python-hyperscan/blob/main/CLAUDE.md

Executes all tests in the tests/ directory using pytest. The -vvv flag increases verbosity.

```bash
pytest tests/ -vvv
```

--------------------------------

### Remove Existing Boost and PCRE Directories

Source: https://github.com/darvid/python-hyperscan/blob/main/CMakeLists.txt

Removes any existing `boost` and `pcre` directories under `HS_SRC_ROOT` before new copies are downloaded, so stale sources cannot conflict with them.

```cmake
if(EXISTS ${HS_SRC_ROOT}/boost)
  file(REMOVE_RECURSE ${HS_SRC_ROOT}/boost)
endif()

if(EXISTS ${HS_SRC_ROOT}/pcre)
  file(REMOVE_RECURSE ${HS_SRC_ROOT}/pcre)
endif()
```

--------------------------------

### Compile and Scan with Chimera Mode

Source: https://github.com/darvid/python-hyperscan/blob/main/docs/usage.md

Illustrates Chimera mode. Chimera is a hybrid of Hyperscan and PCRE that supports full PCRE syntax (such as backreferences and capture groups) while using Hyperscan's engine for fast matching. In Chimera mode, reuse one `Scratch` object per thread to avoid repeated reallocations.

```python
import hyperscan

# on_match: your match callback, defined elsewhere
chimera_db = hyperscan.Database(chimera=True)
chimera_db.compile(expressions=[br'(foo)+', br'b(ar|az)'])
chimera_db.scan(b'foobaz', match_event_handler=on_match)
```

--------------------------------

### Database.stream() - Streaming Mode Scanning

Source: https://context7.com/darvid/python-hyperscan/llms.txt

Provides a context manager for scanning data that arrives in chunks, ideal for large or streaming data.

````APIDOC
## Database.stream() - Streaming Mode Scanning

### Description
The `stream()` method returns a context manager for scanning data that arrives in chunks. This suits network traffic, log files, or any data that cannot be loaded entirely into memory. Patterns can match across chunk boundaries.

### Method
`Database.stream(match_event_handler, context)`

### Parameters
- **match_event_handler** (callable) - Required - A function called for each match found. It receives `pattern_id`, `from_offset`, `to_offset`, `flags`, and `context`. Returning a truthy value from this handler stops the scan.
- **context** (any) - Optional - A user-provided object passed to `match_event_handler` for matches in this stream, unless a `Stream.scan()` call overrides it.

### Example
```python
import hyperscan
# A SOM horizon mode is required because HS_FLAG_SOM_LEFTMOST is used in streaming mode
db = hyperscan.Database(mode=hyperscan.HS_MODE_STREAM | hyperscan.HS_MODE_SOM_HORIZON_LARGE)
db.compile(
    expressions=[b'foobar', b'hello world'],
    ids=[1, 2],
    flags=hyperscan.HS_FLAG_SOM_LEFTMOST
)
matches = []
def on_match(pattern_id, from_offset, to_offset, flags, context):
    matches.append((pattern_id, from_offset, to_offset, context))
    return None

with db.stream(match_event_handler=on_match, context='default') as stream:
    stream.scan(b'foo')
    stream.scan(b'bar')
    stream.scan(b' hello', context='chunk2')
    stream.scan(b' world', context='chunk3')

print(matches)
```

### Method
`Stream.scan(chunk, context)`

### Description
Scans a chunk of data. The `context` parameter overrides, for this chunk only, the context given to `stream()`.

### Parameters
- **chunk** (bytes) - Required - The data chunk to scan.
- **context** (any) - Optional - A user-provided object passed to `match_event_handler` for matches found within this chunk.

### Method
`Stream.size()`

### Description
Returns the size of the stream state in bytes. The value depends on the compiled database.

### Returns
- **size** (int) - The size of the stream state in bytes.

### Example Output
```
Stream state size: 512 bytes
```
````

--------------------------------

### Scratch Class for Thread-Safe Scanning

Source: https://context7.com/darvid/python-hyperscan/llms.txt

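The `Scratch` class holds the temporary memory Hyperscan needs while scanning. A scratch space can be used by only one scan at a time, so each thread needs its own. `Scratch.clone()` creates a copy of an existing scratch space for another thread.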

```python
import hyperscan
from concurrent.futures import ThreadPoolExecutor
```

--------------------------------

### Download and Extract PCRE Library

Source: https://github.com/darvid/python-hyperscan/blob/main/CMakeLists.txt

This snippet downloads a specific version of the PCRE library, verifies its integrity using a SHA256 hash, and extracts it into the hyperscan vendor directory.

```cmake
message(STATUS "Downloading PCRE ${PCRE_VERSION}")
file(
  DOWNLOAD https://sourceforge.net/projects/pcre/files/pcre/${PCRE_VERSION}/pcre-${PCRE_VERSION}.tar.bz2
  ${hyperscan_VENDOR_DIR}/pcre-${PCRE_VERSION}.tar.bz2
  EXPECTED_HASH SHA256=4dae6fdcd2bb0bb6c37b5f97c33c2be954da743985369cddac3546e3218bffb8
)

if(EXISTS ${hyperscan_VENDOR_DIR}/pcre)
  file(REMOVE_RECURSE ${hyperscan_VENDOR_DIR}/pcre)
endif()

file(
  ARCHIVE_EXTRACT
  INPUT ${hyperscan_VENDOR_DIR}/pcre-${PCRE_VERSION}.tar.bz2
  DESTINATION ${hyperscan_VENDOR_DIR}
  PATTERNS "pcre-${PCRE_VERSION}/*"
)
file(RENAME ${hyperscan_VENDOR_DIR}/pcre-${PCRE_VERSION} ${hyperscan_VENDOR_DIR}/pcre)
message(STATUS "PCRE downloaded to ${hyperscan_VENDOR_DIR}/pcre")
```
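For illustration, the `EXPECTED_HASH` check amounts to hashing the downloaded archive and comparing it with the pinned digest before anything is extracted. A minimal Python sketch; `sha256_of` and `verify_archive` are illustrative names, not part of the build.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in fixed-size chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, 'rb') as fh:
        for block in iter(lambda: fh.read(chunk_size), b''):
            digest.update(block)
    return digest.hexdigest()


def verify_archive(path, expected_sha256):
    """Return True only when the file's SHA-256 equals the pinned value."""
    return sha256_of(path) == expected_sha256.strip().lower()
```

CMake aborts the download step on a mismatch; the sketch returns `False` instead and leaves the decision to the caller.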

--------------------------------

### Write Patched PCRE CMakeLists.txt

Source: https://github.com/darvid/python-hyperscan/blob/main/CMakeLists.txt

Writes the patched content back to the PCRE `CMakeLists.txt` (`PCRE_CMAKE_FILE`) only if at least one patch was applied (`_any_patch_applied`); otherwise the file is left untouched and a status message says so.

```cmake
if(_any_patch_applied)
  file(WRITE ${PCRE_CMAKE_FILE} "${_pcre_cmake_content_current}")
  message(STATUS "Final patched PCRE CMakeLists.txt written to ${PCRE_CMAKE_FILE} as changes were applied.")
else()
  message(STATUS "No changes made to PCRE CMakeLists.txt after attempting all patches, file not rewritten.")
endif()
```
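The same write-only-if-changed idea as a Python sketch. `patch_file` and its `(old, new)` patch format are hypothetical, not the project's actual patch list, and the sketch detects changes by comparing content rather than tracking a flag. Skipping the write also leaves the file's modification time alone, which can spare timestamp-driven tools a needless rerun.

```python
from pathlib import Path


def patch_file(path, patches):
    """Apply (old, new) replacements and rewrite the file only if it changed.

    Returns True when the file was rewritten (the analogue of
    `_any_patch_applied` in the CMake snippet), False otherwise.
    """
    target = Path(path)
    original = target.read_text()
    current = original
    for old, new in patches:
        if old in current:
            current = current.replace(old, new)
    if current == original:
        return False  # nothing matched: leave the file untouched
    target.write_text(current)
    return True
```

Running it a second time with the same patches returns `False`, because the old strings are no longer present.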

--------------------------------

### Database.scan() - Block Mode Scanning

Source: https://context7.com/darvid/python-hyperscan/llms.txt

Performs pattern matching against a complete block of text using the compiled database.

```APIDOC
## Database.scan() - Block Mode Scanning

### Description
The `scan()` method matches the compiled patterns against a complete block of text. For each match it calls the handler with the pattern ID, the start offset (meaningful only for patterns compiled with `HS_FLAG_SOM_LEFTMOST`; otherwise 0), the end offset (exclusive), the flags, and the optional context object. Returning a truthy value from the handler halts the scan.

### Method
`Database.scan(text, match_event_handler, context)`

### Parameters
#### Path Parameters
None

#### Query Parameters
None

#### Request Body
- **text** (bytes) - Required - The block of text to scan.
- **match_event_handler** (callable) - Required - A function to be called for each match found. It receives `pattern_id`, `from_offset`, `to_offset`, `flags`, and `context`. Returning a truthy value from this handler will stop the scan.
- **context** (any) - Optional - A user-provided object that will be passed to the `match_event_handler`.

### Request Example
```python
import hyperscan
db = hyperscan.Database()
db.compile(
    expressions=[b'foo', b'bar', b'baz'],
    ids=[1, 2, 3],
    flags=hyperscan.HS_FLAG_SOM_LEFTMOST
)
matches = []
def on_match(pattern_id, from_offset, to_offset, flags, context):
    matches.append({
        'id': pattern_id,
        'start': from_offset,
        'end': to_offset,
        'text': context[from_offset:to_offset] if context else None
    })
    return None
text = b'hello foo and bar and baz world'
db.scan(text, match_event_handler=on_match, context=text)
print(matches)
```

### Response
#### Success Response (200)
- **matches** (list) - A list containing details of each match found, as populated by the `match_event_handler`.

#### Response Example
```python
[
    {'id': 1, 'start': 6, 'end': 9, 'text': b'foo'},
    {'id': 2, 'start': 14, 'end': 17, 'text': b'bar'},
    {'id': 3, 'start': 22, 'end': 25, 'text': b'baz'}
]
```
```

--------------------------------

### Linux Specific Linking Options for manylinux2014

Source: https://github.com/darvid/python-hyperscan/blob/main/CMakeLists.txt

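Configures linker options for the extension when building in manylinux2014 Docker images. `--exclude-libs,ALL` keeps symbols from the linked static archives out of the extension's exported symbol table, `--no-allow-shlib-undefined` turns undefined symbols in linked shared libraries into link errors, and the `--push-state -Bstatic -lstdc++ --pop-state` sequence links libstdc++ statically without changing how the other libraries are linked.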

```cmake
target_link_options(${HS_EXT_NAME} PRIVATE
  -Wl,--no-as-needed
  -Wl,--copy-dt-needed-entries
  -Wl,--no-allow-shlib-undefined
  -Wl,--exclude-libs,ALL
)
target_link_libraries(${HS_EXT_NAME} PRIVATE ${HS_LIBS})
target_link_libraries(${HS_EXT_NAME} PRIVATE -Wl,--push-state -Wl,-Bstatic -lstdc++ -Wl,--pop-state)
```

--------------------------------

### Compile Hyperscan Database

Source: https://github.com/darvid/python-hyperscan/blob/main/docs/usage.md

Compiles a database of regular expressions with optional IDs and flags. Ensure the Hyperscan engine version is 5.4 or newer.

```python
import hyperscan

db = hyperscan.Database()
patterns = (
    # expression,  id, flags
    (br'fo+',      0,  0),
    (br'^foobar$', 1,  hyperscan.HS_FLAG_CASELESS),
    (br'BAR',      2,  hyperscan.HS_FLAG_CASELESS
                       | hyperscan.HS_FLAG_SOM_LEFTMOST),
)
expressions, ids, flags = zip(*patterns)
db.compile(
    expressions=expressions, ids=ids, elements=len(patterns), flags=flags
)
print(db.info().decode())
# e.g. Version: 5.4.12 Features: AVX2 Mode: BLOCK (depends on the bundled engine and CPU)
```

--------------------------------

### Format Python Code

Source: https://github.com/darvid/python-hyperscan/blob/main/CLAUDE.md

Formats Python code in the 'src/' directory using black. This command modifies files in-place.

```bash
black src/
```

--------------------------------

### Locate Ragel Executable

Source: https://github.com/darvid/python-hyperscan/blob/main/CMakeLists.txt

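A CMake function that looks for a `ragel` executable and parses its version from `ragel --version`, falling back to `0.0` if no version can be parsed. On success it sets `RAGEL_FOUND`, `RAGEL_EXECUTABLE`, and `RAGEL_VERSION` in the caller's scope. If the executable is missing, or a minimum `VERSION` is passed and the detected version is lower, it sets only `RAGEL_FOUND` to `FALSE`.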

```cmake
function(hs_locate_ragel)
  set(options)
  set(oneValueArgs VERSION)
  set(multiValueArgs)
  cmake_parse_arguments(HS_R "${options}" "${oneValueArgs}" "${multiValueArgs}" ${ARGN})

  find_program(_ragel_executable NAMES ragel)

  if(NOT _ragel_executable)
    set(RAGEL_FOUND FALSE PARENT_SCOPE)
    return()
  endif()

  set(_ragel_version "")
  execute_process(
    COMMAND ${_ragel_executable} --version
    OUTPUT_VARIABLE _ragel_stdout
    ERROR_VARIABLE _ragel_stderr
    OUTPUT_STRIP_TRAILING_WHITESPACE
    RESULT_VARIABLE _ragel_result)

  if(_ragel_result EQUAL 0)
    string(REGEX MATCH "([0-9]+\\.[0-9]+(\\.[0-9]+)?)" _ragel_version "${_ragel_stdout}")
  endif()

  if(NOT _ragel_version)
    set(_ragel_version "0.0")
  endif()

  if(HS_R_VERSION AND _ragel_version VERSION_LESS HS_R_VERSION)
    message(STATUS "Found ragel ${_ragel_version} at ${_ragel_executable} but ${HS_R_VERSION}+ is required")
    set(RAGEL_FOUND FALSE PARENT_SCOPE)
    return()
  endif()

  set(RAGEL_EXECUTABLE "${_ragel_executable}" PARENT_SCOPE)
  set(RAGEL_VERSION "${_ragel_version}" PARENT_SCOPE)
  set(RAGEL_FOUND TRUE PARENT_SCOPE)
endfunction()
```
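A Python transliteration of the same decision, for illustration only. It takes the first `X.Y` or `X.Y.Z` in the `--version` output, falls back to `0.0`, and compares versions numerically component by component, as CMake's `VERSION_LESS` does, so `6.9` counts as older than `6.10`. It covers only the case where the executable was found; the sample output string is illustrative, not captured from a real run.

```python
import re

# Same pattern as the CMake REGEX MATCH: X.Y with an optional .Z
_VERSION_RE = re.compile(r'([0-9]+\.[0-9]+(\.[0-9]+)?)')


def parse_ragel_version(output):
    """Return the first X.Y or X.Y.Z found in `ragel --version` output, else '0.0'."""
    match = _VERSION_RE.search(output)
    return match.group(1) if match else '0.0'


def version_less(a, b):
    """Numeric, component-wise comparison; missing components count as 0."""
    pa = [int(part) for part in a.split('.')]
    pb = [int(part) for part in b.split('.')]
    width = max(len(pa), len(pb))
    pa += [0] * (width - len(pa))
    pb += [0] * (width - len(pb))
    return pa < pb


def check_ragel(output, required=None):
    """Return (found, version) for an executable that was already located."""
    version = parse_ragel_version(output)
    if required and version_less(version, required):
        return False, version
    return True, version


sample = 'Ragel State Machine Compiler version 6.10 March 2017'  # illustrative text
print(check_ragel(sample, required='6.9'))
```

A plain string comparison would wrongly rank `6.9` above `6.10`, which is why the sketch converts each component to an integer.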

--------------------------------

### Scan Text in Streaming Mode

Source: https://github.com/darvid/python-hyperscan/blob/main/docs/usage.md

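Uses the `Stream` context manager to scan data in chunks. The snippet assumes `db` was compiled with `HS_MODE_STREAM` and that `on_match` and `on_qux_match` are match handlers with the usual `(pattern_id, from_offset, to_offset, flags, context)` signature. The handler and context passed to `stream()` apply to every chunk unless an individual `scan()` call overrides them.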

```python
with db.stream(match_event_handler=on_match, context=2345) as stream:
    stream.scan(b'foobar')
    # Override context only for one chunk
    stream.scan(b'barfoofoobarbarfoobar', context=1234)
    # Override match handler only for one chunk
    stream.scan(b'qux', match_event_handler=on_qux_match)
```

--------------------------------

### Linux Specific Linking Options (Non-manylinux2014)

Source: https://github.com/darvid/python-hyperscan/blob/main/CMakeLists.txt

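Configures linker options for the extension on other Linux builds, such as local development or CI environments that do not use auditwheel. As in the manylinux2014 variant, `--exclude-libs,ALL` keeps symbols from the linked static archives out of the extension's exported symbol table, but here libstdc++ is linked dynamically (`stdc++`) rather than statically.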

```cmake
target_link_options(${HS_EXT_NAME} PRIVATE
  -Wl,--no-as-needed
  -Wl,--exclude-libs,ALL
)
target_link_libraries(${HS_EXT_NAME} PRIVATE ${HS_LIBS})
target_link_libraries(${HS_EXT_NAME} PRIVATE stdc++)
```

--------------------------------

### Add Custom Target for Ragel

Source: https://github.com/darvid/python-hyperscan/blob/main/CMakeLists.txt

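Defines a custom `ragel` target that runs the located executable (`RAGEL_EXECUTABLE`) with `-V` and logs its path. The two variants below differ only in statement order and an explicit `set(RAGEL_EXECUTABLE ...)`. The version check itself happens earlier, in `hs_locate_ragel`; because `add_custom_target` is called without `ALL`, this target runs only when built explicitly.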

```cmake
message(STATUS "Ragel executable: ${RAGEL_EXECUTABLE}")
add_custom_target(ragel COMMAND ${RAGEL_EXECUTABLE} -V)
```

```cmake
set(RAGEL_EXECUTABLE ${RAGEL_EXECUTABLE})
add_custom_target(ragel COMMAND ${RAGEL_EXECUTABLE} -V)
message(STATUS "Ragel executable: ${RAGEL_EXECUTABLE}")
```

--------------------------------

### Link Libraries for Python Hyperscan Extension

Source: https://github.com/darvid/python-hyperscan/blob/main/CMakeLists.txt

Links the extension against the Hyperscan libraries (`HS_LIBS`) and the `c++` runtime (LLVM's libc++) instead of GNU `stdc++`, which the Linux variants use.

```cmake
target_link_libraries(${HS_EXT_NAME} PRIVATE ${HS_LIBS})
target_link_libraries(${HS_EXT_NAME} PRIVATE c++)
```

--------------------------------

### Set CMake Policy Version

Source: https://github.com/darvid/python-hyperscan/blob/main/CMakeLists.txt

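Builds `HS_CMAKE_CACHE_ARGS`, cache arguments (presumably passed to the vendored Hyperscan build) that set `CMAKE_POLICY_VERSION` to the running CMake's major.minor version and `CMAKE_POLICY_VERSION_MINIMUM` to 3.5. The minimum lets a project whose `cmake_minimum_required()` predates 3.5 still configure under newer CMake releases that have dropped compatibility with those versions.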

```cmake
set(_cmake_policy_version "${CMAKE_MAJOR_VERSION}.${CMAKE_MINOR_VERSION}")
set(HS_CMAKE_CACHE_ARGS
  -DCMAKE_POLICY_VERSION:STRING=${_cmake_policy_version}
  -DCMAKE_POLICY_VERSION_MINIMUM:STRING=3.5
)
```

--------------------------------

### Windows Specific Linking for Python Hyperscan Extension

Source: https://github.com/darvid/python-hyperscan/blob/main/CMakeLists.txt

Specifies the libraries to link against for the Python Hyperscan extension on Windows. Only the Hyperscan libraries (`HS_LIBS`) are linked; unlike the Linux variants, no C++ runtime library or extra linker options are specified.

```cmake
target_link_libraries(${HS_EXT_NAME} PRIVATE ${HS_LIBS})
```

--------------------------------

### Database Class - Core Pattern Database

Source: https://context7.com/darvid/python-hyperscan/llms.txt

The Database class is used to compile and store regular expression patterns. It supports block, stream, and vectored scanning modes.

```APIDOC
## Database Class - Core Pattern Database

### Description
The `Database` class is the primary interface for compiling and storing regular expression patterns. It supports three scanning modes: block mode for complete text blocks, stream mode for processing data in chunks, and vectored mode for scanning multiple non-contiguous buffers. The database automatically manages scratch space for efficient scanning.

### Method
`hyperscan.Database()`

### Parameters
#### Initialization Parameters
- **mode** (int) - Optional - The scanning mode (e.g., `hyperscan.HS_MODE_BLOCK`, `hyperscan.HS_MODE_STREAM`). Defaults to `hyperscan.HS_MODE_BLOCK`.

### Method
`Database.compile()`

### Description
Compiles a list of regular expressions into the database.

### Parameters
#### Path Parameters
None

#### Query Parameters
None

#### Request Body
- **expressions** (sequence of bytes) - Required - The regular expression patterns to compile.
- **ids** (sequence of int) - Required - A unique identifier for each pattern, reported to the match handler as `pattern_id`.
- **elements** (int) - Required - The number of patterns, i.e. `len(expressions)`.
- **flags** (int or sequence of int) - Optional - Either one flag value applied to every pattern (e.g. `hyperscan.HS_FLAG_CASELESS`) or one value per pattern, as in the example below.

### Request Example
```python
import hyperscan
db = hyperscan.Database()
patterns = [
    (b'fo+',      0, 0),
    (b'^foobar$', 1, hyperscan.HS_FLAG_CASELESS),
]
expressions, ids, flags = zip(*patterns)
db.compile(
    expressions=expressions,
    ids=ids,
    elements=len(patterns),
    flags=flags
)
```

### Method
`Database.info()`

### Description
Retrieves information about the compiled database.

### Response
#### Success Response (200)
- **info** (bytes) - A byte string containing database information (e.g., version, features, mode).

### Response Example
```
Version: 5.4.12 Features: AVX2 Mode: BLOCK
```

### Method
`Database.size()`

### Description
Returns the size of the compiled database in bytes.

### Response
#### Success Response (200)
- **size** (int) - The size of the database in bytes.

### Response Example
```
Database size: 1234 bytes
```
```

--------------------------------

### Run Specific Test with Pytest

Source: https://github.com/darvid/python-hyperscan/blob/main/CLAUDE.md

Runs a single test with pytest. Replace `test_name` with the name of the test function in `tests/test_hyperscan.py`.

```bash
pytest tests/test_hyperscan.py::test_name -vvv
```

--------------------------------

### Chimera Mode Scanning with Capture Groups

Source: https://context7.com/darvid/python-hyperscan/llms.txt

Use Chimera mode for PCRE compatibility, including capture groups. The match handler takes an extra `captured` argument holding the capture-group offsets.

```python
import hyperscan

# Create database with Chimera support and capture groups
db = hyperscan.Database(chimera=True, mode=hyperscan.CH_MODE_GROUPS)

# Compile patterns using Chimera flags
db.compile(
    expressions=[b'(foo)+', b'b(ar|az)'],
    ids=[1, 2],
    flags=[
        hyperscan.CH_FLAG_CASELESS,
        hyperscan.CH_FLAG_CASELESS
    ]
)

matches = []

def on_chimera_match(pattern_id, from_offset, to_offset, flags, captured, context):
    """
    Chimera callback includes captured groups.

    Args:
        captured: List of (group_id, start, end) tuples for capture groups
    """
    matches.append({
        'id': pattern_id,
        'start': from_offset,
        'end': to_offset,
        'groups': captured
    })
    return None

db.scan(b'foofoobar', match_event_handler=on_chimera_match)

print(matches)
# Output: [{'id': 1, 'start': 0, 'end': 6, 'groups': [(1, 3, 6)]},
#          {'id': 2, 'start': 6, 'end': 9, 'groups': [(1, 7, 9)]}]
```