### Install Ratiopath Package

Source: https://github.com/rationai/ratiopath/blob/main/docs/learn/get-started/quick-start/index.md

Installs the Ratiopath Python package using pip. This is the first step to using the library for image analysis.

```bash
pip install "ratiopath"
```

--------------------------------

### Minimal Ratiopath Tiling Pipeline (Python)

Source: https://github.com/rationai/ratiopath/blob/main/docs/learn/get-started/quick-start/index.md

Demonstrates a minimal pipeline that reads whole-slide images and generates tile metadata with Ratiopath. It uses `read_slides` to load slide metadata and `grid_tiles` to compute tile coordinates. The result is a Ray Dataset in which each row is a dictionary describing one tile: its slide ID, coordinates, and pyramid level.

```python
from ratiopath.ray import read_slides
from ratiopath.tiling import grid_tiles

slides = read_slides("data", mpp=0.25, tile_extent=1024, stride=960)

def tiling(row):
    # Assumes each slide row already has an "id" column (for example, added with `row_hash`).
    return [
        {
            "slide_id": row["id"],
            "tile_x": x,
            "tile_y": y,
            "level": row["level"],
        }
        for x, y in grid_tiles(
            (row["extent_x"], row["extent_y"]),
            (row["tile_extent_x"], row["tile_extent_y"]),
            (row["stride_x"], row["stride_y"]),
            last="keep",
        )
    ]


tiles = slides.flat_map(tiling)
tiles.show(5)
```
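To make the grid step concrete, here is a minimal pure-Python sketch of how tile origins could be laid out from an extent, a tile size, and a stride. The helper names `axis_positions` and `grid_positions` are hypothetical, and the reading of `last="keep"` as "keep a final tile that overhangs the slide edge" is an assumption, not a description of how `grid_tiles` is implemented.

```python
from collections.abc import Iterator


def axis_positions(extent: int, tile: int, stride: int, keep_last: bool) -> list[int]:
    """Start offsets of tiles along one axis (hypothetical helper)."""
    if keep_last:
        # Assumption: keep a final tile even if it overhangs the slide edge.
        uncovered = max(0, extent - tile)
        count = -(-uncovered // stride) + 1  # ceiling division
    else:
        # Only tiles that fit entirely inside the extent.
        count = (extent - tile) // stride + 1 if extent >= tile else 0
    return [i * stride for i in range(count)]


def grid_positions(
    slide_extent: tuple[int, int],
    tile_extent: tuple[int, int],
    stride: tuple[int, int],
    keep_last: bool = True,
) -> Iterator[tuple[int, int]]:
    """Yield (x, y) tile origins row by row (iteration order is an assumption)."""
    xs = axis_positions(slide_extent[0], tile_extent[0], stride[0], keep_last)
    ys = axis_positions(slide_extent[1], tile_extent[1], stride[1], keep_last)
    for y in ys:
        for x in xs:
            yield x, y
```

With a stride smaller than the tile size, as in `stride=960` for `tile_extent=1024`, neighbouring tiles overlap by the difference (64 pixels here).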

--------------------------------

### Estimate Stain Vectors with Macenko Method (Python)

Source: https://context7.com/rationai/ratiopath/llms.txt

Provides a Python example for estimating stain vectors from histopathology images using the Macenko method. It demonstrates how to use the `estimate_stain_vectors` function with various parameters and shows how to access pre-defined and custom stain matrices.

```python
from ratiopath.augmentations.estimate_stain_vectors import (
    estimate_stain_vectors, HE, HDAB, make_residual_stain, HEMATOXYLIN, EOSIN
)
import numpy as np

# Placeholder for a histopathology tile: random RGB data, for illustration only
image = np.random.randint(0, 256, (512, 512, 3), dtype=np.uint8)

# Estimate stain vectors using Macenko method
stain_vectors = estimate_stain_vectors(
    image=image,
    default_stain_vectors=HE,  # Use H&E as reference
    i0=256,                     # Normalization intensity
    min_stain=0.05,             # Minimum OD threshold
    max_stain=1.0,              # Maximum OD threshold
    alpha=0.01,                 # Percentile for extreme angles
)

print(f"Estimated stain vectors shape: {stain_vectors.shape}")  # (3, 3)
print(f"Stain 1 (Hematoxylin): {stain_vectors[0]}")
print(f"Stain 2 (Eosin): {stain_vectors[1]}")
print(f"Residual stain: {stain_vectors[2]}")

# Pre-defined stain matrices
print(f"Standard H&E matrix:\n{HE}")
print(f"Standard H-DAB matrix:\n{HDAB}")

# Create custom stain matrix
custom_stain1 = np.array([0.65, 0.70, 0.29])
custom_stain2 = np.array([0.07, 0.99, 0.11])
residual = make_residual_stain(custom_stain1, custom_stain2)
custom_matrix = np.array([custom_stain1, custom_stain2, residual])
```
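One common way to complete a two-stain matrix is to take a third vector orthogonal to both stains, that is, their normalised cross product. The sketch below shows that idea in plain Python; `residual_from_cross` is a hypothetical name, and whether `make_residual_stain` computes exactly this is an assumption.

```python
import math


def unit(vector: list[float]) -> list[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(c * c for c in vector))
    return [c / norm for c in vector]


def residual_from_cross(stain1: list[float], stain2: list[float]) -> list[float]:
    """Unit vector orthogonal to both stains (assumed residual definition)."""
    cross = [
        stain1[1] * stain2[2] - stain1[2] * stain2[1],
        stain1[2] * stain2[0] - stain1[0] * stain2[2],
        stain1[0] * stain2[1] - stain1[1] * stain2[0],
    ]
    return unit(cross)


stain1 = [0.65, 0.70, 0.29]
stain2 = [0.07, 0.99, 0.11]
residual = residual_from_cross(stain1, stain2)
```

Because the result is orthogonal to both inputs, the three rows together form an invertible matrix as long as the two stains are not parallel.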

--------------------------------

### Generate Row Hash for Dataset IDs (Python)

Source: https://context7.com/rationai/ratiopath/llms.txt

Illustrates the use of the `row_hash` utility function from `ratiopath.tiling.utils` to generate unique hash identifiers for dataset rows. Examples include default usage, custom column names, and different hashing algorithms.

```python
from ratiopath.tiling.utils import row_hash
import hashlib
import ray # Assuming 'slides' is a Ray dataset

# Add unique ID to each slide
# slides = slides.map(row_hash, num_cpus=0.1, memory=128 * 1024**2)

# Custom column name and algorithm
# slides_custom = slides.map(
#     lambda row: row_hash(row, column="slide_hash", algorithm=hashlib.md5),
#     num_cpus=0.1,
# )

# The hash is based on all row contents
sample_row = {"path": "/data/slide.svs", "extent_x": 1000, "extent_y": 2000}
hashed_row = row_hash(sample_row)
print(f"Generated ID: {hashed_row['id']}")  # SHA256 hash string
```
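As a rough illustration of content-based row IDs, the following sketch hashes a row's contents with `hashlib`. The function `hash_row` is hypothetical, and the JSON serialisation with sorted keys is an assumption; the library's `row_hash` may serialise rows differently.

```python
import hashlib
import json


def hash_row(row: dict, column: str = "id", algorithm=hashlib.sha256) -> dict:
    """Return a copy of the row with a content hash stored under `column`."""
    # Assumption: serialise with sorted keys so key order does not change the digest.
    payload = json.dumps(row, sort_keys=True, default=str).encode("utf-8")
    return {**row, column: algorithm(payload).hexdigest()}


sample_row = {"path": "/data/slide.svs", "extent_x": 1000, "extent_y": 2000}
hashed = hash_row(sample_row)
```

Hashing the whole row means two rows get the same ID only if every column matches, which is convenient when the same slide is read with different tiling parameters.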

--------------------------------

### Save Images as TIFF using VipsTiffDatasink (Python)

Source: https://context7.com/rationai/ratiopath/llms.txt

Demonstrates how to use the `VipsTiffDatasink` from `ratiopath.ray.datasource` to efficiently save image data as TIFF files with libvips. It covers default options and per-row specific TIFF configurations.

```python
from ratiopath.ray.datasource import VipsTiffDatasink
import ray
import numpy as np

# Prepare dataset with image data
tiles = ray.data.from_items([
    {"id": "tile_001", "image": np.random.randint(0, 255, (512, 512, 3), dtype=np.uint8)},
    {"id": "tile_002", "image": np.random.randint(0, 255, (512, 512, 3), dtype=np.uint8)},
])

# Create datasink with default TIFF options
datasink = VipsTiffDatasink(
    path="output_tiles/",
    data_column="image",              # Column containing numpy arrays
    options_column="tiff_options",    # Optional: per-row TIFF options
    default_options={
        "compression": "lzw",
        "tile": True,
        "tile_width": 256,
        "tile_height": 256,
    },
)

# Write tiles to TIFF files
tiles.write_datasink(datasink)

# With per-row options
tiles_with_options = tiles.map(lambda row: {
    **row,
    "tiff_options": {"compression": "jpeg", "Q": 85}
})
tiles_with_options.write_datasink(datasink)
```
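The interaction between `default_options` and a per-row options column can be pictured as a dictionary merge. This is a sketch under the assumption that per-row values override the defaults key by key; `resolve_options` is a hypothetical helper, not part of `VipsTiffDatasink`.

```python
from typing import Optional


def resolve_options(default_options: dict, row_options: Optional[dict]) -> dict:
    """Merge TIFF options for one row (assumed precedence: row over defaults)."""
    # Rows without an options entry fall back to the defaults unchanged.
    return {**default_options, **(row_options or {})}


defaults = {"compression": "lzw", "tile": True, "tile_width": 256, "tile_height": 256}
merged = resolve_options(defaults, {"compression": "jpeg", "Q": 85})
```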

--------------------------------

### TiffFile Wrapper for OME-TIFF Handling (Python)

Source: https://context7.com/rationai/ratiopath/llms.txt

This snippet illustrates using the TiffFile wrapper from ratiopath for reading OME-TIFF files. It covers accessing multi-resolution levels, finding the closest resolution, and reading tiles efficiently using zarr. Key dependencies are ratiopath, tifffile, and zarr.

```python
from ratiopath.tifffile import TiffFile

with TiffFile("/path/to/slide.ome.tiff") as slide:
    # Get number of resolution levels
    num_levels = slide.levels()
    print(f"Available levels: {num_levels}")

    # Find the level closest to desired resolution
    target_mpp = 0.25
    best_level = slide.closest_level(mpp=target_mpp)

    # Get resolution at specific level
    resolution = slide.slide_resolution(level=best_level)
    print(f"Resolution at level {best_level}: {resolution} µm/px")

    # Access specific page for zarr-based reading
    page = slide.get_main_page(level=best_level)

    # Read tile using zarr for efficient access
    import zarr
    z = zarr.open(page.aszarr(), mode="r")
    tile = z[0:512, 0:512]  # Read 512x512 region
    print(f"Tile shape: {tile.shape}")
```
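The level lookup can be sketched without any slide library: the resolution of a pyramid level is the base resolution multiplied by that level's downsample factor, and the closest level minimises the distance to the target. The function below is a hypothetical stand-in; the exact distance measure used by `closest_level` is an assumption.

```python
def pick_closest_level(base_mpp: float, downsamples: list[float], target_mpp: float) -> int:
    """Index of the level whose microns-per-pixel is nearest to the target."""
    level_mpps = [base_mpp * d for d in downsamples]
    # Assumption: plain absolute difference decides which level is "closest".
    return min(range(len(level_mpps)), key=lambda i: abs(level_mpps[i] - target_mpp))


# Hypothetical pyramid: 0.25 µm/px at level 0, then 4x and 16x downsampled levels.
level = pick_closest_level(0.25, [1.0, 4.0, 16.0], target_mpp=0.9)
```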

--------------------------------

### OpenSlide Wrapper for WSI Handling (Python)

Source: https://context7.com/rationai/ratiopath/llms.txt

This snippet demonstrates using the OpenSlide wrapper from ratiopath for reading Whole Slide Images (WSI). It shows how to find the best resolution level, read image regions with relative coordinates, and convert them to NumPy arrays. Dependencies include ratiopath, numpy, and Pillow.

```python
from ratiopath.openslide import OpenSlide

with OpenSlide("/path/to/slide.svs") as slide:
    # Find the level closest to desired resolution (microns per pixel)
    target_mpp = 0.5
    best_level = slide.closest_level(mpp=target_mpp)
    print(f"Best level for {target_mpp} µm/px: {best_level}")

    # Get actual resolution at that level
    actual_mpp = slide.slide_resolution(level=best_level)
    print(f"Actual resolution: {actual_mpp} µm/px")

    # Read a region with coordinates relative to the specified level
    # (automatically scales coordinates based on level downsample)
    region = slide.read_region_relative(
        location=(1000, 2000),  # Coordinates at the target level
        level=best_level,
        size=(512, 512),
    )

    # Convert to RGB numpy array
    import numpy as np
    from PIL import Image
    rgb = Image.alpha_composite(
        Image.new("RGBA", region.size, (255, 255, 255)), region
    ).convert("RGB")
    tile_array = np.asarray(rgb)
    print(f"Tile shape: {tile_array.shape}")  # (512, 512, 3)
```
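OpenSlide's own `read_region` expects the location in level-0 coordinates, so reading with level-relative coordinates implies a scaling step. A minimal sketch of that conversion follows; `to_level0` is a hypothetical helper, and rounding to the nearest pixel is an assumption about how the wrapper handles fractional results.

```python
def to_level0(location: tuple[int, int], downsample: float) -> tuple[int, int]:
    """Convert level-relative pixel coordinates to level-0 coordinates."""
    # Assumption: round to the nearest level-0 pixel.
    return (round(location[0] * downsample), round(location[1] * downsample))


# A point at (1000, 2000) on a level that is downsampled 4x.
level0_location = to_level0((1000, 2000), downsample=4.0)
```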

--------------------------------

### Create Stain Augmentor with Fixed H&E Matrix (Python)

Source: https://context7.com/rationai/ratiopath/llms.txt

Demonstrates how to create a StainAugmentor instance using a fixed H&E stain matrix for image augmentation. It shows applying the augmentor to a random image and combining it with other Albumentations transforms.

```python
from ratiopath.augmentations.stain_augmentor import StainAugmentor
from ratiopath.augmentations.estimate_stain_vectors import HE, estimate_stain_vectors
import numpy as np
import albumentations as A

# Create stain augmentor with fixed H&E stain matrix
augmentor = StainAugmentor(
    conv_matrix=HE,       # Pre-defined H&E stain vectors
    alpha=0.02,           # Multiplicative augmentation range
    beta=0.02,            # Additive augmentation range
    p=0.5,                # Probability of applying augmentation
)

# Apply to image
image = np.random.randint(0, 255, (512, 512, 3), dtype=np.uint8)
augmented = augmentor(image=image)["image"]

# Use with adaptive stain estimation per image
def adaptive_stain_matrix(image):
    # Estimate the stain matrix for this image, with H&E as the reference
    return estimate_stain_vectors(image, default_stain_vectors=HE)

adaptive_augmentor = StainAugmentor(
    conv_matrix=adaptive_stain_matrix,  # Callable for per-image estimation
    alpha=0.02,
    beta=0.02,
)

# Combine with other albumentations transforms
transform = A.Compose([
    A.RandomRotate90(p=0.5),
    A.HorizontalFlip(p=0.5),
    augmentor,
])

result = transform(image=image)["image"]
```

--------------------------------

### Read Slides into Ray Dataset - Python

Source: https://context7.com/rationai/ratiopath/llms.txt

Creates a Ray Dataset from whole-slide image files, reading metadata and preparing tiling parameters. It automatically selects the best slide level based on specified microns per pixel (mpp) resolution and returns a dataset with slide metadata for tiled processing. Dependencies include the ratiopath.ray library.

```python
from ratiopath.ray import read_slides

# Read slides from a directory with specified resolution and tile parameters
slides = read_slides(
    paths="data/",
    tile_extent=1024,           # Tile size in pixels (can be tuple for different x/y)
    stride=960,                 # Step size between tiles (1024-64 for overlap)
    mpp=0.25,                   # Target resolution in microns per pixel
)

# View the dataset schema
slides.schema()
# Column         Type
# ------         ----
# path           string
# extent_x       int64
# extent_y       int64
# tile_extent_x  int64
# tile_extent_y  int64
# stride_x       int64
# stride_y       int64
# mpp_x          double
# mpp_y          double
# level          int64
# downsample     double

# Preview the data
slides.show(2)
# {'path': '/abs/path/slide1.svs', 'extent_x': 84320, 'extent_y': 61120, ...}
```

--------------------------------

### Repartition Dataset for Parallelism

Source: https://github.com/rationai/ratiopath/blob/main/docs/learn/get-started/quick-start/tiling.md

Shuffles the dataset rows (tiles) to distribute them more evenly across smaller blocks. This improves parallelism for subsequent processing steps by allowing Ray to spread work across more CPU cores.

```python
tiles = tiles.repartition(target_num_rows_per_block=128)
```

--------------------------------

### ASAP XML Annotation Parsing (Python)

Source: https://context7.com/rationai/ratiopath/llms.txt

This Python snippet shows how to use the ASAPParser from ratiopath to parse ASAP XML annotation files. It demonstrates extracting all polygons, filtering them by name and group using regular expressions, and retrieving point annotations. The primary dependency is ratiopath, with shapely used for polygon operations.

```python
from ratiopath.parsers import ASAPParser

# Parse ASAP XML annotation file
parser = ASAPParser("/path/to/annotations.xml")

# Get all polygon annotations
all_polygons = list(parser.get_polygons())
print(f"Total polygons: {len(all_polygons)}")

# Filter polygons by name and group using regex
tumor_polygons = list(parser.get_polygons(
    name="tumor.*",              # Match names starting with "tumor"
    part_of_group="malignant",   # Match group containing "malignant"
))

# Get point annotations
mitosis_points = list(parser.get_points(
    name="mitosis",
    part_of_group=".*",  # Any group
))

# Use polygons for tile annotation coverage
from shapely import Polygon
for polygon in tumor_polygons:
    print(f"Polygon area: {polygon.area}, bounds: {polygon.bounds}")
```

--------------------------------

### read_slides

Source: https://context7.com/rationai/ratiopath/llms.txt

Creates a Ray Dataset from whole-slide image files, reading metadata and preparing tiling parameters. It selects the best slide level based on the specified microns per pixel (mpp) resolution.

#### Description
Creates a Ray Dataset from whole-slide image files, reading metadata and preparing tiling parameters. This function automatically selects the best slide level based on the specified microns per pixel (mpp) resolution and returns a dataset where each row corresponds to a single slide with all metadata needed for subsequent tiled processing.

#### Parameters
- **paths** (string) - Required - Path to the WSI files or directory.
- **tile_extent** (integer or tuple) - Optional - Tile size in pixels (can be tuple for different x/y).
- **stride** (integer or tuple) - Optional - Step size between tiles (e.g., 1024-64 for overlap).
- **mpp** (float) - Optional - Target resolution in microns per pixel.

#### Example
```python
from ratiopath.ray import read_slides

slides = read_slides(
    paths="data/",
    tile_extent=1024,
    stride=960,
    mpp=0.25,
)
```

#### Returns
- **dataset** (Ray Dataset) - A Ray Dataset where each row contains slide metadata including path, dimensions, tile parameters, resolution, level, and downsample factor.

#### Example Schema and Row
The schema and one example row, written as JSON for illustration:

```json
{
  "schema": [
    {"column": "path", "type": "string"},
    {"column": "extent_x", "type": "int64"},
    {"column": "extent_y", "type": "int64"},
    {"column": "tile_extent_x", "type": "int64"},
    {"column": "tile_extent_y", "type": "int64"},
    {"column": "stride_x", "type": "int64"},
    {"column": "stride_y", "type": "int64"},
    {"column": "mpp_x", "type": "double"},
    {"column": "mpp_y", "type": "double"},
    {"column": "level", "type": "int64"},
    {"column": "downsample", "type": "double"}
  ],
  "preview": [
    {"path": "/abs/path/slide1.svs", "extent_x": 84320, "extent_y": 61120, "tile_extent_x": 1024, "tile_extent_y": 1024, "stride_x": 960, "stride_y": 960, "mpp_x": 0.25, "mpp_y": 0.25, "level": 0, "downsample": 1.0}
  ]
}
```

--------------------------------

### Build Histopathology Tiling Pipeline with Python and Ray Data

Source: https://github.com/rationai/ratiopath/blob/main/docs/learn/get-started/quick-start/tiling.md

This Python script defines a complete tiling pipeline using Ray Data and ratiopath. It reads slide metadata, generates grid tiles, filters out background tiles based on standard deviation, and saves the resulting tiles to a Parquet file. The pipeline is designed for scalability and efficiency in processing large whole-slide images.

```python
from typing import Any

from ray.data.expressions import col

from ratiopath.ray import read_slides
from ratiopath.tiling import grid_tiles, read_slide_tiles
from ratiopath.tiling.utils import row_hash


def tiling(row: dict[str, Any]) -> list[dict[str, Any]]:
    return [
        {
            "tile_x": x,
            "tile_y": y,
            "path": row["path"],
            "slide_id": row["id"],
            "level": row["level"],
            "tile_extent_x": row["tile_extent_x"],
            "tile_extent_y": row["tile_extent_y"],
        }
        for x, y in grid_tiles(
            slide_extent=(row["extent_x"], row["extent_y"]),
            tile_extent=(row["tile_extent_x"], row["tile_extent_y"]),
            stride=(row["stride_x"], row["stride_y"]),
            last="keep",
        )
    ]


if __name__ == "__main__":
    slides = read_slides("data", mpp=0.25, tile_extent=1024, stride=1024 - 64)

    slides = slides.map(row_hash, num_cpus=0.1, memory=128 * 1024**2)
    slides.write_parquet("slides")

    tiles = slides.flat_map(tiling, num_cpus=0.2, memory=128 * 1024**2).repartition(
        target_num_rows_per_block=128
    )

    tissue_tiles = tiles.with_column(
        "tile",
        read_slide_tiles(
            col("path"),
            col("tile_x"),
            col("tile_y"),
            col("tile_extent_x"),
            col("tile_extent_y"),
            col("level"),
        ),
        num_cpus=1,
        memory=4 * 1024**3,
    ).filter(lambda row: row["tile"].std() > 8)

    tissue_tiles = tissue_tiles.drop_columns(
        ["tile", "path", "level", "tile_extent_x", "tile_extent_y"]
    )

    tissue_tiles.write_parquet("tiles")

```

--------------------------------

### Parse ASAP Annotation Files with ratiopath

Source: https://github.com/rationai/ratiopath/blob/main/docs/learn/get-started/quick-start/annotations.md

Parses ASAP XML annotation files to extract polygon data. It requires the path to the annotation file and optionally accepts regular expressions to filter annotations by name and group. The output is a list of annotation polygons.

```python
from ratiopath.parsers import ASAPParser

annotation_path = row["path"].replace(".mrxs", ".xml")
parser = ASAPParser(annotation_path)
annotations = list(parser.get_polygons(name="...", part_of_group="..."))
```

--------------------------------

### Compute Tile Annotation Coverage with Python

Source: https://context7.com/rationai/ratiopath/llms.txt

Computes annotation coverage for tiles by intersecting tile regions with annotation polygons. It parses annotations, defines a region of interest (ROI) for each tile, generates tile coordinates, and then calculates the area of intersection between annotations and tiles. The output is a list of dictionaries, each containing tile coordinates, slide information, and the coverage fraction.

```python
from ratiopath.tiling import tile_annotations, grid_tiles
from ratiopath.parsers import ASAPParser
from shapely import Polygon
import numpy as np

def tiling_with_annotations(row):
    # Parse annotations for this slide
    annotation_path = row["path"].replace(".svs", ".xml")
    parser = ASAPParser(annotation_path)
    annotations = list(parser.get_polygons(name="tumor.*"))

    # Define ROI (region of interest) covering full tile
    roi = Polygon([
        (0, 0),
        (row["tile_extent_x"], 0),
        (row["tile_extent_x"], row["tile_extent_y"]),
        (0, row["tile_extent_y"]),
    ])

    # Generate tile coordinates
    coordinates = np.array(list(grid_tiles(
        slide_extent=(row["extent_x"], row["extent_y"]),
        tile_extent=(row["tile_extent_x"], row["tile_extent_y"]),
        stride=(row["stride_x"], row["stride_y"]),
        last="keep",
    )))

    # Compute annotation intersection for each tile
    return [
        {
            "tile_x": int(coordinates[i, 0]),
            "tile_y": int(coordinates[i, 1]),
            "path": row["path"],
            "slide_id": row["id"],
            "coverage": polygon.area / roi.area,  # Fraction covered by annotations
        }
        for i, polygon in enumerate(tile_annotations(
            annotations=annotations,
            roi=roi,
            coordinates=coordinates,
            downsample=row["downsample"],
        ))
    ]

# Apply to dataset
tiles_with_coverage = slides.flat_map(tiling_with_annotations)
# Filter for tiles with significant annotation coverage
annotated_tiles = tiles_with_coverage.filter(lambda t: t["coverage"] > 0.5)
```
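The coverage value is the area of the annotation that falls inside a tile, divided by the tile area. The sketch below reproduces that arithmetic for axis-aligned rectangles only, so it needs no geometry library; `rect_intersection_area` and `coverage` are hypothetical helpers, and real annotations are arbitrary polygons handled by Shapely.

```python
Rect = tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def rect_intersection_area(a: Rect, b: Rect) -> float:
    """Area of overlap between two axis-aligned rectangles (0 if disjoint)."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return max(0.0, width) * max(0.0, height)


def coverage(tile_xy: tuple[float, float], tile_extent: tuple[float, float], annotation: Rect) -> float:
    """Fraction of the tile covered by a rectangular annotation."""
    tile = (tile_xy[0], tile_xy[1], tile_xy[0] + tile_extent[0], tile_xy[1] + tile_extent[1])
    return rect_intersection_area(tile, annotation) / (tile_extent[0] * tile_extent[1])


# An annotation covering the right half of a 100x100 tile at the origin.
half_covered = coverage((0, 0), (100, 100), (50, 0, 150, 100))
```

A threshold such as `coverage > 0.5` then keeps only tiles that are mostly inside the annotated region.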

--------------------------------

### Apply Stain Augmentation with Python

Source: https://context7.com/rationai/ratiopath/llms.txt

Applies stain augmentation to histopathological images using the Tellez et al. method. This function is compatible with the albumentations pipeline for data augmentation. It requires importing `StainAugmentor` and optionally `estimate_stain_vectors` for stain vector estimation.

```python
from ratiopath.augmentations import StainAugmentor
from ratiopath.augmentations.estimate_stain_vectors import HE, estimate_stain_vectors
import albumentations as A
import numpy as np
```
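In the Tellez et al. scheme, each stain concentration channel is scaled by a random factor close to 1 and shifted by a small random offset. The sketch below applies that perturbation to a plain list of concentrations. `perturb_concentrations` is a hypothetical helper, and the mapping of `alpha` and `beta` onto uniform ranges is an assumption based on the parameter comments above, not a statement about `StainAugmentor` internals.

```python
import random
from typing import Optional


def perturb_concentrations(
    concentrations: list[float],
    alpha: float = 0.02,
    beta: float = 0.02,
    rng: Optional[random.Random] = None,
) -> list[float]:
    """Return c' = a * c + b per channel, with a and b drawn independently."""
    rng = rng or random.Random()
    perturbed = []
    for c in concentrations:
        a = rng.uniform(1 - alpha, 1 + alpha)  # multiplicative factor (assumed range)
        b = rng.uniform(-beta, beta)           # additive offset (assumed range)
        perturbed.append(a * c + b)
    return perturbed


# Hypothetical per-pixel concentrations for hematoxylin, eosin, and residual.
original = [0.8, 0.3, 0.0]
augmented = perturb_concentrations(original, rng=random.Random(0))
```

In a full augmentation, the perturbed concentrations would be converted back to RGB through the stain matrix.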

--------------------------------

### Create Unique Slide IDs and Save Metadata

Source: https://github.com/rationai/ratiopath/blob/main/docs/learn/get-started/quick-start/tiling.md

Generates a unique hash ID for each slide using the `row_hash` function from `ratiopath.tiling.utils` and applies it to every row in the Ray Dataset using `.map()`. The resulting slide-level metadata, including the new hash ID, is then saved to a Parquet file named 'slides'. This action triggers the execution of the Ray data processing plan.

```python
from ratiopath.tiling.utils import row_hash

slides = slides.map(row_hash, num_cpus=0.1, memory=128 * 1024**2)
slides.write_parquet("slides")
```

--------------------------------

### Apply Annotation Coverage Function with Ray Data

Source: https://github.com/rationai/ratiopath/blob/main/docs/learn/get-started/quick-start/annotations.md

Applies the custom function `tiling_with_annotations` to the dataset of slides using Ray Data's `flat_map` transformation, producing one row per tile. This integrates the annotation coverage calculation into a distributed data processing pipeline.

```python
tiles = slides.flat_map(tiling_with_annotations)
```

--------------------------------

### Read Slide Tiles with Ray - Python

Source: https://context7.com/rationai/ratiopath/llms.txt

Reads batches of tiles from whole-slide images using OpenSlide or tifffile backends. This Ray UDF expression enables efficient batch processing with automatic file format detection and caching. Dependencies include ratiopath.tiling and ray.data.expressions.

```python
from ratiopath.tiling import read_slide_tiles
from ray.data.expressions import col

# Add tile pixel data to the dataset
tiles_with_pixels = tiles.with_column(
    "tile",
    read_slide_tiles(
        col("path"),           # Path to the WSI file
        col("tile_x"),         # X coordinate of tile
        col("tile_y"),         # Y coordinate of tile
        col("tile_extent_x"),  # Width of tile
        col("tile_extent_y"),  # Height of tile
        col("level"),          # Slide pyramid level
    ),
    num_cpus=1,
    memory=4 * 1024**3,
)

# Filter tiles based on content (e.g., tissue vs background)
tissue_tiles = tiles_with_pixels.filter(
    lambda row: row["tile"].std() > 8  # Keep tiles with high variance (tissue)
)

# Save filtered tile coordinates
tissue_tiles = tissue_tiles.drop_columns(["tile", "path", "level", "tile_extent_x", "tile_extent_y"])
tissue_tiles.write_parquet("tiles")
```
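The standard-deviation filter works because background tiles are nearly uniform while tissue tiles vary in intensity. The following sketch shows the same test on flat lists of pixel values using only the standard library; `looks_like_tissue` is a hypothetical helper. `statistics.pstdev` is the population standard deviation, which matches the default of NumPy's `std()`.

```python
from statistics import pstdev


def looks_like_tissue(pixels: list[int], threshold: float = 8.0) -> bool:
    """Keep a tile when its pixel intensities vary more than the threshold."""
    return pstdev(pixels) > threshold


# Illustrative values only: a uniform bright background and a high-contrast tile.
background = [240] * 100
textured = [120, 200] * 50

keep_background = looks_like_tissue(background)
keep_textured = looks_like_tissue(textured)
```

The threshold of 8 comes from the example above; a suitable value depends on the scanner and staining, so it is worth checking on a few slides.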

--------------------------------

### Save Filtered Tile Data to Parquet

Source: https://github.com/rationai/ratiopath/blob/main/docs/learn/get-started/quick-start/tiling.md

Saves the filtered tile information to disk in Parquet format. It drops unnecessary columns like raw pixel data and redundant metadata to reduce file size, retaining only essential information for the final output.

```python
tissue_tiles = tissue_tiles.drop_columns(
    ["tile", "path", "level", "tile_extent_x", "tile_extent_y"]
)

tissue_tiles.write_parquet("tiles")
```

--------------------------------

### Complete End-to-End Pipeline for Pathology Image Processing

Source: https://context7.com/rationai/ratiopath/llms.txt

This Python script demonstrates a full pipeline for processing whole-slide pathology images. It reads slide metadata, generates tile coordinates, filters for tissue-rich tiles, and saves the results. The pipeline leverages Ray Data for distributed processing and RatioPath for image-specific operations like reading slides and generating tiles. Dependencies include 'typing', 'ray.data.expressions', and 'ratiopath'.

```python
from typing import Any
from ray.data.expressions import col
from ratiopath.ray import read_slides
from ratiopath.tiling import grid_tiles, read_slide_tiles
from ratiopath.tiling.utils import row_hash


def tiling(row: dict[str, Any]) -> list[dict[str, Any]]:
    return [
        {
            "tile_x": x,
            "tile_y": y,
            "path": row["path"],
            "slide_id": row["id"],
            "level": row["level"],
            "tile_extent_x": row["tile_extent_x"],
            "tile_extent_y": row["tile_extent_y"],
        }
        for x, y in grid_tiles(
            slide_extent=(row["extent_x"], row["extent_y"]),
            tile_extent=(row["tile_extent_x"], row["tile_extent_y"]),
            stride=(row["stride_x"], row["stride_y"]),
            last="keep",
        )
    ]


if __name__ == "__main__":
    # Step 1: Read slide metadata
    slides = read_slides("data", mpp=0.25, tile_extent=1024, stride=1024 - 64)

    # Step 2: Generate unique slide IDs and save metadata
    slides = slides.map(row_hash, num_cpus=0.1, memory=128 * 1024**2)
    slides.write_parquet("slides")

    # Step 3: Generate tile coordinates
    tiles = slides.flat_map(tiling, num_cpus=0.2, memory=128 * 1024**2)
    tiles = tiles.repartition(target_num_rows_per_block=128)

    # Step 4: Read tile pixels and filter for tissue
    tissue_tiles = tiles.with_column(
        "tile",
        read_slide_tiles(
            col("path"),
            col("tile_x"),
            col("tile_y"),
            col("tile_extent_x"),
            col("tile_extent_y"),
            col("level"),
        ),
        num_cpus=1,
        memory=4 * 1024**3,
    ).filter(lambda row: row["tile"].std() > 8)

    # Step 5: Save filtered tile coordinates
    tissue_tiles = tissue_tiles.drop_columns(
        ["tile", "path", "level", "tile_extent_x", "tile_extent_y"]
    )
    tissue_tiles.write_parquet("tiles")

```

--------------------------------

### Extract Overlay Patches using tile_overlay (Python)

Source: https://github.com/rationai/ratiopath/blob/main/docs/learn/get-started/quick-start/overlays.md

This code snippet demonstrates how to use the `tile_overlay` function from the `ratiopath.tiling` library to extract overlay image patches. It requires a Ray Dataset with tile metadata, a defined Region of Interest (ROI), and the path to the overlay image. The extracted patches are stored in a new column, and the function handles resolution differences and ROI clipping.

```python
from ratiopath.tiling import tile_overlay
from ray.data.expressions import col
from shapely.geometry import box

# Assuming 'tiles' is a Ray Dataset augmented with 'tissue_mask_path'
tiles = ...

# Define a rectangular ROI (e.g., center 50% of a 512x512 tile)
roi = box(128, 128, 384, 384)

tile_with_overlay = tiles.with_column(
    "tissue_overlay",  # New column name for the overlay patch
    tile_overlay(
        roi=roi,
        overlay_path=col("tissue_mask_path"),
        tile_x=col("tile_x"),
        tile_y=col("tile_y"),
        mpp_x=col("mpp_x"),
        mpp_y=col("mpp_y"),
    ),
    num_cpus=1,
    memory=4 * 1024**3,
)
```
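When the overlay has a different resolution from the slide, the tile's ROI has to be mapped into overlay pixels before a patch can be cut out. The sketch below shows one plausible version of that mapping. `overlay_window` is a hypothetical helper, and both the scale factor (slide mpp divided by overlay mpp) and the rounding are assumptions rather than the documented behaviour of `tile_overlay`.

```python
def overlay_window(
    tile_xy: tuple[int, int],
    roi_bounds: tuple[float, float, float, float],
    slide_mpp: tuple[float, float],
    overlay_mpp: tuple[float, float],
) -> tuple[int, int, int, int]:
    """Map an ROI given relative to a tile into overlay pixel bounds."""
    # One slide pixel spans slide_mpp / overlay_mpp overlay pixels.
    scale_x = slide_mpp[0] / overlay_mpp[0]
    scale_y = slide_mpp[1] / overlay_mpp[1]
    min_x = (tile_xy[0] + roi_bounds[0]) * scale_x
    min_y = (tile_xy[1] + roi_bounds[1]) * scale_y
    max_x = (tile_xy[0] + roi_bounds[2]) * scale_x
    max_y = (tile_xy[1] + roi_bounds[3]) * scale_y
    return (round(min_x), round(min_y), round(max_x), round(max_y))


# Tile at (1024, 2048) on a 0.25 µm/px slide, overlay stored at 2.0 µm/px.
window = overlay_window((1024, 2048), (128, 128, 384, 384), (0.25, 0.25), (2.0, 2.0))
```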

--------------------------------

### Read Slide Metadata with Ray Data

Source: https://github.com/rationai/ratiopath/blob/main/docs/learn/get-started/quick-start/tiling.md

Reads slide metadata using the `read_slides` function from `ratiopath.ray`. It takes a data directory, desired resolution (mpp), tile extent, and stride as input. This function returns a Ray Dataset where each row contains metadata for a single slide, automatically determining the best magnification level.

```python
from ratiopath.ray import read_slides

slides = read_slides("data", mpp=0.25, tile_extent=1024, stride=1024 - 64)
```

--------------------------------

### Generate Tissue Mask with Python and PyVips

Source: https://context7.com/rationai/ratiopath/llms.txt

Generates a tissue mask from a whole-slide image using saturation channel extraction and morphological operations. It loads the slide at a specified resolution, applies a default or custom filter pipeline (including grayscale conversion, Otsu thresholding, opening, and closing), and outputs the mask along with its resolution. The generated mask can be saved as a TIFF file.

```python
from ratiopath.masks import tissue_mask
from ratiopath.masks.vips_filters import (
    VipsCompose, VipsGrayScaleFilter, VipsOtsu, VipsOpening, VipsClosing
)
import pyvips

# Load slide at low resolution for mask generation
slide = pyvips.Image.new_from_file("/path/to/slide.svs", level=4)
mpp = (2.0, 2.0)  # Resolution at level 4

# Generate tissue mask with default filter pipeline
mask, output_mpp = tissue_mask(slide, mpp)

# Save mask as TIFF
mask.write_to_file("tissue_mask.tiff")

# Custom filter pipeline
custom_filter = VipsCompose([
    VipsGrayScaleFilter(),
    VipsOtsu(),
    VipsOpening(),  # Remove small noise
    VipsClosing(),  # Fill small holes
])

mask_custom, _ = tissue_mask(slide, mpp, filter=custom_filter)
```
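For readers unfamiliar with the thresholding step, here is a textbook Otsu threshold in plain Python, applied to a flat list of 8-bit values. It picks the threshold that maximises the between-class variance. This is a generic sketch, not the implementation behind `VipsOtsu`, and the sample values are illustrative.

```python
def otsu_threshold(values: list[int], levels: int = 256) -> int:
    """Return t such that values <= t form the darker class."""
    hist = [0] * levels
    for v in values:
        hist[v] += 1
    total = len(values)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg, sum_bg = 0, 0
    for t in range(levels):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        if w_bg == 0:
            continue  # no pixels in the lower class yet
        w_fg = total - w_bg
        if w_fg == 0:
            break  # every pixel is already in the lower class
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        # Between-class variance, up to a constant factor.
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


# Illustrative saturation values: low for background, high for tissue.
saturation = [10] * 50 + [200] * 50
threshold = otsu_threshold(saturation)
mask = [v > threshold for v in saturation]
```

Morphological opening and closing would then be applied to the binary mask to remove specks and fill small holes.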

--------------------------------

### GeoJSON Annotation Parsing with Nested Properties (Python)

Source: https://context7.com/rationai/ratiopath/llms.txt

This Python snippet details the usage of GeoJSONParser from ratiopath for handling GeoJSON annotation files. It explains how to parse the file, retrieve all polygons, and filter features based on nested properties, which are addressed through flattened key names such as `classification_name`. It also shows how to obtain a filtered GeoDataFrame for further analysis. Dependencies include ratiopath and geopandas.

```python
from ratiopath.parsers import GeoJSONParser

# Parse GeoJSON annotation file
parser = GeoJSONParser("/path/to/annotations.geojson")

# Get all polygons
all_polygons = list(parser.get_polygons())

# Filter by nested properties, addressed by joining the nested keys with a separator
# For GeoJSON with properties like {"classification": {"name": "Tumor"}}
tumor_polygons = list(parser.get_polygons(
    classification_name="Tumor.*"  # Uses underscore as separator (configurable)
))

# Get filtered GeoDataFrame for advanced analysis
gdf = parser.get_filtered_geodataframe(
    separator="_",
    classification_name="Tumor",
)
print(f"Filtered features: {len(gdf)}")

# Get point annotations
points = list(parser.get_points(classification_name="Mitosis"))
for point in points:
    print(f"Point at: ({point.x}, {point.y})")
```
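The nested-property filter can be pictured as flattening each feature's `properties` into single-level keys and matching them against regular expressions. The sketch below shows that idea; `flatten` and `matches` are hypothetical helpers, and the use of `re.match` (anchored at the start only) is an assumption about the parser's matching rule.

```python
import re


def flatten(props: dict, separator: str = "_", prefix: str = "") -> dict:
    """Flatten nested dictionaries, joining keys with the separator."""
    flat = {}
    for key, value in props.items():
        name = f"{prefix}{separator}{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, separator, name))
        else:
            flat[name] = value
    return flat


def matches(props: dict, separator: str = "_", **filters: str) -> bool:
    """True when every filter pattern matches its flattened property."""
    flat = flatten(props, separator)
    return all(
        key in flat and re.match(pattern, str(flat[key])) is not None
        for key, pattern in filters.items()
    )


properties = {"classification": {"name": "Tumor"}}
is_tumor = matches(properties, classification_name="Tumor.*")
```

Keyword arguments cannot contain dots, which is one reason an underscore separator is convenient for this style of filtering.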

--------------------------------

### Add Overlay Path to Tile Dataset (Python)

Source: https://github.com/rationai/ratiopath/blob/main/docs/learn/get-started/quick-start/overlays.md

This function augments a Ray Dataset of tiles by adding a column containing the file path to the corresponding overlay WSI. It assumes the overlay path can be derived from the primary tile path by replacing the extension.

```python
import pandas as pd

# Assuming 'tiles' is a pre-prepared Ray Dataset
tiles = ...

def add_overlay_path(batch: pd.DataFrame) -> pd.DataFrame:
    """Adds the overlay path for each tile in the batch."""
    # Example: Replace the WSI extension with the mask file extension
    batch["tissue_mask_path"] = batch["path"].str.replace(
        ".mrxs", "_tissue_mask.tiff", regex=False
    )
    return batch

# The `.str` accessor needs pandas batches, so request that format explicitly.
tiles = tiles.map_batches(add_overlay_path, batch_format="pandas")
```

--------------------------------

### Read Slide Tile Pixel Data

Source: https://github.com/rationai/ratiopath/blob/main/docs/learn/get-started/quick-start/tiling.md

Reads the actual image data for each tile from the original slide file and adds it as a NumPy array to the dataset. It uses Ray's column expressions to specify input columns and is optimized for batches of tiles from a single slide.

```python
from ray.data.expressions import col

from ratiopath.tiling import read_slide_tiles


tiles_with_pixels = tiles.with_column(
    "tile",  # Name of the new column to add.
    read_slide_tiles(
        col("path"),
        col("tile_x"),
        col("tile_y"),
        col("tile_extent_x"),
        col("tile_extent_y"),
        col("level"),
    ),
    num_cpus=1,  # Reading and decoding images is CPU-heavy.
    memory=4 * 1024**3,  # Give Ray a hint about how much memory this task needs.
)
```

--------------------------------

### Compute and Attach Annotation Coverage to Tile Metadata

Source: https://github.com/rationai/ratiopath/blob/main/docs/learn/get-started/quick-start/annotations.md

A Python function that processes tile data, parses associated annotations, calculates the coverage of annotations within each tile, and returns a list of dictionaries, each containing tile metadata and its annotation coverage. It utilizes ratiopath's `tile_annotations` and Shapely for geometric operations.

```python
from ratiopath.tiling import grid_tiles, tile_annotations
from shapely import Polygon
import numpy as np
from typing import Any
from ratiopath.parsers import ASAPParser

def tiling_with_annotations(row: dict[str, Any]) -> list[dict[str, Any]]:
    annotation_path = row["path"].replace(".mrxs", ".xml")
    parser = ASAPParser(annotation_path)
    annotations = list(parser.get_polygons(name="...", part_of_group="..."))

    roi = Polygon([
        (0, 0),
        (row["tile_extent_x"], 0),
        (row["tile_extent_x"], row["tile_extent_y"]),
        (0, row["tile_extent_y"]),
    ])

    coordinates = np.array(list(
        grid_tiles(
            slide_extent=(row["extent_x"], row["extent_y"]),
            tile_extent=(row["tile_extent_x"], row["tile_extent_y"]),
            stride=(row["stride_x"], row["stride_y"]),
            last="keep",
        )
    ))
    return [
        {
            "tile_x": coordinates[i, 0],
            "tile_y": coordinates[i, 1],
            "path": row["path"],
            "slide_id": row["id"],
            "level": row["level"],
            "tile_extent_x": row["tile_extent_x"],
            "tile_extent_y": row["tile_extent_y"],
            "coverage": polygon.area / roi.area,
        }
        for i, polygon in enumerate(
            tile_annotations(
                annotations,
                roi,
                coordinates,
                row["downsample"],
            )
        )
    ]
```
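
The `coverage` value above is the annotated area inside the tile ROI divided by the ROI area. A simplified sketch of that ratio for axis-aligned rectangles, using only the standard library. `rect_coverage` is a hypothetical helper, and real annotations are arbitrary polygons that need Shapely as in the function above:

```python
Rect = tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def rect_coverage(annotation: Rect, tile: Rect) -> float:
    """Fraction of `tile` covered by `annotation` (both axis-aligned)."""
    # Overlap extent along each axis, clamped at zero when the boxes are disjoint.
    overlap_w = max(0.0, min(annotation[2], tile[2]) - max(annotation[0], tile[0]))
    overlap_h = max(0.0, min(annotation[3], tile[3]) - max(annotation[1], tile[1]))
    tile_area = (tile[2] - tile[0]) * (tile[3] - tile[1])
    return (overlap_w * overlap_h) / tile_area


tile = (0, 0, 1024, 1024)
print(rect_coverage((512, 0, 2048, 1024), tile))      # annotation covers the right half
print(rect_coverage((2000, 2000, 3000, 3000), tile))  # annotation misses the tile
```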

--------------------------------

### Process Tile Overlays and Compute Overlap with Python

Source: https://context7.com/rationai/ratiopath/llms.txt

Reads overlay data (tissue masks, heatmaps) and computes overlap statistics, automatically handling resolution differences. It defines a region of interest within a tile, adds an overlay path column to the dataset, extracts overlay patches using `tile_overlay`, and computes overlap statistics using `tile_overlay_overlap`. The output includes tissue coverage information, which can be used for filtering.

```python
from ratiopath.tiling import tile_overlay, tile_overlay_overlap
from ray.data.expressions import col
from shapely.geometry import box

# Define ROI for overlay extraction (center region of tile)
roi = box(128, 128, 384, 384)  # 256x256 region in center of 512x512 tile

# Add overlay path column
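def add_overlay_path(batch):
    # regex=False treats ".svs" as a literal string rather than a pattern
    batch["mask_path"] = batch["path"].str.replace(".svs", "_mask.tiff", regex=False)
    return batch

# The `.str` accessor needs pandas batches (Ray's default is a dict of NumPy arrays)
tiles = tiles.map_batches(add_overlay_path, batch_format="pandas")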

# Extract overlay patches
tiles_with_overlay = tiles.with_column(
    "tissue_overlay",
    tile_overlay(
        roi=roi,
        overlay_path=col("mask_path"),
        tile_x=col("tile_x"),
        tile_y=col("tile_y"),
        mpp_x=col("mpp_x"),
        mpp_y=col("mpp_y"),
    ),
    num_cpus=1,
    memory=4 * 1024**3,
)

# Compute overlap statistics (fraction of each unique value)
tiles_with_stats = tiles.with_column(
    "tissue_overlap",
    tile_overlay_overlap(
        roi=roi,
        overlay_path=col("mask_path"),
        tile_x=col("tile_x"),
        tile_y=col("tile_y"),
        mpp_x=col("mpp_x"),
        mpp_y=col("mpp_y"),
    ),
    num_cpus=1,
    memory=4 * 1024**3,
)

# Extract foreground coverage and filter
def extract_coverage(tile):
    tile["tissue_coverage"] = tile["tissue_overlap"].get("255", 0.0)
    return tile

tissue_tiles = tiles_with_stats.map(extract_coverage).filter(
    lambda t: t["tissue_coverage"] >= 0.5
)
```
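
How ratiopath resolves the resolution difference internally is not shown in this snippet. One common approach, sketched here purely as an assumption, scales slide-level coordinates by the ratio of the two resolutions in microns per pixel. `to_overlay_box` and the example resolutions are hypothetical:

```python
def to_overlay_box(
    tile_x: int,
    tile_y: int,
    roi: tuple[float, float, float, float],
    slide_mpp: tuple[float, float],
    overlay_mpp: tuple[float, float],
) -> tuple[float, float, float, float]:
    """Map a tile-relative ROI (min_x, min_y, max_x, max_y) into overlay pixels."""
    # A coarser overlay has a larger mpp, so the scale factor is below 1.
    scale_x = slide_mpp[0] / overlay_mpp[0]
    scale_y = slide_mpp[1] / overlay_mpp[1]
    return (
        (tile_x + roi[0]) * scale_x,
        (tile_y + roi[1]) * scale_y,
        (tile_x + roi[2]) * scale_x,
        (tile_y + roi[3]) * scale_y,
    )


# Slide at 0.25 microns per pixel, mask stored at 2.0 microns per pixel (8x coarser).
print(to_overlay_box(1024, 2048, (128, 128, 384, 384), (0.25, 0.25), (2.0, 2.0)))
```

With these numbers the 256x256 ROI shrinks to a 32x32 region of the mask, which is why the overlap is reported as a fraction instead of a pixel count.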

--------------------------------

### Generate Tile Coordinates from Slide Row

Source: https://github.com/rationai/ratiopath/blob/main/docs/learn/get-started/quick-start/tiling.md

Defines a tiling function that takes slide metadata and generates a list of tile coordinates (x, y) for each tile within the slide. It copies relevant slide metadata to each tile coordinate. This function is designed to be used with Ray's flat_map for parallel processing.

```python
from typing import Any
from ratiopath.tiling import grid_tiles

def tiling(row: dict[str, Any]) -> list[dict[str, Any]]:
    return [
        {
            "tile_x": x,
            "tile_y": y,
            "path": row["path"],
            "slide_id": row["id"],
            "level": row["level"],
            "tile_extent_x": row["tile_extent_x"],
            "tile_extent_y": row["tile_extent_y"],
        }
        for x, y in grid_tiles(
            slide_extent=(row["extent_x"], row["extent_y"]),
            tile_extent=(row["tile_extent_x"], row["tile_extent_y"]),
            stride=(row["stride_x"], row["stride_y"]),
            last="keep",
        )
    ]

tiles = slides.flat_map(tiling, num_cpus=0.2, memory=128 * 1024**2)
```

--------------------------------

### read_slide_tiles

Source: https://context7.com/rationai/ratiopath/llms.txt

Reads batches of tiles from whole-slide images using either OpenSlide or tifffile backends. This is a Ray UDF expression designed for efficient batch processing with automatic file format detection and caching.

````APIDOC
## read_slide_tiles

### Parameters
Each argument is a Ray column expression (for example `col("path")`) that refers to the dataset column holding the value described below.
- **path** (string) - Required - Path to the WSI file.
- **tile_x** (integer) - Required - X coordinate of the tile's top-left corner.
- **tile_y** (integer) - Required - Y coordinate of the tile's top-left corner.
- **tile_extent_x** (integer) - Required - Width of the tile in pixels.
- **tile_extent_y** (integer) - Required - Height of the tile in pixels.
- **level** (integer) - Optional - The slide pyramid level to read from. Defaults to the level determined by `read_slides`.

### Example
```python
from ratiopath.tiling import read_slide_tiles
from ray.data.expressions import col

tiles_with_pixels = tiles.with_column(
    "tile",
    read_slide_tiles(
        col("path"),
        col("tile_x"),
        col("tile_y"),
        col("tile_extent_x"),
        col("tile_extent_y"),
        col("level"),
    ),
    num_cpus=1,
    memory=4 * 1024**3,
)
```

### Returns
A column expression. Once attached with `with_column`, the new column contains the pixel data of each tile as a NumPy array.
````

--------------------------------

### Compute Tile Overlay Overlap Ratio (Python)

Source: https://github.com/rationai/ratiopath/blob/main/docs/learn/get-started/quick-start/overlays.md

Computes the pixel ratio for each unique value in an overlay patch, useful for filtering tiles based on content. It takes the region of interest, overlay path, tile coordinates, and resolution as input. The output is a dictionary mapping unique pixel values to their area coverage.

```python
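from ratiopath.tiling import tile_overlay_overlap
from ray.data.expressions import col

# `tiles` and `roi` are assumed to be defined in earlier steps.
tiles_with_overlap = tiles.with_column(
    "tissue_overlap",  # New column holding the per-value overlap ratios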
    tile_overlay_overlap(
        roi=roi,
        overlay_path=col("tissue_mask_path"),
        tile_x=col("tile_x"),
        tile_y=col("tile_y"),
        mpp_x=col("mpp_x"),
        mpp_y=col("mpp_y"),
    ),
    num_cpus=1,
    memory=4 * 1024**3,
)
```
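
To make the shape of that dictionary concrete, here is a minimal pure-Python sketch of the per-value ratio computation on a small patch. `value_ratios` is a hypothetical stand-in, not the library's implementation, and the string keys are an assumption:

```python
from collections import Counter


def value_ratios(patch: list[list[int]]) -> dict[str, float]:
    """Return the fraction of pixels holding each unique value in `patch`."""
    counts = Counter(value for row in patch for value in row)
    total = sum(counts.values())
    # String keys are an assumption, chosen so that a lookup such as
    # ratios.get("255", 0.0) works on the result.
    return {str(value): count / total for value, count in counts.items()}


patch = [
    [0, 0, 255, 255],
    [0, 255, 255, 255],
]
print(value_ratios(patch))  # 3 of 8 pixels are 0, 5 of 8 are 255
```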

--------------------------------

### Filter Tiles Containing Tissue

Source: https://github.com/rationai/ratiopath/blob/main/docs/learn/get-started/quick-start/tiling.md

Filters the dataset to keep only tiles that contain tissue. It uses a heuristic where tiles with a standard deviation of pixel values above a certain threshold are considered to contain tissue, distinguishing them from uniform backgrounds.

```python
tissue_tiles = tiles_with_pixels.filter(lambda row: row["tile"].std() > 8)
```
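
A small sketch of why the heuristic works, using only the standard library and made-up pixel values. `looks_like_tissue` is a hypothetical helper, and the threshold of 8 is taken from the filter above:

```python
from statistics import pstdev


def looks_like_tissue(pixels: list[int], threshold: float = 8.0) -> bool:
    """Apply the standard-deviation heuristic to a flat list of pixel values."""
    # pstdev is the population standard deviation, which is also what
    # NumPy's ndarray.std() computes by default (ddof=0).
    return pstdev(pixels) > threshold


background = [240, 241, 239, 240, 242, 240]  # near-uniform bright glass
tissue = [240, 120, 200, 90, 180, 60]        # strongly varying intensities

print(looks_like_tissue(background))
print(looks_like_tissue(tissue))
```

The threshold depends on staining and scanner settings, so it is worth checking against a few known background and tissue tiles before filtering a whole dataset.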

--------------------------------

### grid_tiles

Source: https://context7.com/rationai/ratiopath/llms.txt

Generates tile coordinates for a given slide based on its size, tile size, and stride. The function yields tile coordinates in row-major order and provides options for handling edge tiles.

````APIDOC
## grid_tiles

### Parameters
- **slide_extent** (tuple of integers) - Required - The width and height of the slide in pixels (e.g., `(84320, 61120)`).
- **tile_extent** (tuple of integers) - Required - The width and height of each tile in pixels (e.g., `(1024, 1024)`).
- **stride** (tuple of integers) - Required - The step size between tiles in pixels (e.g., `(960, 960)`).
- **last** (string) - Optional - Determines how to handle edge tiles that don't fit the stride pattern. Options: `"keep"` (include edge tiles), `"drop"` (exclude), `"shift"` (shift to fit). Defaults to `"keep"`.

### Example
```python
from ratiopath.tiling import grid_tiles

slide_extent = (84320, 61120)
tile_extent = (1024, 1024)
stride = (960, 960)

coordinates = list(grid_tiles(
    slide_extent=slide_extent,
    tile_extent=tile_extent,
    stride=stride,
    last="keep",
))
```

### Returns
Yields the `[x, y]` coordinates of each tile's top-left corner as a NumPy array, in row-major order. For the example above, the sequence begins `[0, 0]`, `[960, 0]`, `[1920, 0]`, ...
````
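
The exact edge semantics are defined by the library. The sketch below is one plausible reading of the three `last` modes in pure Python, written as an assumption and not as ratiopath's implementation. `grid_positions_1d` and `grid_sketch` are hypothetical names:

```python
def grid_positions_1d(extent: int, tile: int, stride: int, last: str = "keep") -> list[int]:
    """Tile origins along one axis; `last` controls the trailing partial tile."""
    # Origins of tiles that fit entirely inside the extent.
    positions = list(range(0, extent - tile + 1, stride))
    covered = positions[-1] + tile if positions else 0
    if covered < extent:  # a strip at the far edge is still uncovered
        if last == "keep":
            # Add one more tile at the next stride step; it overhangs the edge.
            nxt = positions[-1] + stride if positions else 0
            if nxt < extent:
                positions.append(nxt)
        elif last == "shift":
            # Add one more tile pulled back so that it ends at the edge.
            positions.append(max(extent - tile, 0))
        elif last != "drop":
            raise ValueError(f"unknown last mode: {last!r}")
    return positions


def grid_sketch(slide_extent, tile_extent, stride, last="keep"):
    """Yield (x, y) origins in row-major order: x varies fastest."""
    xs = grid_positions_1d(slide_extent[0], tile_extent[0], stride[0], last)
    ys = grid_positions_1d(slide_extent[1], tile_extent[1], stride[1], last)
    for y in ys:
        for x in xs:
            yield (x, y)


for mode in ("keep", "drop", "shift"):
    print(mode, grid_positions_1d(extent=11, tile=4, stride=3, last=mode))
```

Under this reading, an 11-pixel axis with 4-pixel tiles and a stride of 3 leaves one pixel uncovered after the third tile, and the three modes differ only in what they do about it.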

--------------------------------

### Extract Foreground Tissue Coverage (Python)

Source: https://github.com/rationai/ratiopath/blob/main/docs/learn/get-started/quick-start/overlays.md

A Python function to extract the coverage of a specific class (value '255') from the overlap dictionary generated by `tile_overlay_overlap`. It safely retrieves the coverage, defaulting to 0.0 if the class is not present, and adds it as a new key 'tissue_coverage'.

```python
def extract_foreground_coverage(tile: dict) -> dict:
    """Extracts the foreground coverage (value 255) from the overlap dictionary."""
    # Use .get('255', 0.0) to safely retrieve the value, defaulting to 0.0 if not present
    tile["tissue_coverage"] = tile["tissue_overlap"].get('255', 0.0)
    return tile

tiles_with_tissue_coverage = tiles_with_overlap.map(extract_foreground_coverage).filter(
    lambda tile: tile["tissue_coverage"] >= 0.5  # Keep tiles with at least 50% tissue
)
```
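
The same extract-then-filter logic can be checked without Ray by running it over plain dictionaries. In this sketch the rows, the `overlap` key, and the ratios are made up for illustration:

```python
def extract_coverage(tile: dict, value_key: str = "255") -> dict:
    """Copy the ratio stored under `value_key` into 'tissue_coverage'."""
    # A missing key means the value never occurred in the patch: coverage 0.0.
    tile["tissue_coverage"] = tile["overlap"].get(value_key, 0.0)
    return tile


rows = [
    {"tile_x": 0, "overlap": {"0": 0.2, "255": 0.8}},
    {"tile_x": 960, "overlap": {"0": 1.0}},  # no foreground key at all
    {"tile_x": 1920, "overlap": {"0": 0.6, "255": 0.4}},
]

kept = [t for t in map(extract_coverage, rows) if t["tissue_coverage"] >= 0.5]
print([t["tile_x"] for t in kept])
```

Only the first row survives: the second has no foreground entry and falls back to 0.0, and the third is below the 50% threshold.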