### Install Ratiopath Package

Source: https://github.com/rationai/ratiopath/blob/main/docs/learn/get-started/quick-start/index.md

Installs the Ratiopath Python package with pip. This is the first step before using the library for image analysis.

```bash
pip install "ratiopath"
```

--------------------------------

### Minimal Ratiopath Tiling Pipeline (Python)

Source: https://github.com/rationai/ratiopath/blob/main/docs/learn/get-started/quick-start/index.md

Demonstrates a minimal pipeline that reads whole-slide image metadata and generates tile metadata with Ratiopath. It uses `read_slides` for slide loading and `grid_tiles` for defining tile coordinates. The tiling function returns a list of dictionaries, each representing a tile with its slide ID, coordinates, and pyramid level.

```python
from ratiopath.ray import read_slides
from ratiopath.tiling import grid_tiles
from ratiopath.tiling.utils import row_hash

slides = read_slides("data", mpp=0.25, tile_extent=1024, stride=960)
# row_hash adds the "id" column that tiling() reads below.
slides = slides.map(row_hash)


def tiling(row):
    # One dictionary per tile; flat_map flattens the returned list into rows.
    return [
        {
            "slide_id": row["id"],
            "tile_x": x,
            "tile_y": y,
            "level": row["level"],
        }
        for x, y in grid_tiles(
            (row["extent_x"], row["extent_y"]),
            (row["tile_extent_x"], row["tile_extent_y"]),
            (row["stride_x"], row["stride_y"]),
            last="keep",
        )
    ]


tiles = slides.flat_map(tiling)
tiles.show(5)
```

--------------------------------

### Estimate Stain Vectors with Macenko Method (Python)

Source: https://context7.com/rationai/ratiopath/llms.txt

Provides a Python example for estimating stain vectors from histopathology images using the Macenko method. It demonstrates how to call `estimate_stain_vectors` with various parameters and how to access pre-defined and custom stain matrices.
```python
from ratiopath.augmentations.estimate_stain_vectors import (
    estimate_stain_vectors,
    HE,
    HDAB,
    make_residual_stain,
    HEMATOXYLIN,
    EOSIN,
)
import numpy as np

# Load a histopathology tile
image = np.random.randint(0, 255, (512, 512, 3), dtype=np.uint8)

# Estimate stain vectors using Macenko method
stain_vectors = estimate_stain_vectors(
    image=image,
    default_stain_vectors=HE,  # Use H&E as reference
    i0=256,  # Normalization intensity
    min_stain=0.05,  # Minimum OD threshold
    max_stain=1.0,  # Maximum OD threshold
    alpha=0.01,  # Percentile for extreme angles
)

print(f"Estimated stain vectors shape: {stain_vectors.shape}")  # (3, 3)
print(f"Stain 1 (Hematoxylin): {stain_vectors[0]}")
print(f"Stain 2 (Eosin): {stain_vectors[1]}")
print(f"Residual stain: {stain_vectors[2]}")

# Pre-defined stain matrices
print(f"Standard H&E matrix:\n{HE}")
print(f"Standard H-DAB matrix:\n{HDAB}")

# Create custom stain matrix
custom_stain1 = np.array([0.65, 0.70, 0.29])
custom_stain2 = np.array([0.07, 0.99, 0.11])
residual = make_residual_stain(custom_stain1, custom_stain2)
custom_matrix = np.array([custom_stain1, custom_stain2, residual])
```

--------------------------------

### Generate Row Hash for Dataset IDs (Python)

Source: https://context7.com/rationai/ratiopath/llms.txt

Illustrates the use of the `row_hash` utility function from `ratiopath.tiling.utils` to generate unique hash identifiers for dataset rows. Examples include default usage, custom column names, and different hashing algorithms.
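As a rough, library-independent illustration of content-based row IDs, the sketch below hashes a row's JSON serialization with `hashlib`. The helper name `hash_row_sketch` and the sorted-key JSON serialization are assumptions made for illustration; they are not taken from ratiopath, and `row_hash` may serialize rows differently.

```python
import hashlib
import json
from typing import Any, Callable


def hash_row_sketch(
    row: dict[str, Any],
    column: str = "id",
    algorithm: Callable[..., Any] = hashlib.sha256,
) -> dict[str, Any]:
    """Hypothetical helper: return a copy of `row` with a content hash added."""
    # Sorting keys makes the digest independent of dictionary insertion order.
    payload = json.dumps(row, sort_keys=True, default=str).encode("utf-8")
    return {**row, column: algorithm(payload).hexdigest()}


sample_row = {"path": "/data/slide.svs", "extent_x": 1000, "extent_y": 2000}
hashed = hash_row_sketch(sample_row)
reordered = hash_row_sketch(
    {"extent_y": 2000, "extent_x": 1000, "path": "/data/slide.svs"}
)
print(hashed["id"] == reordered["id"])  # same content, same ID
```

Because the ID depends only on the row contents, re-running a pipeline on unchanged metadata should reproduce the same IDs. The ratiopath call itself follows.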
```python
from ratiopath.tiling.utils import row_hash
import hashlib
import ray

# Assuming 'slides' is a Ray dataset
# Add unique ID to each slide
# slides = slides.map(row_hash, num_cpus=0.1, memory=128 * 1024**2)

# Custom column name and algorithm
# slides_custom = slides.map(
#     lambda row: row_hash(row, column="slide_hash", algorithm=hashlib.md5),
#     num_cpus=0.1,
# )

# The hash is based on all row contents
sample_row = {"path": "/data/slide.svs", "extent_x": 1000, "extent_y": 2000}
hashed_row = row_hash(sample_row)
print(f"Generated ID: {hashed_row['id']}")  # SHA256 hash string
```

--------------------------------

### Save Images as TIFF using VipsTiffDatasink (Python)

Source: https://context7.com/rationai/ratiopath/llms.txt

Demonstrates how to use the `VipsTiffDatasink` from `ratiopath.ray.datasource` to efficiently save image data as TIFF files with libvips. It covers default options and per-row TIFF configurations.

```python
from ratiopath.ray.datasource import VipsTiffDatasink
import ray
import numpy as np

# Prepare dataset with image data
tiles = ray.data.from_items([
    {"id": "tile_001", "image": np.random.randint(0, 255, (512, 512, 3), dtype=np.uint8)},
    {"id": "tile_002", "image": np.random.randint(0, 255, (512, 512, 3), dtype=np.uint8)},
])

# Create datasink with default TIFF options
datasink = VipsTiffDatasink(
    path="output_tiles/",
    data_column="image",  # Column containing numpy arrays
    options_column="tiff_options",  # Optional: per-row TIFF options
    default_options={
        "compression": "lzw",
        "tile": True,
        "tile_width": 256,
        "tile_height": 256,
    },
)

# Write tiles to TIFF files
tiles.write_datasink(datasink)

# With per-row options
tiles_with_options = tiles.map(lambda row: {
    **row,
    "tiff_options": {"compression": "jpeg", "Q": 85}
})
tiles_with_options.write_datasink(datasink)
```

--------------------------------

### TiffFile Wrapper for OME-TIFF Handling (Python)

Source: https://context7.com/rationai/ratiopath/llms.txt

This snippet illustrates using the TiffFile wrapper from ratiopath for reading OME-TIFF files. It covers accessing multi-resolution levels, finding the closest resolution, and reading tiles efficiently using zarr. Key dependencies are ratiopath, tifffile, and zarr.

```python
from ratiopath.tifffile import TiffFile

with TiffFile("/path/to/slide.ome.tiff") as slide:
    # Get number of resolution levels
    num_levels = slide.levels()
    print(f"Available levels: {num_levels}")

    # Find the level closest to desired resolution
    target_mpp = 0.25
    best_level = slide.closest_level(mpp=target_mpp)

    # Get resolution at specific level
    resolution = slide.slide_resolution(level=best_level)
    print(f"Resolution at level {best_level}: {resolution} µm/px")

    # Access specific page for zarr-based reading
    page = slide.get_main_page(level=best_level)

    # Read tile using zarr for efficient access
    import zarr

    z = zarr.open(page.aszarr(), mode="r")
    tile = z[0:512, 0:512]  # Read 512x512 region
    print(f"Tile shape: {tile.shape}")
```

--------------------------------

### OpenSlide Wrapper for WSI Handling (Python)

Source: https://context7.com/rationai/ratiopath/llms.txt

This snippet demonstrates using the OpenSlide wrapper from ratiopath for reading Whole Slide Images (WSI). It shows how to find the best resolution level, read image regions with relative coordinates, and convert them to NumPy arrays. Dependencies include ratiopath, numpy, and Pillow.
```python
from ratiopath.openslide import OpenSlide

with OpenSlide("/path/to/slide.svs") as slide:
    # Find the level closest to desired resolution (microns per pixel)
    target_mpp = 0.5
    best_level = slide.closest_level(mpp=target_mpp)
    print(f"Best level for {target_mpp} µm/px: {best_level}")

    # Get actual resolution at that level
    actual_mpp = slide.slide_resolution(level=best_level)
    print(f"Actual resolution: {actual_mpp} µm/px")

    # Read a region with coordinates relative to the specified level
    # (automatically scales coordinates based on level downsample)
    region = slide.read_region_relative(
        location=(1000, 2000),  # Coordinates at the target level
        level=best_level,
        size=(512, 512),
    )

    # Convert to RGB numpy array
    import numpy as np
    from PIL import Image

    rgb = Image.alpha_composite(
        Image.new("RGBA", region.size, (255, 255, 255)), region
    ).convert("RGB")
    tile_array = np.asarray(rgb)
    print(f"Tile shape: {tile_array.shape}")  # (512, 512, 3)
```

--------------------------------

### Create Stain Augmentor with Fixed H&E Matrix (Python)

Source: https://context7.com/rationai/ratiopath/llms.txt

Demonstrates how to create a StainAugmentor instance using a fixed H&E stain matrix for image augmentation. It shows applying the augmentor to a random image and combining it with other Albumentations transforms.
```python
from ratiopath.augmentations.stain_augmentor import StainAugmentor
from ratiopath.augmentations.estimate_stain_vectors import HE, estimate_stain_vectors
import numpy as np
import albumentations as A

# Create stain augmentor with fixed H&E stain matrix
augmentor = StainAugmentor(
    conv_matrix=HE,  # Pre-defined H&E stain vectors
    alpha=0.02,  # Multiplicative augmentation range
    beta=0.02,  # Additive augmentation range
    p=0.5,  # Probability of applying augmentation
)

# Apply to image
image = np.random.randint(0, 255, (512, 512, 3), dtype=np.uint8)
augmented = augmentor(image=image)["image"]


# Use with adaptive stain estimation per image
def adaptive_stain_matrix(image):
    return estimate_stain_vectors(image, default_stain_vectors=HE)


adaptive_augmentor = StainAugmentor(
    conv_matrix=adaptive_stain_matrix,  # Callable for per-image estimation
    alpha=0.02,
    beta=0.02,
)

# Combine with other albumentations transforms
transform = A.Compose([
    A.RandomRotate90(p=0.5),
    A.HorizontalFlip(p=0.5),
    augmentor,
])
result = transform(image=image)["image"]
```

--------------------------------

### Read Slides into Ray Dataset - Python

Source: https://context7.com/rationai/ratiopath/llms.txt

Creates a Ray Dataset from whole-slide image files, reading metadata and preparing tiling parameters. It automatically selects the best slide level for the specified microns-per-pixel (mpp) resolution and returns a dataset with slide metadata for tiled processing. Dependencies include the ratiopath.ray library.
```python
from ratiopath.ray import read_slides

# Read slides from a directory with specified resolution and tile parameters
slides = read_slides(
    paths="data/",
    tile_extent=1024,  # Tile size in pixels (can be tuple for different x/y)
    stride=960,  # Step size between tiles (1024-64 for overlap)
    mpp=0.25,  # Target resolution in microns per pixel
)

# View the dataset schema
slides.schema()
# Column         Type
# ------         ----
# path           string
# extent_x       int64
# extent_y       int64
# tile_extent_x  int64
# tile_extent_y  int64
# stride_x       int64
# stride_y       int64
# mpp_x          double
# mpp_y          double
# level          int64
# downsample     double

# Preview the data
slides.show(2)
# {'path': '/abs/path/slide1.svs', 'extent_x': 84320, 'extent_y': 61120, ...}
```

--------------------------------

### Repartition Dataset for Parallelism

Source: https://github.com/rationai/ratiopath/blob/main/docs/learn/get-started/quick-start/tiling.md

Shuffles the dataset rows (tiles) to distribute them more evenly across smaller blocks. This improves parallelism for subsequent processing steps by allowing Ray to spread work across more CPU cores.

```python
tiles = tiles.repartition(target_num_rows_per_block=128)
```

--------------------------------

### ASAP XML Annotation Parsing (Python)

Source: https://context7.com/rationai/ratiopath/llms.txt

This Python snippet shows how to use the ASAPParser from ratiopath to parse ASAP XML annotation files. It demonstrates extracting all polygons, filtering them by name and group using regular expressions, and retrieving point annotations. The primary dependency is ratiopath, with shapely used for polygon operations.
```python
from ratiopath.parsers import ASAPParser

# Parse ASAP XML annotation file
parser = ASAPParser("/path/to/annotations.xml")

# Get all polygon annotations
all_polygons = list(parser.get_polygons())
print(f"Total polygons: {len(all_polygons)}")

# Filter polygons by name and group using regex
tumor_polygons = list(parser.get_polygons(
    name="tumor.*",  # Match names starting with "tumor"
    part_of_group="malignant",  # Match group containing "malignant"
))

# Get point annotations
mitosis_points = list(parser.get_points(
    name="mitosis",
    part_of_group=".*",  # Any group
))

# Use polygons for tile annotation coverage
from shapely import Polygon

for polygon in tumor_polygons:
    print(f"Polygon area: {polygon.area}, bounds: {polygon.bounds}")
```

--------------------------------

### read_slides

Source: https://context7.com/rationai/ratiopath/llms.txt

Creates a Ray Dataset from whole-slide image files, reading metadata and preparing tiling parameters. It selects the best slide level based on the specified microns per pixel (mpp) resolution.

````APIDOC
## read_slides (Python function)

### Description
Creates a Ray Dataset from whole-slide image files, reading metadata and preparing tiling parameters. This function automatically selects the best slide level based on the specified microns per pixel (mpp) resolution and returns a dataset where each row corresponds to a single slide with all metadata needed for subsequent tiled processing.

### Parameters
- **paths** (string) - Required - Path to the WSI files or directory.
- **tile_extent** (integer or tuple) - Optional - Tile size in pixels (can be tuple for different x/y).
- **stride** (integer or tuple) - Optional - Step size between tiles (e.g., 1024-64 for overlap).
- **mpp** (float) - Optional - Target resolution in microns per pixel.

### Usage Example
```python
from ratiopath.ray import read_slides

slides = read_slides(
    paths="data/",
    tile_extent=1024,
    stride=960,
    mpp=0.25,
)
```

### Returns
- **dataset** (Ray Dataset) - A Ray Dataset where each row contains slide metadata including path, dimensions, tile parameters, resolution, level, and downsample factor.

#### Schema and Preview Example
```json
{
  "schema": [
    {"column": "path", "type": "string"},
    {"column": "extent_x", "type": "int64"},
    {"column": "extent_y", "type": "int64"},
    {"column": "tile_extent_x", "type": "int64"},
    {"column": "tile_extent_y", "type": "int64"},
    {"column": "stride_x", "type": "int64"},
    {"column": "stride_y", "type": "int64"},
    {"column": "mpp_x", "type": "double"},
    {"column": "mpp_y", "type": "double"},
    {"column": "level", "type": "int64"},
    {"column": "downsample", "type": "double"}
  ],
  "preview": [
    {"path": "/abs/path/slide1.svs", "extent_x": 84320, "extent_y": 61120, "tile_extent_x": 1024, "tile_extent_y": 1024, "stride_x": 960, "stride_y": 960, "mpp_x": 0.25, "mpp_y": 0.25, "level": 0, "downsample": 1.0}
  ]
}
```
````

--------------------------------

### Build Histopathology Tiling Pipeline with Python and Ray Data

Source: https://github.com/rationai/ratiopath/blob/main/docs/learn/get-started/quick-start/tiling.md

This Python script defines a complete tiling pipeline using Ray Data and ratiopath. It reads slide metadata, generates grid tiles, filters out background tiles based on the standard deviation of their pixel values, and saves the resulting tile metadata to a Parquet file. The pipeline is designed for scalability and efficiency in processing large whole-slide images.
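The background filter in this pipeline keeps a tile only when the standard deviation of its pixel values exceeds a threshold (8 in the script). The sketch below shows that idea on flat lists of grayscale values using only the standard library. The helper `is_tissue_sketch` and the toy pixel values are hypothetical; `statistics.pstdev` is used because it is the population standard deviation, which is also what NumPy's `.std()` computes by default.

```python
from statistics import pstdev


def is_tissue_sketch(pixels: list[int], threshold: float = 8.0) -> bool:
    """Hypothetical helper: True when pixel intensities vary more than `threshold`."""
    return pstdev(pixels) > threshold


background = [250, 251, 249, 250] * 4  # nearly uniform, bright glass
tissue = [90, 200, 140, 230] * 4  # strongly varying intensities

print(is_tissue_sketch(background))  # False
print(is_tissue_sketch(tissue))  # True
```

A fixed threshold is a simple heuristic; it may need tuning per scanner or stain. The full ratiopath script follows.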
```python
from typing import Any

from ray.data.expressions import col

from ratiopath.ray import read_slides
from ratiopath.tiling import grid_tiles, read_slide_tiles
from ratiopath.tiling.utils import row_hash


def tiling(row: dict[str, Any]) -> list[dict[str, Any]]:
    return [
        {
            "tile_x": x,
            "tile_y": y,
            "path": row["path"],
            "slide_id": row["id"],
            "level": row["level"],
            "tile_extent_x": row["tile_extent_x"],
            "tile_extent_y": row["tile_extent_y"],
        }
        for x, y in grid_tiles(
            slide_extent=(row["extent_x"], row["extent_y"]),
            tile_extent=(row["tile_extent_x"], row["tile_extent_y"]),
            stride=(row["stride_x"], row["stride_y"]),
            last="keep",
        )
    ]


if __name__ == "__main__":
    slides = read_slides("data", mpp=0.25, tile_extent=1024, stride=1024 - 64)
    slides = slides.map(row_hash, num_cpus=0.1, memory=128 * 1024**2)
    slides.write_parquet("slides")

    tiles = slides.flat_map(tiling, num_cpus=0.2, memory=128 * 1024**2).repartition(
        target_num_rows_per_block=128
    )

    tissue_tiles = tiles.with_column(
        "tile",
        read_slide_tiles(
            col("path"),
            col("tile_x"),
            col("tile_y"),
            col("tile_extent_x"),
            col("tile_extent_y"),
            col("level"),
        ),
        num_cpus=1,
        memory=4 * 1024**3,
    ).filter(lambda row: row["tile"].std() > 8)

    tissue_tiles = tissue_tiles.drop_columns(
        ["tile", "path", "level", "tile_extent_x", "tile_extent_y"]
    )
    tissue_tiles.write_parquet("tiles")
```

--------------------------------

### Parse ASAP Annotation Files with ratiopath

Source: https://github.com/rationai/ratiopath/blob/main/docs/learn/get-started/quick-start/annotations.md

Parses ASAP XML annotation files to extract polygon data. It requires the path to the annotation file and optionally accepts regular expressions to filter annotations by name and group. The output is a list of annotation polygons.
```python
from ratiopath.parsers import ASAPParser

# 'row' is a slide row from the Ray Dataset
annotation_path = row["path"].replace(".mrxs", ".xml")
parser = ASAPParser(annotation_path)
annotations = list(parser.get_polygons(name="...", part_of_group="..."))
```

--------------------------------

### Compute Tile Annotation Coverage with Python

Source: https://context7.com/rationai/ratiopath/llms.txt

Computes annotation coverage for tiles by intersecting tile regions with annotation polygons. It parses annotations, defines a region of interest (ROI) for each tile, generates tile coordinates, and then calculates the area of intersection between annotations and tiles. The output is a list of dictionaries, each containing tile coordinates, slide information, and the coverage fraction.

```python
from ratiopath.tiling import tile_annotations, grid_tiles
from ratiopath.parsers import ASAPParser
from shapely import Polygon
import numpy as np


def tiling_with_annotations(row):
    # Parse annotations for this slide
    annotation_path = row["path"].replace(".svs", ".xml")
    parser = ASAPParser(annotation_path)
    annotations = list(parser.get_polygons(name="tumor.*"))

    # Define ROI (region of interest) covering full tile
    roi = Polygon([
        (0, 0),
        (row["tile_extent_x"], 0),
        (row["tile_extent_x"], row["tile_extent_y"]),
        (0, row["tile_extent_y"]),
    ])

    # Generate tile coordinates
    coordinates = np.array(list(grid_tiles(
        slide_extent=(row["extent_x"], row["extent_y"]),
        tile_extent=(row["tile_extent_x"], row["tile_extent_y"]),
        stride=(row["stride_x"], row["stride_y"]),
        last="keep",
    )))

    # Compute annotation intersection for each tile
    return [
        {
            "tile_x": int(coordinates[i, 0]),
            "tile_y": int(coordinates[i, 1]),
            "path": row["path"],
            "slide_id": row["id"],
            "coverage": polygon.area / roi.area,  # Fraction covered by annotations
        }
        for i, polygon in enumerate(tile_annotations(
            annotations=annotations,
            roi=roi,
            coordinates=coordinates,
            downsample=row["downsample"],
        ))
    ]


# Apply to dataset
tiles_with_coverage = slides.flat_map(tiling_with_annotations)

# Filter for tiles with significant annotation coverage
annotated_tiles = tiles_with_coverage.filter(lambda t: t["coverage"] > 0.5)
```

--------------------------------

### Apply Stain Augmentation with Python

Source: https://context7.com/rationai/ratiopath/llms.txt

Applies stain augmentation to histopathological images using the Tellez et al. method. The augmentor is compatible with the albumentations pipeline for data augmentation. It requires importing `StainAugmentor` and optionally `estimate_stain_vectors` for stain vector estimation.

```python
from ratiopath.augmentations import StainAugmentor
from ratiopath.augmentations.estimate_stain_vectors import HE, estimate_stain_vectors
import albumentations as A
import numpy as np
```

--------------------------------

### Create Unique Slide IDs and Save Metadata

Source: https://github.com/rationai/ratiopath/blob/main/docs/learn/get-started/quick-start/tiling.md

Generates a unique hash ID for each slide using the `row_hash` function from `ratiopath.tiling.utils`, applied to every row in the Ray Dataset with `.map()`. The resulting slide-level metadata, including the new hash ID, is then saved to a Parquet dataset named 'slides'. Writing triggers the execution of the Ray data processing plan.

```python
from ratiopath.tiling.utils import row_hash

slides = slides.map(row_hash, num_cpus=0.1, memory=128 * 1024**2)
slides.write_parquet("slides")
```

--------------------------------

### Apply Annotation Coverage Function with Ray Data

Source: https://github.com/rationai/ratiopath/blob/main/docs/learn/get-started/quick-start/annotations.md

Applies a custom function, `tiling_with_annotations`, to the slide dataset using Ray Data's `flat_map` transformation, producing one row per tile. This integrates the annotation coverage calculation into a distributed data processing pipeline.
```python
tiles = slides.flat_map(tiling_with_annotations)
```

--------------------------------

### Read Slide Tiles with Ray - Python

Source: https://context7.com/rationai/ratiopath/llms.txt

Reads batches of tiles from whole-slide images using OpenSlide or tifffile backends. This Ray UDF expression enables efficient batch processing with automatic file format detection and caching. Dependencies include ratiopath.tiling and ray.data.expressions.

```python
from ratiopath.tiling import read_slide_tiles
from ray.data.expressions import col

# Add tile pixel data to the dataset
tiles_with_pixels = tiles.with_column(
    "tile",
    read_slide_tiles(
        col("path"),  # Path to the WSI file
        col("tile_x"),  # X coordinate of tile
        col("tile_y"),  # Y coordinate of tile
        col("tile_extent_x"),  # Width of tile
        col("tile_extent_y"),  # Height of tile
        col("level"),  # Slide pyramid level
    ),
    num_cpus=1,
    memory=4 * 1024**3,
)

# Filter tiles based on content (e.g., tissue vs background)
tissue_tiles = tiles_with_pixels.filter(
    lambda row: row["tile"].std() > 8  # Keep tiles with high variance (tissue)
)

# Save filtered tile coordinates.
# drop_columns returns a new dataset, so the result must be reassigned.
tissue_tiles = tissue_tiles.drop_columns(
    ["tile", "path", "level", "tile_extent_x", "tile_extent_y"]
)
tissue_tiles.write_parquet("tiles")
```

--------------------------------

### Save Filtered Tile Data to Parquet

Source: https://github.com/rationai/ratiopath/blob/main/docs/learn/get-started/quick-start/tiling.md

Saves the filtered tile information to disk in Parquet format. It drops unnecessary columns, such as raw pixel data and redundant metadata, to reduce file size, retaining only the information needed in the final output.
```python
tissue_tiles = tissue_tiles.drop_columns(
    ["tile", "path", "level", "tile_extent_x", "tile_extent_y"]
)
tissue_tiles.write_parquet("tiles")
```

--------------------------------

### Complete End-to-End Pipeline for Pathology Image Processing

Source: https://context7.com/rationai/ratiopath/llms.txt

This Python script demonstrates a full pipeline for processing whole-slide pathology images. It reads slide metadata, generates tile coordinates, filters for tissue-rich tiles, and saves the results. The pipeline uses Ray Data for distributed processing and RatioPath for image-specific operations such as reading slides and generating tiles. Dependencies include 'typing', 'ray.data.expressions', and 'ratiopath'.

```python
from typing import Any

from ray.data.expressions import col

from ratiopath.ray import read_slides
from ratiopath.tiling import grid_tiles, read_slide_tiles
from ratiopath.tiling.utils import row_hash


def tiling(row: dict[str, Any]) -> list[dict[str, Any]]:
    return [
        {
            "tile_x": x,
            "tile_y": y,
            "path": row["path"],
            "slide_id": row["id"],
            "level": row["level"],
            "tile_extent_x": row["tile_extent_x"],
            "tile_extent_y": row["tile_extent_y"],
        }
        for x, y in grid_tiles(
            slide_extent=(row["extent_x"], row["extent_y"]),
            tile_extent=(row["tile_extent_x"], row["tile_extent_y"]),
            stride=(row["stride_x"], row["stride_y"]),
            last="keep",
        )
    ]


if __name__ == "__main__":
    # Step 1: Read slide metadata
    slides = read_slides("data", mpp=0.25, tile_extent=1024, stride=1024 - 64)

    # Step 2: Generate unique slide IDs and save metadata
    slides = slides.map(row_hash, num_cpus=0.1, memory=128 * 1024**2)
    slides.write_parquet("slides")

    # Step 3: Generate tile coordinates
    tiles = slides.flat_map(tiling, num_cpus=0.2, memory=128 * 1024**2)
    tiles = tiles.repartition(target_num_rows_per_block=128)

    # Step 4: Read tile pixels and filter for tissue
    tissue_tiles = tiles.with_column(
        "tile",
        read_slide_tiles(
            col("path"),
            col("tile_x"),
            col("tile_y"),
            col("tile_extent_x"),
            col("tile_extent_y"),
            col("level"),
        ),
        num_cpus=1,
        memory=4 * 1024**3,
    ).filter(lambda row: row["tile"].std() > 8)

    # Step 5: Save filtered tile coordinates
    tissue_tiles = tissue_tiles.drop_columns(
        ["tile", "path", "level", "tile_extent_x", "tile_extent_y"]
    )
    tissue_tiles.write_parquet("tiles")
```

--------------------------------

### Extract Overlay Patches using tile_overlay (Python)

Source: https://github.com/rationai/ratiopath/blob/main/docs/learn/get-started/quick-start/overlays.md

This code snippet demonstrates how to use the `tile_overlay` function from `ratiopath.tiling` to extract overlay image patches. It requires a Ray Dataset with tile metadata, a defined Region of Interest (ROI), and the path to the overlay image. The extracted patches are stored in a new column, and the function handles resolution differences and ROI clipping.

```python
from ratiopath.tiling import tile_overlay
from ray.data.expressions import col
from shapely.geometry import box

# Assuming 'tiles' is a Ray Dataset augmented with 'tissue_mask_path'
tiles = ...

# Define a rectangular ROI (e.g., the central 256x256 region of a 512x512 tile)
roi = box(128, 128, 384, 384)

tile_with_overlay = tiles.with_column(
    "tissue_overlay",  # New column name for the overlay patch
    tile_overlay(
        roi=roi,
        overlay_path=col("tissue_mask_path"),
        tile_x=col("tile_x"),
        tile_y=col("tile_y"),
        mpp_x=col("mpp_x"),
        mpp_y=col("mpp_y"),
    ),
    num_cpus=1,
    memory=4 * 1024**3,
)
```

--------------------------------

### Read Slide Metadata with Ray Data

Source: https://github.com/rationai/ratiopath/blob/main/docs/learn/get-started/quick-start/tiling.md

Reads slide metadata using the `read_slides` function from `ratiopath.ray`. It takes a data directory, desired resolution (mpp), tile extent, and stride as input. The function returns a Ray Dataset where each row contains metadata for a single slide, automatically determining the best magnification level.
```python
from ratiopath.ray import read_slides

slides = read_slides("data", mpp=0.25, tile_extent=1024, stride=1024 - 64)
```

--------------------------------

### Generate Tissue Mask with Python and PyVips

Source: https://context7.com/rationai/ratiopath/llms.txt

Generates a tissue mask from a whole-slide image using saturation channel extraction and morphological operations. It loads the slide at a specified resolution, applies a default or custom filter pipeline (including grayscale conversion, Otsu thresholding, opening, and closing), and outputs the mask along with its resolution. The generated mask can be saved as a TIFF file.

```python
from ratiopath.masks import tissue_mask
from ratiopath.masks.vips_filters import (
    VipsCompose,
    VipsGrayScaleFilter,
    VipsOtsu,
    VipsOpening,
    VipsClosing,
)
import pyvips

# Load slide at low resolution for mask generation
slide = pyvips.Image.new_from_file("/path/to/slide.svs", level=4)
mpp = (2.0, 2.0)  # Resolution at level 4

# Generate tissue mask with default filter pipeline
mask, output_mpp = tissue_mask(slide, mpp)

# Save mask as TIFF
mask.write_to_file("tissue_mask.tiff")

# Custom filter pipeline
custom_filter = VipsCompose([
    VipsGrayScaleFilter(),
    VipsOtsu(),
    VipsOpening(),  # Remove small noise
    VipsClosing(),  # Fill small holes
])
mask_custom, _ = tissue_mask(slide, mpp, filter=custom_filter)
```

--------------------------------

### GeoJSON Annotation Parsing with Nested Properties (Python)

Source: https://context7.com/rationai/ratiopath/llms.txt

This Python snippet details the usage of GeoJSONParser from ratiopath for handling GeoJSON annotation files. It explains how to parse the file, retrieve all polygons, and filter features on nested properties addressed by flattened key names (the example joins nested keys with an underscore separator). It also shows how to obtain a filtered GeoDataFrame for further analysis. Dependencies include ratiopath and geopandas.
```python
from ratiopath.parsers import GeoJSONParser

# Parse GeoJSON annotation file
parser = GeoJSONParser("/path/to/annotations.geojson")

# Get all polygons
all_polygons = list(parser.get_polygons())

# Filter by nested properties using flattened key names
# For GeoJSON with properties like {"classification": {"name": "Tumor"}}
tumor_polygons = list(parser.get_polygons(
    classification_name="Tumor.*"  # Uses underscore as separator (configurable)
))

# Get filtered GeoDataFrame for advanced analysis
gdf = parser.get_filtered_geodataframe(
    separator="_",
    classification_name="Tumor",
)
print(f"Filtered features: {len(gdf)}")

# Get point annotations
points = list(parser.get_points(classification_name="Mitosis"))
for point in points:
    print(f"Point at: ({point.x}, {point.y})")
```

--------------------------------

### Add Overlay Path to Tile Dataset (Python)

Source: https://github.com/rationai/ratiopath/blob/main/docs/learn/get-started/quick-start/overlays.md

This function augments a Ray Dataset of tiles by adding a column containing the file path to the corresponding overlay WSI. It assumes the overlay path can be derived from the primary slide path by replacing the extension.

```python
import pandas as pd
from ratiopath.tiling import tile_overlay
from ray.data.expressions import col

# Assuming 'tiles' is a pre-prepared Ray Dataset
tiles = ...


def add_overlay_path(batch: pd.DataFrame) -> pd.DataFrame:
    """Adds the overlay path for each tile in the batch."""
    # Example: Replace the WSI extension with the mask file extension
    batch["tissue_mask_path"] = batch["path"].str.replace(
        ".mrxs", "_tissue_mask.tiff", regex=False
    )
    return batch


# The .str accessor needs pandas batches, so request them explicitly.
tiles = tiles.map_batches(add_overlay_path, batch_format="pandas")
```

--------------------------------

### Read Slide Tile Pixel Data

Source: https://github.com/rationai/ratiopath/blob/main/docs/learn/get-started/quick-start/tiling.md

Reads the actual image data for each tile from the original slide file and adds it as a NumPy array to the dataset. It uses Ray's column expressions to specify input columns and is optimized for batches of tiles from a single slide.

```python
from ray.data.expressions import col
from ratiopath.tiling import read_slide_tiles

tiles_with_pixels = tiles.with_column(
    "tile",  # Name of the new column to add.
    read_slide_tiles(
        col("path"),
        col("tile_x"),
        col("tile_y"),
        col("tile_extent_x"),
        col("tile_extent_y"),
        col("level"),
    ),
    num_cpus=1,  # Reading and decoding images is CPU-heavy.
    memory=4 * 1024**3,  # Give Ray a hint about how much memory this task needs.
)
```

--------------------------------

### Compute and Attach Annotation Coverage to Tile Metadata

Source: https://github.com/rationai/ratiopath/blob/main/docs/learn/get-started/quick-start/annotations.md

A Python function that takes a slide row, parses the associated annotations, calculates the annotation coverage of each tile, and returns a list of dictionaries, each containing tile metadata and its annotation coverage. It uses ratiopath's `tile_annotations` and Shapely for geometric operations.
```python
from typing import Any

import numpy as np
from shapely import Polygon

from ratiopath.parsers import ASAPParser
from ratiopath.tiling import grid_tiles, tile_annotations


def tiling_with_annotations(row: dict[str, Any]) -> list[dict[str, Any]]:
    annotation_path = row["path"].replace(".mrxs", ".xml")
    parser = ASAPParser(annotation_path)
    annotations = list(parser.get_polygons(name="...", part_of_group="..."))

    # ROI covering the full tile, in tile-local coordinates
    roi = Polygon([
        (0, 0),
        (row["tile_extent_x"], 0),
        (row["tile_extent_x"], row["tile_extent_y"]),
        (0, row["tile_extent_y"]),
    ])

    coordinates = np.array(list(
        grid_tiles(
            slide_extent=(row["extent_x"], row["extent_y"]),
            tile_extent=(row["tile_extent_x"], row["tile_extent_y"]),
            stride=(row["stride_x"], row["stride_y"]),
            last="keep",
        )
    ))

    return [
        {
            "tile_x": coordinates[i, 0],
            "tile_y": coordinates[i, 1],
            "path": row["path"],
            "slide_id": row["id"],
            "level": row["level"],
            "tile_extent_x": row["tile_extent_x"],
            "tile_extent_y": row["tile_extent_y"],
            "coverage": polygon.area / roi.area,
        }
        for i, polygon in enumerate(
            tile_annotations(
                annotations,
                roi,
                coordinates,
                row["downsample"],
            )
        )
    ]
```

--------------------------------

### Process Tile Overlays and Compute Overlap with Python

Source: https://context7.com/rationai/ratiopath/llms.txt

Reads overlay data (tissue masks, heatmaps) and computes overlap statistics, automatically handling resolution differences. It defines a region of interest within a tile, adds an overlay path column to the dataset, extracts overlay patches using `tile_overlay`, and computes overlap statistics using `tile_overlay_overlap`. The output includes tissue coverage information, which can be used for filtering.
```python
from ratiopath.tiling import tile_overlay, tile_overlay_overlap
from ray.data.expressions import col
from shapely.geometry import box

# Define ROI for overlay extraction (center region of tile)
roi = box(128, 128, 384, 384)  # 256x256 region in center of 512x512 tile


# Add overlay path column
def add_overlay_path(batch):
    # The .str accessor requires a pandas batch; regex=False treats ".svs" literally.
    batch["mask_path"] = batch["path"].str.replace(".svs", "_mask.tiff", regex=False)
    return batch


tiles = tiles.map_batches(add_overlay_path, batch_format="pandas")

# Extract overlay patches
tiles_with_overlay = tiles.with_column(
    "tissue_overlay",
    tile_overlay(
        roi=roi,
        overlay_path=col("mask_path"),
        tile_x=col("tile_x"),
        tile_y=col("tile_y"),
        mpp_x=col("mpp_x"),
        mpp_y=col("mpp_y"),
    ),
    num_cpus=1,
    memory=4 * 1024**3,
)

# Compute overlap statistics (fraction of each unique value)
tiles_with_stats = tiles.with_column(
    "tissue_overlap",
    tile_overlay_overlap(
        roi=roi,
        overlay_path=col("mask_path"),
        tile_x=col("tile_x"),
        tile_y=col("tile_y"),
        mpp_x=col("mpp_x"),
        mpp_y=col("mpp_y"),
    ),
    num_cpus=1,
    memory=4 * 1024**3,
)


# Extract foreground coverage and filter
def extract_coverage(tile):
    tile["tissue_coverage"] = tile["tissue_overlap"].get("255", 0.0)
    return tile


tissue_tiles = tiles_with_stats.map(extract_coverage).filter(
    lambda t: t["tissue_coverage"] >= 0.5
)
```

--------------------------------

### Generate Tile Coordinates from Slide Row

Source: https://github.com/rationai/ratiopath/blob/main/docs/learn/get-started/quick-start/tiling.md

Defines a tiling function that takes slide metadata and generates a list of tile coordinates (x, y) for each tile within the slide. It copies relevant slide metadata to each tile coordinate. This function is designed to be used with Ray's `flat_map` for parallel processing.
```python
from typing import Any

from ratiopath.tiling import grid_tiles


def tiling(row: dict[str, Any]) -> list[dict[str, Any]]:
    return [
        {
            "tile_x": x,
            "tile_y": y,
            "path": row["path"],
            "slide_id": row["id"],
            "level": row["level"],
            "tile_extent_x": row["tile_extent_x"],
            "tile_extent_y": row["tile_extent_y"],
        }
        for x, y in grid_tiles(
            slide_extent=(row["extent_x"], row["extent_y"]),
            tile_extent=(row["tile_extent_x"], row["tile_extent_y"]),
            stride=(row["stride_x"], row["stride_y"]),
            last="keep",
        )
    ]


tiles = slides.flat_map(tiling, num_cpus=0.2, memory=128 * 1024**2)
```

--------------------------------

### read_slide_tiles

Source: https://context7.com/rationai/ratiopath/llms.txt

Reads batches of tiles from whole-slide images using either OpenSlide or tifffile backends. This is a Ray UDF expression designed for efficient batch processing with automatic file format detection and caching.

```APIDOC
## POST /api/read_slide_tiles

### Description
Reads batches of tiles from whole-slide images using either OpenSlide or tifffile backends. This is a Ray UDF expression designed for efficient batch processing with automatic file format detection and caching.

### Method
POST

### Endpoint
/api/read_slide_tiles

### Parameters
#### Request Body
- **path** (string) - Required - Path to the WSI file.
- **tile_x** (integer) - Required - X coordinate of the tile's top-left corner.
- **tile_y** (integer) - Required - Y coordinate of the tile's top-left corner.
- **tile_extent_x** (integer) - Required - Width of the tile in pixels.
- **tile_extent_y** (integer) - Required - Height of the tile in pixels.
- **level** (integer) - Optional - The slide pyramid level to read from. Defaults to the level determined by `read_slides`.
### Request Example
```python
from ratiopath.tiling import read_slide_tiles
from ray.data.expressions import col

tiles_with_pixels = tiles.with_column(
    "tile",
    read_slide_tiles(
        col("path"),
        col("tile_x"),
        col("tile_y"),
        col("tile_extent_x"),
        col("tile_extent_y"),
        col("level"),
    ),
    num_cpus=1,
    memory=4 * 1024**3,
)
```

### Response
#### Success Response (200)
- **tile_data** (numpy array or similar) - The pixel data for the requested tile.

#### Response Example
```json
{
  "tile_data": "base64_encoded_pixel_data_or_numpy_array"
}
```
```

--------------------------------

### Compute Tile Overlay Overlap Ratio (Python)

Source: https://github.com/rationai/ratiopath/blob/main/docs/learn/get-started/quick-start/overlays.md

Computes the pixel ratio for each unique value in an overlay patch, useful for filtering tiles based on content. It takes the region of interest, overlay path, tile coordinates, and resolution as input. The output is a dictionary mapping unique pixel values to their area coverage.

```python
from ratiopath.tiling import tile_overlay_overlap
from ray.data.expressions import col

# Assumes `roi` (a Shapely geometry in tile coordinates) is already defined.
tissue_tiles = tiles.with_column(
    "tissue_overlap",  # New column holding the per-value overlap ratios
    tile_overlay_overlap(
        roi=roi,
        overlay_path=col("tissue_mask_path"),
        tile_x=col("tile_x"),
        tile_y=col("tile_y"),
        mpp_x=col("mpp_x"),
        mpp_y=col("mpp_y"),
    ),
    num_cpus=1,
    memory=4 * 1024**3,
)
```

--------------------------------

### Filter Tiles Containing Tissue

Source: https://github.com/rationai/ratiopath/blob/main/docs/learn/get-started/quick-start/tiling.md

Filters the dataset to keep only tiles that contain tissue. It uses a heuristic: tiles whose pixel values have a standard deviation above a threshold are treated as tissue, which separates them from uniform background.
```python
tissue_tiles = tiles_with_pixels.filter(lambda row: row["tile"].std() > 8)
```

--------------------------------

### grid_tiles

Source: https://context7.com/rationai/ratiopath/llms.txt

Generates tile coordinates for a given slide based on its size, tile size, and stride. The function yields tile coordinates in row-major order and provides options for handling edge tiles.

```APIDOC
## GET /api/grid_tiles

### Description
Generates tile coordinates for a given slide based on its size, tile size, and stride. The function yields tile coordinates in row-major order and provides options for handling edge tiles that don't fit the stride pattern.

### Method
GET

### Endpoint
/api/grid_tiles

### Parameters
#### Query Parameters
- **slide_extent** (tuple of integers) - Required - The width and height of the slide in pixels (e.g., `(84320, 61120)`).
- **tile_extent** (tuple of integers) - Required - The width and height of each tile in pixels (e.g., `(1024, 1024)`).
- **stride** (tuple of integers) - Required - The step size between tiles in pixels (e.g., `(960, 960)`).
- **last** (string) - Optional - Determines how to handle edge tiles. Options: `"keep"` (include edge tiles), `"drop"` (exclude), `"shift"` (shift to fit). Defaults to `"keep"`.

### Request Example
```python
from ratiopath.tiling import grid_tiles

slide_extent = (84320, 61120)
tile_extent = (1024, 1024)
stride = (960, 960)

coordinates = list(grid_tiles(
    slide_extent=slide_extent,
    tile_extent=tile_extent,
    stride=stride,
    last="keep",
))
```

### Response
#### Success Response (200)
- **coordinates** (list of numpy arrays) - A list where each element is a numpy array representing the `[x, y]` coordinates of a tile's top-left corner.

#### Response Example
```json
{
  "coordinates": [
    [0, 0],
    [960, 0],
    [1920, 0],
    ...
  ]
}
```
```

--------------------------------

### Extract Foreground Tissue Coverage (Python)

Source: https://github.com/rationai/ratiopath/blob/main/docs/learn/get-started/quick-start/overlays.md

A Python function to extract the coverage of a specific class (value '255') from the overlap dictionary generated by `tile_overlay_overlap`. It safely retrieves the coverage, defaulting to 0.0 if the class is not present, and adds it as a new key 'tissue_coverage'.

```python
def extract_foreground_coverage(tile: dict) -> dict:
    """Extracts the foreground coverage (value 255) from the overlap dictionary."""
    # "tissue_mask_overlap" must match the column name given to with_column
    # when tile_overlay_overlap was applied.
    # Use .get('255', 0.0) to safely retrieve the value, defaulting to 0.0 if not present
    tile["tissue_coverage"] = tile["tissue_mask_overlap"].get('255', 0.0)
    return tile


tiles_with_tissue_coverage = tiles_with_overlap.map(extract_foreground_coverage).filter(
    lambda tile: tile["tissue_coverage"] >= 0.5  # Keep tiles with at least 50% tissue
)
```
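
--------------------------------

### Sketch: Coverage Extraction on Plain Dictionaries (Python)

The extraction and filtering logic above can be tried without Ray or any slide files. The following is a minimal sketch under stated assumptions: `overlap_ratios` is a hypothetical helper that only mimics the shape of the dictionary described for `tile_overlay_overlap` (unique value mapped to fraction of pixels) and is not ratiopath's implementation; the string keys follow the `'255'` lookup used in the snippet above; the toy masks are invented for illustration.

```python
from collections import Counter


def overlap_ratios(mask_rows: list[list[int]]) -> dict[str, float]:
    """Hypothetical stand-in: fraction of pixels per unique value, keyed by string."""
    counts = Counter(str(value) for row in mask_rows for value in row)
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()}


def extract_foreground_coverage(tile: dict) -> dict:
    # Same logic as the snippet above: a missing class counts as 0.0 coverage.
    tile["tissue_coverage"] = tile["tissue_mask_overlap"].get("255", 0.0)
    return tile


# Toy 2x4 masks (invented data): 255 marks tissue, 0 marks background.
tiles = [
    {"tile_x": 0, "tile_y": 0,
     "tissue_mask_overlap": overlap_ratios([[255, 255, 255, 0], [255, 255, 0, 0]])},
    {"tile_x": 960, "tile_y": 0,
     "tissue_mask_overlap": overlap_ratios([[0, 0, 0, 0], [0, 0, 0, 255]])},
    {"tile_x": 1920, "tile_y": 0,
     "tissue_mask_overlap": overlap_ratios([[0, 0, 0, 0], [0, 0, 0, 0]])},
]

# Plain-Python equivalent of .map(...).filter(...) on the dataset.
kept = [t for t in map(extract_foreground_coverage, tiles) if t["tissue_coverage"] >= 0.5]

print([(t["tile_x"], t["tissue_coverage"]) for t in kept])  # [(0, 0.625)]
```

Only the first tile passes the 50% threshold. The third tile has no `"255"` key at all, which is the case the `.get(..., 0.0)` default is there to handle.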