### Get Matter array file size

Source: https://github.com/bioconductor/hdf5array/wiki/matter-vs-hdf5:-which-format-performs-better-for-storing-big-array-like-datasets?

Reports the on-disk size, in gigabytes, of the file backing the 'matter_arr' object, to assess storage efficiency.

```R
file.info(a1@paths)$size / 1e9
```
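The same measurement can be wrapped in a small helper. The sketch below uses only base R; `size_gb()` is a hypothetical name, and the temporary file merely stands in for the file(s) listed in `a1@paths`.

```r
# Hypothetical helper: total on-disk size of one or more files, in GB (1e9 bytes)
size_gb <- function(paths) sum(file.info(paths)$size) / 1e9

# Stand-in for the file backing a matter_arr: 1000 four-byte integers
tmp <- tempfile(fileext = ".bin")
writeBin(seq_len(1000L), tmp)

size_gb(tmp)  # 4e-06 (4000 bytes)
```

Summing over `paths` keeps the helper usable when an object is backed by more than one file.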

--------------------------------

### Saving and Loading SummarizedExperiment with HDF5

Source: https://context7.com/bioconductor/hdf5array/llms.txt

Demonstrates how to save a SummarizedExperiment object to disk using HDF5 for assay data and then reload it. It also shows how to perform a quick resave after modifying metadata without rewriting the HDF5 data.

```APIDOC
## Saving and Loading SummarizedExperiment with HDF5

### Description
Save a SummarizedExperiment object to disk using HDF5 for assay data, allowing for efficient storage and retrieval of large datasets. This includes options for custom chunk dimensions and compression levels. Reload the experiment and perform quick resaves after metadata modifications.

### Usage
```r
# Save to disk
dir <- tempfile()
saveHDF5SummarizedExperiment(se, dir=dir,
                             chunkdim=c(100, 25),  # custom chunk dims
                             level=6L,             # gzip level
                             verbose=TRUE)
list.files(dir)  # "assays.h5"  "se.rds"

# Reload with HDF5-backed assays
se2 <- loadHDF5SummarizedExperiment(dir)
assay(se2, "counts")  # HDF5Matrix – still on disk

# After adding metadata, quick-resave without rewriting HDF5
colData(se2)$batch <- sample(c("A","B"), ncol(se2), replace=TRUE)
quickResaveHDF5SummarizedExperiment(se2, verbose=TRUE)

# Use a prefix to store multiple objects in the same directory
dir2 <- tempfile()
dir.create(dir2)
saveHDF5SummarizedExperiment(se, dir=dir2, prefix="exp1_")
saveHDF5SummarizedExperiment(se, dir=dir2, prefix="exp2_")
list.files(dir2)  # "exp1_assays.h5" "exp1_se.rds" "exp2_..."

se_exp1 <- loadHDF5SummarizedExperiment(dir2, prefix="exp1_")
```
```

--------------------------------

### Save and Load SummarizedExperiment with HDF5 assays

Source: https://context7.com/bioconductor/hdf5array/llms.txt

Saves a SummarizedExperiment object to disk, writing assays as HDF5 datasets and metadata to an RDS file. The result is a directory containing 'se.rds' and 'assays.h5'. `loadHDF5SummarizedExperiment` reconstructs the object with HDF5-backed assays. `quickResaveHDF5SummarizedExperiment` re-serializes only metadata.

```R
library(HDF5Array)
library(SummarizedExperiment)

# Build a toy SummarizedExperiment
nrow <- 200; ncol <- 50
counts <- matrix(rpois(nrow * ncol, lambda=5), nrow=nrow,
                 dimnames=list(paste0("gene", seq_len(nrow)),
                               paste0("cell", seq_len(ncol))))
se <- SummarizedExperiment(assays=list(counts=counts))
```

--------------------------------

### Save and Load SummarizedExperiment to HDF5

Source: https://context7.com/bioconductor/hdf5array/llms.txt

Saves a SummarizedExperiment object to disk using HDF5 for assays, allowing for efficient reloading. Supports custom chunk dimensions and compression levels.

```R
dir <- tempfile()
saveHDF5SummarizedExperiment(se, dir=dir,
                             chunkdim=c(100, 25),  # custom chunk dims
                             level=6L,             # gzip level
                             verbose=TRUE)
list.files(dir)  # "assays.h5"  "se.rds"

se2 <- loadHDF5SummarizedExperiment(dir)
assay(se2, "counts")  # HDF5Matrix – still on disk

colData(se2)$batch <- sample(c("A","B"), ncol(se2), replace=TRUE)
quickResaveHDF5SummarizedExperiment(se2, verbose=TRUE)
```

```R
dir2 <- tempfile()
dir.create(dir2)
saveHDF5SummarizedExperiment(se, dir=dir2, prefix="exp1_")
saveHDF5SummarizedExperiment(se, dir=dir2, prefix="exp2_")
list.files(dir2)  # "exp1_assays.h5" "exp1_se.rds" "exp2_..."

se_exp1 <- loadHDF5SummarizedExperiment(dir2, prefix="exp1_")
```
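Before loading, it can be handy to check that a directory actually contains both files for a given prefix. The helper below is a hypothetical base R sketch (`has_saved_se()` is not part of HDF5Array); it relies only on the file names shown in the listings above, `assays.h5` and `se.rds`, optionally prefixed.

```r
# Hypothetical helper: TRUE if `dir` holds both files of a saved object
has_saved_se <- function(dir, prefix = "") {
  all(file.exists(file.path(dir, paste0(prefix, c("assays.h5", "se.rds")))))
}

# Demonstration with empty placeholder files (no real HDF5 data involved)
d <- tempfile()
dir.create(d)
file.create(file.path(d, c("exp1_assays.h5", "exp1_se.rds")))

has_saved_se(d, prefix = "exp1_")  # TRUE
has_saved_se(d, prefix = "exp2_")  # FALSE
```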

--------------------------------

### saveHDF5SummarizedExperiment() / loadHDF5SummarizedExperiment()

Source: https://context7.com/bioconductor/hdf5array/llms.txt

Saves and loads SummarizedExperiment objects with HDF5-backed assays. saveHDF5SummarizedExperiment writes assays to HDF5 datasets and metadata to an .rds file, while loadHDF5SummarizedExperiment reconstructs the object.

```APIDOC
## saveHDF5SummarizedExperiment() / loadHDF5SummarizedExperiment()

### Description
Saves a `SummarizedExperiment` object to disk by writing all assays as HDF5 datasets and serialising the R metadata (colData, rowData, etc.) to an `.rds` file. The result is a directory containing `se.rds` and `assays.h5`. `loadHDF5SummarizedExperiment()` reconstructs the object with HDF5-backed assays. `quickResaveHDF5SummarizedExperiment()` re-serialises only the metadata without touching the HDF5 file.

### Usage
```r
library(HDF5Array)
library(SummarizedExperiment)

# Build a toy SummarizedExperiment
nrow <- 200; ncol <- 50
counts <- matrix(rpois(nrow * ncol, lambda=5), nrow=nrow,
                 dimnames=list(paste0("gene", seq_len(nrow)),
                               paste0("cell", seq_len(ncol))))
se <- SummarizedExperiment(assays=list(counts=counts))

# ... further usage examples for saving and loading ...
```
```

--------------------------------

### H5SparseMatrix Operations

Source: https://context7.com/bioconductor/hdf5array/llms.txt

Demonstrates basic operations on an H5SparseMatrix, including dimension checking, sparsity, non-zero count, subsetting, and extraction of non-zero data by column. It also shows coercion to a dgCMatrix.

```R
dim(sm)
# c(500, 300)
is_sparse(sm)
# TRUE
nzcount(sm)
# number of nonzero entries

# Subset (delayed)
sm[1:10, 1:20]

# Extract nonzero values by column (low-level, avoids materialising full rows)
nz_cols <- extractNonzeroDataByCol(sm, 1:5)
lengths(nz_cols)

# Coerce to in-memory sparse matrix
as(sm, "dgCMatrix")
```

--------------------------------

### Create a large integer array

Source: https://github.com/bioconductor/hdf5array/wiki/matter-vs-hdf5:-which-format-performs-better-for-storing-big-array-like-datasets?

Initializes a large 3D integer array with random values for benchmarking purposes.

```R
set.seed(123)
# 3000 * 800 * 125 = 300e6 elements; generate exactly that many to avoid recycling
a0 <- array(as.integer(runif(300e6, max=100)), dim=c(3000, 800, 125))
```
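As a rough cross-check on the file sizes reported in this benchmark, the in-memory footprint can be estimated from the dimensions alone. The sketch assumes 4 bytes per R integer and ignores header and attribute overhead.

```r
dims <- c(3000, 800, 125)
n_elements <- prod(dims)             # 3e+08 elements
approx_gb  <- n_elements * 4 / 1e9   # 4 bytes per integer
approx_gb                            # 1.2

# Sanity check of the 4-bytes-per-element assumption on a small array
small <- array(0L, dim = c(30, 8, 5))
as.numeric(object.size(small))       # slightly above 30 * 8 * 5 * 4 = 4800 bytes
```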

--------------------------------

### Manage HDF5 Dump Directory and File Settings

Source: https://context7.com/bioconductor/hdf5array/llms.txt

Control the location and naming of automatically created HDF5 datasets. These settings are global and propagated to BiocParallel workers.

```R
library(HDF5Array)

# --- Dump directory and file ---
getHDF5DumpDir()              # current auto-dump directory (in tempdir())
setHDF5DumpDir("~/my_dumps")  # redirect auto-dumps to a custom directory

setHDF5DumpFile("~/my_dumps/results.h5")  # pin all auto-dumps to one file
getHDF5DumpFile()
lsHDF5DumpFile()              # list datasets in the current dump file

setHDF5DumpName("/experiment1/counts")  # pin the next dataset name
getHDF5DumpName()

# --- Chunk geometry ---
getHDF5DumpChunkLength()          # 1,000,000 elements (default)
setHDF5DumpChunkLength(500000L)

getHDF5DumpChunkShape()           # "scale" (default)
setHDF5DumpChunkShape("first-dim-grows-first")

# Compute chunk dims for a given array shape
getHDF5DumpChunkDim(c(20000L, 500L))   # e.g. c(2000, 500)

# --- Compression ---
getHDF5DumpCompressionLevel()   # 6 (default; 0 = none, 9 = max)
setHDF5DumpCompressionLevel(9L)

# --- Dump log (shows every dataset created in this session) ---
m <- matrix(runif(100), 10, 10)
writeHDF5Array(m, name="test1")
writeHDF5Array(m + 1, name="test2")
showHDF5DumpLog()
# [2025-01-15 10:00:01] #1 In file '.../auto....h5': creation of dataset
#     '/test1' (10x10:double, chunkdims=10x10, level=6)
```

--------------------------------

### H5ADMatrix()

Source: https://context7.com/bioconductor/hdf5array/llms.txt

Constructs a DelayedMatrix backed by the central X matrix or any /layers matrix in an .h5ad (AnnData) file. It handles both dense and sparse storage and populates rownames/colnames from the var and obs groups.

```APIDOC
## H5ADMatrix()

### Description
Constructs a `DelayedMatrix` backed by the central `X` matrix (or any `/layers` matrix) in an `.h5ad` (AnnData) file. Automatically handles both dense (`HDF5ArraySeed`) and sparse (`CSC_H5ADMatrixSeed` / `CSR_H5ADMatrixSeed`) storage, and populates `rownames`/`colnames` from the `var` and `obs` groups.

### Usage
```r
library(HDF5Array)
library(zellkonverter)  # provides test h5ad files

# Obtain an example h5ad file
h5ad_path <- system.file("extdata", "krumsiek11.h5ad",
                         package="zellkonverter")

# Load the central X matrix
X <- H5ADMatrix(h5ad_path)
X
# <200 x 11> matrix of class H5ADMatrix and type "double":

dim(X)          # c(200, 11)
rownames(X)     # cell barcodes from obs/_index
colnames(X)     # gene names from var/_index

# Load a specific layer instead of X
# (requires the h5ad file to have a /layers/counts group)
# counts <- H5ADMatrix(h5ad_path, layer="counts")

# Arithmetic is delayed
log1p_X <- log1p(X)
class(log1p_X)  # "DelayedMatrix"

# Realise to memory
as.matrix(X[1:5, ])

# Access the underlying seed to inspect storage format
seed(X)
# Dense_H5ADMatrixSeed / CSC_H5ADMatrixSeed / CSR_H5ADMatrixSeed
is_sparse(X)  # TRUE if stored as h5sparse
nzcount(X)    # only works for sparse seeds
```
```

--------------------------------

### writeTENxMatrix()

Source: https://context7.com/bioconductor/hdf5array/llms.txt

Writes any matrix-like object to disk in the 10x Genomics HDF5 sparse format (CSC layout). It processes the input column-by-column for large matrices and returns a TENxMatrix pointing to the result.

```APIDOC
## writeTENxMatrix()

### Description
Writes any matrix-like object to disk in the 10x Genomics HDF5 sparse format (CSC layout with standard group structure). Returns a `TENxMatrix` pointing to the result. Block-processes the input column-by-column so that arbitrarily large matrices can be written without loading them fully into memory.

### Usage
```r
library(HDF5Array)
library(Matrix)

m <- rsparsematrix(5000, 3000, density=0.02,
                   dimnames=list(paste0("g", 1:5000),
                                 paste0("b", 1:3000)))

h5f <- tempfile(fileext=".h5")
tenx <- writeTENxMatrix(m, h5f, group="counts",
                        level=6L,      # gzip compression (0–9)
                        verbose=TRUE)
# sparsity: 0.98

tenx
nzcount(tenx)        # actual stored nonzero count
sparsity(tenx)       # fraction of zero entries

# Round-trip: coerce a TENxMatrix back to dgCMatrix
stopifnot(all.equal(as(tenx, "dgCMatrix"), as(m, "dgCMatrix")))

# Using coercion shorthand (writes to current dump file)
tenx2 <- as(m, "TENxMatrix")
path(tenx2)
```
```

--------------------------------

### Compare extracted slices from different formats

Source: https://github.com/bioconductor/hdf5array/wiki/matter-vs-hdf5:-which-format-performs-better-for-storing-big-array-like-datasets?

Verifies that the data extracted from Matter and HDF5 arrays are identical.

```R
identical(x1, x2)
```

```R
identical(x1, x3)
```

--------------------------------

### H5SparseMatrix() Constructor

Source: https://context7.com/bioconductor/hdf5array/llms.txt

Constructs a DelayedMatrix backed by an HDF5 sparse matrix stored in CSR/CSC/Yale format. The sparse layout is automatically detected from HDF5 group attributes, but can also be overridden. It supports efficient operations on specific slices through nonzero-data extraction by column or row.

```APIDOC
## H5SparseMatrix()

### Description
Constructs a `DelayedMatrix` backed by an HDF5 sparse matrix stored in CSR/CSC/Yale format (as produced by Python's `scipy.sparse` or AnnData). The sparse layout is detected automatically from the HDF5 group attributes; it can also be overridden. Nonzero-data extraction by column or row is available for efficient operations on specific slices.

### Usage
```r
library(HDF5Array)

# Write a sparse matrix in 10x/CSC format first, then reload
m <- Matrix::rsparsematrix(500, 300, density=0.05)
h5f <- tempfile(fileext=".h5")
# writeTENxMatrix writes the 10x (CSC) format; H5SparseMatrix reads generic h5sparse
tenx <- writeTENxMatrix(m, h5f, group="matrix")

# H5SparseMatrix works on any h5sparse group (CSR or CSC)
sm <- H5SparseMatrix(h5f, "matrix")
```
```

--------------------------------

### R Session Information

Source: https://github.com/bioconductor/hdf5array/wiki/matter-vs-hdf5:-which-format-performs-better-for-storing-big-array-like-datasets?

The `sessionInfo()` function in R provides details about the R version, platform, loaded packages, and their versions. This is useful for reproducibility.

```r
> sessionInfo()
R version 3.6.0 Patched (2019-05-02 r76454)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.5 LTS

Matrix products: default
BLAS:   /home/hpages/R/R-3.6.r76454/lib/libRblas.so
LAPACK: /home/hpages/R/R-3.6.r76454/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] HDF5Array_1.13.9    rhdf5_2.29.3        DelayedArray_0.11.8
 [4] IRanges_2.19.16     S4Vectors_0.23.25   BiocGenerics_0.31.6
 [7] matrixStats_0.55.0  matter_1.11.1       biglm_0.9-1        
[10] DBI_1.0.0           BiocParallel_1.19.3

loaded via a namespace (and not attached):
 [1] lattice_0.20-38 digest_0.6.21   grid_3.6.0      irlba_2.3.3    
[5] Matrix_1.2-17   Rhdf5lib_1.7.5  tools_3.6.0     compiler_3.6.0 

```

--------------------------------

### HDF5 Dump Management

Source: https://context7.com/bioconductor/hdf5array/llms.txt

Provides functions to manage the global options for HDF5 dump directories, file pinning, dataset naming, chunk geometry, and compression levels. It also includes functionality to show the dump log.

```APIDOC
## HDF5 Dump Management

### Description
A set of `get/set` functions control where and how automatically created HDF5 datasets are stored. These global options are propagated to `BiocParallel` workers, ensuring consistent dump locations and compression settings across parallel jobs.

### Usage
```r
library(HDF5Array)

# --- Dump directory and file ---
getHDF5DumpDir()              # current auto-dump directory (in tempdir())
setHDF5DumpDir("~/my_dumps")  # redirect auto-dumps to a custom directory

setHDF5DumpFile("~/my_dumps/results.h5")  # pin all auto-dumps to one file
getHDF5DumpFile()
lsHDF5DumpFile()              # list datasets in the current dump file

setHDF5DumpName("/experiment1/counts")  # pin the next dataset name
getHDF5DumpName()

# --- Chunk geometry ---
getHDF5DumpChunkLength()          # 1,000,000 elements (default)
setHDF5DumpChunkLength(500000L)

getHDF5DumpChunkShape()           # "scale" (default)
setHDF5DumpChunkShape("first-dim-grows-first")

# Compute chunk dims for a given array shape
getHDF5DumpChunkDim(c(20000L, 500L))   # e.g. c(2000, 500)

# --- Compression ---
getHDF5DumpCompressionLevel()   # 6 (default; 0 = none, 9 = max)
setHDF5DumpCompressionLevel(9L)

# --- Dump log (shows every dataset created in this session) ---
m <- matrix(runif(100), 10, 10)
writeHDF5Array(m, name="test1")
writeHDF5Array(m + 1, name="test2")
showHDF5DumpLog()
# [2025-01-15 10:00:01] #1 In file '.../auto....h5': creation of dataset
#     '/test1' (10x10:double, chunkdims=10x10, level=6)
```
```

--------------------------------

### HDF5Array() Constructor

Source: https://context7.com/bioconductor/hdf5array/llms.txt

Constructs a DelayedArray (or DelayedMatrix) backed by a conventional dense HDF5 dataset. It supports local file paths or H5File objects for remote files (e.g., S3). Options include enabling memory-optimized block processing for sparse data and overriding the inferred R type.

```APIDOC
## HDF5Array()

### Description
Constructs a `DelayedArray` (or `DelayedMatrix`) backed by a conventional dense HDF5 dataset. Accepts a local file path or an `H5File` object for S3-hosted files. The optional `as.sparse` flag enables memory-optimized block processing for zero-heavy datasets, and `type` overrides the automatically inferred R type.

### Usage
```r
library(HDF5Array)

# --- Local file ---
toy_h5 <- system.file("extdata", "toy.h5", package="HDF5Array")
h5ls(toy_h5)

M2 <- HDF5Array(toy_h5, "M2")

# Override inferred type
M2_int <- HDF5Array(toy_h5, "M2", type="integer")

# Flag as sparse for memory-efficient block processing
M2_sp <- HDF5Array(toy_h5, "M2", as.sparse=TRUE)

# Toggle sparse flag after construction
is_sparse(M2) <- TRUE

# Standard array operations (all delayed)
dim(M2)

# --- Remote file on Amazon S3 ---
h5file <- H5File("https://rhdf5-public.s3.eu-central-1.amazonaws.com/rhdf5ex_t_float_3d.h5",
                 s3=TRUE)
HDF5Array(h5file, "a1")
```
```

--------------------------------

### Construct H5ADMatrix from .h5ad file

Source: https://context7.com/bioconductor/hdf5array/llms.txt

Loads the central 'X' matrix or a specific layer from an AnnData (.h5ad) file into a DelayedMatrix. Supports both dense and sparse storage, and populates row/column names from metadata. Arithmetic operations are delayed.

```R
library(HDF5Array)
library(zellkonverter)

h5ad_path <- system.file("extdata", "krumsiek11.h5ad",
                         package="zellkonverter")

X <- H5ADMatrix(h5ad_path)
X
# <200 x 11> matrix of class H5ADMatrix and type "double":

dim(X)
rownames(X)
colnames(X)

# Load a specific layer instead of X
# (requires the h5ad file to have a /layers/counts group)
# counts <- H5ADMatrix(h5ad_path, layer="counts")

# Arithmetic is delayed
log1p_X <- log1p(X)
class(log1p_X)

# Realise to memory
as.matrix(X[1:5, ])

# Access the underlying seed to inspect storage format
seed(X)
# Dense_H5ADMatrixSeed / CSC_H5ADMatrixSeed / CSR_H5ADMatrixSeed
is_sparse(X)
nzcount(X)
```

--------------------------------

### Create Matter array and DelayedArray object

Source: https://github.com/bioconductor/hdf5array/wiki/matter-vs-hdf5:-which-format-performs-better-for-storing-big-array-like-datasets?

Generates a 'matter_arr' object and wraps it in a DelayedArray for efficient handling of large arrays. Measures the time taken for creation.

```R
library(matter)
system.time(a1 <- matter_arr(a0, datamode="integer", dim=dim(a0)))
```

```R
library(DelayedArray)
A1 <- DelayedArray(a1)
```

--------------------------------

### TENxMatrix()

Source: https://context7.com/bioconductor/hdf5array/llms.txt

Constructs a DelayedMatrix backed by the HDF5 sparse matrix format used by 10x Genomics. This is suitable for Cell Ranger output .h5 files.

```APIDOC
## TENxMatrix()

### Description
Constructs a `DelayedMatrix` backed by the HDF5 sparse matrix format used by 10x Genomics (CSC with `shape`, `data`, `indices`, `indptr`, `barcodes`, `genes` datasets under a named group). This is the appropriate constructor for Cell Ranger output `.h5` files.

### Usage
```r
library(HDF5Array)
library(TENxBrainData)  # provides example 10x data

# Example data: TENxBrainData() fetches the 1.3 Million Brain Cell Dataset
# and returns a SingleCellExperiment (not a file path)
sce <- TENxBrainData()
# To read a Cell Ranger .h5 file directly instead:
# tenx_file <- "path/to/filtered_gene_bc_matrices.h5"
# m <- TENxMatrix(tenx_file, group="mm10")

# Create a TENxMatrix from an in-memory sparse matrix
library(Matrix)
m <- rsparsematrix(1000, 500, density=0.01,
                   dimnames=list(paste0("gene", 1:1000),
                                 paste0("cell", 1:500)))
h5f <- tempfile(fileext=".h5")
tenx <- writeTENxMatrix(m, h5f, group="matrix", verbose=TRUE)
tenx
# <1000 x 500> sparse matrix of class TENxMatrix and type "double":

dim(tenx)       # c(1000, 500)
is_sparse(tenx) # TRUE
nzcount(tenx)   # number of nonzero entries

# Delayed subsetting and arithmetic
sub <- tenx[1:100, 1:50]
row_sums <- rowSums(sub)  # block-processed

# Coerce to dgCMatrix for in-memory use
dgc <- as(tenx, "dgCMatrix")
```
```

--------------------------------

### Create compressed HDF5 DelayedArray

Source: https://github.com/bioconductor/hdf5array/wiki/matter-vs-hdf5:-which-format-performs-better-for-storing-big-array-like-datasets?

Writes the array to an HDF5 file using HDF5Array with compression enabled (level=6). Measures the time and reports the file size.

```R
system.time(A3 <- writeHDF5Array(a0, chunkdim=c(50, 50, 10), level=6))
```

```R
file.info(path(A3))$size / 1e9
```
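Why compression pays off for this kind of data can be illustrated without HDF5. The sketch below is only an analogy: it applies base R's `memCompress()` (gzip) to the raw bytes of a small integer vector, whereas HDF5 applies its compression filter chunk by chunk, so the ratio will not match the benchmark's.

```r
set.seed(123)
x <- as.integer(runif(1e5, max = 100))  # values 0..99, like the benchmark array

raw_bytes  <- writeBin(x, raw())        # 4 bytes per integer
compressed <- memCompress(raw_bytes, type = "gzip")

length(raw_bytes)                       # 400000
length(compressed) / length(raw_bytes)  # well below 1: three of every four bytes are zero
```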

--------------------------------

### Construct HDF5Array from Local File

Source: https://context7.com/bioconductor/hdf5array/llms.txt

Constructs a DelayedArray backed by a dense HDF5 dataset. Supports overriding the inferred R type and enabling sparse processing for zero-heavy datasets. Use for local HDF5 files.

```r
library(HDF5Array)

# --- Local file ---
toy_h5 <- system.file("extdata", "toy.h5", package="HDF5Array")
h5ls(toy_h5)
#   group name       otype  dclass      dim
# 0     /   M1 H5I_DATASET   FLOAT  100 x 3
# 1     /   M2 H5I_DATASET   FLOAT 200 x 3

M2 <- HDF5Array(toy_h5, "M2")
M2
# <200 x 3> matrix of class HDF5Matrix and type "double":

# Override inferred type
M2_int <- HDF5Array(toy_h5, "M2", type="integer")
type(M2_int)  # "integer"

# Flag as sparse for memory-efficient block processing
M2_sp <- HDF5Array(toy_h5, "M2", as.sparse=TRUE)
is_sparse(M2_sp)  # TRUE

# Toggle sparse flag after construction
is_sparse(M2) <- TRUE

# Standard array operations (all delayed)
dim(M2)         # c(200, 3)
dimnames(M2)
M2[1:5, ]       # subset – no data loaded until needed
as.array(M2)    # materialise into memory
```

--------------------------------

### Construct TENxMatrix from 10x Genomics HDF5 file

Source: https://context7.com/bioconductor/hdf5array/llms.txt

Creates a DelayedMatrix from the HDF5 sparse matrix format used by 10x Genomics. This is suitable for Cell Ranger output files. It supports delayed subsetting and arithmetic, and can be coerced to a dgCMatrix.

```R
library(HDF5Array)
library(TENxBrainData)

sce <- TENxBrainData()  # returns a SingleCellExperiment, not a file path
# To read a Cell Ranger .h5 file directly instead:
# tenx_file <- "path/to/filtered_gene_bc_matrices.h5"
# m <- TENxMatrix(tenx_file, group="mm10")

# Create a TENxMatrix from an in-memory sparse matrix
library(Matrix)
m <- rsparsematrix(1000, 500, density=0.01,
                   dimnames=list(paste0("gene", 1:1000),
                                 paste0("cell", 1:500)))
h5f <- tempfile(fileext=".h5")
tenx <- writeTENxMatrix(m, h5f, group="matrix", verbose=TRUE)
tenx
# <1000 x 500> sparse matrix of class TENxMatrix and type "double":

dim(tenx)
is_sparse(tenx)
nzcount(tenx)

# Delayed subsetting and arithmetic
sub <- tenx[1:100, 1:50]
row_sums <- rowSums(sub)

# Coerce to dgCMatrix for in-memory use
dgc <- as(tenx, "dgCMatrix")
```

--------------------------------

### extractNonzeroDataByCol() / extractNonzeroDataByRow()

Source: https://context7.com/bioconductor/hdf5array/llms.txt

Low-level generics for extracting nonzero data from H5SparseMatrix (or TENxMatrix) objects column by column or row by row, without materializing the full matrix. Returns a NumericList or IntegerList.

```APIDOC
## extractNonzeroDataByCol() / extractNonzeroDataByRow()

### Description
Low-level generics for extracting nonzero data from `H5SparseMatrix` (or `TENxMatrix`) objects one or more columns (or rows) at a time, without materialising the full matrix. Return a `NumericList` or `IntegerList` parallel to the requested indices.

### Usage
```r
library(HDF5Array)
library(Matrix)

m <- rsparsematrix(1000, 500, density=0.05)
h5f <- tempfile(fileext=".h5")
tenx <- writeTENxMatrix(m, h5f, group="mat")

# Extract nonzero values for columns 10, 20, 30
nz <- extractNonzeroDataByCol(tenx, c(10L, 20L, 30L))
length(nz)       # 3
lengths(nz)      # number of nonzero entries in each requested column
nz[[1]]          # nonzero values in column 10

# For CSR-layout H5SparseMatrix, use extractNonzeroDataByRow
sm_csr <- H5SparseMatrix(h5f, "mat")  # layout determined by file
# if CSR:
# nz_rows <- extractNonzeroDataByRow(sm_csr, 1:5)
```
```

--------------------------------

### Extract Nonzero Data from Sparse HDF5 Matrices

Source: https://context7.com/bioconductor/hdf5array/llms.txt

Low-level generics for extracting nonzero data from H5SparseMatrix or TENxMatrix objects by column or row. Returns a NumericList or IntegerList parallel to the requested indices.

```R
library(HDF5Array)
library(Matrix)

m <- rsparsematrix(1000, 500, density=0.05)
h5f <- tempfile(fileext=".h5")
tenx <- writeTENxMatrix(m, h5f, group="mat")

# Extract nonzero values for columns 10, 20, 30
nz <- extractNonzeroDataByCol(tenx, c(10L, 20L, 30L))
length(nz)       # 3
lengths(nz)      # number of nonzero entries in each requested column
nz[[1]]          # nonzero values in column 10

# For CSR-layout H5SparseMatrix, use extractNonzeroDataByRow
sm_csr <- H5SparseMatrix(h5f, "mat")  # layout determined by file
# if CSR:
# nz_rows <- extractNonzeroDataByRow(sm_csr, 1:5)
```
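To make the return value more concrete, the sketch below mimics column-wise nonzero extraction on an in-memory `dgCMatrix` from the Matrix package, which uses a compressed-column layout (`x` holds the nonzero values, `p` the per-column offsets). `nz_by_col()` is a hypothetical helper, not part of HDF5Array, and it returns a plain list rather than a `NumericList`.

```r
library(Matrix)

# 3 x 3 sparse matrix with nonzeros in columns 1 and 3 only
m <- sparseMatrix(i = c(1, 3, 2, 3), j = c(1, 1, 3, 3),
                  x = c(10, 20, 30, 40), dims = c(3, 3))

# Hypothetical helper: nonzero values of the requested columns
nz_by_col <- function(m, j) {
  lapply(j, function(k) {
    n_k <- m@p[k + 1] - m@p[k]     # number of nonzeros in column k
    m@x[m@p[k] + seq_len(n_k)]     # their values, in row order
  })
}

nz <- nz_by_col(m, c(1L, 2L, 3L))
lengths(nz)  # 2 0 2
nz[[3]]      # 30 40
```

Only the slices of `x` belonging to the requested columns are touched, which is the property that makes this access pattern cheap for column-compressed storage.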

--------------------------------

### Write matrix to 10x Genomics HDF5 sparse format

Source: https://context7.com/bioconductor/hdf5array/llms.txt

Writes a matrix-like object to disk in the 10x Genomics HDF5 sparse format (CSC layout). This function block-processes input column-by-column, allowing large matrices to be written without full in-memory loading. It returns a TENxMatrix pointing to the written file.

```R
library(HDF5Array)
library(Matrix)

m <- rsparsematrix(5000, 3000, density=0.02,
                   dimnames=list(paste0("g", 1:5000),
                                 paste0("b", 1:3000)))

h5f <- tempfile(fileext=".h5")
tenx <- writeTENxMatrix(m, h5f, group="counts",
                        level=6L,      # gzip compression (0–9)
                        verbose=TRUE)
# sparsity: 0.98

tenx
nzcount(tenx)
sparsity(tenx)

# Round-trip: coerce a TENxMatrix back to dgCMatrix
stopifnot(all.equal(as(tenx, "dgCMatrix"), as(m, "dgCMatrix")))

# Using coercion shorthand (writes to current dump file)
tenx2 <- as(m, "TENxMatrix")
path(tenx2)
```
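The nonzero count and the sparsity are related by simple arithmetic. The sketch below reproduces both for a small in-memory matrix using the Matrix package; it assumes that sparsity means the fraction of zero entries.

```r
library(Matrix)

# 4 x 5 matrix with 4 nonzero entries
m <- sparseMatrix(i = 1:4, j = c(1, 2, 2, 5),
                  x = c(1.5, -2, 3, 7), dims = c(4, 5))

nz   <- nnzero(m)               # 4
spar <- 1 - nz / prod(dim(m))   # 0.8
```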

--------------------------------

### Create uncompressed HDF5 DelayedArray

Source: https://github.com/bioconductor/hdf5array/wiki/matter-vs-hdf5:-which-format-performs-better-for-storing-big-array-like-datasets?

Writes the array to an HDF5 file using HDF5Array, with compression disabled (level=0). Measures the time and reports the file size.

```R
library(HDF5Array)
system.time(A2 <- writeHDF5Array(a0, chunkdim=c(50, 50, 10), level=0))
```

```R
file.info(path(A2))$size / 1e9
```

--------------------------------

### Construct HDF5Array from Remote S3 File

Source: https://context7.com/bioconductor/hdf5array/llms.txt

Constructs an HDF5Array from a remote HDF5 file hosted on Amazon S3. Requires an H5File object configured for S3 access.

```r
library(HDF5Array)

# --- Remote file on Amazon S3 ---
h5file <- H5File("https://rhdf5-public.s3.eu-central-1.amazonaws.com/rhdf5ex_t_float_3d.h5",
                 s3=TRUE)
HDF5Array(h5file, "a1")
```

--------------------------------

### Configure parallel processing for DelayedArray operations

Source: https://github.com/bioconductor/hdf5array/wiki/matter-vs-hdf5:-which-format-performs-better-for-storing-big-array-like-datasets?

Sets up parallel processing parameters (workers and block size) and verbosity for DelayedArray operations. This configuration affects the performance of subsequent summarization tasks.

```R
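library(BiocParallel)   # MulticoreParam()
library(DelayedArray)   # setAutoBPPARAM(), setAutoBlockSize()

workers <- 4
block_size <- 2.5e6  # in bytes, i.e. 2.5 MB
setAutoBPPARAM(MulticoreParam(workers))
setAutoBlockSize(block_size)
DelayedArray:::set_verbose_block_processing(TRUE)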
```
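A back-of-the-envelope estimate shows what this block size implies. The sketch assumes 4 bytes per integer element and ignores any constraints DelayedArray places on block shape, so treat the result as a lower bound on the number of blocks rather than the exact grid.

```r
block_size <- 2.5e6                                    # bytes, as configured above
bytes_per_element <- 4                                 # R integer
elements_per_block <- block_size / bytes_per_element   # 625000

slice_dim <- c(3000, 800)   # one 2D slice of a 3000 x 800 x 125 array
min_blocks <- ceiling(prod(slice_dim) / elements_per_block)
min_blocks                  # 4
```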

--------------------------------

### Construct H5SparseMatrix

Source: https://context7.com/bioconductor/hdf5array/llms.txt

Constructs a DelayedMatrix from an HDF5 sparse matrix (CSR/CSC/Yale format). Automatically detects sparse layout from HDF5 attributes, which can also be overridden. Use for sparse matrices stored in HDF5.

```r
library(HDF5Array)

# Write a sparse matrix in 10x/CSC format first, then reload
m <- Matrix::rsparsematrix(500, 300, density=0.05)
h5f <- tempfile(fileext=".h5")
# writeTENxMatrix writes the 10x (CSC) format; H5SparseMatrix reads generic h5sparse
tenx <- writeTENxMatrix(m, h5f, group="matrix")

# H5SparseMatrix works on any h5sparse group (CSR or CSC)
sm <- H5SparseMatrix(h5f, "matrix")
sm
```

--------------------------------

### Reshape HDF5 Dataset with ReshapedHDF5Array

Source: https://context7.com/bioconductor/hdf5array/llms.txt

Wraps an HDF5 dataset as a DelayedArray with a user-supplied virtual reshape. This allows the on-disk data dimensions to differ from the in-memory view without copying data.

```R
library(HDF5Array)

toy_h5 <- system.file("extdata", "toy.h5", package="HDF5Array")

# The dataset "M2" is stored as 200 x 3 on disk
M2 <- HDF5Array(toy_h5, "M2")
dim(M2)   # c(200, 3)

# Reshape to 3D without touching the file
M2r <- ReshapedHDF5Array(toy_h5, "M2", dim=c(4L, 50L, 3L))
dim(M2r)  # c(4, 50, 3)
class(M2r)  # "ReshapedHDF5Array"

# All DelayedArray operations still work
M2r[1, , ]        # 50 x 3 slice
as.array(M2r[1:2, 1:5, ])
```

--------------------------------

### Extract random subset using extract_array from Matter array

Source: https://github.com/bioconductor/hdf5array/wiki/matter-vs-hdf5:-which-format-performs-better-for-storing-big-array-like-datasets?

Measures the time to extract a random subset of elements using the 'extract_array' function from the 'matter_arr' object.

```R
i <- list(sample(3000L, 50), sample(800L, 25), sample(125L, 10))
system.time(x1 <- extract_array(a1, i))
```

--------------------------------

### Write Array to HDF5 File

Source: https://context7.com/bioconductor/hdf5array/llms.txt

Writes any array-like or DelayedArray object to an HDF5 file using block processing. Returns an HDF5Array pointing to the new dataset. Supports control over chunk dimensions, compression, and data types.

```r
library(HDF5Array)

h5file <- tempfile(fileext=".h5")

# Write a plain matrix
m0 <- matrix(runif(364, min=-1), nrow=26,
             dimnames=list(letters, LETTERS[1:14]))
M1 <- writeHDF5Array(m0, h5file, name="M1", chunkdim=c(5, 5))
chunkdim(M1)    # c(5, 5)
dimnames(M1)    # dimnames are stored in the HDF5 file by default

# Skip writing dimnames
M1b <- writeHDF5Array(m0, h5file, name="M1b", with.dimnames=FALSE)
is.null(dimnames(M1b))  # TRUE

# Write a sparse matrix (auto-detected; as.sparse flag set on result)
sm <- Matrix::rsparsematrix(20, 8, density=0.1)
M2 <- writeHDF5Array(sm, h5file, name="M2", chunkdim=c(5, 5))
is_sparse(M2)   # TRUE

# Realize a DelayedArray with pending operations to disk
M3_delayed <- log(t(DelayedArray(m0)) + 1)
M3 <- writeHDF5Array(M3_delayed, h5file, name="M3",
                     chunkdim=c(5, 5), level=6L)
M3

# Coercion shorthand – writes to the current HDF5 dump file
auto <- as(m0, "HDF5Array")
path(auto)   # path to auto-generated dump file

# Use a compact 32-bit float type to reduce disk footprint
M4 <- writeHDF5Array(m0, h5file, name="M4", H5type="H5T_IEEE_F32LE")
```
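The space saving from a 32-bit float type comes at the cost of precision. The sketch below illustrates the effect in base R by round-tripping doubles through 4-byte little-endian floats with `writeBin()`/`readBin()`. It does not touch HDF5 and is only an analogy for what storing values as `H5T_IEEE_F32LE` implies.

```r
x <- c(0.1, -0.25, 0.5)

# Encode as 4-byte IEEE floats, then decode back to R doubles
bytes <- writeBin(x, raw(), size = 4, endian = "little")
y <- readBin(bytes, what = "numeric", n = length(x), size = 4, endian = "little")

length(bytes)     # 12 bytes instead of 24
y == x            # FALSE TRUE TRUE: 0.1 is not exactly representable in 32 bits
max(abs(y - x))   # about 1.5e-09
```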

--------------------------------

### Extract subset with complex indexing from Matter array

Source: https://github.com/bioconductor/hdf5array/wiki/matter-vs-hdf5:-which-format-performs-better-for-storing-big-array-like-datasets?

Measures the time for extracting a subset using more complex indexing from the 'matter_arr' object.

```R
system.time(x1 <- a1[(310:11)*7, (1:100)*8, 77])
```

--------------------------------

### ReshapedHDF5Array()

Source: https://context7.com/bioconductor/hdf5array/llms.txt

Wraps an HDF5 dataset as a DelayedArray with a user-supplied virtual reshape, allowing on-disk dimensions to differ from the in-memory view without data copying.

```APIDOC
## ReshapedHDF5Array()

### Description
Wraps an HDF5 dataset as a `DelayedArray` with a user-supplied virtual reshape, allowing the on-disk data dimensions to differ from the in-memory view without copying data.

### Usage
```r
library(HDF5Array)

# Write a small 4 x 50 x 3 dataset so the example is self-contained
h5file <- tempfile(fileext=".h5")
a <- array(runif(600), dim=c(4L, 50L, 3L))
A <- writeHDF5Array(a, h5file, name="A")
dim(A)    # 4 50 3

# Collapse the first two dimensions without touching the file
# (only reshapings that collapse consecutive dimensions are supported)
Ar <- ReshapedHDF5Array(h5file, "A", dim=c(200L, 3L))
dim(Ar)   # 200 3

# All DelayedArray operations still work
Ar[1:5, ]             # 5 x 3 slice, still delayed
as.array(Ar[1:5, ])   # realized in memory as an ordinary array
```
```

--------------------------------

### Calculate column sums for uncompressed HDF5 DelayedArray

Source: https://github.com/bioconductor/hdf5array/wiki/matter-vs-hdf5:-which-format-performs-better-for-storing-big-array-like-datasets?

Measures the time taken to compute column sums for a slice of the uncompressed HDF5 DelayedArray, showing block processing messages and potential errors.

```R
system.time(cs2 <- colSums(A2[ , , 77L]))
```
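The idea behind block-processed column sums can be sketched in base R as follows. This is only an illustration of the technique on an ordinary in-memory matrix; `block_col_sums` and its `block_ncol` argument are hypothetical names, and the code is not how DelayedArray implements `colSums()`.

```r
# Compute column sums one block of columns at a time, so that only
# a small part of the matrix has to be held in memory at once.
block_col_sums <- function(m, block_ncol = 3L) {
  out <- numeric(ncol(m))
  starts <- seq(1L, ncol(m), by = block_ncol)
  for (s in starts) {
    e <- min(s + block_ncol - 1L, ncol(m))
    # With an on-disk backend, this subsetting step is where a block
    # would be read from the file.
    out[s:e] <- colSums(m[, s:e, drop = FALSE])
  }
  out
}

m <- matrix(as.numeric(1:40), nrow = 4)
all.equal(block_col_sums(m), colSums(m))
```

With an on-disk array, the cost of each iteration is dominated by reading the block, which is why the storage layout and compression affect the timings.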

--------------------------------

### Verify that the column sums are identical

Source: https://github.com/bioconductor/hdf5array/wiki/matter-vs-hdf5:-which-format-performs-better-for-storing-big-array-like-datasets?

Uses `identical()` to check that the column sums computed from the three backends (`cs1` from Matter, `cs2` from uncompressed HDF5, `cs3` from compressed HDF5) are exactly the same.

```r
identical(cs1, cs2)
# [1] TRUE
identical(cs1, cs3)
# [1] TRUE
```
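`identical()` is a strict test: for doubles it requires the same bits, so results that differ only by floating-point rounding fail it. The base R sketch below uses made-up values (not the benchmark data) to show the difference from the tolerance-based `all.equal()`, which is the usual fallback when sums are accumulated in a different order.

```r
# The same three numbers added in two different orders
a <- (0.1 + 0.2) + 0.3
b <- 0.1 + (0.2 + 0.3)

identical(a, b)          # FALSE: the results differ in the last bit
isTRUE(all.equal(a, b))  # TRUE: equal within numerical tolerance
```

That the benchmark results pass `identical()` is therefore a stronger statement than approximate equality.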

--------------------------------

### writeHDF5Array() Function

Source: https://context7.com/bioconductor/hdf5array/llms.txt

Writes any array-like or DelayedArray object to an HDF5 file using block processing, ensuring the object is never fully materialized in memory. It returns an HDF5Array pointing to the newly written dataset and allows control over chunk dimensions, compression level, HDF5 datatype, and dimname storage.

```APIDOC
## writeHDF5Array()

### Description
Writes any array-like or `DelayedArray` object to an HDF5 file via block processing, so the object is never fully materialized in memory. Returns an `HDF5Array` pointing to the newly written dataset. Chunk dimensions, compression level, HDF5 datatype, and dimnames storage are all controllable.

### Usage
```r
library(HDF5Array)

h5file <- tempfile(fileext=".h5")

# Write a plain matrix
m0 <- matrix(runif(364, min=-1), nrow=26,
             dimnames=list(letters, LETTERS[1:14]))
M1 <- writeHDF5Array(m0, h5file, name="M1", chunkdim=c(5, 5))

# Skip writing dimnames
M1b <- writeHDF5Array(m0, h5file, name="M1b", with.dimnames=FALSE)

# Write a sparse matrix (auto-detected; as.sparse flag set on result)
sm <- Matrix::rsparsematrix(20, 8, density=0.1)
M2 <- writeHDF5Array(sm, h5file, name="M2", chunkdim=c(5, 5))

# Realize a DelayedArray with pending operations to disk
M3_delayed <- log(t(DelayedArray(m0)) + 1)
M3 <- writeHDF5Array(M3_delayed, h5file, name="M3",
                     chunkdim=c(5, 5), level=6L)

# Coercion shorthand – writes to the current HDF5 dump file
auto <- as(m0, "HDF5Array")

# Use a compact 32-bit float type to reduce disk footprint
M4 <- writeHDF5Array(m0, h5file, name="M4", H5type="H5T_IEEE_F32LE")
```
```
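Storing doubles with a 32-bit float type such as `H5T_IEEE_F32LE` trades precision for space. The base R sketch below illustrates the size of that rounding error by round-tripping a few made-up values through 4-byte floats with `writeBin()` and `readBin()`. It does not touch HDF5 at all and only demonstrates single-precision rounding in general.

```r
x <- c(0.1, 1/3, 123456.789)

# Encode as 4-byte floats, then decode back to R doubles
bytes <- writeBin(x, raw(), size = 4L)
x32 <- readBin(bytes, what = "numeric", n = length(x), size = 4L)

length(bytes)        # 12 bytes instead of 24
abs(x32 - x) / abs(x)  # relative errors, all below 1e-7
```

Whether an error of this size is acceptable depends on the data; values such as counts that fit exactly in a float lose nothing, while high-precision measurements may.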

--------------------------------

### Calculate column sums for compressed HDF5 DelayedArray

Source: https://github.com/bioconductor/hdf5array/wiki/matter-vs-hdf5:-which-format-performs-better-for-storing-big-array-like-datasets?

Measures the time taken to compute column sums for a slice of the compressed HDF5 DelayedArray, showing block processing messages and potential errors.

```R
system.time(cs3 <- colSums(A3[ , , 77L]))
```

--------------------------------

### Extract random subset using extract_array from uncompressed HDF5 DelayedArray

Source: https://github.com/bioconductor/hdf5array/wiki/matter-vs-hdf5:-which-format-performs-better-for-storing-big-array-like-datasets?

Measures the time to extract the same random subset using 'extract_array' from the uncompressed HDF5 DelayedArray.

```R
system.time(x2 <- extract_array(A2, i))
```
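`extract_array()` takes its index as a list with one subscript vector per dimension and returns an ordinary array without dropping dimensions. The definition of `i` is not shown in this snippet, so the sketch below is an assumption about its general shape: it builds a random index list for a small made-up array and emulates the extraction in base R with `[` and `drop = FALSE`.

```r
set.seed(123)

# Small in-memory stand-in for the on-disk array (hypothetical data)
a <- array(seq_len(24 * 10 * 5), dim = c(24, 10, 5))

# One subscript vector per dimension: random rows, random columns,
# and a single position along the third dimension
i <- list(sample(24, 6), sample(10, 4), 3L)

# Base R equivalent of extracting with a list index, keeping all dims
x <- do.call(`[`, c(list(a), i, list(drop = FALSE)))
dim(x)   # 6 4 1
```

Random subscripts like these touch many chunks of a chunked file, which is why this access pattern is a useful stress test for the HDF5 backends.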

--------------------------------

### Extract random subset using extract_array from compressed HDF5 DelayedArray

Source: https://github.com/bioconductor/hdf5array/wiki/matter-vs-hdf5:-which-format-performs-better-for-storing-big-array-like-datasets?

Measures the time to extract the same random subset using 'extract_array' from the compressed HDF5 DelayedArray.

```R
system.time(x3 <- extract_array(A3, i))
```

--------------------------------

### Calculate column sums for Matter DelayedArray

Source: https://github.com/bioconductor/hdf5array/wiki/matter-vs-hdf5:-which-format-performs-better-for-storing-big-array-like-datasets?

Measures the time taken to compute column sums for a slice of the Matter DelayedArray, showing block processing messages.

```R
system.time(cs1 <- colSums(A1[ , , 77L]))
```

--------------------------------

### Extract subset with complex indexing from compressed HDF5 DelayedArray

Source: https://github.com/bioconductor/hdf5array/wiki/matter-vs-hdf5:-which-format-performs-better-for-storing-big-array-like-datasets?

Measures the time for extracting the same subset from the compressed HDF5 DelayedArray.

```R
system.time(x3 <- as.matrix(A3[(310:11)*7, (1:100)*8, 77]))
```

--------------------------------

### Extract subset with complex indexing from uncompressed HDF5 DelayedArray

Source: https://github.com/bioconductor/hdf5array/wiki/matter-vs-hdf5:-which-format-performs-better-for-storing-big-array-like-datasets?

Measures the time for extracting the same subset from the uncompressed HDF5 DelayedArray.

```R
system.time(x2 <- as.matrix(A2[(310:11)*7, (1:100)*8, 77]))
```

--------------------------------

### Extract slice from compressed HDF5 DelayedArray

Source: https://github.com/bioconductor/hdf5array/wiki/matter-vs-hdf5:-which-format-performs-better-for-storing-big-array-like-datasets?

Measures the time taken to extract the same slice from the compressed HDF5 DelayedArray.

```R
system.time(x3 <- as.matrix(A3[891:1400, 401:700, 77]))
```

--------------------------------

### Extract slice from uncompressed HDF5 DelayedArray

Source: https://github.com/bioconductor/hdf5array/wiki/matter-vs-hdf5:-which-format-performs-better-for-storing-big-array-like-datasets?

Measures the time taken to extract the same slice from the uncompressed HDF5 DelayedArray.

```R
system.time(x2 <- as.matrix(A2[891:1400, 401:700, 77]))
```

--------------------------------

### Extract slice from Matter array

Source: https://github.com/bioconductor/hdf5array/wiki/matter-vs-hdf5:-which-format-performs-better-for-storing-big-array-like-datasets?

Measures the time taken to extract a specific slice from the 'matter_arr' object.

```R
system.time(x1 <- a1[891:1400, 401:700, 77])
```
