### Implement Custom BiocParallel Backend

Source: https://github.com/bioconductor/biocparallel/blob/devel/_autodocs/07-developer-interface.md

This comprehensive example demonstrates how to implement a custom backend for BiocParallel. It defines a new parameter class and implements the backend control methods (`bpstart`, `bpstop`, `bpisup`, `bpbackend`), the communication methods (`.send_to`, `.recv_any`, `.send_all`, `.recv_all`), and the task manager (`.manager`). `connect_backend()`, `disconnect_backend()`, and the `MyCustomBackend` class are placeholders for the API of the system being wrapped.

```r
library(BiocParallel)

# Define a new parameter class (BiocParallelParam is a reference class)
.MyCustomParam <- setRefClass("MyCustomParam",
    contains = "BiocParallelParam",
    fields = list(
        custom_backend = "ANY"
    )
)

# Constructor
MyCustomParam <- function(workers = 2, ...) {
    prototype <- .prototype_update(
        .BiocParallelParam_prototype,
        workers = as.integer(workers),
        ...
    )
    x <- do.call(.MyCustomParam, prototype)
    validObject(x)
    x
}

# Implement backend control
setMethod("bpstart", "MyCustomParam",
    function(x, ...) {
        # Connect to the custom backend (placeholder, assumed to return
        # an object of class "MyCustomBackend")
        x$custom_backend <- connect_backend(x)
        .bpstart_impl(x)
    }
)

setMethod("bpstop", "MyCustomParam",
    function(x) {
        disconnect_backend(x$custom_backend)
        x$custom_backend <- NULL
        .bpstop_impl(x)
    }
)

setMethod("bpisup", "MyCustomParam",
    function(x) {
        !is.null(x$custom_backend)
    }
)

setMethod("bpbackend", "MyCustomParam",
    function(x) {
        x$custom_backend
    }
)

# Implement communication. These generics dispatch on their first
# argument, the backend object returned by bpbackend(), so the
# signature names the backend class rather than the parameter class.
setOldClass("MyCustomBackend")

setMethod(".send_to", "MyCustomBackend",
    function(backend, node, value) {
        backend$send(node, value)
        TRUE
    }
)

setMethod(".recv_any", "MyCustomBackend",
    function(backend) {
        tryCatch({
            backend$recv()
        }, error = function(e) {
            .error_worker_comm(e, "Custom backend receive failed")
        })
    }
)

setMethod(".send_all", "MyCustomBackend",
    function(backend, value) {
        for (i in seq_along(backend$workers))
            .send_to(backend, i, value)
    }
)

setMethod(".recv_all", "MyCustomBackend",
    function(backend) {
        replicate(length(backend$workers), .recv_any(backend),
                  simplify = FALSE)
    }
)

# Implement task manager (dispatches on the parameter class)
setMethod(".manager", "MyCustomParam",
    function(BPPARAM) {
        manager <- .TaskManager()
        manager$BPPARAM <- BPPARAM
        manager$backend <- bpbackend(BPPARAM)
        manager$capacity <- bpnworkers(BPPARAM)
        manager
    }
)

# Register as default
register(MyCustomParam(workers = 4))
```

--------------------------------

### Batchtools Resource Example for SGE

Source: https://github.com/bioconductor/biocparallel/blob/devel/_autodocs/04-types.md

Provides an example of resource specifications for the SGE scheduler within BatchtoolsParam, including parallel threads, memory in GB, and queue name.

```r
list(
    parallel.threads = 4,
    memory = 4,  # GB
    queue = "long.q"
)
```

--------------------------------

### SnowParam Usage Example

Source: https://github.com/bioconductor/biocparallel/blob/devel/_autodocs/01-api-reference-param-classes.md

Demonstrates how to create a SnowParam object for a socket (SOCK) cluster and use it with bplapply for parallel execution. Socket clusters work on all platforms, including Windows.

```r
# Socket cluster (cross-platform)
param <- SnowParam(
    workers = 4,
    type = "SOCK",
    exportglobals = TRUE
)
result <- bplapply(1:100, sqrt, BPPARAM = param)
```

--------------------------------

### Batchtools Resource Example for SLURM

Source: https://github.com/bioconductor/biocparallel/blob/devel/_autodocs/04-types.md

Provides an example of resource specifications for the SLURM scheduler within BatchtoolsParam, including walltime, memory, core count, account, and partition.

```r
list(
    walltime = 3600,  # seconds
    memory = 4000,    # MB
    cores = 4,
    account = "myaccount",
    partition = "gpu"
)
```

--------------------------------

### Basic Parallel Evaluation with bplapply

Source: https://github.com/bioconductor/biocparallel/blob/devel/_autodocs/00-index.md

Demonstrates using bplapply with the default backend and with an explicitly configured MulticoreParam. Also includes an example of error handling with SerialParam.
```r
library(BiocParallel)

# Use the default parallel backend
result <- bplapply(1:100, sqrt)

# Or specify one explicitly
param <- MulticoreParam(workers = 4, progressbar = TRUE)
result <- bplapply(1:100, sqrt, BPPARAM = param)

# With error handling: stop.on.error = FALSE evaluates every element,
# and bptry() returns the partial results instead of signalling an error
result <- bptry(bplapply(
    1:100,
    function(x) if (x == 50) stop("error") else x,
    BPPARAM = SerialParam(stop.on.error = FALSE)
))

# Check results
ok <- bpok(result)
good_results <- result[ok]
```

--------------------------------

### bplapply with Additional Arguments

Source: https://github.com/bioconductor/biocparallel/blob/devel/_autodocs/02-api-reference-evaluate.md

Shows how to pass additional arguments to the function applied by bplapply. This example uses serial execution.

```r
# With additional arguments: y = 10 is passed to every call of FUN
result <- bplapply(
    c(1, 2, 3),
    function(x, y) x + y,
    y = 10,
    BPPARAM = SerialParam()
)
```

--------------------------------

### Common Initialization for Backend

Source: https://github.com/bioconductor/biocparallel/blob/devel/_autodocs/07-developer-interface.md

Performs the initialization steps common to all backends once a backend has started: setting up the RNG stream, registering a finalizer, and initializing logging. It is called by backend-specific bpstart methods.

```r
.bpstart_impl(x)
```

--------------------------------

### bpmapply with Additional Constant Arguments

Source: https://github.com/bioconductor/biocparallel/blob/devel/_autodocs/02-api-reference-evaluate.md

Shows how to pass non-vectorized arguments to bpmapply through the MoreArgs parameter. This example uses serial execution.
```r
a <- c(1, 2, 3)
b <- c(10, 20, 30)
result <- bpmapply(
    function(x, y, mult) (x + y) * mult,
    a, b,
    MoreArgs = list(mult = 2),
    BPPARAM = SerialParam()
)
```

--------------------------------

### Setup Worker Logging

Source: https://github.com/bioconductor/biocparallel/blob/devel/_autodocs/05-configuration.md

Configures a SnowParam object to enable logging, specifying a temporary directory for log files, a logging threshold, and a job name. Log files are named after the job name and task number.

```r
# Create temp directory for logs
logdir <- tempdir()

# Create parameter with logging
param <- SnowParam(
    workers = 4,
    log = TRUE,
    logdir = logdir,
    threshold = "DEBUG",
    jobname = "analysis"
)

# Log files created as:
# /.task1.log
# /.task2.log
# ...
```

--------------------------------

### Configure Task Chunking for bpvec

Source: https://github.com/bioconductor/biocparallel/blob/devel/_autodocs/05-configuration.md

Splits the input into a specified number of tasks for processing with bpvec. This example splits 10000 elements into 20 tasks.

```r
result <- bpvec(
    1:10000,
    function(x) x * 2,
    AGGREGATE = c,
    BPPARAM = MulticoreParam(workers = 4, tasks = 20)
)
# 10000 elements split into 20 tasks of ~500 each
```

--------------------------------

### BiocParallel Utility Functions

Source: https://github.com/bioconductor/biocparallel/blob/devel/_autodocs/DOCUMENTATION_SUMMARY.txt

Documentation for utility functions that help manage BiocParallel parameters, options, and errors. This includes `bpparam` to get default parameters, `register` and `registered` for parameter management, `bpoptions` for creating option overrides, `bpnworkers` to get the worker count, `bptry` for error trapping, `bpok` to identify error-free elements, and `bpresult` to extract results. Worker and Batchtools helpers are also covered, as is `bpvalidate` for checking that a function can be evaluated on workers.
```APIDOC
## BiocParallel Utility Functions

This section covers utility functions for managing BiocParallel parameters, options, and errors.

### bpparam()
- **Description**: Retrieves the default BiocParallel parameter object, or the registered object of a given class.
- **Signature**: `bpparam(bpparamClass)`
- **Parameters**: `bpparamClass` (character) - Optional class name of a registered parameter, e.g. `"SerialParam"`.
- **Return Type**: A BiocParallelParam object.

### register()
- **Description**: Registers a BiocParallelParam object for use with `bpparam()`.
- **Signature**: `register(BPPARAM, default = TRUE)`
- **Parameters**: `BPPARAM` (BiocParallelParam) - The parameter object to register. `default` (logical) - Whether to make it the default returned by `bpparam()`.

### registered()
- **Description**: Lists the registered BiocParallelParam objects.
- **Signature**: `registered(bpparamClass)`
- **Parameters**: `bpparamClass` (character) - Optional class name, to return a single registered object.
- **Return Type**: A named list of registered parameter objects, or a single object when `bpparamClass` is given.

### bpoptions()
- **Description**: Creates a list of option overrides, passed to the evaluation functions through their `BPOPTIONS` argument.
- **Signature**: `bpoptions(...)`
- **Parameters**: `...` - Named arguments specifying the options to override.
- **Return Type**: A list of options.

### bpnworkers()
- **Description**: Returns the number of workers specified by a BiocParallelParam object.
- **Signature**: `bpnworkers(param)`
- **Parameters**: `param` (BiocParallelParam) - The parameter object.
- **Return Type**: An integer giving the number of workers.

### bptry()
- **Description**: Evaluates an expression, trapping `bplist_error` and `bperror` conditions so that partial results can be recovered.
- **Signature**: `bptry(expr, ..., bplist_error, bperror)`
- **Parameters**: `expr` (expression) - The expression to evaluate. `bplist_error`, `bperror` (function) - Optional handlers; by default a `bplist_error` is converted to its list of partial results.
- **Return Type**: The value of `expr`, or the partial result list when a `bplist_error` is trapped.

### bpok()
- **Description**: Checks which elements of a result list are error-free.
- **Signature**: `bpok(x, type = bperrorTypes())`
- **Parameters**: `x` (list) - The result from a parallel evaluation. `type` (character) - The error classes to treat as failures.
- **Return Type**: A logical vector indicating which elements are OK.

### bpresult()
- **Description**: Extracts the list of results carried by a `bplist_error`, including the elements that failed.
- **Signature**: `bpresult(x)`
- **Parameters**: `x` (bplist_error) - The error signalled by a failed parallel evaluation.
- **Return Type**: A list of results in which failed elements are condition objects.

### bperrorTypes()
- **Description**: Returns the names of the error classes defined by BiocParallel.
- **Signature**: `bperrorTypes()`
- **Return Type**: A character vector of error class names.

### bpvalidate()
- **Description**: Checks whether a function can be evaluated on workers, by looking for symbols it uses that are undefined or available only in the user's workspace.
- **Signature**: `bpvalidate(fun)`
- **Parameters**: `fun` (function) - The function to validate.
- **Return Type**: A summary of the symbols that may not be available on the workers.
```

--------------------------------

### Set and Get Timeout Value

Source: https://github.com/bioconductor/biocparallel/blob/devel/_autodocs/05-configuration.md

Demonstrates how to set a worker timeout, in seconds, and how to get or change it with the accessor functions.

```r
# Constructor argument, in seconds (default is WORKER_TIMEOUT)
param <- MulticoreParam(timeout = 300)

# Accessors
bptimeout(param)         # get timeout
bptimeout(param) <- 600  # set timeout
```

--------------------------------

### Parameter Accessors (get)

Source: https://github.com/bioconductor/biocparallel/blob/devel/_autodocs/00-index.md

Functions to retrieve the settings of a parallel parameter object.

```APIDOC
## Parameter Accessors (get)

### Functions
- **bpworkers()**: Returns the number of workers.
- **bptasks()**: Returns the number of tasks.
- **bpjobname()**: Returns the job name.
- **bplog()**: Returns whether logging is enabled.
- **bplogdir()**: Returns the log directory.
- **bpthreshold()**: Returns the log threshold.
- **bpresultdir()**: Returns the result directory.
- **bpstopOnError()**: Returns the stop on error flag.
- **bptimeout()**: Returns the timeout value.
- **bpexportglobals()**: Returns the export globals flag.
- **bpexportvariables()**: Returns the export variables flag.
- **bpprogressbar()**: Returns the progress bar flag.
- **bpRNGseed()**: Returns the RNG seed.
- **bpforceGC()**: Returns the force GC flag.
- **bpfallback()**: Returns the fallback flag.
- **bpschedule()**: Returns the scheduling capability.
- **bpbackend()**: Returns the backend object.
```

--------------------------------

### Catching not_available_error

Source: https://github.com/bioconductor/biocparallel/blob/devel/_autodocs/06-errors.md

Example of catching a not_available_error when attempting to use an unsupported feature, such as bpiterate with DoparParam.

```r
# DoparParam doesn't support bpiterate
tryCatch(
    bpiterate(1:10, sqrt, BPPARAM = DoparParam()),
    not_available_error = function(e) {
        message("Feature unavailable: ", conditionMessage(e))
    }
)
# Output: "'bpiterate' not supported for DoparParam"
```

--------------------------------

### BiocParallel Developer API Overview

Source: https://github.com/bioconductor/biocparallel/blob/devel/_autodocs/DOCUMENTATION_SUMMARY.txt

This section details the developer API for BiocParallel, including all generics, methods, and utility functions. It outlines implementation patterns and provides examples and source references for users who need to extend or deeply integrate with the package.

```APIDOC
## Developer API

### Description
Provides access to all generics, methods, and utility functions within the BiocParallel package. This API is designed for developers who need to understand implementation patterns, extend functionality, or integrate BiocParallel with custom backends.

### Generics and Methods
All exported generics and their corresponding methods are documented, detailing their signatures, parameters, and return types.
### Utility Functions
Includes documentation for all utility functions that support the core functionality of BiocParallel, such as task scheduling, logging, and environment management.

### Implementation Patterns
Explains common implementation patterns used within BiocParallel, offering insights into how to structure custom backends or contribute to the package.

### Examples and Source References
Provides practical code examples demonstrating realistic usage patterns, error handling, cross-platform considerations, debugging techniques, and custom backend implementations. Source file and line number references are included where applicable for key definitions.
```

--------------------------------

### BiocParallel Evaluation Functions

Source: https://github.com/bioconductor/biocparallel/blob/devel/_autodocs/DOCUMENTATION_SUMMARY.txt

Documentation for functions that evaluate tasks in parallel. This includes `bplapply` for applying a function over a list or vector, `bpmapply` for applying a function over multiple arguments, `bpvec` for applying a function to chunks of a vector and aggregating the results, `bpvectorize` for creating vectorized functions, `bpiterate` for iterator-based evaluation, and `bpaggregate` for split-apply-combine operations. Each function's signature, parameters, return type, and examples are provided.

```APIDOC
## BiocParallel Evaluation Functions

This section describes the core functions for evaluating tasks in parallel using BiocParallel.

### bplapply()
- **Description**: Applies a function over a list or vector in parallel.
- **Signature**: `bplapply(X, FUN, ..., BPPARAM = bpparam())`
- **Parameters**: `X` (list or vector) - The input object. `FUN` (function) - The function to apply. `...` - Additional arguments to `FUN`. `BPPARAM` (BiocParallelParam) - The parallelization parameters.
- **Return Type**: A list of the same length (and names) as `X`, containing the results of applying `FUN` to each element.
### bpmapply()
- **Description**: Applies a function to corresponding elements of multiple lists or vectors in parallel.
- **Signature**: `bpmapply(FUN, ..., MoreArgs = list(), SIMPLIFY = TRUE, USE.NAMES = TRUE, BPPARAM = bpparam())`
- **Parameters**: `FUN` (function) - The function to apply. `...` - Lists or vectors of arguments. `MoreArgs` (list) - Additional, non-vectorized arguments to `FUN`. `SIMPLIFY` (logical) - Whether to simplify the result. `USE.NAMES` (logical) - Whether to use names for the result. `BPPARAM` (BiocParallelParam) - The parallelization parameters.
- **Return Type**: A list of results, simplified to a vector or array when `SIMPLIFY = TRUE` and simplification is possible.

### bpvec()
- **Description**: Applies `FUN` to chunks of the vector `X` in parallel and combines the chunk results with `AGGREGATE`.
- **Signature**: `bpvec(X, FUN, ..., AGGREGATE = c, BPPARAM = bpparam())`
- **Parameters**: `X` (vector) - The input vector. `FUN` (function) - The function to apply; it must return a result of the same length as its first argument. `...` - Additional arguments to `FUN`. `AGGREGATE` (function) - Combines the chunk results. `BPPARAM` (BiocParallelParam) - The parallelization parameters.
- **Return Type**: The aggregated result, by default a vector of the same length as `X`.

### bpvectorize()
- **Description**: Creates a version of `FUN` whose first argument is split into chunks and evaluated in parallel with `bpvec()`.
- **Signature**: `bpvectorize(FUN, ..., BPPARAM = bpparam())`
- **Parameters**: `FUN` (function) - The function to vectorize. `...` - Additional arguments. `BPPARAM` (BiocParallelParam) - The parallelization parameters.
- **Return Type**: A vectorized function.

### bpiterate()
- **Description**: Applies `FUN` to successive chunks of data returned by an iterator function; suitable for large datasets or streaming input.
- **Signature**: `bpiterate(ITER, FUN, ..., REDUCE, init, BPPARAM = bpparam())`
- **Parameters**: `ITER` (function) - A function with no arguments that returns the next chunk of data, or `NULL` when the data are exhausted. `FUN` (function) - The function to apply to each chunk. `...` - Additional arguments to `FUN`. `REDUCE` (function) - Optional function that combines results as they are returned. `init` - Optional initial value for `REDUCE`. `BPPARAM` (BiocParallelParam) - The parallelization parameters.
- **Return Type**: A list of results, or the reduced value when `REDUCE` is supplied.

### bpaggregate()
- **Description**: Performs a split-apply-combine operation in parallel.
- **Signature**: `bpaggregate(x, by, FUN, ..., BPPARAM = bpparam())`
- **Parameters**: `x` (vector or list) - The data to aggregate. `by` (vector or list) - The grouping variable. `FUN` (function) - The aggregation function. `...` - Additional arguments. `BPPARAM` (BiocParallelParam) - The parallelization parameters.
- **Return Type**: A list or vector of aggregated results.
```

--------------------------------

### BiocParallel Parameter Classes

Source: https://github.com/bioconductor/biocparallel/blob/devel/_autodocs/DOCUMENTATION_SUMMARY.txt

Documentation for BiocParallel parameter classes used to configure parallel execution. This includes SerialParam for serial execution, MulticoreParam for fork-based parallelism, SnowParam for SNOW clusters, TransientMulticoreParam for automatically managed multicore workers, DoparParam for foreach integration, and BatchtoolsParam for HPC schedulers. The constructor, parameters, and examples of each class are detailed.

```APIDOC
## BiocParallel Parameter Classes

This section details the parameter classes available in BiocParallel for configuring parallel execution strategies.

### SerialParam
- **Description**: Manages serial (single-process) execution.
- **Constructor**: `SerialParam()`

### MulticoreParam
- **Description**: Manages fork-based parallel execution on Unix-like systems.
- **Constructor**: `MulticoreParam(workers = multicoreWorkers(), ...)`
- **Parameters**: `workers` (integer) - Number of parallel processes.

### SnowParam
- **Description**: Manages parallel execution using SNOW clusters (SOCK, MPI, FORK).
- **Constructor**: `SnowParam(workers = snowWorkers(type), type = c("SOCK", "MPI", "FORK"), ...)`
- **Parameters**: `workers` (integer) - Number of parallel workers. `type` (character) - Type of cluster: "SOCK" (the default), "MPI", or "FORK".

### TransientMulticoreParam
- **Description**: Manages multicore execution whose workers are started and stopped automatically.
- **Constructor**: `TransientMulticoreParam(workers = 1L, ...)`
- **Parameters**: `workers` (integer) - Number of parallel processes.
### DoparParam
- **Description**: Integrates with the `foreach` package for parallel iteration.
- **Constructor**: `DoparParam()`

### BatchtoolsParam
- **Description**: Manages execution on High-Performance Computing (HPC) schedulers using the `batchtools` package.
- **Constructor**: `BatchtoolsParam(cluster = batchtoolsCluster(), template = batchtoolsTemplate(cluster), ...)`
- **Parameters**: `cluster` (character) - Type of cluster or scheduler, e.g. "socket" or "slurm". `template` (character) - Path to the `batchtools` job template.

### BiocParallelParam (Virtual Base Class)
- **Description**: The virtual base class for all parameter types, providing generic accessor and lifecycle methods.
```

--------------------------------

### BiocParallelParam Lifecycle Methods

Source: https://github.com/bioconductor/biocparallel/blob/devel/_autodocs/01-api-reference-param-classes.md

These methods manage the operational state of a BiocParallel backend: `bpstart` initializes the backend, `bpstop` cleans it up, and `bpisup` checks its status.

```r
bpstart(x)  # Initialize/start the backend
bpstop(x)   # Stop and clean up the backend
bpisup(x)   # Check if backend is running (returns logical)
```

--------------------------------

### TransientMulticoreParam

Source: https://github.com/bioconductor/biocparallel/blob/devel/_autodocs/01-api-reference-param-classes.md

A MulticoreParam that automatically starts and stops its worker processes with each call, simplifying lifecycle management.

````APIDOC
## TransientMulticoreParam

MulticoreParam that automatically starts and stops with each call.

### Constructor
```r
TransientMulticoreParam(param)
```

### Parameters
| Parameter | Type | Description |
|-----------|------|-------------|
| `param` | BiocParallelParam | Parameter object to convert |

### Usage Example
```r
param <- TransientMulticoreParam(
    MulticoreParam(workers = 4)
)
result <- bplapply(1:100, sqrt, BPPARAM = param)
```

### Notes
- Automatically calls bpstart() on creation
- Automatically calls bpstop() when finished
- Cleans up worker processes automatically
- No manual lifecycle management needed
````

--------------------------------

### Catching and handling remote_error

Source: https://github.com/bioconductor/biocparallel/blob/devel/_autodocs/06-errors.md

Example of inspecting the remote_error conditions returned for failed elements and accessing their traceback, if available. This is useful for debugging worker failures.

```r
# bplapply() signals a bplist_error when an element fails; bptry()
# returns the partial results instead, with failed elements stored as
# remote_error conditions
result <- bptry(bplapply(
    1:10,
    function(x) if (x == 5) stop("Worker error") else x,
    BPPARAM = SerialParam()
))

for (e in result[!bpok(result, type = "remote_error")]) {
    message("Remote error: ", conditionMessage(e))
    if (!is.null(attr(e, "traceback"))) {
        cat("Worker traceback:\n")
        cat(attr(e, "traceback"), sep = "\n")
    }
}
```

--------------------------------

### Get Number of Workers

Source: https://github.com/bioconductor/biocparallel/blob/devel/_autodocs/03-api-reference-utilities.md

Retrieves the number of workers configured for a BiocParallelParam object, which indicates the available parallelism.

```r
param <- MulticoreParam(workers = 4)
n <- bpnworkers(param)  # returns 4

param <- SerialParam()
n <- bpnworkers(param)  # returns 1

param <- DoparParam()
n <- bpnworkers(param)  # returns registered backend workers
```

--------------------------------

### Basic bplapply Usage

Source: https://github.com/bioconductor/biocparallel/blob/devel/_autodocs/02-api-reference-evaluate.md

Demonstrates the basic use of bplapply, applying sqrt to a sequence of numbers in parallel with two worker processes.
```r
library(BiocParallel)

# Basic usage
result <- bplapply(
    1:10,
    sqrt,
    BPPARAM = MulticoreParam(workers = 2)
)
```

--------------------------------

### Implement .manager for MyParam

Source: https://github.com/bioconductor/biocparallel/blob/devel/_autodocs/07-developer-interface.md

Provides an implementation of the .manager generic for a custom 'MyParam' class. It initializes a TaskManager with the backend and its capacity.

```r
setMethod(".manager", "MyParam",
    function(BPPARAM) {
        manager <- .TaskManager()
        manager$BPPARAM <- BPPARAM
        manager$backend <- bpbackend(BPPARAM)
        manager$capacity <- length(manager$backend)
        manager
    }
)
```

--------------------------------

### Configure Batchtools Registry Arguments

Source: https://github.com/bioconductor/biocparallel/blob/devel/_autodocs/05-configuration.md

Sets up registry arguments for batchtools, specifying the file directory and a seed for reproducibility, and keeping the registry from becoming the batchtools default registry.

```r
registryargs <- batchtoolsRegistryargs(
    file.dir = "/scratch/registry",
    seed = 12345,
    make.default = FALSE
)
param <- BatchtoolsParam(
    cluster = "slurm",
    registryargs = registryargs
)
```

--------------------------------

### Get Default Multicore Workers

Source: https://github.com/bioconductor/biocparallel/blob/devel/_autodocs/03-api-reference-utilities.md

Retrieves the default number of multicore workers for the system. Useful for initializing parallel processing parameters.

```r
n <- multicoreWorkers()  # depends on the number of cores available
param <- MulticoreParam(workers = multicoreWorkers())
```

--------------------------------

### Options and Configuration Functions

Source: https://github.com/bioconductor/biocparallel/blob/devel/_autodocs/00-index.md

Functions for setting and retrieving options and configuration for parallel processing.

```APIDOC
## Options and Configuration

### Functions
- **bpoptions()**: Creates option overrides for parallel execution.
- **bpnworkers()**: A helper function to get the worker count.
- **batchtoolsCluster()**: Determines the type of batchtools cluster.
- **batchtoolsTemplate()**: Retrieves the batchtools job template.
- **batchtoolsRegistryargs()**: Builds registry arguments for batchtools.
```

--------------------------------

### Error Handling with bptry

Source: https://github.com/bioconductor/biocparallel/blob/devel/_autodocs/00-index.md

Demonstrates graceful error trapping using bptry with bplapply and a MulticoreParam configured not to stop on errors. Includes checking results and identifying failures.

```r
# Trap errors gracefully: with stop.on.error = FALSE every element is
# evaluated, and bptry() returns the partial results on failure
result <- bptry(
    bplapply(
        1:1000,
        myfunction,
        BPPARAM = MulticoreParam(stop.on.error = FALSE)
    )
)

ok <- bpok(result)
if (!all(ok)) {
    failed <- which(!ok)
    cat(sprintf("Failed: %d of %d\n", length(failed), length(result)))
}
```

--------------------------------

### Get Available Error Type Names with bperrorTypes

Source: https://github.com/bioconductor/biocparallel/blob/devel/_autodocs/06-errors.md

Retrieves the names of all recognized error types, which can be used with the 'type' argument of bpok.

```r
types <- bperrorTypes()
# Returns: "bperror", "remote_error",
#   "unevaluated_error",
#   "not_available_error", "worker_comm_error"
```

--------------------------------

### Get Registered BiocParallel Parameters

Source: https://github.com/bioconductor/biocparallel/blob/devel/_autodocs/03-api-reference-utilities.md

Retrieves all registered BiocParallelParam objects, or the one of a specific class. Useful for inspecting the available parallel backends.

```r
# Get all registered params
all_params <- registered()

# Get specific param
serial_param <- registered("SerialParam")

# List available classes
names(registered())
```

--------------------------------

### Configure Task Scheduling (Default)

Source: https://github.com/bioconductor/biocparallel/blob/devel/_autodocs/05-configuration.md

Sets each worker to handle one task, which is the default behavior.
This configuration is suitable when the number of tasks should equal the number of workers.

```r
# Fine-grained control via tasks
param <- MulticoreParam(
    workers = 4,
    tasks = 0  # default: tasks = workers
)
# Each worker gets one task (4 tasks total for 4 workers)
# Larger tasks reduce overhead but allow less dynamic load balancing
```

--------------------------------

### Set Custom Default MulticoreParam

Source: https://github.com/bioconductor/biocparallel/blob/devel/_autodocs/05-configuration.md

Overrides the default MulticoreParam used by bpparam(). This example sets custom values for the number of workers, the progress bar, and the timeout.

```r
library(BiocParallel)

# Set custom default parameter
options(
    MulticoreParam = MulticoreParam(
        workers = 8,
        progressbar = TRUE,
        timeout = 7200
    )
)

# Now bpparam() uses the custom setting
param <- bpparam()
```

--------------------------------

### Implement .recv_any for MyBackend

Source: https://github.com/bioconductor/biocparallel/blob/devel/_autodocs/07-developer-interface.md

Provides an implementation of the .recv_any generic for a custom 'MyBackend' class. It waits for a result from any worker and returns it.

```r
setMethod(".recv_any", "MyBackend",
    function(backend) {
        # Wait for a result from any worker and return it as
        # list(node = worker_id, value = result)
        result <- NULL  # placeholder for the received value
        list(node = 1L, value = result)
    }
)
```

--------------------------------

### Set Environment Variables for SnowParam/Networking

Source: https://github.com/bioconductor/biocparallel/blob/devel/_autodocs/05-configuration.md

Configures SnowParam networking for BiocParallel through environment variables: MASTER specifies the manager hostname, and R_PARALLEL_PORT (or PORT) sets the manager port.
```bash
# Set a fixed port and hostname for the manager
export R_PARALLEL_PORT=11234
export MASTER=compute-node-1

# Run R with BiocParallel
Rscript parallel_analysis.R
```

--------------------------------

### Get Default Batchtools Workers

Source: https://github.com/bioconductor/biocparallel/blob/devel/_autodocs/03-api-reference-utilities.md

Retrieves the default number of workers for a specified batchtools cluster type. Useful for setting up batchtools parallel execution.

```r
n <- batchtoolsWorkers("socket")
n <- batchtoolsWorkers("slurm")
```

--------------------------------

### Apply Function to Paired Elements with bpmapply

Source: https://github.com/bioconductor/biocparallel/blob/devel/_autodocs/02-api-reference-evaluate.md

Demonstrates the basic use of bpmapply to apply a function to corresponding elements of two vectors, with a specified number of workers.

```r
a <- c(1, 2, 3)
b <- c(10, 20, 30)
result <- bpmapply(
    function(x, y) x + y,
    a, b,
    BPPARAM = MulticoreParam(workers = 2)
)
```

--------------------------------

### Define Minimal Custom Backend in R

Source: https://github.com/bioconductor/biocparallel/blob/devel/_autodocs/07-developer-interface.md

This R code demonstrates how to define a minimal custom backend for BiocParallel by setting up a parameter class and constructor and implementing the essential backend control and communication methods.

```r
library(BiocParallel)

# 1. Define parameter class (a reference class, like BiocParallelParam)
.MyParam <- setRefClass("MyParam",
    contains = "BiocParallelParam",
    fields = list(backend = "ANY")
)

# 2. Constructor
MyParam <- function(workers = 2, ...) {
    prototype <- .prototype_update(
        .BiocParallelParam_prototype,
        workers = as.integer(workers),
        ...
    )
    x <- do.call(.MyParam, prototype)
    validObject(x)
    x
}

# 3. Implement backend control
setMethod("bpstart", "MyParam", function(x, ...) {
    # Initialize backend; its class selects the communication methods
    x$backend <- structure(list(), class = "MyBackend")
    .bpstart_impl(x)
})

setMethod("bpstop", "MyParam", function(x) {
    x$backend <- NULL  # Clean up
    .bpstop_impl(x)
})

setMethod("bpisup", "MyParam", function(x) {
    !is.null(bpbackend(x))
})

setMethod("bpbackend", "MyParam", function(x) {
    x$backend
})

# 4. Implement communication (dispatches on the backend class)
setOldClass("MyBackend")

setMethod(".send_to", "MyBackend",
    function(backend, node, value) {
        # Custom send logic
        TRUE
    }
)

setMethod(".recv_any", "MyBackend",
    function(backend) {
        # Custom receive logic; return list(node = worker_id, value = result)
        result <- NULL  # placeholder for the received value
        list(node = 1L, value = result)
    }
)

# 5. Implement task manager
setMethod(".manager", "MyParam",
    function(BPPARAM) {
        manager <- .TaskManager()
        manager$BPPARAM <- BPPARAM
        manager$backend <- bpbackend(BPPARAM)
        manager$capacity <- bpnworkers(BPPARAM)
        manager
    }
)
```

--------------------------------

### Aggregation Function Signature

Source: https://github.com/bioconductor/biocparallel/blob/devel/_autodocs/04-types.md

Signature of an aggregation function that combines the results of multiple tasks into a single result. Examples include `c`, `sum`, `rbind`, and `list`.

```r
function(...) {
    # combine and return single result
}
```

--------------------------------

### Get Default Snow Workers

Source: https://github.com/bioconductor/biocparallel/blob/devel/_autodocs/05-configuration.md

Determines the default number of workers for SnowParam based on the cluster type. SOCK and FORK use the multicore defaults, while MPI is limited by the system.

```r
snowWorkers(type)
```

--------------------------------

### Customizing BiocParallelParam Prototype

Source: https://github.com/bioconductor/biocparallel/blob/devel/_autodocs/07-developer-interface.md

Shows how to extend the base BiocParallelParam prototype with custom fields and values.

```r
.MyParam_prototype <- c(
    list(custom_field = "value"),
    .BiocParallelParam_prototype
)
```

--------------------------------

### Create and Use a Vectorized Function with bpvectorize

Source: https://github.com/bioconductor/biocparallel/blob/devel/_autodocs/02-api-reference-evaluate.md

Demonstrates how to create a reusable parallel vectorized function with bpvectorize and apply it to data. This is useful for building custom parallel operations.
```r
# Create vectorized function; only the first argument is split into
# chunks, other arguments are passed whole to every chunk
vfun <- bpvectorize(
    function(x, y) x + y,
    BPPARAM = MulticoreParam(workers = 2)
)

# Use vectorized function
result <- vfun(1:10, y = 100)
```

--------------------------------

### Configure No Timeout

Source: https://github.com/bioconductor/biocparallel/blob/devel/_autodocs/05-configuration.md

Shows how to configure parameters to ignore timeouts, using NA_integer_ for an infinite wait. This is particularly relevant for MulticoreParam and BatchtoolsParam.

```r
# No timeout (wait forever)
param <- MulticoreParam(timeout = NA_integer_)

# Infinite wait (batchtools only)
param <- BatchtoolsParam(timeout = NA_integer_)
```

--------------------------------

### Get Batchtools Job Template

Source: https://github.com/bioconductor/biocparallel/blob/devel/_autodocs/03-api-reference-utilities.md

Retrieves the path to the default job template file for a given batchtools cluster type. Templates are not needed for interactive or socket-based clusters.

```r
# Get template file path
tmpl <- batchtoolsTemplate("slurm")   # returns path to slurm-simple.tmpl
tmpl <- batchtoolsTemplate("socket")  # returns NA (no template needed)
```

--------------------------------

### Configure Task Scheduling (Many Small Tasks)

Source: https://github.com/bioconductor/biocparallel/blob/devel/_autodocs/05-configuration.md

Creates many small tasks to enable better load balancing, at the cost of more task scheduling overhead. Suitable for dynamic load distribution.

```r
param <- MulticoreParam(
    workers = 4,
    tasks = 100  # create 100 small tasks
)
# Many small tasks enable better load balancing
# but increase task scheduling overhead
```

--------------------------------

### Get Default Multicore Workers

Source: https://github.com/bioconductor/biocparallel/blob/devel/_autodocs/05-configuration.md

Determines the default number of workers for MulticoreParam. On Unix and macOS, this is the minimum of the system default and the connection limit; on Windows, it is always 1.
```r
multicoreWorkers()  # calls .defaultWorkers()
```

--------------------------------

### Get Default BiocParallelParam

Source: https://github.com/bioconductor/biocparallel/blob/devel/_autodocs/03-api-reference-utilities.md

Retrieve the current default BiocParallelParam for the session. This function can fetch either the general default or a specific registered parameter class. R options can influence the defaults.

```r
# Get current default
param <- bpparam()

# Get specific registered param
param <- bpparam("SerialParam")

# Set as option to change default
options(MulticoreParam = MulticoreParam(workers = 8))
param <- bpparam()  # returns modified MulticoreParam
```

--------------------------------

### NULLRegistry Placeholder (Batchtools)

Source: https://github.com/bioconductor/biocparallel/blob/devel/_autodocs/04-types.md

Represents an uninitialized batchtools registry. Used internally.

```r
structure(
    list(),
    class = c("NULLRegistry", "Registry")
)
```

--------------------------------

### Auto-detect Best Backend for Parallel Processing

Source: https://github.com/bioconductor/biocparallel/blob/devel/_autodocs/00-index.md

Automatically detects the best available backend for parallel execution on a single machine. Use this for general-purpose parallelization when no specific backend configuration is required.

```r
# Auto-detect best backend
param <- bpparam()
result <- bplapply(data, function(x) process(x), BPPARAM = param)
```

--------------------------------

### Get Default SNOW Cluster Workers

Source: https://github.com/bioconductor/biocparallel/blob/devel/_autodocs/03-api-reference-utilities.md

Retrieves the default number of workers for each SNOW cluster type (SOCK, MPI, FORK). Use this to configure SNOW-based parallel processing.
```r
n_sock <- snowWorkers("SOCK")
n_fork <- snowWorkers("FORK")
n_mpi  <- snowWorkers("MPI")
```

--------------------------------

### Get BiocParallel Error Types

Source: https://github.com/bioconductor/biocparallel/blob/devel/_autodocs/03-api-reference-utilities.md

Retrieve a character vector of all recognized BiocParallel error class names with `bperrorTypes()`. This is useful for specifying which error types to check when using functions such as `bpok()`.

```r
all_error_types <- bperrorTypes()

# Check if result contains specific error type
result <- bplapply(1:10, sqrt, BPPARAM = SerialParam())
ok <- bpok(result, type = "remote_error")
```

--------------------------------

### Catching bperror

Source: https://github.com/bioconductor/biocparallel/blob/devel/_autodocs/06-errors.md

Demonstrates how to use `tryCatch` to intercept any BiocParallel-specific error.

```r
tryCatch(
    bplapply(...),
    bperror = function(e) {
        message("BiocParallel error: ", conditionMessage(e))
    }
)
```

--------------------------------

### Build Batchtools Registry Arguments

Source: https://github.com/bioconductor/biocparallel/blob/devel/_autodocs/03-api-reference-utilities.md

Constructs arguments for `batchtools::makeRegistry`, allowing registry settings to be customized. Temporary directories are configured automatically, and the default `conf.file` and `seed` are set to NULL.

```r
# Get default registry args
args <- batchtoolsRegistryargs()

# Override some defaults
args <- batchtoolsRegistryargs(
    file.dir = "/tmp/my_registry",
    seed = 123
)

# Use with BatchtoolsParam
param <- BatchtoolsParam(
    registryargs = args
)
```

--------------------------------

### Get Default Batchtools Workers

Source: https://github.com/bioconductor/biocparallel/blob/devel/_autodocs/05-configuration.md

Determine the default number of workers for BatchtoolsParam, which depends on the specified cluster type. For cluster schedulers such as SGE, SLURM, and LSF, the number of workers must be specified manually.
```r
batchtoolsWorkers(cluster)
```

--------------------------------

### Load Packages in Workers using bpoptions

Source: https://github.com/bioconductor/biocparallel/blob/devel/_autodocs/05-configuration.md

Specify packages to load in the worker processes with `bpoptions(packages = ...)`. This ensures that functions from these packages are available inside the parallel computation without calling `library()` in `FUN`.

```r
# Load packages in workers via bpoptions
opts <- bpoptions(
    packages = c("dplyr", "ggplot2")
)

result <- bplapply(
    1:10,
    function(x) {
        # dplyr is loaded on each worker by bpoptions(packages = ...),
        # so no library() call is needed here
        tibble(value = x) %>% mutate(squared = value^2)
    },
    BPPARAM = param,
    BPOPTIONS = opts
)
```

--------------------------------

### Usage Examples for BiocParallel Errors

Source: https://github.com/bioconductor/biocparallel/blob/devel/_autodocs/04-types.md

This R code demonstrates common patterns for handling BiocParallel errors: extracting results, checking which elements succeeded, accessing the original message, and retrieving the traceback of a remote error.

```r
# Extract results from compound error
result <- bpresult(error_obj)

# Check which elements are OK
ok <- bpok(result)

# Access original error message
msg <- conditionMessage(error_obj)

# For remote_error, access traceback
if (inherits(error_obj, "remote_error")) {
    tb <- attr(error_obj, "traceback")
}
```

--------------------------------

### Create DoparParam

Source: https://github.com/bioconductor/biocparallel/blob/devel/_autodocs/01-api-reference-param-classes.md

Initialize a DoparParam for use with foreach and a doParallel backend. Register a parallel backend before use; this parameter inherits its worker count from the registered backend.
```r
library(foreach)
library(doParallel)

# Register parallel backend
registerDoParallel(4)

param <- DoparParam(stop.on.error = TRUE)
result <- bplapply(1:100, sqrt, BPPARAM = param)
```

--------------------------------

### Debugging with SerialParam and Logging

Source: https://github.com/bioconductor/biocparallel/blob/devel/_autodocs/00-index.md

Configures SerialParam for debugging with logging enabled. Shows how to set the log directory and threshold, run the tasks, and read the logs to diagnose issues before switching to parallel execution.

```r
# Start with serial execution
param <- SerialParam(
    log = TRUE,
    logdir = tempdir(),
    threshold = "DEBUG"
)
result <- bplapply(1:10, myfunction, BPPARAM = param)

# View logs to diagnose issues
logdir <- bplogdir(param)
logs <- list.files(logdir, full.names = TRUE)
lapply(logs, readLines)

# Once working, switch to parallel
param <- MulticoreParam(workers = 4)
result <- bplapply(1:100, myfunction, BPPARAM = param)
```

--------------------------------

### Create BiocParallel Options Overrides

Source: https://github.com/bioconductor/biocparallel/blob/devel/_autodocs/03-api-reference-utilities.md

Generate a list of options that temporarily override settings in a BiocParallelParam object. Use this when a specific computation needs different settings than the default backend.

```r
# Create options to override BPPARAM
opts <- bpoptions(
    workers = 8,
    progressbar = TRUE,
    timeout = 3600
)

# Use with evaluation function
param <- bpparam()
result <- bplapply(
    1:100, sqrt,
    BPPARAM = param,
    BPOPTIONS = opts
)
# Only specified options override; others use BPPARAM defaults
```

--------------------------------

### Create MulticoreParam

Source: https://github.com/bioconductor/biocparallel/blob/devel/_autodocs/01-api-reference-param-classes.md

Instantiate MulticoreParam for parallel execution using Unix `fork()`. Suitable for CPU-bound tasks on Unix-like systems.
```r
# Create multicore param
param <- MulticoreParam(
    workers = 4,
    progressbar = TRUE
)
result <- bplapply(1:100, sqrt, BPPARAM = param)
```

--------------------------------

### Configure Batchtools with Custom Template

Source: https://github.com/bioconductor/biocparallel/blob/devel/_autodocs/05-configuration.md

Sets a custom SLURM job submission template for batchtools. System templates are also available.

```r
param <- BatchtoolsParam(
    cluster = "slurm",
    template = "/path/to/custom-slurm-template.tmpl"
)

# System templates available at:
# system.file("templates", "slurm-simple.tmpl", package="batchtools")
# system.file("templates", "sge-simple.tmpl", package="batchtools")
# etc.
```

--------------------------------

### batchtoolsRegistryargs

Source: https://github.com/bioconductor/biocparallel/blob/devel/_autodocs/03-api-reference-utilities.md

Builds arguments suitable for `batchtools::makeRegistry`, allowing the default settings to be overridden.

```APIDOC
## batchtoolsRegistryargs

### Description
Build arguments for batchtools::makeRegistry.

### Signature
```r
batchtoolsRegistryargs(...)
```

### Parameters
| Parameter | Type | Description |
|-----------|------|-------------|
| `...` | any | Arguments to override defaults |

### Return Type
list. Arguments suitable for batchtools::makeRegistry(...).
### Code Example
```r
# Get default registry args
args <- batchtoolsRegistryargs()

# Override some defaults
args <- batchtoolsRegistryargs(
    file.dir = "/tmp/my_registry",
    seed = 123
)

# Use with BatchtoolsParam
param <- BatchtoolsParam(
    registryargs = args
)
```

### Notes
- Automatically sets a temporary file.dir and work directory
- Makes the registry non-default by setting make.default = FALSE
- Nullifies conf.file and seed by default (both can be overridden)
```

--------------------------------

### SnowParam Constructor

Source: https://github.com/bioconductor/biocparallel/blob/devel/_autodocs/01-api-reference-param-classes.md

The SnowParam constructor configures a Snow cluster backend for parallel execution. It accepts parameters that control worker configuration, error handling, logging, and more.

```APIDOC
## SnowParam Constructor

### Description
Initializes a SnowParam object to configure a Snow cluster backend for parallel processing. This backend supports several cluster types, including SOCK, MPI, and FORK.

### Parameters
#### Parameters
- **workers** (integer or character) - Default: `snowWorkers(type)` - Number of workers or host names.
- **type** (character) - Default: "SOCK" - Cluster type: "SOCK" (sockets), "MPI", or "FORK".
- **tasks** (integer) - Default: 0L - Number of tasks (0 = one task per worker).
- **stop.on.error** (logical) - Default: TRUE - Stop on first error.
- **progressbar** (logical) - Default: FALSE - Show progress bar.
- **RNGseed** (integer or NULL) - Default: NULL - RNG seed.
- **timeout** (integer) - Default: WORKER_TIMEOUT - Timeout in seconds.
- **exportglobals** (logical) - Default: TRUE - Export global variables.
- **exportvariables** (logical) - Default: TRUE - Export variables.
- **log** (logical) - Default: FALSE - Enable logging.
- **threshold** (character) - Default: "INFO" - Log threshold.
- **logdir** (character) - Default: NA_character_ - Log directory.
- **resultdir** (character) - Default: NA_character_ - Result directory.
- **jobname** (character) - Default: "BPJOB" - Job name.
- **force.GC** (logical) - Default: FALSE - Force garbage collection.
- **fallback** (logical) - Default: TRUE - Fall back to serial execution.
- **manager.hostname** (character) - Default: NA_character_ - Manager host name.
- **manager.port** (integer) - Default: NA_integer_ - Manager port.

### Cluster Types
- **SOCK**: Socket connections (works on all platforms, slightly slower).
- **MPI**: Message Passing Interface (requires the Rmpi package; used on HPC systems).
- **FORK**: Unix fork (like MulticoreParam; Unix only).

### Usage Example
```r
# Socket cluster (cross-platform)
param <- SnowParam(
    workers = 4,
    type = "SOCK",
    exportglobals = TRUE
)
result <- bplapply(1:100, sqrt, BPPARAM = param)
```

### Notes
- Most flexible cluster backend.
- Supports a character vector of host names for remote execution.
- Socket connections work across platforms.
- Requires explicit export of variables and packages.
```

--------------------------------

### Enable Logging for MulticoreParam

Source: https://github.com/bioconductor/biocparallel/blob/devel/_autodocs/06-errors.md

Configure MulticoreParam to log detailed debugging information. Specify the log directory and the message threshold, then read the logs after execution.

```r
param <- MulticoreParam(
    workers = 4,
    log = TRUE,
    logdir = tempdir(),
    threshold = "DEBUG",
    jobname = "debug"
)

result <- bplapply(1:10, function(x) {
    message("Processing ", x)
    x * 2
}, BPPARAM = param)

# View logs
logdir <- bplogdir(param)
logs <- list.files(logdir, pattern = "debug", full.names = TRUE)
lapply(logs, readLines)
```

--------------------------------

### SnowParam for Windows/Cross-Platform

Source: https://github.com/bioconductor/biocparallel/blob/devel/_autodocs/00-index.md

Configure SnowParam with the SOCK type for parallel processing on Windows or for cross-platform compatibility.
```r
param <- SnowParam(workers = 4, type = "SOCK")
```

--------------------------------

### Abstract bplapply Implementation

Source: https://github.com/bioconductor/biocparallel/blob/devel/_autodocs/07-developer-interface.md

The core logic of `bplapply`, designed to work with any BiocParallelParam. It splits the work, distributes tasks to workers, collects results, and manages errors.

```r
.bplapply_impl(
    X, FUN, ...,
    BPREDO = list(),
    BPPARAM = bpparam(),
    BPOPTIONS = bpoptions()
)
```

--------------------------------

### SnowParam Constructor

Source: https://github.com/bioconductor/biocparallel/blob/devel/_autodocs/01-api-reference-param-classes.md

Defines the SnowParam object for setting up a Snow cluster. Specifies the number of workers, cluster type, task distribution, error handling, and other configuration options.

```r
SnowParam(
    workers = snowWorkers(type),
    type = c("SOCK", "MPI", "FORK"),
    tasks = 0L,
    stop.on.error = TRUE,
    progressbar = FALSE,
    RNGseed = NULL,
    timeout = WORKER_TIMEOUT,
    exportglobals = TRUE,
    exportvariables = TRUE,
    log = FALSE,
    threshold = "INFO",
    logdir = NA_character_,
    resultdir = NA_character_,
    jobname = "BPJOB",
    force.GC = FALSE,
    fallback = TRUE,
    manager.hostname = NA_character_,
    manager.port = NA_integer_,
    ...
)
```

--------------------------------

### Export BiocParallel Methods

Source: https://github.com/bioconductor/biocparallel/blob/devel/_autodocs/07-developer-interface.md

Use `exportMethods()` in the NAMESPACE file to export the core BiocParallel generics and their methods. This ensures that functions such as `bpstart`, `bpstop`, and `bpworkers` are accessible.

```r
exportMethods(
    bpstart, bpstop, bpisup, bpbackend,
    bpworkers, "bpworkers<-",
    # ... other generics
)
```

--------------------------------

### Implement .send_to for MyBackend

Source: https://github.com/bioconductor/biocparallel/blob/devel/_autodocs/07-developer-interface.md

Provides an implementation of the `.send_to` generic for a custom `MyBackend` class. It sends the value to the specified worker node.

```r
setMethod(".send_to", "MyBackend",
    function(backend, node, value) {
        # Send value to worker node
        # Return TRUE on success
        TRUE
    }
)
```

--------------------------------

### Parallel Iteration with a Custom Iterator Function

Source: https://github.com/bioconductor/biocparallel/blob/devel/_autodocs/02-api-reference-evaluate.md

Shows how to use `bpiterate` with a custom iterator function to process elements in parallel. The iterator must return NULL when it is exhausted. This is memory-efficient for large sequences.

```r
# Define iterator function
iter <- function() {
    i <- 0
    function() {
        if (i < 10) {
            i <<- i + 1
            list(i, i * 2)
        } else {
            NULL
        }
    }
}

result <- bpiterate(
    iter(),
    function(x) sum(unlist(x)),
    BPPARAM = MulticoreParam(workers = 2)
)
```
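The iterator contract in the `bpiterate` example above can be checked without any parallel backend: call `ITER()` repeatedly and stop at the first `NULL`. The sketch below uses a hypothetical helper, `serial_iterate()`, which is not part of BiocParallel. It only mimics, serially, how `bpiterate()` is understood to consume the iterator. It also folds results with a `REDUCE` function and an optional `init` value, mirroring the `REDUCE` and `init` arguments that `bpiterate()` documents for combining results as workers return.

```r
# Hypothetical helper (NOT part of BiocParallel): a serial sketch of the
# iterator contract used by bpiterate(). ITER is called until it returns
# NULL; each chunk is passed to FUN. With REDUCE, results are folded into a
# running value; without it, a list of per-chunk results is returned.
serial_iterate <- function(ITER, FUN, ..., REDUCE, init) {
    has_reduce <- !missing(REDUCE)
    if (has_reduce) {
        acc <- if (missing(init)) NULL else init
        started <- !missing(init)
    } else {
        results <- list()
    }
    repeat {
        chunk <- ITER()
        if (is.null(chunk))
            break                       # iterator exhausted
        value <- FUN(chunk, ...)
        if (has_reduce) {
            # Fold the new value into the running result
            acc <- if (started) REDUCE(acc, value) else value
            started <- TRUE
        } else {
            results[[length(results) + 1L]] <- value
        }
    }
    if (has_reduce) acc else results
}

# Same iterator factory as in the bpiterate example above
iter <- function() {
    i <- 0
    function() {
        if (i < 10) {
            i <<- i + 1
            list(i, i * 2)
        } else {
            NULL
        }
    }
}

res_list <- serial_iterate(iter(), function(x) sum(unlist(x)))
res_sum  <- serial_iterate(iter(), function(x) sum(unlist(x)), REDUCE = `+`)
```

When `REDUCE` is used, only the current chunk and the running result are held in memory, which is the property that makes iterator-based evaluation suitable for large inputs.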