### Install Development Version of mudata from GitHub

Source: https://github.com/scverse/mudata/blob/main/docs/source/install.rst

Installs a pre-release or development version of mudata directly from its GitHub repository. This is useful for users who want to test the latest features or contribute to the project. Requires git to be installed.

```bash
pip install git+https://github.com/scverse/mudata
```

--------------------------------

### Install MuData

Source: https://github.com/scverse/mudata/blob/main/docs/source/notebooks/nuances.ipynb

Installs the mudata library using pip from within a notebook. This is the first step to using MuData and its functionalities.

```python
%pip install mudata
```

--------------------------------

### Development-Mode Install mudata using flit

Source: https://github.com/scverse/mudata/blob/main/docs/source/install.rst

Performs a development-mode installation of mudata using flit. This method symlinks the package files, allowing changes to be reflected immediately without reinstallation. Requires flit to be installed globally.

```bash
flit install -s
```

--------------------------------

### Install Stable mudata with pip

Source: https://github.com/scverse/mudata/blob/main/docs/source/install.rst

Installs the latest stable version of the mudata package from PyPI using pip. This is the recommended method for most users. Nothing beyond pip itself is required.

```bash
pip install mudata
```

--------------------------------

### Install MuData and Import Libraries

Source: https://github.com/scverse/mudata/blob/main/docs/source/notebooks/axes.ipynb

Installs the MuData library and imports necessary components like MuData and AnnData, along with utility libraries such as numpy and pandas for data manipulation and random number generation.
```python
%pip install mudata

import mudata as md
from mudata import MuData, AnnData

import numpy as np
import pandas as pd

np.random.seed(1)
```

--------------------------------

### Inspect MuData HDF5 File Structure (Shell)

Source: https://github.com/scverse/mudata/blob/main/DESIGN.md

Provides a command-line example to inspect the internal structure of a MuData HDF5 file. This allows users to verify how AnnData objects and their components (like 'X', 'obs', 'var') are organized within the HDF5 file, confirming the storage mechanism.

```sh
h5ls pbmc_10k.h5mu/mod/rna
# X        Group
# obs      Group
# var      Group
# ...
```

--------------------------------

### Pulling and dropping unique/non-unique observation columns in MuData

Source: https://github.com/scverse/mudata/blob/main/docs/source/notebooks/annotations_management.ipynb

Shows how to pull observation columns while optionally dropping them from the source. This example demonstrates the effect of `nonunique=False`, `unique=False`, and `drop=True`.

```python
mdata.pull_obs(nonunique=False, unique=False, drop=True)
print(mdata.obs.dtypes)
```

```text
mod1:qc    boolean
mod2:qc    boolean
mod3:qc    boolean
dtype: object
```

--------------------------------

### Create MuData with Concatenated Variables (Default axis)

Source: https://github.com/scverse/mudata/blob/main/docs/source/notebooks/axes.ipynb

Illustrates the default MuData creation when 'axis' is not specified. In this case, variables are concatenated, and observations are combined. This example uses two AnnData objects with distinct observation names.
```python
n1, n2, d = 100, 500, 1000

ad1 = AnnData(np.random.normal(size=(n1,d)))
ad2 = AnnData(np.random.normal(size=(n2,d)))

ad1.obs_names = [f"dat1-cell{i+1}" for i in range(n1)]
ad2.obs_names = [f"dat2-cell{i+1}" for i in range(n2)]

mdata = MuData({"dat1": ad1, "dat2": ad2})
print(mdata)
```

--------------------------------

### Push Non-Common Observation Annotations in MuData

Source: https://github.com/scverse/mudata/blob/main/docs/source/notebooks/annotations_management.ipynb

This example calls `push_obs()` with `common=False`, which skips common (non-prefixed) observation columns so that only modality-prefixed columns are pushed. The code then iterates through the modalities and prints their observation dtypes, showing that no annotations were pushed in this case because 'true' is a common column.

```python
mdata.push_obs(common=False)

for m in mdata.mod.keys():
    print(mdata[m].obs.dtypes)
```

--------------------------------

### Create MuData with Shared Observations (axis=0)

Source: https://github.com/scverse/mudata/blob/main/docs/source/notebooks/axes.ipynb

Demonstrates the default behavior of MuData where observations are shared (axis=0) and variables are concatenated. This example prepares two AnnData objects and combines them into a MuData object.

```python
n, d1, d2 = 100, 1000, 1500

ax = AnnData(np.random.normal(size=(n,d1)))
ay = AnnData(np.random.normal(size=(n,d2)))

# Same as: mdata = MuData({"x": ax, "y": ay})
mdata = MuData({"x": ax, "y": ay}, axis=0)
print(mdata)
```

--------------------------------

### Pull Variables Without Joining Non-Unique (Python)

Source: https://github.com/scverse/mudata/blob/main/docs/source/notebooks/annotations_management.ipynb

This example pulls the 'arange' variable column while ensuring that non-unique annotations are not joined across modalities. With `join_nonunique=False`, non-unique columns keep modality-specific prefixes. The output shows the data types of the prefixed columns.
```python
mdata.pull_var(columns=["arange"], join_nonunique=False)
mdata.var.dtypes
```

--------------------------------

### Pull Variables and Drop from Source (Python)

Source: https://github.com/scverse/mudata/blob/main/docs/source/notebooks/annotations_management.ipynb

This example demonstrates pulling variables while simultaneously dropping them from their original source modalities. By setting `nonunique=False`, `unique=False`, and `drop=True`, the common 'highly_variable' column is moved to the global annotation and removed from the individual modalities. The output shows the data type of the globally moved column.

```python
mdata.pull_var(nonunique=False, unique=False, drop=True)
mdata.var.dtypes
```

--------------------------------

### Initialize MuData Object

Source: https://github.com/scverse/mudata/blob/main/docs/source/io/mudata.rst

Imports the `MuData` class, the starting point for creating any multimodal dataset within the MuData framework.

```python
from mudata import MuData
```

--------------------------------

### Configure MuData Global Options

Source: https://context7.com/scverse/mudata/llms.txt

Explains how to set global configuration options for MuData using `mudata.set_options()`. This includes customizing display styles (text/html), controlling HTML display expansion levels, and managing annotation update behavior. Options can be set globally or temporarily using a context manager.
```python
import mudata as md
from mudata import set_options

# Assume mdata is a pre-defined MuData object

# Set options globally
md.set_options(display_style="html")

# Configure HTML display expansion (3-bit flag)
# Bit 2 (100): MuData slots expanded
# Bit 1 (010): modalities expanded
# Bit 0 (001): AnnData slots expanded
md.set_options(display_html_expand=0b010)

# Configure annotation behavior
md.set_options(pull_on_update=False)

# Use options as context manager
with md.set_options(display_style="html", display_html_expand=0b111):
    display(mdata)

# Access current option values
from mudata._core.config import OPTIONS
print(OPTIONS["display_style"])
print(OPTIONS["pull_on_update"])
```

--------------------------------

### Work with MuData Backed Objects

Source: https://context7.com/scverse/mudata/llms.txt

Illustrates how to use MuData's backed mode for handling large datasets that may not fit into memory. This involves writing MuData objects to disk and opening them in read-only ('r') or read-write ('r+') backed modes. It also covers checking if an object is backed, modifying the backing file, and loading backed data into memory.
```python
import mudata as md

# Assume mdata is a pre-defined MuData object

# Write data to file
mdata.write("large_dataset.h5mu")

# Open in backed mode (read-only)
mdata_backed = md.read_h5mu("large_dataset.h5mu", backed="r")
print(mdata_backed.isbacked)
print(mdata_backed.filename)

# Open in backed mode (read-write)
mdata_rw = md.read_h5mu("large_dataset.h5mu", backed="r+")

# Individual modalities are also backed
print(mdata_backed["rna"].isbacked)

# Load into memory by setting filename to None
mdata_backed.filename = None
print(mdata_backed.isbacked)

# Change backing file
mdata_rw.filename = "new_location.h5mu"

# Write changes to backed file
mdata_rw.write()
```

--------------------------------

### Get MuData Object Shape

Source: https://github.com/scverse/mudata/blob/main/docs/source/io/mudata.rst

Shows how to retrieve the overall shape of a MuData object, represented by the total number of observations and variables across all modalities. It also provides access to the individual counts for observations (`.n_obs`) and variables (`.n_vars`).

```python
mdata.shape
# => (9573, 132465)
mdata.n_obs
# => 9573
mdata.n_vars
# => 132465
```

```python
[ad.shape for ad in mdata.mod.values()]
# => [(9500, 10100), (9573, 122364)]
```

--------------------------------

### Creating a sample MuData object for annotation demonstration

Source: https://github.com/scverse/mudata/blob/main/docs/source/notebooks/annotations_management.ipynb

Defines a function to generate a MuData object with multiple modalities and initial global annotations on the `.var` dataframe, which can then be pushed to individual modalities.
```python
import numpy as np
import pandas as pd
from anndata import AnnData
from mudata import MuData


def make_mdata():
    N = 100
    D1, D2, D3 = 10, 20, 30
    D = D1 + D2 + D3

    mod1 = AnnData(np.arange(0, 100, 0.1).reshape(-1, D1))
    mod1.obs_names = [f"obs{i}" for i in range(mod1.n_obs)]
    mod1.var_names = [f"var{i}" for i in range(D1)]

    mod2 = AnnData(np.arange(3101, 5101, 1).reshape(-1, D2))
    mod2.obs_names = mod1.obs_names.copy()
    mod2.var_names = [f"var{i}" for i in range(D1, D1 + D2)]

    mod3 = AnnData(np.arange(5101, 8101, 1).reshape(-1, D3))
    mod3.obs_names = mod1.obs_names.copy()
    mod3.var_names = [f"var{i}" for i in range(D1 + D2, D)]

    mdata = MuData({"mod1": mod1, "mod2": mod2, "mod3": mod3})

    # common column to be propagated to all modalities
    mdata.var["highly_variable"] = True

    # prefix column to be propagated to the respective modalities
    mdata.var["mod2:if_mod2"] = np.concatenate([
        np.repeat(pd.NA, D1),
        np.repeat(True, D2),
        np.repeat(pd.NA, D3),
    ])

    return mdata
```

--------------------------------

### Drop Annotations from MuData Object (Python)

Source: https://github.com/scverse/mudata/blob/main/docs/source/notebooks/annotations_management.ipynb

Demonstrates how to manage annotations in a MuData object. Calling `push_obs(only_drop=True)` drops annotations from the global table without propagating them to the modalities, whereas `drop=True` propagates them and then drops them. This example drops annotations from the main object only.

```python
mdata.push_obs(only_drop=True)
```

--------------------------------

### Import MuData and Create from AnnData Objects

Source: https://github.com/scverse/mudata/blob/main/docs/source/io/input.rst

Demonstrates the basic import of the MuData library and the creation of a MuData object by combining multiple AnnData objects, where each AnnData object represents a different omics layer (e.g., 'rna', 'atac').
```python
from mudata import MuData

mdata = MuData({'rna': adata_rna, 'atac': adata_atac})
```

--------------------------------

### Create MuData Object from AnnData Modalities (Python)

Source: https://context7.com/scverse/mudata/llms.txt

Demonstrates how to create a MuData object by combining multiple AnnData objects, each representing a different modality. This involves initializing AnnData objects for each modality and then passing them as a dictionary to the MuData constructor. The resulting MuData object integrates these modalities and allows for shared or distinct annotations.

```python
import mudata as md
from mudata import MuData
from anndata import AnnData
import numpy as np
import pandas as pd

# Create individual modality data
n_obs = 1000
rna_data = AnnData(
    X=np.random.randn(n_obs, 2000),
    obs=pd.DataFrame({"cell_type": np.random.choice(["T", "B", "NK"], n_obs)},
                     index=[f"cell_{i}" for i in range(n_obs)]),
    var=pd.DataFrame(index=[f"gene_{i}" for i in range(2000)])
)
atac_data = AnnData(
    X=np.random.randn(n_obs, 5000),
    obs=pd.DataFrame(index=[f"cell_{i}" for i in range(n_obs)]),
    var=pd.DataFrame(index=[f"peak_{i}" for i in range(5000)])
)

# Create MuData object from dictionary of modalities
mdata = MuData({"rna": rna_data, "atac": atac_data})
print(mdata)
# MuData object with n_obs × n_vars = 1000 × 7000
#   2 modalities
#     rna: 1000 x 2000
#     atac: 1000 x 5000

# Access modalities
print(mdata.mod["rna"])
print(mdata["rna"])
print(mdata.mod_names)
print(mdata.n_mod)
```

--------------------------------

### Create a Staged MuData Object (Python)

Source: https://github.com/scverse/mudata/blob/main/docs/source/notebooks/annotations_management.ipynb

This function creates a MuData object with two modalities, 'raw' and 'qced', using `axis=-1`. This setup is suitable for representing different stages of data processing, where both observations and variables might be shared or filtered across stages.
```python
import numpy as np
from anndata import AnnData
from mudata import MuData


def make_staged_mdata():
    N, D = 10, 100
    Nsub, Dsub = 8, 50

    mod1 = AnnData(np.arange(0, 100, 0.1).reshape(N, D))
    mod1.obs_names = [f"obs{i}" for i in range(N)]
    mod1.var_names = [f"var{i}" for i in range(D)]

    mod2 = AnnData(np.arange(3101, 3501, 1).reshape(Nsub, Dsub))
    mod2.obs_names = [f"obs{i}" for i in range(Nsub)]
    mod2.var_names = [f"var{i}" for i in range(Dsub)]

    # common column already present in all modalities
    mod1.obs["status"] = True
    mod2.obs["status"] = True

    # column present in one modality (unique)
    mod2.obs["filtered"] = True
    mod2.var["filtered"] = True

    mdata = MuData({"raw": mod1, "qced": mod2}, axis=-1)
    return mdata
```

--------------------------------

### Initialize and Display MuData Object

Source: https://github.com/scverse/mudata/blob/main/docs/source/notebooks/annotations_management.ipynb

This snippet initializes a MuData object by calling the `make_mdata()` function and then displays a summary of the created object. The output shows the total number of observations and variables, the global observation annotations, and the number and dimensions of the included modalities.

```python
mdata = make_mdata()
mdata
```

--------------------------------

### Examine Variable Maps in MuData with h5py and numpy

Source: https://github.com/scverse/mudata/blob/main/docs/source/io/spec.rst

Demonstrates how to inspect the '.varmap' group in a MuData file, listing the keys (modalities) and retrieving the integer map for a specific modality using h5py and numpy.
```python
>>> import h5py
>>> import numpy as np
>>> f = h5py.File("citeseq.h5mu")
>>> list(f["varmap"].keys())
['prot', 'rna']
>>> np.array(f["varmap"]["rna"])
array([    0,     0,     0, ..., 17804, 17805, 17806], dtype=uint32)
>>> np.array(f["varmap"]["prot"])
array([1, 2, 3, ..., 0, 0, 0], dtype=uint32)
```

--------------------------------

### Inspect MuData File Structure with h5py

Source: https://github.com/scverse/mudata/blob/main/docs/source/io/spec.rst

Demonstrates how to open a .h5mu file using the h5py library and list its top-level keys, providing insight into the stored data components.

```python
>>> import h5py
>>> f = h5py.File("citeseq.h5mu")
>>> list(f.keys())
['mod', 'obs', 'obsm', 'obsmap', 'uns', 'var', 'varm', 'varmap']
```

--------------------------------

### Subsetting MuData Objects to Create Views

Source: https://context7.com/scverse/mudata/llms.txt

This section demonstrates how to create views of MuData objects using slicing and boolean indexing without copying data. It covers subsetting by indices, using boolean masks, indexing by observation names, and accessing individual modalities. It also shows how to create a full copy and make observation/variable names unique.
```python
import mudata as md
import numpy as np

# Subset by observation indices
subset = mdata[:100]  # First 100 observations
print(subset.is_view)  # True

# Subset by observation and variable indices
subset = mdata[:100, :500]

# Boolean indexing
mask = mdata.obs["cell_type"] == "T"
t_cells = mdata[mask]

# Index by observation names
selected_cells = mdata[["cell_0", "cell_1", "cell_5"]]

# Access modality directly (returns AnnData)
rna = mdata["rna"]  # Shorthand for mdata.mod["rna"]

# Create a copy (not a view)
mdata_copy = mdata.copy()
print(mdata_copy.is_view)  # False

# Copy backed object to new file
if mdata.isbacked:
    mdata_copy = mdata.copy(filename="new_file.h5mu")

# Make names unique to avoid duplicates
mdata.obs_names_make_unique()
mdata.var_names_make_unique()
```

--------------------------------

### Create a Sample MuData Object for Demonstration

Source: https://github.com/scverse/mudata/blob/main/docs/source/notebooks/annotations_management.ipynb

This Python function `make_mdata` generates a sample MuData object with two modalities, 'mod1' and 'mod2'. It initializes each modality with AnnData objects containing specified dimensions and creates a common observation annotation ('true') to demonstrate annotation pushing capabilities.
```python
import numpy as np
from anndata import AnnData
from mudata import MuData


def make_mdata():
    N = 100
    D1, D2 = 10, 20
    D = D1 + D2

    mod1 = AnnData(np.arange(0, 100, 0.1).reshape(-1, D1))
    mod1.obs_names = [f"obs{i}" for i in range(mod1.n_obs)]
    mod1.var_names = [f"var{i}" for i in range(D1)]

    mod2 = AnnData(np.arange(3101, 5101, 1).reshape(-1, D2))
    mod2.obs_names = mod1.obs_names.copy()
    mod2.var_names = [f"var{i}" for i in range(D1, D1 + D2)]

    mdata = MuData({"mod1": mod1, "mod2": mod2})

    # common column to be propagated to all modalities
    mdata.obs["true"] = True

    return mdata
```

--------------------------------

### MuData Core Functions

Source: https://github.com/scverse/mudata/blob/main/docs/source/_templates/autosummary/function.rst

Documentation for core MuData functions related to data access and manipulation.

```APIDOC
## MuData API Documentation

This document outlines the API for the MuData library, designed for handling multimodal single-cell data.

### General Information

- **Project**: /scverse/mudata

### Core Functions

This section details the core functions available within the MuData library.

#### `autofunction`

**Description**: This is a placeholder for documented functions within the MuData module. Specific details would depend on the actual function being documented.

**Usage**: Refer to individual function documentation for specific parameters and return values.

**Example**:
```python
# Example usage would depend on the specific function
import mudata as mu

# Assuming 'my_data.h5mu' is a valid MuData file
adata = mu.read('my_data.h5mu')
print(adata)
```

### Data Access and Manipulation

MuData provides methods to access and manipulate different modalities within a single object.
**Example**:
```python
# Accessing global attributes (here 'adata' is the MuData object read above)
print(adata.obs)
print(adata.var)
print(adata.obsm)
print(adata.uns)
print(adata.obsp)
print(adata.varp)

# Accessing specific modalities (e.g., 'rna') through the .mod mapping
rna_data = adata.mod['rna']
print(rna_data)

# Adding a new modality (assumes 'new_adata' is an existing AnnData object)
new_modality = mu.MuData({'expression': new_adata})
adata.mod['new'] = new_modality
adata.update()  # refresh global dimensions after changing modalities
```
```

--------------------------------

### Read and Write MuData Files (.h5mu, .zarr) (Python)

Source: https://context7.com/scverse/mudata/llms.txt

Shows how to persist MuData objects to disk and load them back using the HDF5 (.h5mu) and Zarr formats. It covers writing the entire MuData object, reading it back, and reading or writing individual modalities. The `backed` mode allows for lazy loading of large datasets.

```python
import mudata as md

# Write MuData to HDF5 file
mdata.write("multimodal_data.h5mu")

# Read MuData from HDF5 file
mdata = md.read("multimodal_data.h5mu")

# Read with backed mode (lazy loading for large files)
mdata_backed = md.read_h5mu("multimodal_data.h5mu", backed="r")
print(mdata_backed.isbacked)

# Read individual modality from .h5mu file
rna_only = md.read("multimodal_data.h5mu/rna")

# Write individual modality to existing .h5mu file
md.write("multimodal_data.h5mu/rna", rna_data)

# Write to Zarr store
md.write_zarr("multimodal_data.zarr", mdata)

# Read from Zarr store
mdata_zarr = md.read_zarr("multimodal_data.zarr")

# Generic read function auto-detects format
mdata = md.read("multimodal_data.h5mu")
adata = md.read("single_modality.h5ad")
```

--------------------------------

### Read MuData from Remote URL with fsspec Caching

Source: https://github.com/scverse/mudata/blob/main/docs/source/io/input.rst

Demonstrates how to read a MuData object from a remote URL while utilizing a local caching mechanism provided by `fsspec`. This can improve performance for repeated reads by storing data locally in a specified directory.
```python
import fsspec
import mudata

fname = "https://github.com/gtca/h5xx-datasets/raw/main/datasets/minipbcite.h5mu?download="
fname_cached = "filecache::" + fname

with fsspec.open(fname_cached, filecache={'cache_storage': '/tmp/'}) as f:
    mdata = mudata.read_h5mu(f)
```

--------------------------------

### Read MuData Zarr from S3 Bucket using s3fs

Source: https://github.com/scverse/mudata/blob/main/docs/source/io/input.rst

Demonstrates reading a MuData object stored in the `.zarr` format on an S3 bucket. This method uses `s3fs` to create a Zarr store mapper from the S3 path, which is then passed to `mudata.read_zarr`.

```python
import s3fs
import mudata

storage_options = {
    'endpoint_url': 'localhost:9000',
    'key': 'AWS_ACCESS_KEY_ID',
    'secret': 'AWS_SECRET_ACCESS_KEY',
}

s3 = s3fs.S3FileSystem(**storage_options)
store = s3.get_mapper('s3://bucket/dataset.zarr')
mdata = mudata.read_zarr(store)
```

--------------------------------

### Create and Write Temporary MuData File (Python)

Source: https://github.com/scverse/mudata/blob/main/docs/source/notebooks/quickstart_mudata.ipynb

This snippet demonstrates how to create a temporary MuData file using Python's tempfile module and write a MuData object to it. It then reads the file back in a 'backed' mode for efficient handling of large datasets.

```python
import tempfile
import mudata as md

# Assuming 'mdata' is an existing MuData object

# Create a temporary file
temp_file = tempfile.NamedTemporaryFile(mode="w", suffix=".h5mu", prefix="muon_getting_started_")

mdata.write(temp_file.name)
mdata_r = md.read(temp_file.name, backed=True)
print(mdata_r)
```

--------------------------------

### Examine Observation Maps in MuData with h5py and numpy

Source: https://github.com/scverse/mudata/blob/main/docs/source/io/spec.rst

Illustrates how to inspect the '.obsmap' group in a MuData file, listing the keys (modalities) and retrieving the integer map for a specific modality using h5py and numpy.
```python
>>> import h5py
>>> import numpy as np
>>> f = h5py.File("citeseq.h5mu")
>>> list(f["obsmap"].keys())
['prot', 'rna']
>>> np.array(f["obsmap"]["rna"])
array([   1,    2,    3, ..., 3889, 3890, 3891], dtype=uint32)
>>> np.array(f["obsmap"]["prot"])
array([   1,    2,    3, ..., 3889, 3890, 3891], dtype=uint32)
```

--------------------------------

### Display MuData Object and Modalities in Python

Source: https://github.com/scverse/mudata/blob/main/docs/source/notebooks/annotations_management.ipynb

This snippet demonstrates how to instantiate and then print a MuData object to view its structure. It also shows how to print the modalities contained within a MuData object.

```python
mdata = make_nested_mdata()
mdata
```

```python
print(mdata.mod)
```

--------------------------------

### Copy Backed MuData Object

Source: https://github.com/scverse/mudata/blob/main/docs/source/io/mudata.rst

When creating a copy of a backed MuData object, a filename must be provided. The new copy will be backed at the specified new location, maintaining the backed mode for the duplicated object.

```python
mdata_copy = mdata_b.copy("filename_copy.h5mu")
print(mdata_copy.file.filename)
# => 'filename_copy.h5mu'
```

--------------------------------

### Create a sample MuData object with annotations

Source: https://github.com/scverse/mudata/blob/main/docs/source/notebooks/annotations_management.ipynb

Defines a function `make_mdata` that generates a MuData object composed of three AnnData modalities. It includes common, unique, and non-unique annotations in the `.var` tables of these modalities for demonstration purposes.
```python
import numpy as np
from anndata import AnnData
from mudata import MuData


def make_mdata():
    N = 100
    D1, D2, D3 = 10, 20, 30
    D = D1 + D2 + D3

    mod1 = AnnData(np.arange(0, 100, 0.1).reshape(-1, D1))
    mod1.obs_names = [f"obs{i}" for i in range(mod1.n_obs)]
    mod1.var_names = [f"var{i}" for i in range(D1)]

    mod2 = AnnData(np.arange(3101, 5101, 1).reshape(-1, D2))
    mod2.obs_names = mod1.obs_names.copy()
    mod2.var_names = [f"var{i}" for i in range(D1, D1 + D2)]

    mod3 = AnnData(np.arange(5101, 8101, 1).reshape(-1, D3))
    mod3.obs_names = mod1.obs_names.copy()
    mod3.var_names = [f"var{i}" for i in range(D1 + D2, D)]

    # common column already present in all modalities
    mod1.var["highly_variable"] = True
    mod2.var["highly_variable"] = np.tile([False, True], D2 // 2)
    mod3.var["highly_variable"] = np.tile([True, False], D3 // 2)

    # column present in some (2 out of 3) modalities (non-unique)
    mod2.var["arange"] = np.arange(D2)
    mod3.var["arange"] = np.arange(D3)

    # column present in one modality (unique)
    mod3.var["is_region"] = True

    mdata = MuData({"mod1": mod1, "mod2": mod2, "mod3": mod3})
    return mdata
```

--------------------------------

### Read/Write AnnData from/to .h5mu file in Python

Source: https://github.com/scverse/mudata/blob/main/README.md

Illustrates how to access and manipulate individual AnnData objects stored within a .h5mu file. This allows for targeted reading and writing of specific modalities.

```python
import mudata as md

adata = md.read("pbmc_10k.h5mu/rna")
md.write("pbmc_10k.h5mu/rna", adata)
```

--------------------------------

### Prepare AnnData Objects

Source: https://github.com/scverse/mudata/blob/main/docs/source/notebooks/nuances.ipynb

Creates two AnnData objects, 'mod1' and 'mod2', using NumPy for random data generation. This demonstrates the preparation of individual data modalities before combining them into a MuData object.
```python
import numpy as np
from mudata import AnnData

n, d1, d2, k = 1000, 100, 200, 10

np.random.seed(1)

z = np.random.normal(loc=np.arange(k), scale=np.arange(k)*2, size=(n,k))
w1 = np.random.normal(size=(d1,k))
w2 = np.random.normal(size=(d2,k))

mod1 = AnnData(X=np.dot(z, w1.T))
mod2 = AnnData(X=np.dot(z, w2.T))
```

--------------------------------

### Push Annotations to Modalities

Source: https://github.com/scverse/mudata/blob/main/docs/source/io/mudata.rst

The push_obs and push_var methods allow updating the annotations of individual modalities with the global annotations stored in the MuData object. This is useful for broadcasting changes made at the container level to all its constituent modalities. To restrict the push to particular modalities, pass their names via the `mods` argument.

```python
# Push global .obs columns to all modalities
mdata.push_obs()
```

```python
# Push global .var columns to all modalities
mdata.push_var()
```

--------------------------------

### Inspect .h5mu file contents

Source: https://github.com/scverse/mudata/blob/main/docs/source/io/output.rst

Uses the 'h5ls' command-line tool to list the top-level groups within an .h5mu file, showing the 'mod', 'obs', 'obsm', 'var', and 'varm' groups.

```console
> h5ls mudata.h5mu
mod                      Group
obs                      Group
obsm                     Group
var                      Group
varm                     Group
```

--------------------------------

### Read Backed MuData Container

Source: https://github.com/scverse/mudata/blob/main/docs/source/io/mudata.rst

Enable backed mode when reading '.h5mu' files by setting the 'backed' flag to True. This allows for memory-efficient handling of large datasets by keeping data on disk until accessed.

```python
mdata_b = mudata.read("filename.h5mu", backed=True)
print(mdata_b.isbacked)
# => True
```

--------------------------------

### Write and Read MuData to/from .h5mu file in Python

Source: https://github.com/scverse/mudata/blob/main/README.md

Demonstrates how to save a MuData object to an .h5mu file and subsequently read it back. This is essential for persistent storage and retrieval of multimodal omics data.
```python
import mudata as md

mdata_pbmc.write("pbmc_10k.h5mu")
mdata = md.read("pbmc_10k.h5mu")
```

--------------------------------

### Initialize and prepare MuData object for annotation pulling

Source: https://github.com/scverse/mudata/blob/main/docs/source/notebooks/annotations_management.ipynb

Initializes a MuData object using the `make_mdata` function and then clears the `.var` table. This prepares the object for demonstrating the `.pull_var()` method.

```python
mdata = make_mdata()
# TODO: shouldn't be needed from 0.4
# mdata.update(pull=False)
mdata.var = mdata.var.loc[:,[]]
```

--------------------------------

### Serialize MuData object to .h5mu file

Source: https://github.com/scverse/mudata/blob/main/docs/source/notebooks/quickstart_mudata.ipynb

Imports Python's `tempfile` module, the first step in the notebook's example of serializing a MuData object into the `.h5mu` file format. Saving and loading MuData datasets this way is a common operation.

```python
import tempfile
```

--------------------------------

### Create MuData Object for Demonstration

Source: https://github.com/scverse/mudata/blob/main/docs/source/notebooks/annotations_management.ipynb

This Python function `make_mdata` generates a sample MuData object with three modalities (mod1, mod2, mod3), each an AnnData object carrying a different set of observation annotations. This is used to demonstrate the functionality of `.pull_obs()`.
```python
import numpy as np
from anndata import AnnData
from mudata import MuData


def make_mdata():
    N = 100
    D1, D2, D3 = 10, 20, 30
    D = D1 + D2 + D3

    mod1 = AnnData(np.arange(0, 100, 0.1).reshape(-1, D1))
    mod1.obs_names = [f"obs{i}" for i in range(mod1.n_obs)]
    mod1.var_names = [f"var{i}" for i in range(D1)]

    mod2 = AnnData(np.arange(3101, 5101, 1).reshape(-1, D2))
    mod2.obs_names = mod1.obs_names.copy()
    mod2.var_names = [f"var{i}" for i in range(D1, D1 + D2)]

    mod3 = AnnData(np.arange(5101, 8101, 1).reshape(-1, D3))
    mod3.obs_names = mod1.obs_names.copy()
    mod3.var_names = [f"var{i}" for i in range(D1 + D2, D)]

    # common column already present in all modalities
    mod1.obs["qc"] = True
    mod2.obs["qc"] = True
    mod3.obs["qc"] = np.tile([True, False], N // 2)

    # column present in some (2 out of 3) modalities (non-unique)
    mod2.obs["arange"] = np.arange(N)
    mod3.obs["arange"] = np.arange(N, 2*N)

    # column present in one modality (unique)
    mod3.obs["mod3_cell"] = True

    mdata = MuData({"mod1": mod1, "mod2": mod2, "mod3": mod3})
    return mdata
```

--------------------------------

### Import MuData and AnnData

Source: https://github.com/scverse/mudata/blob/main/docs/source/notebooks/nuances.ipynb

Imports the necessary classes and functions from the mudata library, including MuData and AnnData. These are fundamental for creating and manipulating multi-modal datasets.
```python
import mudata
from mudata import MuData, AnnData
```

--------------------------------

### Set Annotation Update Behavior

Source: https://github.com/scverse/mudata/blob/main/docs/source/io/mudata.rst

Demonstrates how to configure the behavior of annotation updates in MuData objects using `mudata.set_options`. This is particularly relevant for managing how annotations are pulled and pushed between modalities and the main MuData object, with a warning about upcoming changes in default behavior.

```python
mudata.set_options(pull_on_update=False)
```

--------------------------------

### Read MuData from S3 Bucket using fsspec

Source: https://github.com/scverse/mudata/blob/main/docs/source/io/input.rst

Shows how to read a MuData object stored in the `.h5mu` format on an S3 bucket. It requires configuring storage options such as endpoint URL, access key, and secret key, and uses `fsspec` for accessing the S3 filesystem.

```python
import fsspec
import mudata

storage_options = {
    'endpoint_url': 'localhost:9000',
    'key': 'AWS_ACCESS_KEY_ID',
    'secret': 'AWS_SECRET_ACCESS_KEY',
}

with fsspec.open('s3://bucket/dataset.h5mu', **storage_options) as f:
    # Keep a reference to the loaded object
    mdata = mudata.read_h5mu(f)
```

--------------------------------

### Import MuData and AnnData Modules

Source: https://github.com/scverse/mudata/blob/main/docs/source/notebooks/quickstart_mudata.ipynb

Imports the necessary libraries, mudata (as md) and MuData class, along with AnnData from the anndata library. These imports are essential for defining and working with multimodal data structures.

```python
import mudata as md
from mudata import MuData
from anndata import AnnData
```

--------------------------------

### Convert Between MuData and AnnData

Source: https://context7.com/scverse/mudata/llms.txt

This snippet shows how to convert between MuData and AnnData formats, facilitating interoperability. It includes converting a MuData object to a single AnnData by concatenating modalities and converting an AnnData object to a MuData object by splitting based on variable annotations or predefined feature types.
```python
import mudata as md
from mudata import MuData, to_anndata, to_mudata
from anndata import AnnData
import numpy as np
import pandas as pd

# Assumes `mdata` is an existing MuData object

# Convert MuData to AnnData by concatenating modalities
# For axis=0 (shared obs): concatenates along variables (axis=1)
# For axis=1 (shared var): concatenates along observations (axis=0)
adata = mdata.to_anndata()
# or equivalently:
adata = to_anndata(mdata)

# Convert AnnData to MuData by splitting
# Create example AnnData with feature types
adata = AnnData(
    X=np.random.randn(100, 150),
    var=pd.DataFrame({
        "feature_type": ["rna"]*100 + ["protein"]*50
    }, index=[f"feat_{i}" for i in range(150)])
)

# Split AnnData into MuData by variable annotation
mdata = to_mudata(adata, axis=0, by="feature_type")
print(mdata.mod_names)  # ['rna', 'protein']

# MuData constructor can also split AnnData by feature_types column
# Default mapping: "Gene Expression" -> "rna", "Peaks" -> "atac", "Antibody Capture" -> "prot"
mdata = MuData(adata, feature_types_names={"rna": "rna", "protein": "prot"})
```

--------------------------------

### Manage Annotations with MuData Pull/Push

Source: https://context7.com/scverse/mudata/llms.txt

This snippet demonstrates how to pull and push annotations between the global MuData object and its individual modalities. It covers pulling/pushing specific columns, from/to specific modalities, and using various options like `common`, `unique`, `prefixed`, and `join_common`.
```python
import mudata as md

# Assumes `mdata` is an existing MuData object

# Pull annotations from modalities to global .obs/.var
# Copies columns from modality .obs tables to mdata.obs
mdata.pull_obs()  # Pull all columns from all modalities

# Pull specific columns
mdata.pull_obs(columns=["cell_type", "n_counts"])

# Pull from specific modalities only
mdata.pull_obs(mods=["rna"])

# Pull with options
mdata.pull_obs(
    common=True,         # Pull columns present in all modalities (joined)
    unique=True,         # Pull columns unique to one modality (prefixed)
    nonunique=True,      # Pull columns in multiple (but not all) modalities
    prefix_unique=True,  # Prefix unique column names with modality name
    join_common=True,    # Join common columns across modalities
    drop=False           # If True, remove columns from modalities after pulling
)

# Same options available for pull_var()
mdata.pull_var(columns=["highly_variable"])

# Push annotations from global to modalities
# Copies columns from mdata.obs to modality .obs tables
mdata.push_obs()  # Push all columns to all modalities

# Push specific columns
mdata.push_obs(columns=["batch", "sample_id"])

# Push to specific modalities
mdata.push_obs(mods=["rna", "atac"])

# Push with options
mdata.push_obs(
    common=True,    # Push columns without modality prefix
    prefixed=True,  # Push columns with modality prefix (to matching modality)
    drop=False      # If True, remove columns from global after pushing
)

# Same options available for push_var()
mdata.push_var(columns=["gene_symbol"], mods=["rna"])
```

--------------------------------

### Sample Merged Variable Annotations (Python)

Source: https://github.com/scverse/mudata/blob/main/docs/source/notebooks/nuances.ipynb

Samples and displays rows from the global MuData object's variable metadata after annotations have been pulled. It shows how annotations from different modalities (prefixed with modality names) are merged into a single table.
```python
import numpy as np

np.random.seed(10)
mdata.var.sample(5)
```

--------------------------------

### Simulate Multimodal Data

Source: https://github.com/scverse/mudata/blob/main/docs/source/notebooks/quickstart_mudata.ipynb

Generates simulated data matrices (y and y2) and associated AnnData objects (adata and adata2) for demonstrating MuData. This involves creating random arrays and performing dot products. The snippet prints the shape of the first simulated matrix, y.

```python
import numpy as np
from anndata import AnnData

np.random.seed(1)

n, d, k = 1000, 100, 10

# Shared latent factors z, projected through modality-specific weights
z = np.random.normal(loc=np.arange(k), scale=np.arange(k)*2, size=(n,k))
w = np.random.normal(size=(d,k))
y = np.dot(z, w.T)
print(y.shape)

d2 = 50
w2 = np.random.normal(size=(d2,k))
y2 = np.dot(z, w2.T)

adata = AnnData(y)
adata.obs_names = [f"obs_{i+1}" for i in range(n)]
adata.var_names = [f"var_{j+1}" for j in range(d)]

adata2 = AnnData(y2)
adata2.obs_names = [f"obs_{i+1}" for i in range(n)]
adata2.var_names = [f"var2_{j+1}" for j in range(d2)]
```

--------------------------------

### Construct AnnData Object from Arrays/DataFrames

Source: https://github.com/scverse/mudata/blob/main/docs/source/io/input.rst

Shows how to create an AnnData object, a fundamental data structure used within MuData, from NumPy arrays for the main data matrix (X) and Pandas DataFrames for observation (obs) and variable (var) annotations.

```python
from anndata import AnnData

adata = AnnData(X=matrix, obs=metadata_df, var=features_df)
```

--------------------------------

### Display Backed MuData Object in HTML (Python)

Source: https://github.com/scverse/mudata/blob/main/docs/source/notebooks/quickstart_mudata.ipynb

This snippet shows how to display a backed MuData object in an HTML format using muon's display options. It allows for controlling the expansion of the HTML representation, useful for generating reports or interactive visualizations.
```python
import muon as md
from IPython.display import display

# Assuming 'mdata_r' is a backed MuData object
with md.set_options(display_style="html", display_html_expand=0b000):
    display(mdata_r)
```

--------------------------------

### Push Common Observation Annotations in MuData

Source: https://github.com/scverse/mudata/blob/main/docs/source/notebooks/annotations_management.ipynb

This demonstrates the `push_obs()` method in MuData to propagate observation annotations to all modalities. After calling `mdata.push_obs()`, the subsequent loop iterates through each modality and prints its observation data types, showing that the 'true' column has been added to each.

```python
mdata.push_obs()

for m in mdata.mod.keys():
    print(mdata[m].obs.dtypes)
```

--------------------------------

### Pushing only common .var annotations in MuData

Source: https://github.com/scverse/mudata/blob/main/docs/source/notebooks/annotations_management.ipynb

Demonstrates pushing only common annotations (not prefixed) from the global `.var` to each modality's `.var` using `common=True` and `prefixed=False`. It then shows the resulting dtypes.

```python
mdata.push_var(common=True, prefixed=False)

for m in mdata.mod.keys():
    print(mdata[m].var.dtypes)
```

```text
highly_variable    bool
dtype: object
highly_variable    bool
dtype: object
highly_variable    bool
dtype: object
```

```python
# Clean up
for m in mdata.mod.keys():
    mdata[m].var = mdata[m].var.loc[:,[]]
```

--------------------------------

### Pushing global .var annotations to modalities in MuData

Source: https://github.com/scverse/mudata/blob/main/docs/source/notebooks/annotations_management.ipynb

Executes the `.push_var()` method to propagate annotations from the global `.var` to each modality's `.var`. It then prints the data types of the `.var` columns for each modality.
```python
mdata.push_var()

for m in mdata.mod.keys():
    print(mdata[m].var.dtypes)
```

```text
highly_variable    bool
dtype: object
highly_variable      bool
if_mod2            object
dtype: object
highly_variable    bool
dtype: object
```

```python
# Clean up
for m in mdata.mod.keys():
    mdata[m].var = mdata[m].var.loc[:,[]]
```

--------------------------------

### Inspect .zarr directory structure

Source: https://github.com/scverse/mudata/blob/main/docs/source/io/output.rst

Displays the directory structure of a MuData object saved in the .zarr format using the 'tree' command. It shows the hierarchical organization of data within the Zarr store.

```console
> tree -L 1 mudata.zarr
mudata.zarr
├── mod
├── obs
├── obsm
├── obsmap
├── obsp
├── uns
├── var
├── varm
├── varmap
└── varp
```

--------------------------------

### Pushing only prefixed .var annotations in MuData

Source: https://github.com/scverse/mudata/blob/main/docs/source/notebooks/annotations_management.ipynb

Shows how to push only prefixed annotations from the global `.var` to the respective modalities using `common=False` and `prefixed=True`. The output displays the dtypes after pushing.

```python
mdata.push_var(common=False, prefixed=True)

for m in mdata.mod.keys():
    print(mdata[m].var.dtypes)
```

```text
Series([], dtype: object)
if_mod2    object
dtype: object
Series([], dtype: object)
```

```python
# Clean up
for m in mdata.mod.keys():
    mdata[m].var = mdata[m].var.loc[:,[]]
```

--------------------------------

### Pull Annotations from Modalities

Source: https://github.com/scverse/mudata/blob/main/docs/source/io/mudata.rst

Illustrates how to copy observation (`.obs`) and variable (`.var`) annotations from individual modalities into the main MuData object. This is useful for consolidating metadata. Prefixes are added to observation columns to indicate their origin, while variable columns with identical names are merged.
```python
mdata.pull_obs()
mdata.pull_var()
```

--------------------------------

### Concatenate MuData Objects

Source: https://context7.com/scverse/mudata/llms.txt

Demonstrates concatenating multiple MuData objects into a single object. Supports different join strategies ('inner', 'outer') for variable intersection/union, adding batch labels, and merging strategies for non-concatenated elements. It also shows concatenation from a dictionary.

```python
import anndata as ad
import mudata as md

# Assume mdata1 and mdata2 are pre-defined MuData objects
mdata1 = md.MuData({'rna': ad.AnnData(...), 'atac': ad.AnnData(...)})
mdata2 = md.MuData({'rna': ad.AnnData(...), 'atac': ad.AnnData(...)})

# Concatenate from a list
combined_list = md.concat(
    [mdata1, mdata2],
    join="inner",
    label="batch",
    keys=["sample1", "sample2"],
    index_unique="-",
    merge="unique",
    uns_merge="same",
    pairwise=False
)

print(combined_list.n_obs)
print(combined_list.obs["batch"].value_counts())

# Concatenate from a dictionary
combined_dict = md.concat(
    {"sample1": mdata1, "sample2": mdata2},
    join="outer",
    fill_value=0
)
```

--------------------------------

### Access Modality A's 'misc' data in MuData

Source: https://github.com/scverse/mudata/blob/main/docs/source/notebooks/quickstart_mudata.ipynb

Demonstrates how to access specific unstructured data ('misc') within a modality ('A') of a MuData object. This is useful for retrieving auxiliary information associated with a modality.

```python
mdata.mod["A"].uns["misc"]
```
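
--------------------------------

### Sketch: Routing of Common and Prefixed Columns on Push

As a rough illustration of the `common` and `prefixed` options used in the `push_var` examples above, the following plain-Python sketch mimics how global column names might be routed to modalities. `route_columns` is a hypothetical helper and not part of mudata; the `modality:column` naming and the routing rule are inferred from the outputs shown in this document, and mudata's actual implementation may differ.

```python
def route_columns(columns, mod_names, common=True, prefixed=True):
    """Hypothetical helper: map each modality to the columns it would receive."""
    routed = {m: [] for m in mod_names}
    for col in columns:
        prefix, sep, rest = col.partition(":")
        if sep and prefix in routed:
            # Prefixed column: only the matching modality gets it, prefix stripped
            if prefixed:
                routed[prefix].append(rest)
        elif common:
            # Unprefixed (or unknown-prefix) column: every modality gets it
            for m in mod_names:
                routed[m].append(col)
    return routed


# Column names assumed for illustration, modeled on the outputs above
global_columns = ["highly_variable", "mod2:if_mod2"]
mods = ["mod1", "mod2", "mod3"]

both = route_columns(global_columns, mods)
only_common = route_columns(global_columns, mods, prefixed=False)
only_prefixed = route_columns(global_columns, mods, common=False)

for name, result in [("both", both), ("common", only_common), ("prefixed", only_prefixed)]:
    print(name, result)
```

With both options enabled, `mod2` receives `highly_variable` and `if_mod2` while the other modalities receive only `highly_variable`, which mirrors the dtypes printed after `mdata.push_var()` above.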