### Clone Repository and Install from Source

Source: https://www.mdanalysis.org/MDAnalysisData/_sources/install.rst.txt

Clone the MDAnalysisData repository from GitHub and install it with pip.

```bash
git clone https://github.com/MDAnalysis/MDAnalysisData.git
```

```bash
pip install MDAnalysisData/
```

--------------------------------

### Install MDAnalysisData with pip

Source: https://www.mdanalysis.org/MDAnalysisData/_sources/install.rst.txt

Use pip to install or upgrade the MDAnalysisData package from PyPI.

```bash
pip install --upgrade MDAnalysisData
```

--------------------------------

### Install MDAnalysisData from source with pip

Source: https://www.mdanalysis.org/MDAnalysisData/install.html

After cloning the repository, install the MDAnalysisData package from the local checkout with pip.

```bash
pip install MDAnalysisData/
```

--------------------------------

### Install MDAnalysisData with conda

Source: https://www.mdanalysis.org/MDAnalysisData/_sources/install.rst.txt

Configure conda to use the conda-forge channel, then install the mdanalysisdata package.

```bash
conda config --add channels conda-forge
conda install mdanalysisdata
```

--------------------------------

### Install MDAnalysisData with conda

Source: https://www.mdanalysis.org/MDAnalysisData/install.html

Install MDAnalysisData with the conda package manager after adding the conda-forge channel.

```bash
conda config --add channels conda-forge
conda install mdanalysisdata
```

--------------------------------

### Get MDAnalysis Data Home Directory

Source: https://www.mdanalysis.org/MDAnalysisData/_modules/MDAnalysisData/base.html

Returns the path to the MDAnalysisData data directory and creates the directory if it does not exist. You can set the directory with the `MDANALYSIS_DATA` environment variable or pass it explicitly.

```python
from os import environ, makedirs
from os.path import exists, expanduser, join

# Assumed definition, consistent with the docstring below.
DEFAULT_DATADIR = join('~', 'MDAnalysis_data')


def get_data_home(data_home=None):
    """Return the path of the MDAnalysisData data dir.

    This folder is used by some large dataset loaders to avoid downloading the
    data several times.

    By default the data dir is set to a folder named 'MDAnalysis_data' in the
    user's home directory. Alternatively, it can be set by the
    :envvar:`MDANALYSIS_DATA` environment variable or programmatically by
    giving an explicit folder path. The '~' symbol is expanded to the user's
    home folder. If the folder does not already exist, it is automatically
    created.

    Parameters
    ----------
    data_home : str | None
        The path to the MDAnalysisData data dir.
    """
    if data_home is None:
        data_home = environ.get('MDANALYSIS_DATA', DEFAULT_DATADIR)
    data_home = expanduser(data_home)
    if not exists(data_home):
        makedirs(data_home)
    return data_home
```

--------------------------------

### Load I-FABP Water Trajectory - Python

Source: https://www.mdanalysis.org/MDAnalysisData/ifabp_water.html

Loads the 0.5-ns equilibrium trajectory of I-FABP with water. You can specify a custom data directory, or disable downloading when the data are not available locally.

```python
MDAnalysisData.ifabp_water.fetch_ifabp_water(data_home=None, download_if_missing=True)
```

--------------------------------

### fetch_yiip_equilibrium_short

Source: https://www.mdanalysis.org/MDAnalysisData/_modules/MDAnalysisData/yiip_equilibrium.html

Loads the YiiP 9-ns equilibrium trajectory. You can specify a custom data home directory and control the download behavior.

```APIDOC
## fetch_yiip_equilibrium_short

### Description
Loads the YiiP 9-ns equilibrium trajectory. You can specify an alternative directory for downloading and caching the data. You can also control whether the data are downloaded when they are not found locally.

### Parameters
- **data_home** (optional, default: None): Specify another download and cache folder for the datasets. By default, all MDAnalysisData data is stored in '~/MDAnalysis_data' subfolders. This dataset is stored in ``<data_home>/yiip_equilibrium``.
- **download_if_missing** (optional, default: True): If ``False``, raise an :exc:`IOError` if the data is not locally available instead of trying to download the data from the source site.

### Returns
- **dataset** (dict-like object): An object with the following attributes:
  - **topology** (filename): Filename of the topology file.
  - **trajectory** (filename): Filename of the trajectory file.
  - **DESCR** (string): Description of the trajectory.

### See Also
See :ref:`yiip-equilibrium-dataset` for a more detailed description.
```

--------------------------------

### fetch_yiip_equilibrium_short

Source: https://www.mdanalysis.org/MDAnalysisData/_sources/yiip_equilibrium.rst.txt

Fetches the short (9-ns) equilibrium MD trajectory of the YiiP membrane-protein system.

```APIDOC
## fetch_yiip_equilibrium_short

### Description
Fetches the short (9-ns) equilibrium MD trajectory of the YiiP membrane-protein system.

### Method
This is a Python function call, not an HTTP request.

### Parameters
- **data_home** (optional, default: None): Download and cache folder for the datasets.
- **download_if_missing** (optional, default: True): Whether to download the data if they are not available locally.

### Returns
- A dict-like dataset object with `topology`, `trajectory`, and `DESCR` attributes for the 9-ns MD simulation.
```

--------------------------------

### Get Data Home Directory

Source: https://www.mdanalysis.org/MDAnalysisData/helpers.html

Retrieves the path to the MDAnalysisData cache directory, where downloaded datasets are stored. You can set the path programmatically or through the `MDANALYSIS_DATA` environment variable. If the directory does not exist, it is created.

```python
MDAnalysisData.base.get_data_home(data_home=None)
```

--------------------------------

### fetch_adk_transitions_DIMS

Source: https://www.mdanalysis.org/MDAnalysisData/_modules/MDAnalysisData/adk_transitions.html

Loads the AdK DIMS transitions dataset. If the dataset is not found locally, this function downloads it. It returns a dataset object containing the topology, the trajectories, and descriptive information.

```APIDOC
## fetch_adk_transitions_DIMS

### Description
Loads the AdK DIMS transitions dataset.
If the dataset is not found locally, this function downloads it. It returns a dataset object containing the topology, the trajectories, and descriptive information.

### Parameters
- **data_home** (optional, default: None) - Specify another download and cache folder for the datasets. By default all MDAnalysisData data is stored in '~/MDAnalysis_data' subfolders. This dataset is stored in ``<data_home>/adk_transitions_DIMS``.
- **download_if_missing** (optional, default: True) - If ``False``, raise an :exc:`IOError` if the data is not locally available instead of trying to download the data from the source site.

### Returns
- **dataset** (dict-like object) - An object with the following attributes:
  - **topology** (filename): Filename of the topology file.
  - **trajectories** (list): List with filenames of the trajectory ensemble.
  - **N_trajectories** (int): Number of trajectories in the ensemble.
  - **DESCR** (string): Description of the ensemble.

### See Also
See :ref:`adk-transitions-DIMS-dataset` for a description.
```

--------------------------------

### fetch_adk_transitions_FRODA

Source: https://www.mdanalysis.org/MDAnalysisData/_modules/MDAnalysisData/adk_transitions.html

Loads the AdK FRODA transitions dataset. If the dataset is not found locally, this function downloads it. It returns a dataset object containing the topology, the trajectories, and descriptive information.

```APIDOC
## fetch_adk_transitions_FRODA

### Description
Loads the AdK FRODA transitions dataset. If the dataset is not found locally, this function downloads it. It returns a dataset object containing the topology, the trajectories, and descriptive information.

### Parameters
- **data_home** (optional, default: None) - Specify another download and cache folder for the datasets. By default all MDAnalysisData data is stored in '~/MDAnalysis_data' subfolders. This dataset is stored in ``<data_home>/adk_transitions_FRODA``.
- **download_if_missing** (optional, default: True) - If ``False``, raise an :exc:`IOError` if the data is not locally available instead of trying to download the data from the source site.

### Returns
- **dataset** (dict-like object) - An object with the following attributes:
  - **topology** (filename): Filename of the topology file.
  - **trajectories** (list): List with filenames of the trajectory ensemble.
  - **N_trajectories** (int): Number of trajectories in the ensemble.
  - **DESCR** (string): Description of the ensemble.

### See Also
See :ref:`adk-transitions-FRODA-dataset` for a description.
```

--------------------------------

### Load AdK Equilibrium Trajectory - Python

Source: https://www.mdanalysis.org/MDAnalysisData/adk_equilibrium.html

Loads the AdK 1-µs equilibrium trajectory (without water). You can specify a custom download directory, or set `download_if_missing=False` to prevent automatic downloads.

```python
MDAnalysisData.adk_equilibrium.fetch_adk_equilibrium(data_home=None, download_if_missing=True)
```

--------------------------------

### _fetch_remote()

Source: https://www.mdanalysis.org/MDAnalysisData/helpers.html

Downloads a remote dataset, saves it locally, and verifies its integrity with a SHA256 checksum.

```APIDOC
## MDAnalysisData.base._fetch_remote(remote, dirname=None)

### Description
Helper function to download a remote dataset into a path.

Fetches the dataset that the remote's URL points to and saves it under the remote's filename. It then checks the file's integrity against the SHA256 checksum of the downloaded file.

### Parameters
* **remote** (_RemoteFileMetadata_) – Named tuple containing remote dataset meta information: url, filename and checksum.
* **dirname** (_string_) – Directory to save the file to.

### Returns
**file_path** – Full path of the created file.
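
### Example (sketch)
Below is a minimal sketch of the download-then-verify pattern described above, using only the standard library. `RemoteFile`, `sha256_of`, and `fetch_and_verify` are hypothetical names and are not part of MDAnalysisData. `RemoteFile` stands in for `RemoteFileMetadata`, and the tqdm progress bar is omitted.

```python
import hashlib
import os
from collections import namedtuple
from urllib.request import urlretrieve

# Hypothetical stand-in for MDAnalysisData's RemoteFileMetadata.
RemoteFile = namedtuple("RemoteFile", ["filename", "url", "checksum"])


def sha256_of(path, chunk_size=8192):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fetch_and_verify(remote, dirname=None):
    """Download remote.url to remote.filename (inside dirname) and check it."""
    file_path = (remote.filename if dirname is None
                 else os.path.join(dirname, remote.filename))
    urlretrieve(remote.url, filename=file_path)
    checksum = sha256_of(file_path)
    if checksum != remote.checksum:
        raise IOError("{} has SHA256 checksum {} but {} was expected".format(
            file_path, checksum, remote.checksum))
    return file_path
```

`urlretrieve` also accepts `file://` URLs, for example from `pathlib.Path(...).as_uri()`, so you can exercise the sketch offline. Hashing in chunks keeps memory use constant even for large trajectory files.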
### Return type
string
```

--------------------------------

### fetch_nhaa_equilibrium

Source: https://www.mdanalysis.org/MDAnalysisData/_sources/nhaa_equilibrium.rst.txt

Fetches the NhaA equilibrium dataset: a 500-ns MD trajectory of a membrane-protein system with the solvent removed.

```APIDOC
## fetch_nhaa_equilibrium

### Description
Fetches the NhaA equilibrium dataset, which contains a 500-ns MD trajectory of a membrane-protein system with all solvent removed. This dataset is useful for studying membrane-protein dynamics.

### Usage
```python
import MDAnalysisData.nhaa_equilibrium

# Fetch the dataset (downloaded on the first call, read from the cache afterwards)
data = MDAnalysisData.nhaa_equilibrium.fetch_nhaa_equilibrium()

# The 'data' object can now be used with MDAnalysis or other tools.
```

### Returns
- The NhaA equilibrium dataset as a dict-like object holding the file names, which can be loaded into a molecular dynamics analysis tool.
```

--------------------------------

### Fetch PEG_1chain Dataset

Source: https://www.mdanalysis.org/MDAnalysisData/_modules/MDAnalysisData/PEG_1chain.html

Loads the PEG polymer trajectory. Use `data_home` to set a custom download location, or pass `download_if_missing=False` to prevent automatic downloads.

```python
from MDAnalysisData import PEG_1chain

# Load the dataset, downloading it if necessary
dataset = PEG_1chain.fetch_PEG_1chain()

# Access the topology and trajectory files
topology_file = dataset.topology
trajectory_file = dataset.trajectory

# Access the description
description = dataset.DESCR

print(f"Topology file: {topology_file}")
print(f"Trajectory file: {trajectory_file}")
print(f"Description: {description}")
```

--------------------------------

### Fetch Vesicle Library Dataset

Source: https://www.mdanalysis.org/MDAnalysisData/_modules/MDAnalysisData/vesicles.html

Downloads and loads the vesicle library dataset. Use `data_home` to set a custom download location.
`download_if_missing` controls whether the data are downloaded when they are not found locally.

```python
# -*- coding: utf-8 -*-
"""Large vesicles library (coarse grained).

https://figshare.com/articles/Large_System_Vesicle_Benchmark_Library/3406708
"""

from os.path import dirname, exists, join
from os import makedirs, remove
import tarfile

import logging

from .base import get_data_home
from .base import _fetch_remote, _read_description
from .base import RemoteFileMetadata
from .base import Bunch

METADATA = {
    'vesicle_lib': {
        'NAME': "vesicle_library",
        'DESCRIPTION': "vesicle_lib.rst",
        'ARCHIVE': {
            'tarfile': RemoteFileMetadata(
                filename='vesicles_1.0.tar.bz2',
                url='https://ndownloader.figshare.com/files/5320846',
                checksum='cba5a6221df664c79229a27d82faf779f63dee608f96a7b3b64ef209b93ec0d0',
            ),
        },
        'CONTENTS': {
            'structures': ["vesicles/1_75M/system.gro",
                           "vesicles/3_5M/system.gro",
                           "vesicles/10M/system.gro"],
            'labels': ["1_75M", "3_5M", "10M"],
            'N_structures': 3,
        },
    },
}

logger = logging.getLogger(__name__)


def fetch_vesicle_lib(data_home=None, download_if_missing=True):
    """Load the vesicle library dataset

    Parameters
    ----------
    data_home : optional, default: None
        Specify another download and cache folder for the datasets. By default
        all MDAnalysisData data is stored in '~/MDAnalysis_data' subfolders.
        This dataset is stored in ``<data_home>/vesicle_library``.
    download_if_missing : optional, default=True
        If ``False``, raise an :exc:`IOError` if the data is not locally
        available instead of trying to download the data from the source site.

    Returns
    -------
    dataset : dict-like object with the following attributes:
    dataset.structures : list
        list with filenames of the different vesicle systems (in GRO format)
    dataset.N_structures : int
        number of structures
    dataset.labels : list
        descriptors of the files in `dataset.structures` (same order), giving
        their approximate sizes in number of particles
    dataset.DESCR : string
        Description of the ensemble

    See :ref:`vesicle-library-dataset` for description.
    """
    metadata = METADATA['vesicle_lib']
    name = metadata['NAME']
    data_location = join(get_data_home(data_home=data_home), name)
    if not exists(data_location):
        makedirs(data_location)

    records = Bunch()

    meta = metadata['ARCHIVE']['tarfile']
    local_path = join(data_location, meta.filename)

    if not exists(local_path):
        if not download_if_missing:
            raise IOError("Data {0}={1} not found and `download_if_missing` is "
                          "False".format("tarfile", local_path))
        logger.info("Downloading {0}: {1} -> {2}...".format(
            "tarfile", meta.url, local_path))
        archive_path = _fetch_remote(meta, dirname=data_location)
        logger.info("Unpacking {}...".format(archive_path))
        with tarfile.open(archive_path, 'r') as tar:
            tar.extractall(path=data_location)

    records.structures = [join(data_location, path)
                          for path in metadata['CONTENTS']['structures']
                          if exists(join(data_location, path))]
    records.N_structures = metadata['CONTENTS']['N_structures']
    records.labels = metadata['CONTENTS']['labels']
    if len(records.structures) != records.N_structures:
        # should not happen...
        raise RuntimeError("structure files in {0} are incomplete: only {1} "
                           "but should be {2}.".format(
                               metadata['CONTENTS']['structures'],
                               len(records.structures),
                               records.N_structures))

    records.DESCR = _read_description(metadata['DESCRIPTION'])

    return records
```

--------------------------------

### Import Fetch Function in Datasets

Source: https://www.mdanalysis.org/MDAnalysisData/_sources/contributing.rst.txt

Import the fetch function for a new dataset into the main `datasets` module.
```python
from .{MODULE_NAME} import fetch_{NAME}
```

--------------------------------

### Load PEG Polymer Trajectory

Source: https://www.mdanalysis.org/MDAnalysisData/PEG_1chain.html

Loads the PEG polymer trajectory. You can specify an alternative download directory if needed. By default, the data are stored in `~/MDAnalysis_data/CG_fiber`. If `download_if_missing` is `False` and the data are not found locally, an `IOError` is raised.

```python
MDAnalysisData.PEG_1chain.fetch_PEG_1chain(data_home=None, download_if_missing=True)
```

--------------------------------

### Fetch AdK Equilibrium Dataset

Source: https://www.mdanalysis.org/MDAnalysisData/usage.html

Downloads and unpacks the AdK equilibrium dataset. The first call is slow because it downloads the data from figshare; later calls use the cached files.

```python
>>> from MDAnalysisData import datasets
>>> adk = datasets.fetch_adk_equilibrium()
```

--------------------------------

### fetch_adk_transitions_FRODA

Source: https://www.mdanalysis.org/MDAnalysisData/adk_transitions.html

Loads the AdK FRODA transitions dataset. You can specify a custom data directory and control whether the data are downloaded when they are not found locally.

```APIDOC
## fetch_adk_transitions_FRODA

### Description
Loads the AdK FRODA transitions dataset.

### Method
Python function (not an HTTP endpoint).

### Parameters
* **data_home** (_optional_, default: None) - Specify another download and cache folder for the datasets. By default all MDAnalysisData data is stored in '~/MDAnalysis_data' subfolders. This dataset is stored in `<data_home>/adk_transitions_FRODA`.
* **download_if_missing** (_optional_, default: True) - If `False`, raise an `IOError` if the data is not locally available instead of trying to download the data from the source site.
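
### Example (sketch)
The `download_if_missing` switch can be sketched as a lookup that either fails immediately or hands off to a downloader. `ensure_local_file` and its `download` callback are hypothetical illustrations. In MDAnalysisData, `_fetch_remote` performs the download step.

```python
import os


def ensure_local_file(data_location, filename, download, download_if_missing=True):
    """Return the path of filename inside data_location, downloading if allowed.

    download is a hypothetical callback that receives the target path; it
    stands in for the library's internal download helper.
    """
    os.makedirs(data_location, exist_ok=True)
    local_path = os.path.join(data_location, filename)
    if not os.path.exists(local_path):
        if not download_if_missing:
            # Mirror the documented behaviour: fail instead of downloading.
            raise IOError("Data {} not found and `download_if_missing` "
                          "is False".format(local_path))
        download(local_path)
    return local_path
```

Once the file exists, later calls return the cached path without invoking the callback again.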
### Returns
* **dataset** (_dict-like object with the following attributes:_)
  * **dataset.topology** (_filename_) – Filename of the topology file
  * **dataset.trajectories** (_list_) – list with filenames of the trajectory ensemble
  * **dataset.N_trajectories** (_int_) – number of trajectories in the ensemble
  * **dataset.DESCR** (_string_) – Description of the ensemble

### Request Example
```python
import MDAnalysisData.adk_transitions

dataset = MDAnalysisData.adk_transitions.fetch_adk_transitions_FRODA()
```

### Response
#### Success Response
Returns a dataset object with topology, trajectories, number of trajectories, and description attributes.

#### Response Example
```json
{
  "topology": "path/to/1ake.pdb",
  "trajectories": ["path/to/adk_frodar_001.xtc", "path/to/adk_frodar_002.xtc", ...],
  "N_trajectories": 200,
  "DESCR": "Description of the AdK FRODA transitions ensemble dataset."
}
```
```

--------------------------------

### Fetch I-FABP Water Data

Source: https://www.mdanalysis.org/MDAnalysisData/_modules/MDAnalysisData/ifabp_water.html

Loads the 0.5-ns equilibrium trajectory of I-FABP with water and downloads any missing data. Use `data_home` to set a custom cache directory.

```python
# -*- coding: utf-8 -*-
"""MD simulation of I-FABP with water.

https://figshare.com/articles/Molecular_dynamics_trajectory_of_I-FABP_for_testing_and_benchmarking_solvent_dynamics_analysis/7058030
"""

from os.path import dirname, exists, join
from os import makedirs, remove

import logging

from .base import get_data_home
from .base import _fetch_remote, _read_description
from .base import RemoteFileMetadata
from .base import Bunch

NAME = "ifabp_water"
DESCRIPTION = "ifabp_water.rst"
# The original data can be found at the figshare URL.
# The SHA256 checksum of the zip file changes with every download so we
# cannot check its checksum. Instead we download individual files
# separately. The keys of this dict are also going to be the keys in the
# Bunch that is returned.
ARCHIVE = {
    'topology': RemoteFileMetadata(
        filename='ifabp_water.psf',
        url='https://ndownloader.figshare.com/files/12980639',
        checksum='ba40714318aabec537015dc550fe5bd5ac1ac0b853f5abdd2f0ae63af9cfcafa',
    ),
    'structure': RemoteFileMetadata(
        filename='ifabp_water_0.pdb',
        url='https://ndownloader.figshare.com/files/12980636',
        checksum='8ccf5f75fd85385921c0cb77f00281a93b933fc1261c42fc9492f43983448a72',
    ),
    'trajectory': RemoteFileMetadata(
        filename='rmsfit_ifabp_water_1.dcd',
        url='https://ndownloader.figshare.com/files/12980642',
        checksum='cebb48e58015abc8ff2f5bb7ba3eb7a289047f256351a8252bf1f29f9aaacf0e',
    ),
}

logger = logging.getLogger(__name__)


def fetch_ifabp_water(data_home=None, download_if_missing=True):
    """Load the I-FABP with water 0.5 ns equilibrium trajectory

    Parameters
    ----------
    data_home : optional, default: None
        Specify another download and cache folder for the datasets. By default
        all MDAnalysisData data is stored in '~/MDAnalysis_data' subfolders.
        This dataset is stored in ``<data_home>/ifabp_water``.
    download_if_missing : optional, default=True
        If ``False``, raise an :exc:`IOError` if the data is not locally
        available instead of trying to download the data from the source site.

    Returns
    -------
    dataset : dict-like object with the following attributes:
    dataset.topology : filename
        Filename of the topology file
    dataset.trajectory : filename
        Filename of the trajectory file
    dataset.structure : filename
        Filename of a structure file in PDB format
    dataset.DESCR : string
        Description of the trajectory.

    See :ref:`ifabp-water-dataset` for description.
    """
    name = NAME
    data_location = join(get_data_home(data_home=data_home), name)
    if not exists(data_location):
        makedirs(data_location)

    records = Bunch()
    for file_type, meta in ARCHIVE.items():
        local_path = join(data_location, meta.filename)
        records[file_type] = local_path

        if not exists(local_path):
            if not download_if_missing:
                raise IOError("Data {0}={1} not found and `download_if_missing` is "
                              "False".format(file_type, local_path))
            logger.info("Downloading {0}: {1} -> {2}...".format(
                file_type, meta.url, local_path))
            archive_path = _fetch_remote(meta, dirname=data_location)

    records.DESCR = _read_description(DESCRIPTION)

    return records
```

--------------------------------

### fetch_adk_transitions_DIMS

Source: https://www.mdanalysis.org/MDAnalysisData/_sources/adk_transitions.rst.txt

Fetches the DIMS dataset for adenylate kinase (AdK) transitions.

```APIDOC
## fetch_adk_transitions_DIMS

### Description
Fetches the DIMS dataset for adenylate kinase transitions.

### Method
```python
fetch_adk_transitions_DIMS(data_home=None, download_if_missing=True)
```

### Parameters
- **data_home** (optional, default: None): Download and cache folder for the datasets.
- **download_if_missing** (optional, default: True): Whether to download the data if they are not available locally.

### Response
Returns the DIMS dataset for adenylate kinase transitions as a dict-like object with `topology`, `trajectories`, `N_trajectories`, and `DESCR` attributes.
```

--------------------------------

### Access Dataset Topology and Trajectory Files

Source: https://www.mdanalysis.org/MDAnalysisData/usage.html

Prints the file paths of the dataset's topology and trajectory files.

```python
>>> print(adk.topology)
>>> print(adk.trajectory)
```

--------------------------------

### TqdmUpTo Callback for Progress Updates

Source: https://www.mdanalysis.org/MDAnalysisData/_modules/MDAnalysisData/base.html

A `tqdm` subclass that adds an `update_to` method. It can serve as the `reporthook` callback of `urlretrieve` to track download progress.

```python
from tqdm import tqdm


class TqdmUpTo(tqdm):
    """Provides `update_to(n)` which uses `tqdm.update(delta_n)`.

    From https://pypi.org/project/tqdm/#hooks-and-callbacks
    """

    def update_to(self, b=1, bsize=1, tsize=None):
        """
        b  : int, optional
            Number of blocks transferred so far [default: 1].
        bsize  : int, optional
            Size of each block (in tqdm units) [default: 1].
        tsize  : int, optional
            Total size (in tqdm units). If None [default], the total remains
            unchanged.
        """
        if tsize is not None:
            self.total = tsize
        self.update(b * bsize - self.n)  # will also set self.n = b * bsize
```

--------------------------------

### Import MDAnalysisData yiip_equilibrium Module

Source: https://www.mdanalysis.org/MDAnalysisData/_modules/MDAnalysisData/yiip_equilibrium.html

Module-level imports and constants at the top of the `yiip_equilibrium` module. Its fetch functions depend on them.

```python
from os.path import dirname, exists, join
from os import makedirs, remove
import codecs

import logging

from .base import get_data_home
from .base import _fetch_remote, _read_description
from .base import RemoteFileMetadata
from .base import Bunch

NAME = "yiip_equilibrium"
DESCRIPTION = "yiip_equilibrium.rst"
# The original data can be found at the figshare URL.
# The SHA256 checksum of the zip file changes with every download so we
# cannot check its checksum. Instead we download individual files
# separately. The keys of this dict are also going to be the keys in the
```

--------------------------------

### fetch_vesicle_lib

Source: https://www.mdanalysis.org/MDAnalysisData/_modules/MDAnalysisData/vesicles.html

Loads the vesicle library dataset and downloads the data if they are missing. It provides access to the vesicle structures, their labels, and a description of the dataset.

```APIDOC
## fetch_vesicle_lib

### Description
Loads the vesicle library dataset. This function downloads the data if they are missing. It provides access to the vesicle structures, their labels, and a description of the dataset.
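
### Example (sketch)
This sketches the unpack-and-check step that `fetch_vesicle_lib` performs after it downloads its tar archive. It assumes a hypothetical helper, `unpack_and_collect`. The helper extracts the archive and keeps only the expected structure files that exist. Like the module code, it raises `RuntimeError` when any file is missing.

```python
import os
import tarfile


def unpack_and_collect(archive_path, data_location, expected_paths):
    """Extract archive_path into data_location and return the expected files.

    expected_paths are relative to data_location, for example
    "vesicles/10M/system.gro". Raises RuntimeError if any of them is missing.
    """
    # Use the safer 'data' extraction filter where this Python version has it.
    kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with tarfile.open(archive_path, "r") as tar:
        tar.extractall(path=data_location, **kwargs)
    found = [os.path.join(data_location, p) for p in expected_paths
             if os.path.exists(os.path.join(data_location, p))]
    if len(found) != len(expected_paths):
        raise RuntimeError("structure files are incomplete: found {} of {}".format(
            len(found), len(expected_paths)))
    return found
```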
### Parameters
- **data_home** (optional, default: None) - Specify another download and cache folder for the datasets. By default all MDAnalysisData data is stored in '~/MDAnalysis_data' subfolders. This dataset is stored in `<data_home>/vesicle_library`.
- **download_if_missing** (optional, default: True) - If `False`, raise an :exc:`IOError` if the data is not locally available instead of trying to download the data from the source site.

### Returns
- **dataset** (dict-like object) - An object with the following attributes:
  - **structures** (list) - List with filenames of the different vesicle systems (in GRO format).
  - **N_structures** (int) - Number of structures.
  - **labels** (list) - Descriptors of the files in `dataset.structures` (same order), giving their approximate sizes in number of particles.
  - **DESCR** (string) - Description of the ensemble.

### See Also
:ref:`vesicle-library-dataset` for a description.
```

--------------------------------

### fetch_adk_transitions_FRODA

Source: https://www.mdanalysis.org/MDAnalysisData/_sources/adk_transitions.rst.txt

Fetches the FRODA dataset for adenylate kinase (AdK) transitions.

```APIDOC
## fetch_adk_transitions_FRODA

### Description
Fetches the FRODA dataset for adenylate kinase transitions.

### Method
```python
fetch_adk_transitions_FRODA(data_home=None, download_if_missing=True)
```

### Parameters
- **data_home** (optional, default: None): Download and cache folder for the datasets.
- **download_if_missing** (optional, default: True): Whether to download the data if they are not available locally.

### Response
Returns the FRODA dataset for adenylate kinase transitions as a dict-like object with `topology`, `trajectories`, `N_trajectories`, and `DESCR` attributes.
```

--------------------------------

### fetch_PEG_1chain

Source: https://www.mdanalysis.org/MDAnalysisData/_sources/PEG_1chain.rst.txt

Fetches the PEG polymer dataset. It contains a coarse-grained MD trajectory showing the self-assembly of an amphiphilic molecule.

```APIDOC
## fetch_PEG_1chain

### Description
Fetches the PEG polymer dataset. This dataset includes a coarse-grained MD trajectory illustrating the self-assembly of an amphiphilic molecule.
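
### Example (sketch)
The fetch functions return a dict-like object whose entries can also be read as attributes, such as `dataset.topology` or `dataset.DESCR`. Below is a minimal sketch of such a container. It uses a hypothetical `AttrDict` class, not the library's own `Bunch`.

```python
class AttrDict(dict):
    """dict subclass whose keys can also be read and set as attributes."""

    def __getattr__(self, key):
        try:
            return self[key]
        except KeyError:
            # Raise AttributeError so hasattr() and getattr() defaults work.
            raise AttributeError(key)

    def __setattr__(self, key, value):
        self[key] = value
```

Missing keys raise `AttributeError` rather than `KeyError`, so `hasattr()` and `getattr(obj, name, default)` behave as expected.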
### Method
`fetch_PEG_1chain(data_home=None, download_if_missing=True)`

### Parameters
- **data_home** (optional, default: None): Download and cache folder for the datasets.
- **download_if_missing** (optional, default: True): Whether to download the data if they are not available locally.

### Returns
- A dict-like dataset object with `topology`, `trajectory`, and `DESCR` attributes for the PEG polymer trajectory.
```

--------------------------------

### get_data_home

Source: https://www.mdanalysis.org/MDAnalysisData/_sources/helpers.rst.txt

Returns the path to the data home directory.

```APIDOC
def get_data_home(data_home=None) -> str:
    """Returns the path to the data home directory, creating it if necessary."""
    pass
```

--------------------------------

### Compute SHA256 Checksum from Command Line

Source: https://www.mdanalysis.org/MDAnalysisData/_sources/contributing.rst.txt

An alternative way to compute the SHA256 checksum of a file, directly from the command line. Replace FILENAME with the actual file path.

```bash
python -c 'import MDAnalysisData; print(MDAnalysisData.base._sha256("FILENAME"))'
```

--------------------------------

### Fetch Remote Dataset

Source: https://www.mdanalysis.org/MDAnalysisData/_modules/MDAnalysisData/base.html

Downloads a dataset from a URL, saves it to a directory, and verifies its integrity with a SHA256 checksum. A tqdm progress bar shows the download progress.

```python
from os.path import join
from urllib.request import urlretrieve

# TqdmUpTo and _sha256 are defined elsewhere in MDAnalysisData.base.


def _fetch_remote(remote, dirname=None):
    """Helper function to download a remote dataset into path

    Fetch a dataset pointed by remote's url, save into path using remote's
    filename and ensure its integrity based on the SHA256 Checksum of the
    downloaded file.

    Parameters
    -----------
    remote : RemoteFileMetadata
        Named tuple containing remote dataset meta information: url, filename
        and checksum

    dirname : string
        Directory to save the file to.

    Returns
    -------
    file_path: string
        Full path of the created file.
    """
    file_path = (remote.filename if dirname is None
                 else join(dirname, remote.filename))
    with TqdmUpTo(unit='B', unit_scale=True, miniters=1,
                  desc=remote.filename) as t:
        urlretrieve(remote.url, filename=file_path,
                    reporthook=t.update_to, data=None)
    checksum = _sha256(file_path)
    if remote.checksum != checksum:
        raise IOError("{} has an SHA256 checksum ({}) "
```

--------------------------------

### fetch_yiip_equilibrium_long

Source: https://www.mdanalysis.org/MDAnalysisData/_sources/yiip_equilibrium.rst.txt

Fetches the long (90-ns) equilibrium MD trajectory of the YiiP membrane-protein system.

```APIDOC
## fetch_yiip_equilibrium_long

### Description
Fetches the long (90-ns) equilibrium MD trajectory of the YiiP membrane-protein system.

### Method
This is a Python function call, not an HTTP request.

### Parameters
- **data_home** (optional, default: None): Download and cache folder for the datasets.
- **download_if_missing** (optional, default: True): Whether to download the data if they are not available locally.

### Returns
- A dict-like dataset object with `topology`, `trajectory`, and `DESCR` attributes for the 90-ns MD simulation.
```

--------------------------------

### Fetch AdK FRODA transitions dataset

Source: https://www.mdanalysis.org/MDAnalysisData/adk_transitions.html

Loads the AdK FRODA transitions dataset. Its trajectories were sampled with the Framework Rigidity Optimized Dynamics Algorithm (FRODA).

```APIDOC
## fetch_adk_transitions_FRODA

### Description
Loads the AdK FRODA transitions dataset.

### Method
`fetch_adk_transitions_FRODA(data_home=None, download_if_missing=True)`

### Parameters
- **data_home** (str) - Optional - Specify another download and cache folder for the datasets. By default all MDAnalysisData data is stored in '~/MDAnalysis_data' subfolders. This dataset is stored in `<data_home>/adk_transitions_FRODA`.
- **download_if_missing** (bool) - Optional - If `False`, raise an `IOError` if the data is not locally available instead of trying to download the data from the source site. Default is `True`.
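
### Example (sketch)
The `trajectories` and `N_trajectories` attributes pair a glob pattern with an expected count from the dataset metadata. Below is a sketch of that consistency check, assuming a hypothetical `collect_trajectories` helper.

```python
import glob
import os


def collect_trajectories(data_location, pattern, n_expected):
    """Return sorted trajectory paths matching pattern under data_location.

    Raises RuntimeError if the number of matches differs from n_expected.
    """
    trajectories = sorted(glob.glob(os.path.join(data_location, pattern)))
    if len(trajectories) != n_expected:
        raise RuntimeError("expected {} trajectories matching {!r}, found {}".format(
            n_expected, pattern, len(trajectories)))
    return trajectories
```

A bare `glob.glob` call returns matches in arbitrary order. This sketch sorts them so the ensemble order is reproducible across file systems; that ordering is a design choice of the sketch.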
### Request Example
```python
import MDAnalysisData.adk_transitions

# The function returns a single dict-like dataset object, not a tuple.
dataset = MDAnalysisData.adk_transitions.fetch_adk_transitions_FRODA()
topology_file = dataset.topology
trajectories = dataset.trajectories
```

### Response
#### Success Response (dataset)
- **dataset.topology** (filename) - Filename of the topology file
- **dataset.trajectories** (list) - list with filenames of the trajectory ensemble
- **dataset.N_trajectories** (int) - number of trajectories in the ensemble
- **dataset.DESCR** (string) - Description of the ensemble

#### Response Example
```json
{
  "topology": "path/to/1ake.pdb",
  "trajectories": ["path/to/froda_traj1.dcd", "path/to/froda_traj2.dcd", ...],
  "N_trajectories": 200,
  "DESCR": "FRODA AdK with geometric targeting on a rigid decomposition (FRODA server); closed (1AKE) to open (4AKE). Topology file: 1ake.pdb (without hydrogens)"
}
```
```

--------------------------------

### Load Dataset into MDAnalysis Universe

Source: https://www.mdanalysis.org/MDAnalysisData/_sources/usage.rst.txt

Loads the topology and trajectory files into an MDAnalysis `Universe` for analysis.

```python
import MDAnalysis as mda

u = mda.Universe(adk.topology, adk.trajectory)
```

--------------------------------

### fetch_vesicle_lib

Source: https://www.mdanalysis.org/MDAnalysisData/_sources/vesicles.rst.txt

Fetches the vesicle library dataset, which provides access to pre-constructed vesicle models.

```APIDOC
## fetch_vesicle_lib

### Description
Fetches the vesicle library dataset. This function provides access to pre-constructed vesicle models.

### Method
`fetch_vesicle_lib(data_home=None, download_if_missing=True)`

### Parameters
- **data_home** (optional, default: None): Download and cache folder for the datasets.
- **download_if_missing** (optional, default: True): Whether to download the data if they are not available locally.

### Returns
- A dict-like dataset object with `structures`, `N_structures`, `labels`, and `DESCR` attributes describing the vesicle models.
```

--------------------------------

### _read_description()

Source: https://www.mdanalysis.org/MDAnalysisData/helpers.html

Reads dataset descriptions from reStructuredText files.

```APIDOC
## MDAnalysisData.base._read_description(filename, description_dir='descr')

### Description
Read the description from a reStructuredText file.
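
### Example (sketch)
Below is a minimal sketch of reading a UTF-8 reStructuredText description from a `descr` directory next to a module file. `read_description` and its `module_file` argument are hypothetical. The library's `_read_description` resolves the directory relative to `MDAnalysisData.base` itself.

```python
import os


def read_description(filename, module_file, description_dir="descr"):
    """Return the text of description_dir/filename located beside module_file."""
    path = os.path.join(os.path.dirname(os.path.abspath(module_file)),
                        description_dir, filename)
    # Descriptions are assumed to be reStructuredText encoded as UTF-8.
    with open(path, encoding="utf-8") as fh:
        return fh.read()
```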
### Parameters
* **filename** (_str_) – name of the description file under the `descr` directory

### Note
All description files are expected to be stored in the directory `description_dir="descr"`, which is located in the same directory as the `MDAnalysisData.base` module file. All descriptions are assumed to be in reStructuredText format and UTF-8 encoding.
```

--------------------------------

### Set Data Directory via Environment Variable

Source: https://www.mdanalysis.org/MDAnalysisData/_sources/usage.rst.txt

Sets the local data directory for MDAnalysisData through the `MDANALYSIS_DATA` environment variable, which determines where datasets are downloaded and cached.

```bash
export MDANALYSIS_DATA=/tmp/MDAnalysis_data
```

--------------------------------

### Read Description File

Source: https://www.mdanalysis.org/MDAnalysisData/helpers.html

Reads dataset descriptions from reStructuredText files in the `descr` directory. The description files are assumed to be UTF-8 encoded.

```python
MDAnalysisData.base._read_description(filename, description_dir='descr')
```

--------------------------------

### fetch_yiip_equilibrium_long

Source: https://www.mdanalysis.org/MDAnalysisData/_modules/MDAnalysisData/yiip_equilibrium.html

Loads the YiiP 90-ns equilibrium trajectory. You can specify a custom data home directory and control the download behavior.

```APIDOC
## fetch_yiip_equilibrium_long

### Description
Loads the YiiP 90-ns equilibrium trajectory. You can specify an alternative directory for downloading and caching the data. You can also control whether the data are downloaded when they are not found locally.

### Parameters
- **data_home** (optional, default: None): Specify another download and cache folder for the datasets.
By default, all MDAnalysisData data is stored in '~/MDAnalysis_data' subfolders. This dataset is stored in ``/yiip_equilibrium``.
- **download_if_missing** (optional, default=True): If ``False``, raise an :exc:`IOError` if the data is not locally available instead of trying to download the data from the source site.

### Returns
- **dataset** (dict-like object): An object with the following attributes:
  - **topology** (filename): Filename of the topology file.
  - **trajectory** (filename): Filename of the trajectory file.
  - **DESCR** (string): Description of the trajectory.

### See Also
See :ref:`yiip-equilibrium-dataset` for a more detailed description.
```

--------------------------------

### Download and Unpack Data Archive

Source: https://www.mdanalysis.org/MDAnalysisData/_modules/MDAnalysisData/adk_transitions.html

This function downloads a tar.gz archive if it is missing, unpacks it into the dataset's folder under the data home, and checks that the topology file and the expected number of trajectory files are present.

```python
import glob
import logging
import tarfile
from os import makedirs
from os.path import exists, join

# Helpers provided by the MDAnalysisData base module
from MDAnalysisData.base import get_data_home, _fetch_remote, _read_description, Bunch

logger = logging.getLogger(__name__)


def load_dataset(metadata, download_if_missing=True, data_home=None,
                 file_type='tarfile'):
    """
    Loads a dataset from a local directory or downloads it if missing.

    Parameters
    ----------
    metadata : dict
        Metadata describing the dataset, including archive details and contents.
    download_if_missing : bool, optional
        If True, download the data if it is not found locally. Defaults to True.
    data_home : str, optional
        The base directory for storing downloaded data. Defaults to the
        MDAnalysisData data directory (see ``get_data_home()``).
    file_type : str, optional
        The type of archive file. Currently supports 'tarfile'.
        Defaults to 'tarfile'.

    Returns
    -------
    dataset : dict-like object with the following attributes:
        dataset.topology : filename
             Filename of the topology file
        dataset.trajectories : list
             List with filenames of the trajectory ensemble
        dataset.N_trajectories : int
             Number of trajectories in the ensemble
        dataset.DESCR : string
             Description of the ensemble

    Note
    ----
    Assumptions that are built in:

    - download a single tar.gz file
    - trajectories are given with a glob pattern
    """
    name = metadata['NAME']
    data_location = join(get_data_home(data_home=data_home), name)
    if not exists(data_location):
        makedirs(data_location)

    records = Bunch()

    meta = metadata['ARCHIVE'][file_type]
    local_path = join(data_location, meta.filename)

    if not exists(local_path):
        if not download_if_missing:
            raise IOError("Data {0}={1} not found and `download_if_missing` is "
                          "False".format(file_type, local_path))
        logger.info("Downloading {0}: {1} -> {2}...".format(
            file_type, meta.url, local_path))
        archive_path = _fetch_remote(meta, dirname=data_location)
        logger.info("Unpacking {}...".format(archive_path))
        with tarfile.open(archive_path, 'r') as tar:
            tar.extractall(path=data_location)

    records.topology = join(data_location, metadata['CONTENTS']['topology'])
    if not exists(records.topology):
        # should not happen...
        raise RuntimeError("topology file {} is missing".format(records.topology))

    trajectory_pattern = join(data_location, metadata['CONTENTS']['trajectories'])
    # Sort so that the ensemble order does not depend on the filesystem listing order
    records.trajectories = sorted(glob.glob(trajectory_pattern))
    records.N_trajectories = metadata['CONTENTS']['N_trajectories']
    if len(records.trajectories) != records.N_trajectories:
        # should not happen...
        raise RuntimeError("trajectory files in {0} are incomplete: only {1} "
                           "but should be {2}.".format(
                               trajectory_pattern, len(records.trajectories),
                               records.N_trajectories))

    records.DESCR = _read_description(metadata['DESCRIPTION'])

    return records
```

--------------------------------

### Fetch AdK Equilibrium Trajectory Data

Source: https://www.mdanalysis.org/MDAnalysisData/_modules/MDAnalysisData/adk_equilibrium.html

Loads the AdK 1 µs equilibrium trajectory (without water). Specify `data_home` to set a custom cache folder. Set `download_if_missing` to `False` to prevent automatic downloads.

```python
# -*- coding: utf-8 -*-
"""AdK equilibrium trajectory without water.

https://figshare.com/articles/Molecular_dynamics_trajectory_for_benchmarking_MDAnalysis/5108170/1
"""

from os.path import dirname, exists, join
from os import makedirs, remove
import logging

from .base import get_data_home
from .base import _fetch_remote, _read_description
from .base import RemoteFileMetadata
from .base import Bunch

NAME = "adk_equilibrium"
DESCRIPTION = "adk_equilibrium.rst"
# The original data can be found at the figshare URL.
# The SHA256 checksum of the zip file changes with every download so we
# cannot check its checksum. Instead we download individual files
# separately. The keys of this dict are also going to be the keys in the
# Bunch that is returned.
ARCHIVE = {
    'topology': RemoteFileMetadata(
        filename='adk4AKE.psf',
        url='https://ndownloader.figshare.com/files/8672230',
        checksum='1aa947d58fb41b6805dc1e7be4dbe65c6a8f4690f0bd7fc2ae03e7bd437085f4',
    ),
    'trajectory': RemoteFileMetadata(
        filename='1ake_007-nowater-core-dt240ps.dcd',
        url='https://ndownloader.figshare.com/files/8672074',
        checksum='598fcbcfcc425f6eafbe9997238320fcacc6a4613ecce061e1521732bab734bf',
    ),
}

logger = logging.getLogger(__name__)


def fetch_adk_equilibrium(data_home=None, download_if_missing=True):
    """Load the AdK 1us equilibrium trajectory (without water)

    Parameters
    ----------
    data_home : optional, default: None
        Specify another download and cache folder for the datasets. By default
        all MDAnalysisData data is stored in '~/MDAnalysis_data' subfolders.
        This dataset is stored in ``/adk_equilibrium``.
    download_if_missing : optional, default=True
        If ``False``, raise an :exc:`IOError` if the data is not locally
        available instead of trying to download the data from the source site.

    Returns
    -------
    dataset : dict-like object with the following attributes:
        dataset.topology : filename
             Filename of the topology file
        dataset.trajectory : filename
             Filename of the trajectory file
        dataset.DESCR : string
             Description of the trajectory.

    See :ref:`adk-equilibrium-dataset` for description.
    """
    name = NAME
    data_location = join(get_data_home(data_home=data_home), name)
    if not exists(data_location):
        makedirs(data_location)

    records = Bunch()
    for file_type, meta in ARCHIVE.items():
        local_path = join(data_location, meta.filename)
        records[file_type] = local_path

        if not exists(local_path):
            if not download_if_missing:
                raise IOError("Data {0}={1} not found and `download_if_missing` is "
                              "False".format(file_type, local_path))
            logger.info("Downloading {0}: {1} -> {2}...".format(
                file_type, meta.url, local_path))
            archive_path = _fetch_remote(meta, dirname=data_location)

    records.DESCR = _read_description(DESCRIPTION)

    return records
```

--------------------------------

### Read Description File

Source: https://www.mdanalysis.org/MDAnalysisData/_modules/MDAnalysisData/base.html

Reads a description from a reStructuredText file located in the `descr` directory. Assumes UTF-8 encoding.

```python
import importlib.resources


def _read_description(filename, description_dir='descr'):
    """Read the description from a reStructuredText file.

    Arguments
    ---------
    filename : str
        name of the description file under the ``descr`` directory

    Note
    ----
    All description files are supposed to be stored in the directory
    ``description_dir="descr"`` that lives in the same directory as the
    :mod:`MDAnalysisData.base` module file.

    All descriptions are assumed to be in reStructuredText format and in
    UTF-8 encoding.
    """
    # The descr directory should be in the same directory as this file base.py.
    # read_bytes() returns bytes, which we need to decode from UTF-8.
    path = importlib.resources.files('MDAnalysisData') / description_dir / filename
    DESCR = path.read_bytes().decode("utf-8")
    return DESCR
```

--------------------------------

### Fetch membrane peptide dataset

Source: https://www.mdanalysis.org/MDAnalysisData/_modules/MDAnalysisData/membrane_peptide.html

Use this function to download and load the helical peptide in DMPC membrane equilibrium trajectory.
Specify `data_home` to set a custom download location or `download_if_missing=False` to prevent automatic downloads.

```python
from MDAnalysisData.membrane_peptide import fetch_membrane_peptide

# Load the dataset, downloading if necessary
dataset = fetch_membrane_peptide()

# Access the topology and trajectory files
topology_file = dataset.topology
trajectory_file = dataset.trajectory

# Access the description
description = dataset.DESCR

print(f"Topology file: {topology_file}")
print(f"Trajectory file: {trajectory_file}")
print(f"Description: {description}")
```

--------------------------------

### Access Dataset Description

Source: https://www.mdanalysis.org/MDAnalysisData/usage.html

Prints the human-readable description of the fetched dataset, including its characteristics and origin.

```python
>>> from MDAnalysisData import datasets
>>> adk = datasets.fetch_adk_equilibrium()
>>> print(adk.DESCR)
```
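
--------------------------------

### Cache-or-Download Pattern (Sketch)

The fetchers documented above (for example `fetch_adk_equilibrium()` and the archive-based `load_dataset()`) share one control flow: resolve the data home (explicit `data_home`, then the `MDANALYSIS_DATA` environment variable, then a `MDAnalysis_data` folder in the user's home directory), reuse a cached file if it exists, and otherwise download it unless `download_if_missing=False`. The following standard-library sketch mirrors that flow so it can be run offline. The names `resolve_data_home`, `fetch_cached` and the `downloader` callable are hypothetical and not part of the MDAnalysisData API, and the value of `DEFAULT_DATADIR` is an assumption based on the documented default.

```python
import os
from os.path import exists, expanduser, join
from typing import Callable, Optional

# Assumption: mirrors the documented default of a 'MDAnalysis_data'
# folder in the user's home directory.
DEFAULT_DATADIR = join("~", "MDAnalysis_data")


def resolve_data_home(data_home: Optional[str] = None) -> str:
    """Hypothetical stand-in for get_data_home(): explicit path first,
    then $MDANALYSIS_DATA, then the default; '~' is expanded and the
    folder is created if it does not exist."""
    if data_home is None:
        data_home = os.environ.get("MDANALYSIS_DATA", DEFAULT_DATADIR)
    data_home = expanduser(data_home)
    os.makedirs(data_home, exist_ok=True)
    return data_home


def fetch_cached(name: str, filename: str,
                 downloader: Callable[[str], None],
                 data_home: Optional[str] = None,
                 download_if_missing: bool = True) -> str:
    """Hypothetical cache-or-download helper modelled on fetch_adk_equilibrium().

    `downloader` is any callable that writes the file to the given path;
    the real fetchers call _fetch_remote() at this point instead.
    """
    # Each dataset lives in its own subfolder of the data home.
    data_location = join(resolve_data_home(data_home), name)
    os.makedirs(data_location, exist_ok=True)
    local_path = join(data_location, filename)
    if not exists(local_path):
        if not download_if_missing:
            raise IOError("Data {0} not found and `download_if_missing` "
                          "is False".format(local_path))
        downloader(local_path)
    # A cached file is returned without calling the downloader again.
    return local_path
```

Passing a stub `downloader` keeps the sketch testable without network access; a real implementation would also verify the file against a checksum, as the `checksum` field of `RemoteFileMetadata` suggests.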