### Yet Another Numerical Output Example

Source: https://github.com/matchms/ms2deepscore/blob/main/notebooks/model_benchmarking/optimize_sampling_algorithm_settings.ipynb

A further numerical output from the same benchmarking notebook, continuing the series of logged metric values.

```text
0.8111033333364384
```

--------------------------------

### Another Numerical Output Example

Source: https://github.com/matchms/ms2deepscore/blob/main/notebooks/model_benchmarking/optimize_sampling_algorithm_settings.ipynb

Another numerical output, potentially a different metric or the result of a different configuration.

```text
0.7509776432384816
```

--------------------------------

### Download and Unzip Data

Source: https://github.com/matchms/ms2deepscore/blob/main/notebooks/case_studies_ms2deepscore_2/Blood_case_study/Create_molecular_networks.ipynb

Downloads the model and settings files from Zenodo, then downloads and unzips the blood plasma case study data. Requires the `requests` and `tqdm` packages; `zipfile` and `os` are part of the Python standard library.
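One of the Zenodo file names used in this notebook contains spaces (`Blood plasma case study.zip`). `requests` is expected to percent-encode such URLs on its own, but a link pasted into another tool may need to be encoded by hand. A minimal sketch using only the standard library; the helper name `zenodo_file_url` is hypothetical and not part of the notebook:

```python
from urllib.parse import quote


def zenodo_file_url(record_id: str, file_name: str) -> str:
    """Build a Zenodo download URL with a percent-encoded file name (hypothetical helper)."""
    # quote() replaces characters such as spaces with %XX escapes
    return f"https://zenodo.org/records/{record_id}/files/{quote(file_name)}?download=1"


url = zenodo_file_url("16311735", "Blood plasma case study.zip")
print(url)
```

The notebook's own download code follows.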
```python
import os
import zipfile

import requests
from tqdm import tqdm


def unzip_file(zip_path, extract_to="."):
    """Unzips a zip file to the specified directory."""
    with zipfile.ZipFile(zip_path, 'r') as zip_ref:
        zip_ref.extractall(extract_to)
    print(f"Unzipped {zip_path} to {extract_to}")


def download_file(link, file_name):
    """Downloads link to file_name with a progress bar, unless the file already exists."""
    # Check first, so that no request is made for files that are already present
    if os.path.exists(file_name):
        print(f"The file {file_name} already exists, the file won't be downloaded")
        return
    response = requests.get(link, stream=True)
    total_size = int(response.headers.get('content-length', 0))

    with open(file_name, "wb") as f, tqdm(desc="Downloading file", total=total_size, unit='B', unit_scale=True, unit_divisor=1024,) as bar:
        for chunk in response.iter_content(chunk_size=1024):
            if chunk:
                f.write(chunk)
                bar.update(len(chunk))  # Update progress bar by the chunk size


model_file_name = "ms2deepscore_model.pt"
case_study_spectra_folder = "Blood_case_study"

download_file("https://zenodo.org/records/14290920/files/settings.json?download=1", "ms2deepscore_settings.json")
download_file("https://zenodo.org/records/14290920/files/ms2deepscore_model.pt?download=1", model_file_name)
download_file("https://zenodo.org/records/16311735/files/Blood plasma case study.zip?download=1", "Blood_plasma_case_study.zip")
unzip_file("Blood_plasma_case_study.zip", extract_to=case_study_spectra_folder)
```

--------------------------------

### Numerical Output Example

Source: https://github.com/matchms/ms2deepscore/blob/main/notebooks/model_benchmarking/optimize_sampling_algorithm_settings.ipynb

A numerical output, likely a performance score or metric, obtained from a model benchmarking run.

```text
0.6714763613634331
```

--------------------------------

### Install MatchMS and MS2DeepScore via Conda

Source: https://github.com/matchms/ms2deepscore/blob/main/README.md

Creates and activates a conda environment, installs matchms from the bioconda and conda-forge channels, then installs MS2DeepScore with pip.
```bash
conda create --name ms2deepscore python=3.12
conda activate ms2deepscore
conda install --channel bioconda --channel conda-forge matchms
pip install ms2deepscore
```

--------------------------------

### Install MS2DeepScore via Pip

Source: https://github.com/matchms/ms2deepscore/blob/main/README.md

Install the MS2DeepScore package using pip within your activated environment.

```bash
pip install ms2deepscore
```

--------------------------------

### Figure Output Example

Source: https://github.com/matchms/ms2deepscore/blob/main/notebooks/model_benchmarking/optimize_sampling_algorithm_settings.ipynb

This snippet represents the output of a figure, typically a plot or visualization generated during the benchmarking process.

```text
```

--------------------------------

### Load Data and Predictions for Comparison

Source: https://github.com/matchms/ms2deepscore/blob/main/notebooks/ms2deepscore_2_preprint/compare_with_cosine.ipynb

Loads true values, validation spectra, and pre-calculated cosine, modified cosine, and MS2DeepScore predictions from pickle and MGF files. In this variant only the negative-versus-positive (cross ion mode) files are loaded; the positive-only and negative-only lines are commented out.

```python
import os
from ms2deepscore.MS2DeepScore import MS2DeepScore
from ms2deepscore.models.load_model import load_model
from matchms.importing.load_from_mgf import load_from_mgf
from ms2deepscore.utils import load_pickled_file

data_folder = os.path.join("../../../data/pytorch/gnps_21_08_23_min_5_at_5_percent/")
both_mode_folder = "trained_models/both_mode_precursor_mz_2000_2000_2000_layers_500_embedding_2024_01_31_11_51_10/"

# True values
# pos_true_values = load_pickled_file(os.path.join(data_folder, both_mode_folder, "benchmarking_results", "positive_positive_true_values.pickle"))
# neg_true_values = load_pickled_file(os.path.join(data_folder, both_mode_folder, "benchmarking_results", "negative_negative_true_values.pickle"))
neg_pos_true_values = load_pickled_file(os.path.join(data_folder, both_mode_folder, "benchmarking_results", "negative_positive_true_values.pickle"))

# Validation spectra
pos_spectra = list(load_from_mgf(os.path.join(data_folder, "training_and_validation_split", "positive_validation_spectra.mgf")))
neg_spectra = list(load_from_mgf(os.path.join(data_folder, "training_and_validation_split", "negative_validation_spectra.mgf")))

# Cosine predictions
# neg_cosine_predictions = load_pickled_file(os.path.join(data_folder, "training_and_validation_split", "cosine_scores", "negative_scores.pickle"))["score"]
# pos_cosine_predictions = load_pickled_file(os.path.join(data_folder, "training_and_validation_split", "cosine_scores", "positive_scores.pickle"))["score"]
neg_pos_cosines_predictions = load_pickled_file(os.path.join(data_folder, "training_and_validation_split", "cosine_scores", "negative_positive_scores.pickle"))["score"]

# Mod Cosine predictions
# neg_mod_cosine_predictions = load_pickled_file(os.path.join(data_folder, "training_and_validation_split", "modified_cosine_scores", "negative_scores.pickle"))["score"]
# pos_mod_cosine_predictions = load_pickled_file(os.path.join(data_folder, "training_and_validation_split", "modified_cosine_scores", "positive_scores.pickle"))["score"]
# NOTE: as in the source notebook, the next line reads from "cosine_scores" rather than
# "modified_cosine_scores"; this may be a copy-paste slip, so check the path before relying on it.
neg_pos_mod_cosines_predictions = load_pickled_file(os.path.join(data_folder, "training_and_validation_split", "cosine_scores", "negative_positive_scores.pickle"))["score"]

# Both models predictions
# pos_predictions_both_mode = load_pickled_file(os.path.join(data_folder, both_mode_folder, "benchmarking_results", "positive_positive_predictions.pickle"))
# neg_predictions_both_mode = load_pickled_file(os.path.join(data_folder, both_mode_folder, "benchmarking_results", "negative_negative_predictions.pickle"))
neg_pos_predictions_both_mode = load_pickled_file(os.path.join(data_folder, both_mode_folder, "benchmarking_results", "negative_positive_predictions.pickle"))

# Pos mode predictions
pos_mode_folder = "trained_models/positive_mode_precursor_mz_2000_2000_2000_layers_500_embedding_2024_02_07_10_27_04/"
# pos_predictions_pos_mode = load_pickled_file(os.path.join(data_folder, pos_mode_folder, "benchmarking_results", "positive_positive_predictions.pickle"))
# neg_predictions_pos_mode = load_pickled_file(os.path.join(data_folder, pos_mode_folder, "benchmarking_results", "negative_negative_predictions.pickle"))
neg_pos_predictions_pos_mode = load_pickled_file(os.path.join(data_folder, pos_mode_folder, "benchmarking_results", "negative_positive_predictions.pickle"))

# neg mode predictions
neg_mode_folder = "trained_models/negative_mode_precursor_mz_2000_2000_2000_layers_500_embedding_2024_02_07_11_53_37/"
# pos_predictions_neg_mode = load_pickled_file(os.path.join(data_folder, neg_mode_folder, "benchmarking_results", "positive_positive_predictions.pickle"))
# neg_predictions_neg_mode = load_pickled_file(os.path.join(data_folder, neg_mode_folder, "benchmarking_results", "negative_negative_predictions.pickle"))
neg_pos_predictions_neg_mode = load_pickled_file(os.path.join(data_folder, neg_mode_folder, "benchmarking_results", "negative_positive_predictions.pickle"))
```

--------------------------------

### Initialize Scoring Objects

Source: https://github.com/matchms/ms2deepscore/blob/main/notebooks/model_benchmarking/Compare balanced cross ion mode sampling.ipynb

Initializes one `CalculateScoresBetweenAllIonmodes` object for the normal model and one for the balanced model, using the same test spectra and fingerprint settings so that their scoring performance can be compared. The model file names and the `pos_test` and `neg_test` spectra are assumed to be defined earlier in the notebook.

```python
from ms2deepscore.benchmarking.CalculateScoresBetweenAllIonmodes import CalculateScoresBetweenAllIonmodes

scores_normal_model = CalculateScoresBetweenAllIonmodes(normal_model_file_name, pos_test, neg_test, fingerprint_type="daylight", n_bits_fingerprint=4096)
scores_balanced_model = CalculateScoresBetweenAllIonmodes(balanced_model_file_name, pos_test, neg_test, fingerprint_type="daylight", n_bits_fingerprint=4096)
```

--------------------------------

### Load Data and Predictions for Comparison

Source: https://github.com/matchms/ms2deepscore/blob/main/notebooks/ms2deepscore_2_preprint/compare_with_cosine.ipynb

Loads true values, validation spectra, and pre-calculated cosine, modified cosine, and MS2DeepScore predictions from pickle and MGF files. In this variant only the positive-versus-positive files are loaded; the negative and cross ion mode lines are commented out.
```python
import os
from ms2deepscore.MS2DeepScore import MS2DeepScore
from ms2deepscore.models.load_model import load_model
from matchms.importing.load_from_mgf import load_from_mgf
from ms2deepscore.utils import load_pickled_file

data_folder = os.path.join("../../../data/pytorch/gnps_21_08_23_min_5_at_5_percent/")
both_mode_folder = "trained_models/both_mode_precursor_mz_ionmode_2000_2000_2000_layers_500_embedding_2024_01_31_11_51_10/"

# True values
pos_true_values = load_pickled_file(os.path.join(data_folder, both_mode_folder, "benchmarking_results", "positive_positive_true_values.pickle"))
# neg_true_values = load_pickled_file(os.path.join(data_folder, both_mode_folder, "benchmarking_results", "negative_negative_true_values.pickle"))
# neg_pos_true_values = load_pickled_file(os.path.join(data_folder, both_mode_folder, "benchmarking_results", "negative_positive_true_values.pickle"))

# Validation spectra
pos_spectra = list(load_from_mgf(os.path.join(data_folder, "training_and_validation_split", "positive_validation_spectra.mgf")))
# neg_spectra = list(load_from_mgf(os.path.join(data_folder, "training_and_validation_split", "negative_validation_spectra.mgf")))

# Cosine predictions
# neg_cosine_predictions = load_pickled_file(os.path.join(data_folder, "training_and_validation_split", "cosine_scores", "negative_scores.pickle"))["score"]
pos_cosine_predictions = load_pickled_file(os.path.join(data_folder, "training_and_validation_split", "cosine_scores", "positive_scores.pickle"))["score"]
# neg_pos_cosines_predictions = load_pickled_file(os.path.join(data_folder, "training_and_validation_split", "cosine_scores", "negative_positive_scores.pickle"))["score"]

# Mod Cosine predictions
# neg_mod_cosine_predictions = load_pickled_file(os.path.join(data_folder, "training_and_validation_split", "modified_cosine_scores", "negative_scores.pickle"))["score"]
pos_mod_cosine_predictions = load_pickled_file(os.path.join(data_folder, "training_and_validation_split", "modified_cosine_scores", "positive_scores.pickle"))["score"]
# neg_pos_mod_cosines_predictions = load_pickled_file(os.path.join(data_folder, "training_and_validation_split", "cosine_scores", "negative_positive_scores.pickle"))["score"]

# Both models predictions
pos_predictions_both_mode = load_pickled_file(os.path.join(data_folder, both_mode_folder, "benchmarking_results", "positive_positive_predictions.pickle"))
# neg_predictions_both_mode = load_pickled_file(os.path.join(data_folder, both_mode_folder, "benchmarking_results", "negative_negative_predictions.pickle"))
# neg_pos_predictions_both_mode = load_pickled_file(os.path.join(data_folder, both_mode_folder, "benchmarking_results", "negative_positive_predictions.pickle"))

# Pos mode predictions
pos_mode_folder = "trained_models/positive_mode_precursor_mz_2000_2000_2000_layers_500_embedding_2024_02_07_10_27_04/"
pos_predictions_pos_mode = load_pickled_file(os.path.join(data_folder, pos_mode_folder, "benchmarking_results", "positive_positive_predictions.pickle"))
# neg_predictions_pos_mode = load_pickled_file(os.path.join(data_folder, pos_mode_folder, "benchmarking_results", "negative_negative_predictions.pickle"))
# neg_pos_predictions_pos_mode = load_pickled_file(os.path.join(data_folder, pos_mode_folder, "benchmarking_results", "negative_positive_predictions.pickle"))

# neg mode predictions
neg_mode_folder = "trained_models/negative_mode_precursor_mz_2000_2000_2000_layers_500_embedding_2024_02_07_11_53_37/"
pos_predictions_neg_mode = load_pickled_file(os.path.join(data_folder, neg_mode_folder, "benchmarking_results", "positive_positive_predictions.pickle"))
# neg_predictions_neg_mode = load_pickled_file(os.path.join(data_folder, neg_mode_folder, "benchmarking_results", "negative_negative_predictions.pickle"))
# neg_pos_predictions_neg_mode = load_pickled_file(os.path.join(data_folder, neg_mode_folder, "benchmarking_results", "negative_positive_predictions.pickle"))
```

--------------------------------

### Download and Unzip Spectral Data

Source: https://github.com/matchms/ms2deepscore/blob/main/notebooks/case_studies_ms2deepscore_2/Rumex_case_study/case_study_mol_network_Rumex.ipynb

Downloads the model file, settings, and spectral data for the Rumex case study, then unzips the spectral data to the specified directory. Ensure `requests` and `tqdm` are installed.

```python
import os
import zipfile

import requests
from tqdm import tqdm


def unzip_file(zip_path, extract_to="."):
    """Unzips a zip file to the specified directory."""
    with zipfile.ZipFile(zip_path, 'r') as zip_ref:
        zip_ref.extractall(extract_to)
    print(f"Unzipped {zip_path} to {extract_to}")


def download_file(link, file_name):
    """Downloads link to file_name with a progress bar, unless the file already exists."""
    # Check first, so that no request is made for files that are already present
    if os.path.exists(file_name):
        print(f"The file {file_name} already exists, the file won't be downloaded")
        return
    response = requests.get(link, stream=True)
    total_size = int(response.headers.get('content-length', 0))

    with open(file_name, "wb") as f, tqdm(desc="Downloading file", total=total_size, unit='B', unit_scale=True, unit_divisor=1024,) as bar:
        for chunk in response.iter_content(chunk_size=1024):
            if chunk:
                f.write(chunk)
                bar.update(len(chunk))  # Update progress bar by the chunk size


model_file_name = "ms2deepscore_model.pt"
case_study_spectra_folder = ""

download_file("https://zenodo.org/records/14290920/files/settings.json?download=1", "ms2deepscore_settings.json")
download_file("https://zenodo.org/records/14290920/files/ms2deepscore_model.pt?download=1", model_file_name)
download_file("https://zenodo.org/records/16311735/files/Rumex_case_study.zip?download=1", "Rumex_case_study.zip")
unzip_file("Rumex_case_study.zip", extract_to=case_study_spectra_folder)
```

--------------------------------

### Plotting Negative Prediction Distribution

Source: https://github.com/matchms/ms2deepscore/blob/main/notebooks/ms2deepscore_2_preprint/positive_vs_negative_examples.ipynb

Plots a histogram of the predictions obtained for negative examples searched against the combined dataset. Requires matplotlib; `neg_in_combined_predictions` is assumed to be defined earlier in the notebook.
```python
import matplotlib.pyplot as plt

plt.hist(neg_in_combined_predictions)
```

--------------------------------

### Search Progress Indicator

Source: https://github.com/matchms/ms2deepscore/blob/main/notebooks/ms2deepscore_2_preprint/positive_vs_negative_examples.ipynb

A tqdm progress bar showing zero completed iterations and no elapsed time, which is the state captured when the search loop had only just started.

```text
0it [00:00, ?it/s]
```

--------------------------------

### Example SMILES String and Associated Data

Source: https://github.com/matchms/ms2deepscore/blob/main/notebooks/ms2deepscore_2_preprint/positive_vs_negative_examples.ipynb

Each block shows one molecule as a SMILES string, followed by two ion-mode labels (positive/negative), two numerical identifiers, and two adduct types (for example `[M-H]-` and `[M+K]+`).

```text
CC1(C)CC[C@]2(C(=O)O[C@@H]3O[C@H](CO)[C@@H](O)[C@H](O)[C@H]3O)CC[C@]3(C)C(=CCC4[C@@]5(C)C[C@H](O)[C@H](O[C@@H]6O[C@H](C(=O)O)[C@@H](O)[C@H](O)[C@H]6O)[C@](C)(C=O)C5CC[C@]43C)[C@@H]2C1 positive negative 1486 159035 [M-H]- [M+K]+
```

```text
C[C@@H](C1=C[C@@]2(C=C[C@H]3[C@@H]2[C@@H](OC=C3C(=O)OC)O[C@H]4[C@@H]([C@H]([C@@H]([C@H](O4)CO)O)O)O)OC1=O)O positive negative 1546 164206 [M-H]- [M+H]+
```

```text
C[C@@H]1O[C@@H](OC[C@H]2O[C@@H](Oc3c(-c4ccc(O)c(O)c4)oc4cc(O)cc(O)c4c3=O)[C@H](O)[C@@H](O)[C@@H]2O)[C@H](O)[C@H](O)[C@H]1O negative positive 7181 201967 [M+H]+ [M-H]-
```

```text
CC(C)(O)CCC(=O)C(C)(O)C1C(O)CC2(C)C3CC=C4C(CC(O)C(OC5OC(CO)C(O)C(O)C5O)C4(C)C)C3(C)C(=O)CC12C negative positive 429 1186 [M+NH4]+ [M+FA-H]-
```

```text
CC(C)CC(=O)OC[C@@]1(O)C[C@@]23CC[C@H]4[C@@](C)(CCC[C@@]4(C)C(=O)O)[C@@H]2CC[C@@H]1C3 negative positive 551 28593 [M-H2O+H]+ [M-H]-
```

```text
C[C@H]1C/C=C/[C@H]2[C@@H]3O[C@]3(C)[C@@H](C)[C@H]4[C@H](CC5=CC=CC=C5)NC(=O)[C@@]24OC(=O)O/C=C/[C@@](C)(O)C1=O negative positive 599 1930 [M+H]+ [M+FA-H]-
```

```text
CCCCCC(O)CC(O)CC(=O)OC(CCCCC)CC(O)CC(=O)OC(CCCCC)CC(O)CC(=O)OC(CCCCC)CC(O)CC(O)=O negative positive 639 1534 [M-H2O+H]+ [M-H]-
```

```text
CCCCCC(=O)OC(CC(=O)[O-])C[N+](C)(C)C negative positive 1410 7422 [M+H]+ [M-H]-
```

```text
CCCCCCCCCC(=O)OC(CC(=O)[O-])C[N+](C)(C)C negative positive 1411 30856 [M+H]+ [M-H]-
```

```text
C[C@H]1C/C=C/[C@H]2[C@@H]3O[C@]3(C)[C@@H](C)[C@H]4[C@H](CC5=CC=CC=C5)NC(=O)[C@@]24OC(=O)O/C=C/[C@@](C)(O)C1=O negative positive 1487 1930 [M-H2O+H]+ [M+FA-H]-
```

```text
COC(=O)CCC(C)C1CCC2C3CC[C@@H]4C[C@H](O)CC[C@]4(C)C3C[C@H](O)[C@]12C negative positive 1492 24881 [M-H2O+H]+ [M-H]-
```

```text
COc1ccc(-c2oc3cc(O)cc(O)c3c(=O)c2O)cc1O negative positive 1584 11203 [M+H]+ [M-H]-
```

--------------------------------

### Setup for Plotting Comparison Violinplots

Source: https://github.com/matchms/ms2deepscore/blob/main/notebooks/model_benchmarking/Comparison_to_single_ion_mode_models.ipynb

Defines a helper that groups averaged predictions by Tanimoto score bin, and the start of a three-panel violin plot function. The plotting function is shown only up to the figure setup, which creates a 2 x 3 grid of axes with x-axes shared per column and a 1:4 height ratio between the two rows.

```python
from scipy.stats import gaussian_kde
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from tqdm import tqdm
from ms2deepscore.utils import create_evenly_spaced_bins
from ms2deepscore.benchmarking.CalculateScoresBetweenAllIonmodes import CalculateScoresBetweenAllIonmodes


def get_predictions_per_bin(predictions_and_tanimoto_scores,
                            average_per_inchikey_pair: pd.DataFrame,
                            tanimoto_bins: np.ndarray):
    """Collect the averaged predictions that fall into each Tanimoto score bin

    Parameters
    ----------
    predictions_and_tanimoto_scores
        Object whose tanimoto_df attribute holds the Tanimoto score per inchikey pair
    average_per_inchikey_pair
        Precalculated average (prediction or loss) per inchikey pair
    tanimoto_bins
        Bins for the reference score to evaluate the performance of scores. in the form [(0.0, 0.1), (0.1, 0.2) ...]
    """
    average_predictions = average_per_inchikey_pair.to_numpy()

    sorted_bins = sorted(tanimoto_bins, key=lambda b: b[0])

    # Convert the (low, high) pairs into one sorted list of bin edges
    bins = [bin_pair[0] for bin_pair in sorted_bins]
    bins.append(sorted_bins[-1][1])

    digitized = np.digitize(predictions_and_tanimoto_scores.tanimoto_df, bins, right=True)
    predictions_per_bin = []
    for i, bin_edges in tqdm(enumerate(sorted_bins), desc="Selecting available inchikey pairs per bin"):
        row_idxs, col_idxs = np.where(digitized == i + 1)
        predictions_in_this_bin = average_predictions[row_idxs, col_idxs]
        predictions_in_this_bin_not_nan = predictions_in_this_bin[~np.isnan(predictions_in_this_bin)]
        predictions_per_bin.append(predictions_in_this_bin_not_nan)
    return predictions_per_bin


def plot_comparison_violinplot_three_panels(
    list_a,
    list_b,
    nr_of_bins: int
):
    bins = create_evenly_spaced_bins(nr_of_bins)
    bin_labels = [f"{a:.1f}–<{b:.1f}" for (a, b) in bins]
    n_panels = 3

    assert len(list_a) == n_panels and len(list_b) == n_panels, f"Expected {n_panels} sets of scores in each input"

    fig, axes = plt.subplots(
        2, n_panels,
        figsize=(5 * n_panels, 8),
        sharex='col',
        gridspec_kw={'height_ratios': [1, 4]},
        constrained_layout=True
    )
```

--------------------------------

### Performing Library Search for Positive/Negative Examples

Source: https://github.com/matchms/ms2deepscore/blob/main/notebooks/ms2deepscore_2_preprint/positive_vs_negative_examples.ipynb

Calls `library_search` with the embeddings and spectra of a library (training data) and of a query set (validation data). The three calls compare different combinations of training and validation data.
```python
library_search(combined_training_embeddings, combined_training_spectra, neg_validation_embeddings, neg_validation_spectra)
```

```python
library_search(combined_training_embeddings, combined_training_spectra, pos_validation_embeddings, pos_validation_spectra)
```

```python
library_search(neg_training_embeddings, neg_training_spectra, pos_validation_embeddings, pos_validation_spectra)
```

--------------------------------

### Get Tanimoto Score Between Spectra Pairs

Source: https://github.com/matchms/ms2deepscore/blob/main/notebooks/ms2deepscore_2_preprint/positive_vs_negative_examples.ipynb

Calculates the Tanimoto score for each (reference, query) spectrum pair by scoring all reference spectra against all query spectra and keeping the diagonal of the resulting matrix. Requires NumPy and the `get_tanimoto_score_between_spectra` function from `ms2deepscore.benchmarking.calculate_scores_for_validation`.

```python
import numpy as np
from ms2deepscore.benchmarking.calculate_scores_for_validation import get_tanimoto_score_between_spectra


def get_pairs_tanimoto_score(spectrum_pairs):
    ref_spectra = [spectra[0] for spectra in spectrum_pairs]
    query_spectra = [spectra[1] for spectra in spectrum_pairs]
    tanimoto_scores = get_tanimoto_score_between_spectra(ref_spectra, query_spectra)
    # Entry (i, i) of the score matrix belongs to the i-th (reference, query) pair
    return np.diagonal(tanimoto_scores)
```

--------------------------------

### Perform Library Search and Get Predictions

Source: https://github.com/matchms/ms2deepscore/blob/main/notebooks/ms2deepscore_2_preprint/positive_vs_negative_examples.ipynb

Runs `library_search` with the combined training embeddings as the library and the positive validation embeddings as queries, returning the indices of the matches and their prediction scores.
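The snippets here do not show how `library_search` is implemented. Purely as an illustration, a best-match search over embeddings could look like the sketch below. It assumes cosine similarity as the score and uses plain Python lists so that it runs without dependencies; the name `library_search_sketch` and its return format are assumptions, not the notebook's code.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def library_search_sketch(library_embeddings, query_embeddings):
    """For each query, return the index and score of the most similar library embedding."""
    best_indices, best_scores = [], []
    for query in query_embeddings:
        scores = [cosine_similarity(query, ref) for ref in library_embeddings]
        best = max(range(len(scores)), key=scores.__getitem__)
        best_indices.append(best)
        best_scores.append(scores[best])
    return best_indices, best_scores


# Toy two-dimensional embeddings, for illustration only
library = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
queries = [[0.9, 0.1], [0.0, 2.0]]
idx, scores = library_search_sketch(library, queries)
print(idx, scores)
```

The notebook's own call is shown next.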
```python
pos_in_combined_idx, pos_in_combined_predictions = library_search(combined_training_embeddings, pos_validation_embeddings)
```

--------------------------------

### Calculate Tanimoto Scores for Other Pairs

Source: https://github.com/matchms/ms2deepscore/blob/main/notebooks/ms2deepscore_2_preprint/positive_vs_negative_examples.ipynb

Applies `get_pairs_tanimoto_score` to a given list of spectrum pairs, here `other_pairs`, to obtain one Tanimoto score per pair.

```python
tanimoto_scores_other_pairs = get_pairs_tanimoto_score(other_pairs)
```

--------------------------------

### Get Unique InChIKeys Function

Source: https://github.com/matchms/ms2deepscore/blob/main/notebooks/ms2deepscore_2_preprint/positive_vs_negative_examples.ipynb

The function `get_unique_inchikeys` takes the first 14 characters of each spectrum's InChIKey (the block that encodes the molecular connectivity) and returns the set of unique values. It is applied to the positive and negative validation and training spectra.

```python
def get_unique_inchikeys(spectra):
    inchikey_list = []
    for spectrum in spectra:
        inchikey = spectrum.get("inchikey")[:14]
        inchikey_list.append(inchikey)
    return set(inchikey_list)


inchikeys_pos_val = get_unique_inchikeys(pos_validation_spectra)
inchikeys_neg_val = get_unique_inchikeys(neg_validation_spectra)
inchikeys_pos_train = get_unique_inchikeys(pos_training_spectra)
inchikeys_neg_train = get_unique_inchikeys(neg_training_spectra)
```

--------------------------------

### Print Hello World

Source: https://github.com/matchms/ms2deepscore/blob/main/notebooks/ms2deepscore_2_preprint/Training_spectra_filtering.ipynb

A basic Python script to print 'hello'.

```python
print("hello")
```

--------------------------------

### Load Spectra from MGF

Source: https://github.com/matchms/ms2deepscore/blob/main/notebooks/ms2deepscore_2_preprint/Mean_prediction_for_exact_matches.ipynb

Loads spectral data from an MGF file.
This is used to get the list of validation spectra. `results_dir` is assumed to be defined earlier in the notebook.

```python
import os

from matchms.importing.load_from_mgf import load_from_mgf

val_spectra = list(load_from_mgf(os.path.join(results_dir, "./positive_validation_spectra.mgf")))
print(len(val_spectra))
```

--------------------------------

### Download Model, Spectra, and Annotations

Source: https://github.com/matchms/ms2deepscore/blob/main/notebooks/case_studies_ms2deepscore_2/Urine_case_study/visualize_embedding_umap.ipynb

Downloads the files needed for the case study: the MS2DeepScore model and settings, the spectra, and the MS2Query annotations. Files that already exist locally are not downloaded again.

```python
import requests
import os
from tqdm import tqdm


def download_file(link, file_name):
    """Downloads link to file_name with a progress bar, unless the file already exists."""
    # Check first, so that no request is made for files that are already present
    if os.path.exists(file_name):
        print(f"The file {file_name} already exists, the file won't be downloaded")
        return
    response = requests.get(link, stream=True)
    total_size = int(response.headers.get('content-length', 0))

    with open(file_name, "wb") as f, tqdm(desc="Downloading file", total=total_size, unit='B', unit_scale=True, unit_divisor=1024,) as bar:
        for chunk in response.iter_content(chunk_size=1024):
            if chunk:
                f.write(chunk)
                bar.update(len(chunk))  # Update progress bar by the chunk size


model_file_name = "ms2deepscore_model.pt"
case_study_spectra_file_name = "case_study_spectra.mgf"
ms2query_annotations = "ms2query_annotations.csv"

download_file("https://zenodo.org/records/14290920/files/settings.json?download=1", "ms2deepscore_settings.json")
download_file("https://zenodo.org/records/14290920/files/ms2deepscore_model.pt?download=1", model_file_name)
download_file("https://zenodo.org/records/14535374/files/cleaned_spectra_pos_neg_with_numbering.mgf?download=1", case_study_spectra_file_name)
download_file("https://zenodo.org/records/14535374/files/ms2query_annotations.csv?download=1", ms2query_annotations)
```

--------------------------------

### Download Data Files

Source: https://github.com/matchms/ms2deepscore/blob/main/notebooks/case_studies_ms2deepscore_2/Urine_case_study/Create_molecular_networks.ipynb

Downloads the files needed for the case study from Zenodo: the MS2DeepScore model and settings, and the spectral data. Files that already exist locally are not downloaded again.

```python
import requests
import os
from tqdm import tqdm


def download_file(link, file_name):
    """Downloads link to file_name with a progress bar, unless the file already exists."""
    # Check first, so that no request is made for files that are already present
    if os.path.exists(file_name):
        print(f"The file {file_name} already exists, the file won't be downloaded")
        return
    response = requests.get(link, stream=True)
    total_size = int(response.headers.get('content-length', 0))

    with open(file_name, "wb") as f, tqdm(desc="Downloading file", total=total_size, unit='B', unit_scale=True, unit_divisor=1024,) as bar:
        for chunk in response.iter_content(chunk_size=1024):
            if chunk:
                f.write(chunk)
                bar.update(len(chunk))  # Update progress bar by the chunk size


model_file_name = "ms2deepscore_model.pt"
case_study_spectra_file_name = "case_study_spectra.mgf"

download_file("https://zenodo.org/records/14290920/files/settings.json?download=1", "ms2deepscore_settings.json")
download_file("https://zenodo.org/records/14290920/files/ms2deepscore_model.pt?download=1", model_file_name)
download_file("https://zenodo.org/records/14535374/files/cleaned_spectra_pos_neg_with_numbering.mgf?download=1", case_study_spectra_file_name)
```

--------------------------------

### Create and Activate Conda Environment

Source: https://github.com/matchms/ms2deepscore/blob/main/README.md

Creates a new conda environment with Python 3.12 and activates it, ready for installing MS2DeepScore.
```bash
conda create --name ms2deepscore python=3.12
conda activate ms2deepscore
```

--------------------------------

### Load Spectra and Models for Benchmarking

Source: https://github.com/matchms/ms2deepscore/blob/main/notebooks/model_benchmarking/EmbeddingEvaluator_benchmarking.ipynb

Loads positive and negative test spectra from MGF files, then loads the MS2DeepScore model and the embedding evaluator. The model and spectra files must be downloaded first; the file-name variables and `embedding_model` are assumed to be defined earlier in the notebook.

```python
import os

from tqdm import tqdm
from ms2deepscore.MS2DeepScore import MS2DeepScore
from ms2deepscore.models.load_model import load_model, load_embedding_evaluator
from matchms.importing.load_from_mgf import load_from_mgf
from ms2deepscore.utils import load_pickled_file

# Validation spectra
pos_test_spectra = list(tqdm(load_from_mgf(pos_test_spectra_file_name)))
neg_test_spectra = list(tqdm(load_from_mgf(neg_test_spectra_file_name)))

ms2ds_model = load_model(model_file_name)
ms2ds_embedding_evaluator = load_embedding_evaluator(embedding_model)
```

--------------------------------

### Load Case Study Spectra from MGF

Source: https://github.com/matchms/ms2deepscore/blob/main/notebooks/case_studies_ms2deepscore_2/Blood_case_study/Create_molecular_networks.ipynb

Loads the case study spectra from the MGF file inside the previously unzipped folder. Requires the `matchms` library.

```python
from matchms.importing.load_from_mgf import load_from_mgf
import os

all_spectra = list(load_from_mgf(os.path.join(case_study_spectra_folder, "Blood plasma case study", "blood_case_study_spectra.mgf")))
```

--------------------------------

### Load Spectra from MGF Files

Source: https://github.com/matchms/ms2deepscore/blob/main/notebooks/model_benchmarking/Modified_cosine_heatmap.ipynb

Loads positive and negative test spectra from MGF files using matchms, with tqdm wrapped around the loader to show progress.
```python
from matchms.importing import load_from_mgf
from tqdm import tqdm
import os

pos_test = list(tqdm(load_from_mgf(pos_test_spectra_file_name)))
neg_test = list(tqdm(load_from_mgf(neg_test_spectra_file_name)))
```

--------------------------------

### Importing Libraries

Source: https://github.com/matchms/ms2deepscore/blob/main/notebooks/ms2deepscore_2_preprint/Compare_newer_with_older_gnps_spectra.ipynb

Imports pandas and NumPy for data manipulation.

```python
import pandas as pd
import numpy as np
```

--------------------------------

### Assigning Labels to Score Objects

Source: https://github.com/matchms/ms2deepscore/blob/main/notebooks/model_benchmarking/Comparison_to_single_ion_mode_models.ipynb

Assigns descriptive labels to score objects, likely for use in plot legends. This is a setup step before plotting.

```python
scores_neg_model.neg_vs_neg_scores.label = "negative single ion mode model"
neg_neg_scores_neg_model.neg_vs_neg_scores.label = "dual ion mode model"
```

--------------------------------

### Download Spectral Data

Source: https://github.com/matchms/ms2deepscore/blob/main/notebooks/model_benchmarking/optimize_sampling_algorithm_settings.ipynb

Downloads the negative and positive training spectra in .mgf format if they do not already exist locally. Uses requests for downloading and tqdm for progress indication.
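An existence check of this kind treats any file on disk as complete, so an interrupted download could leave a truncated file that is skipped on the next run. One possible safeguard, sketched here as an assumption rather than as the notebook's approach, is to write to a temporary name and rename it only after all chunks have arrived. The helper `save_chunks_atomically` and the `.part` suffix are hypothetical; `chunks` stands in for `response.iter_content(...)`.

```python
import os
import tempfile


def save_chunks_atomically(chunks, file_name):
    """Write chunks to file_name via a temporary '.part' file.

    Returns True if the file was written, False if it already existed.
    """
    if os.path.exists(file_name):
        return False
    temp_name = file_name + ".part"
    try:
        with open(temp_name, "wb") as f:
            for chunk in chunks:
                if chunk:
                    f.write(chunk)
        # Rename only after every chunk has been written
        os.replace(temp_name, file_name)
    except BaseException:
        # Remove the partial file so that a later run starts from scratch
        if os.path.exists(temp_name):
            os.remove(temp_name)
        raise
    return True


# Offline demonstration with in-memory chunks instead of a real download
with tempfile.TemporaryDirectory() as folder:
    target = os.path.join(folder, "spectra.mgf")
    first = save_chunks_atomically([b"abc", b"def"], target)
    second = save_chunks_atomically([b"ignored"], target)
    with open(target, "rb") as f:
        content = f.read()
print(first, second, content)
```

The notebook's own download code follows.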
```python
import requests
import os
from tqdm import tqdm


def download_file(link, file_name):
    """Downloads link to file_name with a progress bar, unless the file already exists."""
    # Check first, so that no request is made for files that are already present
    if os.path.exists(file_name):
        print(f"The file {file_name} already exists, the file won't be downloaded")
        return
    response = requests.get(link, stream=True)
    total_size = int(response.headers.get('content-length', 0))

    with open(file_name, "wb") as f, tqdm(desc="Downloading file", total=total_size, unit='B', unit_scale=True, unit_divisor=1024,) as bar:
        for chunk in response.iter_content(chunk_size=1024):
            if chunk:
                f.write(chunk)
                bar.update(len(chunk))  # Update progress bar by the chunk size


neg_train_spectra_file_name = "neg_training_spectra.mgf"
pos_train_spectra_file_name = "pos_training_spectra.mgf"

download_file("https://zenodo.org/records/13934470/files/negative_training_spectra.mgf?download=1", neg_train_spectra_file_name)
download_file("https://zenodo.org/records/13934470/files/positive_training_spectra.mgf?download=1", pos_train_spectra_file_name)
```

--------------------------------

### Create MS2DeepScore Workflow

Source: https://github.com/matchms/ms2deepscore/blob/main/notebooks/case_studies_ms2deepscore_2/Blood_case_study/Create_molecular_networks.ipynb

Sets up and runs a matchms pipeline that filters out spectra with fewer than three peaks and computes MS2DeepScore similarity scores. Requires a loaded model in the variable `model` and the input MGF file. The logging level can be adjusted.
```python
from matchms.Pipeline import Pipeline, create_workflow
from ms2deepscore import MS2DeepScore

# `model` must already hold a loaded MS2DeepScore model
workflow = create_workflow(
    query_filters=[["require_minimum_number_of_peaks", {"n_required": 3}]],
    score_computations=[
        [MS2DeepScore, {"model": model}],
    ],
)
pipeline = Pipeline(workflow)
pipeline.logging_level = "ERROR"  # To define the verbosity of the logging
report = pipeline.run("blood_case_study_spectra.mgf")
```

--------------------------------

### Get shape of a NumPy array

Source: https://github.com/matchms/ms2deepscore/blob/main/notebooks/ms2deepscore_2_preprint/compare_with_cosine.ipynb

This code snippet retrieves the dimensions (shape) of a NumPy array, commonly used to understand the size of data matrices.

```python
combined_true_values.shape
```

```text
Result: (4130, 1009)
```

--------------------------------

### Filter Spectra by InChIKey Prefix

Source: https://github.com/matchms/ms2deepscore/blob/main/notebooks/model_benchmarking/Modified_cosine_heatmap.ipynb

Filters a list of spectra to exclude those whose InChIKey starts with 'FTXGVGWYYREYSV'. This is a preliminary step before selecting unique spectra.

```python
pos_test_spectra = []
for spectrum in pos_test:
    # Compare only the first 14 characters of the InChIKey
    if spectrum.get("inchikey")[:14] != "FTXGVGWYYREYSV":
        pos_test_spectra.append(spectrum)
```

--------------------------------

### Initialize RDKit and Plot Spectrum/Molecule (Positive Mode)

Source: https://github.com/matchms/ms2deepscore/blob/main/notebooks/case_studies_ms2deepscore_2/Rumex_case_study/case_study_mol_network_Rumex.ipynb

Imports RDKit and plots a spectrum and its corresponding molecule image for a given positive ion mode feature ID. Requires 'spectra' and 'merged_data' to be pre-defined.
```python
from rdkit import Chem
from rdkit.Chem import Draw

ionmode_feature_id = "pos_2289"
spectrum = [spectrum for spectrum in spectra if spectrum.get("ionmode_feature_id") == ionmode_feature_id][0]
smiles = merged_data[merged_data["ionmode_feature_id"] == ionmode_feature_id]["smiles"].values[0]

fig, ax = spectrum.plot(figsize=(4, 3))
mol = Chem.MolFromSmiles(smiles)

# Display molecule image
Draw.MolToImage(mol)
```

--------------------------------

### Download Model and Spectra Files

Source: https://github.com/matchms/ms2deepscore/blob/main/notebooks/model_benchmarking/Modified_cosine_heatmap.ipynb

Downloads the files needed for MS2DeepScore benchmarking, including model weights, settings, and test spectra. It checks whether each file already exists to avoid redundant downloads.

```python
import requests
import os
from tqdm import tqdm


def download_file(link, file_name):
    # Check for an existing file first so that no HTTP request is opened needlessly
    if os.path.exists(file_name):
        print(f"The file {file_name} already exists, the file won't be downloaded")
        return
    response = requests.get(link, stream=True)
    total_size = int(response.headers.get('content-length', 0))

    with open(file_name, "wb") as f, tqdm(desc="Downloading file", total=total_size, unit='B', unit_scale=True, unit_divisor=1024,) as bar:
        for chunk in response.iter_content(chunk_size=1024):
            if chunk:
                f.write(chunk)
                bar.update(len(chunk))  # Update progress bar by the chunk size


model_file_name = "ms2deepscore_model.pt"
neg_test_spectra_file_name = "neg_testing_spectra.mgf"
pos_test_spectra_file_name = "pos_testing_spectra.mgf"

download_file("https://zenodo.org/records/14290920/files/settings.json?download=1", "ms2deepscore_settings.json")
download_file("https://zenodo.org/records/14290920/files/ms2deepscore_model.pt?download=1", model_file_name)
download_file("https://zenodo.org/records/13934470/files/negative_testing_spectra.mgf?download=1", neg_test_spectra_file_name)
download_file("https://zenodo.org/records/13934470/files/positive_testing_spectra.mgf?download=1", pos_test_spectra_file_name)
```

--------------------------------

### Initialize RDKit and Plot Spectrum/Molecule (Positive Mode)

Source: https://github.com/matchms/ms2deepscore/blob/main/notebooks/case_studies_ms2deepscore_2/Rumex_case_study/case_study_mol_network_Rumex.ipynb

Imports RDKit and plots a spectrum and its corresponding molecule image for a given positive ion mode feature ID. Requires 'spectra' and 'merged_data' to be pre-defined.

```python
from rdkit import Chem
from rdkit.Chem import Draw

ionmode_feature_id = "pos_475"
spectrum = [spectrum for spectrum in spectra if spectrum.get("ionmode_feature_id") == ionmode_feature_id][0]
smiles = merged_data[merged_data["ionmode_feature_id"] == ionmode_feature_id]["smiles"].values[0]

fig, ax = spectrum.plot(figsize=(4, 3))
mol = Chem.MolFromSmiles(smiles)

# Display molecule image
Draw.MolToImage(mol)
```

--------------------------------

### Load Spectra from MGF File

Source: https://github.com/matchms/ms2deepscore/blob/main/notebooks/case_studies_ms2deepscore_2/Rumex_case_study/case_study_mol_network_Rumex.ipynb

Loads spectral data from a specified MGF file using matchms. `tqdm` is used for progress visualization during loading. Ensure the file path is correct.

```python
from matchms.importing import load_from_mgf
from tqdm import tqdm

case_study_spectra_file_name = "./results/combined_neg_pos_rumex_spectra.mgf"
spectra = list(tqdm(load_from_mgf(case_study_spectra_file_name)))
```

--------------------------------

### Sampled Array Reshaping

Source: https://github.com/matchms/ms2deepscore/blob/main/notebooks/ms2deepscore_2_preprint/compare_with_cosine.ipynb

Reorders the flattened sampled array using sorted indices and restores its original shape, similar to the previous example. This is useful for comparing sampled data against the original structure.
```python
sampled_neg_pos_both_mode.flatten()[sorted_idx].reshape(sampled_neg_pos_both_mode.shape)
```

--------------------------------

### Load Spectra from MGF File

Source: https://github.com/matchms/ms2deepscore/blob/main/notebooks/ms2deepscore_2_preprint/Find_bias_in_bad_predictions.ipynb

Loads spectral data from a specified MGF file and prints the total number of spectra loaded. Requires the matchms library.

```python
from matchms.importing.load_from_mgf import load_from_mgf

val_spectra = list(load_from_mgf("./positive_validation_spectra.mgf"))
print(len(val_spectra))
```

--------------------------------

### Load Spectral Data from MGF Files

Source: https://github.com/matchms/ms2deepscore/blob/main/notebooks/case_studies_ms2deepscore_2/Rumex_case_study/case_study_mol_network_Rumex.ipynb

Loads positive and negative ion mode spectra from MGF files using matchms. Ensure the file paths are correct.

```python
from matchms.importing import load_from_mgf

spectra_neg = list(load_from_mgf("./Rumex_case_study/Rumex_neg_20230806_all_samples_GNPS_fbmn_quant.mgf"))
spectra_pos = list(load_from_mgf("./Rumex_case_study/Rumex_20230806_all_samples_2_GNPS_fbmn_quant_quant.mgf"))
```

--------------------------------

### Filter Spectra by InChIKey Prefix

Source: https://github.com/matchms/ms2deepscore/blob/main/notebooks/model_benchmarking/EmbeddingEvaluator_benchmarking.ipynb

Filters a list of positive spectra, excluding those whose InChIKey starts with 'FTXGVGWYYREYSV'. This is a preprocessing step before further analysis.
```python
pos_spectra = []
for spectrum in pos_test_spectra:
    if spectrum.get("inchikey")[:14] != "FTXGVGWYYREYSV":
        pos_spectra.append(spectrum)
```

--------------------------------

### Define Model and Spectra File Paths

Source: https://github.com/matchms/ms2deepscore/blob/main/notebooks/model_benchmarking/Compare balanced cross ion mode sampling.ipynb

Sets up the file paths for the MS2DeepScore models trained with different sampling strategies and the test spectra file. Ensure these paths are correct for your environment.

```python
from matchms.importing import load_from_mgf
from tqdm import tqdm

normal_model_file_name = "/lustre/BIF/nobackup/jonge094/ms2deepscore/data/library_22_07_2025/trained_models/both_mode_ionmode_precursor_mz_10000_layers_500_embedding_2025_08_18_14_48_59/ms2deepscore_model.pt"
balanced_model_file_name = "/lustre/BIF/nobackup/jonge094/ms2deepscore/data/library_22_07_2025/trained_models/both_mode_ionmode_precursor_mz_10000_layers_500_embedding_2025_08_24_00_04_17/ms2deepscore_model.pt"
test_spectra_file = "/lustre/BIF/nobackup/jonge094/ms2deepscore/data/library_22_07_2025/trained_models/test_merged_and_cleaned_libraries_1.mgf"
```

--------------------------------

### Execute Evaluation and Get Shapes

Source: https://github.com/matchms/ms2deepscore/blob/main/notebooks/ms2deepscore_2_preprint/EmbeddingEvaluator.ipynb

Applies the spectrum selection and MSE calculation functions to the validation data. This snippet also prints the shape of the MS2DS scores matrix.
```python
selected_all_val_spectra = select_one_spectrum_per_inchikey(selected_all_val_spectra)
predicted_mse, ms2ds_scores, tanimoto_scores, val_MSEs, val_MAEs = calculate_MSE(selected_all_val_spectra, ms2ds_model, ms2ds_embedding_evaluator)
ms2ds_scores.shape
```

--------------------------------

### Progress Bar Output for Balanced Selection with Different Settings

Source: https://github.com/matchms/ms2deepscore/blob/main/notebooks/model_benchmarking/optimize_sampling_algorithm_settings.ipynb

Shows the progress of balanced sampling for inchikey pairs per bin when using different `max_pair_resampling` settings. This output confirms the sampling process is running.

```text
Balanced sampling of inchikey pairs (per bin): 100%|█████████████████████████████████████████████████████████████████████████████████████████| 188085/188085 [00:05<00:00, 32025.95it/s]
Balanced sampling of inchikey pairs (per bin): 100%|█████████████████████████████████████████████████████████████████████████████████████████| 188085/188085 [00:05<00:00, 31595.47it/s]
Balanced sampling of inchikey pairs (per bin): 100%|█████████████████████████████████████████████████████████████████████████████████████████| 188085/188085 [00:04<00:00, 38599.67it/s]
Balanced sampling of inchikey pairs (per bin): 100%|█████████████████████████████████████████████████████████████████████████████████████████| 188085/188085 [00:05<00:00, 31645.88it/s]
Balanced sampling of inchikey pairs (per bin): 100%|█████████████████████████████████████████████████████████████████████████████████████████| 188085/188085 [00:05<00:00, 31566.31it/s]
Balanced sampling of inchikey pairs (per bin): 100%|█████████████████████████████████████████████████████████████████████████████████████████| 188085/188085 [00:05<00:00, 31702.47it/s]
Balanced sampling of inchikey pairs (per bin): 100%|█████████████████████████████████████████████████████████████████████████████████████████| 188085/188085 [00:05<00:00, 31898.22it/s]
Balanced sampling of inchikey pairs (per bin): 100%|█████████████████████████████████████████████████████████████████████████████████████████| 188085/188085 [00:05<00:00, 32297.54it/s]
Balanced sampling of inchikey pairs (per bin): 100%|█████████████████████████████████████████████████████████████████████████████████████████| 188085/188085 [00:05<00:00, 32580.08it/s]
Balanced sampling of inchikey pairs (per bin): 100%|█████████████████████████████████████████████████████████████████████████████████████████| 188085/188085 [00:05<00:00, 32427.66it/s]
```

--------------------------------

### Initialize Pair Generators Dictionary

Source: https://github.com/matchms/ms2deepscore/blob/main/notebooks/model_benchmarking/optimize_sampling_algorithm_settings.ipynb

Initializes an empty dictionary to store pair generators, likely for organizing results from different sampling settings. This is a setup step before performing balanced sampling.

```python
pair_generators_standard_order = {}
```
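
The dictionary above is presumably filled with one entry per sampling setting, such as the `max_pair_resampling` values mentioned for the progress-bar output. The following is a minimal sketch of that pattern only. `build_pair_generator` is a hypothetical stand-in and not an ms2deepscore function, and the setting values are illustrative assumptions rather than the values used in the notebook.

```python
def build_pair_generator(max_pair_resampling):
    """Hypothetical placeholder for the notebook's pair-generator construction.

    It only records the setting it was called with; the real sampling
    call is deliberately not reproduced here.
    """
    return {"max_pair_resampling": max_pair_resampling}


# Illustrative settings only (assumption); the notebook's values are not shown above.
settings_to_try = [1, 10, 100]

pair_generators_standard_order = {}
for max_pair_resampling in settings_to_try:
    # Key each result by the setting that produced it, for easy lookup later.
    pair_generators_standard_order[max_pair_resampling] = build_pair_generator(max_pair_resampling)

for setting, generator in pair_generators_standard_order.items():
    print(setting, generator)
```

Keying the dictionary by the setting value keeps the result of each configuration separately addressable when the runs are compared afterwards.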