### Installation

Source: https://github.com/iomega/spec2vec/blob/master/README.rst

Instructions for cloning the repository, setting up the development environment, and installing the package in editable mode.

```console
git clone https://github.com/iomega/spec2vec.git
cd spec2vec
conda env create --file conda/environment-dev.yml
conda activate spec2vec-dev
pip install --editable .
```

--------------------------------

### Install spec2vec using pip

Source: https://github.com/iomega/spec2vec/blob/master/readthedocs/index.rst

Installs spec2vec using pip. Note that the Conda installation is recommended for proper RDKit integration with matchms.

```console
pip install spec2vec
```

--------------------------------

### Install spec2vec using Anaconda

Source: https://github.com/iomega/spec2vec/blob/master/README.rst

Recommended installation method using Anaconda, including environment creation and activation.

```console
conda create --name spec2vec python=3.13
conda activate spec2vec
conda install --channel bioconda --channel conda-forge spec2vec
```

--------------------------------

### Install spec2vec using Conda

Source: https://github.com/iomega/spec2vec/blob/master/readthedocs/index.rst

Installs spec2vec in a new virtual environment using Conda, ensuring proper dependency management.

```console
conda create --name spec2vec python=3.8
conda activate spec2vec
conda install --channel nlesc --channel bioconda --channel conda-forge spec2vec
```

--------------------------------

### Train a new word2vec model from scratch

Source: https://github.com/iomega/spec2vec/blob/master/README.rst

Example of processing a large dataset of reference spectra to train a word2vec model. Spectra are converted to documents, and a new model is trained using default parameters unless specified.

```python
from matchms import SpectrumProcessor
from matchms.filtering.default_pipelines import DEFAULT_FILTERS
from matchms.importing import load_from_mgf
from spec2vec import SpectrumDocument
from spec2vec.model_building import train_new_word2vec_model

# Load spectra from an MGF file
spectra = list(load_from_mgf("reference_spectrums.mgf"))

# Start from the default filters. Further filter functions can be added,
# for example one that requires a minimum number of peaks.
processor = SpectrumProcessor(DEFAULT_FILTERS)

# Apply the filter pipeline and drop spectra that did not pass it
spectra_cleaned, _ = processor.process_spectra(spectra)
spectra_cleaned = [s for s in spectra_cleaned if s is not None]

# Create spectrum documents
reference_documents = [SpectrumDocument(s, n_decimals=2) for s in spectra_cleaned]

# Train the reference model and save it to disk
model_file = "references.model"
model = train_new_word2vec_model(reference_documents,
                                 iterations=[10, 20, 30],
                                 filename=model_file,
                                 workers=2,
                                 progress_logger=True)
```

--------------------------------

### Calculate spectral similarities using a pre-trained word2vec model

Source: https://github.com/iomega/spec2vec/blob/master/README.rst

Example of calculating similarities between mass spectra using a trained word2vec model. It shows how to tolerate peaks unknown to the model by setting `allowed_missing_percentage`.

```python
import gensim
from matchms import calculate_scores
from matchms.importing import load_from_mgf
from spec2vec import Spec2Vec

# `processor` and `reference_documents` are the objects created in the
# training example above.

# Load query spectra, see
# https://matchms.readthedocs.io/en/latest/api/matchms.importing.load_from_mgf.html
query_spectra = list(load_from_mgf("query_spectrums.mgf"))
query_spectra_cleaned, _ = processor.process_spectra(query_spectra)

# Omit spectra that didn't qualify for analysis
query_spectra_cleaned = [s for s in query_spectra_cleaned if s is not None]

# Import pre-trained word2vec model (see code example above)
model_file = "references.model"
model = gensim.models.Word2Vec.load(model_file)

# Define the similarity function
spec2vec_similarity = Spec2Vec(model=model,
                               intensity_weighting_power=0.5,
                               allowed_missing_percentage=5.0)

# Calculate scores on all combinations of reference spectra and queries
scores = calculate_scores(reference_documents, query_spectra_cleaned, spec2vec_similarity)

# Find the highest scores for a query spectrum of interest
best_matches = scores.scores_by_query(query_spectra_cleaned[0], sort=True)[:10]

# Print the highest scores
print([x[1] for x in best_matches])
```

--------------------------------

### Running existing tests

Source: https://github.com/iomega/spec2vec/blob/master/CONTRIBUTING.md

Command to run existing tests before making code changes.

```bash
python setup.py test
```

--------------------------------

### Bumping version for a new release

Source: https://github.com/iomega/spec2vec/blob/master/CONTRIBUTING.md

Command to bump the version number for a new release.

```bash
bump2version
```

--------------------------------

### Testing

Source: https://github.com/iomega/spec2vec/blob/master/README.rst

Command to run tests, including coverage analysis.

```console
pytest
```

--------------------------------

### Linting

Source: https://github.com/iomega/spec2vec/blob/master/README.rst

Command to run the linter for code quality checks.

```console
prospector
```

--------------------------------

### Train a word2vec model

Source: https://github.com/iomega/spec2vec/blob/master/readthedocs/index.rst

Processes reference spectra, converts them to documents, and trains a new word2vec model using spec2vec. Includes spectrum processing pipeline and model training parameters.

```python
import os
from matchms.filtering import add_losses
from matchms.filtering import add_parent_mass
from matchms.filtering import default_filters
from matchms.filtering import normalize_intensities
from matchms.filtering import reduce_to_number_of_peaks
from matchms.filtering import require_minimum_number_of_peaks
from matchms.filtering import select_by_mz
from matchms.importing import load_from_mgf
from spec2vec import SpectrumDocument
from spec2vec.model_building import train_new_word2vec_model

def spectrum_processing(s):
    """This is how one would typically design a desired pre- and post-
    processing pipeline."""
    s = default_filters(s)
    s = add_parent_mass(s)
    s = normalize_intensities(s)
    s = reduce_to_number_of_peaks(s, n_required=10, ratio_desired=0.5, n_max=500)
    s = select_by_mz(s, mz_from=0, mz_to=1000)
    s = add_losses(s, loss_mz_from=10.0, loss_mz_to=200.0)
    s = require_minimum_number_of_peaks(s, n_required=10)
    return s

# Load data from MGF file and apply filters
spectrums = [spectrum_processing(s) for s in load_from_mgf("reference_spectrums.mgf")]

# Omit spectrums that didn't qualify for analysis
spectrums = [s for s in spectrums if s is not None]

# Create spectrum documents
reference_documents = [SpectrumDocument(s, n_decimals=2) for s in spectrums]

model_file = "references.model"
model = train_new_word2vec_model(reference_documents, model_file, iterations=[10, 20, 30],
                                 workers=2, progress_logger=True)
```

--------------------------------

### Calculate scores on all combinations of reference spectrums and queries

Source: https://github.com/iomega/spec2vec/blob/master/readthedocs/index.rst

This code snippet demonstrates how
to calculate scores between reference spectrums and query spectrums using the spec2vec model. It then finds and prints the 10 best matches for one query spectrum.

```python
# `reference_documents` comes from the "Train a word2vec model" snippet;
# `query_spectrums` and `spec2vec` are defined in the
# "Derive spec2vec similarity scores" snippet.

# Calculate scores on all combinations of reference spectrums and queries
scores = calculate_scores(reference_documents, query_spectrums, spec2vec)

# Find the highest scores for a query spectrum of interest.
# The lookup key must be one of the queries passed to calculate_scores.
best_matches = scores.scores_by_query(query_spectrums[0], sort=True)[:10]

# Print the highest scores
print([x[1] for x in best_matches])
```

--------------------------------

### Derive spec2vec similarity scores

Source: https://github.com/iomega/spec2vec/blob/master/readthedocs/index.rst

Calculates similarities between mass spectrums using a pre-trained word2vec model. Demonstrates the use of `allowed_missing_percentage` to handle peaks unknown to the model.

```python
import gensim
from matchms import calculate_scores
from matchms.importing import load_from_mgf
from spec2vec import Spec2Vec

# `spectrum_processing` is the function defined in the
# "Train a word2vec model" snippet.

# Load query spectrums, see
# https://matchms.readthedocs.io/en/latest/api/matchms.importing.load_from_mgf.html
query_spectrums = [spectrum_processing(s) for s in load_from_mgf("query_spectrums.mgf")]

# Omit spectrums that didn't qualify for analysis
query_spectrums = [s for s in query_spectrums if s is not None]

# Import pre-trained word2vec model (see code example above)
model_file = "references.model"
model = gensim.models.Word2Vec.load(model_file)

# Define the similarity function
spec2vec = Spec2Vec(model=model, intensity_weighting_power=0.5,
                    allowed_missing_percentage=5.0)
```

--------------------------------

### Remove spec2vec environment

Source: https://github.com/iomega/spec2vec/blob/master/README.rst

Command to remove the spec2vec conda environment.

```console
conda env remove --name spec2vec
```

--------------------------------

### Remove spec2vec package

Source: https://github.com/iomega/spec2vec/blob/master/README.rst

Command to remove the spec2vec package from the active conda environment.

```console
conda remove spec2vec
```
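
--------------------------------

### Sketch: what `n_decimals` and `allowed_missing_percentage` control

The training and scoring examples above pass `n_decimals=2`, `intensity_weighting_power=0.5`, and `allowed_missing_percentage=5.0` without showing their effect. The following is a minimal plain-Python sketch, not spec2vec's own implementation. It assumes that peaks become word tokens of the form `peak@<m/z rounded to n_decimals>` and that the missing percentage is the intensity-weighted share of words absent from the model vocabulary. The helper names `peaks_to_words` and `missing_percentage`, the example peaks, and the vocabulary are all hypothetical.

```python
def peaks_to_words(mz_values, n_decimals=2):
    """Turn peak m/z values into word tokens such as 'peak@100.05' (assumed format)."""
    return [f"peak@{mz:.{n_decimals}f}" for mz in mz_values]


def missing_percentage(words, intensities, vocabulary, intensity_weighting_power=0.5):
    """Weighted share, in percent, of a document's words that the vocabulary lacks.

    Each word is weighted by its peak intensity raised to
    `intensity_weighting_power` (an assumption about the weighting scheme).
    """
    weights = [intensity ** intensity_weighting_power for intensity in intensities]
    missing = sum(w for word, w in zip(words, weights) if word not in vocabulary)
    return 100.0 * missing / sum(weights)


# Hypothetical query spectrum: three peaks with normalized intensities.
mz_values = [100.0512, 150.1, 200.2049]
intensities = [1.0, 0.25, 0.04]
words = peaks_to_words(mz_values, n_decimals=2)

# Hypothetical model vocabulary that does not contain the weakest peak.
vocabulary = {"peak@100.05", "peak@150.10"}

share = missing_percentage(words, intensities, vocabulary)
print(words)
print(f"{share:.1f}% of the weighted document is unknown to the model")
```

Under these assumptions, a single weak unknown peak can already exceed a 5% limit, because the square-root weighting raises the relative weight of low-intensity peaks. A query whose share is above `allowed_missing_percentage` would not be scored with that setting, so the limit is worth tuning against how well the reference data covers the queries.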