### Installation

Source: https://github.com/iomega/spec2vec/blob/master/README.rst

Instructions for cloning the repository, setting up the development environment, and installing the package in editable mode.

```console
git clone https://github.com/iomega/spec2vec.git
cd spec2vec
conda env create --file conda/environment-dev.yml
conda activate spec2vec-dev
pip install --editable .
```

--------------------------------

### Install spec2vec using pip

Source: https://github.com/iomega/spec2vec/blob/master/readthedocs/index.rst

Installs spec2vec using pip. Note that the Conda installation is recommended for proper RDKit integration with matchms.

```console
pip install spec2vec
```

--------------------------------

### Install spec2vec using Anaconda

Source: https://github.com/iomega/spec2vec/blob/master/README.rst

Recommended installation method using Anaconda, including environment creation and activation.

```console
conda create --name spec2vec python=3.13
conda activate spec2vec
conda install --channel bioconda --channel conda-forge spec2vec
```

--------------------------------

### Install spec2vec using Conda

Source: https://github.com/iomega/spec2vec/blob/master/readthedocs/index.rst

Installs spec2vec in a new virtual environment using Conda, ensuring proper dependency management.

```console
conda create --name spec2vec python=3.8
conda activate spec2vec
conda install --channel nlesc --channel bioconda --channel conda-forge spec2vec
```
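
Whichever installation route is used, it can be useful to confirm that the packages are visible to the active Python interpreter. The following is a minimal sketch using only the standard library. The helper name `installed_version` is hypothetical, and the check assumes the import name and the distribution name are identical, which holds for `spec2vec`, `matchms`, and `gensim`.

```python
from importlib import metadata, util


def installed_version(package):
    """Return the installed version of `package`, or None if it cannot be imported.

    Hypothetical helper: assumes import name == distribution name.
    """
    if util.find_spec(package) is None:
        return None
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        # Importable, but no distribution metadata (e.g. a standard-library module).
        return "unknown"


for name in ("spec2vec", "matchms", "gensim"):
    print(name, installed_version(name) or "not installed")
```

Running this inside the activated environment should list a version for each package; a "not installed" line usually means a different environment is active.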

--------------------------------

### Train a new word2vec model from scratch

Source: https://github.com/iomega/spec2vec/blob/master/README.rst

Example of processing a large dataset of reference spectra to train a word2vec model. The spectra are filtered and converted to documents, and a new model is trained on them; any training parameter not passed explicitly keeps its default value.

```python
from matchms import SpectrumProcessor
from matchms.filtering.default_pipelines import DEFAULT_FILTERS
from matchms.importing import load_from_mgf
from spec2vec import SpectrumDocument
from spec2vec.model_building import train_new_word2vec_model

# Load spectra from MGF
spectra = list(load_from_mgf("reference_spectrums.mgf"))

# Set up the default filters. Further filter functions can be added, e.g. to require a minimum number of peaks
processor = SpectrumProcessor(DEFAULT_FILTERS)

# Apply filter pipeline
spectra_cleaned, _ = processor.process_spectra(spectra)
spectra_cleaned = [s for s in spectra_cleaned if s is not None]

# Create spectrum documents
reference_documents = [SpectrumDocument(s, n_decimals=2) for s in spectra_cleaned]

# Train your reference model
model_file = "references.model"
model = train_new_word2vec_model(reference_documents, iterations=[10, 20, 30], filename=model_file,
                                 workers=2, progress_logger=True)
```
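
To make the "spectra are converted to documents" step more concrete, here is a rough sketch of the idea in plain Python. It assumes each peak becomes a word token of the form `peak@<m/z>`, rounded to `n_decimals`. The function `peaks_to_words` is hypothetical and is not part of spec2vec; `SpectrumDocument` does the real conversion and handles more than is shown here.

```python
def peaks_to_words(mz_values, n_decimals=2, prefix="peak"):
    """Turn m/z values into word tokens such as 'peak@100.00' (illustrative only)."""
    return [f"{prefix}@{mz:.{n_decimals}f}" for mz in mz_values]


# A toy spectrum with three peaks becomes a three-word "document".
words = peaks_to_words([100.0, 150.126, 200.5], n_decimals=2)
print(words)  # ['peak@100.00', 'peak@150.13', 'peak@200.50']
```

A larger `n_decimals` gives a larger vocabulary with fewer occurrences per word, so the rounding precision is a trade-off between resolution and the amount of training data per word.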

--------------------------------

### Calculate spectral similarities using a pre-trained word2vec model

Source: https://github.com/iomega/spec2vec/blob/master/README.rst

Example of calculating similarities between mass spectra using a trained word2vec model. The `allowed_missing_percentage` argument controls how much of a spectrum may consist of peaks that are unknown to the model.

```python
import gensim
from matchms import calculate_scores
from matchms.importing import load_from_mgf
from spec2vec import Spec2Vec

# Load query spectra with load_from_mgf, see
# https://matchms.readthedocs.io/en/latest/api/matchms.importing.load_from_mgf.html
query_spectra = list(load_from_mgf("query_spectrums.mgf"))

# Reuse `processor` (the SpectrumProcessor) from the training example above
query_spectra_cleaned, _ = processor.process_spectra(query_spectra)

# Omit spectra that didn't qualify for analysis
query_spectra_cleaned = [s for s in query_spectra_cleaned if s is not None]

# Import pre-trained word2vec model (see code example above)
model_file = "references.model"
model = gensim.models.Word2Vec.load(model_file)

# Define similarity_function
spec2vec_similarity = Spec2Vec(model=model, intensity_weighting_power=0.5,
                               allowed_missing_percentage=5.0)

# Calculate scores on all combinations of reference spectra and queries
scores = calculate_scores(reference_documents, query_spectra_cleaned, spec2vec_similarity)

# Find the highest scores for a query spectrum of interest
best_matches = scores.scores_by_query(query_spectra_cleaned[0], sort=True)[:10]

# Return highest scores
print([x[1] for x in best_matches])
```
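
The effect of `allowed_missing_percentage` can be illustrated with a small stand-alone calculation. This is a sketch under an assumption: that the missing share is measured on intensity weights raised to `intensity_weighting_power`, not on the raw peak count. The function `missing_percentage` is hypothetical and may not match the library's internal formula exactly.

```python
def missing_percentage(weights, in_vocabulary, power=0.5):
    """Percentage of the weighted intensity carried by words the model does not know.

    Illustrative only: `weights` are normalized peak intensities and
    `in_vocabulary` flags whether each word is known to the model.
    """
    raised = [w ** power for w in weights]
    missing = sum(r for r, known in zip(raised, in_vocabulary) if not known)
    total = sum(raised)
    return 100.0 * missing / total if total else 0.0


# Three peaks; the weakest one is unknown to the (imaginary) model.
pct = missing_percentage([1.0, 0.25, 0.04], [True, True, False], power=0.5)
allowed = 5.0
print(f"{pct:.1f}% missing, within limit: {pct <= allowed}")
```

In this toy case a single low-intensity unknown peak already exceeds a 5% limit, because the square-root weighting boosts weak peaks relative to strong ones.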

--------------------------------

### Running existing tests

Source: https://github.com/iomega/spec2vec/blob/master/CONTRIBUTING.md

Command to run existing tests before making code changes.

```bash
python setup.py test
```

--------------------------------

### Bumping version for a new release

Source: https://github.com/iomega/spec2vec/blob/master/CONTRIBUTING.md

Command to bump the version number for a new release.

```bash
bump2version <major|minor|patch>
```
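
For readers unfamiliar with the three release parts, the following sketch shows what each choice does to a plain `major.minor.patch` version string. The function `bump` is hypothetical. It only mirrors the numbering rule; the real `bump2version` tool also updates the files listed in its configuration.

```python
def bump(version, part):
    """Return `version` with the given part incremented and lower parts reset to 0."""
    major, minor, patch = (int(x) for x in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part!r}")


print(bump("1.2.3", "minor"))  # 1.3.0
```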

--------------------------------

### Testing

Source: https://github.com/iomega/spec2vec/blob/master/README.rst

Command to run tests, including coverage analysis.

```console
pytest
```

--------------------------------

### Linting

Source: https://github.com/iomega/spec2vec/blob/master/README.rst

Command to run the linter for code quality checks.

```console
prospector
```

--------------------------------

### Train a word2vec model

Source: https://github.com/iomega/spec2vec/blob/master/readthedocs/index.rst

Processes reference spectra, converts them to documents, and trains a new word2vec model using spec2vec. Includes spectrum processing pipeline and model training parameters.

```python
import os
from matchms.filtering import add_losses
from matchms.filtering import add_parent_mass
from matchms.filtering import default_filters
from matchms.filtering import normalize_intensities
from matchms.filtering import reduce_to_number_of_peaks
from matchms.filtering import require_minimum_number_of_peaks
from matchms.filtering import select_by_mz
from matchms.importing import load_from_mgf
from spec2vec import SpectrumDocument
from spec2vec.model_building import train_new_word2vec_model

def spectrum_processing(s):
    """This is how one would typically design a desired pre- and post-
    processing pipeline."""
    s = default_filters(s)
    s = add_parent_mass(s)
    s = normalize_intensities(s)
    s = reduce_to_number_of_peaks(s, n_required=10, ratio_desired=0.5, n_max=500)
    s = select_by_mz(s, mz_from=0, mz_to=1000)
    s = add_losses(s, loss_mz_from=10.0, loss_mz_to=200.0)
    s = require_minimum_number_of_peaks(s, n_required=10)
    return s

# Load data from MGF file and apply filters
spectrums = [spectrum_processing(s) for s in load_from_mgf("reference_spectrums.mgf")]

# Omit spectrums that didn't qualify for analysis
spectrums = [s for s in spectrums if s is not None]

# Create spectrum documents
reference_documents = [SpectrumDocument(s, n_decimals=2) for s in spectrums]

model_file = "references.model"
model = train_new_word2vec_model(reference_documents, model_file, iterations=[10, 20, 30],
                                 workers=2, progress_logger=True)
```
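
The "omit spectrums that didn't qualify" step relies on a convention worth spelling out: a filter returns `None` when a spectrum fails a requirement, and later filters pass `None` through unchanged. The sketch below imitates that pattern with plain dictionaries. The toy filters `keep_mz_range` and `require_min_peaks` are hypothetical stand-ins, not matchms functions.

```python
def keep_mz_range(spectrum, mz_from, mz_to):
    """Keep only peaks with mz_from <= m/z <= mz_to; pass None through."""
    if spectrum is None:
        return None
    kept = [(mz, inten) for mz, inten in zip(spectrum["mz"], spectrum["intensities"])
            if mz_from <= mz <= mz_to]
    return {"mz": [mz for mz, _ in kept], "intensities": [i for _, i in kept]}


def require_min_peaks(spectrum, n_required):
    """Return None if the spectrum has fewer than n_required peaks."""
    if spectrum is None or len(spectrum["mz"]) < n_required:
        return None
    return spectrum


def toy_processing(spectrum):
    """Chain the filters in the same style as spectrum_processing above."""
    spectrum = keep_mz_range(spectrum, mz_from=0, mz_to=1000)
    spectrum = require_min_peaks(spectrum, n_required=2)
    return spectrum


raw = [
    {"mz": [100.0, 250.0, 1200.0], "intensities": [0.2, 1.0, 0.5]},
    {"mz": [90.0, 1500.0], "intensities": [1.0, 0.3]},
]
processed = [toy_processing(s) for s in raw]
cleaned = [s for s in processed if s is not None]
print(len(raw), "->", len(cleaned))  # 2 -> 1
```

Because rejected spectra become `None` rather than raising an error, the list comprehension that drops them has to run before any documents are built.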

--------------------------------

### Calculate scores on all combinations of reference spectrums and queries

Source: https://github.com/iomega/spec2vec/blob/master/readthedocs/index.rst

Calculates scores between all reference and query spectrums using the spec2vec similarity function, then prints the scores of the 10 best matches for one query spectrum.

```python
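# Assumes reference_documents (from the training example) as well as
# query_spectrums and spec2vec (from the similarity example) are already defined.

# Calculate scores on all combinations of reference spectrums and queries
scores = calculate_scores(reference_documents, query_spectrums, spec2vec)

# Find the highest scores for a query spectrum of interest
best_matches = scores.scores_by_query(query_spectrums[0], sort=True)[:10]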

# Return highest scores
print([x[1] for x in best_matches])
```
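
The "sort, then take the first 10" step can be sketched without any library objects. The example assumes matches come as `(reference, score)` pairs, which is what the `x[1]` indexing above suggests. The helper `top_matches` and the toy data are hypothetical.

```python
import heapq


def top_matches(scored_pairs, n=10):
    """Return the n (reference, score) pairs with the highest score, best first."""
    return heapq.nlargest(n, scored_pairs, key=lambda pair: pair[1])


toy_scores = [("ref_a", 0.12), ("ref_b", 0.87), ("ref_c", 0.45), ("ref_d", 0.91)]
best = top_matches(toy_scores, n=2)
print([score for _, score in best])  # [0.91, 0.87]
```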

--------------------------------

### Derive spec2vec similarity scores

Source: https://github.com/iomega/spec2vec/blob/master/readthedocs/index.rst

Calculates similarities between mass spectrums using a pre-trained word2vec model. Demonstrates the use of `allowed_missing_percentage` to handle peaks unknown to the model.

```python
import gensim
from matchms import calculate_scores
from matchms.importing import load_from_mgf
from spec2vec import Spec2Vec

# Load query spectrums with load_from_mgf, see
# https://matchms.readthedocs.io/en/latest/api/matchms.importing.load_from_mgf.html
# spectrum_processing is the filter function defined in the training example
query_spectrums = [spectrum_processing(s) for s in load_from_mgf("query_spectrums.mgf")]

# Omit spectrums that didn't qualify for analysis
query_spectrums = [s for s in query_spectrums if s is not None]

# Import pre-trained word2vec model (see code example above)
model_file = "references.model"
model = gensim.models.Word2Vec.load(model_file)

# Define similarity_function
spec2vec = Spec2Vec(model=model, intensity_weighting_power=0.5,
                    allowed_missing_percentage=5.0)
```
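
Conceptually, a similarity of this kind compares one vector per spectrum. The sketch below shows one plausible reading: each spectrum vector is the sum of its word vectors, weighted by intensity raised to `intensity_weighting_power`, and two spectra are compared by cosine similarity. The functions `spectrum_vector` and `cosine`, and the two-dimensional toy vectors, are assumptions for illustration, not the library's implementation.

```python
import math


def spectrum_vector(words, weights, word_vectors, power=0.5):
    """Sum the known word vectors, each scaled by weight ** power (illustrative)."""
    dim = len(next(iter(word_vectors.values())))
    vec = [0.0] * dim
    for word, weight in zip(words, weights):
        if word not in word_vectors:
            continue  # words unknown to the model contribute nothing
        scale = weight ** power
        for i, x in enumerate(word_vectors[word]):
            vec[i] += scale * x
    return vec


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Toy "model": two words embedded in two dimensions.
toy_vectors = {"peak@100.00": [1.0, 0.0], "peak@150.13": [0.0, 1.0]}
query = spectrum_vector(["peak@100.00", "peak@150.13"], [1.0, 1.0], toy_vectors)
reference = spectrum_vector(["peak@100.00"], [1.0], toy_vectors)
similarity = cosine(query, reference)
print(round(similarity, 4))  # 0.7071
```

With `power=0.5`, intensities enter as their square roots, so weak peaks contribute relatively more to the spectrum vector than they would with linear weighting.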

--------------------------------

### Remove spec2vec environment

Source: https://github.com/iomega/spec2vec/blob/master/README.rst

Command to remove the spec2vec conda environment.

```console
conda env remove --name spec2vec
```

--------------------------------

### Remove spec2vec package

Source: https://github.com/iomega/spec2vec/blob/master/README.rst

Command to remove the spec2vec package from the active conda environment.

```console
conda remove spec2vec
```
